Invalid VCF output? #15

Closed
aeonsim opened this Issue Aug 9, 2011 · 4 comments

aeonsim commented Aug 9, 2011

Hi
I've been trying out freebayes (date: 2011-07-20, version: 0.8.7, on Ubuntu 10.04.2 LTS) and have been having issues using the resulting VCF files with IGV, which reports that it is unable to parse the header.

Running vcf-validator from vcftools (current release) produces a large number of errors and warnings; see below for examples.

Is this a known problem with freebayes, and if so, is there a workaround or a recommended stable commit/version to use? Looking at the data in the resulting VCF files, the calls look good; it's just the formatting of the VCF output that appears to be causing the issues.

Thanks


Commands used:
freebayes --fasta-reference Btau/umd31MT.fa 17144784_mated.bam 99591_mated.bam 10841297_mated.bam --vcf output.vcf

I also tried the indel example from the README, piping the output through stdout, and got the same errors.

vcf-validator output.vcf
INFO field at Chr1:30 .. INFO tag [PAIREDR] not listed in the header
column 10841297 at Chr1:571 .. FORMAT tag [GL] expected different number of values (expected 3, found 4)
column 17144784 at Chr1:571 .. FORMAT tag [GL] expected different number of values (expected 3, found 4)
column 99591 at Chr1:571 .. FORMAT tag [GL] expected different number of values (expected 3, found 4)
...
INFO field at Chr1:15366 .. INFO tag [RUN=1,3] expected different number of values (1),INFO tag [technology.ILLUMINA=1,1] expected different number of values (1),INFO tag [PAIRED=1,1] expected different number of values (1)
Chr1:15366 .. AN is 6,6, should be 6
...
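The INFO errors above (an undeclared PAIREDR tag, and RUN, technology.ILLUMINA and PAIRED carrying two values where the header declares one) come from comparing each record's INFO column against the ##INFO declarations in the header. A rough Python sketch of that comparison, assuming each ##INFO line lists ID before Number (the conventional order) and checking only fixed integer counts. The function names are hypothetical; this illustrates the kind of check involved, not how vcf-validator itself is implemented:

```python
import re

# Simplified pattern for ##INFO=<ID=...,Number=...,...> header lines.
INFO_HEADER = re.compile(r"^##INFO=<ID=([^,>]+),Number=([^,>]+)")


def parse_info_header(header_lines):
    """Return {tag: Number} for every ##INFO line in the header."""
    declared = {}
    for line in header_lines:
        m = INFO_HEADER.match(line)
        if m:
            declared[m.group(1)] = m.group(2)
    return declared


def check_info(info_field, declared):
    """Return a list of problems found in one record's INFO column."""
    problems = []
    if info_field == ".":
        return problems
    for entry in info_field.split(";"):
        tag, _, value = entry.partition("=")
        if tag not in declared:
            problems.append(f"INFO tag [{tag}] not listed in the header")
            continue
        number = declared[tag]
        # Only fixed integer counts are checked; A, G, R and '.' are skipped,
        # as are flags, which carry no value.
        if number.isdigit() and value:
            found = len(value.split(","))
            if found != int(number):
                problems.append(
                    f"INFO tag [{entry}] expected {number} value(s), found {found}"
                )
    return problems
```

For example, with RUN and PAIRED declared as Number=1, the INFO string `RUN=1,3;PAIRED=1;PAIREDR=0.5` yields one count error for RUN and one undeclared-tag error for PAIREDR.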

Owner

ekg commented Aug 17, 2011

Are you sure that vcf-validator can handle polyallelic "G"enotype-numbered fields? With indels and MNPs we see situations in which there are more than 2 alleles at a given position (e.g. the reference and 2 non-reference alleles).

Otherwise these are recent errors I've introduced, thanks for the bug
report. The next commit will have fixes.

What is the error message produced by IGV? Also, do the error messages you've listed cover the full range of errors reported by vcf-validator?
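For fields declared Number=G (such as GL), the VCF specification expects one value per possible genotype, so the expected count depends on how many alleles the record has. A small Python sketch of that count, assuming unordered genotypes; the function name is hypothetical. A biallelic diploid site needs 3 GL values and a triallelic one needs 6, so the 4 values reported at Chr1:571 fit neither case:

```python
from math import comb  # Python 3.8+


def expected_g_count(n_alleles: int, ploidy: int = 2) -> int:
    """Number of distinct unordered genotypes for n_alleles (REF included)."""
    # Each genotype is a multiset of `ploidy` alleles drawn from `n_alleles`,
    # so the count is C(n_alleles + ploidy - 1, ploidy).
    return comb(n_alleles + ploidy - 1, ploidy)
```

For a diploid sample this gives 3, 6 and 10 values for 2, 3 and 4 alleles respectively.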

aeonsim commented Jan 8, 2012

Hi Erik

Sorry it took me so long to get back to you on this one.

Right, so vcf-validator and IGV are still having issues with the VCF output of freebayes (using the commit that fixed issue 26).

The output from the validator shows 2 error types and a warning:

vcf-validator -u freebayes-pop-ChrMT.vcf.gz
The header tag 'contig' not present for CHROM=ChrMT. (Not required but highly recommended.)
ChrMT:13 .. Could not parse the allele(s) [ACTTCTCCTTA], first base does not match the reference
column 17108826 at ChrMT:215 .. Could not validate the float [nan]


Summary:
119 errors total

    109     ..      column 17108826 at ChrMT:215 .. Could not validate the float [nan]
    9       ..      ChrMT:13 .. Could not parse the allele(s) [ACTTCTCCTTA], first base does not match the reference
    1       ..      The header tag 'contig' not present for CHROM=ChrMT. (Not required but highly recommended.)

Looking at the VCF file, it appears that in some places freebayes writes 'nan' where a numeric value (probably 0 in this case) belongs. In the record below, the first animal has GQ=50000 while the second has GQ=nan (the very different DPs are expected: this is a region of ChrMT, and the samples are female blood vs. male semen).
GT:GQ:DP:RO:QR:AO:QA:GL 0/0:50000:9592:9554:351885:14:463:-113.732,-2907.53,-31717.1 0/0:nan:15:14:531:1:34:-3.4,-3.33936,-48.1693

This 'nan' is also what causes the problem in IGV: if I use sed to replace 'nan' with 0, IGV happily loads the resulting VCF file.
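The sed workaround can be made more targeted by touching only the per-sample columns, so that 'nan' elsewhere in a line is left alone. A minimal Python sketch, assuming uncompressed VCF lines as input; the function name is hypothetical. It defaults to '.', the VCF missing-value marker, rather than 0, since GQ=0 would report an actual (very low) quality instead of an unknown one; pass "0" to reproduce the sed behaviour:

```python
def replace_nan_in_samples(line: str, placeholder: str = ".") -> str:
    """Replace 'nan' sub-fields in the sample columns of one VCF line."""
    if line.startswith("#"):
        return line  # header and column-name lines are left untouched
    cols = line.rstrip("\n").split("\t")
    # Columns 0-8 are CHROM..FORMAT; sample columns start at index 9.
    for i in range(9, len(cols)):
        cols[i] = ":".join(
            placeholder if value.lower() == "nan" else value
            for value in cols[i].split(":")
        )
    return "\t".join(cols) + ("\n" if line.endswith("\n") else "")
```

For .vcf.gz files like the ones above, the lines would first need to be decompressed, for example by reading them with Python's `gzip.open(path, "rt")`.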

I'm not sure about the second set of errors, about the first base not matching the reference. They don't cause any issues with IGV, so they are less troublesome from a visualization POV, but they are potentially a bigger issue.
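The "first base does not match the reference" errors are most likely about indel representation: VCF 4.x writes simple insertions and deletions with a shared leading (padding) base, so when REF and ALT differ in length they are normally expected to begin with the same base. Assuming that is the rule vcf-validator applies here (not confirmed from its source), the check can be sketched in Python as follows; the function name is hypothetical, and symbolic alleles such as `<DEL>` are skipped:

```python
from typing import List


def first_base_mismatches(ref: str, alts: List[str]) -> List[str]:
    """Return ALT alleles that differ in length from REF but start with a different base."""
    bad = []
    for alt in alts:
        if alt.startswith("<") or alt in (".", "*"):
            continue  # symbolic or missing alleles carry no bases to compare
        # Equal-length alleles (SNPs/MNPs) need no shared padding base.
        if len(alt) != len(ref) and alt[:1].upper() != ref[:1].upper():
            bad.append(alt)
    return bad
```

Under this assumption, a deletion written as REF=AG, ALT=G would be flagged, which resembles the `[G]` error at Chr19:397.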

If needed, I can send you a subset of the VCF and BAMs.

I get the same errors when running on a different chromosome:

vcf-validator -u freebayes-pop-Chr19.vcf.gz
The header tag 'contig' not present for CHROM=Chr19. (Not required but highly recommended.)
column 10841297 at Chr19:50 .. Could not validate the float [nan]
Chr19:397 .. Could not parse the allele(s) [G], first base does not match the reference


Summary:
105851 errors total

    65881   ..      column 10841297 at Chr19:50 .. Could not validate the float [nan]
    39969   ..      Chr19:397 .. Could not parse the allele(s) [G], first base does not match the reference
    1       ..      The header tag 'contig' not present for CHROM=Chr19. (Not required but highly recommended.)

Freebayes was run with these options:

freebayes --region Chr19 -j --min-coverage 10 --left-align-indels -v freebayes-pop-Chr19.vcf -f ...

Thanks
Chad

I have hit this GQ=50000 and GQ=nan issue with my data as well. Does anyone know what is going on here?

Owner

ekg commented Aug 7, 2014

The GQ=0, nan, or 50000 issue should be resolved as of the current HEAD.

ekg closed this Aug 7, 2014
