Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

inconsistent INFO and FORMAT lines IDX numbers between call and view subcommands #208

Closed
wm75 opened this issue Feb 9, 2015 · 0 comments

Comments

@wm75
Copy link

wm75 commented Feb 9, 2015

Hi,
a few days ago I switched from samtools/bcftools 1.0 to 1.2 .
One immediate problem I ran into was that 'bcftools concat' stopped working for mixed bcf input files, some generated with 'bcftools call', some with 'bcftools view'. With stopped working I mean that 'concat' actually does produce output without complaining, but when trying to 'view' the resulting file 'bctools view' invariably seg faults on the first record in the file.

After quite some debugging attempts, my only explanation for the observed behaviour is that the 'call' command seems to use different IDX numbers for certain INFO and FORMAT lines in the bcf header.
Here's a element-by-element comparison (I had to trim all lines to prevent github from ignoring '<>' flanked content) for the relevant header lines from a file generated with 'bcftools call' and one obtained by taking that same file through a bcf -> vcf -> bcf cycle with 'bcftools view'. Note how several IDXs are changing.

The INFO records:
ID=INDEL,Number=0,Type=Flag,Description="Indicates that the variant is an INDEL.",IDX=1
ID=INDEL,Number=0,Type=Flag,Description="Indicates that the variant is an INDEL.",IDX=1
ID=IDV,Number=1,Type=Integer,Description="Maximum number of reads supporting an indel",IDX=2
ID=IDV,Number=1,Type=Integer,Description="Maximum number of reads supporting an indel",IDX=2
ID=IMF,Number=1,Type=Float,Description="Maximum fraction of reads supporting an indel",IDX=3
ID=IMF,Number=1,Type=Float,Description="Maximum fraction of reads supporting an indel",IDX=3
ID=DP,Number=1,Type=Integer,Description="Raw read depth",IDX=4
ID=DP,Number=1,Type=Integer,Description="Raw read depth",IDX=4
ID=VDB,Number=1,Type=Float,Description="Variant Distance Bias for filtering splice-site artefacts in RNA-seq data (bigger is better)",Version="3",IDX=5
ID=VDB,Number=1,Type=Float,Description="Variant Distance Bias for filtering splice-site artefacts in RNA-seq data (bigger is better)",Version="3",IDX=5
ID=RPB,Number=1,Type=Float,Description="Mann-Whitney U test of Read Position Bias (bigger is better)",IDX=6
ID=RPB,Number=1,Type=Float,Description="Mann-Whitney U test of Read Position Bias (bigger is better)",IDX=6
ID=MQB,Number=1,Type=Float,Description="Mann-Whitney U test of Mapping Quality Bias (bigger is better)",IDX=7
ID=MQB,Number=1,Type=Float,Description="Mann-Whitney U test of Mapping Quality Bias (bigger is better)",IDX=7
ID=BQB,Number=1,Type=Float,Description="Mann-Whitney U test of Base Quality Bias (bigger is better)",IDX=8
ID=BQB,Number=1,Type=Float,Description="Mann-Whitney U test of Base Quality Bias (bigger is better)",IDX=8
ID=MQSB,Number=1,Type=Float,Description="Mann-Whitney U test of Mapping Quality vs Strand Bias (bigger is better)",IDX=9
ID=MQSB,Number=1,Type=Float,Description="Mann-Whitney U test of Mapping Quality vs Strand Bias (bigger is better)",IDX=9
ID=SGB,Number=1,Type=Float,Description="Segregation based metric.",IDX=10
ID=SGB,Number=1,Type=Float,Description="Segregation based metric.",IDX=10
ID=MQ0F,Number=1,Type=Float,Description="Fraction of MQ0 reads (smaller is better)",IDX=11
ID=MQ0F,Number=1,Type=Float,Description="Fraction of MQ0 reads (smaller is better)",IDX=11
ID=ICB,Number=1,Type=Float,Description="Inbreeding Coefficient Binomial test (bigger is better)",IDX=18
ID=ICB,Number=1,Type=Float,Description="Inbreeding Coefficient Binomial test (bigger is better)",IDX=16
ID=HOB,Number=1,Type=Float,Description="Bias in the number of HOMs number (smaller is better)",IDX=19
ID=HOB,Number=1,Type=Float,Description="Bias in the number of HOMs number (smaller is better)",IDX=17
ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes for each ALT allele, in the same order as listed",IDX=20
ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes for each ALT allele, in the same order as listed",IDX=18
ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes",IDX=21
ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes",IDX=19
ID=DP4,Number=4,Type=Integer,Description="Number of high-quality ref-forward , ref-reverse, alt-forward and alt-reverse bases",IDX=22
ID=DP4,Number=4,Type=Integer,Description="Number of high-quality ref-forward , ref-reverse, alt-forward and alt-reverse bases",IDX=20
ID=MQ,Number=1,Type=Integer,Description="Average mapping quality",IDX=23
ID=MQ,Number=1,Type=Integer,Description="Average mapping quality",IDX=21

and the FORMAT entries:
ID=PL,Number=G,Type=Integer,Description="List of Phred-scaled genotype likelihoods",IDX=14
ID=PL,Number=G,Type=Integer,Description="List of Phred-scaled genotype likelihoods",IDX=12
ID=DP,Number=1,Type=Integer,Description="Number of high-quality bases",IDX=4
ID=DP,Number=1,Type=Integer,Description="Number of high-quality bases",IDX=4
ID=DPR,Number=R,Type=Integer,Description="Number of high-quality bases observed for each allele",IDX=15
ID=DPR,Number=R,Type=Integer,Description="Number of high-quality bases observed for each allele",IDX=13
ID=GT,Number=1,Type=String,Description="Genotype",IDX=16
ID=GT,Number=1,Type=String,Description="Genotype",IDX=14
ID=GQ,Number=1,Type=Integer,Description="Phred-scaled Genotype Quality",IDX=17
ID=GQ,Number=1,Type=Integer,Description="Phred-scaled Genotype Quality",IDX=15

The rewritten bcf (containing the second lines above) then concats perfectly with other 'bcftools view' generated bcfs.

Just in case it is relevant, here is the samtools mpileup | bcftools call pipe used to generate the 'call' bcf example:

samtools mpileup -d 250 -r MtDNA: -t DP,DPR -gu -f genome4217.fa bam8128.bam | bcftools call -m -A -f GQ -O b -

That exact same command run with samtools/bcftools 1.0 generated output that worked just fine with bcftools concat. I never tried samtools 1.1 so I don't know about the behaviour there.

Hope my description makes kind of sense and I'm not totally wrong with my diagnosis.
Best wishes,
Wolfgang

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant