You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Hi,
a few days ago I switched from samtools/bcftools 1.0 to 1.2 .
One immediate problem I ran into was that 'bcftools concat' stopped working for mixed bcf input files, some generated with 'bcftools call', some with 'bcftools view'. With stopped working I mean that 'concat' actually does produce output without complaining, but when trying to 'view' the resulting file 'bctools view' invariably seg faults on the first record in the file.
After quite some debugging attempts, my only explanation for the observed behaviour is that the 'call' command seems to use different IDX numbers for certain INFO and FORMAT lines in the bcf header.
Here's a element-by-element comparison (I had to trim all lines to prevent github from ignoring '<>' flanked content) for the relevant header lines from a file generated with 'bcftools call' and one obtained by taking that same file through a bcf -> vcf -> bcf cycle with 'bcftools view'. Note how several IDXs are changing.
The INFO records:
ID=INDEL,Number=0,Type=Flag,Description="Indicates that the variant is an INDEL.",IDX=1
ID=INDEL,Number=0,Type=Flag,Description="Indicates that the variant is an INDEL.",IDX=1
ID=IDV,Number=1,Type=Integer,Description="Maximum number of reads supporting an indel",IDX=2
ID=IDV,Number=1,Type=Integer,Description="Maximum number of reads supporting an indel",IDX=2
ID=IMF,Number=1,Type=Float,Description="Maximum fraction of reads supporting an indel",IDX=3
ID=IMF,Number=1,Type=Float,Description="Maximum fraction of reads supporting an indel",IDX=3
ID=DP,Number=1,Type=Integer,Description="Raw read depth",IDX=4
ID=DP,Number=1,Type=Integer,Description="Raw read depth",IDX=4
ID=VDB,Number=1,Type=Float,Description="Variant Distance Bias for filtering splice-site artefacts in RNA-seq data (bigger is better)",Version="3",IDX=5
ID=VDB,Number=1,Type=Float,Description="Variant Distance Bias for filtering splice-site artefacts in RNA-seq data (bigger is better)",Version="3",IDX=5
ID=RPB,Number=1,Type=Float,Description="Mann-Whitney U test of Read Position Bias (bigger is better)",IDX=6
ID=RPB,Number=1,Type=Float,Description="Mann-Whitney U test of Read Position Bias (bigger is better)",IDX=6
ID=MQB,Number=1,Type=Float,Description="Mann-Whitney U test of Mapping Quality Bias (bigger is better)",IDX=7
ID=MQB,Number=1,Type=Float,Description="Mann-Whitney U test of Mapping Quality Bias (bigger is better)",IDX=7
ID=BQB,Number=1,Type=Float,Description="Mann-Whitney U test of Base Quality Bias (bigger is better)",IDX=8
ID=BQB,Number=1,Type=Float,Description="Mann-Whitney U test of Base Quality Bias (bigger is better)",IDX=8
ID=MQSB,Number=1,Type=Float,Description="Mann-Whitney U test of Mapping Quality vs Strand Bias (bigger is better)",IDX=9
ID=MQSB,Number=1,Type=Float,Description="Mann-Whitney U test of Mapping Quality vs Strand Bias (bigger is better)",IDX=9
ID=SGB,Number=1,Type=Float,Description="Segregation based metric.",IDX=10
ID=SGB,Number=1,Type=Float,Description="Segregation based metric.",IDX=10
ID=MQ0F,Number=1,Type=Float,Description="Fraction of MQ0 reads (smaller is better)",IDX=11
ID=MQ0F,Number=1,Type=Float,Description="Fraction of MQ0 reads (smaller is better)",IDX=11
ID=ICB,Number=1,Type=Float,Description="Inbreeding Coefficient Binomial test (bigger is better)",IDX=18
ID=ICB,Number=1,Type=Float,Description="Inbreeding Coefficient Binomial test (bigger is better)",IDX=16
ID=HOB,Number=1,Type=Float,Description="Bias in the number of HOMs number (smaller is better)",IDX=19
ID=HOB,Number=1,Type=Float,Description="Bias in the number of HOMs number (smaller is better)",IDX=17
ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes for each ALT allele, in the same order as listed",IDX=20
ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes for each ALT allele, in the same order as listed",IDX=18
ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes",IDX=21
ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes",IDX=19
ID=DP4,Number=4,Type=Integer,Description="Number of high-quality ref-forward , ref-reverse, alt-forward and alt-reverse bases",IDX=22
ID=DP4,Number=4,Type=Integer,Description="Number of high-quality ref-forward , ref-reverse, alt-forward and alt-reverse bases",IDX=20
ID=MQ,Number=1,Type=Integer,Description="Average mapping quality",IDX=23
ID=MQ,Number=1,Type=Integer,Description="Average mapping quality",IDX=21
and the FORMAT entries:
ID=PL,Number=G,Type=Integer,Description="List of Phred-scaled genotype likelihoods",IDX=14
ID=PL,Number=G,Type=Integer,Description="List of Phred-scaled genotype likelihoods",IDX=12
ID=DP,Number=1,Type=Integer,Description="Number of high-quality bases",IDX=4
ID=DP,Number=1,Type=Integer,Description="Number of high-quality bases",IDX=4
ID=DPR,Number=R,Type=Integer,Description="Number of high-quality bases observed for each allele",IDX=15
ID=DPR,Number=R,Type=Integer,Description="Number of high-quality bases observed for each allele",IDX=13
ID=GT,Number=1,Type=String,Description="Genotype",IDX=16
ID=GT,Number=1,Type=String,Description="Genotype",IDX=14
ID=GQ,Number=1,Type=Integer,Description="Phred-scaled Genotype Quality",IDX=17
ID=GQ,Number=1,Type=Integer,Description="Phred-scaled Genotype Quality",IDX=15
The rewritten bcf (containing the second lines above) then concats perfectly with other 'bcftools view' generated bcfs.
Just in case it is relevant, here is the samtools mpileup | bcftools call pipe used to generate the 'call' bcf example:
That exact same command run with samtools/bcftools 1.0 generated output that worked just fine with bcftools concat. I never tried samtools 1.1 so I don't know about the behaviour there.
Hope my description makes kind of sense and I'm not totally wrong with my diagnosis.
Best wishes,
Wolfgang
The text was updated successfully, but these errors were encountered:
pd3
added a commit
to pd3/htslib
that referenced
this issue
Apr 29, 2015
Hi,
a few days ago I switched from samtools/bcftools 1.0 to 1.2 .
One immediate problem I ran into was that 'bcftools concat' stopped working for mixed bcf input files, some generated with 'bcftools call', some with 'bcftools view'. With stopped working I mean that 'concat' actually does produce output without complaining, but when trying to 'view' the resulting file 'bctools view' invariably seg faults on the first record in the file.
After quite some debugging attempts, my only explanation for the observed behaviour is that the 'call' command seems to use different IDX numbers for certain INFO and FORMAT lines in the bcf header.
Here's a element-by-element comparison (I had to trim all lines to prevent github from ignoring '<>' flanked content) for the relevant header lines from a file generated with 'bcftools call' and one obtained by taking that same file through a bcf -> vcf -> bcf cycle with 'bcftools view'. Note how several IDXs are changing.
The INFO records:
ID=INDEL,Number=0,Type=Flag,Description="Indicates that the variant is an INDEL.",IDX=1
ID=INDEL,Number=0,Type=Flag,Description="Indicates that the variant is an INDEL.",IDX=1
ID=IDV,Number=1,Type=Integer,Description="Maximum number of reads supporting an indel",IDX=2
ID=IDV,Number=1,Type=Integer,Description="Maximum number of reads supporting an indel",IDX=2
ID=IMF,Number=1,Type=Float,Description="Maximum fraction of reads supporting an indel",IDX=3
ID=IMF,Number=1,Type=Float,Description="Maximum fraction of reads supporting an indel",IDX=3
ID=DP,Number=1,Type=Integer,Description="Raw read depth",IDX=4
ID=DP,Number=1,Type=Integer,Description="Raw read depth",IDX=4
ID=VDB,Number=1,Type=Float,Description="Variant Distance Bias for filtering splice-site artefacts in RNA-seq data (bigger is better)",Version="3",IDX=5
ID=VDB,Number=1,Type=Float,Description="Variant Distance Bias for filtering splice-site artefacts in RNA-seq data (bigger is better)",Version="3",IDX=5
ID=RPB,Number=1,Type=Float,Description="Mann-Whitney U test of Read Position Bias (bigger is better)",IDX=6
ID=RPB,Number=1,Type=Float,Description="Mann-Whitney U test of Read Position Bias (bigger is better)",IDX=6
ID=MQB,Number=1,Type=Float,Description="Mann-Whitney U test of Mapping Quality Bias (bigger is better)",IDX=7
ID=MQB,Number=1,Type=Float,Description="Mann-Whitney U test of Mapping Quality Bias (bigger is better)",IDX=7
ID=BQB,Number=1,Type=Float,Description="Mann-Whitney U test of Base Quality Bias (bigger is better)",IDX=8
ID=BQB,Number=1,Type=Float,Description="Mann-Whitney U test of Base Quality Bias (bigger is better)",IDX=8
ID=MQSB,Number=1,Type=Float,Description="Mann-Whitney U test of Mapping Quality vs Strand Bias (bigger is better)",IDX=9
ID=MQSB,Number=1,Type=Float,Description="Mann-Whitney U test of Mapping Quality vs Strand Bias (bigger is better)",IDX=9
ID=SGB,Number=1,Type=Float,Description="Segregation based metric.",IDX=10
ID=SGB,Number=1,Type=Float,Description="Segregation based metric.",IDX=10
ID=MQ0F,Number=1,Type=Float,Description="Fraction of MQ0 reads (smaller is better)",IDX=11
ID=MQ0F,Number=1,Type=Float,Description="Fraction of MQ0 reads (smaller is better)",IDX=11
ID=ICB,Number=1,Type=Float,Description="Inbreeding Coefficient Binomial test (bigger is better)",IDX=18
ID=ICB,Number=1,Type=Float,Description="Inbreeding Coefficient Binomial test (bigger is better)",IDX=16
ID=HOB,Number=1,Type=Float,Description="Bias in the number of HOMs number (smaller is better)",IDX=19
ID=HOB,Number=1,Type=Float,Description="Bias in the number of HOMs number (smaller is better)",IDX=17
ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes for each ALT allele, in the same order as listed",IDX=20
ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes for each ALT allele, in the same order as listed",IDX=18
ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes",IDX=21
ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes",IDX=19
ID=DP4,Number=4,Type=Integer,Description="Number of high-quality ref-forward , ref-reverse, alt-forward and alt-reverse bases",IDX=22
ID=DP4,Number=4,Type=Integer,Description="Number of high-quality ref-forward , ref-reverse, alt-forward and alt-reverse bases",IDX=20
ID=MQ,Number=1,Type=Integer,Description="Average mapping quality",IDX=23
ID=MQ,Number=1,Type=Integer,Description="Average mapping quality",IDX=21
and the FORMAT entries:
ID=PL,Number=G,Type=Integer,Description="List of Phred-scaled genotype likelihoods",IDX=14
ID=PL,Number=G,Type=Integer,Description="List of Phred-scaled genotype likelihoods",IDX=12
ID=DP,Number=1,Type=Integer,Description="Number of high-quality bases",IDX=4
ID=DP,Number=1,Type=Integer,Description="Number of high-quality bases",IDX=4
ID=DPR,Number=R,Type=Integer,Description="Number of high-quality bases observed for each allele",IDX=15
ID=DPR,Number=R,Type=Integer,Description="Number of high-quality bases observed for each allele",IDX=13
ID=GT,Number=1,Type=String,Description="Genotype",IDX=16
ID=GT,Number=1,Type=String,Description="Genotype",IDX=14
ID=GQ,Number=1,Type=Integer,Description="Phred-scaled Genotype Quality",IDX=17
ID=GQ,Number=1,Type=Integer,Description="Phred-scaled Genotype Quality",IDX=15
The rewritten bcf (containing the second lines above) then concats perfectly with other 'bcftools view' generated bcfs.
Just in case it is relevant, here is the samtools mpileup | bcftools call pipe used to generate the 'call' bcf example:
samtools mpileup -d 250 -r MtDNA: -t DP,DPR -gu -f genome4217.fa bam8128.bam | bcftools call -m -A -f GQ -O b -
That exact same command run with samtools/bcftools 1.0 generated output that worked just fine with bcftools concat. I never tried samtools 1.1 so I don't know about the behaviour there.
Hope my description makes kind of sense and I'm not totally wrong with my diagnosis.
Best wishes,
Wolfgang
The text was updated successfully, but these errors were encountered: