When I join SNP and INDEL records at the same position using bcftools norm -m +any, one of the two AN ("Total number of alleles in called genotypes") values is discarded.
Here is a reproducible example:
1.vcf (merged from two VCF files with bcftools merge --no-index part1.vcf part2.vcf; see below):
##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##contig=<ID=Chr1,length=100>
##ALT=<ID=*,Description="Represents allele(s) other than observed.">
##INFO=<ID=INDEL,Number=0,Type=Flag,Description="Indicates that the variant is an INDEL.">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Number of high-quality bases">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes for each ALT allele, in the same order as listed">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
##bcftools_mergeVersion=1.19+htslib-1.19
##bcftools_mergeCommand=merge --no-index part1.vcf part2.vcf; Date=Sun Mar 24 15:29:30 2024
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT A001 A002 A003 A004 A005 A006 A007 A008 A009 A010 A011 A012 A013 C001 C002 C003 C004 C005 C006 C007 C008 C009 C010 C011 C012 C013 C014 C015 C016 C017 C018 C019 C020 C021 C022 C023 C024 C025 C026 C027 C028 C029 C030 C031 C032 C033 C034 C035 C036 C037 C038 C039 C040 C041 C042 C043 C044 C045 C046 C047 C048 C049 C050 C051 C052 C053 C054 C055 C056 C057 C058 C059 C060
Chr1 1 . T A 228.246 PASS AN=24;AC=8 GT:DP 1/1:20 0/0:2 0/1:8 1/1:6 ./.:0 0/0:8 0/0:2 0/0:5 0/0:1 0/0:1 1/1:17 0/0:2 0/1:13 ./.:. ./.:. ./.:. ./.:. ./.:. ./.:. ./.:. ./.:. ./.:. ./.:. ./.:. ./.:. ./.:. ./.:. ./.:. ./.:. ./.:. ./.:. ./.:. ./.:. ./.:. ./.:. ./.:. ./.:. ./.:. ./.:. ./.:. ./.:. ./.:. ./.:. ./.:. ./.:. ./.:. ./.:. ./.:. ./.:. ./.:. ./.:. ./.:. ./.:. ./.:. ./.:. ./.:. ./.:. ./.:. ./.:. ./.:. ./.:. ./.:. ./.:. ./.:. ./.:. ./.:. ./.:. ./.:. ./.:. ./.:. ./.:. ./.:. ./.:.
Chr1 1 . T TAAAAA,TAAA,TAA,TAAAA,TA 228.401 PASS INDEL;AN=120;AC=28,43,10,8,30 GT:DP ./.:. ./.:. ./.:. ./.:. ./.:. ./.:. ./.:. ./.:. ./.:. ./.:. ./.:. ./.:. ./.:. 1/1:11 2/3:16 4/4:11 4/1:11 2/2:18 2/5:29 5/5:23 1/1:35 2/2:36 0/3:14 3/5:19 3/2:15 2/2:18 3/2:14 2/2:17 2/2:9 1/2:16 5/1:15 5/2:17 1/1:11 2/1:9 5/5:5 5/5:44 1/5:21 2/2:18 5/3:19 1/1:19 5/1:19 2/1:48 2/5:31 1/5:23 2/2:12 2/4:11 1/1:20 2/4:10 1/2:9 1/2:14 1/1:17 5/2:10 3/2:12 2/2:16 1/2:14 1/5:55 1/2:41 5/3:47 1/4:39 5/5:13 5/2:11 5/2:37 3/5:43 2/1:27 5/5:30 4/2:30 5/5:35 4/2:12 2/1:10 5/5:13 2/3:23 5/2:14 2/2:12
After running bcftools norm -m +any 1.vcf:
...
##bcftools_normVersion=1.19+htslib-1.19
##bcftools_normCommand=norm -m +any c.vcf; Date=Sun Mar 24 15:32:45 2024
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT A001 A002 A003 A004 A005 A006 A007 A008 A009 A010 A011 A012 A013 C001 C002 C003 C004 C005 C006 C007 C008 C009 C010 C011 C012 C013 C014 C015 C016 C017 C018 C019 C020 C021 C022 C023 C024 C025 C026 C027 C028 C029 C030 C031 C032 C033 C034 C035 C036 C037 C038 C039 C040 C041 C042 C043 C044 C045 C046 C047 C048 C049 C050 C051 C052 C053 C054 C055 C056 C057 C058 C059 C060
Chr1 1 . T A,TAAAAA,TAAA,TAA,TAAAA,TA 228.401 PASS AN=24;AC=8,28,43,10,8,30 GT:DP 1/1:20 0/0:2 0/1:8 1/1:6 ./.:0 0/0:8 0/0:2 0/0:5 0/0:1 0/0:1 1/1:17 0/0:2 0/1:13 2/2:. 3/4:. 5/5:. 5/2:. 3/3:. 3/6:. 6/6:. 2/2:. 3/3:. ./4:. 4/6:. 4/3:. 3/3:. 4/3:. 3/3:. 3/3:. 2/3:. 6/2:. 6/3:. 2/2:. 3/2:. 6/6:. 6/6:. 2/6:. 3/3:. 6/4:. 2/2:. 6/2:. 3/2:. 3/6:. 2/6:. 3/3:. 3/5:. 2/2:. 3/5:. 2/3:. 2/3:. 2/2:. 6/3:. 4/3:. 3/3:. 2/3:. 2/6:. 2/3:. 6/4:. 2/5:. 6/6:. 6/3:. 6/3:. 4/6:. 3/2:. 6/6:. 5/3:. 6/6:. 5/3:. 3/2:. 6/6:. 3/4:. 6/3:. 3/3:.
As you can see, the AN value of the normalized (joined) record is 24 instead of the correct 144 (120 + 24).
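To double-check the expected value, here is a small Python sketch (not part of bcftools; count_alleles is a hypothetical helper of my own) that recounts AN and AC directly from the GT fields of part1.vcf and part2.vcf listed below. It assumes diploid genotypes as in these files and treats "." as a missing allele that does not contribute to AN.

```python
# Recount AN (called alleles) and AC (per-ALT counts) from GT strings.
# Genotypes copied from part1.vcf and part2.vcf (DP values dropped).

PART1_GT = "1/1 0/0 0/1 1/1 ./. 0/0 0/0 0/0 0/0 0/0 1/1 0/0 0/1".split()

PART2_GT = (
    "1/1 2/3 4/4 4/1 2/2 2/5 5/5 1/1 2/2 0/3 "
    "3/5 3/2 2/2 3/2 2/2 2/2 1/2 5/1 5/2 1/1 "
    "2/1 5/5 5/5 1/5 2/2 5/3 1/1 5/1 2/1 2/5 "
    "1/5 2/2 2/4 1/1 2/4 1/2 1/2 1/1 5/2 3/2 "
    "2/2 1/2 1/5 1/2 5/3 1/4 5/5 5/2 5/2 3/5 "
    "2/1 5/5 4/2 5/5 4/2 2/1 5/5 2/3 5/2 2/2"
).split()


def count_alleles(genotypes, n_alt):
    """Return (AN, AC) where AC[i] counts ALT allele i+1."""
    an = 0
    ac = [0] * n_alt
    for gt in genotypes:
        # Accept both unphased "/" and phased "|" separators.
        for allele in gt.replace("|", "/").split("/"):
            if allele == ".":
                continue  # missing alleles are not counted in AN
            an += 1
            idx = int(allele)
            if idx > 0:
                ac[idx - 1] += 1
    return an, ac


an1, ac1 = count_alleles(PART1_GT, n_alt=1)
an2, ac2 = count_alleles(PART2_GT, n_alt=5)
print(an1, ac1)   # 24 [8]
print(an2, ac2)   # 120 [28, 43, 10, 8, 30]
print(an1 + an2)  # 144, the AN expected for the joined record
```

Both input records are internally consistent, so the joined record should carry AN=144 rather than keeping only the SNP record's 24.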
This leads to an error when I run:
bcftools norm -m +any 1.vcf | bcftools view -q 0.1:nonmajor
[E::bcf_calc_ac] Incorrect AN/AC counts at Chr1:1
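My understanding (an assumption from reading htslib, not something I have confirmed for 1.19) is that bcf_calc_ac uses INFO/AN and INFO/AC when they are present, derives the REF count as AN minus the sum of AC, and aborts with the message above when that would be negative. With AN=24 and AC summing to 127, the check cannot pass. The Python sketch below reproduces only that arithmetic; ref_count is a hypothetical helper, not an htslib function.

```python
# Sketch of an AN/AC consistency check, ASSUMING it behaves like htslib's
# bcf_calc_ac: REF count = AN - sum(AC), and a negative result is an error.


def ref_count(an, ac):
    """Return the implied REF allele count, or raise if AN < sum(AC)."""
    total_alt = sum(ac)
    if an < total_alt:
        raise ValueError(f"Incorrect AN/AC counts: AN={an} < sum(AC)={total_alt}")
    return an - total_alt


ac_joined = [8, 28, 43, 10, 8, 30]  # INFO/AC of the joined record

try:
    ref_count(24, ac_joined)  # AN written by norm -m +any
except ValueError as err:
    print(err)  # Incorrect AN/AC counts: AN=24 < sum(AC)=127

print(ref_count(144, ac_joined))  # 17 REF alleles with the correct AN
```

The 17 REF alleles match the genotypes: 16 from the A samples (seven 0/0 and two 0/1) plus one from C010 (0/3).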
On the other hand, bcftools merge -m any --no-index part1.vcf part2.vcf gives the correct AN value:
...
##bcftools_mergeVersion=1.19+htslib-1.19
##bcftools_mergeCommand=merge --no-index -m any part1.vcf part2.vcf; Date=Sun Mar 24 15:39:19 2024
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT A001 A002 A003 A004 A005 A006 A007 A008 A009 A010 A011 A012 A013 C001 C002 C003 C004 C005 C006 C007 C008 C009 C010 C011 C012 C013 C014 C015 C016 C017 C018 C019 C020 C021 C022 C023 C024 C025 C026 C027 C028 C029 C030 C031 C032 C033 C034 C035 C036 C037 C038 C039 C040 C041 C042 C043 C044 C045 C046 C047 C048 C049 C050 C051 C052 C053 C054 C055 C056 C057 C058 C059 C060
Chr1 1 . T A,TAAAAA,TAAA,TAA,TAAAA,TA 228.401 PASS INDEL;AN=144;AC=8,28,43,10,8,30 GT:DP 1/1:20 0/0:2 0/1:8 1/1:6 ./.:0 0/0:8 0/0:2 0/0:5 0/0:1 0/0:1 1/1:17 0/0:2 0/1:13 2/2:11 3/4:16 5/5:11 5/2:11 3/3:18 3/6:29 6/6:23 2/2:35 3/3:36 0/4:14 4/6:19 4/3:15 3/3:18 4/3:14 3/3:17 3/3:9 2/3:16 6/2:15 6/3:17 2/2:11 3/2:9 6/6:5 6/6:44 2/6:21 3/3:18 6/4:19 2/2:19 6/2:19 3/2:48 3/6:31 2/6:23 3/3:12 3/5:11 2/2:20 3/5:10 2/3:9 2/3:14 2/2:17 6/3:10 4/3:12 3/3:16 2/3:14 2/6:55 2/3:41 6/4:47 2/5:39 6/6:13 6/3:11 6/3:37 4/6:43 3/2:27 6/6:30 5/3:30 6/6:35 5/3:12 3/2:10 6/6:13 3/4:23 6/3:14 3/3:12
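For reference, the genotypes in this merge -m any output are consistent with simply shifting part2's allele indices up by one, because part1's single ALT (A) comes first in the combined ALT list. A minimal Python sketch of that renumbering (remap_gt is a hypothetical helper, not bcftools code), checked against a few samples: C001 1/1 becomes 2/2, C002 2/3 becomes 3/4, C007 5/5 becomes 6/6, and C010 0/3 becomes 0/4.

```python
# Sketch: shift part2 allele indices when part1's one ALT is listed first,
# giving the combined ALT list A,TAAAAA,TAAA,TAA,TAAAA,TA.


def remap_gt(gt, offset):
    """Shift every non-REF, non-missing allele index by `offset`."""
    sep = "|" if "|" in gt else "/"
    out = []
    for allele in gt.split(sep):
        if allele in (".", "0"):
            out.append(allele)  # missing and REF alleles stay unchanged
        else:
            out.append(str(int(allele) + offset))
    return sep.join(out)


# part1 contributes one ALT allele, so part2 indices move up by one.
for before in ["1/1", "2/3", "0/3", "5/5"]:
    print(before, "->", remap_gt(before, offset=1))
# 1/1 -> 2/2, 2/3 -> 3/4, 0/3 -> 0/4, 5/5 -> 6/6
```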
part1.vcf and part2.vcf
part1.vcf
##fileformat=VCFv4.2
##contig=<ID=Chr1,length=100>
##ALT=<ID=*,Description="Represents allele(s) other than observed.">
##INFO=<ID=INDEL,Number=0,Type=Flag,Description="Indicates that the variant is an INDEL.">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Number of high-quality bases">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes for each ALT allele, in the same order as listed">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT A001 A002 A003 A004 A005 A006 A007 A008 A009 A010 A011 A012 A013
Chr1 1 . T A 228.246 PASS AN=24;AC=8 GT:DP 1/1:20 0/0:2 0/1:8 1/1:6 ./.:0 0/0:8 0/0:2 0/0:5 0/0:1 0/0:1 1/1:17 0/0:2 0/1:13
part2.vcf
##fileformat=VCFv4.2
##contig=<ID=Chr1,length=100>
##ALT=<ID=*,Description="Represents allele(s) other than observed.">
##INFO=<ID=INDEL,Number=0,Type=Flag,Description="Indicates that the variant is an INDEL.">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Number of high-quality bases">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes for each ALT allele, in the same order as listed">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT C001 C002 C003 C004 C005 C006 C007 C008 C009 C010 C011 C012 C013 C014 C015 C016 C017 C018 C019 C020 C021 C022 C023 C024 C025 C026 C027 C028 C029 C030 C031 C032 C033 C034 C035 C036 C037 C038 C039 C040 C041 C042 C043 C044 C045 C046 C047 C048 C049 C050 C051 C052 C053 C054 C055 C056 C057 C058 C059 C060
Chr1 1 . T TAAAAA,TAAA,TAA,TAAAA,TA 228.401 PASS INDEL;AN=120;AC=28,43,10,8,30 GT:DP 1/1:11 2/3:16 4/4:11 4/1:11 2/2:18 2/5:29 5/5:23 1/1:35 2/2:36 0/3:14 3/5:19 3/2:15 2/2:18 3/2:14 2/2:17 2/2:9 1/2:16 5/1:15 5/2:17 1/1:11 2/1:9 5/5:5 5/5:44 1/5:21 2/2:18 5/3:19 1/1:19 5/1:19 2/1:48 2/5:31 1/5:23 2/2:12 2/4:11 1/1:20 2/4:10 1/2:9 1/2:14 1/1:17 5/2:10 3/2:12 2/2:16 1/2:14 1/5:55 1/2:41 5/3:47 1/4:39 5/5:13 5/2:11 5/2:37 3/5:43 2/1:27 5/5:30 4/2:30 5/5:35 4/2:12 2/1:10 5/5:13 2/3:23 5/2:14 2/2:12