Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bcftools not calling high depth ALT allele #1276

Open
wesleymarin opened this issue Aug 4, 2020 · 3 comments
Open

bcftools not calling high depth ALT allele #1276

wesleymarin opened this issue Aug 4, 2020 · 3 comments

Comments

@wesleymarin
Copy link

wesleymarin commented Aug 4, 2020

Hello, I am having trouble getting bcftools to call the ALT allele with 61 DP4 depth, it instead is only showing the REF call with 0 DP4 read depth.

The ALT call is correctly identified in the 'bcftools mpileup' output (shown below) where the I16 field shows 0 REF reads and 61 ALT reads, and based on what I understand about the PL output, mpileup identified the correct genotype with the best match being G/G, but these results do not carry over to the 'bcftools call' output, which says the genotype is C/C even though there are 0 DP4 reads supporting this genotype.

I am using bcftools version 1.10.2-105-g7cd83b7

I have tried fiddling with many settings, but have not been able to resolve this issue. At this point I am not sure if I am missing a setting that causes 'bcftools call' to completely ignore the ALT call, or if it is a bug or something else. Any help would be greatly appreciated.

I can supply sequence and reference files to replicate.

BCFTOOLS MPILEUP
command: bcftools mpileup -m 3 --threads 60 -d 500 -F 0.0002 -f alleleReference.fasta IND00040.sorted.bam -T alleleReference.bed -O v -o IND00040.vcf

----- mpileup output
KIR2DL1*0020101 4522 . C G,T,<*> 0 . DP=92;I16=0,0,33,28,0,0,2055,80321,0,0,186,826,0,0,1193,27631;QS=0,0.971631,0.0283688,0;VDB=0.999999;SGB=-0.693147;MQSB=0.910046;MQ0F=0.0434783 PL 14,181,0,17,166,14,14,181,17,14

BCFTOOLS CALL
command: bcftools mpileup -m 3 --threads 60 -d 500 -F 0.0002 -f alleleReference.fasta IND00040.sorted.bam -T alleleReference.bed | bcftools call --multiallelic-caller -O v -o IND00040.vcf
----- call output
KIR2DL1*0020101 4522 . C . 15.3772 . DP=92;VDB=0.999999;SGB=-0.693147;MQSB=0.910046;MQ0F=0.0434783;AN=2;DP4=0,0,33,28;MQ=3 GT 0/0

Again, the 'G' in the mpileup output should be the correct base for this position, but it is completely missing from the bcftools call output. I also tried without the -m3 and -F0.0002 options, but the results were the same.

** edit **
It looks like the 0/0 genotype calls are specific to positions with low MQ, but I do not see any settings in bcftools call to adjust that behavior.

@pd3
Copy link
Member

pd3 commented Aug 17, 2020

I agree this looks odd. Going backwards starting with the information that is fully available from your description:

bcftools call relies on the QS and PL annotations. There the main problem is that the likelihood of the homozygous ref genotype (14) is not sufficiently big to override the default prior 1.1e-3. The prior can be increased e.g. as bcftools call -P 4e-2, then the correct(?) 1/1 genotype will be called. Note that this is not a problem with the -m calling model, the original bcftools call -c samtools calling model gives the same answer.

So probably the more important question is why bcftools mpileup thinks the homozygous reference is so likely. This is impossible to say without seeing the alignments, and even then it may be difficult to answer. Try to run with bcftools mpileup --no-BAQ. Also try to increase --min-MQ if it changes anything.

@anderdnavarro
Copy link

Hello, I have a similar problem when I try to call indels with high depth (bcftools version: 1.10.2), but in my case I think the problem comes from the -d option.

For example:

BCFTOOLS CALL
bcftools mpileup -Q 20 -B -d 250 --ignore-RG -a 'DP,AD,ADF,ADR,SP' -O u -f /genome/hg19sorted.fa -r 15 normal.bam tumor.bam | bcftools call -vm -P 1e-1 --ploidy GRCh37 |grep 93528782
----- call output
15 93528782 . TCC TC 235 . INDEL;IDV=108;IMF=0.444444;DP=492;VDB=0.000886062;SGB=67.0654;MQSB=0.993151;MQ0F=0;ICB=0.3;HOB=0.125;AC=1;AN=4;DP4=364,29,91,8;MQ=59 GT:PL:DP:SP:ADF:ADR:AD 0/0:0,255,255:249:0:236,0:13,0:249,0 0/1:249,0,255:243:3:128,91:16,8:144,99

BCFTOOLS MPILEUP
bcftools mpileup -Q 20 -B -d 250 --ignore-RG -a 'DP,AD,ADF,ADR,SP' -O u -f /genome/hg19sorted.fa -r 15 normal.bam tumor.bam |grep 93528782
----- mpileup output
15 93528782 . T <*> 0 . DP=492;I16=395,27,0,0,14056,476454,0,0,25319,1.51908e+06,0,0,7183,152099,0,0;QS=2,0;MQSB=1;MQ0F=0 PL:DP:SP:ADF:ADR:AD 0,255,255:206:0:196,0:10,0:206,0 0,255,255:216:0:199,0:17,0:216,0
15 93528782 . TCC TC 0 . INDEL;IDV=108;IMF=0.444444;DP=492;I16=364,29,91,8,12334,466190,3298,110608,23434,1.40163e+06,5940,356400,6684,141566,1756,37322;QS=1.58914,0.410863;VDB=0.000886062;SGB=67.0654;MQSB=0.993151;MQ0F=0 PL:DP:SP:ADF:ADR:AD 0,255,255:249:0:236,0:13,0:249,0 249,0,255:243:3:128,91:16,8:144,99

The example above is fine because bcftools performs the calling well. However, I'm working with high coverage BAMs and I wanted to increase the maximum number of reads per file with the -d option (e.g. -d 260), but I lose the mutation:

BCFTOOLS CALL
bcftools mpileup -Q 20 -B -d 260 --ignore-RG -a 'DP,AD,ADF,ADR,SP' -O v -f /genome/hg19sorted.fa -r 15 normal.bam tumor.bam | bcftools call -vm -P 1e-1 --ploidy GRCh37 |grep 93528782
----- call output
No results

BCFTOOLS MPILEUP
bcftools mpileup -Q 20 -B -d 260 --ignore-RG -a 'DP,AD,ADF,ADR,SP' -O v -f /genome/hg19sorted.fa -r 15 normal.bam tumor.bam |grep 93528782
----- mpileup output
15 93528782 . T <*> 0 . DP=511;I16=406,32,0,0,14567,492891,0,0,26279,1.57668e+06,0,0,7233,152401,0,0;QS=2,0;MQSB=1;MQ0F=0 PL:DP:SP:ADF:ADR:AD 0,255,255:213:0:202,0:11,0:213,0 0,255,255:225:0:204,0:21,0:225,0

I think that the problem is in the bcftools mpileup command that doesn't detect the indel, but I have no idea why, since I'm only increasing the maximum number of reads, I can understand that it may be lost in the calling but not in the raw pileup.

@pd3
Copy link
Member

pd3 commented Sep 10, 2020

In order to debug this, we'd need to see a slice of the BAM to reproduce the problem. Sorry I can't be of much help here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants