Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Could bcftools report all states of characters for very low coverage? #2141

Open
ghost opened this issue Mar 26, 2024 · 4 comments
Open

Could bcftools report all states of characters for very low coverage? #2141

ghost opened this issue Mar 26, 2024 · 4 comments

Comments

@ghost
Copy link

ghost commented Mar 26, 2024

Hello,

I'm working with a phased diploid assembly and would like to implement an elementary variant calling strategy. My goal is to make observational calls based on the assumption that most positions should have a coverage depth of 2 and that the alignment is correct. Here's my current approach:

If the reference allele is A, I'd like to report A/T as a 0/1 GT. I'm currently experimenting with the following command:

bcftools mpileup -f ../../ref.fa H5A2.dual.sorted.bam | bcftools call -m -A

My initial observations suggest it might work as I want. But I fear some under-the-hood filtering due to the low coverage.

Can you confirm whether my understanding of this command's behavior is correct, given my intended goal?

I asked about this on Biostars, but no one came back to me after 20 days.

Thanks so much for any help.

@pd3
Copy link
Member

pd3 commented Mar 27, 2024

When the -A option is added and -v is not used, this tells the call command to keep all sites, including non-variant sites, and to keep all observed alleles in the ALT column, even if the genotype is not recognized as variant. So the genotype can be 0/0, but the ALT column will still carry an evidence for the other observed alleles. Adding mpileup -a FORMAT/AD will also record the number of reads supporting each allele.

@ghost
Copy link
Author

ghost commented Mar 27, 2024

Thanks for your answer; I am encountering an "extreme value encountered."

bcftools mpileup -f ../../ref.fa -a FORMAT/AD H5A2.dual.sorted.bam|bcftools call -m -A -v > H5A2.pileup
[mpileup] 1 sample in 1 input file
Note: none of --samples-file, --ploidy or --ploidy-file given, assuming all sites are diploid
[mpileup] maximum number of reads per input file set to -d 250
[W::vcf_parse_format_fill5] Extreme FORMAT/AD value encountered and set to missing at Chrom_3:144644
[1]    2315323 killed     bcftools mpileup -f ../../ref.fa -a FORMAT/AD H5A2.dual.sorted.bam | 
       2315324 done       bcftools call -m -A -v > H5A2.pileup

When looking into Tablet for that position, I see nothing special, it's just the two contigs mapping there.

image

I don't understand what "extreme" is there.

@ghost
Copy link
Author

ghost commented Mar 29, 2024

Is bcftools not suitable for my application? If not, do you have any alternative suggestions? EDIT: it seems a good old samtools mpileup could actually give me what I want.
Thanks

@ghost ghost closed this as completed Mar 29, 2024
@pd3
Copy link
Member

pd3 commented Apr 2, 2024

Any chance you could provide a small test case to reproduce the problem? It is concerning that bcftools mpileup should produce a VCF that cannot be parsed by bcftools call

[W::vcf_parse_format_fill5] Extreme FORMAT/AD value encountered and set to missing at Chrom_3:144644

@pd3 pd3 reopened this Apr 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant