overlapping variants #600
Comments
This is a simple question but difficult to help with. From the biological point of view, there should be no overlapping variants. However, all callers make false positives and the raw calls can be conflicting. There should be a filtering step to exclude low-confidence calls. Also, ideally the data should be phased. |
It makes sense to skip overlapping variants when generating a consensus for a particular sample. However, in some cases, even non-overlapping variants get reported as "overlapping" and skipped when running "bcftools consensus". Here is an example:
This is a deletion immediately followed by an insertion. There is no overlap or conflict between them, but "bcftools consensus" reports them as "overlapping" and skips them. Any thoughts? |
This should now be fixed. However, note that the fix is still not perfect: it requires a normalized VCF and may therefore falsely skip multiallelic indels. |
Thanks for the quick fix. However, running the program on that same VCF (see my above example) gives this error:
This error arises because position 1:2288273 has been removed by the previous deletion (1:2288271 AAC -> A), but the insertion right after the deletion (in the VCF file) still uses the raw reference (1:2288273:C) which causes a conflict. |
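To make the conflict described above concrete, here is a minimal Python sketch that flags records whose REF spans overlap in reference coordinates. It is an illustration only, not the check that bcftools itself performs. The function names and the tuple layout are made up for this example, and the ALT allele of the insertion (`CT`) is an assumption, since the thread only gives its position and REF base.

```python
from typing import List, Tuple

# (chrom, pos, ref, alt); pos is 1-based, as in a VCF record
Record = Tuple[str, int, str, str]


def ref_span(rec: Record) -> Tuple[str, int, int]:
    """Return (chrom, first, last) reference positions covered by REF."""
    chrom, pos, ref, _ = rec
    return chrom, pos, pos + len(ref) - 1


def find_ref_span_overlaps(records: List[Record]) -> List[Tuple[Record, Record]]:
    """Pair each record with an earlier record whose REF span it starts inside."""
    overlaps = []
    furthest = None  # record whose REF span reaches furthest so far
    for rec in sorted(records, key=lambda r: (r[0], r[1])):
        chrom, start, end = ref_span(rec)
        if furthest is not None:
            f_chrom, _, f_end = ref_span(furthest)
            if chrom == f_chrom and start <= f_end:
                overlaps.append((furthest, rec))
        if furthest is None or chrom != furthest[0] or end > ref_span(furthest)[2]:
            furthest = rec
    return overlaps


deletion = ("1", 2288271, "AAC", "A")   # removes positions 2288272-2288273
insertion = ("1", 2288273, "C", "CT")   # ALT is hypothetical; anchored on a deleted base
snp = ("1", 2288280, "G", "T")          # unrelated site, hypothetical

for first, second in find_ref_span_overlaps([deletion, insertion, snp]):
    print("REF spans overlap:", first, second)
```

In this sketch the deletion and the insertion are flagged because the anchor base of the insertion (position 2288273) lies inside the REF span of the deletion, even though the two edits do not contradict each other biologically.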
Any chance you could provide a test case to debug the problem? |
This improves the test introduced in cbea92d. Coincidentally, the base in the modified and the original reference was the same in the test, which masked the problem pointed out in #600 (comment)
Never mind. This specific case should now be addressed by 21fd8da. Let's hope this works for the rest of your VCF as well |
Great! Now it works! Thanks! |
Hi! As far as I understood, it does:
I will paste an example, referring to the genotype of the first sample in the VCF: REF_NMAP_I3748 2556 . CGGCTG TGGCTG,,C 6236.95 . (...) GT:AD:DP:GQ:PGT:PID:PL 0/1: If you look at the region in the aligned FASTAs, you see that:
Most of my cases seem to fit the "rules" above, but not all. I do have cases (far fewer) of skipped positions with "-I". They seem to be in problematic regions (multiple possibly overlapping indels), but I could not quite figure out the rules. Could it be that an indel lying completely within another indel is skipped? I am a bit confused; can you please explain in more detail what the program does? |
The program assumes that the VCF has the correct information and skips conflicting variants. |
Hi,
The results I got with I'm using bcftools 1.9-206-g4694164 and htslib 1.9-258-ga428aa2. Thanks in advance! |
The behavior depends on how However, there is no conflict in this case: the variant rs141511289 inserts a new Please open a new issue next time as this is not really a continuation of the old thread. |
Sorry for mis-posting. Thank you very much for the response. |
Hi -
I'm trying to generate a fastq file using bcftools consensus. However, I am getting a warning that consensus is skipping sites with overlapping variants (where there are two entries in the VCF for a particular position). Is there a way to circumvent this issue? It's not clear to me whether this is an upstream issue with how I'm generating my VCF file. I called SNPs using samtools mpileup and bcftools call on a BAM file with multiple individuals. I then extracted the information for a single individual into a new VCF file and tried to run bcftools consensus.
Thanks.
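One way to see whether the warning comes from the VCF itself is to list the positions that have more than one record. The following is a minimal Python sketch under the assumption that the VCF is uncompressed, tab-separated text. The function name `duplicated_sites` and the sample lines are invented for illustration and are not taken from the thread.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def duplicated_sites(vcf_lines: Iterable[str]) -> Dict[Tuple[str, int], int]:
    """Count records per (CHROM, POS) and keep sites seen more than once."""
    counts: Counter = Counter()
    for line in vcf_lines:
        # Skip blank lines and header lines ("##..." and "#CHROM ...")
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos = line.split("\t", 2)[:2]
        counts[(chrom, int(pos))] += 1
    return {site: n for site, n in counts.items() if n > 1}


# Hypothetical records: two entries share position 100 on chr1
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\t.\t.",
    "chr1\t100\t.\tAT\tA\t30\t.\t.",
    "chr1\t250\t.\tC\tT\t60\t.\t.",
]
print(duplicated_sites(example))
```

For a real file, the same function can be fed an open file handle, for example `duplicated_sites(open("sample.vcf"))`, where the file name is a placeholder.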