Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missing homozygous variant call #45

Closed
brendanofallon opened this issue Dec 20, 2018 · 4 comments
Closed

Missing homozygous variant call #45

brendanofallon opened this issue Dec 20, 2018 · 4 comments
Assignees

Comments

@brendanofallon
Copy link

Describe the bug
Octopus appears to be missing a simple, fairly high coverage homozygous SNV call

Command
Command line to run octopus:

$ octopus  -R human_g1k_v37_decoy_phiXAdaptr.fasta -I region.bam -o region_germline_simple.vcf

Desktop (please complete the following information):

  • OS: CentOS
  • Version 7.4.1708
  • GRCh37 (with decoys and phiX adapter sequence)

Additional context
In testing against GIAB samples I came across a few false negative variants that appear readily in IGV, for instance there's a fairly clear homozygous C>G SNV at position 1:3318769 (obvious in attached screenshot). However, the variant does not appear in the VCF output. Other variants in the region are detected correctly. I'm probably missing something obvious here...
Happy to provide a small .bam file of the region as well - not sure how to include it here (it's about 100kb)

missed_snv

@dancooke
Copy link
Member

Suspicious that all of the reads are on the reverse strand. Is this WGS or WES? Octopus will usually ignore variants that are so highly strand biased as they are indicative of sequencing or mapping error. I could potentially tweak the logic to include these 'cleaner' strand-biased variants, but they'd almost certainly get filtered in any case.

@dancooke dancooke self-assigned this Dec 21, 2018
@brendanofallon
Copy link
Author

The data are from a large targeted panel - but I'm not sure that this variant is strand biased. The reads all go in one direction (we're on the right edge of a targeted region), but the number of alt-supporting reads isn't strongly biased in one direction relative to ref-supporting reads. (From what I understand, strand bias isn't a test of how close the forward / reverse ratio is to 50/50, it's a Chi-squared or FIsher's exact-like test to measure if alt-supporting reads are distributed similarly to ref-supporting reads across read directions. In the case above where there aren't any forward reads, the test probably cannot be performed meaningfully)
I've come across a handful of other false negative SNVs in similar regions where all the reads are in one direction. Happy to provide more small .bams or something if it would be helpful.

@dancooke
Copy link
Member

dancooke commented Jan 2, 2019

You're right that "strand bias" usually describes the difference in proportion of read direction supporting two or more distinct haplotypes. Octopus does compute a statistic of this sort of strand bias for filtering purposes. But as you point out, this quantity is not meaningful in the absence of reads in one direction or for homozygous calls. For WGS, we expect a binomial distribution of read directions supporting all called haplotypes, so observing a haplotype on just one strand is suggestive that the either the call is incorrect (e.g. due to a missed structural variant). It's therefore informative (about the validity of a call) to consider just the proportion of supporting read directions, which I also called "strand bias" for want of a better term. For targeted sequencing, we don't expect a binomial distribution of read directions for off-target regions, but then we cannot reasonably assign high confidence to calls in these regions in any case. I'd argue that such regions should be included in the panel, if variation within them is important.

That said, I can appreciate that some might want such variants to be called, even if they are filtered. I've therefore modified the variant discovery criteria to allow variants only seen in one direction if there are only reads in that direction overlapping the region (available in v0.5.3-beta). This should cause the variants that you mention to be called.

@dancooke dancooke closed this as completed Jan 2, 2019
@yangyxt
Copy link

yangyxt commented Jul 17, 2023

Using 0.7.4 docker image of octopus. Running into this:

image

The ploidy is 6 instead of 2. The middle mismatches of 4-bp deletion and the SNV are both captured but failed to have a valid GT (in the output both are recorded as ./.) The overlapping reads MQ and BQ are great, no strand bias.

GGGGA G 5884.22 RF . GT:AD:DP:FT ./.:67,40:152:RF
G A 5379.3 RF . GT:AD:DP:FT ./.:67,49:151:RF

Here are the command line arguments I used:
/opt/octopus/bin/octopus \ -R ${ref_genome} \ -I ${bam_file} \ -o ${output_vcf} \ --trace ${output_vcf/.vcf*/.trace.log} \ --debug ${output_vcf/.vcf*/.debug.log} \ --bamout ${bam_out} \ --bamout-type FULL \ ${region_arg} \ --threads ${threads} \ --consider-unmapped-reads \ --min-mapping-quality 15 \ --duplicate-read-detection-policy RELAXED \ --disable-downsampling \ --variant-discovery-mode ILLUMINA \ --organism-ploidy ${ploidy} \ --read-linkage PAIRED \ --caller ${caller_mode} \ --min-forest-quality ${forest_vq_filter} \ --forest ${forest} \ --annotations AD

A bit confused why this happened

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants