Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix known bugs in genotyping and look at confidence distribution #11

Open
iqbal-lab opened this issue Nov 17, 2015 · 6 comments
Open

Fix known bugs in genotyping and look at confidence distribution #11

iqbal-lab opened this issue Nov 17, 2015 · 6 comments
Assignees

Comments

@iqbal-lab
Copy link
Owner

To fix:

  1. poisson error rate uses estimates of total read count - prefer to use median depth
  2. improve gt confidence
@iqbal-lab iqbal-lab self-assigned this Nov 17, 2015
@iqbal-lab
Copy link
Owner Author

At this commit on branch fix_poisson:

commit 85c17cc
Author: Zamin Iqbal zam@well.ox.ac.uk
Date: Tue Nov 17 13:51:44 2015 +0000

Add script for getting confusion matrices

Results are looking pretty interesting.

I've taken a binary of NA12878 (this is a human sample, and have used Platinum genome data, cleaning with threshold 7), and intersected with 5000 random OMNI sites. I then compare with the 1000g calls at those sites as truth

I then calculate confusion matrices of genotyping for v1.0.5.21 of Cortex and for this commit, and compare them. To clarify the signal, I produce confusion matrices for calls with genotype confidence between 0 and 9, and for those between 10 and 19, and...
Here's some results. Left column is v1.0.5.21 and right is tip of fix_poisson

comp

Notice that we now put more stuff in the low confidence bin, but by the time confidence>10, in the new commit, we lose much fewer hets, but we do call more as missing (which is bizarre). So apart from the extra missingness, I am v happy

@iqbal-lab
Copy link
Owner Author

Oh no it's OK! There are a bunch of sites which we call as missing - in the old commit they have low confidence and now they have higher. I think the missingness is likely because these OMNI sites are actually adjacent to indels. There is a separate question as to what the confidence of a ./. call is - will look into this

@iqbal-lab
Copy link
Owner Author

Merged into master here

85c17cc

@iqbal-lab
Copy link
Owner Author

I just want to check the genotyping of longer alleles and then ok to close. All unit tests pass including with valgrind.

@hcdenbakker
Copy link
Contributor

'There is a separate question as to what the confidence of a ./. call is - will look into this' Would indeed be great if we could distinguish between truly missing (deletion) and uncertainty in calling. Back to my lectures now, just had to add this.

@iqbal-lab
Copy link
Owner Author

OK, so in two parts.

Why does a ./. call have a confidence?

When Cortex genotypes a bubble, you pass it a model. That model is diploid or haploid at the moment, and allows you to be either hom-ref, het, or hom-alt (for diploid, haploid is either hom-ref or hom-alt). Now suppose one allele is longer than another. And suppose we have zero covg on both alleles. Well then, the most likely model is homozygous for the shorter allele, as more chance of having zero covg on two short alleles. So, Cortex types it thus. Then, a wrapper script converts the cortex output to a VCF, and forcibly sets genotype to ./. if there is zero coverage on both alleles, as that is what most consumers want, but leaves the confidence unchanged.

Can we distinguish truly missing from uncertainty in calling
If, Henk, you mean to ask us to support the possibility, at any site, of the whole locus being deleted - I guess we could do that, but currently do not.
One reason for going back over all samples and genotyping at all sites is to be able to make a call on SNPs/whatever that were not called in person-X, to see if we think they are hom-ref, or actually het/hom-alt, but had not been called previously due to low confidence.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants