Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make true pos count agree with true pos checks #825

Merged
merged 1 commit into from
Sep 2, 2015

Conversation

ihodes
Copy link
Member

@ihodes ihodes commented Aug 31, 2015

This means we don't use the sample name to test concordance, possibly
leading to weird numbers when looking at multi-sample VCFs.

fixes #822

Review on Reviewable

This means we don't use the sample name to test concordance, possibly
leading to weird numbers when looking at multi-sample VCFs.
@ihodes
Copy link
Member Author

ihodes commented Sep 2, 2015

So I actually think we should get rid of this feature soon/replace it with a better, clearer concordance system. This wasn't really a bug (strictly speaking, I'd call it a bug); the two counts of true positives were both somewhat valid—one looked for a match on the sample name, as well, which prevents duplicate counts of true positives on multi-sample VCF comparisons. I've switched the other join to also only count true positives with the same sample name, for now. Ideally, we'll improve our comparison system a bit, but that'll take a lot of work (@jaclynperrone & I chatted about this some more today).

I didn't include a test, because I think this whole thing needs to be changed significantly. Will leave it to the reviewer if you think it needs a regression test anyway!.

@armish
Copy link
Member

armish commented Sep 2, 2015

This looks good to me. I think as long as we make it clear how we calculate those values in the user documentation, either way should be fine. Reading your last comment, I thought that you changed another join to start matching with sample names, but this change set features a change in a single join statement. So just to make sure: you didn't forget to push another change out, right?

Also, just out of curiosity: how does the comparison numbers look after this change? Do we have 3 true positives and 11 false positives?

@ihodes
Copy link
Member Author

ihodes commented Sep 2, 2015

Ah sorry, I wasn't clear. Here (https://github.com/hammerlab/cycledash/blob/master/cycledash/api/genotypes.py#L170) we already join on sample_name as well, so I just made sure both joins join on the sample_name in addition to the other fields.

@armish
Copy link
Member

armish commented Sep 2, 2015

thanks for the clarification! Good to go, then 👍

@ihodes
Copy link
Member Author

ihodes commented Sep 2, 2015

Also, just out of curiosity: how does the comparison numbers look after this change? Do we have 3 true positives and 11 false positives?

0 true positives; as the VCF being compared to doesn't have a sample with the same name as the examined VCF (this is why this is a problematic way of calculating true/false/pos/negs.) Needs a serious rework in light of how this feature is likely to be used; we're punting on it until we know who needs to be using this feature next.

ihodes added a commit that referenced this pull request Sep 2, 2015
Make true pos count agree with true pos checks
@ihodes ihodes merged commit bdd587d into master Sep 2, 2015
@ihodes ihodes deleted the issue-822-fix-true-positive-count branch December 23, 2015 16:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

VCF comparison table showing incorrect results
2 participants