Update Varcode to work with new multi-species PyEnsembl by iskandr · Pull Request #118 · openvax/varcode

iskandr · 2015-08-21T17:12:48Z

The main change here is that we can no longer assume that a PyEnsembl release version refers uniquely to a genome; we now also need to figure out the species. Every place that used the release version had to be updated to construct a species-specific EnsemblRelease object.

Another change is that we can now rely on PyEnsembl to tell us which genome is the most recent for a particular reference name (instead of hard-coding release versions in a ladder of if-statements).
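A minimal sketch of the two points above, not the actual Varcode or PyEnsembl code. The names LATEST_RELEASE, genome_key, and species_for_release are hypothetical, and the release numbers are placeholder values chosen only for illustration; in this PR the real lookup is delegated to PyEnsembl.

```python
# Hypothetical table: reference name -> (species, newest release).
# The release numbers are illustrative placeholders, not values
# read from PyEnsembl.
LATEST_RELEASE = {
    "GRCh38": ("homo_sapiens", 81),
    "GRCh37": ("homo_sapiens", 75),
    "GRCm38": ("mus_musculus", 81),
    "GRCm37": ("mus_musculus", 67),
}


def genome_key(reference_name):
    """Return the (species, release) pair for a reference name.

    A single table lookup replaces a ladder of if-statements.
    """
    if reference_name not in LATEST_RELEASE:
        raise ValueError("Unknown reference name: %r" % (reference_name,))
    return LATEST_RELEASE[reference_name]


def species_for_release(release):
    """List every species in the table that uses this release number."""
    return sorted(
        {species for species, r in LATEST_RELEASE.values() if r == release}
    )
```

In this toy table, species_for_release(81) returns both species, which is why a release number alone no longer identifies a genome.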

TODO in another PR: add unit tests with mm10 aligned VCFs


tavinathanson · 2015-08-21T21:02:11Z

Warning to @timodonnell, since his fear in #106 is coming true in this PR:

Many thanks for doing this in a way that does not break all of our existing code (which passes in ensembl_version to load_vcf)!

I do think it makes sense to get rid of it because ensembl_version doesn't mean anything when we're in a multi-species world; not sure how best to handle the old code situation?

tavinathanson · 2015-08-21T21:11:57Z

Reviewed 10 of 11 files at r1.
Review status: 10 of 11 files reviewed at latest revision, 6 unresolved discussions, all commit checks successful.

^{test/test_mouse.py, line 28 [r1] (raw file):}
How about inferring the genome?

^{varcode/maf.py, line 120 [r1] (raw file):}
While you're at it, is it worth renaming this to genome, or rather save for another PR where we do a bulk rename? I'm cool with either.

^{varcode/reference_name.py, line 20 [r1] (raw file):}
Why not store all this info in PyEnsembl's species.py, since there's already a fair bit of overlap?

^{varcode/reference_name.py, line 33 [r1] (raw file):}
You should probably mention at the declaration of this dict that the order matters, and explain what the order is (otherwise some developer will totally change the order without realizing it matters).

^{varcode/reference_name.py, line 34 [r1] (raw file):}
Typo: candidate_list

^{varcode/reference_name.py, line 36 [r1] (raw file):}
Using in, could we have a situation where mm9 is in a path with mm90? I know that's not a real example, but are there any?

Comments from the review on Reviewable.io

tavinathanson · 2015-08-21T21:13:55Z

Reviewed 1 of 11 files at r1.
Review status: all files reviewed at latest revision, 6 unresolved discussions, all commit checks successful.


iskandr · 2015-08-21T21:32:51Z

Use pyensembl.cached_release(ensembl_version) to get a Genome object and pass that to load_vcf directly.

Review status: all files reviewed at latest revision, 6 unresolved discussions, all commit checks successful.

^{test/test_mouse.py, line 28 [r1] (raw file):}
Good idea!

^{varcode/maf.py, line 120 [r1] (raw file):}
I was thinking about renaming it -- let's do it in the next PR

^{varcode/reference_name.py, line 20 [r1] (raw file):}
I was thinking about that but this is a different concept. Not all of these are exactly identical (e.g. hg19 isn't really GRCh37), but we want to use them as if they are identical.

^{varcode/reference_name.py, line 33 [r1] (raw file):}
The order is just reverse alphabetical (so GRCh38 comes before GRCh37); I'll comment that.
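To show why the order matters, here is a small sketch, assuming the inference tries canonical names in reverse alphabetical order and returns the first one found in the text. ALIASES and infer_reference_name are hypothetical names with a hypothetical alias table, not the contents of varcode/reference_name.py.

```python
# Hypothetical alias table: canonical reference name -> alternate names.
ALIASES = {
    "GRCh37": ["hg19"],
    "GRCh38": ["hg38"],
    "GRCm38": ["mm10"],
}


def infer_reference_name(text):
    """Return the first canonical name whose candidates occur in `text`.

    Canonical names are tried in reverse alphabetical order, so GRCh38
    is checked before GRCh37 and wins when both appear.
    """
    lowered = text.lower()
    for canonical in sorted(ALIASES, reverse=True):
        candidate_list = [canonical] + ALIASES[canonical]
        for candidate in candidate_list:
            # Plain substring test, as discussed in this review.
            if candidate.lower() in lowered:
                return canonical
    raise ValueError("Could not infer a reference name from %r" % (text,))
```

With this ordering, a path such as /data/GRCh38/lifted_from_hg19.vcf resolves to GRCh38; iterating in ascending order would return GRCh37 instead.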

^{varcode/reference_name.py, line 34 [r1] (raw file):}
Hah, weirdly consistent typo.

^{varcode/reference_name.py, line 36 [r1] (raw file):}
It's totally possible and I could imagine it happening. Any better ideas for inferring the genome?


tavinathanson · 2015-08-21T21:45:35Z

Review status: all files reviewed at latest revision, 1 unresolved discussion, all commit checks successful.

^{varcode/reference_name.py, line 36 [r1] (raw file):}
No good ideas! Ah well.
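For reference, the mm9 versus mm90 concern can be reproduced in a few lines. The sketch below also shows one possible mitigation that was not adopted in this thread: requiring that the candidate not be surrounded by letters or digits. Both function names are hypothetical.

```python
import re


def matches_substring(candidate, text):
    """Plain substring test: 'mm9' also matches inside 'mm90'."""
    return candidate in text


def matches_token(candidate, text):
    """Match only when the candidate is not flanked by letters or digits."""
    pattern = (
        r"(?<![A-Za-z0-9])" + re.escape(candidate) + r"(?![A-Za-z0-9])"
    )
    return re.search(pattern, text) is not None
```

The boundary check still accepts mm9 next to separators such as /, _ or ., but it would miss a name glued directly to other letters or digits, so it is a trade-off rather than a complete fix.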



iskandr added 2 commits August 21, 2015 13:09

test mouse variants with both Genome and EnsemblRelease instances

6c91358

moved logic for building EnsemblRelease objects for a particular reference name into PyEnsembl

d5b29d4

infer mouse VCF reference when given as GCF accesion, fixed Leekai's stop_codon code to work on Py3

6fd7593

iskandr added a commit that referenced this pull request Aug 21, 2015

Merge pull request #118 from hammerlab/multi-species-pyensembl

b7998aa

Update Varcode to work with new multi-species PyEnsembl

iskandr merged commit b7998aa into master Aug 21, 2015

iskandr deleted the multi-species-pyensembl branch August 21, 2015 22:00

iskandr mentioned this pull request Aug 24, 2015

Get Topiary working with mice openvax/topiary#13

Closed

3 tasks

iskandr merged 3 commits into master from multi-species-pyensembl