Skip to content

Update Varcode to work with new multi-species PyEnsembl #118

Merged
iskandr merged 3 commits intomasterfrom
multi-species-pyensembl
Aug 21, 2015
Merged

Update Varcode to work with new multi-species PyEnsembl #118
iskandr merged 3 commits intomasterfrom
multi-species-pyensembl

Conversation

@iskandr
Copy link
Copy Markdown
Contributor

@iskandr iskandr commented Aug 21, 2015

The main change here is that we can no longer assume that a PyEnsembl release version refers uniquely to a genome, now we also need to figure out the species. Anywhere where the release version was used had to be updated to construct a species-specific EnsemblRelease object.

Another change is that we can now rely on PyEnsembl to tell us which genome is the most recent for a particular reference name (instead of hard-coding release versions in ladder of if-statements).

TODO in another PR: add unit tests with mm10 aligned VCFs

Review on Reviewable

@tavinathanson
Copy link
Copy Markdown
Contributor

Warning to @timodonnell, since his fear in #106 is coming true in this PR:

Many thanks for doing this in a way that does not break all of our existing code (which passes in ensembl_version to load_vcf)!

I do think it makes sense to get rid of it because ensembl_version doesn't mean anything when we're in a multi-species world; not sure how best to handle the old code situation?

@tavinathanson
Copy link
Copy Markdown
Contributor

Reviewed 10 of 11 files at r1.
Review status: 10 of 11 files reviewed at latest revision, 6 unresolved discussions, all commit checks successful.


test/test_mouse.py, line 28 [r1] (raw file):
How about inferring the genome?


varcode/maf.py, line 120 [r1] (raw file):
While you're at it, is it worth renaming this to genome, or rather save for another PR where we do a bulk rename? I'm cool with either.


varcode/reference_name.py, line 20 [r1] (raw file):
Why not store all this info in PyEnsembl's species.py, since there'a already a fair bit of overlap?


varcode/reference_name.py, line 33 [r1] (raw file):
You should probably mention at the declaration of this dict that the order matters, and explain what the order is (otherwise some developer will totally change the order without realizing it matters).


varcode/reference_name.py, line 34 [r1] (raw file):
Typo: candidate_list


varcode/reference_name.py, line 36 [r1] (raw file):
Using in, could we have a situation where mm9 is in a path with mm90? I know that's not a real example, but are there are any?


Comments from the review on Reviewable.io

@tavinathanson
Copy link
Copy Markdown
Contributor

Reviewed 1 of 11 files at r1.
Review status: all files reviewed at latest revision, 6 unresolved discussions, all commit checks successful.


Comments from the review on Reviewable.io

@iskandr
Copy link
Copy Markdown
Contributor Author

iskandr commented Aug 21, 2015

Use pyensembl.cached_release(ensembl_version) to get a Genome object and pass that to load_vcf directly.


Review status: all files reviewed at latest revision, 6 unresolved discussions, all commit checks successful.


test/test_mouse.py, line 28 [r1] (raw file):
Good idea!


varcode/maf.py, line 120 [r1] (raw file):
I was thinking about renaming it -- let's do it in the next PR


varcode/reference_name.py, line 20 [r1] (raw file):
I was thinking about that but this is a different concept. Not all of these are exactly identical (e.g. hg19 isn't really GRCh37), but we want to use them as if they are identical.


varcode/reference_name.py, line 33 [r1] (raw file):
The order is just reverse alphabetical (so GRCh38 comes before GRCh37); I'll comment that.


varcode/reference_name.py, line 34 [r1] (raw file):
Hah, weirdly consistent typo.


varcode/reference_name.py, line 36 [r1] (raw file):
It's totally possibly and I could imagine it happening. Any better ideas for inferring the genome?


Comments from the review on Reviewable.io

@tavinathanson
Copy link
Copy Markdown
Contributor

Review status: all files reviewed at latest revision, 1 unresolved discussion, all commit checks successful.


varcode/reference_name.py, line 36 [r1] (raw file):
No good ideas! Ah well.


Comments from the review on Reviewable.io

iskandr added a commit that referenced this pull request Aug 21, 2015
Update Varcode to work with new multi-species PyEnsembl
@iskandr iskandr merged commit b7998aa into master Aug 21, 2015
@iskandr iskandr deleted the multi-species-pyensembl branch August 21, 2015 22:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants