Allow Varcode to work with mouse data via Genome #106

Merged
iskandr merged 5 commits into master from mouse on Aug 2, 2015

Conversation

@tavinathanson
Contributor

This is a quick fix to allow Varcode to work with mouse data (and an initial test for that) by accepting Genome objects.

It's certainly confusing to be using ensembl and genome somewhat interchangeably here. I'd like a follow-up PR (and will create an issue, if this order-of-things sounds good to you) to liberate Varcode from Ensembl much like openvax/pyensembl#99, but for now this seems useful.

@timodonnell
Contributor

Many thanks for doing this in a way that does not break all of our existing code (which passes in ensembl_version to load_vcf)!
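
The backward-compatible pattern being praised here can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Varcode's actual code: `resolve_genome`, `FakeGenome`, and `FakeEnsemblRelease` are hypothetical stand-ins for a loader argument check and for PyEnsembl's genome classes.

```python
class FakeGenome:
    """Hypothetical stand-in for a generic genome annotation object."""

    def __init__(self, reference_name):
        self.reference_name = reference_name


class FakeEnsemblRelease(FakeGenome):
    """Hypothetical stand-in for a human Ensembl release."""

    def __init__(self, release):
        # Placeholder reference name, used for illustration only.
        super().__init__(reference_name="human-ensembl-%d" % release)
        self.release = release


def resolve_genome(genome=None, ensembl_version=None):
    """Accept either a genome object or a legacy Ensembl version number."""
    if genome is not None and ensembl_version is not None:
        raise ValueError("Pass either genome or ensembl_version, not both")
    if genome is not None:
        # New-style callers (e.g. mouse data) hand over a genome directly.
        return genome
    if ensembl_version is not None:
        # Old-style callers keep working: the version is wrapped in a release object.
        return FakeEnsemblRelease(ensembl_version)
    raise ValueError("Expected genome or ensembl_version")


mouse = resolve_genome(genome=FakeGenome("GRCm38"))
human = resolve_genome(ensembl_version=75)
```

Existing callers that pass only a version number never see the new argument, which is why no existing call sites would need to change under this kind of design.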

@tavinathanson
Contributor Author

@timodonnell Fewer tests for me to fix, too! The intent of openvax/pyensembl#99 is also to keep the EnsemblRelease API intact despite refactoring the innards and adding a superclass to it.

@iskandr
Contributor

iskandr commented Jul 28, 2015

Finding lots of little bugs in the PyEnsembl PR we merged, but also finding a design flaw wherein each GenomeSource only tells you the command you need to download e.g. a protein FASTA but that command can't be run in isolation (since the pyensembl script always requires a GTF).

I guess we'll figure out a better design when you come back.

@iskandr
Contributor

iskandr commented Jul 28, 2015

To clarify on the last comment, to install the mouse genome you need to run:

pyensembl install --reference-name "GRCm38" --transcript-fasta-path-or-url "ftp://ftp.ensembl.org/pub/release-81/fasta/mus_musculus/cdna/Mus_musculus.GRCm38.cdna.all.fa.gz" --protein-fasta-path-or-url "ftp://ftp.ensembl.org/pub/release-81/fasta/mus_musculus/pep/Mus_musculus.GRCm38.pep.all.fa.gz" --gtf-path-or-url "ftp://ftp.ensembl.org/pub/release-81/gtf/mus_musculus/Mus_musculus.GRCm38.81.gtf.gz"
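
Since all three files have to be supplied in one invocation, it may help to assemble the command programmatically. The sketch below only builds the argument list and a printable command string; it does not run anything. The flag names are copied from the command above and may not match other PyEnsembl versions, and `build_install_command` is a hypothetical helper, not part of PyEnsembl.

```python
import shlex

# Base URL taken from the command quoted above (Ensembl release 81).
ENSEMBL_FTP = "ftp://ftp.ensembl.org/pub/release-81"


def build_install_command(reference_name, transcript_fasta, protein_fasta, gtf):
    """Return the argv list for a single install call covering all three files."""
    return [
        "pyensembl", "install",
        "--reference-name", reference_name,
        "--transcript-fasta-path-or-url", transcript_fasta,
        "--protein-fasta-path-or-url", protein_fasta,
        "--gtf-path-or-url", gtf,
    ]


argv = build_install_command(
    reference_name="GRCm38",
    transcript_fasta=ENSEMBL_FTP + "/fasta/mus_musculus/cdna/Mus_musculus.GRCm38.cdna.all.fa.gz",
    protein_fasta=ENSEMBL_FTP + "/fasta/mus_musculus/pep/Mus_musculus.GRCm38.pep.all.fa.gz",
    gtf=ENSEMBL_FTP + "/gtf/mus_musculus/Mus_musculus.GRCm38.81.gtf.gz",
)

# Quote each argument so the string can be pasted into a shell safely.
command = " ".join(shlex.quote(arg) for arg in argv)
print(command)
```

Keeping the arguments as a list also means they could be passed to `subprocess.run(argv)` without going through a shell.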

but if this genome is missing you'll see errors like:

nose.proxy.ValueError: Genome sequence data (GenomeSource(transcript_fasta_path_or_url=ftp://ftp.ensembl.org/pub/release-81/fasta/mus_musculus/cdna/Mus_musculus.GRCm38.cdna.all.fa.gz)) is not currently installed for this genome source. Run pyensembl install --transcript_fasta_path_or_url "ftp://ftp.ensembl.org/pub/release-81/fasta/mus_musculus/cdna/Mus_musculus.GRCm38.cdna.all.fa.gz" or call Genome(reference_name="GRCm38", transcript_fasta_path_or_url="ftp://ftp.ensembl.org/pub/release-81/fasta/mus_musculus/cdna/Mus_musculus.GRCm38.cdna.all.fa.gz")).install()

@iskandr
Contributor

iskandr commented Jul 28, 2015

Additionally, it looks like EnsemblReleaseSource no longer gets used and thus even human Ensembl references emit an error like the one above.

@tavinathanson
Contributor Author

Offline discussion re the design flaw: it still "works", but might be an annoyance to run multiple commands. We agreed to table that until openvax/pyensembl#100 is addressed.

@tavinathanson
Contributor Author

Re the bugs: @iskandr and I worked through them together and they're summarized/addressed here: openvax/pyensembl#108

Offline discussion re EnsemblReleaseSource: that was intended to be used, and is used in the PR above.

@tavinathanson
Contributor Author

@iskandr I don't remember how we left it with this PR, since it's hacky but not entirely awful. What would you like to see before we merge?

@iskandr
Contributor

iskandr commented Aug 2, 2015

@tavinathanson Let's merge this, and get mouse epitope predictions out of Topiary, and figure out a better organization next week.

iskandr added a commit that referenced this pull request Aug 2, 2015: Allow Varcode to work with mouse data via Genome
iskandr merged commit 000c52c into master Aug 2, 2015
iskandr deleted the mouse branch August 2, 2015 17:21

3 participants