Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLI #7

Merged
merged 75 commits into from
Apr 15, 2016
Merged

CLI #7

merged 75 commits into from
Apr 15, 2016

Conversation

iskandr
Copy link
Contributor

@iskandr iskandr commented Feb 29, 2016

Gives isovar a commandline interface and CSV output (#6)

@hammer
Copy link

hammer commented Mar 30, 2016

I have some code review bandwidth and am interested in isovar. Does this PR represent the latest state of the project?

@iskandr
Copy link
Contributor Author

iskandr commented Mar 30, 2016

Not quite, there are some fragments not checked in. I'll give you something
reviewable tomorrow though.
On Mar 30, 2016 1:06 AM, "Jeff Hammerbacher" notifications@github.com
wrote:

I have some code review bandwidth and am interested in isovar. Does this
PR represent the latest state of the project?


You are receiving this because you authored the thread.
Reply to this email directly or view it on GitHub
#7 (comment)

@hammer
Copy link

hammer commented Apr 1, 2016

@iskandr ready for review?

@iskandr
Copy link
Contributor Author

iskandr commented Apr 1, 2016

@hammer Not yet, still working through some edge cases I thought of last night. The parts that don't interact with read data (such as the reference_context module) are pretty stable though, if you want to start somewhere.

@iskandr
Copy link
Contributor Author

iskandr commented Apr 1, 2016

@hammer variant_reads, variant_sequences, and reads_at_locus should also be more or less in their final shape, with passing unit tests. The only missing part now is that the translation logic is split across a few incomplete files, which I need to weave back together.

@iskandr
Copy link
Contributor Author

iskandr commented Apr 1, 2016

@tavinathanson @ihodes @hammer If you all want to start reviewing, anything other than protein_sequence and its tests are fair game.

I added some explanation of the design to the README but it's worth repeating a little here:

  1. Reads are filtered from the BAM around each variant into ReadAtLocus records. These contain the sequence of a read and the offsets into that sequence for the bases immediately before and after a variant. I found "anchoring" on reference bases adjacent to the variant was easier than trying to deal with the variant locus directly.

  2. Not every ReadAtLocus object contains the variant nucleotides, so these get further filtered down into VariantRead objects.

  3. Multiple VariantReads get collapsed into a VariantSequence. These are the cDNA sequences we're going to try translating.

  4. Which reading frame to use? We're not really sure (without long reads + ribosomal footprint profiling), so instead we take a reasonable guess based on the reading frames of reference transcripts around this locus. I summarize the reference sequence at this locus and its associated reading frame in a ReferenceContext object. There's a weird little hierarchy of named tuples in that file, which I found simplified testing a lot (enabled me to right more tests and to think more clearly about little bits of functionality).

  5. Finally, the VariantReads and ReferenceContexts of a variant get combined into zero or more ProteinSequence objects. Since we may only want the single best translation, both reference contexts and variant sequences are sorted first.


chromosome : str

base1_position_before_variant : int

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just poking around; noticed that you have two base1_position_before_variantss vs. after.

… tested and run separately from rest of protein_sequence module
… VariantSequence objects, got rid of partially matching reads, added more tests for interbase range of variants on transcripts, found a bug in pyensembl's contig normalization
@coveralls
Copy link

Coverage Status

Changes Unknown when pulling c8ee0bf on cli into * on master*.

@coveralls
Copy link

Coverage Status

Changes Unknown when pulling 3959de7 on cli into * on master*.

@coveralls
Copy link

Coverage Status

Changes Unknown when pulling 1706af7 on cli into * on master*.

@coveralls
Copy link

Coverage Status

Changes Unknown when pulling 1706af7 on cli into * on master*.

@tavinathanson
Copy link

@iskandr I'm running into AttributeError: 'module' object has no attribute 'cutils' with pysam 0.8.4; doesn't complain with 0.9.0.

@coveralls
Copy link

Coverage Status

Changes Unknown when pulling bdd3125 on cli into * on master*.

@coveralls
Copy link

Coverage Status

Changes Unknown when pulling 4910ded on cli into * on master*.

@coveralls
Copy link

Coverage Status

Changes Unknown when pulling d06c34e on cli into * on master*.

@coveralls
Copy link

Coverage Status

Changes Unknown when pulling a38fe9d on cli into * on master*.

@iskandr
Copy link
Contributor Author

iskandr commented Apr 15, 2016

Fixes #12, #11, #10, #6.

I need to more explicitly test for #4 since the B16 variants do have nearby germline variants but I haven't documented what they are.

@iskandr iskandr merged commit 89b2a4e into master Apr 15, 2016
@iskandr iskandr deleted the cli branch April 23, 2016 05:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants