CLI #7

iskandr · 2016-02-29T15:26:47Z

Gives isovar a commandline interface and CSV output (#6)

hammer · 2016-03-30T05:06:39Z

I have some code review bandwidth and am interested in isovar. Does this PR represent the latest state of the project?

iskandr · 2016-03-30T05:10:56Z

Not quite; there are some fragments not checked in. I'll give you something reviewable tomorrow, though.

hammer · 2016-04-01T03:02:11Z

@iskandr ready for review?

iskandr · 2016-04-01T15:21:36Z

@hammer Not yet, still working through some edge cases I thought of last night. The parts that don't interact with read data (such as the reference_context module) are pretty stable though, if you want to start somewhere.

iskandr · 2016-04-01T20:52:12Z

@hammer variant_reads, variant_sequences, and reads_at_locus should also be more or less in their final shape, with passing unit tests. The only missing part now is that the translation logic is split across a few incomplete files, which I need to weave back together.

iskandr · 2016-04-01T23:26:18Z

@tavinathanson @ihodes @hammer If you all want to start reviewing, anything other than protein_sequence and its tests is fair game.

I added some explanation of the design to the README but it's worth repeating a little here:

Reads are filtered from the BAM around each variant into ReadAtLocus records. These contain the sequence of a read and the offsets into that sequence for the bases immediately before and after a variant. I found "anchoring" on reference bases adjacent to the variant was easier than trying to deal with the variant locus directly.
Not every ReadAtLocus object contains the variant nucleotides, so these get further filtered down into VariantRead objects.
Multiple VariantReads get collapsed into a VariantSequence. These are the cDNA sequences we're going to try translating.
Which reading frame to use? We're not really sure (without long reads + ribosomal footprint profiling), so instead we take a reasonable guess based on the reading frames of reference transcripts around this locus. I summarize the reference sequence at this locus and its associated reading frame in a ReferenceContext object. There's a weird little hierarchy of named tuples in that file, which I found simplified testing a lot (it enabled me to write more tests and to think more clearly about little bits of functionality).
Finally, the VariantSequences and ReferenceContexts of a variant get combined into zero or more ProteinSequence objects. Since we may only want the single best translation, both reference contexts and variant sequences are sorted first.
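
The read-side steps above (ReadAtLocus, then VariantRead, then VariantSequence) can be sketched with plain namedtuples. This is a minimal sketch, not isovar's code: the field names, the name field, the helper functions, and the rule of collapsing reads only when their flanked sequences are identical are all assumptions for illustration (a real implementation may merge reads whose flanks differ but are compatible). It shows why anchoring on the flanking reference bases is convenient: the variant nucleotides are just the slice between the two offsets, which is empty for a deletion.

```python
from collections import namedtuple

# Hypothetical stand-ins for isovar's records; all field names are assumptions.
ReadAtLocus = namedtuple(
    "ReadAtLocus",
    ["name", "sequence", "offset_before_variant", "offset_after_variant"])
VariantRead = namedtuple("VariantRead", ["name", "prefix", "allele", "suffix"])
VariantSequence = namedtuple(
    "VariantSequence", ["prefix", "allele", "suffix", "read_names"])


def allele_in_read(read):
    """Bases strictly between the two anchoring reference bases."""
    return read.sequence[read.offset_before_variant + 1:read.offset_after_variant]


def filter_variant_reads(reads_at_locus, alt):
    """Keep only reads whose bases between the anchors equal the alt allele."""
    variant_reads = []
    for read in reads_at_locus:
        if allele_in_read(read) == alt:
            variant_reads.append(VariantRead(
                name=read.name,
                prefix=read.sequence[:read.offset_before_variant + 1],
                allele=alt,
                suffix=read.sequence[read.offset_after_variant:]))
    return variant_reads


def collapse_variant_reads(variant_reads):
    """Group reads with identical (prefix, allele, suffix), most-supported first."""
    groups = {}
    for vr in variant_reads:
        groups.setdefault((vr.prefix, vr.allele, vr.suffix), []).append(vr.name)
    sequences = [
        VariantSequence(prefix, allele, suffix, names)
        for (prefix, allele, suffix), names in groups.items()]
    # Sort by number of supporting reads, then by total sequence length,
    # both descending, so the best candidate comes first.
    sequences.sort(
        key=lambda s: (len(s.read_names),
                       len(s.prefix) + len(s.allele) + len(s.suffix)),
        reverse=True)
    return sequences
```

For example, with alt allele "T", a read "AACTGG" anchored at offsets 2 and 4 is kept (the slice between the anchors is "T"), while "AACAGG" with the same anchors is dropped as a reference read.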

tavinathanson · 2016-04-03T20:18:45Z

isovar/reads_at_locus.py

+
+    chromosome : str
+
+    base1_position_before_variant : int


Just poking around; noticed that you have two base1_position_before_variant fields instead of a before/after pair.
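
A variant needs a before/after pair of anchors rather than two "before" fields. Below is a minimal sketch of how such a pair might be derived from a variant's base-1 start and reference allele. The function name and the insertion convention (empty ref, with start pointing at the reference base just before the inserted bases) are assumptions, not taken from isovar.

```python
def flanking_base1_positions(start, ref):
    """Return (base1_position_before_variant, base1_position_after_variant).

    Hypothetical convention: for substitutions and deletions, `start` is the
    base-1 position of the first reference base covered by `ref`; for an
    insertion (empty `ref`), `start` is the reference base immediately
    before the inserted bases.
    """
    if ref:
        # Last reference base covered by the variant.
        end = start + len(ref) - 1
        return start - 1, end + 1
    # Insertion: the two anchors are adjacent reference bases.
    return start, start + 1
```

The gap between the two anchors is what distinguishes a substitution (one base between them), a multi-base deletion (a wider gap), and an insertion (no gap), so neither field alone is enough.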

coveralls · 2016-04-06T19:20:58Z

Changes Unknown when pulling c8ee0bf on cli into master.

coveralls · 2016-04-06T19:36:33Z

Changes Unknown when pulling 3959de7 on cli into master.

coveralls · 2016-04-06T20:56:04Z

Changes Unknown when pulling 1706af7 on cli into master.

coveralls · 2016-04-06T21:13:01Z

Changes Unknown when pulling 1706af7 on cli into master.

tavinathanson · 2016-04-12T16:27:29Z

@iskandr I'm running into AttributeError: 'module' object has no attribute 'cutils' with pysam 0.8.4; doesn't complain with 0.9.0.
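
If pysam 0.9.0 really is the minimum that works, one option is to fail early with a readable message instead of the AttributeError. The sketch below compares version strings as integer tuples. The helper names and the 0.9.0 floor are assumptions based only on the report above; pinning pysam>=0.9.0 in setup.py's install_requires would be the simpler complement.

```python
import re

MINIMUM_PYSAM_VERSION = (0, 9, 0)  # assumed floor, based on the report above


def parse_version(version_string):
    """Convert e.g. '0.9.0' to (0, 9, 0); stops at the first non-numeric part."""
    parts = []
    for piece in version_string.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def check_pysam_version(version_string, minimum=MINIMUM_PYSAM_VERSION):
    """Raise ImportError if the given pysam version is older than `minimum`."""
    if parse_version(version_string) < minimum:
        raise ImportError(
            "isovar needs pysam >= %s but found %s" % (
                ".".join(str(x) for x in minimum), version_string))
```

At import time this would be called as check_pysam_version(pysam.__version__). Comparing tuples rather than raw strings matters because "0.10.0" sorts before "0.9.0" as a string.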

coveralls · 2016-04-13T23:22:46Z

Changes Unknown when pulling bdd3125 on cli into master.

coveralls · 2016-04-14T16:51:17Z

Changes Unknown when pulling 4910ded on cli into master.

coveralls · 2016-04-15T16:14:52Z

Changes Unknown when pulling d06c34e on cli into master.

coveralls · 2016-04-15T22:57:48Z

Changes Unknown when pulling a38fe9d on cli into master.

iskandr · 2016-04-15T22:57:52Z

Fixes #12, #11, #10, #6.

I need to more explicitly test for #4 since the B16 variants do have nearby germline variants but I haven't documented what they are.

tavinathanson reviewed Apr 3, 2016
iskandr added 23 commits April 6, 2016 15:10

moving filtering from script to the protein_sequences logic

b112c1d

all the action should be in the library, not the CLI script

94344ed

collapse protein sequence predictions onto unique sequences

9d77c77

collapse protein sequence predictions onto unique sequences

7e6b5a5

added usage to README

4a6878d

working on getting sequences of equal length with pooled read names

b383d66

propagating read names and added dataframe construction to the library

fb168fc

adding docstrings

97da9e1

experimenting with which fields we need on Translation info objects

1fd07e7

reorganizing isovar and adding commandline script for every major step to help validate the process

e63b866

refactoring so that translation and reference transcript logic can be tested and run separately from rest of protein_sequence module

b389c2d

fixing unit tests

550a384

got variant_sequences unit test working

8a815af

added CELSR1 test data from varlens

d523496

more refactoring of reference_context

ed313ac

added effect_prediction and logging as separate modules

f414f1a

simplifying pseudo-inheritance of namedtuples in reference_context

3ef28c4

moving variant helper functions to own module

c3ccb54

added unit test for variant_helpers

0791eca

adding variant_helper unit test

a338fa3

simplified VariantSequences collection of dicts into a sorted list of VariantSequence objects, got rid of partially matching reads, added more tests for interbase range of variants on transcripts, found a bug in pyensembl's contig normalization

c59b7c5

added unit tests for reference sequence key on reverse strand

9bbc86c

reorganized reference_context module, added dataframe conversion, got rid of ReferenceContextWithORF

d488aeb

added travis badge

c8ee0bf

iskandr force-pushed the cli branch from 408b811 to c8ee0bf April 6, 2016 19:13

iskandr added 2 commits April 6, 2016 15:29

moved mismatch check further upstream

3959de7

skip loop iters when ref positions missing in variant_reads

a2da1d2

iskandr added 2 commits April 6, 2016 16:38

fixed extra arg in translate

cffcc23

fixed offset computation for trimmed reference context

1706af7

iskandr added 6 commits April 12, 2016 16:49

using DataFrameBuilder to cut down on redundant dataframe construction code, separating Translation from ProteinSequence

b3c3a21

started logic for merging Translation objects into ProteinSequence

0aa9e3e

finish protein_sequence, updated README

596052b

added TODO for sorting

8a2fc67

added sorting of ProteinSequence objects

fac9586

updated script names in setup.py

bdd3125

added test for sorting protein sequences

4910ded

iskandr added 3 commits April 14, 2016 19:45

removed CELSR bam & vcf files, added much smaller set of variants and reads from B16 melanoma

c8029d6

added basic test for number of rows in dataframe returned for protein sequences

217f558

install pyensembl for mouse data

d06c34e

iskandr added 2 commits April 15, 2016 17:08

revamped DataFrameBuilder, adding more tests

23e8efc

fixed unit test for dropping MAPQ=255 reads, added unit test for multiple protein lengths

a38fe9d

iskandr merged commit 89b2a4e into master Apr 15, 2016

iskandr deleted the cli branch April 23, 2016 05:23
