Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Filter to expressed variants #67

Merged
merged 16 commits into from Jul 13, 2016
Merged

Filter to expressed variants #67

merged 16 commits into from Jul 13, 2016

Conversation

tavinathanson
Copy link
Member

In this PR:

  • Factored some of the df_isovar logic out of load_neoantigens into a separate _load_single_patient_isovar, to use it outside the neoantigen context.
  • Added _filter_variants_to_expressed to filter out any variant that isn't present in the isovar DataFrame.
  • Randomly made it into this branch: parameterizing variant_qc_filter.

@arahuja

@@ -748,13 +800,52 @@ def load_neoantigens(self, patients=None, variant_type="snv", merge_type="union"
dfs[patient.id] = df_epitopes
return dfs

def _load_single_patient_isovar(self, patient, variant_type, merge_type, epitope_lengths):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the content of this is all the same; just moved.

@coveralls
Copy link

coveralls commented Jun 28, 2016

Coverage Status

Coverage decreased (-0.8%) to 58.344% when pulling 394dd61 on expressed_mutations into 5b1eaf6 on master.

ref=row["ref"],
alt=row["alt"],
ensembl=genome)
if variant in variants:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to go through all of variants each time, no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As opposed to caching the output? Yeah, I wanted to try to simplify by not caching this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, maybe you mean that I'm iterating through the VariantCollection on each check; good point.

@arahuja
Copy link
Contributor

arahuja commented Jun 28, 2016

My thoughts on this are I'm not sure it's worth building in the isovar filtering as special case to the load_* functions and leaving it as something to do after the fact or making a filter_fn that could handle it.

@tavinathanson
Copy link
Member Author

@arahuja curious why not?

@tavinathanson
Copy link
Member Author

We spoke offline; @arahuja clarified that the cohorts.varcode_utils functions are more general than I realized, so I can indeed plop this logic in there.

@tavinathanson
Copy link
Member Author

@arahuja I'm not entirely clear on the best way to do this, since my variant_expressed_filter and effect_expressed_filter need more than variant/effect and variant_metadata (the cohort, patient, variant_type, merge_type). I could put variant_expressed_filter inside a count function and make use of the closure, but that's not easily passed to load_variants.

Perhaps I should replace variant_metadata with heavier-weight object that has the VariantCollection metadata as well as cohort, patient, etc.?

@arahuja
Copy link
Contributor

arahuja commented Jun 29, 2016

Perhaps I should replace variant_metadata with heavier-weight object that has the VariantCollection metadata as well as cohort, patient, etc.?

Perhaps? I think needing cohort would be odd, but it somewhat shows that some of the load_single functions should probably be pushed into Patient

@tavinathanson
Copy link
Member Author

@arahuja I agree re pushing more functions to Patient; for another time, though :)

@coveralls
Copy link

coveralls commented Jul 9, 2016

Coverage Status

Coverage increased (+1.0%) to 60.171% when pulling 3e4d5f8 on expressed_mutations into 5b1eaf6 on master.

@coveralls
Copy link

coveralls commented Jul 9, 2016

Coverage Status

Coverage increased (+1.0%) to 60.171% when pulling 3e4d5f8 on expressed_mutations into 5b1eaf6 on master.

@coveralls
Copy link

coveralls commented Jul 10, 2016

Coverage Status

Coverage decreased (-0.5%) to 60.57% when pulling 9747282 on expressed_mutations into fac5c92 on master.

@coveralls
Copy link

coveralls commented Jul 10, 2016

Coverage Status

Coverage decreased (-0.5%) to 60.57% when pulling 769a0a9 on expressed_mutations into fac5c92 on master.

@tavinathanson
Copy link
Member Author

12 days later, some updates to this PR:

  • Create FilterableVariant, FilterableEffect, etc. as a way to cleanly specify filter_fns as single-parameter functions. Also removes the need for neoantigen_qc_filter vs. variant_qc_filter since FilterableNeoantigen is a FilterableVariant.
  • Use the above to handle expression filtering instead of adding only_expressed everywhere.
  • Fix Spacing of new functions is off #73.
  • Fix VariantCollection metadata is lost after filtering #66.
  • Move merge_type and variant_type into Cohort so they aren't passed around to a million methods.
  • Add additional parameters to variant_qc_filter.

Back to you @arahuja

@coveralls
Copy link

coveralls commented Jul 10, 2016

Coverage Status

Coverage decreased (-0.5%) to 60.57% when pulling a043188 on expressed_mutations into fac5c92 on master.

@coveralls
Copy link

coveralls commented Jul 11, 2016

Coverage Status

Coverage decreased (-0.4%) to 60.646% when pulling 709db27 on expressed_mutations into fac5c92 on master.

@coveralls
Copy link

coveralls commented Jul 12, 2016

Coverage Status

Coverage decreased (-0.4%) to 60.621% when pulling 126bda4 on expressed_mutations into fac5c92 on master.

@@ -215,10 +223,16 @@ def __init__(self,
join_how="inner",
check_provenance=False,
polyphen_dump_path=None,
pageant_coverage_path=None):
pageant_coverage_path=None,
variant_type="snv",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason this is a cohort level property now?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it became cumbersome and wordy to pass it around everywhere. Agree it's a little weird--

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea I think I prefer the alternative, but also see the issue that you can pass one to load_variants and a different set to load_effects which could be confusing, so I'm fine with this

@coveralls
Copy link

coveralls commented Jul 13, 2016

Coverage Status

Coverage decreased (-0.4%) to 60.621% when pulling 777bf4d on expressed_mutations into fac5c92 on master.

@coveralls
Copy link

coveralls commented Jul 13, 2016

Coverage Status

Coverage decreased (-0.5%) to 60.714% when pulling 946eab4 on expressed_mutations into e09195b on master.

@tavinathanson tavinathanson merged commit 130b155 into master Jul 13, 2016
@tavinathanson tavinathanson deleted the expressed_mutations branch July 13, 2016 22:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants