Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mutation class #57 #63

Merged
merged 40 commits into from Dec 9, 2022
Merged

Mutation class #57 #63

merged 40 commits into from Dec 9, 2022

Conversation

ericmjl
Copy link
Contributor

@ericmjl ericmjl commented Sep 26, 2022

This PR is ready for review.

In this PR, we add in Mutation and MutationSet classes. The intent is to represent mutations made to a SeqLike. I've taken inspiration from multiple places, but the biggest one has been the discussion on #57.

The use cases for this Mutation class are primarily two-fold:

  1. Adding them to a SeqLike returns another mutated SeqLike.
  2. Subtracting (diffing) a SeqLike from another yields a MutationSet.

While working through use case 2, I noticed how it's a bit tricky:

  • adding a Mutation or MutationSet to a SeqLike results in a new SeqLike,
  • but subtracting the new SeqLike from the original SeqLike might not necessarily result in the original MutationSet,
  • yet adding the new MutationSet to the original SeqLike will give back a SeqLike identical to the new SeqLike.

An example of this phenomena is shown in a new notebook, docs/notebooks/mutations.ipynb. For reviewing purposes, that is probably the go-to notebook for understanding what Mutation and MutationSet classes can do; the rest are implementation details.

Would love to get feedback on this PR -- especially if there are semantics that I haven't yet thought of.


TODO list of what's left:

  • FIX TEST: SeqLike classes have two more class methods. Therefore, 77 is correct. Reference here.
  • Add .positions class method, which returns a list of positions to mutate.
  • Switch out magical_parse() for __new__() under Mutation.

seqlike/SeqLike.py Outdated Show resolved Hide resolved
seqlike/Mutation.py Outdated Show resolved Hide resolved
@ericmjl
Copy link
Contributor Author

ericmjl commented Oct 5, 2022

I am going to do a few more commits to address @andrewgiessel's remaining comments.

@ndousis
Copy link
Contributor

ndousis commented Oct 5, 2022

@ericmjl , this new mutation functionality is awesome, and the jupyter notebook is very helpful in exploring it! The notation and "arithmetic" conventions are simple, clear and sensible.

The following behavior doesn't look right, however:

s1 = aaSeqLike("MKAILV")
ms2 = MutationSet(mutations="^2A;^4D;5-".split(";"))
str(s1 + ms2)

yields

MAAD-V

whereas I'd expect

MAKD-ILV

Perhaps this is related, but the following also gives an unexpected result:

s1 - (s1 + Mutation("5-"))

yields

[5V, 6-]

instead of

[5-]

@ericmjl
Copy link
Contributor Author

ericmjl commented Nov 4, 2022

@ndousis thanks for the feedback!

The behaviour you've outlined is one that I've been thinking about w.r.t. semantics. The essence is that there is an incrementing of position numbers every time that we have an insertion, so that any insertions are preserved relative to the original numbering of the sequence. In the example you showed, on the latest commit ffdb0bc, I get the behaviour defined by the aforementioned semantics:

image

Do you think it'll help to jump on a call with you to see if the current semantics make sense? If so, could you email me at my work address (intentionally not posting here to avoid spam)? I'll send you a calendly link that we can use to schedule.

@ericmjl
Copy link
Contributor Author

ericmjl commented Nov 4, 2022

@ndousis and @andrewgiessel this PR is ready for more reviews. I also hope that we can get official "approvals" on the PR before merging, so that we leave a record.

seqlike/Mutation.py Outdated Show resolved Hide resolved
Copy link
Contributor

@andrewgiessel andrewgiessel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving, pending final discussion/clarification on MutationSet Semantics + resulting documentation

@ndousis
Copy link
Contributor

ndousis commented Dec 7, 2022

@ericmjl , I think the current semantics make sense, and they do yield stable solutions. I was troubled by the asymmetry of cases like s1 - (s1 + Mutation("5-")), but as you've pointed out in a previous version of mutations.ipynb, consistent annotation of indels is difficult without pairwise alignment, which in itself is inconsistent.

Mutation and MutationSet are very thoughtful and useful new features in SeqLike. Approved!

@ericmjl
Copy link
Contributor Author

ericmjl commented Dec 8, 2022

Thank you, @andrewgiessel and @ndousis! I am going to merge and cut a new release once the CI passes!

@ericmjl ericmjl merged commit d8b98ab into main Dec 9, 2022
@ericmjl ericmjl deleted the mutation-class-#57 branch December 9, 2022 03:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants