Calculate leaf sequences from Fitch sets #44
EDIT 1/13/23: EDIT 1/11/23:

So we need a way to take an SPR move and build a tree fragment (like those in #43) summarizing all the changes (to CGs) that result from applying that SPR move to the original tree (sampled from the DAG). Here's a description of how to do this, in different words than Mary's, that's still kind of vague:

Part One: Although we don't want to actually apply the SPR move to a copy of the original tree, we could use a class storing references to the original tree and the SPR move data, to encapsulate the logic involved in getting node neighbors and other node data in the hypothetical tree resulting from applying that SPR move. I'll call this the
The mapping between nodes in this hypothetical tree and the original MAT can be via the node IDs, like in this picture:

The highlighted nodes, which are on a path from the source node to the target node, are exactly the nodes whose LeafSets will be changed by the SPR move, so the mapping between the two trees on those nodes doesn't fully make sense. However, for the nodes which aren't highlighted, the natural mapping via node IDs is the one we want. Also, there may not be a mapping for the parent of the target node (the red node in the picture).

Part Two: Define a recursive function BuildFragment( Definition of BuildFragment(
* How to decide if a node is an anchor? Besides the anchor node that is a tree fragment's root (which we determine below, when deciding which node to start BuildFragment on), a node is an anchor node (meaning it's guaranteed to be the same as a node in the DAG, and there are no changes in the tree below that node resulting from the SPR move) if the following are all true:
** Which sites do we need to iterate through? Normally, we need only iterate through the sites at which either the base on the parent node changed (relative to the parent node's CG before the SPR move), or at which there's a Fitch set change. However, there are exceptions:
Bringing it together: Now, given an (unmodified) MAT being optimized, and a proposed SPR move on that MAT, the corresponding tree fragment should be what we get from BuildFragment( Here

Finally, we can merge the resulting fragment into the DAG using #43.

Things that need to be clearer here:
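Going back to Part One, here is a minimal Python sketch of what a read-only view over the original tree plus the SPR move data could look like. Everything in it is hypothetical and for illustration only: the names `SprMove`, `SprView`, `get_parent`, `get_children` and `leaf_set`, the child-to-parent dict used to represent the tree, and the example tree itself are all made up, and none of this is the larch or matOptimize API. It also does not handle the case where the source node's old parent is left with a single child (or no children).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SprMove:
    source: int    # node being pruned
    target: int    # node that the source is regrafted next to
    new_node: int  # id of the node created above source and target


class SprView:
    """Answers neighbor queries for the tree *after* the move, without copying the tree."""

    def __init__(self, parent, move):
        self.parent = parent  # original tree: child id -> parent id (root maps to None)
        self.move = move
        self.children = {}
        for child, par in parent.items():
            if par is not None:
                self.children.setdefault(par, []).append(child)

    def get_parent(self, node):
        m = self.move
        if node in (m.source, m.target):
            return m.new_node
        if node == m.new_node:
            return self.parent[m.target]
        return self.parent[node]

    def get_children(self, node):
        m = self.move
        if node == m.new_node:
            return [m.source, m.target]
        # The source is pruned from wherever it used to hang.
        kids = [c for c in self.children.get(node, []) if c != m.source]
        # The target's old parent now sees the new node in the target's place.
        if node == self.parent[m.target]:
            kids = [m.new_node if c == m.target else c for c in kids]
        return kids

    def leaf_set(self, node):
        kids = self.get_children(node)
        if not kids:
            return frozenset([node])
        return frozenset().union(*(self.leaf_set(c) for c in kids))


# Made-up example tree: root 0 with children 1 and 2; 1 has leaves 3, 4, 5; 2 has leaves 6, 7.
parent = {0: None, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1, 6: 2, 7: 2}
view = SprView(parent, SprMove(source=3, target=6, new_node=8))

print(view.get_children(2))       # [8, 7]
print(sorted(view.leaf_set(1)))   # [4, 5]
print(sorted(view.leaf_set(2)))   # [3, 6, 7]
```

In this example the leaf sets of nodes 1 and 2 change while the leaf set of the root does not, and every other node keeps its original ID and neighbors, so the ID-based mapping still applies to it.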
|
By the way, the boundary allele set is not used in the matOptimize implementation of the Fitch algorithm. It is just one by-product useful for quickly calculating the parsimony score change from a single move. |
One suggestion about applying Fitch set changes: |
matOptimize represents Fitch sets as mutations internally, and that representation is not exported, but I will write about its omission conditions anyway.
|
Thanks Cheng for the really helpful meeting, and your ideas. Here's a summary of what we talked about (thanks Mary for the first draft of these)
|
This is true, but not all alternative alleles are represented, only whichever ones happen to have been chosen in a tree that was merged into the DAG. I think this could make your next suggestion (below) even simpler.
I think this is somewhat equivalent to what we're suggesting, although possibly simpler. I'll defer to @ognian- on whether we'd like to replace the tree fragment merging scheme with a more direct modification of the DAG. Either way, I'll reframe this in the context of an implementation of BuildFragment (updated above in the description of BuildFragment) |
Cheng provided some additional clarification about how Fitch set changes are represented for an SPR move, in the nodes_with_changed_major_alleles set. Some of the following is directly copied from our conversation, and some is edited slightly. For example, with this SPR with source node 7 and target node 9: there is a Fitch set change on node 10, and on the new node, which is colored red and has the new label 11.
Where the incremented_allele and decremented_allele values are one-hot-encoded subsets of {A,G,C,T}. The fix to properly present new node Fitch sets as changes to the target node is here: Finally, in the case when the source node has only one sibling, its parent remains in the tree after the SPR move despite having only one child. We will of course want to change this to avoid unifurcation when building an SPR move's fragment. |
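As a rough illustration of how one-hot-encoded `incremented_allele` and `decremented_allele` values could be applied to a Fitch set, here is a small Python sketch. The bit assignment (A=1, C=2, G=4, T=8), the helper names `apply_change` and `decode`, and the rule "remove the decremented bases, then add the incremented ones" are all assumptions made for this example, not something confirmed by the conversation above.

```python
# Assumed one-hot encoding of the four bases (hypothetical bit assignment).
A, C, G, T = 1, 2, 4, 8
NAMES = {A: "A", C: "C", G: "G", T: "T"}


def apply_change(fitch_set, incremented, decremented):
    """Return the Fitch set after one change, all values being bitmasks.

    Assumption: decremented bases leave the set and incremented bases join it.
    """
    return (fitch_set & ~decremented) | incremented


def decode(mask):
    """Turn a bitmask back into a readable set of base letters."""
    return {name for bit, name in NAMES.items() if mask & bit}


old_set = A | G                                   # Fitch set {A, G} before the move
new_set = apply_change(old_set, incremented=C, decremented=A)
print(decode(new_set))                            # the set containing C and G
```

Under these assumptions, a site needs to be revisited only when `new_set != old_set`, which is one cheap integer comparison per site.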
Implementation for #44 and #43 --------- Co-authored-by: Mary Barker <mbarker2@quokka.fhcrc.org> Co-authored-by: Will Dumm <wrhdumm@gmail.com> Co-authored-by: marybarker <marybarker103@gmail.com> Co-authored-by: David Rich <31897211+davidrich27@users.noreply.github.com> Co-authored-by: david.rich27 <david.rich@umotana.edu>
This is a beginning step toward implementing #6
Since the callback has access to the altered/updated nodes for a given move, we can fully resolve these nodes without using the entire tree if we use the Fitch set changes that the move provides access to through the `node_with_major_allele_set_change` vector.

The Fitch algorithm is an alternative to the Sankoff algorithm, where Fitch sets are assigned to nodes instead of cost vectors.
Each site in the sequence for node `n` is assigned a Fitch set: a set containing the possible choices of base that minimize the subtree cost below that node.

Fitch Algorithm
(for simplicity, written for a length-1 sequence, but generalizes to length-k sequence if we allow a Fitch set for each site in the sequence)
Notation: given a node `n`, [notation definitions not recoverable from the source]
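Then we assign Fitch sets to the nodes in a postorder traversal as follows:
- For a leaf node `n`, the Fitch set is the singleton set that contains the base of `n`.
- For an internal node `n`:

Using the Fitch sets and boundary allele sets, the sequences are assigned in a preorder traversal: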
:Using the Fitch sets and boundary allele sets, the sequences are assigned in a preorder traversal:
p
ofn
has base that is inn
equal to that base.n
to be that base.n
to be an element ofThe text was updated successfully, but these errors were encountered:
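The two traversals can be sketched in Python for a single site. This is a sketch under stated assumptions rather than the matOptimize implementation: the internal-node rule used here keeps the bases that occur in the most child Fitch sets (for two children this is the classic "intersection if non-empty, otherwise union" rule), the preorder pass ignores boundary allele sets and falls back to an arbitrary element of the Fitch set, and the function names, the children-dict tree representation and the example tree are hypothetical.

```python
from collections import Counter


def fitch_sets(children, leaf_base, root):
    """Postorder pass: return {node: Fitch set} for one site."""
    sets = {}

    def visit(n):
        kids = children.get(n, [])
        if not kids:
            sets[n] = {leaf_base[n]}          # leaf: singleton with its own base
            return
        for c in kids:
            visit(c)
        # Count, for each base, how many children have it in their Fitch set.
        counts = Counter(b for c in kids for b in sets[c])
        best = max(counts.values())
        sets[n] = {b for b, k in counts.items() if k == best}

    visit(root)
    return sets


def assign_bases(children, sets, root):
    """Preorder pass: keep the parent's base when allowed, else pick from the Fitch set."""
    bases = {root: min(sets[root])}           # min() only makes the choice deterministic
    stack = [root]
    while stack:
        p = stack.pop()
        for n in children.get(p, []):
            bases[n] = bases[p] if bases[p] in sets[n] else min(sets[n])
            stack.append(n)
    return bases


# Made-up example: root 0 -> (1, 2); 1 -> leaves (3, 4); 2 -> leaves (5, 6).
children = {0: [1, 2], 1: [3, 4], 2: [5, 6]}
leaf_base = {3: "A", 4: "C", 5: "A", 6: "A"}

sets = fitch_sets(children, leaf_base, 0)
bases = assign_bases(children, sets, 0)
changes = sum(bases[c] != bases[p] for p, kids in children.items() for c in kids)
print(sorted(sets[1]), bases[1], changes)     # ['A', 'C'] A 1
```

Here node 1 has Fitch set {A, C} and takes base A because its parent does, which leaves a single mutation on the edge to leaf 4.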