Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

collapsing the DAG #49

Open
marybarker opened this issue Apr 10, 2023 · 2 comments
Open

collapsing the DAG #49

marybarker opened this issue Apr 10, 2023 · 2 comments

Comments

@marybarker
Copy link
Contributor

marybarker commented Apr 10, 2023

The matOptimize routine requires that the input tree is collapsed. That is, that every edge has at least one mutation(in the case where an edge does not have mutations, it can be collapsed by replacing the edge and its parent and child nodes with a single multifurcating node). This is currently accomplished by calling reassign_states on the sampled tree and merging it into the DAG before passing to matOptimize. The reassign_states routine computes ancestral sequences that are parsimony-optimal for the given topology, and also collapses all of the edges that do not carry mutations)

Thus, in practice, there will be redundant information in the DAG. Both the pre- and post-collapsed copies of trees are included. So if we implement collapsing directly on the DAG, we will reduce the amount of information stored and also reduce the number of changes returned with reassign_states

The collapse routine implemented in the histordag project is, in practice, very efficient, and would perhaps be helpful to add as a step, either performed after each iteration of sample-optimize-merge, or else at some regular interval.

@willdumm
Copy link
Contributor

willdumm commented Apr 20, 2023

We may want to implement collapsing on the whole DAG, like we have in the Python version.

However, a significant source of uncollapsed edges will likely be the fragment merging process described in #43 (here I'll use terminology from the file fitch_set_pseudocode.cpp

To fix this, we should implement a collapsing operation on move fragments before merging them into the DAG. However, this is slightly more complicated than just collapsing edges without mutations on them, because such an operation can possibly change LeafSets of the fragment's anchor nodes.

Here's a proposed method for correctly collapsing fragments, and guaranteeing that their anchor nodes are still in the DAG:

(In the following figures, nodes and edges in black are in a fragment, and nodes and edges in pink are in the hypothetical tree but not in the fragment. The edge with a double line drawn through it is the edge to be collapsed)

For each edge in the fragment, do the following:

Case 1 If the edge has mutations (new parent CG != new child CG), or if the edge's parent node is the UA node, or if the edge's child node is a leaf node, do nothing (the edge is ineligible for collapse)

Case 2 If the edge has no mutations and is not incident to an anchor node, it can be collapsed, so that the child edges of the child node are added to the parent node (changing its LeafSet) and the child node is deleted from the fragment:

image

Case 3 If the edge has no mutations and its child node is a non root anchor which is not a leaf, then all the children of that non root anchor node should be added to the fragment (they will be non root anchor nodes after collapsing the edge), so that the edge to be collapsed no longer is adjacent to an anchor node, and can be collapsed as in case 2:
image

Case 4 If the edge has no mutations and its parent node is a root anchor node which is not the UA node, then all of the children of the edge's parent (other than the edge's child, which is already in the fragment) should be added to the fragment. Also, the edge's parent node's parent node and edge should be added to the fragment. This parent becomes the fragment's new root anchor node. Since the edge to be collapsed is no longer incident to any anchor node, it can be collapsed as in Case 2:
image

In all cases, the nodes which are added before collapsing can be given the same CGs and LeafSets that they had in the tree before the move was applied.

I'm pretty sure Case 4 and Case 3 are mutually exclusive, since if an edge were incident to both a non root and a root anchor node, that edge and those two nodes would be the entire fragment. However, any SPR move results in nodes with changed leaf sets, so there must be at least one non-anchor node in any fragment.

@marybarker
Copy link
Contributor Author

Note that we have now implemented a partial collapsing on the DAG that gets called in the sample-optimize-merge loop. This is not an algorithm that collapses all of the empty edges in the DAG, but it does insure that no new empty edges are added to the DAG.

The default larch-usher implementation now calls a routine CollapseEmptyFragmentEdges after applying an SPR move to the sampled tree, and this ensures that each optimized tree that is merged into the DAG has collapsed edges. In order to disable this, you can run larch-usher with the command line argument --keep-fragment-uncollapsed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants