Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add LeafSets directly to the fragment #73

Open
marybarker opened this issue Jan 9, 2024 · 0 comments
Open

Add LeafSets directly to the fragment #73

marybarker opened this issue Jan 9, 2024 · 0 comments

Comments

@marybarker
Copy link
Contributor

We would like to avoid recomputing LeafSets for every fragment that is merged into the DAG. In order to do this, we will add Deduplicate<LeafSets> to the sampled trees that we pass to matOptimize.

In addition to the storage, we will add functionality to the MATConversion routines BuildFromMAT and BuildMAT that will track these optional LeafSets. These routines will have to deal separately with condensed nodes. Fortunately, we only condense leaf nodes, and so the uncondensed nodes' LeafSets consist of the corresponding SampleIds.

So for the fragment, each HypotheticalNode now will contain a LeafSets attribute.

There is a natural place to update LeafSets In ApplyMoveImpl() where we change the LeafSets of all of the altered nodes as in the following pseudocode:

while true {
    current_node = src_node.GetParent().GetSingleParent();
    if (not current_node.IsMoveNew()) {
        LeafSets = {clades_difference(ls, clades_union(src_node.GetLeafSet()) for ls in current_node.GetLeafSets()}
        current_node = current_node.GetParent().GetSingleParent();
    } else {
         LeafSets = LeafSet{clades_union(src_node.GetLeafSet()), clades_union(dst_node.GetLeafSet())};
    }
    if (current_node == LCA) {
        break;
   }
}

while true {
    current_node = dst_node.GetParent().GetSingleParent();
    if (not current_node.IsMoveNew()) {
        LeafSets = {clades_union(ls, clades_union(src_node.GetLeafSet())) for ls in current_node.GetLeafSets() if ls.Contains(clades_union(dst_node.GetLeafSets())))}
        current_node = current_node.GetParent().GetSingleParent();
    } else {
        LeafSets = LeafSet{clades_union(src_node.GetLeafSet()), clades_union(dst_node.GetLeafSet())};
    }
    if (current_node == LCA) {
        break;
   }
}

However, this is not optimal in the default setting, where we allow collapsing. When we collapse edges following an applied move, the leafsets of the remaining uncollapsed nodes will be altered, and so must be recomputed from scratch, and hence the previous calculation should be suppressed in the case of collapsing.

We will add a routine to call at the end of CollapseEmptyFragmentEdges, which will update the LeafSets of the nodes in a postorder traversal.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant