Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NNI Evaluation Engine #377

Merged
merged 1 commit into from
Mar 3, 2022
Merged

NNI Evaluation Engine #377

merged 1 commit into from
Mar 3, 2022

Conversation

davidrich27
Copy link
Collaborator

@davidrich27 davidrich27 commented Nov 1, 2021

Description

NNI Engine and Grafted DAG

  • Grafted DAG: This allows modifying the SubsplitDAG with the quick adding and removing of proposed NNIs to DAG.

  • NNI Engine: This is an engine for using Nearest Neighbor Interchanges to perform a systematic search on a DAG.

Closes #372

Tests

  • TEST_CASE("NNI Engine: Adjacent NNI Maintenance")
    • This test builds a DAG, tests if engine generates the same set of adjacent NNIs and manually created set. Then adds a node pair to DAG, and tests if engine updates correctly.
  • TEST_CASE("NNI Engine: Add NNI Test")
  • TEST_CASE("NNI Engine: AddNodePair vs GraftNodePair")
    • This compares DAGs after adding NNIs via AddNodePair vs GraftNodePair.

Checklist:

  • Code follows our detailed contribution guidelines
  • clang-format has been run
  • TODOs have been eliminated from the code
  • Comments are up to date, document intent, and there are no commented-out code blocks

@davidrich27 davidrich27 linked an issue Nov 10, 2021 that may be closed by this pull request
@davidrich27 davidrich27 force-pushed the 372-nni-evaluation-engine branch 2 times, most recently from 249b84d to 03182a2 Compare February 4, 2022 17:35
@davidrich27 davidrich27 force-pushed the 372-nni-evaluation-engine branch 3 times, most recently from 8e8a2ce to 1bda563 Compare February 10, 2022 05:28
@davidrich27
Copy link
Collaborator Author

TEST_CASE("NNI Engine: Add NNI Test")

This test builds several SubsplitDAGs: dag_A_1, dag_A_2, dag_A_2b, and dag_B.
   - dag_B contains the all the subsplit pairs in all dag_A variants.
   - dag_A_1 and dag_A_2 contain all the same subsplit nodes except two subsplits
   from each dag, which are NNIs of one another.
   - dag_A_2 and dag_A_2b contain all the same subsplit nodes, but have two different
   taxon mappings, due to the order of subsplit insertion, which is based on the
   ordering of taxons in the Newick tree file.
- Tests: Checks DAGs for the DAG equivalency and PerGPCSP_Likelihood equivalency.
   - dag_A_1 is tested for self equivalence.
   - dag_A_1 and dag_A_2 are test for NONequivalence.
   - dag_A_2 and dag_A_2b are tested for equivalence.
   - dag_A_1 and dag_B are tested for NONequivalence before adding pair_1 to dag_A_1.
   - dag_A_1 and dag_B are tested for equivalence after adding pair_1 to dag_A_1.
   - dag_A_2 and dag_B are tested for equivalence after adding pair_2 to dag_A_2.
   - dag_A_2b and dag_B are tested for equivalence after adding pair_2 to dag_A_2b.

@davidrich27 davidrich27 force-pushed the 372-nni-evaluation-engine branch 2 times, most recently from 327dd44 to 5604219 Compare February 11, 2022 06:32
davidrich27 added a commit that referenced this pull request Feb 11, 2022
* Grafted DAG: This allows modifying the SubsplitDAG with the quick
adding and removing of proposed NNIs to DAG.

* NNI Engine: This is an engine for using Nearest Neighbor Interchanges to perform a systematic search on a DAG.
  * Finds all NNIs adjacent to DAG.
  * Adds all NNIs to DAG via Graft.
  * Evaluates likelihood of NNI. (Unfinished - left to Issue #405 and #404)
  * Filter NNIs to accept or reject. (Unfinished - left to Issue #405)
  * Add accepted NNIs to DAG.
  * Remove adjacent NNIs from Graft and Add back NNIs adjacent to new Accepted NNIs. Return to Step c).

    Closes #372
@davidrich27 davidrich27 marked this pull request as ready for review February 11, 2022 06:44
Copy link
Collaborator

@matsen matsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some nitpicks here, but mostly tomorrow I'd like to discuss the overall design-- who owns what and who has a pointer or reference to what. Could you come with a little diagram describing that (just a photo of a hand-scribble would be great).

Also, it feels like there is a lot of code duplication with things like IterateOverLeafwardEdges. I wonder if this could be reduced. We'll chat!

CMakeLists.txt Outdated Show resolved Hide resolved
src/bitset.hpp Outdated Show resolved Hide resolved
src/gp_doctest.cpp Outdated Show resolved Hide resolved
src/gp_doctest.cpp Outdated Show resolved Hide resolved
src/gp_doctest.cpp Outdated Show resolved Hide resolved
src/subsplit_dag.cpp Show resolved Hide resolved
src/subsplit_dag.cpp Outdated Show resolved Hide resolved
src/subsplit_dag_graft.hpp Outdated Show resolved Hide resolved
src/subsplit_dag_graft.hpp Outdated Show resolved Hide resolved
src/subsplit_dag_graft.hpp Outdated Show resolved Hide resolved
Copy link
Collaborator

@matsen matsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this, @davidrich27 ! Some things I'm looking forward to discussing today.

CMakeLists.txt Outdated Show resolved Hide resolved
src/bitset.hpp Outdated Show resolved Hide resolved
src/gp_doctest.cpp Outdated Show resolved Hide resolved
src/nni_evaluation_engine.hpp Outdated Show resolved Hide resolved
src/gp_doctest.cpp Show resolved Hide resolved
src/subsplit_dag_nni.hpp Outdated Show resolved Hide resolved
src/subsplit_dag_nni.hpp Show resolved Hide resolved
src/subsplit_dag_nni.hpp Outdated Show resolved Hide resolved
src/subsplit_dag_nni.hpp Outdated Show resolved Hide resolved
src/subsplit_dag_node.hpp Outdated Show resolved Hide resolved
src/graft_dag.hpp Outdated Show resolved Hide resolved
src/graft_dag.hpp Outdated Show resolved Hide resolved
Copy link
Collaborator

@matsen matsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to move to something else, but will come back soon to look at the cpp files.

Have we picked a naming convention with regards to left and leftside, etc? I'm sure you have something but clarifying it for me would help.

src/gp_doctest.cpp Outdated Show resolved Hide resolved
src/gp_doctest.cpp Outdated Show resolved Hide resolved
src/nni_engine.cpp Outdated Show resolved Hide resolved
src/graft_dag.hpp Outdated Show resolved Hide resolved
src/nni_operation.hpp Outdated Show resolved Hide resolved
src/subsplit_dag_node.hpp Outdated Show resolved Hide resolved
src/subsplit_dag_node.hpp Outdated Show resolved Hide resolved
Copy link
Collaborator

@matsen matsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few more comments.

src/graft_dag.cpp Outdated Show resolved Hide resolved
src/graft_dag.hpp Outdated Show resolved Hide resolved
src/graft_dag.cpp Outdated Show resolved Hide resolved
Copy link
Collaborator

@matsen matsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A factoring point... "pushing" before meeting.

src/nni_engine.cpp Outdated Show resolved Hide resolved
Copy link
Collaborator

@matsen matsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

partway through graft_dag

src/nni_engine.hpp Outdated Show resolved Hide resolved
src/subsplit_dag.cpp Outdated Show resolved Hide resolved
src/subsplit_dag.cpp Outdated Show resolved Hide resolved
src/graft_dag.cpp Outdated Show resolved Hide resolved
src/graft_dag.cpp Outdated Show resolved Hide resolved
src/graft_dag.cpp Outdated Show resolved Hide resolved
src/graft_dag.cpp Outdated Show resolved Hide resolved
src/graft_dag.cpp Outdated Show resolved Hide resolved
Copy link
Collaborator

@matsen matsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finishing my pass.

src/graft_dag.cpp Outdated Show resolved Hide resolved
src/graft_dag.cpp Outdated Show resolved Hide resolved
src/graft_dag.cpp Outdated Show resolved Hide resolved
src/graft_dag.cpp Outdated Show resolved Hide resolved
src/graft_dag.cpp Outdated Show resolved Hide resolved
Copy link
Collaborator

@matsen matsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo fix needed.

I think you'd notice these things if you looked at the diff after pushing.

src/subsplit_dag.cpp Outdated Show resolved Hide resolved
davidrich27 added a commit that referenced this pull request Feb 28, 2022
* Grafted DAG: This allows modifying the SubsplitDAG with the quick
adding and removing of proposed NNIs to DAG.

* NNI Engine: This is an engine for using Nearest Neighbor Interchanges to perform a systematic search on a DAG.
  * Finds all NNIs adjacent to DAG.
  * Adds all NNIs to DAG via Graft.
  * Evaluates likelihood of NNI. (Unfinished - left to Issue #405 and #404)
  * Filter NNIs to accept or reject. (Unfinished - left to Issue #405)
  * Add accepted NNIs to DAG.
  * Remove adjacent NNIs from Graft and Add back NNIs adjacent to new Accepted NNIs. Return to Step c).

    Closes #372
davidrich27 added a commit that referenced this pull request Mar 3, 2022
* Grafted DAG: This allows modifying the SubsplitDAG with the quick
adding and removing of proposed NNIs to DAG.

* NNI Engine: This is an engine for using Nearest Neighbor Interchanges to perform a systematic search on a DAG.
  * Finds all NNIs adjacent to DAG.
  * Adds all NNIs to DAG via Graft.
  * Evaluates likelihood of NNI. (Unfinished - left to Issue #405 and #404)
  * Filter NNIs to accept or reject. (Unfinished - left to Issue #405)
  * Add accepted NNIs to DAG.
  * Remove adjacent NNIs from Graft and Add back NNIs adjacent to new Accepted NNIs. Return to Step c).

    Closes #372
Copy link
Collaborator

@matsen matsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really nice work. Some nitpicking as usual, but trying to form habits here.

Do I understand correctly that the complexity of adding every NNI as a graft and then removing it is O(N^2), because we have to traverse the DAG, and then we also have a linear time addition process needing to search for parents and children?

This may be kind of big for dags with 100K nodes. Down the road we may want to consider keeping more information in the NNIs so that we can just reconnect their edges without needing a linear search.


#pragma once

class GPDAG;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this forward declaration necessary?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was needed before the PLVHandler. I'll remove it.

// Get the GPCSP/edge index by its parent/child pair.
size_t GetEdgeIdx(const Bitset &parent_subsplit, const Bitset &child_subsplit) const;
size_t GetEdgeIdx(const size_t parent_id, const size_t child_id) const;
// Get a sorted vector of all node subsplit's bitset representation. Optionally only
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Get a sorted vector of all node subsplit's bitset representation. Optionally only
// Get a sorted vector of all node subsplit bitset representations. Optionally only

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

// Get a sorted vector of all node subsplit's bitset representation. Optionally only
// graft nodes, or graft and host nodes.
BitsetVector GetSortedVectorOfNodeBitsets(bool include_host = true) const;
// Get a sorted vector of all edge PCSP's bitset representation. Optionally only graft
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Get a sorted vector of all edge PCSP's bitset representation. Optionally only graft
// Get a sorted vector of all edge PCSP bitset representations. Optionally only graft

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

// DAG that the graft is proposed to be connected to.
SubsplitDAG &host_dag_;
// Nodes and edges in the graft.
SubsplitDAGStorage storage_;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a slight preference for calling this graft_storage_ so it's obvious that it only stores the grafts.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed name.

// Nodes and edges in the graft.
SubsplitDAGStorage storage_;
// Duplicate nodes from host for edge bridging across graft.
// std::vector<std::unique_ptr<SubsplitDAGNode>> bridge_nodes_;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this cruft?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Removed.

// Get Map of Adjacent NNIs with their score.
const std::map<NNIOperation, double> &GetScoredNNIs() { return scored_nnis_; };
// Get number of runs of NNI engine.
size_t GetRunCount() { return run_count_; };
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

// Get Number of Accepted NNIs.
size_t GetAcceptedNNICount() const { return accepted_nnis_.size(); };
// Get Map of Adjacent NNIs with their score.
const std::map<NNIOperation, double> &GetScoredNNIs() { return scored_nnis_; };
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const method

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

return has_invalid_node;
size_t correct_id = 0;
for (auto node : storage_.GetVertices()) {
if (correct_id++ != node.Id()) return false;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

current style guideline says the body of any conditional gets wrapped in braces. We can change that if you feel strongly!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was from some rebasing by Ognian. I am happy with or without braces. We could ask Ognian if he has an opinion tomorrow.

@@ -27,6 +27,14 @@ template <typename T>
auto& GetStorage(const T& node) {
return node.node_;
}
static inline void RemapNeighbors(NeighborsView neighbors,
const SizeVector& node_reindexer) {
std::map<VertexId, LineId> remaped;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
std::map<VertexId, LineId> remaped;
std::map<VertexId, LineId> remapped;

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@@ -222,6 +204,8 @@ class GenericNeighborsView {
return result;
}

void Remap(const T& neighbors) { neighbors_ = neighbors; }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the clearest name for this method? It seems more like a SetNeighbors.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. Name changed.

@davidrich27
Copy link
Collaborator Author

davidrich27 commented Mar 3, 2022

I believe it should be O(N) per NNI. But since the number of NNIs is O(N), I think the whole process is roughly O(N^2). That's what you're saying?

But yes, currently it is a linear search to find the parents/children. I think I may have proposed this to you earlier, but we could make this constant time operation by using a clade map. Essentially, we could store each subsplit's left and right clade in a map, and the unions of their clades into a map, paired with a vector of node ids. A subsplit's parents have a single clade equal to your union clade, and a subplit's children have a union clade equal to one of your single clades. So each would just require a single lookup.

* Grafted DAG: This allows modifying the SubsplitDAG with the quick
adding and removing of proposed NNIs to DAG.

* NNI Engine: This is an engine for using Nearest Neighbor Interchanges to perform a systematic search on a DAG.
  * Finds all NNIs adjacent to DAG.
  * Adds all NNIs to DAG via Graft.
  * Evaluates likelihood of NNI. (Unfinished - left to Issue #405 and #404)
  * Filter NNIs to accept or reject. (Unfinished - left to Issue #405)
  * Add accepted NNIs to DAG.
  * Remove adjacent NNIs from Graft and Add back NNIs adjacent to new Accepted NNIs. Return to Step c).

    Closes #372
@davidrich27 davidrich27 merged commit 5aaebe4 into main Mar 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Evaluate NNI by GP Criterion
3 participants