
Spill trees #747

Merged
84 commits merged on Aug 18, 2016

MarcosPividori
Contributor

Hi @sumedhghaisas @rcurtin,

I have implemented Spill trees with axis-parallel splitting hyperplanes. I have made an effort to avoid duplicating existing code for neighbor search.

I created a new class SpillSearch that provides an interface similar to NeighborSearch but with an extension to properly set the tau parameter. It encapsulates an instance of NeighborSearch.

Also, I have implemented a new version of NeighborSearchRules specialized for Spill Trees, because I needed to modify the methods:

  • Score() to consider splitting hyperplanes for overlapping nodes.
  • CalculateBounds() to ignore the B_2 bound (we cannot use the B_2 bound for Spill Trees).

Single Tree Search:

The SingleTreeTraverser is similar to the implementation for BinarySpaceTree.
The difference is in the implementation of NeighborSearchRules for SpillTrees.
When calculating the score of a query point and a reference node I consider 2 cases:

  • If the reference node is non-overlapping, I calculate the score the same way as before.
  • If the reference node is overlapping, I analyze the reference node's half space. If it contains the given query point, I return 0 (best score). Else, I return DBL_MAX (prune).
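To make the two cases concrete, here is a minimal sketch of the single-tree score, assuming a hypothetical flattened node type (`RefNode`) that stores its bounding box, an `overlapping` flag, and the parent's splitting hyperplane (`splitDim`, `splitVal`, `isLeft`). None of these names are mlpack's SpillTree API, and `kthBest` stands in for the current k-th best candidate distance:

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical node description (NOT mlpack's SpillTree API). For simplicity
// the parent's splitting hyperplane is copied onto the child.
struct RefNode
{
  std::vector<double> lo, hi;  // axis-aligned bounding box
  bool overlapping;            // true if the parent split was an overlapping split
  std::size_t splitDim;        // dimension of the parent's splitting hyperplane
  double splitVal;             // position of that hyperplane
  bool isLeft;                 // left child covers x[splitDim] <= splitVal
};

// Minimum Euclidean distance from a query point to a node's bounding box.
double MinDistance(const std::vector<double>& q, const RefNode& n)
{
  assert(q.size() == n.lo.size() && q.size() == n.hi.size());
  double sum = 0.0;
  for (std::size_t d = 0; d < q.size(); ++d)
  {
    double diff = 0.0;
    if (q[d] < n.lo[d]) diff = n.lo[d] - q[d];
    else if (q[d] > n.hi[d]) diff = q[d] - n.hi[d];
    sum += diff * diff;
  }
  return std::sqrt(sum);
}

// Single-tree score: lower is better, DBL_MAX means "prune this node".
double Score(const std::vector<double>& q, const RefNode& n, double kthBest)
{
  if (n.overlapping)
  {
    // Defeatist rule: keep only the child whose half space contains q.
    const bool inLeftHalf = q[n.splitDim] <= n.splitVal;
    return (inLeftHalf == n.isLeft) ? 0.0 : DBL_MAX;
  }
  // Non-overlapping node: the usual bound-based pruning.
  const double d = MinDistance(q, n);
  return (d > kthBest) ? DBL_MAX : d;
}
```

Returning 0 for the matching half space means the traverser always descends there first, and DBL_MAX removes the other child from consideration entirely, so no backtracking happens across an overlapping split.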

Dual Tree Search:

The Query tree is built without overlapping.

When calculating the score of a query node and a reference node, I consider 2 cases:

  • If the reference node is a non-overlapping node, I calculate the score the same as before.
  • If the reference node is an overlapping node, I analyze the query node's bounding box. If it intersects the reference node's half space, I return 0 (best score). Else, I return DBL_MAX (prune).
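A similar sketch for the overlapping case of the dual-tree score, again with hypothetical `Box` and `HalfSpace` types rather than mlpack classes. The half space x[dim] <= split intersects the query box when the box's lower corner reaches it, and the half space x[dim] > split does when the upper corner does:

```cpp
#include <cassert>
#include <cfloat>
#include <cstddef>
#include <vector>

// Hypothetical types for illustration only (not mlpack classes).
struct Box { std::vector<double> lo, hi; };          // query node's bounding box
struct HalfSpace { std::size_t dim; double split; bool isLeft; };

// Does the axis-aligned box touch the half space?
bool Intersects(const Box& b, const HalfSpace& h)
{
  assert(h.dim < b.lo.size() && h.dim < b.hi.size());
  return h.isLeft ? (b.lo[h.dim] <= h.split)   // x[dim] <= split
                  : (b.hi[h.dim] > h.split);   // x[dim] >  split
}

// Dual-tree score for an overlapping reference node: 0 = visit, DBL_MAX = prune.
double OverlappingScore(const Box& queryBox, const HalfSpace& refHalf)
{
  return Intersects(queryBox, refHalf) ? 0.0 : DBL_MAX;
}
```

A query box that straddles the hyperplane gets score 0 for both children, which is exactly the situation the DualTreeTraverser has to resolve.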

The DualTreeTraverser is slightly different from the implementation for BinarySpaceTree. When referenceNode is an overlapping node and we can't decide which child node to traverse (that is, queryNode lies on both sides of the splitting hyperplane), we analyze the queryNode:

  • If queryNode is a non-leaf node, I recurse down the query node.
  • If queryNode is a leaf node, I do single tree search for each point in the query node.
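The traverser's choice at an overlapping reference node can be summarized as a small decision function. This is a hypothetical sketch (`Step` and `DecideOverlapping` are illustrative names, not mlpack code) that takes the two children's scores, where DBL_MAX means pruned:

```cpp
#include <cassert>
#include <cfloat>

// Illustrative outcomes at an overlapping reference node (hypothetical).
enum class Step { Prune, VisitLeft, VisitRight, RecurseQuery, SingleTreePerPoint };

Step DecideOverlapping(double leftScore, double rightScore, bool queryIsLeaf)
{
  assert(leftScore >= 0.0 && rightScore >= 0.0);  // scores are distances
  const bool goLeft = leftScore != DBL_MAX;
  const bool goRight = rightScore != DBL_MAX;
  if (goLeft && goRight)
  {
    // The query node straddles the hyperplane: split the query side instead.
    return queryIsLeaf ? Step::SingleTreePerPoint : Step::RecurseQuery;
  }
  if (goLeft) return Step::VisitLeft;
  if (goRight) return Step::VisitRight;
  return Step::Prune;
}
```

Recursing on the query node shrinks its bounding box until it falls on one side of the hyperplane; at a leaf there is nothing left to split, so each point is handled with the single-tree rule.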

The DualTreeTraverser is faster than the SingleTreeTraverser, especially as the value of tau increases, because there will be more non-overlapping nodes, which implies more time spent backtracking.

The extension was incorporated into the existing mlpack_knn program. With the current implementation, we can use "-t spill" to use spill trees and "--tau 0.1" to set the overlapping size (the default is tau=0).
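As background for the tau parameter, here is a sketch of an overlapping split along one dimension, assuming the overlap buffer is the band of half-width tau around the splitting value, as in the original spill tree formulation. `OverlappingSplit` is a hypothetical helper, and the exact boundary handling in this PR may differ. With tau = 0 the two children are disjoint, which is presumably how the query tree is built "without overlapping":

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: split point indices by their coordinate in the chosen
// split dimension. Points inside (split - tau, split + tau] go to BOTH children.
std::pair<std::vector<std::size_t>, std::vector<std::size_t>>
OverlappingSplit(const std::vector<double>& values, double split, double tau)
{
  assert(tau >= 0.0);
  std::vector<std::size_t> leftIdx, rightIdx;
  for (std::size_t i = 0; i < values.size(); ++i)
  {
    if (values[i] <= split + tau) leftIdx.push_back(i);
    if (values[i] > split - tau) rightIdx.push_back(i);
  }
  return {leftIdx, rightIdx};
}
```

Larger tau duplicates more points near the hyperplane, which lets defeatist search find good neighbors without backtracking at the cost of a bigger tree.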

All feedback is welcome! :)

Thanks

…borSearch class, and adds the functionality to deal with spill trees.
if (dataset->n_cols > 0)
  // Fill points with all possible indexes: 0 .. (dataset->n_cols - 1).
  points = arma::linspace<arma::Col<size_t>>(0, dataset->n_cols - 1,
      dataset->n_cols);
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just using regspace() is the better thing to do here, but I'm indifferent either way.

Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rcurtin,
I am getting an error: ‘regspace’ is not a member of ‘arma’.
Maybe it was added in a very recent version of Armadillo...

Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see; regspace() was introduced in Armadillo 6.700, so let's not worry about it now. Maybe someday, when the minimum supported Armadillo version is 6.700 or higher, we can make all the changes, but it is too much work to backport it for such a minor reason.
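For reference, both `arma::linspace<arma::Col<size_t>>(0, n - 1, n)` and `regspace` yield the index vector 0 .. n-1. The `n_cols > 0` guard in the snippet matters because `n_cols - 1` wraps around for an unsigned type when the dataset is empty. A plain-STL sketch with `std::iota` (an alternative, not what this PR uses; `AllIndices` is a hypothetical name) builds the same vector with no Armadillo version requirement and handles the empty case naturally:

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: returns {0, 1, ..., nCols - 1}; empty when nCols == 0.
std::vector<std::size_t> AllIndices(std::size_t nCols)
{
  std::vector<std::size_t> points(nCols);
  std::iota(points.begin(), points.end(), std::size_t{0});
  return points;
}
```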

@MarcosPividori
Contributor Author

Hi @rcurtin,
I have modified the SpillTree traversers. They now take a template boolean parameter, Defeatist, that determines whether the traverser performs defeatist search on overlapping nodes. See the commits:
0267acb
70fbeab
b334674
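A toy illustration of how a compile-time `Defeatist` flag can switch backtracking on and off. `ToyTraverser` is hypothetical and only orders two child scores at an overlapping node; the real traversers in the commits above do considerably more:

```cpp
#include <cassert>
#include <cfloat>
#include <vector>

// Hypothetical traverser (not mlpack code). Returns the children visited at an
// overlapping reference node (0 = left, 1 = right), in visiting order.
template<bool Defeatist>
struct ToyTraverser
{
  static std::vector<int> Visit(double leftScore, double rightScore)
  {
    assert(leftScore >= 0.0 && rightScore >= 0.0);
    std::vector<int> order;
    const int first = (leftScore <= rightScore) ? 0 : 1;  // best child first
    const int second = 1 - first;
    const double firstScore = (first == 0) ? leftScore : rightScore;
    const double secondScore = (first == 0) ? rightScore : leftScore;
    if (firstScore != DBL_MAX) order.push_back(first);
    // Defeatist search never backtracks into the second child.
    if (!Defeatist && secondScore != DBL_MAX) order.push_back(second);
    return order;
  }
};
```

Because `Defeatist` is a template parameter, the branch is resolved at compile time, so the defeatist and backtracking variants share one implementation without a runtime flag check.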

rcurtin mentioned this pull request on Aug 18, 2016

3 participants