Spill trees #747
if (dataset->n_cols > 0)
  // Fill points with all possible indexes: 0 .. (dataset->n_cols - 1).
  points = arma::linspace<arma::Col<size_t>>(0, dataset->n_cols - 1,
      dataset->n_cols);
Maybe just using regspace() is the better thing to do here, but I'm indifferent either way.
@rcurtin,
I am getting an error: error: ‘regspace’ is not a member of ‘arma’
Maybe it was only added in a very recent version of Armadillo...
I see, regspace() was introduced in 6.700, so let's not worry about that now. Maybe someday when the minimum supported Armadillo version is that or higher we can make all the changes, but it is too much work to backport it for such a minor reason.
Hi @sumedhghaisas @rcurtin,
I have implemented Spill trees with axis-parallel splitting hyperplanes. I have made an effort to avoid duplicating existing code for neighbor search.
I created a new class, SpillSearch, that provides an interface similar to NeighborSearch, extended so that the tau parameter can be set properly. It encapsulates an instance of NeighborSearch.
Also, I have implemented a new version of NeighborSearchRules specialized for Spill Trees, because I needed to modify two methods:
- Score(), to consider splitting hyperplanes for overlapping nodes.
- CalculateBounds(), to ignore the B_2 bound (we cannot use the B_2 bound for Spill Trees).
Single Tree Search:
The SingleTreeTraverser is similar to the implementation for BinarySpaceTree. The difference is in the implementation of NeighborSearchRules for SpillTrees.
When calculating the score of a query point and a reference node, I consider 2 cases:
Dual Tree Search:
The query tree is built without overlapping.
When calculating the score of a query node and a reference node, I consider 2 cases:
The DualTreeTraverser is slightly different from the implementation for BinarySpaceTree. When referenceNode is an overlapping node and we can't decide which child node to traverse, which means that queryNode lies on both sides of the splitting hyperplane, we analyse the queryNode:
The DualTreeTraverser is faster than the SingleTreeTraverser, especially as the value of tau increases, because we will have more non-overlapping nodes, which implies more time spent backtracking.
The extension is incorporated into the existing mlpack_knn. With the current implementation, we can use "-t spill" to select spill trees and "--tau 0.1" to set the overlapping size (the default value is tau=0).
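To illustrate why the splitting hyperplane matters for overlapping nodes in the single-tree case, here is a minimal sketch of the usual hybrid spill tree descent: at an overlapping node only the child on the query's side of the hyperplane is visited, while at a non-overlapping node the far child is also visited if the hyperplane is closer than the best distance found so far. This is not the code in this PR. SpillNode, SquaredDistance and Search are hypothetical names made up for the example, and the actual Score() in the PR may decide differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical node type for illustration only; not mlpack's spill tree class.
struct SpillNode
{
  bool overlapping = false;  // True if the children share points near the split.
  std::size_t splitDim = 0;  // Axis of the axis-parallel splitting hyperplane.
  double splitVal = 0.0;     // Position of the hyperplane along splitDim.
  std::unique_ptr<SpillNode> left, right;
  std::vector<std::vector<double>> points;  // Only filled in leaves.
};

// Squared Euclidean distance between two points of equal dimension.
double SquaredDistance(const std::vector<double>& a,
                       const std::vector<double>& b)
{
  double sum = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i)
    sum += (a[i] - b[i]) * (a[i] - b[i]);
  return sum;
}

// Updates 'best' with the smallest squared distance from 'query' to any
// point that the descent visits.
void Search(const SpillNode& node,
            const std::vector<double>& query,
            double& best)
{
  // A node without two children is treated as a leaf: scan its points.
  if (!node.left || !node.right)
  {
    for (const auto& p : node.points)
      best = std::min(best, SquaredDistance(query, p));
    return;
  }

  assert(node.splitDim < query.size());

  // Signed offset of the query from the splitting hyperplane.
  const double offset = query[node.splitDim] - node.splitVal;
  const SpillNode& nearChild = (offset <= 0.0) ? *node.left : *node.right;
  const SpillNode& farChild = (offset <= 0.0) ? *node.right : *node.left;

  Search(nearChild, query, best);

  // Overlapping node: no backtracking, the near child is assumed to hold the
  // points close to the hyperplane. Non-overlapping node: backtrack into the
  // far child only if the hyperplane is closer than the current best.
  if (!node.overlapping && offset * offset < best)
    Search(farChild, query, best);
}
```

In this sketch backtracking only happens at non-overlapping nodes, which matches the remark above that more non-overlapping nodes mean more time spent backtracking.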
Any feedback is welcome! :)
Thanks