-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
is it bug in searching ? #45
Comments
I looked at the code but I'm not sure if it's a bug... the heuristic In your case I'm not surprised most of the trees aren't visited because you have so many trees per node. Annoy keeps a priority queue of the most promising nodes (initialized to the roots) If n is much smaller than K I see why it doesn't visit all the trees though. Either way the solution is probably to change the |
Also you really shouldn't need 500 trees for 2000 nodes. It should be enough to have 5-10 trees |
Thanks erikbern! I agree with your quick suggestions by now ^_^ |
See #80 – can now specify at query time now many nodes to inspect! |
in order to implement it for my own use, I read the source code.
I guess there might be a logic bug -- unless I understand the idea in a wrong way..
I will try to explain it step by step.
during the _make_tree() step, a memory optimisation is used for nodes with n_descendants <= K, which means a sub-tree with only leave nodes can have a size as large as K.
_get_all_nns(), vector nns is used to store all potential candidates, and its size is limited as k * number_of_trees. where k is number of nearest neighbours to report.
at _get_all_nns(), the search for a tree stops when it meets a internal node with only leave nodes, and all of these leave nodes are added to vector nns.
the problem is that vector nns quickly reaches it size limitation after searching for a few trees, and the majority of the forest is not visited.
In my test example, I build 500 trees for 2000 nodes, vectors has dimension of 400, and I want to find top 5 NN. _get_all_nns() only visits 10 trees and stops, leaving the remaining 490 trees untouched.
Maybe that explains why annoy works faster than many other programs...
I am not totally convinced that it is a heuristic instead of a bug in the program...
Any comments are helpful.
The text was updated successfully, but these errors were encountered: