Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Neighbor search: Fix error in adjustedScore calculation. #590

Merged
merged 1 commit into from Mar 23, 2016

Conversation

MarcosPividori
Copy link
Contributor

Hi,
Reading the code of nearest neighbor search, I noticed an error.
I hope this is useful!

@MarcosPividori MarcosPividori changed the title Fix error in adjustedScore calculation. Neighbor search: Fix error in adjustedScore calculation. Mar 23, 2016
@rcurtin
Copy link
Member

rcurtin commented Mar 23, 2016

Hi Marcos,

You are right, this is an error. I think that this problem could, in some bizarre edge cases, cause an invalid prune and cause the results of kNN to be incorrect. But I did not try to set up a dataset where the issue was actually caused.

This commit will possibly slow things down in some situations, but at least for the corel and covertype datasets, there is no difference in the number of prunes or number of base cases.

It might be interesting to look into whether or not the adjusted score prune on line 234 is even useful for kd-trees. I wonder if removing it entirely would accelerate kNN when kd-trees (and R/R* trees and ball trees) are used? Anyway, if you want, feel free to investigate. I unfortunately don't have time for that.

I know you only changed something like eight characters, but if you'd like to add your name to the list of contributors in src/mlpack/core.hpp and COPYRIGHT.txt, do feel free. My requirement for inclusion on that list is "changed at least one character". :)

@rcurtin rcurtin merged commit e3b4189 into mlpack:master Mar 23, 2016
@MarcosPividori
Copy link
Contributor Author

Hi Ryan! Thanks for your response!

I don't think it was causing invalid prunes before, in the other hand, it was doing less tighter pruning.
This commit means a higher value of adjustedScore for nearest neighbor policy and lower for farthest neighbor, resulting in a tighter bound.

Yes, it would be interesting to check how effective this prunning rule is with different trees. I will investigate this as soon as I finish the GSoC proposal.

Just appearing in the github list of contributors is more than enough for me! :)

@rcurtin
Copy link
Member

rcurtin commented Mar 24, 2016

You're right, I had it backwards. The CalculateBound() function is pretty complex. That part of the code is the weird "let's try to prune the node combination without ever calculating the distance between the nodes, by using the cached parent distances", but there are lots of weird edge cases there.

That idea originally came from John Langford's cover tree code, but that calculation turns out to be a lot easier for the cover tree case. Generalizing it, like is done here, gets a lot more complicated...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants