
Support for Other KNN Distance Metrics #26

Closed · jlevy44 opened this issue Jun 30, 2019 · 12 comments

jlevy44 (Contributor) commented Jun 30, 2019

Could one compute the knn_graph using a cosine distance metric?

rusty1s (Owner) commented Jun 30, 2019

You need to adjust the distance computation here. We could add a flag to support different distance metrics. Feel free to submit a PR if you would like to contribute this.
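
A hypothetical pure-PyTorch sketch of what such a flag could look like at the Python level (brute force; `knn_edges` and its signature are made up here for illustration, not the library's actual C++/CUDA kernel):

```python
import torch

def knn_edges(x, k, metric='euclidean'):
    # Pairwise distances under the chosen metric.
    if metric == 'euclidean':
        d = torch.cdist(x, x)
    elif metric == 'cosine':
        xn = torch.nn.functional.normalize(x, dim=-1)
        d = 1 - xn @ xn.t()                      # 1 - cos(a, b)
    else:
        raise ValueError(f'unknown metric: {metric}')
    d.fill_diagonal_(float('inf'))               # exclude self-loops
    col = d.topk(k, largest=False).indices       # [N, k] nearest neighbors
    row = torch.arange(x.size(0)).repeat_interleave(k)
    return torch.stack([row, col.reshape(-1)], dim=0)
```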

jlevy44 (Contributor, Author) commented Jul 1, 2019

OK, will do. How does the kNN computation introduced here compare to other ANN methods (say, an approximate method that would make this faster)?

rusty1s (Owner) commented Jul 1, 2019

What do you mean by ANN methods?

jlevy44 (Contributor, Author) commented Jul 1, 2019

rusty1s (Owner) commented Jul 1, 2019

Ah, I see. Well, for CPU computation we fall back to scipy. The GPU version parallelizes over the number of examples (block parallelization) and the number of nodes (thread parallelization), so it is only fast when using a fair number of batched examples and when each example does not contain too many data points (ideally around 1024).
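
A minimal usage sketch of that batched regime, assuming the current `knn_graph(x, k, batch=...)` interface:

```python
import torch
from torch_cluster import knn_graph

# 8 batched examples of 1024 points each: roughly the regime the GPU
# kernel (blocks over examples, threads over nodes) handles well.
x = torch.rand(8 * 1024, 3)
batch = torch.arange(8).repeat_interleave(1024)  # example id per point

edge_index = knn_graph(x, k=6, batch=batch)      # shape [2, num_edges]
```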

jlevy44 (Contributor, Author) commented Jul 1, 2019

Ah, makes sense. The data I am dealing with can range in size up to on the order of ~10k x 100k.

GPU-computed affinity does seem tractable if it can handle data of this size.

jlevy44 (Contributor, Author) commented Jul 10, 2019

I forked it. If the goal is just to form an unweighted adjacency, is the more sensible calculation ||a|| ||b|| - dot(a, b) rather than 1 - dot(a, b) / (||a|| ||b||)? I suppose it doesn't matter, but I may implement the former; I see you've done the analogous thing for Euclidean.
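
A small sketch relating the two expressions (assuming nothing beyond the formulas quoted above):

```python
import torch

a, b = torch.rand(16), torch.rand(16)

dot = torch.dot(a, b)
na, nb = a.norm(), b.norm()

unnormalized = na * nb - dot      # ||a|| ||b|| - dot(a, b)
normalized = 1 - dot / (na * nb)  # 1 - dot(a, b) / (||a|| ||b||)

# unnormalized == na * nb * normalized: the two differ only by the
# positive factor ||a|| ||b||, and both vanish exactly when a and b
# point in the same direction. Neighbor rankings can still differ
# between them when the candidate norms ||b|| vary.
```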

rusty1s (Owner) commented Jul 10, 2019

Sounds good to me :)

jlevy44 (Contributor, Author) commented Jul 10, 2019

Just a heads up: I've never coded in CUDA before, and it's been a while since I've done any serious C++, so please forgive me if I botch some of this. ;)

jlevy44 (Contributor, Author) commented Jul 10, 2019

For the non-CUDA implementation, it seems the cosine metric is not typically supported by KDTree, except via some transformation. We may have to use another indexing method.
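
One such transformation (a sketch of a standard trick, not necessarily what the PR will do): L2-normalize the points, after which Euclidean kNN in a KDTree is equivalent to cosine kNN.

```python
import numpy as np
from scipy.spatial import cKDTree

x = np.random.rand(1000, 32)
x_unit = x / np.linalg.norm(x, axis=1, keepdims=True)  # L2-normalize rows

# For unit vectors, ||a - b||^2 = 2 * (1 - dot(a, b)), so Euclidean kNN
# on the normalized points matches cosine kNN on the originals.
tree = cKDTree(x_unit)
dist, idx = tree.query(x_unit, k=7)  # k=7: 6 neighbors plus the point itself
```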

jlevy44 (Contributor, Author) commented Jul 10, 2019

I'll just replace the C implementation with sklearn's for now. It's slower, but we can use it as a placeholder while we make changes in the PR.
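
The placeholder could look roughly like this (a sketch, not the PR's actual code; sklearn uses brute force for the cosine metric):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

x = np.random.rand(1000, 32)
nn = NearestNeighbors(n_neighbors=6, metric='cosine').fit(x)
dist, col = nn.kneighbors(x)
# Each point's first "neighbor" is itself (distance 0), since we query
# the same set we fit on.
```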

liaopeiyuan (Contributor) commented

> I forked it. If the goal is just to form an unweighted adjacency, is the more sensible calculation ||a|| ||b|| - dot(a, b) rather than 1 - dot(a, b) / (||a|| ||b||)? I suppose it doesn't matter, but I may implement the former; I see you've done the analogous thing for Euclidean.

This is really genius! I was really confused at first, so it may be helpful to add a few lines of explanation in the source code.
