
Support for Other KNN Distance Metrics #26

Closed · jlevy44 opened this issue Jun 30, 2019 · 12 comments

jlevy44 (Contributor) commented Jun 30, 2019

Could one compute the knn_graph using a cosine distance metric?

rusty1s (Owner) commented Jun 30, 2019

You need to adjust the distance computation here. We could add a flag to support different distance metrics. Feel free to submit a PR if you would like to contribute this.
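
A hypothetical pure-PyTorch sketch of what such a flag could look like at the Python level (brute force; `knn_edges` and its signature are made up here for illustration, not the library's actual C++/CUDA kernel):

```python
import torch

def knn_edges(x, k, metric='euclidean'):
    # Pairwise distances under the chosen metric.
    if metric == 'euclidean':
        d = torch.cdist(x, x)
    elif metric == 'cosine':
        xn = torch.nn.functional.normalize(x, dim=-1)
        d = 1 - xn @ xn.t()                      # 1 - cos(a, b)
    else:
        raise ValueError(f'unknown metric: {metric}')
    d.fill_diagonal_(float('inf'))               # exclude self-loops
    col = d.topk(k, largest=False).indices       # [N, k] nearest neighbors
    row = torch.arange(x.size(0)).repeat_interleave(k)
    return torch.stack([row, col.reshape(-1)], dim=0)
```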

jlevy44 (Contributor, Author) commented Jul 1, 2019

OK, will do. How does the kNN computation introduced here compare to other ANN methods (say, an approximate method that would make this faster)?

rusty1s (Owner) commented Jul 1, 2019

What do you mean by ANN methods?

jlevy44 (Contributor, Author) commented Jul 1, 2019

rusty1s (Owner) commented Jul 1, 2019

Ah, I see. Well, for CPU computation we fall back to scipy. The GPU version parallelizes over the number of examples (block parallelization) and the number of nodes (thread parallelization), so it is only fast when using a fair number of batched examples and when each example does not contain too many data points (ideally around 1024).
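
A minimal usage sketch of that batched regime, assuming the current `knn_graph(x, k, batch=...)` interface:

```python
import torch
from torch_cluster import knn_graph

# 8 batched examples of 1024 points each: roughly the regime the GPU
# kernel (blocks over examples, threads over nodes) handles well.
x = torch.rand(8 * 1024, 3)
batch = torch.arange(8).repeat_interleave(1024)  # example id per point

edge_index = knn_graph(x, k=6, batch=batch)      # shape [2, num_edges]
```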

jlevy44 (Contributor, Author) commented Jul 1, 2019

Ah, makes sense. The data I am dealing with can range in size up to on the order of ~10k x 100k.

GPU-computed affinity does seem tractable if it can handle data of this size.

jlevy44 (Contributor, Author) commented Jul 10, 2019

I forked it. If the goal is just to form an unweighted adjacency, is the more sensible calculation ||a|| ||b|| - dot(a, b) rather than 1 - dot(a, b) / (||a|| ||b||)? I suppose it doesn't matter, but I may implement the former; I see you've done the analogous thing for Euclidean.
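
A small sketch relating the two expressions (assuming nothing beyond the formulas quoted above):

```python
import torch

a, b = torch.rand(16), torch.rand(16)

dot = torch.dot(a, b)
na, nb = a.norm(), b.norm()

unnormalized = na * nb - dot      # ||a|| ||b|| - dot(a, b)
normalized = 1 - dot / (na * nb)  # 1 - dot(a, b) / (||a|| ||b||)

# unnormalized == na * nb * normalized: the two differ only by the
# positive factor ||a|| ||b||, and both vanish exactly when a and b
# point in the same direction. Neighbor rankings can still differ
# between them when the candidate norms ||b|| vary.
```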

rusty1s (Owner) commented Jul 10, 2019

Sounds good to me :)

jlevy44 (Contributor, Author) commented Jul 10, 2019

Just a heads up: I've never coded in CUDA before, and it's been a while since I've done any serious C++, so please forgive me if I botch some of this. ;)

jlevy44 (Contributor, Author) commented Jul 10, 2019

For the non-CUDA implementation, it seems the cosine metric is not typically supported by KDTree, except via some transformation. We may have to use another indexing method.
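
One such transformation (a sketch of a standard trick, not necessarily what the PR will do): L2-normalize the points, after which Euclidean kNN in a KDTree is equivalent to cosine kNN.

```python
import numpy as np
from scipy.spatial import cKDTree

x = np.random.rand(1000, 32)
x_unit = x / np.linalg.norm(x, axis=1, keepdims=True)  # L2-normalize rows

# For unit vectors, ||a - b||^2 = 2 * (1 - dot(a, b)), so Euclidean kNN
# on the normalized points matches cosine kNN on the originals.
tree = cKDTree(x_unit)
dist, idx = tree.query(x_unit, k=7)  # k=7: 6 neighbors plus the point itself
```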

jlevy44 (Contributor, Author) commented Jul 10, 2019

I'll just replace the C implementation with sklearn's for now. It's slower, but we can use it as a placeholder while we make changes in the PR.
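
The placeholder could look roughly like this (a sketch, not the PR's actual code; sklearn uses brute force for the cosine metric):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

x = np.random.rand(1000, 32)
nn = NearestNeighbors(n_neighbors=6, metric='cosine').fit(x)
dist, col = nn.kneighbors(x)
# Each point's first "neighbor" is itself (distance 0), since we query
# the same set we fit on.
```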

liaopeiyuan (Contributor) commented

> I forked it. If the goal is just to form an unweighted adjacency, is the more sensible calculation ||a|| ||b|| - dot(a, b) rather than 1 - dot(a, b) / (||a|| ||b||)? I suppose it doesn't matter, but I may implement the former; I see you've done the analogous thing for Euclidean.

This is really genius! I was really confused at first, so it may be helpful to add a few lines of explanation in the source code.
