Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

is_kdtree_distance restrictions? #143

Closed
piskvorky opened this issue Nov 17, 2013 · 8 comments
Closed

is_kdtree_distance restrictions? #143

piskvorky opened this issue Nov 17, 2013 · 8 comments

Comments

@piskvorky
Copy link

Sorry if I missed this in the docs: what properties must a FLANN distance function have, in order to work correctly with autotune & kdtree?

I'm thinking of adding the cosine distance, 1 - xy/|x||y| (not a metric!), as another type of distance. Implementing the distance itself is fairly trivial, but I'm not sure how to make it play nicely with the rest of FLANN (index types etc).

Is it a is_vector_space_distance, a is_kdtree_distance?

Cossim is a very popular measure, so I assume there some problem, since it's not in FLANN yet?

@sromberg
Copy link

If both your vectors a and b are L2-normalized the cosine similarity induces the same ordering as the Euclidean distance. The absolute values are different though.

@piskvorky
Copy link
Author

They're not, but I could pre-normalize vectors before sending them to FLANN, good idea.

Still curious about adding cossim directly though. What would be the proper way to go about that, so that autotune&co work?

@mariusmuja
Copy link
Collaborator

The cosine similarity is not a kd-tree distance (dimension-wise additive
and monotonically increasing with the addition of new dimensions), nor is a
vector-space distance, so it would not work ok with the indexes in FLANN.

On Tue, Nov 19, 2013 at 7:53 AM, Radim Řehůřek notifications@github.comwrote:

They're not, but I could pre-normalize vectors before sending them to
FLANN, good idea.

Still curious about adding cossim directly though. What would be the
proper way to go about that, so that autotune&co work?


Reply to this email directly or view it on GitHubhttps://github.com//issues/143#issuecomment-28801337
.

@piskvorky
Copy link
Author

Thanks Marius!

Maybe these two restrictions (dim-wise additive and non-decreasing) could appear in the docs? Or appear more prominently, in case I missed them :)

What are the FLANN restrictions for a "vector-space distance"?

@mariusmuja
Copy link
Collaborator

It's basically a distance over a normed vector space (
http://en.wikipedia.org/wiki/Norm_(mathematics)#Definition). The
homogeneity property of the norm is required to be able to "average" the
different elements of the vector space (for example when constructing a
hierarchical k-means tree).

On Tue, Nov 19, 2013 at 9:38 AM, Radim Řehůřek notifications@github.comwrote:

Thanks Marius!

Maybe these two restrictions (dim-wise additive and non-decreasing) could
appear in the docs? Or appear more prominently, in case I missed them :)

What are the restrictions for a "vector-space distance"?


Reply to this email directly or view it on GitHubhttps://github.com//issues/143#issuecomment-28813634
.

@piskvorky
Copy link
Author

Ok, thanks. I'll go with @sromberg's suggestion then.

@piskvorky
Copy link
Author

In case anyone stumbles upon this issue later:

The way to convert between what comes out of FLANN (squared euclidean distance on L2 normalized input) and cosine similarity is simply: cossim = 1 - flann_dist / 2.

@StevenLOL
Copy link

@piskvorky thanks for point this out: cossim = 1 - flann_dist / 2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants