Euclidean distance instead of angular used? #131

Closed
bittremieux opened this issue Jan 23, 2016 · 5 comments

Comments

@bittremieux

Whether I create an index using the angular distance or the Euclidean distance, the distances returned by the nearest neighbor search (get_nns_by_vector(include_distances=True)) always seem to be Euclidean distances.

I don't know what goes wrong because based on my cursory glance through the source this shouldn't be happening. I also don't know if it's only these distance calculations that are affected or if the whole index always uses the Euclidean distance instead of the angular distance.

@erikbern
Collaborator

I don't think that's true – there's a unit test here: https://github.com/spotify/annoy/blob/master/test/annoy_test.py#L172

and in the following test here, I check that the distance is always less than or equal to 2: https://github.com/spotify/annoy/blob/master/test/annoy_test.py#L186

If the distance were Euclidean, this would not be the case.

If you still don't trust it, can you try to produce a minimal breaking example?

@bittremieux
Author

Ok, see the attached file for an approximation of what I'm doing:
euclidean.txt

If I run this, the returned distances are equal to the computed Euclidean distance (I also added the dot product for comparison, the vectors are normalized).

For example, I get the following output:
annoy_dist = 0.8814489841461182 euclidean = 0.8814489556668296 dot = 0.6115238692769277
annoy_dist = 0.8938978314399719 euclidean = 0.8938978219006966 dot = 0.6004733420005951
annoy_dist = 0.9004353880882263 euclidean = 0.9004354783899279 dot = 0.5946079746283507
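
The three columns above are consistent with each other. For unit vectors, |u - v|^2 = 2 - 2 (u . v), so the Euclidean distance can be predicted from the dot product alone. The following is only a sketch that re-checks the reported numbers; the helper name `angular_from_dot` is made up for illustration and is not part of annoy.

```python
import math

# (annoy_dist, euclidean, dot) triples copied from the output above
rows = [
    (0.8814489841461182, 0.8814489556668296, 0.6115238692769277),
    (0.8938978314399719, 0.8938978219006966, 0.6004733420005951),
    (0.9004353880882263, 0.9004354783899279, 0.5946079746283507),
]


def angular_from_dot(dot):
    """Distance between two unit vectors, given their dot product.

    Hypothetical helper: uses |u - v|^2 = 2 - 2 (u . v) for |u| = |v| = 1.
    """
    return math.sqrt(2.0 * (1.0 - dot))


for annoy_dist, euclidean, dot in rows:
    predicted = angular_from_dot(dot)
    # annoy_dist appears to be single precision, so compare with a loose tolerance
    matches = math.isclose(predicted, annoy_dist, abs_tol=1e-6)
    print(f"predicted = {predicted:.7f} annoy_dist = {annoy_dist:.7f} match = {matches}")
```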

@erikbern
Collaborator

Since you are dividing by the norm of the vector, the Euclidean and angular distances will be identical.

Annoy's "angular" distance is really just the Euclidean distance of normalized vectors, i.e. |u / |u| - v / |v||, which equals sqrt(2 (1 - cos(u, v))).
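
A minimal pure-Python sketch of that relationship, without using annoy itself: it takes the distance to be the norm of the difference of the normalized vectors. The function names (`normalize`, `euclidean`, `angular`, `cosine`) are illustrative, not annoy's API. It shows that on raw vectors the two distances differ, that on pre-normalized vectors they coincide, and that the angular distance never exceeds 2.

```python
import math
import random


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def normalize(v):
    n = norm(v)
    return [x / n for x in v]


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine(u, v):
    return sum(a * b for a, b in zip(u, v)) / (norm(u) * norm(v))


def angular(u, v):
    # Euclidean distance between the normalized vectors (illustrative definition)
    return euclidean(normalize(u), normalize(v))


random.seed(0)
u = [random.gauss(0, 1) for _ in range(10)]
v = [random.gauss(0, 1) for _ in range(10)]

# On raw vectors the two distances generally differ.
print("raw:        euclidean =", euclidean(u, v), "angular =", angular(u, v))

# Once the inputs are normalized up front, they agree up to rounding.
un, vn = normalize(u), normalize(v)
print("normalized: euclidean =", euclidean(un, vn), "angular =", angular(un, vn))

# Equivalent closed form, and the upper bound of 2 (reached by opposite vectors).
print("closed form:", math.sqrt(2.0 * (1.0 - cosine(u, v))))
print("opposite vectors:", angular([1.0, 0.0], [-1.0, 0.0]))
```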

@bittremieux
Author

Ok, that was silly. Thanks for clearing up the confusion.

@erikbern
Collaborator

np :)


2 participants