Euclidean distance instead of angular used? #131
Comments
I don't think that's true. There's a unit test here: https://github.com/spotify/annoy/blob/master/test/annoy_test.py#L172 and in the following test, here, I check that the distance is always less than or equal to 2: https://github.com/spotify/annoy/blob/master/test/annoy_test.py#L186 If the distance had been Euclidean, this would not have been the case. If you still don't trust it, can you try to produce a minimal breaking example?
Ok, see the attached file for an approximation of what I'm doing. If I run this, the returned distances are equal to the computed Euclidean distance (I also added the dot product for comparison; the vectors are normalized). For example, I get the following output:
Since you are dividing by the norm of the vector, the Euclidean and angular distances will be identical. annoy's "angular" distance is really just the Euclidean distance between the normalized vectors, i.e. |u / |u| - v / |v||, which equals sqrt(2 * (1 - cos(u, v))). That is also why it can never exceed 2.
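To make the point above concrete, here is a minimal sketch in plain Python (it does not call annoy itself). The helper names `normalize`, `euclidean`, `cosine`, and `angular_distance` are made up for illustration, and `angular_distance` assumes the sqrt(2 * (1 - cos(u, v))) definition rather than reading it out of the library.

```python
import math

def normalize(v):
    # Scale v to unit length.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def euclidean(u, v):
    # Plain Euclidean distance between two vectors of equal length.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine(u, v):
    # Cosine similarity of u and v.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def angular_distance(u, v):
    # Assumed definition: sqrt(2 * (1 - cos(u, v))).
    # max() guards against tiny negative values from rounding.
    return math.sqrt(max(0.0, 2.0 * (1.0 - cosine(u, v))))

u = [3.0, 4.0, 0.0]
v = [1.0, 2.0, 2.0]

# On the raw vectors the two distances differ...
print(euclidean(u, v), angular_distance(u, v))

# ...but once the inputs are normalized they coincide.
nu, nv = normalize(u), normalize(v)
print(euclidean(nu, nv), angular_distance(nu, nv))
```

So if the vectors are already normalized before they go into the index, comparing the returned distances against a hand-computed Euclidean distance cannot tell the two metrics apart. Comparing on unnormalized vectors would.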
Ok, that was silly. Thanks for clearing up the confusion.
np :)
Whether I create an index using the angular distance or the Euclidean distance, the distances returned by the nearest neighbor search (get_nns_by_vector(include_distances=True)) always seem to be Euclidean distances. I don't know what goes wrong, because based on my cursory glance through the source this shouldn't be happening. I also don't know whether only these distance calculations are affected or whether the whole index always uses the Euclidean distance instead of the angular distance.