Closed as not planned
Description
Describe the bug
When querying neighbors.NearestNeighbors with a point that is itself in the fitted data (an exact match), kneighbors sometimes returns a distance > 0, although the values I've seen so far have been small (roughly 1e-9 to 1e-8).
At first I thought this was a floating point precision problem, but scipy.spatial.distance, which the documentation for NearestNeighbors implies it uses, never displays this problem in my testing.
Steps/Code to Reproduce
from scipy import spatial
from sklearn import neighbors, metrics
data = [
    [-0.05634, 0.08516, 0.07541],  # this one works
    [0.07924, -0.01755, 0.12372],  # this one doesn't
]

neighborhood = neighbors.NearestNeighbors(metric='euclidean')
neighborhood.fit(data)

# Query each fitted point against the index and compare three distance computations.
for i, entry in enumerate(data):
    distances, indexes = neighborhood.kneighbors(
        [entry],
        n_neighbors=1,
        return_distance=True,
    )
    found_index = indexes[0][0]
    found_distance = distances[0][0]
    print(f'{i}->{found_index}:')
    print(f"\tkneigbors' distance: {found_distance}")

    spacial_distance = spatial.distance.euclidean(entry, data[found_index])
    print(f'\tspacial.distance.euclidean: {spacial_distance}')

    pairwise_distance = metrics.pairwise.euclidean_distances(
        [entry],
        [data[found_index]],
    )
    print(f'\tmetrics.pairwise.euclidean_distances: {pairwise_distance}')
Expected Results
Ideally, distance should always be accurate, i.e. 0 between two identical elements.
If this is infeasible, I would suggest updating the documentation to warn users about this and to encourage them to recalculate the distance with scipy.spatial.distance instead of relying on the returned value when accuracy is important.
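The suggested workaround could look roughly like the sketch below. It takes only the neighbor indexes from kneighbors and recomputes each distance with scipy.spatial.distance.euclidean. The helper name kneighbors_with_recomputed_distances is hypothetical (it is not part of scikit-learn), and the sketch assumes a Euclidean metric and data small enough that a Python loop is acceptable.

```python
import numpy as np
from scipy import spatial
from sklearn import neighbors


def kneighbors_with_recomputed_distances(model, data, queries, n_neighbors=1):
    """Hypothetical helper, not a scikit-learn API.

    Uses kneighbors only to find the neighbor indexes, then recomputes
    every distance directly from the coordinates with SciPy.
    """
    data = np.asarray(data, dtype=float)
    queries = np.asarray(queries, dtype=float)

    # Ask scikit-learn for the indexes only; its distances are discarded.
    indexes = model.kneighbors(
        queries,
        n_neighbors=n_neighbors,
        return_distance=False,
    )

    # Recompute each query-to-neighbor distance from the raw vectors.
    distances = np.empty(indexes.shape, dtype=float)
    for row, (query, neighbor_ids) in enumerate(zip(queries, indexes)):
        for col, neighbor_id in enumerate(neighbor_ids):
            distances[row, col] = spatial.distance.euclidean(
                query, data[neighbor_id]
            )
    return distances, indexes


data = [
    [-0.05634, 0.08516, 0.07541],
    [0.07924, -0.01755, 0.12372],
]
neighborhood = neighbors.NearestNeighbors(metric='euclidean')
neighborhood.fit(data)

distances, indexes = kneighbors_with_recomputed_distances(
    neighborhood, data, data
)
print(distances.ravel(), indexes.ravel())
```

The neighbor order still comes from kneighbors, so with n_neighbors > 1 the recomputed distances may not be perfectly sorted when two neighbors are almost equally far away.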
Actual Results
0->0:
	kneigbors' distance: 0.0
	spacial.distance.euclidean: 0.0
	metrics.pairwise.euclidean_distances: [[0.]]
1->1:
	kneigbors' distance: 1.862645149230957e-09
	spacial.distance.euclidean: 0.0
	metrics.pairwise.euclidean_distances: [[0.]]
Process finished with exit code 0
Versions
System:
  python: 3.8.2 (default, Mar 25 2020, 17:03:02) [GCC 7.3.0]
  executable: /home/andreas/miniconda3/envs/retrieval_eval/bin/python
  machine: Linux-5.15.0-83-generic-x86_64-with-glibc2.10

Python dependencies:
  sklearn: 1.3.0
  pip: 23.2.1
  setuptools: 68.0.0
  numpy: 1.24.3
  scipy: 1.10.1
  Cython: 3.0.2
  pandas: None
  matplotlib: 3.7.3
  joblib: 1.2.0
  threadpoolctl: 2.2.0

Built with OpenMP: True

threadpoolctl info:
  filepath: /home/andreas/miniconda3/envs/retrieval_eval/lib/libmkl_rt.so.1
  prefix: libmkl_rt
  user_api: blas
  internal_api: mkl
  version: 2021.4-Product
  num_threads: 24
  threading_layer: intel

  filepath: /home/andreas/miniconda3/envs/retrieval_eval/lib/libomp.so
  prefix: libomp
  user_api: openmp
  internal_api: openmp
  version: None
  num_threads: 48