
NearestNeighbors.kneighbors returns inaccurate distance #27531

@andreaswimmer

Describe the bug

When neighbors.NearestNeighbors finds an exact match for a query point, kneighbors sometimes returns a distance greater than 0. The values I have seen so far are small, roughly 1e-9 to 1e-8.

At first I thought this was a floating-point precision problem, but scipy.spatial.distance, which the NearestNeighbors documentation implies it uses, never shows this problem in my testing.

Steps/Code to Reproduce

from scipy import spatial
from sklearn import neighbors, metrics

data = [
    [-0.05634, 0.08516, 0.07541],  # this one works
    [0.07924, -0.01755, 0.12372],  # this one doesn't
]
neighborhood = neighbors.NearestNeighbors(metric='euclidean')
neighborhood.fit(data)

for i, entry in enumerate(data):
    distances, indexes = neighborhood.kneighbors(
        [entry],
        n_neighbors=1,
        return_distance=True,
    )
    found_index = indexes[0][0]
    found_distance = distances[0][0]

    print(f'{i}->{found_index}:')
    print(f"\tkneigbors' distance: {found_distance}")

    # Reference value computed directly with SciPy for the same pair of points
    spatial_distance = spatial.distance.euclidean(entry, data[found_index])
    print(f'\tspacial.distance.euclidean: {spatial_distance}')

    pairwise_distance = metrics.pairwise.euclidean_distances(
        [entry],
        [data[found_index]],
    )
    print(f'\tmetrics.pairwise.euclidean_distances: {pairwise_distance}')

Expected Results

Ideally, distance should always be accurate, i.e. 0 between two identical elements.

If this is infeasible, I would suggest updating the documentation to warn users about this and to encourage them to recalculate the distance with scipy.spatial.distance, rather than rely on the returned value, when accuracy is important.
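The recalculation suggested above could look roughly like the following sketch. It is not part of scikit-learn: recompute_euclidean is a hypothetical helper name, and the indices are hard-coded to the values kneighbors reported for this data (each point finds itself) so that the snippet depends only on NumPy. It recomputes each distance from explicit coordinate differences, which gives exactly 0 for identical points.

```python
import numpy as np


def recompute_euclidean(queries, data, indexes):
    """Recompute Euclidean distances from explicit coordinate differences.

    `indexes` is assumed to have the shape returned by kneighbors:
    (n_queries, n_neighbors). This helper is hypothetical, not a
    scikit-learn function.
    """
    queries = np.asarray(queries, dtype=float)
    data = np.asarray(data, dtype=float)
    indexes = np.asarray(indexes)
    # data[indexes] has shape (n_queries, n_neighbors, n_features);
    # broadcasting subtracts each query from its own neighbors.
    diffs = data[indexes] - queries[:, np.newaxis, :]
    return np.sqrt((diffs ** 2).sum(axis=-1))


data = [
    [-0.05634, 0.08516, 0.07541],
    [0.07924, -0.01755, 0.12372],
]
# Indices as reported by kneighbors in this issue (0->0 and 1->1).
indexes = [[0], [1]]

exact = recompute_euclidean(data, data, indexes)
print(exact)
```

In real use, indexes would come from neighborhood.kneighbors(..., return_distance=False), and the recomputed array would replace the distances returned by kneighbors.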

Actual Results

0->0:
	kneigbors' distance: 0.0
	spacial.distance.euclidean: 0.0
	metrics.pairwise.euclidean_distances: [[0.]]
1->1:
	kneigbors' distance: 1.862645149230957e-09
	spacial.distance.euclidean: 0.0
	metrics.pairwise.euclidean_distances: [[0.]]

Versions

System:
    python: 3.8.2 (default, Mar 25 2020, 17:03:02)  [GCC 7.3.0]
executable: /home/andreas/miniconda3/envs/retrieval_eval/bin/python
   machine: Linux-5.15.0-83-generic-x86_64-with-glibc2.10
Python dependencies:
      sklearn: 1.3.0
          pip: 23.2.1
   setuptools: 68.0.0
        numpy: 1.24.3
        scipy: 1.10.1
       Cython: 3.0.2
       pandas: None
   matplotlib: 3.7.3
       joblib: 1.2.0
threadpoolctl: 2.2.0
Built with OpenMP: True
threadpoolctl info:
       filepath: /home/andreas/miniconda3/envs/retrieval_eval/lib/libmkl_rt.so.1
         prefix: libmkl_rt
       user_api: blas
   internal_api: mkl
        version: 2021.4-Product
    num_threads: 24
threading_layer: intel
       filepath: /home/andreas/miniconda3/envs/retrieval_eval/lib/libomp.so
         prefix: libomp
       user_api: openmp
   internal_api: openmp
        version: None
    num_threads: 48
