# K-Nearest Neighbors in a dataset
- The goal is to find the K data points closest to a new point without the overhead of sorting the entire dataset. `np.argpartition` fits this well: it only partially orders the data, which runs in linear time instead of the O(n log n) of a full sort.
- Imagine you have the coordinates of hundreds of cafes and want to find the 3 closest to your current location. You don't care which of the three is the absolute closest; you just need to identify those three.
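
To see the difference on a small array, here is a sketch comparing a full sort (`np.argsort`) with a partial one (`np.argpartition`). The array `d` is made-up data for illustration only.

```python
import numpy as np

# made-up distances; any 1-D array works
d = np.array([9, 2, 7, 1, 8, 3])
k = 3

# full sort: every index ends up in order of its value
full = np.argsort(d)

# partial sort: only guarantees that the k smallest values come first, in no fixed order
part = np.argpartition(d, k)

print(full[:k])                   # [3 1 5]
print(sorted(part[:k].tolist()))  # [1, 3, 5]
```

Both approaches select the same three indices; only the full sort also spends effort ordering them and the rest of the array.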

```python
import numpy as np
```

```python
# create random (x, y) coordinates for 20 cafes on a 100 x 100 grid
np.random.seed(42)
cafes = np.random.randint(0, 100, size=(20, 2))
print(cafes)
```

```
[[51 92]
 [14 71]
 [60 20]
 [82 86]
 [74 74]
 [87 99]
 [23  2]
 [21 52]
 [ 1 87]
 [29 37]
 [ 1 63]
 [59 20]
 [32 75]
 [57 21]
 [88 48]
 [90 58]
 [41 91]
 [59 79]
 [14 61]
 [61 46]]
```


```python
# current location
my_location = np.array([50, 50])
```
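
Subtracting `my_location` (shape `(2,)`) from `cafes` (shape `(20, 2)`) relies on NumPy broadcasting: the single point is applied to every row. A minimal sketch with made-up points:

```python
import numpy as np

points = np.array([[1, 2], [3, 4], [5, 6]])  # shape (3, 2)
origin = np.array([1, 1])                    # shape (2,)

# origin is broadcast across every row, so no explicit loop or tiling is needed
diff = points - origin
print(diff.shape)  # (3, 2)
print(diff)
```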

```python
# Step 1: calculate the distance to every cafe.
# This is the *squared* Euclidean distance: the square root is skipped
# because it does not change which cafes are closest.
distances_sq = np.sum((cafes - my_location)**2, axis=1)
print(distances_sq)
```

```
[1765 1737 1000 2320 1152 3770 3033  845 3770  610 2570  981  949  890
 1448 1664 1762  922 1417  137]
```
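
Skipping the square root is safe because the square root is monotonic on non-negative numbers, so it never changes the ranking. A quick check on made-up random points, comparing squared distances with `np.linalg.norm`:

```python
import numpy as np

rng = np.random.default_rng(0)
points = rng.integers(0, 100, size=(50, 2))  # made-up data
query = np.array([50, 50])

sq = np.sum((points - query) ** 2, axis=1)     # squared Euclidean distance
true = np.linalg.norm(points - query, axis=1)  # actual Euclidean distance

# a stable sort breaks ties the same way, so identical rankings give identical index arrays
same_order = np.array_equal(np.argsort(sq, kind="stable"),
                            np.argsort(true, kind="stable"))
print(same_order)
```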


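```python
# Step 2: find the k closest cafes.
# argpartition places the index that a full sort would put at position k
# at position k, with all smaller distances before it (in no guaranteed order).
k = 3
closest_indices = np.argpartition(distances_sq, k)
print(closest_indices)
```

```
[19  9  7 13 17 12 11  2  4 18 14 15  1 16  0  3 10  6  8  5]
```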
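
The `kth` argument must be a valid index into the array. Passing `k` works above because 3 < 20, but it fails when k equals the number of points. Passing `k - 1` is a common alternative that still leaves the k smallest values in the first k slots. A sketch on made-up data:

```python
import numpy as np

d = np.array([40, 10, 30, 20])  # made-up distances
k = 4                           # ask for all four points

# kth=k is out of bounds when k == len(d)
failed = False
try:
    np.argpartition(d, k)
except ValueError:
    failed = True
print("kth=k failed:", failed)

# kth=k-1 puts the k-th smallest value at position k-1, with the smaller ones before it
idx = np.argpartition(d, k - 1)[:k]
print(sorted(d[idx].tolist()))  # [10, 20, 30, 40]
```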


```python
# keep only the first k indices: these belong to the k closest cafes
top_K_indices = closest_indices[:k]
print(top_K_indices)
```

```
[19  9  7]
```
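
`argpartition` does not order the first k entries, so the closest cafe is not guaranteed to come first. If a ranking is needed, sorting only the k candidates is enough, which costs O(k log k). A sketch using the squared distances printed above:

```python
import numpy as np

# squared distances copied from the output above
distances_sq = np.array([1765, 1737, 1000, 2320, 1152, 3770, 3033, 845, 3770, 610,
                         2570, 981, 949, 890, 1448, 1664, 1762, 922, 1417, 137])
k = 3

top_k = np.argpartition(distances_sq, k)[:k]     # k closest, unordered
ranked = top_k[np.argsort(distances_sq[top_k])]  # order only those k
print(ranked)  # [19  9  7]
```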


```python
cafes[top_K_indices]
```

```
array([[61, 46],
       [29, 37],
       [21, 52]], dtype=int32)
```
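
The steps above can be wrapped into a small helper. This is a minimal sketch, not a definitive implementation: the name `k_nearest` is hypothetical, and it assumes the points are rows of a 2-D array and `1 <= k <= len(points)`.

```python
import numpy as np


def k_nearest(points: np.ndarray, query: np.ndarray, k: int) -> np.ndarray:
    """Return indices of the k rows of `points` closest to `query`, nearest first.

    Hypothetical helper; assumes 1 <= k <= len(points).
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    dist_sq = np.sum((points - query) ** 2, axis=1)  # squared Euclidean distance
    idx = np.argpartition(dist_sq, k - 1)[:k]        # k smallest, unordered
    return idx[np.argsort(dist_sq[idx])]             # rank only those k


np.random.seed(42)
cafes = np.random.randint(0, 100, size=(20, 2))
nearest = k_nearest(cafes, np.array([50, 50]), 3)
print(nearest)  # [19  9  7]
print(cafes[nearest])
```

Using `k - 1` as `kth` lets the helper accept `k == len(points)`, and the final `argsort` touches only k elements, so the overall cost stays close to linear for small k.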