In [1]:
X_train = [[10,   10,  10],
           [0,   0,  0],
           [-1, -1, -1],
           [1, 1, 1]]

y_train = ['chartreuse',
           'white',
           'blue',
           'red']

X_test = [[1.1, 1.1, 1.1]]

def distance(A, B):
    squares = [(a - b) ** 2 for a, b in zip(A, B)]
    return sum(squares) ** 0.5

In [2]:
a1 = [10,   10,  10]
a2 = [-1, -1, -1]
for x, y in (zip(a1,a2)):
    print(x,y)
print(distance(a1,a2))

10 -1
10 -1
10 -1
19.05255888325765


#### **Solution for 1NN - lambda style**

In [3]:
def oneNN(X_train, y_train, X_test):
    for test in X_test:    
        index, point = min(enumerate(X_train),
                           key=lambda ind_point: distance(test, ind_point[1]))
        yield y_train[index]

In words: for every item in X_test, search X_train for the index and value of the entry with the smallest distance to that item. That index then picks out the label (color) of the nearest Wookie.

In [4]:
for result in oneNN(X_train, y_train, X_test):
    print(result)

red


**Same thing as above using list comprehension (More readable):**

In [5]:
def oneNN(X_train, y_train, X_test):
    result = []
    for test in X_test:
        point, index = min([(distance(test,p),i) for i,p in enumerate(X_train)])
        result.append(y_train[index])
    return result

In words: for every item in X_test, compute the distance to each entry in X_train, pair it with that entry's index, and take the smallest pair. That index then picks out the label (color) of the nearest Wookie.

Note that the tuple order here is (distance, index), the opposite of the (index, point) pairs in the previous function, and that the variable named point actually holds the distance. This works because min compares tuples element by element, so the first element, the distance, decides the result (ties fall through to the index). In the previous function, a key function was needed to impose minimization by distance.
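
To make the tuple comparison concrete, here is a small sketch with made-up (distance, index) pairs; the values are illustrative and not taken from the data above. It suggests that plain min on (distance, index) tuples and min with a key function on flipped tuples pick the same neighbor.

```python
# Hypothetical (distance, index) pairs for three training points.
pairs = [(2.5, 0), (0.7, 1), (0.7, 2)]

# Tuples compare element by element: the distance decides first,
# and the index only matters when two distances are equal.
nearest = min(pairs)

# With the pair order flipped to (index, distance), an explicit key is needed.
flipped = [(i, d) for d, i in pairs]
nearest_by_key = min(flipped, key=lambda ind_dist: ind_dist[1])

print(nearest)         # (0.7, 1)
print(nearest_by_key)  # (1, 0.7)
```

Both calls settle on training index 1: the tuple version because index 1 sorts before index 2, the key version because min keeps the first minimal item it meets.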

In [6]:
for result in oneNN(X_train, y_train, X_test):
    print(result)

red


**Expand to KNN. It's just a matter of getting the k closest elements and choosing the most common class among them.**

In [7]:
from collections import Counter
def KNN(X_train, y_train, X_test,k=1):  # Note new parameter k, number of nearest neighbors
    result = []
    for test in X_test:
        s = sorted([(distance(test,p),i) for i,p in enumerate(X_train)])[:k]
        # print(s)
        c = Counter([y_train[i] for _,i in s]).most_common()
        result.append(c[0][0])
    return result

Notes:
1. sorted builds a list of (distance, index) pairs, ordered from smallest to largest distance, for the test item against every item in X_train. The slice operator keeps only the first k pairs (the k nearest neighbors).
2. Counter tallies the labels of those k neighbors, and most_common lists them with their counts from most common to least common. Since the Counter only needs the index of each neighbor, the distances in s are ignored (that's the meaning of the underscore).
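
The two notes above can be traced step by step. This is a minimal sketch that reuses the training data from the first cell and assumes k = 3; the names neighbor_indices, labels and votes are introduced only for illustration.

```python
from collections import Counter

def distance(A, B):
    # Euclidean distance, same as the helper defined earlier
    return sum((a - b) ** 2 for a, b in zip(A, B)) ** 0.5

X_train = [[10, 10, 10], [0, 0, 0], [-1, -1, -1], [1, 1, 1]]
y_train = ['chartreuse', 'white', 'blue', 'red']
test = [1.1, 1.1, 1.1]
k = 3  # assumed value for this walkthrough

# Step 1: sort (distance, index) pairs and keep the k nearest
s = sorted((distance(test, p), i) for i, p in enumerate(X_train))[:k]
neighbor_indices = [i for _, i in s]

# Step 2: look up the labels of those neighbors and count them
labels = [y_train[i] for i in neighbor_indices]
votes = Counter(labels).most_common()

print(neighbor_indices)  # [3, 1, 2]
print(labels)            # ['red', 'white', 'blue']
print(votes[0][0])       # red
```

With three different labels the vote is a three-way tie. On Python 3.7 and later, most_common lists equal counts in first-seen order, so the label of the nearest neighbor wins.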

In [8]:
b1=[3,5,7,9,2]
print(sorted(b1))
print(sorted(b1)[:3])

[2, 3, 5, 7, 9]
[2, 3, 5]


In [9]:
for result in KNN(X_train, y_train, X_test):
    print(result)

red


In [10]:
Counter('abracadabra').most_common()

[('a', 5), ('b', 2), ('r', 2), ('c', 1), ('d', 1)]

In [11]:
Counter('abracadabra').most_common(2)

[('a', 5), ('b', 2)]

## Fancy NumPy/SciPy Version :)

In [12]:
import numpy as np
from scipy.spatial.distance import cdist
from scipy.stats import mode

def KNN_np(X_train, y_train, X_test, k=1):

    dists = cdist(X_train, X_test) # pairwise distances, shape (n_train, n_test)
    # kth=k-1 puts the k smallest distances of each column in the first k rows
    idx = np.argpartition(dists, k - 1, axis=0)[:k]
    k_nearest = y_train[idx] # fancy array indexing, shape (k, n_test)

    return mode(k_nearest, axis=0)[0] # most common label per test point

In [13]:
X_train, y_train = np.array(X_train), np.array(y_train) 
X_test = np.array([[1.1,1.1,1.1],[8,8,8]])

KNN_np(X_train, y_train, X_test, k=2)



array([['red', 'chartreuse']], dtype='<U10')
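
Newer SciPy releases may reject non-numeric input to scipy.stats.mode, and the shape of its result depends on the version, so the call above may not run unchanged on string labels. Below is a minimal NumPy-only sketch of the voting step, assuming k_nearest has shape (k, n_test) as in KNN_np. The helper name vote is hypothetical and not part of the original notebook, and the sample array is illustrative.

```python
import numpy as np

def vote(k_nearest):
    """Return the most common label in each column of a (k, n_test) array."""
    winners = []
    for column in k_nearest.T:
        labels, counts = np.unique(column, return_counts=True)
        # np.unique returns labels sorted, so argmax breaks ties alphabetically
        winners.append(labels[np.argmax(counts)])
    return np.array(winners)

# Illustrative labels of the 2 nearest neighbors for 2 test points
k_nearest = np.array([['red', 'chartreuse'],
                      ['white', 'red']])
print(vote(k_nearest))  # ['red' 'chartreuse']
```

Note that ties are broken alphabetically here, which differs from the pure-Python KNN above, where a tie goes to the label seen first.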