In this tutorial, I am going to code the KNN algorithm from scratch. Many thanks to http://www.jiaaro.com/KNN-for-humans/ for a really simple and clear blog post. The steps are:
- Get your data
- Standardize your features (I won't do it for this example, but the algorithm usually performs better with standardization)
- Create a function for the distance measure you want to use
- Find the closest neighbours of every observation that you want to predict (you will see it's easier than it seems)
- Make a vote
- Predict


In [163]:
import numpy as np

# Test data from a fruit list
# example: the fruit we want to predict, as [weight, color]
unknown_fruit = [373, 1]

# training dataset as a list of rows
dataset = [
  # weight, color, type
  [303, 3, "banana"],
  [370, 1, "apple"],
  [298, 3, "banana"],
  [277, 3, "banana"],
  [377, 4, "apple"],
  [299, 3, "banana"],
  [382, 1, "apple"],
  [374, 4, "apple"],
  [303, 4, "banana"],
  [309, 3, "banana"],
  [359, 1, "apple"],
  [366, 1, "apple"],
  [311, 3, "banana"],
  [302, 3, "banana"],
  [373, 4, "apple"],
  [305, 3, "banana"],
  [371, 3, "apple"],
]
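# convert the list into an np.array
# (mixing numbers and strings turns every element into a string,
# so the features are cast back to float later)
dataset = np.array(dataset)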

Standardizing would give every feature (input, X) zero mean and unit variance. As noted in the steps, this example skips it.
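
A minimal sketch of what that standardization step could look like, assuming the features are the weight and color columns of the fruit table (only a few rows are repeated here so the snippet runs on its own). The name `standardize` is illustrative and not part of the original notebook.

```python
import numpy as np

def standardize(features):
    """Rescale each column to zero mean and unit variance (z-score)."""
    features = np.asarray(features, dtype=float)
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    # guard against constant columns to avoid dividing by zero
    std[std == 0] = 1.0
    return (features - mean) / std

# a few (weight, color) rows taken from the fruit table
sample_features = [[303, 3], [370, 1], [298, 3], [377, 4]]
scaled = standardize(sample_features)
print(scaled.mean(axis=0))  # close to 0 for each column
print(scaled.std(axis=0))   # close to 1 for each column
```

The mean and standard deviation computed on the training rows would also have to be applied to `unknown_fruit` before measuring any distance.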

We will use the Euclidean distance measure

In [88]:
def eucledienne_distance(fruit1, fruit2):
    """
    The args are iterables of the values in the table. 
    for example the args should look something like this:
    
    #         weight,  color
    fruit1 = [303,     3]  # Banana from the data set
    fruit2 = [373,     1]  # the unclassified fruit
    """
    
    # first let's get the distance of each parameter
    a = fruit1[0] - fruit2[0]
    b = fruit1[1] - fruit2[1]
    
    # the distance from point A (fruit1) to point B (fruit2)
    distance = (a**2 + b**2) **0.5
    
    return distance
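
The function above is written for exactly two features. As a hedged sketch, the same formula can be generalized to any number of features with NumPy. The name `euclidean_distance_nd` is an assumption for illustration, not something the notebook defines.

```python
import numpy as np

def euclidean_distance_nd(point_a, point_b):
    """Euclidean distance between two points with any number of features."""
    a = np.asarray(point_a, dtype=float)
    b = np.asarray(point_b, dtype=float)
    # square the per-feature differences, sum them, take the square root
    return float(np.sqrt(np.sum((a - b) ** 2)))

# an apple from the data set against the unclassified fruit
print(euclidean_distance_nd([370, 1], [373, 1]))  # 3.0
```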

Find the closest neighbours of every observation that you want to predict
(you will see it's easier than it seems)

In [184]:
from operator import itemgetter
import numpy as np
import pandas as pd

# split the table into numeric features and class labels
features = dataset[:, 0:2]
features = features.astype(float)  # np.float was removed in NumPy 1.24; the builtin float is equivalent
predictor = dataset[:, 2]
distances = [None] * len(features)  # placeholder list, one slot per training row

# compute the distance between every training row and the fruit to predict.
for row in range(len(features)):
    distances[row] = eucledienne_distance(features[row], unknown_fruit)

# pair each label with its distance and sort by distance, nearest first
# we can clearly see that apple will be selected with an appropriate k
# because the apples have the lowest distances.

neightbor_distance = zip(predictor, distances)
sorted_dataset = sorted(neightbor_distance,
                        key=itemgetter(1))  # sort on the second element (distance), ascending
sorted_dataset

[('apple', 2.8284271247461903),
 ('apple', 3.0),
 ('apple', 3.0),
 ('apple', 3.1622776601683795),
 ('apple', 5.0),
 ('apple', 7.0),
 ('apple', 9.0),
 ('apple', 14.0),
 ('banana', 62.032249677083293),
 ('banana', 64.031242374328485),
 ('banana', 68.029405406779802),
 ('banana', 70.028565600046392),
 ('banana', 70.064256222413434),
 ('banana', 71.028163428319047),
 ('banana', 74.027022093286988),
 ('banana', 75.026661927610775),
 ('banana', 96.020831073262428)]
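
The loop over the training rows can also be written without an explicit loop. This is a sketch of one possible vectorized version, assuming NumPy broadcasting and `np.argsort`. The variable names and the five-row subset of the fruit table are chosen here for illustration only.

```python
import numpy as np

# a few (weight, color) rows and their labels taken from the fruit table
train_features = np.array([[303, 3], [370, 1], [298, 3], [377, 4], [371, 3]], dtype=float)
train_labels = np.array(["banana", "apple", "banana", "apple", "apple"])
query = np.array([373, 1], dtype=float)

# broadcasting subtracts the query from every training row at once
distances = np.sqrt(((train_features - query) ** 2).sum(axis=1))

# argsort returns the row indices ordered from nearest to farthest
order = np.argsort(distances)
for label, distance in zip(train_labels[order], distances[order]):
    print(label, round(float(distance), 3))
```

Sorting indices rather than the rows themselves keeps the labels and distances aligned without zipping them together first.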

Make a vote among the k nearest neighbours of each observation to predict

In [182]:
# from the python std library
from collections import Counter
from operator import itemgetter

k = 3  # number of neighbours that take part in the vote

# take only the K nearest items: which 3 training fruits
# are closest to the fruit we want to predict?
# (re-sorting by distance guarantees the nearest rows come first)
top_k = sorted(sorted_dataset, key=itemgetter(1))[:k]
class_counts = Counter(fruit for (fruit, distance) in top_k)

# class_counts now looks like this:
# Counter({'apple': 3})
# the three training rows closest to the unknown fruit are all apples.

# get the class (apple or banana) with the most votes
classification = max(class_counts, key=lambda cls: class_counts[cls])

# There you have it! the prediction is:
classification

'apple'
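
Putting the distance, sort and vote steps together, the whole procedure fits in one small function. This is a sketch under the assumption that each training row is a list of numeric features followed by its label. `knn_predict` is a hypothetical helper name, and only five rows of the fruit table are repeated so the snippet is self-contained.

```python
from collections import Counter
import math

def knn_predict(train_rows, query, k=3):
    """Predict a label by majority vote among the k nearest training rows.

    train_rows: list of [feature_1, ..., feature_n, label]
    query: list of n feature values
    """
    scored = []
    for *features, label in train_rows:
        # Euclidean distance between this training row and the query
        distance = math.sqrt(sum((f - q) ** 2 for f, q in zip(features, query)))
        scored.append((distance, label))
    # nearest rows first, then count the labels of the first k
    scored.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]

fruits = [
    [303, 3, "banana"],
    [370, 1, "apple"],
    [298, 3, "banana"],
    [377, 4, "apple"],
    [371, 3, "apple"],
]
print(knn_predict(fruits, [373, 1], k=3))  # apple
```

An odd `k` avoids ties when there are only two classes, as is the case here.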

In [36]:
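# alternative implementation proposed in the book Python Data Science Handbook: a few lines of NumPy
import numpy as np
rand = np.random.RandomState(42)
# note: X is drawn from the global generator, not from the seeded `rand`,
# so the printed values change on every run; rand.rand(10, 2) would make them reproducible
X = np.random.rand(10,2)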

print('X')
print(X)
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn; seaborn.set()

# three liner KNN
dist_sq = np.sum((X[:,np.newaxis,:] - X[np.newaxis, :, :]) ** 2, axis=-1) # compute the square distance matrix
K = 2
nearest_partition = np.argpartition(dist_sq, K + 1, axis = 1)

# np.newaxis adds a new axis of length 1 to the array.
# the 3D shape reads (depth, row, column)
print(np.shape(X))
print(np.shape(X[:,np.newaxis,:]))
print(np.shape(X[np.newaxis, :, :]))

# compute the differences for all pairs of points
# X[:, np.newaxis, :] has shape (10, 1, 2) and X[np.newaxis, :, :] has shape (1, 10, 2);
# broadcasting them gives shape (10, 10, 2), where differences[i, j] = X[i] - X[j]
differences = X[:,np.newaxis, :] - X[np.newaxis, :, :]

print('sq differences')

# compute the square difference
sq_differences = differences ** 2
print(sq_differences)

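# sum the squared coordinate differences
# axis may be negative, in which case it counts from the last to the first axis.
dist_sq = sq_differences.sum(-1)
print(dist_sq.shape)
print(dist_sq)
# `nearest` was never defined in this cell; sorting each row by distance
# (as the book does with np.argsort) is assumed here
nearest = np.argsort(dist_sq, axis=1)
print(nearest)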

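# partition each row so that the K + 1 smallest squared distances come first
# (each point is its own nearest neighbour, hence K + 1)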
K = 2
nearest_partition = np.argpartition(dist_sq, K + 1, axis = 1)
print('nearest_partition')
print(nearest_partition)

X
[[ 0.18779781  0.10550066]
 [ 0.39726865  0.70194293]
 [ 0.63269774  0.68142554]
 [ 0.56107598  0.4760867 ]
 [ 0.91445571  0.99576099]
 [ 0.60566823  0.74802971]
 [ 0.61898564  0.28289108]
 [ 0.57422117  0.27357343]
 [ 0.72589989  0.48741596]
 [ 0.64548204  0.79811172]]
(10, 2)
(10, 1, 2)
(1, 10, 2)
sq differences
[[[  0.00000000e+00   0.00000000e+00]
  [  4.38780348e-02   3.55743379e-01]
  [  1.97935952e-01   3.31689468e-01]
  [  1.39336593e-01   1.37334009e-01]
  [  5.28031709e-01   7.92563446e-01]
  [  1.74615688e-01   4.12843578e-01]
  [  1.85922951e-01   3.14673624e-02]
  [  1.49323015e-01   2.82484564e-02]
  [  2.89553850e-01   1.45859294e-01]
  [  2.09474858e-01   4.79710077e-01]]

 [[  4.38780348e-02   3.55743379e-01]
  [  0.00000000e+00   0.00000000e+00]
  [  5.54268565e-02   4.20963190e-04]
  [  2.68328403e-02   5.10110379e-02]
  [  2.67482454e-01   8.63290508e-02]
  [  4.34303831e-02   2.12399138e-03]
  [  4.91584247e-02   1.75604448e-01]
  [  3.13121933e-02   1.83500426e-