### KNN - a brief summary

K-nearest neighbors (KNN) is a simple modeling method, used here for classification. To label a point, it finds the K training points closest to it (usually by Euclidean distance) and assigns the class that is most common among those neighbors.
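Here is a minimal from-scratch sketch of that idea. It assumes Euclidean distance and an unweighted majority vote, which are the same defaults scikit-learn's `KNeighborsClassifier` uses. The function name `knn_predict` and the toy data are hypothetical and exist only for illustration.

```python
import numpy as np
from collections import Counter


def knn_predict(train_points, train_labels, queries, k=5):
    """Label each query row by majority vote among its k nearest training
    points (Euclidean distance). A teaching sketch, not an optimized version."""
    predictions = []
    for q in queries:
        # Euclidean distance from this query to every training point
        dists = np.linalg.norm(train_points - q, axis=1)
        # Indices of the k closest training points
        nearest = np.argsort(dists)[:k]
        # Majority vote; on a tie, Counter keeps the label seen first (the closer one)
        votes = Counter(train_labels[nearest].tolist())
        predictions.append(votes.most_common(1)[0][0])
    return np.array(predictions)


# Hypothetical 1-D data: class 0 clustered near 1, class 1 clustered near 11
train_points = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])
train_labels = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(train_points, train_labels, np.array([[1.5], [9.0]]), k=3))  # [0 1]
```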

K is the number of neighbors that vote on each classification, and the user chooses it based on the behavior they want from the model. Higher values of K average over more points and produce a smoother model with higher bias and lower variance. Lower values of K follow the training data more closely and produce a model with lower bias and higher variance.
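The trade-off is easiest to see at the two extremes. The sketch below uses a synthetic dataset from scikit-learn's `make_classification`, and its parameters are arbitrary choices for illustration. With K = 1, each training point is its own nearest neighbor, so training accuracy is perfect (assuming no duplicate points) and the model follows every noisy label. When K equals the number of training points, every prediction is the same class. Cross-validated accuracy for the values in between shows where a middle ground lies. Those printed scores depend on the data, so no values are claimed here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic, noisy two-class data (parameters chosen only for illustration)
X_syn, y_syn = make_classification(n_samples=200, n_features=2, n_informative=2,
                                   n_redundant=0, flip_y=0.1, random_state=0)

# K = 1: each training point is its own nearest neighbor, so the model
# reproduces every training label, noise included (low bias, high variance)
knn_1 = KNeighborsClassifier(n_neighbors=1).fit(X_syn, y_syn)
train_acc_1 = knn_1.score(X_syn, y_syn)

# K = number of training points: every query sees the whole training set,
# so every prediction is the same class, the overall majority (high bias, low variance)
knn_all = KNeighborsClassifier(n_neighbors=len(X_syn)).fit(X_syn, y_syn)
preds_all = knn_all.predict(X_syn)

print(f"K=1 training accuracy: {train_acc_1:.3f}")
print(f"K={len(X_syn)} distinct predictions: {np.unique(preds_all)}")

# Cross-validated accuracy for a few K values between the two extremes
for k in (1, 5, 15, 51):
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X_syn, y_syn, cv=5)
    print(f"K={k:>2}: mean 5-fold CV accuracy = {scores.mean():.3f}")
```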

The code below walks through a quick example of K-nearest neighbors using scikit-learn.


```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix

# Seed NumPy's global random number generator so the train/test split is repeatable
np.random.seed(0)
```

```python
# Generate a toy dataset: one feature with values 0 to 99,
# labeled 0 for the first 50 points and 1 for the last 50
x = np.arange(100).reshape(-1, 1)
y = np.repeat([0, 1], 50)
```

```python
# Split the data into training and test sets (by default, 25% is held out for testing)
X_train, X_test, y_train, y_test = train_test_split(x, y)
```
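The default split is purely random, so the test set may end up with more points from one class than the other. The sketch below assumes you want the test set to keep the 50/50 class ratio of the data. It uses the standard `stratify` and `random_state` parameters of `train_test_split` and gives the variables new names so it does not overwrite the notebook's split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Same toy data as in the notebook: 50 points of class 0, then 50 of class 1
x_demo = np.arange(100).reshape(-1, 1)
y_demo = np.repeat([0, 1], 50)

# stratify keeps the class proportions of y_demo in both splits;
# random_state fixes the shuffle so the split is reproducible
X_tr, X_te, y_tr, y_te = train_test_split(
    x_demo, y_demo, test_size=0.25, stratify=y_demo, random_state=0
)

# 25 test points, split as evenly as possible between the two classes
print(len(X_te), np.bincount(y_te))
```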

```python
# Standardize the feature. KNN classifies by distance, so features measured
# on larger scales would otherwise dominate the neighbor search.
ss = StandardScaler()

# Fit the scaler on the training set only, then apply the same transformation
# to both sets. transform() returns a new array, so the result must be assigned.
X_train = ss.fit_transform(X_train)
X_test = ss.transform(X_test)
```
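With a single feature, as in this notebook, standardizing does not change which points are nearest, because every distance is divided by the same constant. Scaling matters once features have different ranges. The sketch below uses two hypothetical training points whose second feature spans a range 100 times wider than the first. It shows a 1-nearest-neighbor prediction flipping once both features are standardized.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Hypothetical data: feature 1 spans a range 100x wider than feature 0
X_toy = np.array([[-1.0, -100.0],
                  [ 1.0,  100.0]])
y_toy = np.array([0, 1])
query = np.array([[0.9, -50.0]])

# Unscaled: distances are about 50.04 (point 0) vs 150.00 (point 1),
# dominated by feature 1, so the nearest neighbor is point 0
raw_knn = KNeighborsClassifier(n_neighbors=1).fit(X_toy, y_toy)
raw_pred = raw_knn.predict(query)

# Scaled (mean 0, std 1 per feature): distances are about 1.96 vs 1.50,
# so both features count and the nearest neighbor becomes point 1
scaler = StandardScaler().fit(X_toy)
scaled_knn = KNeighborsClassifier(n_neighbors=1).fit(scaler.transform(X_toy), y_toy)
scaled_pred = scaled_knn.predict(scaler.transform(query))

print(raw_pred, scaled_pred)  # [0] [1]
```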

```python
# Fit a KNN classifier to the training data (defaults: 5 neighbors, Euclidean distance)
knn = KNeighborsClassifier()
knn.fit(X_train, y_train)
```

```
KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
                     metric_params=None, n_jobs=None, n_neighbors=5, p=2,
                     weights='uniform')
```
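The printed estimator lists the parameters the model was fit with. `n_neighbors=5` is K, `metric='minkowski'` with `p=2` is ordinary Euclidean distance, and `weights='uniform'` gives each neighbor an equal vote. The sketch below uses made-up data to show how switching to `weights='distance'` can change a prediction. That option is also built into scikit-learn and weights each vote by the inverse of the neighbor's distance.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical 1-D data: one class-0 point near the query, two class-1 points farther away
points = np.array([[0.0], [2.0], [3.0]])
labels = np.array([0, 1, 1])
query = np.array([[0.1]])

# weights='uniform' (the default): each of the 3 neighbors gets one vote, so class 1 wins 2 to 1
uniform = KNeighborsClassifier(n_neighbors=3, weights="uniform").fit(points, labels)

# weights='distance': votes are weighted by 1/distance, so the close class-0 point
# (1/0.1 = 10) outweighs the two class-1 points (1/1.9 + 1/2.9, about 0.87)
weighted = KNeighborsClassifier(n_neighbors=3, weights="distance").fit(points, labels)

print(uniform.predict(query), weighted.predict(query))  # [1] [0]
```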

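```python
# confusion_matrix expects (y_true, y_pred): rows are the true classes and
# columns are the predicted classes. Exact counts depend on the random split.
confusion_matrix(y_test, knn.predict(X_test))
```

```
array([[ 9,  0],
       [ 1, 15]])
```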

### Simple confusion matrix

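In this example there are two classes (0 and 1), so the confusion matrix is 2×2 and has four cells. scikit-learn's `confusion_matrix(y_true, y_pred)` puts the true class on the rows and the predicted class on the columns. Entry (i, j) therefore counts the test points whose true class is i and whose predicted class is j:

- (0, 0): the true value is 0 and a 0 was predicted (true negative, treating 1 as the positive class)
- (0, 1): the true value is 0 and a 1 was predicted (false positive)
- (1, 0): the true value is 1 and a 0 was predicted (false negative)
- (1, 1): the true value is 1 and a 1 was predicted (true positive)

The diagonal holds the correct predictions, and the off-diagonal cells are the errors.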
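The four cells are the raw material for the usual classification metrics. The sketch below uses made-up labels and treats class 1 as positive. It unpacks the cells of a binary confusion matrix and computes accuracy, precision, and recall from them.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical true labels and predictions for seven test points
y_true = np.array([0, 0, 0, 1, 1, 1, 1])
y_pred = np.array([0, 1, 1, 1, 1, 0, 1])

cm = confusion_matrix(y_true, y_pred)  # rows: true class, columns: predicted class
tn, fp, fn, tp = cm.ravel()            # unpack the four cells in row-major order

accuracy = (tp + tn) / cm.sum()  # share of all predictions that are correct
precision = tp / (tp + fp)       # of the points predicted 1, the share that really are 1
recall = tp / (tp + fn)          # of the points that really are 1, the share predicted 1

print(cm)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
```

Here the matrix is `[[1, 2], [1, 3]]`, which gives accuracy 4/7 ≈ 0.571, precision 3/5 = 0.6, and recall 3/4 = 0.75.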