data source: https://www.kaggle.com/rakeshrau/social-network-ads

### K Nearest Neighbors

- classification/regression algorithm
- supervised learning
- nonparametric: does not make any assumptions about the underlying data distribution
- instance-based / lazy algorithm: it only memorizes the training instances and defers all computation to prediction time

#### How does it work?
To classify an unlabeled object, its distance to every labeled object is computed, its k nearest neighbors are identified, and the majority class label among those neighbors is assigned to the object. For real-valued input variables, the most popular distance measure is Euclidean distance. Other options include Hamming distance, Manhattan distance, and Minkowski distance, which generalizes both Manhattan (p = 1) and Euclidean (p = 2).
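
The steps above can be sketched in a few lines of plain Python. This is only an illustration, not how scikit-learn implements it: the function names `minkowski_distance` and `knn_predict` and the toy data are made up for this example.

```python
from collections import Counter

def minkowski_distance(a, b, p=2):
    """Minkowski distance; p=1 gives Manhattan, p=2 gives Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

def knn_predict(train_points, train_labels, query, k=3, p=2):
    """Return the majority label among the k training points nearest to query."""
    # Step 1: distance from the query to every labeled point
    distances = [(minkowski_distance(point, query, p), label)
                 for point, label in zip(train_points, train_labels)]
    # Step 2: keep the k closest points
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Step 3: majority vote (ties go to the label seen first among the neighbors)
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy data, made up for illustration: two well-separated groups
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
labels = [0, 0, 0, 1, 1, 1]

prediction_near_origin = knn_predict(points, labels, (1.2, 1.4), k=3)
prediction_far = knn_predict(points, labels, (6.4, 6.4), k=3)
print(prediction_near_origin, prediction_far)
```

Note that there is no training step: all the work happens at prediction time, which is what "lazy" refers to.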

#### How to determine the value of K?
Tricky! A small k means that noise has a larger influence on the result, while a large k makes prediction more computationally expensive and can blur the boundary between classes. The right value depends a lot on the individual case. Often the most practical solution is to try a range of k values and pick the one that performs best on held-out data.
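
One way to run through candidate values of k is leave-one-out validation: predict each point from all the others and count how often the prediction is right. The sketch below uses plain Python and a made-up toy dataset with one deliberately mislabeled point. The helper names `knn_predict` and `loo_accuracy` are hypothetical, and on real data a cross-validation utility from a library would be the more usual choice.

```python
from collections import Counter

def knn_predict(train, query, k):
    """train is a list of (point, label) pairs; return the majority label of the k nearest."""
    nearest = sorted(
        train,
        # Squared Euclidean distance is enough for ranking neighbors
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def loo_accuracy(data, k):
    """Leave-one-out accuracy: predict each point from all the other points."""
    hits = 0
    for i, (point, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += knn_predict(rest, point, k) == label
    return hits / len(data)

# Toy data, made up for illustration: two clusters plus one noisy point
data = [((1.0, 1.0), 0), ((1.2, 0.8), 0), ((0.7, 1.3), 0), ((1.5, 1.5), 0),
        ((5.0, 5.0), 1), ((5.2, 4.8), 1), ((4.8, 5.3), 1), ((5.5, 5.5), 1),
        ((1.1, 1.1), 1)]  # noise: sits inside cluster 0 but is labeled 1

# Odd values of k avoid tied votes in a two-class problem
scores = {k: loo_accuracy(data, k) for k in (1, 3, 5, 7)}
best_k = max(scores, key=scores.get)  # the first k reaching the top score wins ties
print(scores, best_k)
```

On this toy set the smallest k is misled by the noisy point and the largest k lets the far cluster outvote the near one, so the middle values score best.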

In [4]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [13]:
data = pd.read_csv('../Social_Network_Ads.csv')

In [14]:
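# Feature matrix: the columns at positions 2 and 3
X = data.iloc[:, [2, 3]].values
# Target as a 1-D array; a column vector (iloc[:, [4]]) triggers a
# DataConversionWarning when the classifier is fitted
y = data.iloc[:, 4].values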

In [5]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, 
                                                   test_size = 0.25,
                                                   random_state = 25)

In [6]:
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
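# Learn the mean and standard deviation from the training set only
X_train = sc.fit_transform(X_train)
# Reuse the training statistics on the test set; refitting here would leak test data
X_test = sc.transform(X_test)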



In [7]:
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors = 5, metric = 'minkowski', p = 2)
knn.fit(X_train, y_train)

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=1, n_neighbors=5, p=2,
           weights='uniform')

In [8]:
y_pred = knn.predict(X_test)

In [11]:
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)

In [12]:
print(cm)

[[59  7]
 [ 1 33]]
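
To read the matrix: scikit-learn puts the true labels in rows and the predicted labels in columns. The sketch below takes the printed values at face value and derives the usual summary metrics, assuming label 1 is the positive class. That assumption is not stated in the notebook.

```python
# Confusion matrix as printed above (rows: true labels, columns: predicted labels)
cm = [[59, 7],
      [1, 33]]

# Assumption: label 0 is the negative class and label 1 the positive class
tn, fp = cm[0]
fn, tp = cm[1]

total = tn + fp + fn + tp
accuracy = (tp + tn) / total   # share of all test points classified correctly
precision = tp / (tp + fp)     # of the predicted positives, how many were right
recall = tp / (tp + fn)        # of the actual positives, how many were found

print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
```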
