# KNN From Scratch

This notebook implements the K-Nearest Neighbors (KNN) algorithm from scratch.

In [2]:
# Import Packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

## KNN Theory

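K-Nearest Neighbors (KNN) is a classification model based on the principles of the Bayes classifier: instead of using the true conditional class probabilities, which are unknown, it estimates them from the training data. To predict the class of a data point, we look at the classes of its 'k' nearest neighbors in the training set and choose the most common one. The value of k can be any integer from 1 up to n, the number of training points; with k = n every prediction is simply the overall majority class, so smaller values are used in practice.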
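As a small illustration of the voting idea, the sketch below estimates class probabilities as the fraction of the k neighbors that carry each label, then picks the most frequent one. The helper name `class_probabilities` and the three example labels are made up for illustration; they are not part of the implementation later in this notebook.

```python
from collections import Counter

def class_probabilities(neighbor_labels):
    """Estimate P(class | x) as the share of the k neighbors with each label."""
    k = len(neighbor_labels)
    counts = Counter(neighbor_labels)
    return {label: count / k for label, count in counts.items()}

# Hypothetical labels of the k = 3 nearest neighbors of some query point
probs = class_probabilities(['Red', 'Green', 'Red'])

# The predicted class is the one with the highest estimated probability
prediction = max(probs, key=probs.get)
print(probs, prediction)
```

Here two of the three neighbors are Red, so Red gets an estimated probability of 2/3 and becomes the prediction.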

## KNN Practical

**Steps to implement KNN:**
- Input: Training Data, Testing Data, value of 'k'
- Output: Prediction of class

**Algorithm Pseudo Code:**

- for each test_data_point in the test dataset
    - Compute the Euclidean distance between the test data point and every training data point
    - Sort the training data points by distance
    - Keep the first 'k' points, i.e. the 'k' nearest neighbors
    - Compute the proportion of each class among those 'k' points
    - Assign the class with the highest proportion as the prediction
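The steps above can be sketched for a single test point in a few lines of vectorized NumPy. This is only a minimal illustration under stated assumptions, not the implementation used in this notebook: the function name `knn_predict_one` and the toy arrays `X_demo` and `y_demo` are hypothetical, and ties in the vote are resolved by whichever label `np.unique` lists first.

```python
import numpy as np

def knn_predict_one(X_train, y_train, x_test, k):
    """Predict the class of one test point from its k nearest training points."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    x_test = np.asarray(x_test, dtype=float)

    # Euclidean distance from the test point to every training point
    dists = np.sqrt(((X_train - x_test) ** 2).sum(axis=1))

    # Sort by distance and keep the indices of the k nearest points
    nearest = np.argsort(dists, kind="stable")[:k]

    # Count the classes among the k neighbors and return the most frequent one
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Hypothetical toy data: two points near the origin, two far away
X_demo = [[0, 0], [1, 1], [5, 5], [6, 5]]
y_demo = ['a', 'a', 'b', 'b']
print(knn_predict_one(X_demo, y_demo, [0.5, 0.5], k=3))
```

With k=3 the neighbors of (0.5, 0.5) are the two 'a' points and one 'b' point, so the vote goes to 'a'.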

In [137]:
def EU_dist(X, Y):
    # Euclidean distance between two rows (pandas Series with matching labels)
    diff = X - Y
    diff_sq = diff * diff
    row_sum = diff_sq.values.sum()
    return np.sqrt(row_sum)

def KNN(X_train, X_test, Y_train, k):
    # Read the labels once as an array; the caller's Y_train is never modified,
    # so the labels stay aligned with X_train for every test point
    labels = np.asarray(Y_train)
    prediction = []
    for i in range(X_test.shape[0]):
        row_test = X_test.iloc[i, :]

        # Distance from this test point to every training point
        dist = []
        for j in range(X_train.shape[0]):
            row_train = X_train.iloc[j, :]
            dist.append(EU_dist(row_train, row_test))

        # Positions of the k closest training points
        nearest = np.argsort(dist, kind='stable')[:k]

        # Majority vote among the labels of the k nearest points
        unique, counts = np.unique(labels[nearest], return_counts=True)
        ind = np.argmax(counts)
        prediction.append(unique[ind])

    return prediction

## Testing the outcome

We will first try it out with a simple dataset to see if the model is working.

In [142]:
# Initialize the train and test datasets
d = {'X1': [0,2,0,0,-1,1], 'X2': [3,0,1,1,0,1], 'X3':[0,0,3,2,1,1], 'Y': ['Red','Red','Red','Green','Green','Red']}
train=pd.DataFrame(data=d)
X_train=train.drop('Y',axis=1)
Y_train=train['Y']

d={'X1': [0], 'X2': [0], 'X3':[0]}
test=pd.DataFrame(data=d)

In [143]:
print("Prediction for the Test data with K=1 is : ", KNN(X_train, test, Y_train, 1))

Prediction for the Test data with K=1 is :  ['Green']


In [144]:
print("Prediction for the Test data with K=3 is : ", KNN(X_train, test, Y_train, 3))

Prediction for the Test data with K=3 is :  ['Red']
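To see why the prediction changes between K=1 and K=3, it helps to rank the training points by their distance to the test point (0, 0, 0). The sketch below repeats the training data from the cell above as plain NumPy arrays; the variable names `X`, `y`, `x0`, `dists`, `order` and `ranked_labels` are illustrative only.

```python
import numpy as np

# Same six training points and labels as in the dataset above
X = np.array([[0, 3, 0], [2, 0, 0], [0, 1, 3],
              [0, 1, 2], [-1, 0, 1], [1, 1, 1]], dtype=float)
y = np.array(['Red', 'Red', 'Red', 'Green', 'Green', 'Red'])
x0 = np.zeros(3)  # the single test point

# Euclidean distance from the test point to each training point
dists = np.sqrt(((X - x0) ** 2).sum(axis=1))

# Training points ordered from nearest to farthest
order = np.argsort(dists, kind="stable")
for rank, idx in enumerate(order, start=1):
    print(rank, y[idx], round(float(dists[idx]), 3))

ranked_labels = [str(y[i]) for i in order]
```

The nearest point is Green at distance √2, which explains the K=1 prediction. The next two points are Red, at distances √3 and 2, so with K=3 Red wins the vote two to one.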
