# KNN Implementation without a Built-in Library

**KNN is a supervised learning algorithm that estimates how likely a data point (instance) is to belong to one class or another, depending on which classes its k nearest instances belong to.**

Steps to implement KNN:
1. Calculate the distance (Euclidean, Manhattan, etc.) between the test data point and every training data point. This shows how close each training point is to the test point.

2. Sort the distances and pick the K smallest (the first K entries). These are the K closest neighbors of the test data point.

3. Get the labels of the selected K neighbors. The most common label (the majority vote) is the predicted label for the test data point.
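
The three steps can be sketched for a single test point using only the standard library. The toy points, labels, and variable names below are made up for illustration and are not part of the Iris example; `math.dist` assumes Python 3.8 or later.

```python
import math
from collections import Counter

# Toy training data (hypothetical): 2-D points with string labels
train_points = [(1.0, 1.0), (1.5, 2.0), (5.0, 5.0), (6.0, 5.5), (1.2, 0.8)]
train_labels = ["a", "a", "b", "b", "a"]
test_point = (1.1, 1.0)
k = 3

# Step 1: distance from the test point to every training point
distances = [(math.dist(p, test_point), idx) for idx, p in enumerate(train_points)]

# Step 2: sort by distance and keep the k closest
nearest = sorted(distances)[:k]

# Step 3: majority vote among the labels of the k neighbours
votes = [train_labels[idx] for _, idx in nearest]
predicted = Counter(votes).most_common(1)[0][0]
print(predicted)
```

Storing `(distance, index)` pairs keeps the link between each distance and its training label after sorting.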

**Importing libraries**

In [2]:
import numpy as np
import scipy.spatial
from collections import Counter

**Loading the Iris flower dataset from scikit-learn**

In [3]:
from sklearn import datasets
from sklearn.model_selection import train_test_split
iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, random_state = 42, test_size = 0.2)

# **Background knowledge about classes in Python**

If you haven't used classes before, here is some information to help you understand the code. If you are already familiar with classes in Python, you can skip this section.

1. **`__init__()`**: All classes have a method called `__init__()`, which is always executed when an object of the class is created.

    Use `__init__()` to assign values to object properties, or to perform other operations that are needed when the object is created.


2. **The `self` parameter**: `self` is a reference to the current instance of the class and is used to access variables that belong to that instance.
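
As a minimal illustration of both points, here is a small class unrelated to KNN. The `Point` class and its `shifted` method are hypothetical names chosen only for this sketch.

```python
class Point:
    def __init__(self, x, y):
        # Runs automatically when Point(...) is called
        self.x = x
        self.y = y

    def shifted(self, dx):
        # self is the instance the method was called on
        return Point(self.x + dx, self.y)


p = Point(1, 2)      # __init__ stores x=1, y=2 on the new object
q = p.shifted(3)     # inside shifted, self is p
print(p.x, q.x, q.y)
```

The `KNN` class below follows the same pattern: `__init__` stores `k`, and the other methods read it back through `self`.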


In [14]:
class KNN:
    # k is the number of neighbours
    def __init__(self, k):
        self.k = k
    # Store the training inputs (X) and training labels (y)
    def fit(self, X, y):
        self.X_train = X
        self.y_train = y

Create a function to calculate the Euclidean distance

In [15]:
    def distance(self, X1, X2):
        # Return the Euclidean distance between two points
        return scipy.spatial.distance.euclidean(X1, X2)
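
For reference, the Euclidean distance between two points is the square root of the sum of squared coordinate differences. The sketch below computes it by hand with the standard library; the helper name `euclidean_by_hand` and the sample points are assumptions made for illustration, and the result should agree with `scipy.spatial.distance.euclidean` for the same inputs.

```python
import math


def euclidean_by_hand(a, b):
    # sqrt((a1 - b1)^2 + (a2 - b2)^2 + ...)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# A 3-4-5 right triangle: the distance between these points is 5
d = euclidean_by_hand([0.0, 3.0], [4.0, 0.0])
print(d)
```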

Create the prediction function for KNN. The details are in the code comments.

In [21]:
    def predict(self, X_test):
        final_output = []
        for i in range(len(X_test)):
            d = []
            votes = []
            # Use the training data stored by fit(), not global variables
            for j in range(len(self.X_train)):
                # Calculate the distance from this test point to each training point
                dist = scipy.spatial.distance.euclidean(self.X_train[j], X_test[i])
                # Save the distance together with the training index
                d.append([dist, j])
            # Sort from smallest to largest distance
            d.sort()
            # Keep the K nearest entries
            d = d[0:self.k]
            # Look up the labels of the K nearest neighbours
            for _, j in d:
                # Save the label of each of the K nearest training points
                votes.append(self.y_train[j])
            # Find the majority vote
            ans = Counter(votes).most_common(1)[0][0]
            
            # Prediction
            final_output.append(ans)
            
        return final_output

The full code of the class

In [8]:
class KNN:
    def __init__(self, k):
        self.k = k
        
    def fit(self, X, y):
        self.X_train = X
        self.y_train = y
        
    def distance(self, X1, X2):
        return scipy.spatial.distance.euclidean(X1, X2)
    
    def predict(self, X_test):
        final_output = []
        for i in range(len(X_test)):
            d = []
            votes = []
            for j in range(len(self.X_train)):
                dist = scipy.spatial.distance.euclidean(self.X_train[j], X_test[i])
                d.append([dist, j])
            d.sort()
            d = d[0:self.k]
            for _, j in d:
                votes.append(self.y_train[j])
            ans = Counter(votes).most_common(1)[0][0]
            # print("counter", Counter(votes).most_common(1))
            final_output.append(ans)
            
        return final_output
    
    def score(self, X_test, y_test):
        # Accuracy: fraction of test points predicted correctly
        predictions = np.array(self.predict(X_test))
        return (predictions == np.array(y_test)).sum() / len(y_test)

In [12]:
# 7 nearest neighbours
clf = KNN(7)
# Put training set in the model
clf.fit(X_train, y_train)
# Predict the results with test dataset
prediction = clf.predict(X_test)
for i in prediction:
    print(i, end= ' ')

1 0 2 1 1 0 1 2 2 1 2 0 0 0 0 1 2 1 1 2 0 2 0 2 2 2 2 2 0 0 

In [14]:
# Evaluation
prediction == y_test

array([ True,  True,  True,  True,  True,  True,  True,  True, False,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True])

In [15]:
clf.score(X_test, y_test)

0.9666666666666667
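
The score is plain accuracy: the number of correct predictions divided by the number of test points. A minimal sketch with made-up labels (the lists `actual` and `predicted` are hypothetical, not the Iris results) shows how one wrong prediction out of 30 leads to 29/30:

```python
# Hypothetical labels: 30 test points with exactly one wrong prediction
actual = [0, 1, 2] * 10
predicted = list(actual)
predicted[8] = (actual[8] + 1) % 3  # flip one prediction to a wrong class

# Accuracy = number of matches / number of test points
correct = sum(p == a for p, a in zip(predicted, actual))
accuracy = correct / len(actual)
print(correct, accuracy)
```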