# 7. k-NN Implementation

**Instructions:**
* Go through the notebook and complete the **tasks**.
* Make sure you understand the examples given!
* When a question allows a free-form answer (e.g., ``what do you observe?``) create a new markdown cell below and answer the question in the notebook.
* **Save your notebooks when you are done!**

In this lab, you will try to implement your own k-NN classifier using numpy functions.  


**Note** You can always copy the code into a separate notebook (or a plain-text .py file that you can run with python from the command line) if you want. After you are done, you can copy the code back into this notebook.

<hr>
<span style="color:rgb(170,0,0)">**Task:**</span> Run the cell below to load our data. This piece of code is exactly the same as in the previous notebook.

In [None]:
%matplotlib inline


from sklearn import datasets
import numpy as np
import matplotlib.pyplot as plt

#import k-nn classifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
from sklearn.metrics import accuracy_score
import operator


iris = datasets.load_iris()

#view a description of the dataset (uncomment next line to do so)
#print(iris.DESCR)

#Set X equal to features, Y equal to the targets

X=iris.data 
y=iris.target 


mySeed=1234567
#initialize random seed generator 
np.random.seed(mySeed)

#we add some random noise to our data to make the task more challenging
X=X+np.random.normal(0,0.5,X.shape)

<hr>
<span style="color:rgb(170,0,0)">**Task:**</span> The code below splits our data into two sets (a training set and a testing set), then trains a scikit-learn classifier on the training data and tests it on the testing data.  To avoid complicating things, in this lab you just need to follow this setting; there is no need to consider cross-validation.  We are also using a fixed number of neighbours (10) and the Euclidean distance.  You can just run the cell below and make sure you understand the code - nothing else to do here.

In [None]:
np.random.seed(mySeed)
indices= np.random.permutation(X.shape[0]) 
bins=np.array_split(indices,2) # we  just need a training and testing set here
foldTrain=bins[0]
foldTest=bins[1]

knn=KNeighborsClassifier(n_neighbors=10, metric='euclidean')
knn.fit(X[foldTrain],y[foldTrain])
y_pred=knn.predict(X[foldTest])
print(accuracy_score(y[foldTest],y_pred))

<hr>
<span style="color:rgb(170,0,0)">**Task:**</span> Your task is now to implement your own version of k-NN, based on the lecture slides and the description given here.  A suggested structure for doing this is included in the comments below, but feel free to start working in a different cell or in your favourite IDE.  We are still using a simple training/test split (no cross-validation here) to avoid complicating things, and thus use a fixed number of neighbours (10) and the Euclidean distance.
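
**Note** To make the first step (measuring distances and finding the closest training samples) concrete, here is a minimal sketch on a tiny made-up dataset. The names `toy_train` and `toy_point` are hypothetical and are not part of the lab data; the sketch assumes you use `numpy.linalg.norm` and `numpy.argsort`, which is only one of several reasonable ways to do this.

```python
import numpy as np

# Hypothetical toy data (NOT the iris data used in this lab)
toy_train = np.array([[0.0, 0.0],
                      [1.0, 0.0],
                      [0.0, 2.0],
                      [3.0, 4.0]])
toy_point = np.array([0.0, 0.0])

# Euclidean (L2) and absolute-value (L1) distance between two vectors
d_l2 = np.linalg.norm(toy_train[3] - toy_point)         # sqrt(3^2 + 4^2)
d_l1 = np.linalg.norm(toy_train[3] - toy_point, ord=1)  # |3| + |4|

# Distance from toy_point to every training row, then the 2 closest rows
all_d = np.linalg.norm(toy_train - toy_point, axis=1)
nearest = np.argsort(all_d)[:2]

print(d_l2, d_l1)   # 5.0 7.0
print(nearest)      # [0 1]
```

Sorting all distances is the simplest approach and is fast enough for a dataset of this size; your own implementation may organise this differently.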

In [None]:
## ANSWER HERE: Suggested code structure in comments below
# given a test point, your code should
# 
# - get the 'nearest neighbours' - i.e. the samples in the training set - that are nearest to our test sample
# -----> done by evaluating the distance of the test sample to all samples in the training set
# - assign a label to the test sample based on the 'neighbours'

##=== FUNCTION DEFINITIONS  ===##


# Define distance functions: given two vectors (ndarrays), each function returns the distance between them.
# Write at least two distance functions: the Euclidean (L2) distance and the absolute-value (L1) distance.
# You can implement both using numpy.linalg.norm, or write your own version.
def euclideanDistance(in1,in2):
    return ## Euclidean distance between in1 and in2 ##


# The getNeighbours function returns the indices in X of the nearest neighbours of the test point x_. In more detail:
# Input: x_ : point in test data
#       X   : training data
#       n   : number of neighbours to return
#       T   : total number of training samples
# Output: indices of the n nearest neighbours of x_ in training data X

def getNeighbours(x_,X,n,T): # where T is the number of training samples
    return # indices of n-nearest neighbours in training data


# The assign label function returns the assigned label for a test data point, given the labels of nearest neighbours
# Input: nLabels : labels (classes) of nearest neighbours of a test point
# Output: the assigned label
# e.g., if we have n=1 (one neighbour), then we can just return the label of the nearest neighbour
# else, we can e.g., choose the majority class
def assignLabel(nLabels):
    return # label assigned to test point x_

##=== FUNCTION DEFINITIONS (END)  ===##




# here is some sample code for evaluating the kNN classifier you just built
# NOTE: this is just a suggested way to do this - you can do it in another way if you want
correct=0
for i in foldTest: #for all test points
    # knn classifier
    x_=X[i] # test point x_
    y_=y[i] # true label of x_
    
    # get neighbours of x_ in training data 
    # assignLabel to x_ based on neighbours
    # evaluate if the assigned label is correct (equal to y_)
    
#print accuracy
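<hr>
**Note** The last two steps (assigning a label by majority vote and computing accuracy) can be illustrated on made-up values. The arrays `toy_labels`, `toy_true` and `toy_pred` below are hypothetical and only serve as an example; using `numpy.unique` for the vote is an assumption, and other approaches (e.g., a dictionary of counts) work equally well.

```python
import numpy as np

# Hypothetical labels of the neighbours of one test point
toy_labels = np.array([2, 0, 2, 1, 2])

# Count how often each class occurs and pick the most frequent one
classes, counts = np.unique(toy_labels, return_counts=True)
majority = classes[np.argmax(counts)]
print(majority)  # 2

# Accuracy = correctly classified points / number of test points
toy_true = np.array([0, 1, 2, 2])
toy_pred = np.array([0, 2, 2, 2])
toy_accuracy = np.mean(toy_true == toy_pred)
print(toy_accuracy)  # 0.75
```

Note that with this sketch a tie between classes is resolved in favour of the smallest class label, because `np.argmax` returns the first maximum; you may prefer a different tie-breaking rule.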