# K-Nearest Neighbours
### From scratch

*“Birds of a feather flock together.”*

The KNN Algorithm

1. Initialize K to your chosen number of neighbors
2. For each example in the data
    - Calculate the distance between the query example and the current example from the data.
    - Add the distance and the index of the example to an ordered collection
3. Sort the ordered collection of distances and indices in ascending order of distances
4. Pick the first K entries from the sorted collection and get their labels
5. Return the mean (for regression) or the mode (for classification) of the K labels.
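
The steps above can be sketched in a few lines of Python. This is only an illustrative outline, assuming each data row is a list of numeric features followed by a label; the function name `knn_sketch` and the toy data are hypothetical. The notebook's own step-by-step implementation follows in the sections below.

```python
from collections import Counter
from math import sqrt

def knn_sketch(data, query, k, mode="classification"):
    """Illustrative KNN: rows in `data` are [feature..., label]; `query` holds features only."""
    # Step 2: distance from the query to every example, stored with its index
    distances = []
    for index, row in enumerate(data):
        features = row[:-1]
        dist = sqrt(sum((a - b) ** 2 for a, b in zip(features, query)))
        distances.append((dist, index))
    # Step 3: sort ascending by distance
    distances.sort()
    # Step 4: labels of the first K entries
    labels = [data[index][-1] for _, index in distances[:k]]
    # Step 5: mean for regression, mode for classification
    if mode == "regression":
        return sum(labels) / len(labels)
    return Counter(labels).most_common(1)[0][0]

# Hypothetical toy data: two features, then a class label
toy_data = [[1.0, 1.0, 0], [1.5, 2.0, 0], [5.0, 5.0, 1], [6.0, 5.5, 1]]
print(knn_sketch(toy_data, [1.2, 1.4], k=3))
```

With K = 3 the query sits next to the two class-0 points, so the majority vote returns class 0 even though one class-1 point is among the three neighbours.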

## Imports

In [None]:
from random import seed
from random import randrange
from csv import reader
from math import sqrt

Step 1: Calculate Euclidean Distance.  
Step 2: Get Nearest Neighbors.  
Step 3: Make Predictions.  

## Step 1: Calculate Euclidean Distance

$ \large d = \sqrt{\sum_{i=1}^{n}{(X_{1i} - X_{2i})^2}}$  
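
As a quick worked example of the formula, the snippet below evaluates it by hand for two made-up points with two features each. The points `p` and `q` are assumptions chosen so the arithmetic is easy to follow; they are not taken from the dataset used later.

```python
from math import sqrt

# Two hypothetical points with two features each
p = [1.0, 2.0]
q = [4.0, 6.0]

# d = sqrt((1 - 4)^2 + (2 - 6)^2) = sqrt(9 + 16) = 5
d = sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
print(d)  # 5.0
```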

In [None]:
def euclidean_distance(X1, X2):
    # The last element of each row is the class label, so it is left out of the sum
    d = 0.0
    for i in range(len(X1)-1):
        d += (X1[i] - X2[i])**2
    return sqrt(d)

In [None]:
# Test distance function
dataset = [[2.7810836,2.550537003,0], [1.465489372,2.362125076,0], [3.396561688,4.400293529,0], [1.38807019,1.850220317,0], [3.06407232,3.005305973,0], [7.627531214,2.759262235,1], [5.332441248,2.088626775,1], [6.922596716,1.77106367,1], [8.675418651,-0.242068655,1], [7.673756466,3.508563011,1]]
x0 = dataset[0]
for x in dataset:
    d = euclidean_distance(x0, x)
    print(d)

## Step 2: Get Nearest Neighbors

In [None]:
def get_neighbors(train, test_row, n_neighbors):
    distances = list()
    for train_row in train:
        dist = euclidean_distance(test_row, train_row)
        distances.append((train_row, dist))
    distances.sort(key=lambda tup: tup[1])
    neighbors = list()
    for i in range(n_neighbors):
        neighbors.append(distances[i][0])
    return neighbors

In [None]:
# Test neighbors
neighbors = get_neighbors(dataset, dataset[0], 3)
for neighbor in neighbors:
    print(neighbor)

## Step 3: Make Predictions

In [None]:
# Make a classification prediction with neighbors
def predict_classification(train, test_row, n_neighbors):
    neighbors = get_neighbors(train, test_row, n_neighbors)
    output_values = [row[-1] for row in neighbors]
    prediction = max(set(output_values), key=output_values.count)
    return prediction

In [None]:
# Test prediction
prediction = predict_classification(dataset, dataset[0], 3)
print('Expected %d, Got %d.' % (dataset[0][-1], prediction))
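
Step 5 of the algorithm also covers regression, where the prediction is the mean of the neighbours' target values rather than the mode. Below is a minimal sketch of that variant, assuming the last column of each row holds a numeric target. The function `predict_regression` and the toy rows are hypothetical and not part of the original notebook; the neighbour search is repeated inline so the cell runs on its own.

```python
from math import sqrt

def predict_regression(train, test_row, n_neighbors):
    """Mean of the targets of the n_neighbors closest rows (last column = target)."""
    def distance(a, b):
        # Compare feature columns only; the last column is the target
        return sqrt(sum((a[i] - b[i]) ** 2 for i in range(len(a) - 1)))
    # Rank training rows by distance to the query row
    ranked = sorted(train, key=lambda row: distance(test_row, row))
    targets = [row[-1] for row in ranked[:n_neighbors]]
    return sum(targets) / len(targets)

# Hypothetical toy rows: one feature, then a numeric target
toy_rows = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [10.0, 100.0]]
# The query's last slot is a placeholder because its target is unknown
print(predict_regression(toy_rows, [2.1, None], 3))
```

The three nearest rows have targets 20, 30 and 10, so the prediction is their mean, 20.0.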

## Testing

In [None]:
%%capture
%run NaiveBayes.ipynb

In [None]:
# kNN Algorithm
def k_nearest_neighbors(train, test, num_neighbors):
    predictions = list()
    for row in test:
        output = predict_classification(train, row, num_neighbors)
        predictions.append(output)
    return predictions
 
# Test the kNN on the Iris Flowers dataset
seed(1)
filename = 'D:/data/csv/iris.csv'
dataset = load_csv(filename)
for i in range(len(dataset[0])-1):
    str_column_to_float(dataset, i)
str_column_to_int(dataset, len(dataset[0])-1)

n_folds = 5
num_neighbors = 5
scores = evaluate_algorithm(dataset, k_nearest_neighbors, n_folds, num_neighbors)
print('Scores: %s' % scores)
print('Mean Accuracy: %.3f%%' % (sum(scores)/float(len(scores))))

## Credits & Links
Based on the following source:

https://machinelearningmastery.com/tutorial-to-implement-k-nearest-neighbors-in-python-from-scratch/