
**Libraries**

In [27]:
from sklearn.datasets import load_breast_cancer
import numpy as np
import random
from collections import Counter

**The function takes three arguments: the training data (a dictionary mapping each label to a list of feature vectors), a single feature vector to classify, and the number of nearest neighbours `k`.**

**It computes the Euclidean distance between the feature vector to classify and every feature vector in the training data.**

**Each distance is stored in a list together with the label of the training vector it was measured against.**

**Once every training vector has been processed, the list is sorted by distance and the first `k` entries are kept.**

**The most common label among those `k` neighbours becomes the prediction.**

In [28]:
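def k_nearest_neighbours(train_set_data, predict, k=5):
    # train_set_data maps each label to a list of feature vectors
    distances = []
    for group in train_set_data:
        for features in train_set_data[group]:
            # Euclidean distance between one training vector and the query
            euclidean_distance = np.linalg.norm(np.array(features)-np.array(predict))
            distances.append([euclidean_distance, group])
    # Sort by distance only and keep the labels of the k closest vectors
    neighbours = [pairs[1] for pairs in sorted(distances, key=lambda pair: pair[0])[:k]]
    # Majority vote among the k labels
    nearest_neighbour = Counter(neighbours).most_common(1)[0][0]

    return nearest_neighbour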
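
The steps described above can also be tried on a tiny hand-made data set. The sketch below is not the notebook's function: `knn_vote` and `toy_train` are hypothetical names, the points are invented, and it uses only the standard library (`math.dist`, `heapq.nsmallest`) so that it runs without NumPy. It assumes the same dictionary layout as `train_set_data`, with each label mapped to a list of feature vectors.

```python
import heapq
import math
from collections import Counter

def knn_vote(train_set, query, k=3):
    """Hypothetical helper: majority label among the k closest training points."""
    # Pair every training point's distance to the query with its label
    scored = (
        (math.dist(features, query), label)
        for label, rows in train_set.items()
        for features in rows
    )
    # Keep only the k smallest distances instead of sorting the whole list
    nearest = heapq.nsmallest(k, scored, key=lambda pair: pair[0])
    # Majority vote among the k labels
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Toy data, invented for illustration only
toy_train = {
    "a": [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1]],
    "b": [[5.0, 5.0], [5.2, 4.9], [4.8, 5.1]],
}
near_a = knn_vote(toy_train, [1.1, 1.0], k=3)
near_b = knn_vote(toy_train, [5.1, 5.0], k=3)
print(near_a, near_b)
```

Selecting the `k` smallest distances with `heapq.nsmallest` avoids sorting every distance, which may matter on larger training sets. For a data set of this size either approach should be fine.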

**Loading the data set**

In [29]:
dataset = load_breast_cancer()
data = dataset['data']
target = dataset['target']

**Converting the data and targets into float NumPy arrays**

In [30]:
new_data = np.array(data.astype(float))
new_target = np.array(target.astype(float))
full_data = []

**Adding targets to their respective features**

In [31]:
for i in range(len(new_data)):
    temp = new_data[i]
    new_array = np.append(temp, new_target[i])
    full_data.append(new_array)


**Shuffling the full data set**

In [32]:
random.shuffle(full_data)
# Changing full data set into a numpy array
new_full_data = np.array(full_data)


**Splitting data set into train and test sets**

In [33]:
test_size = 0.15
train_data = new_full_data[:-int(test_size*len(new_full_data))]
test_data = new_full_data[-int(test_size*len(new_full_data)):]

train_set = {'malignant': [], 'benign': []}
test_set = {'malignant': [], 'benign': []}
# In load_breast_cancer, target 0 is malignant and target 1 is benign
# (see dataset['target_names']).
for sub_data in train_data:
    if (sub_data[-1] == 0):
        train_set['malignant'].append(sub_data[:-1])
    else:
        train_set['benign'].append(sub_data[:-1])

for sub_data in test_data:
    if (sub_data[-1] == 0):
        test_set['malignant'].append(sub_data[:-1])
    else:
        test_set['benign'].append(sub_data[:-1])


**Making predictions using test data set**

In [34]:
correct_predictions = 0
total_number_features = 0
for groups in test_set:
    for features in test_set[groups]:
        prediction = k_nearest_neighbours(train_set, features,15)
        if prediction == groups:
            correct_predictions += 1
        total_number_features += 1
accuracy = correct_predictions/total_number_features
print(accuracy)

0.9529411764705882
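
A single accuracy figure does not show which class the errors fall in. One way to see that, sketched here under the assumption that predictions are collected alongside the true labels, is to count each (actual, predicted) pair. `confusion_counts` is a hypothetical helper, and the labels below are invented for illustration; they are not results from this notebook.

```python
from collections import Counter

def confusion_counts(actual, predicted):
    """Hypothetical helper: count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(actual, predicted))

# Invented labels for illustration only, not results from the notebook
actual = ["malignant", "malignant", "benign", "benign", "benign"]
predicted = ["malignant", "benign", "benign", "benign", "benign"]

counts = confusion_counts(actual, predicted)
# Accuracy is the share of pairs where both labels agree
accuracy = sum(n for (a, p), n in counts.items() if a == p) / len(actual)
print(counts[("malignant", "benign")], accuracy)
```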
