# 4. K-Nearest Neighbors Algorithm
The k-nearest neighbors algorithm is a non-parametric classification algorithm: it makes no assumptions about the distribution of the data or about the form of the function that maps input features to outputs. Instead, it assigns a class label to a new data point based on the class labels of its $k$ nearest points in the training data set. More formally, let $(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \ldots, (x^{(n)}, y^{(n)})$ be a sequence of training samples, where $x^{(i)} \in \mathbb{R}^d$ and $y^{(i)} \in \{0, 1\}$ for $i = 1, \ldots, n$ (binary labels keep the notation simple; the procedure is identical for more classes, such as the three iris species used below). Given a positive integer $k$ and a new data point $x \in \mathbb{R}^d$, the algorithm works as follows:

1. Compute the distances $\|x - x^{(i)}\|$ for $i = 1, \ldots, n$ (the implementation below uses the Euclidean norm).
1. Let $i_1, \ldots, i_k \in \{1, \ldots, n\}$ be the indices of the $k$ training points nearest to $x$, i.e. those with the smallest distances.
1. Among $y^{(i_1)}, \ldots, y^{(i_k)}$, choose the class label that appears most frequently and assign it to $x$.
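Written out in the notation above, steps 2 and 3 amount to the following decision rule, where $\hat{y}(x)$ denotes the predicted label and $\mathbf{1}\{\cdot\}$ the indicator function (both symbols are introduced here for illustration):

$$
\|x - x^{(i_j)}\| \le \|x - x^{(m)}\| \quad \text{for all } j \in \{1, \ldots, k\} \text{ and } m \notin \{i_1, \ldots, i_k\},
$$

$$
\hat{y}(x) = \arg\max_{c \in \{0, 1\}} \sum_{j=1}^{k} \mathbf{1}\{y^{(i_j)} = c\}.
$$

For two classes and an odd $k$, the two vote counts sum to $k$ and therefore cannot be equal, so the $\arg\max$ is unique. For an even $k$ a tie is possible and has to be broken by some convention (and when several training points are equally far from $x$, the neighbor set itself is not unique either).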

Below you can find an implementation of this algorithm.

```python
import numpy as np
from collections import Counter
```

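```python
class KNN:
    """k-nearest neighbors classifier: Euclidean distance plus majority vote."""

    def __init__(self, k=2):
        # Number of neighbors that vote on each prediction
        self.k = k

    def fit(self, X, y):
        # k-NN is a lazy learner: "training" only stores the data.
        # np.asarray lets y be a plain list and still support fancy indexing.
        self.X = np.asarray(X)
        self.y = np.asarray(y)
        return self

    def predict(self, X_test):
        y_pred = []
        for x in X_test:
            # Step 1: Euclidean distance from x to every training point
            distances = np.linalg.norm(self.X - x, axis=1)
            # Step 2: indices of the k smallest distances
            nearest_indices = np.argsort(distances)[:self.k]
            nearest_labels = self.y[nearest_indices]
            # Step 3: majority vote; on a tie, Counter.most_common returns the
            # label encountered first, i.e. the tied label with the closest neighbor
            pred_label = Counter(nearest_labels).most_common(1)[0][0]
            y_pred.append(pred_label)
        return y_pred
```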
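The `predict` method above handles one test point per loop iteration. As an alternative sketch (not part of the original class), the same three steps can be applied to a whole batch at once with NumPy broadcasting. `knn_predict_batch` is a hypothetical helper name, and its vote uses `np.bincount`, which assumes non-negative integer labels (true for iris).

```python
import numpy as np


def knn_predict_batch(X_train, y_train, X_test, k):
    """Hypothetical batched k-NN predictor (Euclidean distance, majority vote)."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    X_test = np.asarray(X_test, dtype=float)

    # Step 1: all pairwise distances at once, shape (n_test, n_train).
    # Broadcasting builds an (n_test, n_train, d) intermediate array.
    diffs = X_test[:, None, :] - X_train[None, :, :]
    distances = np.linalg.norm(diffs, axis=2)

    # Step 2: indices of the k nearest training points for each test point
    nearest = np.argsort(distances, axis=1)[:, :k]

    # Step 3: majority vote per row; np.bincount needs non-negative integer
    # labels, and argmax breaks ties toward the smallest label
    nearest_labels = y_train[nearest]  # shape (n_test, k)
    return np.array([np.bincount(row).argmax() for row in nearest_labels])


# Tiny two-cluster example (made-up data for illustration)
X_toy = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y_toy = [0, 0, 0, 1, 1, 1]
X_query = [[0.2, 0.2], [5.5, 5.5]]
print(knn_predict_batch(X_toy, y_toy, X_query, k=3))  # [0 1]
```

The trade-off is memory: the broadcast intermediate holds $n_\text{test} \times n_\text{train} \times d$ floats, so for large data sets the per-point loop in the class uses far less. Tie-breaking also differs: `np.bincount(...).argmax()` picks the smallest tied label, whereas the `Counter`-based vote keeps the tied label encountered first.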

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the iris dataset (150 samples, 4 features, 3 classes)
iris = load_iris()

# Split the data into training (80%) and test (20%) sets
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

# Create a KNN classifier with k=5 (Euclidean distance, as implemented above)
knn = KNN(5)

# "Fit" the classifier: k-NN simply stores the training data
knn.fit(X_train, y_train)

# Make predictions on the test data
y_pred = knn.predict(X_test)

# Compute the accuracy of the classifier
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy}")
```

```
Accuracy: 1.0
```
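The accuracy above comes from a single split and a single, fixed $k = 5$. One common way to choose $k$ without touching the test set is leave-one-out evaluation on the training data: each training point is classified by its $k$ nearest *other* training points, and the $k$ with the best score is kept. The sketch below assumes non-negative integer labels (as in iris); `loo_accuracy` and `select_k` are hypothetical helpers, not part of this notebook or of scikit-learn.

```python
import numpy as np


def loo_accuracy(X, y, k):
    """Leave-one-out accuracy of majority-vote k-NN on (X, y). Hypothetical helper."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise Euclidean distances between training points, shape (n, n)
    distances = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # A point must not vote for itself: push its self-distance to infinity
    np.fill_diagonal(distances, np.inf)
    nearest = np.argsort(distances, axis=1)[:, :k]
    # Majority vote per point; np.bincount assumes non-negative integer labels
    preds = np.array([np.bincount(labels).argmax() for labels in y[nearest]])
    return float(np.mean(preds == y))


def select_k(X, y, candidates):
    """Return the candidate k with the highest leave-one-out accuracy, plus all scores."""
    scores = {k: loo_accuracy(X, y, k) for k in candidates}
    best_k = max(scores, key=scores.get)  # the earliest candidate wins a tie
    return best_k, scores


# Usage on the iris training split from above (results not shown here):
# best_k, scores = select_k(X_train, y_train, candidates=[1, 3, 5, 7, 9])
```

Only the training split is used for the choice of $k$, so the test accuracy reported afterwards is not biased by that choice.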
