## KNN

KNN (K-Nearest Neighbors) is a supervised machine learning algorithm used for classification and regression. It is non-parametric: it makes no assumptions about the distribution of the data.

KNN does not build an explicit model during training. It stores the training dataset and defers all computation to prediction time, which is why it is often called a lazy learner. To classify a new data point, the algorithm computes the distance from that point to every training point, selects the K nearest, and assigns the majority class among those K neighbors.
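
The classification steps described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the toy points, the labels `"a"` and `"b"`, and the helper name `knn_classify` are all assumptions made up for this example.

```python
import math
from collections import Counter

# Hypothetical toy data: 2-D points with string labels.
train_points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (6.5, 7.0), (7.0, 6.0)]
train_labels = ["a", "a", "a", "b", "b", "b"]

def knn_classify(query, points, labels, k=3):
    """Return the majority label among the k training points closest to query."""
    # Distance from the query to every stored training point.
    distances = [math.dist(query, p) for p in points]
    # Indices of the k smallest distances.
    nearest = sorted(range(len(points)), key=lambda i: distances[i])[:k]
    # Majority vote over the labels of those neighbors.
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

print(knn_classify((2.0, 2.0), train_points, train_labels))  # a
print(knn_classify((6.0, 5.0), train_points, train_labels))  # b
```

Note that nothing is learned ahead of time: every call scans the full training set.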

The number K is a hyperparameter that must be chosen before making predictions. A small K makes predictions sensitive to noise in individual training points and may lead to overfitting, whereas a large K smooths over local structure and may lead to underfitting. Choosing an appropriate K, for example by comparing accuracy on held-out validation data, is therefore crucial for good performance.
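
One common way to pick K is to try several candidates and compare accuracy on data held out from the training set. The sketch below assumes synthetic two-class data and a single 75/25 hold-out split. The helper names `knn_predict` and `validation_accuracy_by_k` are hypothetical, and the candidate values of K are arbitrary.

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k):
    """Classify each query row by majority vote among its k nearest training rows."""
    # Pairwise Euclidean distances, shape (n_query, n_train).
    diffs = X_query[:, None, :] - X_train[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    # Indices of the k nearest training rows for each query row.
    nearest = np.argsort(dists, axis=1)[:, :k]
    neighbor_labels = y_train[nearest]
    # Majority vote per row (labels must be non-negative integers).
    return np.array([np.bincount(row).argmax() for row in neighbor_labels])

def validation_accuracy_by_k(X_train, y_train, X_val, y_val, k_values):
    """Return {k: accuracy on the validation set} for each candidate k."""
    return {
        k: float(np.mean(knn_predict(X_train, y_train, X_val, k) == y_val))
        for k in k_values
    }

# Synthetic two-class data (an assumption for illustration only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (100, 2)), rng.normal(2.0, 1.0, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

# Shuffle, then hold out 25% of the rows for validation.
order = rng.permutation(len(X))
X, y = X[order], y[order]
X_train, y_train, X_val, y_val = X[:150], y[:150], X[150:], y[150:]

scores = validation_accuracy_by_k(X_train, y_train, X_val, y_val, [1, 3, 5, 11, 21, 51])
best_k = max(scores, key=scores.get)
print(scores)
print("best k on this split:", best_k)
```

A single split gives a noisy estimate, so the best K found this way can change with the random seed. Averaging over several splits (cross-validation) is a more stable variant of the same idea.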

For classification, the output is a categorical variable: the majority class among the K nearest neighbors. For regression, the output is a continuous variable, typically the mean of the neighbors' target values.
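
For the regression case, a minimal sketch looks like the following. It assumes the prediction is the unweighted mean of the K nearest targets, which is one common choice (distance-weighted averages are another). The function name `knn_regress` and the toy data are hypothetical.

```python
import numpy as np

def knn_regress(X_train, y_train, x_query, k=3):
    """Predict a continuous value as the mean target of the k nearest training rows."""
    # Euclidean distance from the query to every training row.
    dists = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))
    # Indices of the k closest rows.
    nearest = np.argsort(dists)[:k]
    return float(np.mean(y_train[nearest]))

# Hypothetical 1-D data following y = 2x.
X_train = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y_train = np.array([2.0, 4.0, 6.0, 8.0, 10.0])

# The three nearest neighbors of x = 3 are x = 2, 3, 4 with targets 4, 6, 8.
print(knn_regress(X_train, y_train, np.array([3.0]), k=3))  # 6.0
```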

KNN is easy to implement and interpret, and it works with any number of classes. However, prediction can be computationally expensive when the training set is large, because a straightforward implementation compares each query against every stored training point. The algorithm is also sensitive to the choice of distance metric used to measure the distance between data points.
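
To see why the distance metric matters, the small sketch below finds the nearest neighbor of the same query under two metrics, Euclidean and Manhattan. The points are hypothetical and chosen so that the two metrics disagree. The helper `nearest_index` is an illustrative name, not part of any library.

```python
import numpy as np

def euclidean(a, b):
    """Straight-line (L2) distance."""
    return float(np.sqrt(np.sum((a - b) ** 2)))

def manhattan(a, b):
    """Sum of absolute coordinate differences (L1 distance)."""
    return float(np.sum(np.abs(a - b)))

def nearest_index(query, points, metric):
    """Index of the row in points closest to query under the given metric."""
    return int(np.argmin([metric(query, p) for p in points]))

query = np.array([0.0, 0.0])
points = np.array([[3.0, 3.0], [0.0, 5.0]])

# Euclidean: about 4.24 vs 5.0, so point 0 is nearest.
print(nearest_index(query, points, euclidean))  # 0
# Manhattan: 6.0 vs 5.0, so point 1 is nearest.
print(nearest_index(query, points, manhattan))  # 1
```

If the two points carried different labels, a 1-nearest-neighbor classifier would predict a different class depending on the metric alone.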

In [None]:
import numpy as np
from collections import Counter

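def euclidean_distance(x1, x2):
    """Euclidean (L2) distance between two 1-D feature vectors."""
    return np.sqrt(np.sum((x1 - x2) ** 2))

class KNN:
    def __init__(self, k=3):
        self.k = k  # number of neighbors that vote on each prediction

    def fit(self, X, y):
        # KNN is a lazy learner: "fitting" only stores the training data.
        self.X_train = X
        self.y_train = y

    def predict(self, X):
        # Return a NumPy array so the result compares element-wise with y_test.
        return np.array([self._predict(x) for x in X])

    def _predict(self, x):
        # Distance from x to every training sample.
        distances = [euclidean_distance(x, x_train) for x_train in self.X_train]
        # Indices of the k closest training samples.
        k_idx = np.argsort(distances)[:self.k]
        k_nearest_labels = [self.y_train[i] for i in k_idx]
        # Majority vote among the k nearest labels.
        most_common = Counter(k_nearest_labels).most_common(1)
        return most_common[0][0]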
    

In [None]:
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X, y = iris.data, iris.target

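# Hold out 20% of the samples for testing; the fixed seed (an arbitrary choice) makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)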

clf = KNN(k=5)
clf.fit(X_train, y_train)
predictions = clf.predict(X_test)

print(predictions)

# Accuracy: fraction of test samples whose predicted label matches the true label.
acc = np.mean(np.asarray(predictions) == y_test)
print(acc)
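
As a sanity check, the from-scratch classifier can be compared with scikit-learn's `KNeighborsClassifier` on the same dataset. The sketch below is self-contained and assumes a fixed `random_state` of 0 for the split, which is an arbitrary choice. With the default settings, `KNeighborsClassifier` uses the Euclidean distance, so its accuracy should be close to that of the hand-written class when both are given the same split and the same K.

```python
from sklearn import datasets
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris()
X, y = iris.data, iris.target

# Same 80/20 split proportions as above; the seed is an assumption for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# n_neighbors plays the role of k in the hand-written KNN class.
sk_clf = KNeighborsClassifier(n_neighbors=5)
sk_clf.fit(X_train, y_train)
sk_predictions = sk_clf.predict(X_test)

sk_acc = accuracy_score(y_test, sk_predictions)
print(sk_acc)
```

Small differences between the two implementations can arise from how ties between equally distant neighbors or equally frequent classes are broken.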