# **Nearest Neighbors**

### **Tasks**
1. Implement the K-Nearest Neighbors (KNN) algorithm from scratch.
2. Evaluate the performance of the KNN model using a confusion matrix.
3. Optimize the model to achieve the best accuracy on the CIFAR-10 dataset.


## **Part 1: Load the CIFAR-10 Dataset**
We'll use the CIFAR-10 dataset, which contains 60,000 32x32 color images in 10 classes.

We will load this dataset using the Keras library.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from keras.datasets import cifar10
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load the CIFAR-10 dataset
(X_train, y_train), (X_test, y_test) = cifar10.load_data()


# Reshape the labels to be a 1D array
y_train = y_train.reshape(-1)
y_test = y_test.reshape(-1)

# Normalize the data (to improve distance calculations)
X_train = X_train.astype("float32") / 255.0
X_test = X_test.astype("float32") / 255.0
# Flatten the 32x32x3 images into 1D arrays for simpler processing in nearest neighbor
X_train_flat = X_train.reshape(X_train.shape[0], -1)
X_test_flat = X_test.reshape(X_test.shape[0], -1)

## **Part 1 (Alternative): Load the MNIST Dataset**
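We'll use the MNIST dataset, which contains 70,000 28x28 grayscale images of handwritten digits in 10 classes. Note that running this cell overwrites the CIFAR-10 variables (`X_train`, `X_test`, `y_train`, `y_test`, and the flattened arrays) defined above.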

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix, accuracy_score

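# Load the MNIST dataset
# as_frame=False returns NumPy arrays instead of a pandas DataFrame,
# so positional indexing such as X[i] and y[i] works later on
mnist = fetch_openml('mnist_784', as_frame=False)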
X = mnist['data']
y = mnist['target'].astype(int)

# Split the dataset into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Normalize the pixel values by scaling them to [0, 1].
X_train = X_train / 255.0
X_test = X_test / 255.0
print(X_train.shape, X_test.shape)

# Standardize the data
scaler = StandardScaler()
X_train_flat = scaler.fit_transform(X_train)
X_test_flat = scaler.transform(X_test)

(56000, 784) (14000, 784)


## **Part 2: Implement Nearest Neighbor**
Now implement the nearest neighbor algorithm from scratch. Think about which data structure to use for the training data and which hyperparameters the class should expose.

In [None]:
class NearestNeighbor:
    def __init__(self, k):
        pass

    def fit(self, X_train, y_train):
        pass

    def predict(self, X_test):
        pass

In [None]:
nn = NearestNeighbor(k=1)  # __init__ requires k; 1 is only a starting value to tune in Part 5
nn.fit(X_train_flat, y_train)

# Make predictions
y_pred = nn.predict(X_test_flat)
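
As a starting point for the skeleton above, here is a minimal sketch of one possible approach, not a reference solution. It assumes squared Euclidean distance, majority voting, and a hypothetical class name `KNNSketch` with a hypothetical `batch_size` parameter; none of these choices come from the assignment itself. Ties in the vote go to the smallest label, which is an arbitrary choice.

```python
import numpy as np


class KNNSketch:
    """Illustrative k-nearest-neighbors classifier (hypothetical name)."""

    def __init__(self, k=1):
        self.k = k

    def fit(self, X_train, y_train):
        # KNN is a lazy learner: "training" only memorizes the data.
        self.X_train = np.asarray(X_train, dtype=np.float32)
        self.y_train = np.asarray(y_train)
        return self

    def predict(self, X_test, batch_size=256):
        X_test = np.asarray(X_test, dtype=np.float32)
        train_sq = np.sum(self.X_train ** 2, axis=1)
        preds = np.empty(X_test.shape[0], dtype=self.y_train.dtype)
        # Process test points in batches to keep the distance matrix small.
        for start in range(0, X_test.shape[0], batch_size):
            batch = X_test[start:start + batch_size]
            # Squared L2 distance via ||a||^2 - 2 a.b + ||b||^2
            d2 = (np.sum(batch ** 2, axis=1)[:, None]
                  - 2.0 * batch @ self.X_train.T
                  + train_sq[None, :])
            # Indices of the k smallest distances in each row (unordered).
            nearest = np.argpartition(d2, self.k - 1, axis=1)[:, :self.k]
            votes = self.y_train[nearest]
            for i, row in enumerate(votes):
                labels, counts = np.unique(row, return_counts=True)
                # argmax returns the first maximum, so ties go to the smallest label.
                preds[start + i] = labels[np.argmax(counts)]
        return preds


# Toy check on four 2-D points forming two clusters.
X_toy = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]])
y_toy = np.array([0, 0, 1, 1])
knn = KNNSketch(k=3).fit(X_toy, y_toy)
toy_pred = knn.predict(np.array([[0.05, 0.1], [0.95, 0.9]]))
```

On the full CIFAR-10 training set each batch produces a `batch_size x 50000` distance matrix, so it may be worth experimenting on a subset of the data first.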

## **Part 3: Evaluate with Confusion Matrix and Accuracy**
Here, you need to calculate the confusion matrix and accuracy for the CIFAR-10 classification task.


In [None]:
# Calculate confusion matrix
print("Confusion Matrix:")

# Calculate accuracy
print("\nAccuracy:")
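
One way to fill in the cell above is sketched here on toy labels. The helper names `confusion_matrix_np` and `accuracy_from_cm` are hypothetical, and the rows-are-true, columns-are-predicted layout is an assumption (it matches the convention of `sklearn.metrics.confusion_matrix`, which could be used instead together with `accuracy_score`).

```python
import numpy as np


def confusion_matrix_np(y_true, y_pred, num_classes):
    """Rows index the true class, columns the predicted class."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    # np.add.at accumulates correctly when (true, pred) pairs repeat.
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm


def accuracy_from_cm(cm):
    # Correct predictions lie on the diagonal.
    return np.trace(cm) / cm.sum()


# Toy labels for three classes; replace with y_test and y_pred.
y_true_toy = np.array([0, 0, 1, 1, 2, 2])
y_pred_toy = np.array([0, 1, 1, 1, 2, 0])
cm_toy = confusion_matrix_np(y_true_toy, y_pred_toy, num_classes=3)
acc_toy = accuracy_from_cm(cm_toy)
print(cm_toy)
print(acc_toy)
```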

## **Part 4: Visualizing Some Predictions**
You can visualize some predictions to better understand how well the model performs.


In [None]:
def plot_sample_images(X, y_true, y_pred=None):
    fig, axes = plt.subplots(1, 5, figsize=(15, 3))
    for i, ax in enumerate(axes):
        ax.imshow(X[i])
        ax.axis('off')
        title = f'True: {y_true[i]}'
        if y_pred is not None:
            title += f'\nPred: {y_pred[i]}'
        ax.set_title(title)
    plt.show()


plot_sample_images(X_test, y_test, y_pred)
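
The titles above show integer labels. For CIFAR-10 loaded through Keras, labels 0 to 9 correspond to the class names listed below, so a small lookup can make the plots easier to read. The helper name `label_title` is hypothetical; its return value could replace the `title` string built inside `plot_sample_images`.

```python
# CIFAR-10 class names in label order (0 through 9).
CIFAR10_CLASSES = ["airplane", "automobile", "bird", "cat", "deer",
                   "dog", "frog", "horse", "ship", "truck"]


def label_title(true_label, pred_label=None):
    """Build a plot title from integer labels (hypothetical helper)."""
    title = f"True: {CIFAR10_CLASSES[int(true_label)]}"
    if pred_label is not None:
        title += f"\nPred: {CIFAR10_CLASSES[int(pred_label)]}"
    return title


example_title = label_title(3, 5)
```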

## **Part 5: Improve the Model**

Try different distance metrics and hyperparameters to improve performance. Report your best result: accuracy, confusion matrix, precision, recall, and F1 score.
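
The sketch below illustrates two ingredients of this part under stated assumptions: a few candidate distance metrics (L1, L2, and cosine are examples chosen here, not a list from the assignment) and per-class precision, recall, and F1 computed from a confusion matrix whose rows are true classes and whose columns are predicted classes. The function names are hypothetical. The cosine branch assumes no all-zero rows.

```python
import numpy as np


def pairwise_distance(A, B, metric="l2"):
    """Distance between every row of A and every row of B."""
    A = np.asarray(A, dtype=np.float64)
    B = np.asarray(B, dtype=np.float64)
    if metric == "l1":
        # Manhattan distance. The broadcast needs len(A) * len(B) * d values
        # in memory, so call this on small batches of A.
        return np.abs(A[:, None, :] - B[None, :, :]).sum(axis=2)
    if metric == "l2":
        diff = A[:, None, :] - B[None, :, :]
        return np.sqrt((diff ** 2).sum(axis=2))
    if metric == "cosine":
        A_n = A / np.linalg.norm(A, axis=1, keepdims=True)
        B_n = B / np.linalg.norm(B, axis=1, keepdims=True)
        return 1.0 - A_n @ B_n.T
    raise ValueError(f"unknown metric: {metric}")


def precision_recall_f1(cm):
    """Per-class scores from a confusion matrix (rows = true, cols = predicted)."""
    cm = np.asarray(cm, dtype=np.float64)
    tp = np.diag(cm)
    predicted = cm.sum(axis=0)  # column sums: how often each class was predicted
    actual = cm.sum(axis=1)     # row sums: how often each class truly occurs
    # Leave the score at 0 for classes that were never predicted or never present.
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom,
                   out=np.zeros_like(tp), where=denom > 0)
    return precision, recall, f1


# Toy confusion matrix for three classes.
cm_example = np.array([[1, 1, 0],
                       [0, 2, 0],
                       [1, 0, 1]])
precision, recall, f1 = precision_recall_f1(cm_example)
d_l1 = pairwise_distance([[0, 0]], [[3, 4]], metric="l1")
d_l2 = pairwise_distance([[0, 0]], [[3, 4]], metric="l2")
```

When comparing metrics or values of k, it is safer to select them on a validation split held out from the training data (for example with `train_test_split`) and to report test-set numbers only for the final configuration.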