Train a k-Nearest Neighbors (kNN) classifier to recognize digits from the MNIST dataset.


In [1]:

# Data Preparation: load and prepare the MNIST dataset

# In this step, we load the MNIST dataset using Keras. The dataset consists of 60,000 training images
# and 10,000 test images, each of size 28x28 pixels. We reshape the images into 1D arrays
# and normalize the pixel values to the range [0, 1].

from tensorflow.keras.datasets import mnist
import numpy as np

# Load the MNIST dataset
(X_train, y_train), (X_test, y_test) = mnist.load_data()

# Reshape the data
X_train = X_train.reshape(X_train.shape[0], -1)
X_test = X_test.reshape(X_test.shape[0], -1)

# Normalize the data
X_train = X_train.astype('float32') / 255
X_test = X_test.astype('float32') / 255

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11490434/11490434 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step


In [2]:
# Training the kNN Classifier

from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Create and train the kNN classifier
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

# Make predictions on the test set
y_pred = knn.predict(X_test)

# Calculate the accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.4f}")

Accuracy: 0.9688


We use scikit-learn's KNeighborsClassifier to create and train our kNN model. We set n_neighbors=5, which means that for each test image the classifier finds the 5 closest training images (Euclidean distance by default) and predicts the most common label among them.
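
To make the neighbor vote concrete, here is a minimal NumPy sketch of the idea behind kNN prediction. It is an illustration only, not scikit-learn's implementation: the function name `knn_predict`, the toy arrays, and the tie-breaking rule (the smallest label wins) are assumptions introduced for this example.

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k=5):
    """Predict labels by majority vote among the k nearest training points.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    X_train = np.asarray(X_train, dtype=np.float64)
    X_query = np.asarray(X_query, dtype=np.float64)
    y_train = np.asarray(y_train)
    predictions = []
    for x in X_query:
        # Squared Euclidean distance from x to every training point
        # (same neighbor ranking as the plain Euclidean distance).
        dists = np.sum((X_train - x) ** 2, axis=1)
        # Indices of the k closest training points.
        nearest = np.argsort(dists, kind="stable")[:k]
        # Majority vote; on a tie, np.argmax picks the smallest label.
        labels, counts = np.unique(y_train[nearest], return_counts=True)
        predictions.append(labels[np.argmax(counts)])
    return np.array(predictions)

# Toy data (assumed for the example): two well-separated clusters.
toy_X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                  [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]])
toy_y = np.array([0, 0, 0, 1, 1, 1])
toy_queries = np.array([[0.05, 0.05], [1.0, 0.95]])

toy_pred = knn_predict(toy_X, toy_y, toy_queries, k=3)
print(toy_pred)
```

On this toy data the first query lies inside the class-0 cluster and the second inside the class-1 cluster, so the sketch prints `[0 1]`. The loop compares every query against every training point, which is why a brute-force approach becomes slow on the full 60,000-image training set.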

**Evaluation**

After training the model and making predictions on the test set, we calculate the accuracy of our classifier.
The accuracy represents the proportion of correct predictions among the total number of cases examined.
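
As a small sketch of what this metric computes, accuracy is just the fraction of positions where the predicted label equals the true label. The helper name `manual_accuracy` and the five-element demo arrays below are made up for illustration; in the notebook itself `accuracy_score` does this work.

```python
import numpy as np

def manual_accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels (illustrative helper)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    # Element-wise comparison gives booleans; their mean is the accuracy.
    return float(np.mean(y_true == y_pred))

# Hypothetical labels: 4 of the 5 predictions are correct.
demo_true = np.array([7, 2, 1, 0, 4])
demo_pred = np.array([7, 2, 1, 0, 9])

demo_accuracy = manual_accuracy(demo_true, demo_pred)
print(demo_accuracy)
```

The demo prints `0.8`. Applied to the result above, an accuracy of 0.9688 on the 10,000 test images corresponds to 9,688 correct predictions and 312 errors.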

**Results and Discussion**

The kNN classifier typically achieves an accuracy of around 96-97% on the MNIST dataset, which is consistent with the 96.88% measured above. This is a strong result for such a simple algorithm. However, more advanced techniques, such as convolutional neural networks (CNNs), can achieve even higher accuracy on this dataset, often exceeding 99%.

**Advantages and Limitations of kNN for MNIST**

**Advantages:**
- Simple to implement and understand
- No explicit training phase (lazy learning): fitting only stores the training data, optionally in a search index
- Can perform well on datasets with clear decision boundaries

**Limitations:**
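- Slow prediction on large datasets, because each query must be compared against many or all stored training samples
- Requires a lot of memory to store the entire training set (here, 60,000 x 784 float32 values, about 188 MB)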
- Sensitive to irrelevant features and the scale of the data
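
The sensitivity to feature scale can be shown with a tiny, made-up example. The sketch below uses a 1-nearest-neighbor lookup on four hypothetical points, where the second feature is measured in much larger units than the first; the names `nearest_label`, `pts`, `lbl`, and `query` are assumptions for this illustration. Before rescaling, the large-unit feature dominates the distance; after min-max scaling both features to [0, 1], the nearest neighbor changes.

```python
import numpy as np

def nearest_label(X, y, q):
    """Label of the single training point closest to q (Euclidean distance)."""
    dists = np.linalg.norm(X - q, axis=1)
    return int(y[np.argmin(dists)])

# Hypothetical data: feature 0 lies in [0, 1], feature 1 in [0, 1000].
pts = np.array([[0.0, 400.0],
                [1.0, 300.0],
                [0.0, 0.0],
                [1.0, 1000.0]])
lbl = np.array([0, 1, 0, 1])
query = np.array([0.1, 300.0])

# Raw features: the large-scale second feature dominates the distance.
raw_label = nearest_label(pts, lbl, query)

# Min-max scale each feature to [0, 1] using the training minimum and maximum.
lo, hi = pts.min(axis=0), pts.max(axis=0)
scaled_label = nearest_label((pts - lo) / (hi - lo), lbl, (query - lo) / (hi - lo))

print(raw_label, scaled_label)
```

Here the raw features give label 1 and the scaled features give label 0. For MNIST this issue is mild, because every pixel already shares the same 0-255 range: dividing all pixels by 255 rescales every distance by the same factor and does not change which neighbors are nearest.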

**Conclusion**

We have successfully trained a k-Nearest Neighbors classifier to recognize handwritten digits from the MNIST dataset.
While kNN provides a good baseline, more advanced techniques such as CNNs may be necessary for state-of-the-art results
on this and similar image classification tasks.