In [10]:
# CAP 5415 Programming Assignment 03: Nearest Neighbor Classification

"""
Due Date: 10Nov2023
Author: Lam Nguyen

Subject: Nearest Neighbor Classification [2.5 pts]

Overview:

Implement the Nearest Neighbor Classifier for digit classification. We will use the digit dataset available from the sklearn library. 

Tasks:

1. Import and process the dataset.
    a. There are around 1800 images 
    b. 10 digit classes
    c. Each image is 8x8 single channel.
    d. Split the dataset into training and testing, keep 500 images for testing
        i. Choose randomly with 50 images per class

2. Implement Nearest Neighbor Classification using pixels as features. Test the method for classification accuracy.

3. Implement a k-nearest neighbor classifier using pixels as features.
    a. Test the method for k = 3, 5, and 7 and compute classification accuracy.

4. Create a short writeup about the implementation with results:
    1. Accuracy scores for all the variations
    2. Compare the variations using accuracy scores.
    3. Comment on how the accuracy changes when you increase the value of k

Note: You can use L2-Norm for distance between 2 samples.
        

Sources:

Train Test Split: https://www.geeksforgeeks.org/how-to-split-the-dataset-with-scikit-learns-train_test_split-function/
K-Nearest Neighbors: https://www.youtube.com/watch?v=ngLyX54e1LU

"""


In [11]:
# ========================================================================================#
# 1. Load Modules
# ========================================================================================#

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
import numpy as np
from collections import Counter


def euclidean_distance(x1, x2):
    return np.sqrt(np.sum((x1-x2)**2))
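
The helper above measures the L2 distance between two samples at a time. As a minimal sketch, the same distance can be computed from one sample to every training row in a single NumPy call by relying on broadcasting. The function name `euclidean_distances_to_all` and the tiny arrays are illustrative assumptions, not part of the assignment code.

```python
import numpy as np


def euclidean_distances_to_all(x, train_features):
    """Return the L2 distance from one sample x to every row of train_features."""
    # Broadcasting subtracts x from each row; the sum runs over the feature axis.
    diffs = train_features - x
    return np.sqrt(np.sum(diffs ** 2, axis=1))


# Tiny illustrative data (not taken from the digits dataset).
train = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
query = np.array([0.0, 0.0])
distances = euclidean_distances_to_all(query, train)
print(distances)
```

This should give the same values as calling `euclidean_distance` once per training row, while avoiding a Python-level loop over the training set.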


In [12]:
# ========================================================================================#
# 2. Load and Process Data
# ========================================================================================#


digits = load_digits()  # Load the scikit-learn digits dataset

image_data = digits.data  # Flattened 8x8 images, one 64-value row per sample
image_targets = digits.target  # Digit label (0-9) for each sample


# Split data into training and test groups, holding out 500 images for testing.
# Note: a plain random split does not guarantee exactly 50 test images per class.
train_features, test_features, train_labels, test_labels = train_test_split(
    image_data, image_targets, test_size=500, random_state=42
)
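
The assignment asks for 50 randomly chosen test images per class, which `train_test_split` with `test_size=500` does not enforce on its own. One possible way to get exactly 50 per class is sketched below using only NumPy. The helper name `split_per_class`, its `seed` parameter, and the synthetic labels are assumptions made for illustration; applying it to `image_targets` would change the split and therefore the accuracies reported in this notebook.

```python
import numpy as np


def split_per_class(labels, n_per_class, seed=42):
    """Return (train_idx, test_idx) with exactly n_per_class test samples per label."""
    rng = np.random.default_rng(seed)
    test_idx = []
    for label in np.unique(labels):
        # Indices of every sample that carries this label.
        class_idx = np.flatnonzero(labels == label)
        # Draw n_per_class of them without replacement for the test set.
        test_idx.extend(rng.choice(class_idx, size=n_per_class, replace=False))
    test_idx = np.sort(np.array(test_idx))
    # Everything not chosen for testing goes to the training set.
    train_idx = np.setdiff1d(np.arange(len(labels)), test_idx)
    return train_idx, test_idx


# Synthetic labels for illustration: 10 classes with 100 samples each.
demo_labels = np.repeat(np.arange(10), 100)
train_idx, test_idx = split_per_class(demo_labels, n_per_class=50)
print(len(train_idx), len(test_idx))
print(np.bincount(demo_labels[test_idx]))
```

With the real data, the returned index arrays could be used to slice `image_data` and `image_targets` into training and test groups.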





In [13]:
# ========================================================================================#
# 3. Create K-Nearest Neighbors Classifier
# ========================================================================================#

class K_Nearest_Neighbors:
    
    def __init__(self, k):
        self.k = k


    # Store the training data; k-NN has no separate training step
    def fit(self, features, labels):
        self.train_features = features
        self.train_labels = labels

    # Predict labels for one or more samples
    def predict(self, features):
        # Classify each row of features individually
        predicted_labels = [self._predict(x) for x in features]
        return np.array(predicted_labels)


    
    def _predict(self, x):
        # Compute the distance from the new sample x to every training sample
        distances = [euclidean_distance(x, features) for features in self.train_features]

        # Sort by distance and keep the indices and labels of the k nearest training samples
        k_sorted_indices = np.argsort(distances)[:self.k]
        k_nearest_labels = [self.train_labels[i] for i in k_sorted_indices]

        # Majority vote: most_common(1) returns a one-element list holding (label, count)
        most_common = Counter(k_nearest_labels).most_common(1)

        # Return only the label, not its vote count
        return most_common[0][0]
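
With 10 classes, the majority vote can tie even for odd k, for example when the 3 nearest neighbors all carry different labels. The short sketch below shows how `Counter.most_common(1)` resolves such a tie: labels with equal counts keep the order in which they were first seen, so the label of the nearest neighbor wins because the list is ordered nearest first. The label values are hypothetical.

```python
from collections import Counter

# Hypothetical labels of the k=3 nearest neighbors, ordered nearest first.
k_nearest_labels = [8, 3, 5]

# Every label has one vote, so the first label encountered is returned.
winner, votes = Counter(k_nearest_labels).most_common(1)[0]
print(winner, votes)
```

This tie-breaking is a side effect of the vote implementation rather than an explicit design choice, so another k-NN implementation may resolve the same tie differently.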








In [16]:
# ========================================================================================#
# 4. Implement and test accuracy 
# ========================================================================================#

# Different k values. Note that k=1 implements the Nearest Neighbor Classifier specified in the assignment.

# The only difference between the two is that the Nearest Neighbor Classifier uses a single neighbor (k=1), whereas k-Nearest Neighbors takes a majority vote over the k nearest training samples.
k = [1, 3, 5, 7]

for value in k:

    model = K_Nearest_Neighbors(k=value)
    model.fit(train_features,train_labels)
    prediction = model.predict(test_features)
    accuracy = np.sum(prediction == test_labels) / len(test_labels)
    print(f'Accuracy with a k value of {value}: {accuracy}')


Accuracy with a k value of 1: 0.984
Accuracy with a k value of 3: 0.99
Accuracy with a k value of 5: 0.992
Accuracy with a k value of 7: 0.992
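
As a sanity check, the hand-written classifier can be compared against scikit-learn's `KNeighborsClassifier`, which uses the Euclidean (L2) distance by default. This is a sketch, not part of the assignment solution: the variable names are assumptions, the split is rebuilt with the same `test_size=500` and `random_state=42`, and the scores may differ slightly from those above because ties can be resolved differently.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Rebuild the same split used in the notebook.
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=500, random_state=42
)

reference_scores = {}
for k in [1, 3, 5, 7]:
    reference = KNeighborsClassifier(n_neighbors=k)
    reference.fit(X_train, y_train)
    # score() returns the mean accuracy on the given test data.
    reference_scores[k] = reference.score(X_test, y_test)
    print(f'scikit-learn accuracy with k={k}: {reference_scores[k]}')
```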
