## Run 1

You should develop a simple k-nearest-neighbour classifier using the "tiny image" feature. The "tiny image" feature is one of the simplest possible image representations. One simply crops each image to a square about the centre, and then resizes it to a small, fixed resolution (we recommend 16x16). The pixel values can be packed into a vector by concatenating each image row. It tends to work slightly better if the tiny image is made to have zero mean and unit length. You can choose the optimal k-value for the classifier.
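
The feature described above can be sketched as a single function. This is a minimal illustration, not the code used in the cells below: the helper name `tiny_image` and its `size` parameter are assumptions introduced here, and the demo image is random noise rather than real data.

```python
import numpy as np
from PIL import Image

def tiny_image(img, size=16):
    """Return a zero-mean, unit-length tiny-image vector for a PIL image.

    Hypothetical helper: centre-crops to a square, resizes to size x size,
    flattens row by row, then normalises the vector.
    """
    img = img.convert('L')
    width, height = img.size
    side = min(width, height)
    # Centre the square crop window inside the image
    left = (width - side) // 2
    top = (height - side) // 2
    img = img.crop((left, top, left + side, top + side))
    img = img.resize((size, size))
    # flatten() concatenates the image rows into one vector
    vec = np.asarray(img, dtype=np.float64).flatten()
    vec -= vec.mean()
    norm = np.linalg.norm(vec)
    if norm > 0:  # guard against a constant image
        vec /= norm
    return vec

# Demo on a synthetic 60x40 (width x height) noise image
rng = np.random.default_rng(0)
demo = Image.fromarray(rng.integers(0, 256, size=(40, 60), dtype=np.uint8))
feature = tiny_image(demo)
print(feature.shape)  # (256,)
```

Cropping before resizing keeps the aspect ratio of the central region, whereas resizing a non-square image directly to 16x16 distorts it.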

#### Importing the necessary libraries

In [1]:
# Import necessary libraries
# os: for interacting with the operating system
import os
# NumPy: for numerical operations
import numpy as np
# PIL Image: for handling images
from PIL import Image
# KNeighborsClassifier: for k-nearest-neighbour classification
from sklearn.neighbors import KNeighborsClassifier

#### Load images and labels, and perform training

In [2]:
# Function to load images and labels from a folder
def load_images(folder):
    # List to store image data and labels
    images = [] 
    labels = []
    # Iterate over each label in the folder
    for label in os.listdir(folder):  
        label_folder = os.path.join(folder, label)
        # Check if it's a directory
        if os.path.isdir(label_folder):  
            # Iterate over each file in the label folder
            for filename in os.listdir(label_folder): 
                # Check if it's a JPEG file
                if filename.endswith('.jpg'):  
                    # Open the image, convert to grayscale, resize to 16x16, and flatten
                    # (no centre crop is applied here, so non-square images are squashed)
                    img = Image.open(os.path.join(label_folder, filename)).convert('L')
                    img = img.resize((16, 16))
                    img = np.array(img).flatten()
                    # Append the flattened image to the list
                    images.append(img)  
                    # Append the label to the list
                    labels.append(label)  
    # Convert lists to numpy arrays and return
    return np.array(images), np.array(labels)  

# Load training data
train_images, train_labels = load_images(r'C:/Users/DELL/Downloads/Coursework3/Coursework3/training/training')

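# Calculate a single global mean and standard deviation over all training pixels
# (dataset-level standardisation, not per-image zero mean and unit length)
train_mean = np.mean(train_images)
train_std = np.std(train_images)

# Normalize the training images with these global statistics
train_images_normalized = (train_images - train_mean) / train_std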

# Determine the number of classes and length of training data
num_classes = len(np.unique(train_labels))
data_length = len(train_labels)
print("Number of classes:", num_classes)
print("Length of training data:", data_length)

# Initialize k-nearest-neighbour classifier with k=15
knn_classifier = KNeighborsClassifier(n_neighbors=15)

# Train the classifier
knn_classifier.fit(train_images_normalized, train_labels)

# Compute accuracy on the training data
train_accuracy = knn_classifier.score(train_images_normalized, train_labels)
print("Accuracy on training data:", train_accuracy)

Number of classes: 15
Length of training data: 1500
Accuracy on training data: 0.25266666666666665
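
The accuracy above is measured on the same images the classifier was fitted to, so it is not a reliable basis for choosing k. One common alternative is to compare candidate values by cross-validation on the training set. The sketch below is an illustration under stated assumptions: `choose_k`, the candidate list, and the synthetic stand-in data are all hypothetical and are not part of the run above.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def choose_k(features, labels, candidate_ks, folds=5):
    """Return (best_k, scores); scores maps each k to its mean CV accuracy.

    Hypothetical helper for illustration only.
    """
    scores = {}
    for k in candidate_ks:
        clf = KNeighborsClassifier(n_neighbors=k)
        # Mean accuracy over the held-out folds
        scores[k] = cross_val_score(clf, features, labels, cv=folds).mean()
    best_k = max(scores, key=scores.get)
    return best_k, scores

# Synthetic stand-in data: two classes of 256-dimensional vectors
rng = np.random.default_rng(0)
features = np.vstack([
    rng.normal(0.0, 1.0, (60, 256)),
    rng.normal(0.5, 1.0, (60, 256)),
])
labels = np.array(['a'] * 60 + ['b'] * 60)

best_k, scores = choose_k(features, labels, [1, 3, 5, 9, 15])
print("Best k:", best_k)
```

In this notebook the same function could be called with `train_images_normalized` and `train_labels` in place of the synthetic arrays.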


#### Testing on unseen test data

In [3]:
# Initialize empty lists to store test images, filenames, and true labels
test_images = []  
test_files = []  
test_folder = r'C:/Users/DELL/Downloads/Coursework3/Coursework3/testing/testing'
true_labels = []  

# Loop through each file in the test folder
for filename in os.listdir(test_folder):
    # Open the image, convert to grayscale, resize to 16x16, and flatten
    img = Image.open(os.path.join(test_folder, filename)).convert('L')
    img = img.resize((16, 16))
    img = np.array(img).flatten()
    # Append the flattened image and filename to the lists
    test_images.append(img)  
    test_files.append(filename)
    
    # Extract the true label from the filename
    true_label = filename.split('.')[0]
    # Append the true label to the list
    true_labels.append(true_label) 

# Keep only the part of each name before the first underscore
true_labels = [label.split('_')[0] for label in true_labels]

# Convert test images list to numpy array
test_images = np.array(test_images)

# Print the number of test images and true labels
print("Number of test images:", len(test_images))
print("Number of true labels:", len(true_labels))

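# Normalize the test images with the training mean and standard deviation,
# so they match the features the classifier was trained on
test_images_normalized = (test_images - train_mean) / train_std

# Make predictions on the normalized test images using the trained KNN classifier
predictions = knn_classifier.predict(test_images_normalized)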

Number of test images: 2985
Number of true labels: 2985


#### Writing the results in run1.txt file

In [4]:
# Open a file named 'run1.txt' in write mode
with open('run1.txt', 'w') as file:
    # Iterate through each test filename and corresponding prediction
    for filename, prediction in zip(test_files, predictions):
        # Write the filename and its corresponding prediction to the file
        file.write(f"{filename} {prediction}\n")
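
A quick sanity check on the written file is to read it back and count how many predictions fall in each class. The following is a minimal sketch: `summarise_run_file` is a hypothetical helper, the class names in the demo are made up, and it assumes each line has the form `filename label` with no spaces in the label.

```python
import os
import tempfile
from collections import Counter

def summarise_run_file(path):
    """Count predictions per class in a 'filename label' run file.

    Hypothetical helper; assumes the label is the last space-separated token.
    """
    counts = Counter()
    with open(path) as run_file:
        for line in run_file:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            filename, label = line.rsplit(' ', 1)
            counts[label] += 1
    return counts

# Demo on a small made-up file rather than the real run1.txt
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, 'demo_run.txt')
    with open(demo_path, 'w') as demo_file:
        demo_file.write("0.jpg forest\n1.jpg kitchen\n2.jpg forest\n")
    counts = summarise_run_file(demo_path)

print(counts.most_common())
```

A heavily skewed distribution, such as nearly every image assigned to one class, can indicate a preprocessing mismatch between training and test features.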