# Assignment 3: Classification of Image Data
- COMP 551 Winter 2024, McGill University
- Rudolf Kischer: 260956107


### Synopsis
- In this miniproject, we implement a multilayer perceptron from scratch and use it to classify image data. The goal is to build a basic neural network and its training algorithm, gaining hands-on experience with the key decisions involved in training such models. We also experiment with convolutional neural networks.

# Data

- [Sign Language MNIST](https://www.kaggle.com/datasets/datamunge/sign-language-mnist/data)
- Features: 28 x 28 grayscale images of hand symbols (784 pixels/features)
- Labels: 24 classes of letters (excludes 9=J and 25=Z because they require motion)
- Train: 27,455
- Test: 7,172
- Most of the images were produced by altering 1,704 uncropped color images
- These alterations include the following:
    - "To create new data, an image pipeline was used based on ImageMagick and included cropping to hands-only, gray-scaling, resizing, and then creating at least 50+ variations to enlarge the quantity. The modification and expansion strategy was filters ('Mitchell', 'Robidoux', 'Catrom', 'Spline', 'Hermite'), along with 5% random pixelation, +/- 15% brightness/contrast, and finally 3 degrees rotation."
- CSV format: (label, pixel1, pixel2, ..., pixel784)
- <img width=400 src="https://storage.googleapis.com/kagglesdsdata/datasets/3258/5337/amer_sign3.png?X-Goog-Algorithm=GOOG4-RSA-SHA256&X-Goog-Credential=databundle-worker-v2@kaggle-161607.iam.gserviceaccount.com/20240314/auto/storage/goog4_request&X-Goog-Date=20240314T192335Z&X-Goog-Expires=345600&X-Goog-SignedHeaders=host&X-Goog-Signature=5090d6842cb28ba5080a37a44706cfab4cba880c104d7acf1510df2a187f3c644bccde7d786cf964d8704f172a1b288bff914ae767deace400edfbd0610023d7cc6c6e329c2d365dc9f5a81c6bfe641800d6c7ecb500470fb48cabf2b555080be0f07559522be5487e6f3f456e8c20b909a818ffd6eaf2658089c82659443e1df42d0c06956fd5f46d9d1b9dfd6458ab03e47796b278463a2d1ebbeac2328b7ba668662807ce3b138e72afca7e9f29d4d01854d0ed4e8416afc4206787976e861cc0f14d9755542f06ee1a52e71e16a112f7e2e1e53a6136d711f54a64e8ad531c07083108fd034a1bf8cf04a5c9a13f94ca6fa0291fb1c60dc9b7629095b1a9"/>
- <img width=400 src="https://storage.googleapis.com/kagglesdsdata/datasets/3258/5337/american_sign_language.PNG?X-Goog-Algorithm=GOOG4-RSA-SHA256&X-Goog-Credential=databundle-worker-v2@kaggle-161607.iam.gserviceaccount.com/20240315/auto/storage/goog4_request&X-Goog-Date=20240315T200917Z&X-Goog-Expires=345600&X-Goog-SignedHeaders=host&X-Goog-Signature=208afda814e246ab63f6d5f39436a53bcb0aa69be2ca78290e4caf450a4b47f8ea3fd9686815a4836e37c35682e0cc70c122f52f664ee18fa36407caa91f3f00b3874aa926ca4fab2d2366e114da10cad6014c1e8d978a80a150c45e3b4b5a855756a5d8ac9e1c606674728b5868a48e954329c9b41af9a3a0b912fedecf4d2bca40407add7f87d4c4bd57a423dbb4257b73fe0bf5830b81eadea549a41dd70b47c9acc9150078416f517b2814578506b379aee8543fe99f8e060ac978dd21dfc9dab5d7702a1d16b9ccc330500f9204e43ca21462f6bb3f48a70a70c7445f88a0a02e96a587c4babb965eb12adc6b2ca82e3cd6e91026f5204e93edc8bb82f2"/>
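
As a rough illustration of the CSV layout described above, the sketch below parses a small synthetic file with the same column scheme (`label, pixel1, ..., pixel784`) into 28 x 28 images and maps the raw labels to letters. The helper names `parse_row` and `label_to_letter` and the random pixel values are assumptions made for this example; they are not part of the dataset or of this assignment's code.

```python
import csv
import io
import string

import numpy as np

def label_to_letter(label):
    # Raw labels index the alphabet directly (0=A, ..., 24=Y); 9=J and 25=Z never occur
    return string.ascii_uppercase[label]

def parse_row(row):
    # First field is the label, the remaining 784 fields are pixel intensities
    label = int(row[0])
    image = np.array([int(v) for v in row[1:]], dtype=np.uint8).reshape(28, 28)
    return label, image

# Build a synthetic two-row CSV that mimics the dataset's column layout
rng = np.random.default_rng(0)
header = ["label"] + [f"pixel{i}" for i in range(1, 785)]
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(header)
for label in (0, 10):
    writer.writerow([label] + rng.integers(0, 256, size=784).tolist())

# Read it back, skipping the header row
buffer.seek(0)
reader = csv.reader(buffer)
next(reader)
parsed = [parse_row(row) for row in reader]

letters = [label_to_letter(label) for label, _ in parsed]
images = np.stack([image for _, image in parsed])
print(images.shape, letters)
```

Note that this mapping applies to the raw labels only. Once the labels are one-hot encoded into 24 columns, the column index no longer equals the raw label for letters after I.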


# Setup

In [None]:
# IMPORTS

import csv
import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelBinarizer
import matplotlib.pyplot as plt

## Data Processing
- To prepare the data for use, we need to:
  - Download the dataset
  - Load the dataset
  - Separate it into X and Y
  - Vectorize the image data
  - Center and normalize the data
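
One detail of the centering and normalization step is worth spelling out: a common practice is to compute the mean and standard deviation on the training split only and reuse them for the test split, so both splits are on the same scale. The sketch below shows this on random stand-in data. The function names `fit_standardizer` and `apply_standardizer` and the `eps` constant are assumptions made for illustration, not part of this assignment's code.

```python
import numpy as np

def fit_standardizer(X_train, eps=1e-8):
    # Per-feature mean and standard deviation over the sample axis
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0) + eps  # eps avoids division by zero for constant pixels
    return mean, std

def apply_standardizer(X, mean, std):
    # Reuse statistics fitted elsewhere (typically on the training split)
    return (X - mean) / std

# Random stand-in data with the same layout as the images: N x H x W x C
rng = np.random.default_rng(0)
X_train = rng.integers(0, 256, size=(100, 28, 28, 1)).astype(np.float64)
X_test = rng.integers(0, 256, size=(20, 28, 28, 1)).astype(np.float64)

mean, std = fit_standardizer(X_train)
X_train_std = apply_standardizer(X_train, mean, std)
X_test_std = apply_standardizer(X_test, mean, std)
print(X_train_std.shape, X_test_std.shape)
```

With this approach the standardized test set generally does not have exactly zero mean and unit variance, which is expected.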

In [None]:

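def download_mnist_sign_language():
  # Requires the Kaggle CLI, configured with API credentials
  import subprocess
  subprocess.run(['kaggle', 'datasets', 'download', '-d', 'datamunge/sign-language-mnist'], check=True)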


In [None]:
def load_csv_dataset(csv_path):
  df = pd.read_csv(csv_path)
  return df

def load_mnist_sign_dataset():
  # Note the return order: (test, train)
  dataset_directory = 'data/archive'
  test_set_path = f'{dataset_directory}/sign_mnist_test/sign_mnist_test.csv'
  train_set_path = f'{dataset_directory}/sign_mnist_train/sign_mnist_train.csv'
  df_test = load_csv_dataset(test_set_path)
  df_train = load_csv_dataset(train_set_path)
  # df_train.info()
  # df_test.info()
  return df_test, df_train

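def shape(df):
  # One-hot encode the labels: the 24 classes present become 24 columns
  Y = df['label']
  lb = LabelBinarizer()
  Y = lb.fit_transform(Y)
  # Reshape the flat pixel columns into N x H x W x C image tensors
  X = df.drop(['label'],axis=1)
  X = X.values.reshape(-1,28,28,1)
  return X, Y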

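def standardize_data(X):
  # Center each feature by subtracting its mean over the samples (axis 0),
  # then divide by its standard deviation.
  # X has shape N x D or N x H x W x C
  X = X - X.mean(axis=0)  # creates a new array; the input is not modified
  std = X.std(axis=0)
  # a small epsilon guards against division by zero for constant pixels
  return X / (std + 1e-8)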

# One-hot column index -> letter. J (9) and Z (25) are absent from the dataset,
# so the 24 columns map to the alphabet with those two letters removed.
LETTERS = 'ABCDEFGHIKLMNOPQRSTUVWXY'

def display_image(X, Y, index):
  plt.imshow(X[index].reshape(28,28), cmap='gray')
  plt.title(f'{LETTERS[np.argmax(Y[index])]}')
  plt.show()

def display_images(X, Y, indices, title=None):
  # Plot the first 9 entries of `indices` in a 3 x 3 grid, with their labels
  fig, ax = plt.subplots(3, 3, figsize=(10,10))
  if title:
    fig.suptitle(title)
  for i in range(3):
    for j in range(3):
      idx = indices[i*3+j]
      ax[i,j].imshow(X[idx].reshape(28,28), cmap='gray')
      ax[i,j].set_title(f'{idx}: {LETTERS[np.argmax(Y[idx])]}')

  

test, train = load_mnist_sign_dataset()
X_e, Y_e = shape(test)
X_t, Y_t = shape(train)

display_images(X_e, Y_e, [0,1,2,3,4,5,6,7,8], 'Test Set')
display_images(X_t, Y_t, [0,1,2,3,4,5,6,7,8], 'Train Set')


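# Standardize the test set with the training-set statistics so both splits share one scale
mean_t, std_t = X_t.mean(axis=0), X_t.std(axis=0)
X_e = (X_e - mean_t) / (std_t + 1e-8)
X_t = standardize_data(X_t)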

# display the first 9 images
# display_images(X_e, Y_e, [0,1,2,3,4,5,6,7,8], 'Test Set Standardized')
display_images(X_t, Y_t, [0,1,2,3,4,5,6,7,8], 'Train Set Standardized')

# print the dimensions
print(f'Test Set: X:{X_e.shape} Y:{Y_e.shape}')
print(f'Train Set: X:{X_t.shape} Y:{Y_t.shape}')




# Model
- We will implement a deep multilayer perceptron
- It should be customizable in depth, width, and activation function to support experimentation
- We will also implement convolutional layers to measure the resulting performance gain
- Each model is composed of multiple layers, each of which has a forward function and a backward function; the backward function is used to update that layer's weights


#### Activation functions

In [None]:
def relu(Z):
  return np.maximum(0, Z)

def sigmoid(Z):
  return 1 / (1 + np.exp(-Z))

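def softmax(Z):
  # Subtract the row-wise max for numerical stability; each row of Z is one sample
  expZ = np.exp(Z - np.max(Z, axis=1, keepdims=True))
  return expZ / expZ.sum(axis=1, keepdims=True)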
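
Backpropagating through these activations also requires their derivatives. The sketch below is a minimal illustration, assuming the derivative functions receive the pre-activation `Z`. The names `relu_grad`, `sigmoid_grad`, and `numerical_grad` are hypothetical and not part of this assignment's code. Each analytic derivative is compared against a central finite difference as a sanity check.

```python
import numpy as np

def relu(Z):
    return np.maximum(0, Z)

def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))

def relu_grad(Z):
    # Derivative of ReLU: 1 where Z > 0, else 0 (the value at Z == 0 is a convention)
    return (Z > 0).astype(Z.dtype)

def sigmoid_grad(Z):
    # Derivative of the sigmoid: s(Z) * (1 - s(Z))
    S = sigmoid(Z)
    return S * (1 - S)

def numerical_grad(f, Z, h=1e-6):
    # Central finite difference, valid here because f acts elementwise
    return (f(Z + h) - f(Z - h)) / (2 * h)

# Test points chosen away from 0, where ReLU is not differentiable
Z = np.array([[-2.0, -0.5, 0.5, 2.0]])
relu_ok = np.allclose(relu_grad(Z), numerical_grad(relu, Z), atol=1e-5)
sigmoid_ok = np.allclose(sigmoid_grad(Z), numerical_grad(sigmoid, Z), atol=1e-5)
print(relu_ok, sigmoid_ok)
```

Softmax is left out on purpose: when it is paired with a cross-entropy loss, the gradient with respect to the logits simplifies to the predicted probabilities minus the one-hot targets, so a separate softmax derivative is often unnecessary.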

### Linear Layer
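
The model description calls for layers that each expose a forward and a backward function. The sketch below shows one way a fully connected layer could look under those requirements. The class name `LinearLayer`, the weight initialization scale, and the plain gradient descent update with learning rate `lr` are all assumptions made for illustration, not the implementation used in this assignment.

```python
import numpy as np

class LinearLayer:
    """Fully connected layer computing X @ W + b (hypothetical sketch)."""

    def __init__(self, in_features, out_features, rng=None):
        if rng is None:
            rng = np.random.default_rng(0)
        # Small random weights; the 0.01 scale is an assumption
        self.W = rng.normal(0.0, 0.01, size=(in_features, out_features))
        self.b = np.zeros(out_features)
        self.X = None

    def forward(self, X):
        # Cache the input; the backward pass needs it to compute dW
        self.X = X
        return X @ self.W + self.b

    def backward(self, d_out, lr=0.1):
        # d_out is the gradient of the loss w.r.t. this layer's output (N x out)
        dW = self.X.T @ d_out
        db = d_out.sum(axis=0)
        # Gradient w.r.t. the input, computed before the weights change
        dX = d_out @ self.W.T
        # Plain gradient descent step
        self.W -= lr * dW
        self.b -= lr * db
        return dX

# Shapes match the dataset: 784 input pixels, 24 output classes
rng = np.random.default_rng(1)
layer = LinearLayer(784, 24, rng=rng)
X = rng.normal(size=(5, 784))
W_before = layer.W.copy()

out = layer.forward(X)
d_out = np.ones_like(out) / len(X)
dX = layer.backward(d_out, lr=0.1)
print(out.shape, dX.shape)
```

Returning `dX` from `backward` lets layers be chained: each layer passes the gradient with respect to its input to the layer before it.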

# Experiments

### Exp. 1: MLP Layer Depth

### Exp. 2: MLP Activation Function

### Exp. 3: Regularization

### Exp. 4: Convolutional Neural Net

### Exp. 5: Optimizing MLP Architecture

### Exp. 6: Report Results

# Experiment Extensions

# Results And Conclusion