# Lab Assignment 1

Student name: [Mukund Mahesan]

## Notebook version

This notebook includes all the codes in the codebase of lab assignment 1. Completing and submitting this script is equivalent to submitting the codebase. Please note that your submitted script should include errorless cell outputs that contain necessary information that proves you have successfully run the notebook in your own directory.

You can choose to (1) run this notebook locally on your end or (2) run this notebook on colab. For the former, you will need to download the dataset to your device that resembles the instructions for the codebase. For the latter, **you will need to upload the dataset to your Google Drive** account, and connect your colab notebook to your Google Drive. Then, go to "File->Save a copy in Drive" to create a copy you can edit.


#### Colab (if applicable)

If you are running this script on colab, uncomment and run the cell below:

In [None]:
# from google.colab import drive
# drive.mount('/content/drive')

Note that the Google Drive directory has the root `/content/drive/`. For instance, my directory to the dataset is `'/content/drive/My Drive/Courses/CSCI 5922/CSCI 5922 SP25/Demo/MNIST/'`.

### mnist.py

In [27]:
#Original source: https://www.kaggle.com/code/hojjatk/read-mnist-dataset
#It has been modified for ease of use w/ pytorch

#You do NOT need to modify ANY code in this file!

import numpy as np
import struct
from array import array
import torch

class MnistDataloader(object):
    def __init__(self, training_images_filepath,training_labels_filepath,
                 test_images_filepath, test_labels_filepath):
        self.training_images_filepath = training_images_filepath
        self.training_labels_filepath = training_labels_filepath
        self.test_images_filepath = test_images_filepath
        self.test_labels_filepath = test_labels_filepath

    def read_images_labels(self, images_filepath, labels_filepath):
        n = 60000 if "train" in images_filepath else 10000
        labels = torch.zeros((n, 10))
        with open(labels_filepath, 'rb') as file:
            magic, size = struct.unpack(">II", file.read(8))
            if magic != 2049:
                raise ValueError('Magic number mismatch, expected 2049, got {}'.format(magic))
            l = torch.tensor(array("B", file.read())).unsqueeze(-1)
            l = torch.concatenate((torch.arange(0, n).unsqueeze(-1), l), dim = 1).type(torch.int32)
            labels[l[:,0], l[:,1]] = 1

        with open(images_filepath, 'rb') as file:
            magic, size, rows, cols = struct.unpack(">IIII", file.read(16))
            if magic != 2051:
                raise ValueError('Magic number mismatch, expected 2051, got {}'.format(magic))
            image_data = array("B", file.read())
        images = torch.zeros((n, 28**2))
        for i in range(size):
            img = np.array(image_data[i * rows * cols:(i + 1) * rows * cols])
            #img = img.reshape(28, 28)
            images[i, :] = torch.tensor(img)

        return images, labels

    def load_data(self):
        x_train, y_train = self.read_images_labels(self.training_images_filepath, self.training_labels_filepath)
        x_test, y_test = self.read_images_labels(self.test_images_filepath, self.test_labels_filepath)
        return (x_train, y_train),(x_test, y_test)

### activations.py

In [154]:
import torch

class ReLU():
    #Complete this class
    def forward(x: torch.tensor) -> torch.tensor:
        #implement ReLU(x) here
        return torch.max(torch.tensor(0, dtype=x.dtype, device=x.device), x)

    def backward(delta: torch.tensor, x: torch.tensor) -> torch.tensor:
        #implement delta * ReLU'(x) here
        return torch.where(x > 0, delta, torch.tensor(0., dtype=x.dtype, device=x.device))

class LeakyReLU():
    #Complete this class
    def forward(x: torch.tensor) -> torch.tensor:
        #implement LeakyReLU(x) here
        return torch.where(x > 0, x, x * 0.1)

    def backward(delta: torch.tensor, x: torch.tensor) -> torch.tensor:
        #implement delta * LeakyReLU'(x) here
        return torch.where(x > 0, delta, delta * 0.1)
        

### framework.py

In [167]:
import torch
import numpy as np
import tqdm

class MLP:
    '''
    This class should implement a generic MLP learning framework. The core structure of the program has been provided for you.
    But, you need to complete the following functions:
    1: initialize()
    2: forward(), including activations
    3: backward(), including activations
    4: TrainMLP()
    '''
    def __init__(self, layer_sizes: list[int]):
        #Storage for model parameters
        self.layer_sizes: list[int] = layer_sizes
        self.num_layers = len(layer_sizes) - 1
        self.weights: list[torch.tensor] = []
        self.biases: list[torch.tensor] = []

        #Temporary data
        self.features: list[torch.tensor] = []

        #hyper-parameters w/ default values
        self.learning_rate: float = 1
        self.batch_size: int = 1
        self.activation_function: callable[[torch.tensor], torch.tensor] = ReLU

    def set_hp(self, lr: float, bs: int, activation: object) -> None:
        self.learning_rate = lr
        self.batch_size = bs
        self.activation_function = activation

        return

    def initialize(self) -> None:
        #Complete this function

        '''
        initialize all biases to zero, and all weights with random sampling from a unifrom distribution.
        This uniform distribution should have range +/- sqrt(6 / (d_in + d_out))
        '''
        for i in range(self.num_layers):  # Iterate over layers (excluding input)
            d_in, d_out = self.layer_sizes[i], self.layer_sizes[i + 1]
            limit = (6 / (d_in + d_out)) ** 0.5
            W = torch.empty(d_out, d_in).uniform_(-limit, limit)
            self.weights.append(W)
            temp_b = torch.zeros(d_out)
            self.biases.append(temp_b)


        return

    def forward(self, x: torch.tensor) -> torch.tensor:
        #Complete this function

        '''
        This function should loop over all layers, forward propagating the input via:
        x_i+1 = f(x_iW + b)
        Remember to STORE THE INTERMEDIATE FEATURES!
        '''
        self.features=[x]
        z=x
        for i in range(self.num_layers):
            z=torch.matmul(z, self.weights[i].T) + self.biases[i]
            if i < self.num_layers -1:
                z=self.activation_function.forward(z)
            self.features.append(z)

        z=(torch.exp(z))
        return z/ (torch.sum(z, dim=1, keepdim=True))

    def backward(self, delta: torch.tensor) -> None:
        #Complete this function

        '''
        This function should backpropagate the provided delta through the entire MLP, and update the weights according to the hyper-parameters
        stored in the class variables.
        '''
        batch_size = delta.shape[0]
        gradients_w = []
        gradients_b = []

        for i in reversed(range(self.num_layers)):
            # if i < self.num_layers - 1:
            #     delta = self.activation_function.backward(delta, self.features[i+1])
            
            gradients_w.insert(0, torch.matmul(delta.T, self.features[i]) / batch_size)
            gradients_b.insert(0, torch.mean(delta, dim = 0))

            if i > 0:
                delta = torch.matmul(delta, self.weights[i])
                delta = self.activation_function.backward(delta, self.features[i])
        
        for i in range(self.num_layers):
            self.weights[i] -= self.learning_rate * gradients_w[i]
            self.biases[i] -= self.learning_rate * gradients_b[i]
        return


def TrainMLP(model: MLP, x_train: torch.tensor, y_train: torch.tensor) -> MLP:
    #Complete this function

    '''
    This function should train the MLP for 1 epoch, using the provided data and forward/backward propagating as necessary.
    '''

    #set up a random sampling of the data
    bs = model.batch_size
    N = x_train.shape[0]
    rng = np.random.default_rng()
    idx = rng.permutation(N)

    #variable to accumulate total loss over the epoch
    L = 0

    for i in tqdm.tqdm(range(N // bs)):
        x = x_train[idx[i * bs:(i + 1) * bs], ...]
        y = y_train[idx[i * bs:(i + 1) * bs], ...]

        #forward propagate and compute loss (l) here
        prediction=model.forward(x)
        l=-(torch.sum(y* torch.log(prediction)))
        if not torch.isnan(l):
            L += l
        delta=prediction-y
        #backpropagate here
        model.backward(delta)

    print("Train Loss:", L / ((N // bs) * bs))
    return


def TestMLP(model: MLP, x_test: torch.tensor, y_test: torch.tensor) -> tuple[float, float]:
    bs = model.batch_size
    N = x_test.shape[0]

    rng = np.random.default_rng()
    idx = rng.permutation(N)

    L = 0
    A = 0

    for i in tqdm.tqdm(range(N // bs)):
        x = x_test[idx[i * bs:(i + 1) * bs], ...]
        y = y_test[idx[i * bs:(i + 1) * bs], ...]

        y_hat = model.forward(x)
        p = torch.exp(y_hat)
        p /= torch.sum(p, dim = 1, keepdim = True)
        l = -1 * torch.sum(y * torch.log(p))
        L += l

        A += torch.sum(torch.where(torch.argmax(p, dim = 1) == torch.argmax(y, dim = 1), 1, 0))

    print("Test Loss:", L / ((N // bs) * bs), "Test Accuracy: {:.2f}%".format(100 * A / ((N // bs) * bs)))

def normalize_mnist() -> tuple[torch.tensor, torch.tensor, torch.tensor, torch.tensor]:
    '''
    This function loads the MNIST dataset, then normalizes the "X" values to have zero mean, unit variance.
    '''

    #IMPORTANT!!!#
    #UPDATE THE PATH BELOW!#
    base_path = "/Users/mukund/Documents/Neural Nets/MNIST/"
    #^^^^^^^^#


    mnist = MnistDataloader(base_path + "train-images.idx3-ubyte", base_path + "train-labels.idx1-ubyte",
                            base_path + "t10k-images.idx3-ubyte", base_path + "t10k-labels.idx1-ubyte")
    (x_train, y_train), (x_test, y_test) = mnist.load_data()

    x_mean = torch.mean(x_train, dim = 0, keepdim = True)
    x_std = torch.std(x_train, dim = 0, keepdim = True)

    x_train -= x_mean
    x_train /= x_std
    x_train[x_train != x_train] = 0

    x_test -= x_mean
    x_test /= x_std
    x_test[x_test != x_test] = 0

    return x_train, y_train, x_test, y_test

def main():
    '''
    This is an example of how to use the framework when completed. You can build off of this code to design your experiments for part 2.
    '''

    x_train, y_train, x_test, y_test = normalize_mnist()

    '''
    For the experiment, adjust the list [784,...,10] as desired to test other architectures.
    You are encouraged to play around with any of the following values if you so desire:
    E, lr, bs, activation
    '''

    model = MLP([784, 256, 10])
    model.initialize()
    model.set_hp(lr = 1e-6, bs = 512, activation = ReLU)

    E = 25
    for _ in range(E):
        TrainMLP(model, x_train, y_train)
        TestMLP(model, x_test, y_test)

if __name__ == "__main__":
    main()

100%|███████████████████████████████████████| 117/117 [00:00<00:00, 1089.88it/s]


Train Loss: tensor(2.9120)


100%|█████████████████████████████████████████| 19/19 [00:00<00:00, 2031.66it/s]


Test Loss: tensor(nan) Test Accuracy: 9.17%


100%|███████████████████████████████████████| 117/117 [00:00<00:00, 1158.53it/s]


Train Loss: tensor(2.9113)


100%|█████████████████████████████████████████| 19/19 [00:00<00:00, 2229.76it/s]


Test Loss: tensor(nan) Test Accuracy: 9.20%


100%|███████████████████████████████████████| 117/117 [00:00<00:00, 1279.26it/s]


Train Loss: tensor(2.9103)


100%|█████████████████████████████████████████| 19/19 [00:00<00:00, 2342.43it/s]


Test Loss: tensor(nan) Test Accuracy: 9.15%


100%|███████████████████████████████████████| 117/117 [00:00<00:00, 1426.34it/s]


Train Loss: tensor(2.9094)


100%|█████████████████████████████████████████| 19/19 [00:00<00:00, 2248.51it/s]


Test Loss: tensor(2.3151) Test Accuracy: 9.18%


100%|███████████████████████████████████████| 117/117 [00:00<00:00, 1403.50it/s]


Train Loss: tensor(2.9086)


100%|█████████████████████████████████████████| 19/19 [00:00<00:00, 1983.71it/s]


Test Loss: tensor(nan) Test Accuracy: 9.29%


100%|███████████████████████████████████████| 117/117 [00:00<00:00, 1395.16it/s]


Train Loss: tensor(2.9077)


100%|█████████████████████████████████████████| 19/19 [00:00<00:00, 2407.53it/s]


Test Loss: tensor(nan) Test Accuracy: 9.32%


100%|███████████████████████████████████████| 117/117 [00:00<00:00, 1263.04it/s]


Train Loss: tensor(2.9065)


100%|█████████████████████████████████████████| 19/19 [00:00<00:00, 2150.52it/s]


Test Loss: tensor(nan) Test Accuracy: 9.19%


100%|███████████████████████████████████████| 117/117 [00:00<00:00, 1347.68it/s]


Train Loss: tensor(2.9057)


100%|█████████████████████████████████████████| 19/19 [00:00<00:00, 2356.42it/s]


Test Loss: tensor(nan) Test Accuracy: 9.25%


100%|███████████████████████████████████████| 117/117 [00:00<00:00, 1360.91it/s]


Train Loss: tensor(2.9050)


100%|█████████████████████████████████████████| 19/19 [00:00<00:00, 2330.37it/s]


Test Loss: tensor(nan) Test Accuracy: 9.30%


100%|███████████████████████████████████████| 117/117 [00:00<00:00, 1373.88it/s]


Train Loss: tensor(2.9040)


100%|█████████████████████████████████████████| 19/19 [00:00<00:00, 2317.09it/s]


Test Loss: tensor(nan) Test Accuracy: 9.32%


100%|███████████████████████████████████████| 117/117 [00:00<00:00, 1394.28it/s]


Train Loss: tensor(2.9031)


100%|█████████████████████████████████████████| 19/19 [00:00<00:00, 2125.79it/s]


Test Loss: tensor(nan) Test Accuracy: 9.34%


100%|███████████████████████████████████████| 117/117 [00:00<00:00, 1191.24it/s]


Train Loss: tensor(2.9025)


100%|█████████████████████████████████████████| 19/19 [00:00<00:00, 1573.26it/s]


Test Loss: tensor(nan) Test Accuracy: 9.38%


100%|███████████████████████████████████████| 117/117 [00:00<00:00, 1269.48it/s]


Train Loss: tensor(2.9009)


100%|█████████████████████████████████████████| 19/19 [00:00<00:00, 1945.08it/s]


Test Loss: tensor(nan) Test Accuracy: 9.39%


100%|███████████████████████████████████████| 117/117 [00:00<00:00, 1243.75it/s]


Train Loss: tensor(2.9003)


100%|█████████████████████████████████████████| 19/19 [00:00<00:00, 1798.06it/s]


Test Loss: tensor(nan) Test Accuracy: 9.42%


100%|███████████████████████████████████████| 117/117 [00:00<00:00, 1048.07it/s]


Train Loss: tensor(2.8995)


100%|█████████████████████████████████████████| 19/19 [00:00<00:00, 1973.99it/s]


Test Loss: tensor(nan) Test Accuracy: 9.46%


100%|███████████████████████████████████████| 117/117 [00:00<00:00, 1013.52it/s]


Train Loss: tensor(2.8991)


100%|█████████████████████████████████████████| 19/19 [00:00<00:00, 1832.92it/s]


Test Loss: tensor(nan) Test Accuracy: 9.32%


100%|███████████████████████████████████████| 117/117 [00:00<00:00, 1008.32it/s]


Train Loss: tensor(2.8981)


100%|█████████████████████████████████████████| 19/19 [00:00<00:00, 1422.91it/s]


Test Loss: tensor(nan) Test Accuracy: 9.43%


100%|███████████████████████████████████████| 117/117 [00:00<00:00, 1000.02it/s]


Train Loss: tensor(2.8970)


100%|█████████████████████████████████████████| 19/19 [00:00<00:00, 1594.60it/s]


Test Loss: tensor(nan) Test Accuracy: 9.46%


100%|███████████████████████████████████████| 117/117 [00:00<00:00, 1080.84it/s]


Train Loss: tensor(2.8964)


100%|█████████████████████████████████████████| 19/19 [00:00<00:00, 1504.95it/s]


Test Loss: tensor(nan) Test Accuracy: 9.35%


100%|███████████████████████████████████████| 117/117 [00:00<00:00, 1009.87it/s]


Train Loss: tensor(2.8955)


100%|█████████████████████████████████████████| 19/19 [00:00<00:00, 1674.87it/s]


Test Loss: tensor(nan) Test Accuracy: 9.49%


100%|███████████████████████████████████████| 117/117 [00:00<00:00, 1007.40it/s]


Train Loss: tensor(2.8944)


100%|█████████████████████████████████████████| 19/19 [00:00<00:00, 1647.17it/s]


Test Loss: tensor(nan) Test Accuracy: 9.45%


100%|████████████████████████████████████████| 117/117 [00:00<00:00, 988.14it/s]


Train Loss: tensor(2.8929)


100%|█████████████████████████████████████████| 19/19 [00:00<00:00, 1687.56it/s]


Test Loss: tensor(nan) Test Accuracy: 9.47%


100%|███████████████████████████████████████| 117/117 [00:00<00:00, 1169.87it/s]


Train Loss: tensor(2.8930)


100%|█████████████████████████████████████████| 19/19 [00:00<00:00, 1839.82it/s]


Test Loss: tensor(nan) Test Accuracy: 9.49%


100%|███████████████████████████████████████| 117/117 [00:00<00:00, 1369.17it/s]

Train Loss: 




tensor(2.8920)


100%|█████████████████████████████████████████| 19/19 [00:00<00:00, 2449.49it/s]


Test Loss: tensor(nan) Test Accuracy: 9.42%


100%|███████████████████████████████████████| 117/117 [00:00<00:00, 1344.24it/s]


Train Loss: tensor(2.8908)


100%|█████████████████████████████████████████| 19/19 [00:00<00:00, 2001.45it/s]

Test Loss: tensor(nan) Test Accuracy: 9.41%



