# Lab Assignment 1

Student name: Cornelius Adejoro

## Notebook version

This notebook includes all the code in the Lab Assignment 1 codebase. Completing and submitting this notebook is equivalent to submitting the codebase. Your submitted notebook should include error-free cell outputs showing that you successfully ran the notebook in your own directory.

You can either (1) run this notebook locally or (2) run it on Colab. For the former, download the dataset to your device, following the instructions for the codebase. For the latter, **you will need to upload the dataset to your Google Drive** account and connect your Colab notebook to your Google Drive. Then go to "File->Save a copy in Drive" to create a copy you can edit.


#### Colab (if applicable)

If you are running this notebook on Colab, uncomment (if needed) and run the cell below:

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


Note that the Google Drive directory has the root `/content/drive/`. For instance, the path to my dataset directory is `'/content/drive/My Drive/Courses/CSCI 5922/CSCI 5922 SP25/Demo/MNIST/'`.

### mnist.py

In [None]:
#Original source: https://www.kaggle.com/code/hojjatk/read-mnist-dataset
#It has been modified for ease of use w/ pytorch

#You do NOT need to modify ANY code in this file!

import numpy as np
import struct
from array import array
import torch

class MnistDataloader(object):
    def __init__(self, training_images_filepath,training_labels_filepath,
                 test_images_filepath, test_labels_filepath):
        self.training_images_filepath = training_images_filepath
        self.training_labels_filepath = training_labels_filepath
        self.test_images_filepath = test_images_filepath
        self.test_labels_filepath = test_labels_filepath

    def read_images_labels(self, images_filepath, labels_filepath):
        n = 60000 if "train" in images_filepath else 10000
        labels = torch.zeros((n, 10))
        with open(labels_filepath, 'rb') as file:
            magic, size = struct.unpack(">II", file.read(8))
            if magic != 2049:
                raise ValueError('Magic number mismatch, expected 2049, got {}'.format(magic))
            l = torch.tensor(array("B", file.read())).unsqueeze(-1)
            l = torch.concatenate((torch.arange(0, n).unsqueeze(-1), l), dim = 1).type(torch.int32)
            labels[l[:,0], l[:,1]] = 1

        with open(images_filepath, 'rb') as file:
            magic, size, rows, cols = struct.unpack(">IIII", file.read(16))
            if magic != 2051:
                raise ValueError('Magic number mismatch, expected 2051, got {}'.format(magic))
            image_data = array("B", file.read())
        images = torch.zeros((n, 28**2))
        for i in range(size):
            img = np.array(image_data[i * rows * cols:(i + 1) * rows * cols])
            #img = img.reshape(28, 28)
            images[i, :] = torch.tensor(img)

        return images, labels

    def load_data(self):
        x_train, y_train = self.read_images_labels(self.training_images_filepath, self.training_labels_filepath)
        x_test, y_test = self.read_images_labels(self.test_images_filepath, self.test_labels_filepath)
        return (x_train, y_train),(x_test, y_test)
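
The label-reading step in `read_images_labels` builds a one-hot matrix by index assignment. As a cross-check, the same encoding can be sketched with `torch.nn.functional.one_hot`. The `raw_labels` tensor below is a made-up stand-in for the bytes read from the label file, not data from the assignment.

```python
import torch
import torch.nn.functional as F

# Hypothetical stand-in for the label bytes read from the IDX label file
raw_labels = torch.tensor([5, 0, 4, 1, 9])
n = raw_labels.shape[0]

# Index-assignment approach, mirroring read_images_labels
labels_indexed = torch.zeros((n, 10))
labels_indexed[torch.arange(n), raw_labels] = 1

# Equivalent encoding with the built-in helper (cast to float to match)
labels_builtin = F.one_hot(raw_labels, num_classes=10).float()

print(torch.equal(labels_indexed, labels_builtin))
```

Both routes should give the same `(n, 10)` float matrix, so the built-in helper is a convenient way to sanity-check the loader on a few labels.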

### activations.py

In [None]:
import torch

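class ReLU():
    # ReLU activation: f(x) = max(0, x)
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.maximum(x, torch.tensor(0.0))

    # Pass delta through where x > 0 (the subgradient at x = 0 is taken as 0).
    # The output is positive exactly where the input is, so x may be the
    # pre-activation or the post-activation tensor.
    def backward(self, delta: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
        return delta * (x > 0).float()


class LeakyReLU():
    # Leaky ReLU with negative slope 0.1: f(x) = max(0.1 * x, x)
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.maximum(0.1 * x, x)

    # Slope is 1 where x > 0 and 0.1 elsewhere
    def backward(self, delta: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
        return delta * torch.where(x > 0, torch.tensor(1.0), torch.tensor(0.1))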
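
One way to gain confidence in the hand-written `backward` methods of `ReLU` and `LeakyReLU` is to compare them with PyTorch autograd on random inputs. The sketch below redefines both classes so it runs on its own; `matches_autograd` is a hypothetical helper introduced only for this check.

```python
import torch

class ReLU:
    def forward(self, x):
        return torch.maximum(x, torch.tensor(0.0))
    def backward(self, delta, x):
        return delta * (x > 0).float()

class LeakyReLU:
    def forward(self, x):
        return torch.maximum(0.1 * x, x)
    def backward(self, delta, x):
        return delta * torch.where(x > 0, torch.tensor(1.0), torch.tensor(0.1))

def matches_autograd(act, x):
    # Compare the manual backward pass with autograd for an upstream gradient of ones
    x = x.clone().requires_grad_(True)
    act.forward(x).sum().backward()
    manual = act.backward(torch.ones_like(x), x.detach())
    return torch.allclose(manual, x.grad)

torch.manual_seed(0)
x = torch.randn(4, 6)  # random inputs; exact zeros are essentially never drawn
print(matches_autograd(ReLU(), x), matches_autograd(LeakyReLU(), x))
```

The check avoids inputs that are exactly zero, where autograd and the manual rule may legitimately choose different subgradients.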

### framework.py

In [None]:
import torch
import numpy as np
import tqdm

class MLP:
    '''
    This class implements a generic MLP learning framework. The core structure was provided;
    the key functions initialize(), forward(), and backward(), along with TrainMLP(), have been completed.
    '''
    def __init__(self, layer_sizes: list[int]):
        self.layer_sizes = layer_sizes
        self.num_layers = len(layer_sizes) - 1
        self.weights = []
        self.biases = []
        self.features = []

        # Hyperparameters with default values
        self.learning_rate = 1e-6
        self.batch_size = 512
        self.epoch = 25
        self.activation_function = ReLU()

    def set_hp(self, lr: float, bs: int, activation: object) -> None:
        self.learning_rate = lr
        self.batch_size = bs
        self.activation_function = activation

    def initialize(self) -> None:
        # Initialize weights with uniform distribution +/- sqrt(6 / (d_in + d_out))
        for i in range(self.num_layers):
            d_in, d_out = self.layer_sizes[i], self.layer_sizes[i + 1]
            bound = np.sqrt(6 / (d_in + d_out))
            self.weights.append(torch.empty((d_in, d_out)).uniform_(-bound, bound))
            #self.weights.append(torch.rand((d_in, d_out)) * 2 * bound - bound)
            self.biases.append(torch.zeros(1, d_out))

    def forward(self, x: torch.tensor) -> torch.tensor:
        # Forward propagation
        self.features = [x]
        for i in range(self.num_layers):
            x = x @ self.weights[i] + self.biases[i]
            x = self.activation_function.forward(x)
            self.features.append(x)
        logits = self.features[-1]
        softmax_x = torch.exp(logits) / torch.sum(torch.exp(logits), dim=1, keepdim=True)
        return softmax_x

    def backward(self, delta: torch.Tensor) -> None:
        for i in reversed(range(self.num_layers)):
            # Gradient of weights (summed over the batch) and biases (averaged over the batch)
            grad_w = self.features[i].T @ delta
            grad_b = delta.mean(dim=0)

            # Update weights and biases using gradient descent
            self.weights[i] -= self.learning_rate * grad_w
            self.biases[i] -= self.learning_rate * grad_b

            # Propagate delta to the previous layer (skipped for the input layer).
            # This uses the already-updated weights[i] and the post-activation features[i].
            if i > 0:
                delta = self.activation_function.backward(delta @ self.weights[i].T, self.features[i])

def TrainMLP(model: MLP, x_train: torch.Tensor, y_train: torch.Tensor) -> None:
    bs = model.batch_size
    N = x_train.shape[0]
    rng = np.random.default_rng()
    idx = rng.permutation(N)

    L = 0  # Total loss

    for i in tqdm.tqdm(range(N // bs)):
        x = x_train[idx[i * bs:(i + 1) * bs], ...]
        y = y_train[idx[i * bs:(i + 1) * bs], ...]

        # Forward pass
        y_hat = model.forward(x)

        # Cross-entropy loss summed over the batch (y is one-hot, y_hat holds softmax probabilities)
        l = -torch.sum(y * torch.log(y_hat))
        L += l.item()

        # Backward pass
        delta = y_hat - y  # Gradient for softmax + cross-entropy
        model.backward(delta)

    print("Train Loss:", L / ((N // bs) * bs))

def TestMLP(model: MLP, x_test: torch.Tensor, y_test: torch.Tensor) -> None:
    bs = model.batch_size
    N = x_test.shape[0]

    rng = np.random.default_rng()
    idx = rng.permutation(N)

    L = 0
    A = 0

    for i in tqdm.tqdm(range(N // bs)):
        x = x_test[idx[i * bs:(i + 1) * bs], ...]
        y = y_test[idx[i * bs:(i + 1) * bs], ...]

        y_hat = model.forward(x)
        target = torch.argmax(y, dim=1)

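        # Cross-entropy loss
        # NOTE: cross_entropy expects raw logits and averages over the batch by default, whereas
        # y_hat is already a softmax output, so this value is not on the same scale as the train loss
        l = torch.nn.functional.cross_entropy(y_hat, target)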
        L += l.item()

        # Accuracy calculation
        predictions = torch.argmax(y_hat, dim=1)
        A += torch.sum(predictions == target).item()

    print("Test Loss:", L / ((N // bs) * bs), "Test Accuracy: {:.2f}%".format(100 * A / N))

def normalize_mnist() -> tuple[torch.tensor, torch.tensor, torch.tensor, torch.tensor]:
    base_path = "/content/drive/MyDrive/Neural_Networks/Data/"

    mnist = MnistDataloader(base_path + "train-images.idx3-ubyte", base_path + "train-labels.idx1-ubyte",
                            base_path + "t10k-images.idx3-ubyte", base_path + "t10k-labels.idx1-ubyte")
    (x_train, y_train), (x_test, y_test) = mnist.load_data()

    x_mean = torch.mean(x_train, dim=0, keepdim=True)
    x_std = torch.std(x_train, dim=0, keepdim=True)

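    x_train -= x_mean
    x_train /= x_std
    # Zero out NaNs produced by 0/0 (pixels that are constant across the training set)
    x_train[x_train != x_train] = 0

    x_test -= x_mean
    x_test /= x_std
    # Only NaNs are masked here: a test pixel that is nonzero where the training std is 0
    # becomes +/-inf, which is a likely source of the nan test loss in the output
    x_test[x_test != x_test] = 0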

    return x_train, y_train, x_test, y_test

def main():
    x_train, y_train, x_test, y_test = normalize_mnist()

    model = MLP([784, 256, 10])
    model.initialize()
    model.set_hp(lr=1e-6, bs=512, activation=ReLU())

    E = 25
    for _ in range(E):
        TrainMLP(model, x_train, y_train)
        TestMLP(model, x_test, y_test)

if __name__ == "__main__":
    main()


100%|██████████| 117/117 [00:02<00:00, 51.50it/s]


Train Loss: 2.382489379654583


100%|██████████| 19/19 [00:00<00:00, 86.11it/s]


Test Loss: nan Test Accuracy: 18.25%


100%|██████████| 117/117 [00:02<00:00, 57.37it/s]


Train Loss: 2.180727708034026


100%|██████████| 19/19 [00:00<00:00, 114.51it/s]


Test Loss: nan Test Accuracy: 28.66%


100%|██████████| 117/117 [00:02<00:00, 58.47it/s]


Train Loss: 1.976637112788665


100%|██████████| 19/19 [00:00<00:00, 125.63it/s]


Test Loss: nan Test Accuracy: 40.06%


100%|██████████| 117/117 [00:01<00:00, 82.95it/s]


Train Loss: 1.7913277403921144


100%|██████████| 19/19 [00:00<00:00, 137.92it/s]


Test Loss: nan Test Accuracy: 48.99%


100%|██████████| 117/117 [00:01<00:00, 77.10it/s]


Train Loss: 1.631622124940921


100%|██████████| 19/19 [00:00<00:00, 147.46it/s]


Test Loss: nan Test Accuracy: 54.85%


100%|██████████| 117/117 [00:01<00:00, 78.39it/s]


Train Loss: 1.4970486785611536


100%|██████████| 19/19 [00:00<00:00, 152.22it/s]


Test Loss: nan Test Accuracy: 59.53%


100%|██████████| 117/117 [00:01<00:00, 70.88it/s]


Train Loss: 1.384220854848878


100%|██████████| 19/19 [00:00<00:00, 116.85it/s]


Test Loss: nan Test Accuracy: 63.16%


100%|██████████| 117/117 [00:01<00:00, 68.95it/s]


Train Loss: 1.2892259787290523


100%|██████████| 19/19 [00:00<00:00, 142.36it/s]


Test Loss: nan Test Accuracy: 65.95%


100%|██████████| 117/117 [00:01<00:00, 75.97it/s]


Train Loss: 1.2077473422400973


100%|██████████| 19/19 [00:00<00:00, 113.20it/s]


Test Loss: nan Test Accuracy: 68.25%


100%|██████████| 117/117 [00:02<00:00, 54.82it/s]


Train Loss: 1.1381452002077022


100%|██████████| 19/19 [00:00<00:00, 98.24it/s]


Test Loss: nan Test Accuracy: 70.24%


100%|██████████| 117/117 [00:02<00:00, 51.71it/s]


Train Loss: 1.0776699154804914


100%|██████████| 19/19 [00:00<00:00, 113.79it/s]


Test Loss: nan Test Accuracy: 72.05%


100%|██████████| 117/117 [00:01<00:00, 65.44it/s]


Train Loss: 1.024091926904825


100%|██████████| 19/19 [00:00<00:00, 142.87it/s]


Test Loss: nan Test Accuracy: 73.47%


100%|██████████| 117/117 [00:01<00:00, 79.47it/s]


Train Loss: 0.9772219005812947


100%|██████████| 19/19 [00:00<00:00, 150.54it/s]


Test Loss: nan Test Accuracy: 74.52%


100%|██████████| 117/117 [00:01<00:00, 79.83it/s]


Train Loss: 0.9356629761875185


100%|██████████| 19/19 [00:00<00:00, 157.03it/s]


Test Loss: nan Test Accuracy: 75.92%


100%|██████████| 117/117 [00:01<00:00, 77.83it/s]


Train Loss: 0.8981687253356999


100%|██████████| 19/19 [00:00<00:00, 139.22it/s]


Test Loss: nan Test Accuracy: 76.82%


100%|██████████| 117/117 [00:01<00:00, 80.75it/s]


Train Loss: 0.8645242855080173


100%|██████████| 19/19 [00:00<00:00, 149.36it/s]


Test Loss: nan Test Accuracy: 77.57%


100%|██████████| 117/117 [00:01<00:00, 79.87it/s]


Train Loss: 0.8343855550146511


100%|██████████| 19/19 [00:00<00:00, 152.43it/s]


Test Loss: nan Test Accuracy: 78.39%


100%|██████████| 117/117 [00:01<00:00, 72.62it/s]


Train Loss: 0.8068519015597482


100%|██████████| 19/19 [00:00<00:00, 92.26it/s] 


Test Loss: nan Test Accuracy: 78.95%


100%|██████████| 117/117 [00:02<00:00, 54.35it/s]


Train Loss: 0.7817512149484749


100%|██████████| 19/19 [00:00<00:00, 117.73it/s]


Test Loss: nan Test Accuracy: 79.47%


100%|██████████| 117/117 [00:01<00:00, 61.93it/s]


Train Loss: 0.7584617137908936


100%|██████████| 19/19 [00:00<00:00, 120.51it/s]


Test Loss: nan Test Accuracy: 80.09%


100%|██████████| 117/117 [00:01<00:00, 86.55it/s]


Train Loss: 0.7374428205001049


100%|██████████| 19/19 [00:00<00:00, 161.68it/s]


Test Loss: nan Test Accuracy: 80.38%


100%|██████████| 117/117 [00:01<00:00, 95.26it/s]


Train Loss: 0.7178384508842077


100%|██████████| 19/19 [00:00<00:00, 185.20it/s]


Test Loss: nan Test Accuracy: 80.97%


100%|██████████| 117/117 [00:01<00:00, 96.03it/s]


Train Loss: 0.6996176146034502


100%|██████████| 19/19 [00:00<00:00, 185.10it/s]


Test Loss: nan Test Accuracy: 81.17%


100%|██████████| 117/117 [00:01<00:00, 93.46it/s]


Train Loss: 0.6831291312845345


100%|██████████| 19/19 [00:00<00:00, 176.14it/s]


Test Loss: nan Test Accuracy: 81.51%


100%|██████████| 117/117 [00:01<00:00, 91.14it/s]


Train Loss: 0.6673054975322169


100%|██████████| 19/19 [00:00<00:00, 170.68it/s]

Test Loss: nan Test Accuracy: 81.87%
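
The test loss prints `nan` in every epoch above. One plausible cause, offered as a hypothesis rather than a confirmed diagnosis, is the division by a per-pixel standard deviation in `normalize_mnist`: pixels that are constant in the training set have a standard deviation of 0, so `0/0` gives NaN (which is masked) while a nonzero test pixel divided by 0 gives `inf` (which is not). The sketch below reproduces this on toy tensors and shows one possible guard. The helper name `standardize` and the toy values are assumptions made for illustration.

```python
import torch

def standardize(x_train, x_test):
    # Hypothetical helper: per-feature standardization using training statistics only
    mean = x_train.mean(dim=0, keepdim=True)
    std = x_train.std(dim=0, keepdim=True)
    # Constant training features have std == 0; divide by 1 instead so they map to
    # (x - mean) rather than NaN or inf
    std = torch.where(std == 0, torch.ones_like(std), std)
    return (x_train - mean) / std, (x_test - mean) / std

# Toy data: the second feature is constant (0) in training but nonzero in one test row
x_train = torch.tensor([[1.0, 0.0], [3.0, 0.0], [5.0, 0.0]])
x_test = torch.tensor([[2.0, 0.0], [4.0, 7.0]])

# Masking NaNs only, as in normalize_mnist, leaves the inf in place
std = x_train.std(dim=0, keepdim=True)
naive = (x_test - x_train.mean(dim=0, keepdim=True)) / std
naive[naive != naive] = 0
print(torch.isinf(naive).any().item())

train_z, test_z = standardize(x_train, x_test)
print(torch.isfinite(train_z).all().item(), torch.isfinite(test_z).all().item())
```

Whether such pixels should keep their centered value, as here, or be set to zero is a design choice. Either option keeps the inputs finite.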





# Part Two

In [None]:
import torch
import numpy as np
import tqdm

class MLP:
    '''
    This class implements a generic MLP learning framework. The core structure was provided;
    the key functions initialize(), forward(), and backward(), along with TrainMLP(), have been completed.
    '''
    def __init__(self, layer_sizes: list[int]):
        self.layer_sizes = layer_sizes
        self.num_layers = len(layer_sizes) - 1
        self.weights = []
        self.biases = []
        self.features = []

        # Hyperparameters with default values
        self.learning_rate = 1e-6
        self.batch_size = 512
        self.epoch = 25
        self.activation_function = ReLU()

    def set_hp(self, lr: float, bs: int, activation: object) -> None:
        self.learning_rate = lr
        self.batch_size = bs
        self.activation_function = activation

    def initialize(self) -> None:
        # Initialize weights with uniform distribution +/- sqrt(6 / (d_in + d_out))
        for i in range(self.num_layers):
            d_in, d_out = self.layer_sizes[i], self.layer_sizes[i + 1]
            bound = np.sqrt(6 / (d_in + d_out))
            self.weights.append(torch.empty((d_in, d_out)).uniform_(-bound, bound))
            #self.weights.append(torch.rand((d_in, d_out)) * 2 * bound - bound)
            self.biases.append(torch.zeros(1, d_out))

    def forward(self, x: torch.tensor) -> torch.tensor:
        # Forward propagation
        self.features = [x]
        for i in range(self.num_layers):
            x = x @ self.weights[i] + self.biases[i]
            x = self.activation_function.forward(x)
            self.features.append(x)
        logits = self.features[-1]
        softmax_x = torch.exp(logits) / torch.sum(torch.exp(logits), dim=1, keepdim=True)
        return softmax_x

    def backward(self, delta: torch.Tensor) -> None:
        for i in reversed(range(self.num_layers)):
            # Gradient of weights (summed over the batch) and biases (averaged over the batch)
            grad_w = self.features[i].T @ delta
            grad_b = delta.mean(dim=0)

            # Update weights and biases using gradient descent
            self.weights[i] -= self.learning_rate * grad_w
            self.biases[i] -= self.learning_rate * grad_b

            # Propagate delta to the previous layer (skipped for the input layer).
            # This uses the already-updated weights[i] and the post-activation features[i].
            if i > 0:
                delta = self.activation_function.backward(delta @ self.weights[i].T, self.features[i])

def TrainMLP(model: MLP, x_train: torch.Tensor, y_train: torch.Tensor) -> None:
    bs = model.batch_size
    N = x_train.shape[0]
    rng = np.random.default_rng()
    idx = rng.permutation(N)

    L = 0  # Total loss

    for i in tqdm.tqdm(range(N // bs)):
        x = x_train[idx[i * bs:(i + 1) * bs], ...]
        y = y_train[idx[i * bs:(i + 1) * bs], ...]

        # Forward pass
        y_hat = model.forward(x)

        # Cross-entropy loss summed over the batch (y is one-hot, y_hat holds softmax probabilities)
        l = -torch.sum(y * torch.log(y_hat))
        L += l.item()

        # Backward pass
        delta = y_hat - y  # Gradient for softmax + cross-entropy
        model.backward(delta)

    print("Train Loss:", L / ((N // bs) * bs))

def TestMLP(model: MLP, x_test: torch.Tensor, y_test: torch.Tensor) -> None:
    bs = model.batch_size
    N = x_test.shape[0]

    rng = np.random.default_rng()
    idx = rng.permutation(N)

    L = 0
    A = 0

    for i in tqdm.tqdm(range(N // bs)):
        x = x_test[idx[i * bs:(i + 1) * bs], ...]
        y = y_test[idx[i * bs:(i + 1) * bs], ...]

        y_hat = model.forward(x)
        target = torch.argmax(y, dim=1)

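        # Cross-entropy loss
        # NOTE: cross_entropy expects raw logits and averages over the batch by default, whereas
        # y_hat is already a softmax output, so this value is not on the same scale as the train loss
        l = torch.nn.functional.cross_entropy(y_hat, target)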
        L += l.item()

        # Accuracy calculation
        predictions = torch.argmax(y_hat, dim=1)
        A += torch.sum(predictions == target).item()

    print("Test Loss:", L / ((N // bs) * bs), "Test Accuracy: {:.2f}%".format(100 * A / N))

def normalize_mnist() -> tuple[torch.tensor, torch.tensor, torch.tensor, torch.tensor]:
    base_path = "/content/drive/MyDrive/Neural_Networks/Data/"

    mnist = MnistDataloader(base_path + "train-images.idx3-ubyte", base_path + "train-labels.idx1-ubyte",
                            base_path + "t10k-images.idx3-ubyte", base_path + "t10k-labels.idx1-ubyte")
    (x_train, y_train), (x_test, y_test) = mnist.load_data()

    x_mean = torch.mean(x_train, dim=0, keepdim=True)
    x_std = torch.std(x_train, dim=0, keepdim=True)

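    x_train -= x_mean
    x_train /= x_std
    # Zero out NaNs produced by 0/0 (pixels that are constant across the training set)
    x_train[x_train != x_train] = 0

    x_test -= x_mean
    x_test /= x_std
    # Only NaNs are masked here: a test pixel that is nonzero where the training std is 0
    # becomes +/-inf, which is a likely source of the nan test loss in the output
    x_test[x_test != x_test] = 0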

    return x_train, y_train, x_test, y_test

def main():
    x_train, y_train, x_test, y_test = normalize_mnist()

    model = MLP([784, 10, 10])
    model.initialize()
    model.set_hp(lr=1e-6, bs=512, activation=ReLU())

    E = 25
    for _ in range(E):
        TrainMLP(model, x_train, y_train)
        TestMLP(model, x_test, y_test)

if __name__ == "__main__":
    main()


100%|██████████| 117/117 [00:00<00:00, 435.06it/s]


Train Loss: 2.302205695046319


100%|██████████| 19/19 [00:00<00:00, 506.85it/s]


Test Loss: nan Test Accuracy: 16.06%


100%|██████████| 117/117 [00:00<00:00, 434.26it/s]


Train Loss: 2.2184677674220157


100%|██████████| 19/19 [00:00<00:00, 530.36it/s]


Test Loss: nan Test Accuracy: 20.79%


100%|██████████| 117/117 [00:00<00:00, 303.93it/s]


Train Loss: 2.143950425661527


100%|██████████| 19/19 [00:00<00:00, 402.93it/s]


Test Loss: nan Test Accuracy: 25.19%


100%|██████████| 117/117 [00:00<00:00, 312.24it/s]


Train Loss: 2.0760872251967077


100%|██████████| 19/19 [00:00<00:00, 449.04it/s]


Test Loss: nan Test Accuracy: 29.85%


100%|██████████| 117/117 [00:00<00:00, 293.93it/s]


Train Loss: 2.0141556782600207


100%|██████████| 19/19 [00:00<00:00, 311.91it/s]


Test Loss: nan Test Accuracy: 34.05%


100%|██████████| 117/117 [00:00<00:00, 325.37it/s]


Train Loss: 1.9571993473248603


100%|██████████| 19/19 [00:00<00:00, 445.67it/s]


Test Loss: nan Test Accuracy: 38.25%


100%|██████████| 117/117 [00:00<00:00, 320.38it/s]


Train Loss: 1.9040688378179176


100%|██████████| 19/19 [00:00<00:00, 439.88it/s]


Test Loss: nan Test Accuracy: 41.50%


100%|██████████| 117/117 [00:00<00:00, 309.96it/s]


Train Loss: 1.8544650240841074


100%|██████████| 19/19 [00:00<00:00, 478.01it/s]


Test Loss: nan Test Accuracy: 44.00%


100%|██████████| 117/117 [00:00<00:00, 307.86it/s]


Train Loss: 1.8075149405715811


100%|██████████| 19/19 [00:00<00:00, 362.01it/s]


Test Loss: nan Test Accuracy: 46.16%


100%|██████████| 117/117 [00:00<00:00, 289.56it/s]


Train Loss: 1.7632070851122212


100%|██████████| 19/19 [00:00<00:00, 391.23it/s]


Test Loss: nan Test Accuracy: 47.92%


100%|██████████| 117/117 [00:00<00:00, 297.84it/s]


Train Loss: 1.7210400461131692


100%|██████████| 19/19 [00:00<00:00, 405.13it/s]


Test Loss: nan Test Accuracy: 49.62%


100%|██████████| 117/117 [00:00<00:00, 284.15it/s]


Train Loss: 1.6810490320890377


100%|██████████| 19/19 [00:00<00:00, 464.24it/s]


Test Loss: nan Test Accuracy: 51.03%


100%|██████████| 117/117 [00:00<00:00, 288.54it/s]


Train Loss: 1.643571894393008


100%|██████████| 19/19 [00:00<00:00, 486.50it/s]


Test Loss: nan Test Accuracy: 52.56%


100%|██████████| 117/117 [00:00<00:00, 416.70it/s]


Train Loss: 1.6075060184185321


100%|██████████| 19/19 [00:00<00:00, 478.89it/s]


Test Loss: nan Test Accuracy: 53.51%


100%|██████████| 117/117 [00:00<00:00, 406.86it/s]


Train Loss: 1.5732674333784316


100%|██████████| 19/19 [00:00<00:00, 406.73it/s]


Test Loss: nan Test Accuracy: 54.64%


100%|██████████| 117/117 [00:00<00:00, 421.60it/s]


Train Loss: 1.5410104327731662


100%|██████████| 19/19 [00:00<00:00, 565.85it/s]


Test Loss: nan Test Accuracy: 55.85%


100%|██████████| 117/117 [00:00<00:00, 406.91it/s]


Train Loss: 1.509583436525785


100%|██████████| 19/19 [00:00<00:00, 423.05it/s]


Test Loss: nan Test Accuracy: 56.79%


100%|██████████| 117/117 [00:00<00:00, 420.70it/s]


Train Loss: 1.4797548045459976


100%|██████████| 19/19 [00:00<00:00, 538.63it/s]


Test Loss: nan Test Accuracy: 57.66%


100%|██████████| 117/117 [00:00<00:00, 405.35it/s]


Train Loss: 1.451578830042456


100%|██████████| 19/19 [00:00<00:00, 496.38it/s]


Test Loss: nan Test Accuracy: 58.64%


100%|██████████| 117/117 [00:00<00:00, 396.44it/s]


Train Loss: 1.4243504399927254


100%|██████████| 19/19 [00:00<00:00, 475.32it/s]


Test Loss: nan Test Accuracy: 59.49%


100%|██████████| 117/117 [00:00<00:00, 412.57it/s]


Train Loss: 1.398268432698698


100%|██████████| 19/19 [00:00<00:00, 617.51it/s]


Test Loss: nan Test Accuracy: 60.21%


100%|██████████| 117/117 [00:00<00:00, 410.99it/s]


Train Loss: 1.3727119111607218


100%|██████████| 19/19 [00:00<00:00, 521.43it/s]


Test Loss: nan Test Accuracy: 60.93%


100%|██████████| 117/117 [00:00<00:00, 394.12it/s]


Train Loss: 1.3486182047770574


100%|██████████| 19/19 [00:00<00:00, 468.63it/s]


Test Loss: nan Test Accuracy: 61.62%


100%|██████████| 117/117 [00:00<00:00, 406.20it/s]


Train Loss: 1.3252385976987007


100%|██████████| 19/19 [00:00<00:00, 464.51it/s]


Test Loss: nan Test Accuracy: 62.23%


100%|██████████| 117/117 [00:00<00:00, 408.94it/s]


Train Loss: 1.3024882017037807


100%|██████████| 19/19 [00:00<00:00, 547.67it/s]

Test Loss: nan Test Accuracy: 62.83%
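
The train and test losses above are not computed the same way. `TrainMLP` sums `-y * log(y_hat)` over samples and divides by the number of samples seen, while `TestMLP` passes softmax outputs to `torch.nn.functional.cross_entropy` (which applies its own log-softmax and averages per batch) and then divides by the sample count again. `TestMLP` also divides the accuracy by `N` even though only `N // bs` full batches are evaluated. A minimal sketch of an evaluation that keeps loss and accuracy on a per-sample scale follows. The helper name `evaluate` and the toy probabilities are assumptions for illustration, not part of the assignment code.

```python
import torch

def evaluate(probs_batches, target_batches):
    # Hypothetical helper: per-sample cross-entropy and accuracy from softmax outputs
    total_loss, correct, seen = 0.0, 0, 0
    for probs, target in zip(probs_batches, target_batches):
        # Probability assigned to the true class of each sample
        p_true = probs[torch.arange(target.shape[0]), target]
        # Clamp to avoid log(0) when a probability underflows
        total_loss += -torch.log(p_true.clamp_min(1e-12)).sum().item()
        correct += (probs.argmax(dim=1) == target).sum().item()
        seen += target.shape[0]
    # Normalize by the number of samples actually evaluated
    return total_loss / seen, 100.0 * correct / seen

# Toy softmax outputs: two batches of two samples, three classes
probs_batches = [
    torch.tensor([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]),
    torch.tensor([[0.3, 0.3, 0.4], [0.5, 0.25, 0.25]]),
]
target_batches = [torch.tensor([0, 1]), torch.tensor([2, 1])]

loss, acc = evaluate(probs_batches, target_batches)
print(f"loss={loss:.4f} acc={acc:.1f}%")
```

With this normalization the test loss would be directly comparable to the printed train loss, since both are average negative log-likelihoods per sample.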





# L1 Regularization Implementation

In [None]:
import torch
import numpy as np
import tqdm

class MLP:
    '''
    This class implements a generic MLP learning framework. The core structure was provided;
    the key functions initialize(), forward(), and backward(), along with TrainMLP(), have been completed.
    '''
    def __init__(self, layer_sizes: list[int]):
        self.layer_sizes = layer_sizes
        self.num_layers = len(layer_sizes) - 1
        self.weights = []
        self.biases = []
        self.features = []

        # Hyperparameters with default values
        self.learning_rate = 1e-6
        self.batch_size = 512
        self.epoch = 25
        self.activation_function = ReLU()

    def set_hp(self, lr: float, bs: int, activation: object, l1_reg: float = 0.0) -> None: # Added l1_reg with a default value
        self.learning_rate = lr
        self.batch_size = bs
        self.activation_function = activation
        self.l1_reg = l1_reg # Assign l1_reg to the model

    def initialize(self) -> None:
        # Initialize weights with uniform distribution +/- sqrt(6 / (d_in + d_out))
        for i in range(self.num_layers):
            d_in, d_out = self.layer_sizes[i], self.layer_sizes[i + 1]
            bound = np.sqrt(6 / (d_in + d_out))
            self.weights.append(torch.empty((d_in, d_out)).uniform_(-bound, bound))
            #self.weights.append(torch.rand((d_in, d_out)) * 2 * bound - bound)
            self.biases.append(torch.zeros(1, d_out))

    def forward(self, x: torch.tensor) -> torch.tensor:
        # Forward propagation
        self.features = [x]
        for i in range(self.num_layers):
            x = x @ self.weights[i] + self.biases[i]
            x = self.activation_function.forward(x)
            self.features.append(x)
        logits = self.features[-1]
        softmax_x = torch.exp(logits) / torch.sum(torch.exp(logits), dim=1, keepdim=True)
        return softmax_x

    def backward(self, delta: torch.Tensor) -> None:
        for i in reversed(range(self.num_layers)):
            # Gradient of weights (summed over the batch) and biases (averaged over the batch)
            grad_w = self.features[i].T @ delta
            grad_b = delta.mean(dim=0)

            # Gradient-descent update; the L1 penalty adds l1_reg * sign(W) to the weight gradient
            self.weights[i] -= self.learning_rate * (grad_w + self.l1_reg * torch.sign(self.weights[i]))
            self.biases[i] -= self.learning_rate * grad_b

            # Propagate delta to the previous layer (skipped for the input layer).
            # This uses the already-updated weights[i] and the post-activation features[i].
            if i > 0:
                delta = self.activation_function.backward(delta @ self.weights[i].T, self.features[i])

def TrainMLP(model: MLP, x_train: torch.Tensor, y_train: torch.Tensor) -> None:
    bs = model.batch_size
    N = x_train.shape[0]
    rng = np.random.default_rng()
    idx = rng.permutation(N)

    L = 0  # Total loss

    for i in tqdm.tqdm(range(N // bs)):
        x = x_train[idx[i * bs:(i + 1) * bs], ...]
        y = y_train[idx[i * bs:(i + 1) * bs], ...]

        # Forward pass
        y_hat = model.forward(x)

        # Cross-entropy loss summed over the batch (y is one-hot, y_hat holds softmax probabilities)
        l = -torch.sum(y * torch.log(y_hat))
        L += l.item()

        # Backward pass
        delta = y_hat - y  # Gradient for softmax + cross-entropy
        model.backward(delta)

    print("Train Loss:", L / ((N // bs) * bs))

def TestMLP(model: MLP, x_test: torch.Tensor, y_test: torch.Tensor) -> None:
    bs = model.batch_size
    N = x_test.shape[0]

    rng = np.random.default_rng()
    idx = rng.permutation(N)

    L = 0
    A = 0

    for i in tqdm.tqdm(range(N // bs)):
        x = x_test[idx[i * bs:(i + 1) * bs], ...]
        y = y_test[idx[i * bs:(i + 1) * bs], ...]

        y_hat = model.forward(x)
        target = torch.argmax(y, dim=1)

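        # Cross-entropy loss
        # NOTE: cross_entropy expects raw logits and averages over the batch by default, whereas
        # y_hat is already a softmax output, so this value is not on the same scale as the train loss
        l = torch.nn.functional.cross_entropy(y_hat, target)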
        L += l.item()

        # Accuracy calculation
        predictions = torch.argmax(y_hat, dim=1)
        A += torch.sum(predictions == target).item()

    print("Test Loss:", L / ((N // bs) * bs), "Test Accuracy: {:.2f}%".format(100 * A / N))

def normalize_mnist() -> tuple[torch.tensor, torch.tensor, torch.tensor, torch.tensor]:
    base_path = "/content/drive/MyDrive/Neural_Networks/Data/"

    mnist = MnistDataloader(base_path + "train-images.idx3-ubyte", base_path + "train-labels.idx1-ubyte",
                            base_path + "t10k-images.idx3-ubyte", base_path + "t10k-labels.idx1-ubyte")
    (x_train, y_train), (x_test, y_test) = mnist.load_data()

    x_mean = torch.mean(x_train, dim=0, keepdim=True)
    x_std = torch.std(x_train, dim=0, keepdim=True)

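    x_train -= x_mean
    x_train /= x_std
    # Zero out NaNs produced by 0/0 (pixels that are constant across the training set)
    x_train[x_train != x_train] = 0

    x_test -= x_mean
    x_test /= x_std
    # Only NaNs are masked here: a test pixel that is nonzero where the training std is 0
    # becomes +/-inf, which is a likely source of the nan test loss in the output
    x_test[x_test != x_test] = 0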

    return x_train, y_train, x_test, y_test

def main():
    x_train, y_train, x_test, y_test = normalize_mnist()

    model = MLP([784, 256, 10])
    model.initialize()
    model.set_hp(lr=1e-6, bs=512, activation=ReLU(), l1_reg=0.01) # Pass l1_reg as an argument

    E = 25
    for _ in range(E):
        TrainMLP(model, x_train, y_train)
        TestMLP(model, x_test, y_test)

if __name__ == "__main__":
    main()



100%|██████████| 117/117 [00:01<00:00, 91.75it/s]


Train Loss: 2.4150270885891385


100%|██████████| 19/19 [00:00<00:00, 165.14it/s]


Test Loss: nan Test Accuracy: 17.93%


100%|██████████| 117/117 [00:01<00:00, 90.86it/s]


Train Loss: 2.1778263903071737


100%|██████████| 19/19 [00:00<00:00, 182.94it/s]


Test Loss: nan Test Accuracy: 31.49%


100%|██████████| 117/117 [00:01<00:00, 73.07it/s]


Train Loss: 1.9581532111534705


100%|██████████| 19/19 [00:00<00:00, 119.45it/s]


Test Loss: nan Test Accuracy: 44.32%


100%|██████████| 117/117 [00:02<00:00, 57.94it/s]


Train Loss: 1.767588411640917


100%|██████████| 19/19 [00:00<00:00, 103.58it/s]


Test Loss: nan Test Accuracy: 53.74%


100%|██████████| 117/117 [00:01<00:00, 66.97it/s]


Train Loss: 1.6076688623835897


100%|██████████| 19/19 [00:00<00:00, 167.31it/s]


Test Loss: nan Test Accuracy: 60.07%


100%|██████████| 117/117 [00:01<00:00, 91.53it/s]


Train Loss: 1.475569242086166


100%|██████████| 19/19 [00:00<00:00, 181.48it/s]


Test Loss: nan Test Accuracy: 64.28%


100%|██████████| 117/117 [00:01<00:00, 90.81it/s]


Train Loss: 1.365491582797124


100%|██████████| 19/19 [00:00<00:00, 181.48it/s]


Test Loss: nan Test Accuracy: 67.34%


100%|██████████| 117/117 [00:01<00:00, 89.81it/s]


Train Loss: 1.2732429718359923


100%|██████████| 19/19 [00:00<00:00, 184.79it/s]


Test Loss: nan Test Accuracy: 69.45%


100%|██████████| 117/117 [00:01<00:00, 90.71it/s]


Train Loss: 1.194974283886771


100%|██████████| 19/19 [00:00<00:00, 182.44it/s]


Test Loss: nan Test Accuracy: 71.26%


100%|██████████| 117/117 [00:01<00:00, 90.36it/s]


Train Loss: 1.1278455950256088


100%|██████████| 19/19 [00:00<00:00, 166.60it/s]


Test Loss: nan Test Accuracy: 72.74%


100%|██████████| 117/117 [00:01<00:00, 91.11it/s]


Train Loss: 1.0694005051229754


100%|██████████| 19/19 [00:00<00:00, 165.46it/s]


Test Loss: nan Test Accuracy: 74.09%


100%|██████████| 117/117 [00:01<00:00, 80.86it/s]


Train Loss: 1.0182970879424331


100%|██████████| 19/19 [00:00<00:00, 66.39it/s]


Test Loss: nan Test Accuracy: 75.36%


100%|██████████| 117/117 [00:02<00:00, 51.58it/s]


Train Loss: 0.9732567326635377


100%|██████████| 19/19 [00:00<00:00, 118.07it/s]


Test Loss: nan Test Accuracy: 76.08%


100%|██████████| 117/117 [00:01<00:00, 63.47it/s]


Train Loss: 0.9326199588612614


100%|██████████| 19/19 [00:00<00:00, 176.52it/s]


Test Loss: 0.003720993629509681 Test Accuracy: 77.03%


100%|██████████| 117/117 [00:01<00:00, 91.55it/s]


Train Loss: 0.8965492146646875


100%|██████████| 19/19 [00:00<00:00, 160.83it/s]


Test Loss: nan Test Accuracy: 77.58%


100%|██████████| 117/117 [00:01<00:00, 91.99it/s]


Train Loss: 0.8639485841123467


100%|██████████| 19/19 [00:00<00:00, 186.97it/s]


Test Loss: nan Test Accuracy: 78.31%


100%|██████████| 117/117 [00:01<00:00, 90.75it/s]


Train Loss: 0.8346190773523771


100%|██████████| 19/19 [00:00<00:00, 183.25it/s]


Test Loss: nan Test Accuracy: 78.72%


100%|██████████| 117/117 [00:01<00:00, 87.84it/s]


Train Loss: 0.807703585196764


100%|██████████| 19/19 [00:00<00:00, 192.07it/s]


Test Loss: nan Test Accuracy: 79.41%


100%|██████████| 117/117 [00:01<00:00, 90.88it/s]


Train Loss: 0.7832118904488719


100%|██████████| 19/19 [00:00<00:00, 172.41it/s]


Test Loss: nan Test Accuracy: 79.83%


100%|██████████| 117/117 [00:01<00:00, 90.03it/s]


Train Loss: 0.7607280515198015


100%|██████████| 19/19 [00:00<00:00, 161.42it/s]


Test Loss: nan Test Accuracy: 80.41%


100%|██████████| 117/117 [00:01<00:00, 86.61it/s]


Train Loss: 0.7401192870914427


100%|██████████| 19/19 [00:00<00:00, 116.07it/s]


Test Loss: nan Test Accuracy: 80.85%


100%|██████████| 117/117 [00:01<00:00, 59.09it/s]


Train Loss: 0.7209877494053963


100%|██████████| 19/19 [00:00<00:00, 122.41it/s]


Test Loss: nan Test Accuracy: 81.29%


100%|██████████| 117/117 [00:01<00:00, 58.88it/s]


Train Loss: 0.703241520967239


100%|██████████| 19/19 [00:00<00:00, 119.28it/s]


Test Loss: nan Test Accuracy: 81.78%


100%|██████████| 117/117 [00:01<00:00, 86.64it/s]


Train Loss: 0.6867105767258213


100%|██████████| 19/19 [00:00<00:00, 183.85it/s]


Test Loss: nan Test Accuracy: 81.89%


100%|██████████| 117/117 [00:01<00:00, 90.43it/s]


Train Loss: 0.671353425225641


100%|██████████| 19/19 [00:00<00:00, 180.32it/s]

Test Loss: nan Test Accuracy: 82.08%





# L2 Regularization
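
As a minimal sketch of what a weight penalty changes in a single gradient-descent step (plain Python lists, not the notebook's tensors): an L2 penalty of the form `(lam / 2) * w**2` adds `lam * w` to the weight gradient, whereas an L1 penalty `lam * |w|` adds `lam * sign(w)`. The names `l2_step`, `l1_step`, `lam`, and the toy values are hypothetical and chosen only for illustration.

```python
def l2_step(w, grad, lr, lam):
    """One gradient-descent step with an L2 penalty (lam / 2) * w**2.

    The penalty's gradient is lam * w, so large weights shrink faster.
    """
    return [wi - lr * (gi + lam * wi) for wi, gi in zip(w, grad)]


def l1_step(w, grad, lr, lam):
    """The same step with an L1 penalty lam * |w|.

    The penalty's (sub)gradient is lam * sign(w), a constant-size push toward zero.
    """
    def sign(v):
        return (v > 0) - (v < 0)

    return [wi - lr * (gi + lam * sign(wi)) for wi, gi in zip(w, grad)]


# Toy weights with a zero data gradient, so only the penalty acts.
weights = [2.0, -0.5, 0.0]
zero_grad = [0.0, 0.0, 0.0]

print("L2 step:", l2_step(weights, zero_grad, lr=0.1, lam=0.5))
print("L1 step:", l1_step(weights, zero_grad, lr=0.1, lam=0.5))
```

With a zero data gradient, the L2 step shrinks each weight in proportion to its own size, while the L1 step moves every nonzero weight by the same fixed amount.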

In [None]:
import torch
import numpy as np
import tqdm

class MLP:
    '''
    This class implements a generic MLP learning framework. The core structure is provided,
    but key functions such as initialize(), forward(), backward(), and TrainMLP() have been completed.
    '''
    def __init__(self, layer_sizes: list[int]):
        self.layer_sizes = layer_sizes
        self.num_layers = len(layer_sizes) - 1
        self.weights = []
        self.biases = []
        self.features = []

        # Hyperparameters with default values
        self.learning_rate = 1e-6
        self.batch_size = 512
        self.epoch = 25
        self.activation_function = ReLU()
        self.l2_reg = 0.0  # L2 penalty strength; the default keeps backward() usable before set_hp()

    def set_hp(self, lr: float, bs: int, activation: object, l2_reg: float = 0.0) -> None: # Added l2_reg with a default value
        self.learning_rate = lr
        self.batch_size = bs
        self.activation_function = activation
        self.l2_reg = l2_reg # Assign l2_reg to the model

    def initialize(self) -> None:
        # Initialize weights with uniform distribution +/- sqrt(6 / (d_in + d_out))
        for i in range(self.num_layers):
            d_in, d_out = self.layer_sizes[i], self.layer_sizes[i + 1]
            bound = np.sqrt(6 / (d_in + d_out))
            self.weights.append(torch.empty((d_in, d_out)).uniform_(-bound, bound))
            #self.weights.append(torch.rand((d_in, d_out)) * 2 * bound - bound)
            self.biases.append(torch.zeros(1, d_out))

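    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Forward propagation; cache every layer's output for use in backward()
        self.features = [x]
        for i in range(self.num_layers):
            x = x @ self.weights[i] + self.biases[i]
            x = self.activation_function.forward(x)
            self.features.append(x)
        logits = self.features[-1]
        # Subtract the row-wise max before exponentiating so exp() cannot overflow;
        # the softmax value is unchanged by this shift
        exp_logits = torch.exp(logits - logits.max(dim=1, keepdim=True).values)
        return exp_logits / exp_logits.sum(dim=1, keepdim=True)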

    def backward(self, delta: torch.Tensor) -> None:
        for i in reversed(range(self.num_layers)):
            # Gradient of weights and biases
            grad_w = self.features[i].T @ delta
            grad_b = delta.mean(dim=0)

            # Propagate delta to the previous layer while this layer's weights still hold
            # the values used in the forward pass (not needed for the input layer)
            if i > 0:
                delta = self.activation_function.backward(delta @ self.weights[i].T, self.features[i])

            # Gradient descent step; the L2 penalty (l2_reg / 2) * ||W||^2 adds l2_reg * W
            # to the weight gradient
            self.weights[i] -= self.learning_rate * (grad_w + self.l2_reg * self.weights[i])
            self.biases[i] -= self.learning_rate * grad_b

def TrainMLP(model: MLP, x_train: torch.Tensor, y_train: torch.Tensor) -> MLP:
    bs = model.batch_size
    N = x_train.shape[0]
    rng = np.random.default_rng()
    idx = rng.permutation(N)

    L = 0  # Total loss

    for i in tqdm.tqdm(range(N // bs)):
        x = x_train[idx[i * bs:(i + 1) * bs], ...]
        y = y_train[idx[i * bs:(i + 1) * bs], ...]

        # Forward pass
        y_hat = model.forward(x)

        # Compute cross-entropy loss, summed over the batch (y is one-hot)
        l = -torch.sum(y * torch.log(y_hat))
        L += l.item()

        # Backward pass
        delta = y_hat - y  # Gradient for softmax + cross-entropy
        model.backward(delta)

    print("Train Loss:", L / ((N // bs) * bs))
    return model

def TestMLP(model: MLP, x_test: torch.Tensor, y_test: torch.Tensor) -> tuple[float, float]:
    bs = model.batch_size
    N = x_test.shape[0]

    rng = np.random.default_rng()
    idx = rng.permutation(N)

    L = 0
    A = 0

    for i in tqdm.tqdm(range(N // bs)):
        x = x_test[idx[i * bs:(i + 1) * bs], ...]
        y = y_test[idx[i * bs:(i + 1) * bs], ...]

        y_hat = model.forward(x)
        target = torch.argmax(y, dim=1)

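        # Cross-entropy loss, summed over the batch. y_hat already holds softmax
        # probabilities, so take the negative log-likelihood directly
        # (torch.nn.functional.cross_entropy expects raw logits)
        l = -torch.sum(y * torch.log(y_hat))
        L += l.item()

        # Accuracy calculation
        predictions = torch.argmax(y_hat, dim=1)
        A += torch.sum(predictions == target).item()

    # Only full batches are evaluated, so average over the samples actually seen
    n_eval = (N // bs) * bs
    test_loss, test_acc = L / n_eval, 100 * A / n_eval
    print("Test Loss:", test_loss, "Test Accuracy: {:.2f}%".format(test_acc))
    return test_loss, test_acc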

def normalize_mnist() -> tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]:
    base_path = "/content/drive/MyDrive/Neural_Networks/Data/"

    mnist = MnistDataloader(base_path + "train-images.idx3-ubyte", base_path + "train-labels.idx1-ubyte",
                            base_path + "t10k-images.idx3-ubyte", base_path + "t10k-labels.idx1-ubyte")
    (x_train, y_train), (x_test, y_test) = mnist.load_data()

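    x_mean = torch.mean(x_train, dim=0, keepdim=True)
    x_std = torch.std(x_train, dim=0, keepdim=True)
    # Pixels that are constant across the training set have zero std. Dividing by it
    # yields NaN (0/0) or inf (nonzero/0), and the `x != x` masks below only catch NaN,
    # so use a divisor of 1 for those pixels instead.
    x_std[x_std == 0] = 1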

    x_train -= x_mean
    x_train /= x_std
    x_train[x_train != x_train] = 0

    x_test -= x_mean
    x_test /= x_std
    x_test[x_test != x_test] = 0

    return x_train, y_train, x_test, y_test

def main():
    x_train, y_train, x_test, y_test = normalize_mnist()

    model = MLP([784, 256, 10])
    model.initialize()
    model.set_hp(lr=1e-6, bs=512, activation=ReLU(), l2_reg=0.01) # Pass l2_reg as an argument

    E = 25
    for _ in range(E):
        TrainMLP(model, x_train, y_train)
        TestMLP(model, x_test, y_test)

if __name__ == "__main__":
    main()



100%|██████████| 117/117 [00:01<00:00, 93.10it/s]


Train Loss: 2.3131020904606223


100%|██████████| 19/19 [00:00<00:00, 181.96it/s]


Test Loss: nan Test Accuracy: 23.64%


100%|██████████| 117/117 [00:01<00:00, 88.56it/s]


Train Loss: 2.1006237604679208


100%|██████████| 19/19 [00:00<00:00, 164.45it/s]


Test Loss: nan Test Accuracy: 35.78%


100%|██████████| 117/117 [00:01<00:00, 92.01it/s]


Train Loss: 1.892641266187032


100%|██████████| 19/19 [00:00<00:00, 186.72it/s]


Test Loss: nan Test Accuracy: 46.48%


100%|██████████| 117/117 [00:01<00:00, 70.75it/s]


Train Loss: 1.710998530061836


100%|██████████| 19/19 [00:00<00:00, 120.80it/s]


Test Loss: nan Test Accuracy: 54.15%


100%|██████████| 117/117 [00:02<00:00, 58.25it/s]


Train Loss: 1.5596820309630826


100%|██████████| 19/19 [00:00<00:00, 122.35it/s]


Test Loss: nan Test Accuracy: 59.85%


100%|██████████| 117/117 [00:01<00:00, 67.14it/s]


Train Loss: 1.4342170372987404


100%|██████████| 19/19 [00:00<00:00, 162.42it/s]


Test Loss: 0.004044992138484591 Test Accuracy: 63.76%


100%|██████████| 117/117 [00:01<00:00, 90.83it/s]


Train Loss: 1.3288457699311085


100%|██████████| 19/19 [00:00<00:00, 182.37it/s]


Test Loss: nan Test Accuracy: 67.00%


100%|██████████| 117/117 [00:01<00:00, 89.71it/s]


Train Loss: 1.2392904452788525


100%|██████████| 19/19 [00:00<00:00, 179.11it/s]


Test Loss: nan Test Accuracy: 69.43%


100%|██████████| 117/117 [00:01<00:00, 89.73it/s]


Train Loss: 1.1627229523454976


100%|██████████| 19/19 [00:00<00:00, 161.43it/s]


Test Loss: nan Test Accuracy: 71.71%


100%|██████████| 117/117 [00:01<00:00, 87.20it/s]


Train Loss: 1.0965666669046776


100%|██████████| 19/19 [00:00<00:00, 166.36it/s]


Test Loss: nan Test Accuracy: 73.42%


100%|██████████| 117/117 [00:01<00:00, 89.61it/s]


Train Loss: 1.0386668148203793


100%|██████████| 19/19 [00:00<00:00, 172.54it/s]


Test Loss: nan Test Accuracy: 74.68%


100%|██████████| 117/117 [00:01<00:00, 87.68it/s]


Train Loss: 0.9884375690394996


100%|██████████| 19/19 [00:00<00:00, 164.88it/s]


Test Loss: nan Test Accuracy: 75.73%


100%|██████████| 117/117 [00:01<00:00, 78.03it/s]


Train Loss: 0.9432454791843382


100%|██████████| 19/19 [00:00<00:00, 118.54it/s]


Test Loss: nan Test Accuracy: 76.78%


100%|██████████| 117/117 [00:02<00:00, 57.15it/s]


Train Loss: 0.9039876410084912


100%|██████████| 19/19 [00:00<00:00, 103.17it/s]


Test Loss: nan Test Accuracy: 77.55%


100%|██████████| 117/117 [00:01<00:00, 64.47it/s]


Train Loss: 0.8686666722990509


100%|██████████| 19/19 [00:00<00:00, 187.64it/s]


Test Loss: nan Test Accuracy: 78.21%


100%|██████████| 117/117 [00:01<00:00, 89.62it/s]


Train Loss: 0.8368792472741543


100%|██████████| 19/19 [00:00<00:00, 190.40it/s]


Test Loss: nan Test Accuracy: 78.91%


100%|██████████| 117/117 [00:01<00:00, 92.37it/s]


Train Loss: 0.8082772931482038


100%|██████████| 19/19 [00:00<00:00, 184.48it/s]


Test Loss: nan Test Accuracy: 79.47%


100%|██████████| 117/117 [00:01<00:00, 89.83it/s]


Train Loss: 0.782122876909044


100%|██████████| 19/19 [00:00<00:00, 160.79it/s]


Test Loss: nan Test Accuracy: 80.01%


100%|██████████| 117/117 [00:01<00:00, 90.02it/s]


Train Loss: 0.7584307051112509


100%|██████████| 19/19 [00:00<00:00, 184.41it/s]


Test Loss: nan Test Accuracy: 80.32%


100%|██████████| 117/117 [00:01<00:00, 91.52it/s]


Train Loss: 0.7371624636853862


100%|██████████| 19/19 [00:00<00:00, 183.87it/s]


Test Loss: nan Test Accuracy: 80.88%


100%|██████████| 117/117 [00:01<00:00, 88.12it/s]


Train Loss: 0.7169902523358663


100%|██████████| 19/19 [00:00<00:00, 188.00it/s]


Test Loss: 0.003540409973969585 Test Accuracy: 81.34%


100%|██████████| 117/117 [00:02<00:00, 56.11it/s]


Train Loss: 0.6988479586747977


100%|██████████| 19/19 [00:00<00:00, 115.77it/s]


Test Loss: nan Test Accuracy: 81.70%


100%|██████████| 117/117 [00:01<00:00, 58.67it/s]


Train Loss: 0.682037250099019


100%|██████████| 19/19 [00:00<00:00, 117.08it/s]


Test Loss: nan Test Accuracy: 81.98%


100%|██████████| 117/117 [00:01<00:00, 64.15it/s]


Train Loss: 0.6662619990161341


100%|██████████| 19/19 [00:00<00:00, 183.66it/s]


Test Loss: nan Test Accuracy: 82.24%


100%|██████████| 117/117 [00:01<00:00, 90.75it/s]


Train Loss: 0.6516790858700744


100%|██████████| 19/19 [00:00<00:00, 180.00it/s]

Test Loss: nan Test Accuracy: 82.35%
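
A `nan` loss like those logged above can arise when a feature is standardized with a standard deviation of zero. This is offered as a hypothesis, not something verified on this run. Under IEEE-754 arithmetic, 0/0 is NaN but nonzero/0 is ±inf, and a mask of the form `x != x` is true only for NaN, so an inf value would pass through it and could turn into NaN further along the forward pass. The sketch below reproduces that behavior with plain Python floats; `standardize` and `clean` are hypothetical helpers, not part of the notebook.

```python
import math


def standardize(value: float, mean: float, std: float) -> float:
    """Mimic IEEE-754 tensor division, which returns nan/inf instead of raising."""
    centered = value - mean
    if std != 0.0:
        return centered / std
    if centered == 0.0:
        return math.nan  # 0 / 0
    return math.copysign(math.inf, centered)  # nonzero / 0


def clean(value: float) -> float:
    """Replace any non-finite value (NaN or +/-inf) with 0."""
    return value if math.isfinite(value) else 0.0


# A pixel that is always 0 in the training set has mean 0 and std 0.
train_pixel = standardize(0.0, mean=0.0, std=0.0)  # 0 / 0
test_pixel = standardize(0.3, mean=0.0, std=0.0)   # nonzero / 0

# `x != x` flags NaN only; inf compares equal to itself.
print("train value:", train_pixel, "| caught by x != x:", train_pixel != train_pixel)
print("test value:", test_pixel, "| caught by x != x:", test_pixel != test_pixel)

# A finiteness check handles both cases.
print("cleaned:", clean(train_pixel), clean(test_pixel))
```

If only a handful of test samples are affected, the occasional finite test loss in the logs would also fit this picture. Each test pass shuffles the data and skips the final partial batch (`N // bs` full batches), so those samples could sometimes fall outside the evaluated batches.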



