# Neural Networks with PyTorch

In this assignment, we are going to train a neural network on the Japanese MNIST (Kuzushiji-MNIST) dataset. It is composed of 70,000 images of handwritten Hiragana characters. The target variable has 10 different classes.

Each image is 28 by 28 pixels. We will flatten each image into a vector of dimension (784, 1), so the training process will be similar to that for a structured (tabular) dataset.
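
As a minimal sketch of what flattening means here, the snippet below reshapes a small batch of random 28 by 28 arrays into rows of 784 values and back again. The `images` array is a hypothetical stand-in generated with NumPy, not the real dataset.

```python
import numpy as np

# Hypothetical stand-in for a batch of 5 grayscale images (28 x 28 pixels each).
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

# Flatten every image into a single row of 28 * 28 = 784 values.
flat = images.reshape(len(images), -1)
print(flat.shape)  # (5, 784)

# Flattening only rearranges values, so the original layout can be restored.
restored = flat.reshape(-1, 28, 28)
print(np.array_equal(restored, images))  # True
```

No pixel values are lost in the process; only the spatial arrangement is hidden from a fully-connected network.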

<img src='https://drive.google.com/uc?id=16TqEl9ESfXYbUpVafXD6h5UpJYGKfMxE' width="500" height="200">

Your goal is to run at least 3 experiments and get a model that achieves at least 80% accuracy on this dataset without much overfitting.

Some of the code has already been defined for you. You only need to add your code in the sections specified (marked with **TODO**). Some assert statements have been added to verify that the expected outputs are correct. If an assert does not throw an error, this means your implementation is behaving as expected.

Note: You can only use fully-connected and dropout layers for this assignment. You cannot use convolution layers, for instance.

# 1. Import Required Packages

[1.1] We are going to use numpy, matplotlib and google.colab packages

In [None]:
from google.colab import drive
import numpy as np
import matplotlib.pyplot as plt

# 2. Download Dataset

We will store the dataset into your personal Google Drive.


[2.1] Mount Google Drive

In [None]:
drive.mount('/content/gdrive')

[2.2] Create a folder called `DL_ASG_1` on your Google Drive at the root level

In [None]:
! mkdir -p /content/gdrive/MyDrive/DL_ASG_1

[2.3] Navigate to this folder

In [None]:
%cd '/content/gdrive/MyDrive/DL_ASG_1'

[2.4] Show the list of items in the folder

In [None]:
!ls

[2.4] Download the dataset files to your Google Drive if required

In [None]:
import requests
from tqdm import tqdm
import os.path

def download_file(url):
    # Use the last part of the URL as the local file name
    path = url.split('/')[-1]
    if os.path.isfile(path):
        print(f"{path} already exists")
        return
    r = requests.get(url, stream=True)
    r.raise_for_status()  # Fail early instead of saving an error page to disk
    # The content-length header may be missing; fall back to 0 (unknown size)
    total_length = int(r.headers.get('content-length', 0))
    print('Downloading {} - {:.1f} MB'.format(path, total_length / (1024 * 1024)))
    with open(path, 'wb') as f:
        for chunk in tqdm(r.iter_content(chunk_size=1024), total=total_length // 1024 + 1, unit="KB"):
            if chunk:
                f.write(chunk)

url_list = [
    'http://codh.rois.ac.jp/kmnist/dataset/kmnist/kmnist-train-imgs.npz',
    'http://codh.rois.ac.jp/kmnist/dataset/kmnist/kmnist-train-labels.npz',
    'http://codh.rois.ac.jp/kmnist/dataset/kmnist/kmnist-test-imgs.npz',
    'http://codh.rois.ac.jp/kmnist/dataset/kmnist/kmnist-test-labels.npz'
]

for url in url_list:
    download_file(url)

[2.5] List the content of the folder and confirm the files have been downloaded properly

In [None]:
! ls

# 3. Load Data

[3.1] Import the required modules from PyTorch

In [None]:
# TODO (Students need to fill this section)
import torch
from torch.utils.data import DataLoader, TensorDataset
import torchvision.transforms as transforms


[3.2] **TODO** Create 2 variables called `img_height` and `img_width` that will both take the value 28

In [None]:
# TODO (Students need to fill this section)
img_height = 28
img_width = 28

[3.3] Create a function that loads a .npz file using numpy and return the content of the `arr_0` key

In [None]:
def load(f):
    return np.load(f)['arr_0']
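
The `arr_0` key is the default name NumPy gives to an array saved without an explicit keyword. The sketch below writes a throwaway archive to a temporary folder to show this; the file name `demo.npz` is hypothetical.

```python
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo.npz')  # hypothetical file name

    # Arrays passed positionally are stored under arr_0, arr_1, ...
    np.savez_compressed(path, np.arange(6).reshape(2, 3))

    # The archive behaves like a dictionary of arrays.
    with np.load(path) as archive:
        keys = list(archive.files)
        data = archive['arr_0']

print(keys)        # ['arr_0']
print(data.shape)  # (2, 3)
```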

[3.4] **TODO** Load the 4 files saved on your Google Drive into their respective variables: x_train, y_train, x_test and y_test

In [None]:
# TODO (Students need to fill this section)
x_train = load('/content/gdrive/MyDrive/DL_ASG_1/kmnist-train-imgs.npz')
y_train = load('/content/gdrive/MyDrive/DL_ASG_1/kmnist-train-labels.npz')
x_test = load('/content/gdrive/MyDrive/DL_ASG_1/kmnist-test-imgs.npz')
y_test = load('/content/gdrive/MyDrive/DL_ASG_1/kmnist-test-labels.npz')


[3.5] **TODO** Using matplotlib display the first image from the train set and its target value

In [None]:
# TODO (Students need to fill this section)
plt.imshow(x_train[0].reshape(img_height, img_width), cmap='gray')
plt.title(f'Label: {y_train[0]}')
plt.show()


# 4. Prepare Data

[4.1] **TODO** Reshape the images from the training and testing set to have the channel dimension last. The dimensions should be: (row_number, height, width, channel)

In [None]:
# TODO (Students need to fill this section)
# Channel dimension last: (row_number, height, width, channel)
x_train = x_train.reshape(-1, img_height, img_width, 1)
x_test = x_test.reshape(-1, img_height, img_width, 1)


[4.2] **TODO** Cast `x_train` and `x_test` into `float32` decimals

In [None]:
# TODO (Students need to fill this section)
# Only cast here; scaling to the 0-1 range is done in step 4.3
x_train = torch.tensor(x_train, dtype=torch.float32)
x_test = torch.tensor(x_test, dtype=torch.float32)


[4.3] **TODO** Standardise the images of the training and testing sets. Originally, each image contains pixels with values ranging from 0 to 255. After standardisation, the new value range should be from 0 to 1.

In [None]:
# TODO (Students need to fill this section)
# Standardize images in training set
x_train = x_train / 255.0

# Standardize images in testing set
x_test = x_test / 255.0
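
A quick way to sanity-check this step is to look at the value range afterwards. The sketch below uses a tiny hypothetical `pixels` array rather than the real images.

```python
import numpy as np

# Hypothetical pixel values covering the full 8-bit range.
pixels = np.array([0, 51, 255], dtype=np.uint8)

# Cast first, then divide once by the maximum possible value.
scaled = pixels.astype(np.float32) / 255.0
print(scaled)  # approximately [0.0, 0.2, 1.0]

# After a single scaling, the values should span 0 to 1.
print(scaled.min(), scaled.max())
```

If the maximum comes out far below 1 (for example around 0.004), the data has most likely been divided by 255 more than once.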


[4.4] **TODO** Create a variable called `num_classes` that will take the value 10 which corresponds to the number of classes for the target variable

In [None]:
# TODO (Students need to fill this section)
num_classes = 10

[4.5] **TODO** Convert the target variable for the training and testing sets to a binary class matrix of dimension (rows, num_classes).

For example:
- class 0 will become [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
- class 1 will become [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
- class 5 will become [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
- class 9 will become [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]

In [None]:
# TODO (Students need to fill this section)
y_train = torch.nn.functional.one_hot(torch.tensor(y_train).to(torch.int64), num_classes)
y_test = torch.nn.functional.one_hot(torch.tensor(y_test).to(torch.int64), num_classes)
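
The conversion can be checked on a handful of labels. This sketch encodes the four example classes listed above and then recovers the original class indices with `argmax`; the `labels` tensor is illustrative data only.

```python
import torch
import torch.nn.functional as F

# The four example classes from the list above.
labels = torch.tensor([0, 1, 5, 9])

# One row per label, one column per class.
one_hot = F.one_hot(labels, num_classes=10)
print(one_hot.shape)        # torch.Size([4, 10])
print(one_hot[2].tolist())  # [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]

# argmax along the class dimension returns the original class indices.
recovered = one_hot.argmax(dim=1)
print(recovered.tolist())   # [0, 1, 5, 9]
```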


# 5. Define Neural Network Architecture

[5.1] Set the seed in PyTorch for reproducing results



In [None]:
# TODO (Students need to fill this section)

torch.manual_seed(42)  # Example seed
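
To see what the seed controls, the sketch below resets it twice and compares both random draws and freshly initialised layer weights. The layer sizes are arbitrary and chosen only for illustration.

```python
import torch
import torch.nn as nn

# Same seed, same sequence of random numbers.
torch.manual_seed(42)
a = torch.rand(3)
torch.manual_seed(42)
b = torch.rand(3)
print(torch.equal(a, b))  # True

# Layer weights are drawn from the same generator, so they match as well.
torch.manual_seed(42)
layer_a = nn.Linear(4, 2)
torch.manual_seed(42)
layer_b = nn.Linear(4, 2)
print(torch.equal(layer_a.weight, layer_b.weight))  # True
```

Because weight initialisation consumes random numbers, the seed needs to be set before the models are created for the starting weights to be reproducible.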


[5.2] **TODO** Define the architecture of your neural network and save it into a variable called `model`

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F

# Define three neural network architectures in a single code block.

# Neural Network 1: Basic Fully Connected Network
class NeuralNet(nn.Module):
    def __init__(self, num_classes=10):
        super(NeuralNet, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(512, num_classes)
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

# Neural Network 2: Deeper Fully Connected Network with Dropout
class CustomNet(nn.Module):
    def __init__(self, input_size=784, num_classes=10):
        super(CustomNet, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(input_size, 512),
            nn.ReLU(),
            nn.Dropout(0.25),  # Adjust dropout rate to prevent overfitting
            nn.Linear(512, 256),
            nn.ReLU(),
            nn.Dropout(0.25),  # Adjust dropout rate to prevent overfitting
            nn.Linear(256, 128),
            nn.ReLU(),
            nn.Dropout(0.25),  # Adjust dropout rate to prevent overfitting
            nn.Linear(128, num_classes)
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

# Neural Network 3: Deepest Fully Connected Network with Graduated Dropout
class DeepCustomNet(nn.Module):
    def __init__(self, input_size=784, num_classes=10):
        super(DeepCustomNet, self).__init__()
        self.flatten = nn.Flatten()
        self.network = nn.Sequential(
            nn.Linear(input_size, 1024),
            nn.ReLU(),
            nn.Dropout(0.2),  # Prevent overfitting
            nn.Linear(1024, 512),
            nn.ReLU(),
            nn.Dropout(0.2),  # Further prevent overfitting
            nn.Linear(512, 256),
            nn.ReLU(),
            nn.Dropout(0.3),  # Increased dropout for deeper layer
            nn.Linear(256, 128),
            nn.ReLU(),
            nn.Dropout(0.3),  # Consistent with prior layer to manage complexity
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Dropout(0.4),  # Higher dropout in a deeper section
            nn.Linear(64, num_classes)
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.network(x)
        return logits

# Instantiate each model
model1 = NeuralNet()
model2 = CustomNet()
model3 = DeepCustomNet()

model_summaries = [model1, model2, model3]

# Confirm the models were created by listing their class names
# (the full summaries are printed in the next step).
[model.__class__.__name__ for model in model_summaries]


[5.3] **TODO** Print the summary of your model

In [None]:
# TODO (Students need to fill this section)
print(model1)
print(model2)
print(model3)
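
Printing a PyTorch model lists its layers but not its size. One possible complement, sketched below, is to count the trainable parameters. The `count_parameters` helper and the `demo` network are hypothetical; `demo` uses the same layer sizes as the first architecture above.

```python
import torch.nn as nn

def count_parameters(model):
    """Return the number of trainable parameters in a model (hypothetical helper)."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Same layer sizes as the basic network: 784 -> 512 -> 10.
demo = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 512),
    nn.ReLU(),
    nn.Dropout(0.5),
    nn.Linear(512, 10),
)

# (784 * 512 + 512) + (512 * 10 + 10) = 407050
print(count_parameters(demo))  # 407050
```

Comparing parameter counts across experiments gives a rough sense of each model's capacity, which can help when judging overfitting.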

# 6. Train Neural Networks

[6.1] **TODO** Create 2 variables called `batch_size` and `epochs` that will respectively take the values 128 and 500

In [None]:
# TODO (Students need to fill this section)
batch_size = 128
epochs = 500
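
These two values determine how many weight updates a training run performs. The arithmetic below is a rough sketch that assumes a training set of 60,000 images, which is the usual Kuzushiji-MNIST split; check `len(x_train)` to confirm the actual size.

```python
import math

num_train_samples = 60000  # assumption: standard Kuzushiji-MNIST training split
batch_size = 128
epochs = 500

# The last batch of each epoch is smaller, hence the ceiling.
steps_per_epoch = math.ceil(num_train_samples / batch_size)
total_updates = steps_per_epoch * epochs

print(steps_per_epoch)  # 469
print(total_updates)    # 234500
```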

[6.2] **TODO** Compile your model with the appropriate loss function, the optimiser of your choice and the accuracy metric

In [None]:
# TODO (Students need to fill this section)
import torch.optim as optim

# Define optimizers and loss functions for each of the three models.

# For model 1 (NeuralNet)
optimizer1 = optim.Adam(model1.parameters(), lr=0.001)
criterion1 = nn.CrossEntropyLoss()

# For model 2 (CustomNet)
optimizer2 = optim.Adam(model2.parameters(), lr=0.001)
criterion2 = nn.CrossEntropyLoss()

# For model 3 (DeepCustomNet)
optimizer3 = optim.Adam(model3.parameters(), lr=0.001)
criterion3 = nn.CrossEntropyLoss()

# Let's confirm the optimizer types and loss function types for each model.
{
    "optimizer1_type": type(optimizer1).__name__,
    "criterion1_type": type(criterion1).__name__,
    "optimizer2_type": type(optimizer2).__name__,
    "criterion2_type": type(criterion2).__name__,
    "optimizer3_type": type(optimizer3).__name__,
    "criterion3_type": type(criterion3).__name__,
}
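
`nn.CrossEntropyLoss` takes raw logits and integer class indices, which is why the one-hot targets have to be converted back with `argmax` before the loss is computed. The sketch below, on made-up `logits` and `targets`, checks this against the equivalent log-softmax plus negative log-likelihood computation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Made-up raw model outputs for 2 samples and 3 classes.
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])
targets = torch.tensor([0, 2])  # integer class indices

criterion = nn.CrossEntropyLoss()
loss = criterion(logits, targets)

# Equivalent computation: log-softmax followed by negative log-likelihood.
manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

# One-hot targets can be turned back into indices with argmax.
one_hot = F.one_hot(targets, num_classes=3)
loss_from_one_hot = criterion(logits, one_hot.argmax(dim=1))

print(torch.allclose(loss, manual))             # True
print(torch.allclose(loss, loss_from_one_hot))  # True
```

Since the softmax is applied inside the loss, the models should return logits and not apply a softmax in their final layer.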


[6.3] **TODO** Train your model using the number of epochs defined. Calculate the total loss and save it to a variable called `total_loss`.

In [None]:
# TODO (Students need to fill this section) For model 1
total_loss = 0.0  # Initialize total loss
for epoch in range(epochs):
    model1.train()  # Set model to training mode
    running_loss = 0.0
    for images, labels in DataLoader(TensorDataset(x_train, y_train), batch_size=batch_size, shuffle=True):
        optimizer1.zero_grad()  # Zero the gradients
        outputs = model1(images)  # Forward pass
        loss = criterion1(outputs, torch.max(labels, 1)[1])  # Calculate loss
        loss.backward()  # Backward pass
        optimizer1.step()  # Optimize
        running_loss += loss.item() * images.size(0)  # Multiply loss by batch size
    epoch_loss = running_loss / len(x_train)  # Calculate average loss per epoch
    total_loss += epoch_loss  # Update total loss
    print(f"Epoch {epoch+1}, Loss: {epoch_loss}")
print(f"Total loss over all epochs: {total_loss}")

In [None]:
# TODO (Students need to fill this section) For model 2
total_loss = 0.0  # Initialize total loss
for epoch in range(epochs):
    model2.train()  # Set model to training mode
    running_loss = 0.0
    for images, labels in DataLoader(TensorDataset(x_train, y_train), batch_size=batch_size, shuffle=True):
        optimizer2.zero_grad()  # Zero the gradients
        outputs = model2(images)  # Forward pass
        loss = criterion2(outputs, torch.max(labels, 1)[1])  # Calculate loss
        loss.backward()  # Backward pass
        optimizer2.step()  # Optimize
        running_loss += loss.item() * images.size(0)  # Multiply loss by batch size
    epoch_loss = running_loss / len(x_train)  # Calculate average loss per epoch
    total_loss += epoch_loss  # Update total loss
    print(f"Epoch {epoch+1}, Loss: {epoch_loss}")
print(f"Total loss over all epochs: {total_loss}")

In [None]:
# TODO (Students need to fill this section) For model 3
total_loss = 0.0  # Initialize total loss
for epoch in range(epochs):
    model3.train()  # Set model to training mode
    running_loss = 0.0
    for images, labels in DataLoader(TensorDataset(x_train, y_train), batch_size=batch_size, shuffle=True):
        optimizer3.zero_grad()  # Zero the gradients
        outputs = model3(images)  # Forward pass
        loss = criterion3(outputs, torch.max(labels, 1)[1])  # Calculate loss
        loss.backward()  # Backward pass
        optimizer3.step()  # Optimize
        running_loss += loss.item() * images.size(0)  # Multiply loss by batch size
    epoch_loss = running_loss / len(x_train)  # Calculate average loss per epoch
    total_loss += epoch_loss  # Update total loss
    print(f"Epoch {epoch+1}, Loss: {epoch_loss}")
print(f"Total loss over all epochs: {total_loss}")

In [None]:
def train_and_evaluate_model(model, optimizer, criterion, x_train, y_train, epochs, batch_size):
    epoch_losses = []  # Initialize a list to store the average loss of each epoch

    for epoch in range(epochs):
        model.train()  # Set model to training mode
        running_loss = 0.0

        # Iterate over the DataLoader for training data
        for images, labels in DataLoader(TensorDataset(x_train, y_train), batch_size=batch_size, shuffle=True):
            optimizer.zero_grad()  # Zero the gradients
            outputs = model(images)  # Forward pass

            # Calculate loss; adjust if your labels are not one-hot encoded
            loss = criterion(outputs, labels.argmax(dim=1))
            loss.backward()  # Backward pass
            optimizer.step()  # Optimize

            # Update the running loss
            running_loss += loss.item() * images.size(0)

        # Calculate the average loss for the epoch
        epoch_loss = running_loss / len(x_train)
        epoch_losses.append(epoch_loss)  # Append the average loss for this epoch to the list

        print(f"Model {model.__class__.__name__}: Epoch {epoch+1}/{epochs}, Loss: {epoch_loss:.4f}")

    # Calculate and print the total loss across all epochs for the current model
    total_loss = sum(epoch_losses)
    print(f"Model {model.__class__.__name__}: Total loss over all epochs: {total_loss:.4f}\n")
    return epoch_losses  # Optionally return the list of epoch losses

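# Assuming x_train, y_train, epochs, and batch_size are already defined,
# and model1, model2, model3, optimizer1, optimizer2, optimizer3, criterion1, criterion2, criterion3 are properly initialized.
# Note: if the per-model training cells above were already run, these calls continue
# training the same models for another `epochs` epochs.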
epoch_losses_model1 = train_and_evaluate_model(model1, optimizer1, criterion1, x_train, y_train, epochs, batch_size)
epoch_losses_model2 = train_and_evaluate_model(model2, optimizer2, criterion2, x_train, y_train, epochs, batch_size)
epoch_losses_model3 = train_and_evaluate_model(model3, optimizer3, criterion3, x_train, y_train, epochs, batch_size)

# Now, if you wish to plot the training losses for each model, you can use the stored `epoch_losses_model1`, `epoch_losses_model2`, `epoch_losses_model3`.


[6.4] **TODO** Test your model. Call `model.eval()` and wrap the evaluation in `torch.no_grad()` to turn off gradient tracking.


In [None]:
def evaluate_model(model, x_train, y_train, x_test, y_test, batch_size):
    model.eval()  # Set the model to evaluation mode
    total_train, correct_train, total_test, correct_test = 0, 0, 0, 0

    with torch.no_grad():  # Turn off gradients for evaluation
        # Evaluate on training data
        for images, labels in DataLoader(TensorDataset(x_train, y_train), batch_size=batch_size):
            outputs = model(images)
            _, predicted = torch.max(outputs, 1)
            total_train += labels.size(0)
            correct_train += (predicted == labels.argmax(dim=1)).sum().item()

        # Evaluate on testing data
        for images, labels in DataLoader(TensorDataset(x_test, y_test), batch_size=batch_size):
            outputs = model(images)
            _, predicted = torch.max(outputs, 1)
            total_test += labels.size(0)
            correct_test += (predicted == labels.argmax(dim=1)).sum().item()

    # Calculate and print accuracy
    train_accuracy = correct_train / total_train * 100
    test_accuracy = correct_test / total_test * 100
    print(f'Model {model.__class__.__name__} Training Accuracy: {train_accuracy:.2f}%')
    print(f'Model {model.__class__.__name__} Testing Accuracy: {test_accuracy:.2f}%\n')

# Assuming x_train, y_train, x_test, y_test, and batch_size are defined,
# and model1, model2, model3 are properly initialized.
evaluate_model(model1, x_train, y_train, x_test, y_test, batch_size)
evaluate_model(model2, x_train, y_train, x_test, y_test, batch_size)
evaluate_model(model3, x_train, y_train, x_test, y_test, batch_size)
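
The two switches do different things: `eval()` changes how layers such as dropout behave, while `torch.no_grad()` stops gradient tracking. The following sketch shows both effects on small stand-alone layers, not on the models above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
dropout = nn.Dropout(p=0.5)
x = torch.ones(1000)

# Training mode: about half the values are zeroed, the rest scaled by 1 / (1 - p).
dropout.train()
train_out = dropout(x)
print(sorted(train_out.unique().tolist()))  # [0.0, 2.0]

# Evaluation mode: dropout passes its input through unchanged.
dropout.eval()
eval_out = dropout(x)
print(torch.equal(eval_out, x))  # True

# no_grad: outputs are not attached to the autograd graph.
linear = nn.Linear(4, 2)
with torch.no_grad():
    y = linear(torch.ones(1, 4))
print(y.requires_grad)  # False
```

Forgetting `eval()` leaves dropout active during testing, which typically makes the reported accuracy lower and noisier than it should be.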


# 7. Analyse Results

[7.1] **TODO** Display the performance of your model on the training and testing sets

In [None]:
evaluate_model(model1, x_train, y_train, x_test, y_test, batch_size)
evaluate_model(model2, x_train, y_train, x_test, y_test, batch_size)
evaluate_model(model3, x_train, y_train, x_test, y_test, batch_size)


[7.2] **TODO** Plot the learning curve of your model

In [None]:
import matplotlib.pyplot as plt

epochs_range = range(1, epochs + 1)

plt.figure(figsize=(12, 7))  # Adjust the figure size as necessary

# Plot for Model 1
plt.plot(epochs_range, epoch_losses_model1, label='Model 1 Training Loss')

# Plot for Model 2
plt.plot(epochs_range, epoch_losses_model2, label='Model 2 Training Loss')

# Plot for Model 3
plt.plot(epochs_range, epoch_losses_model3, label='Model 3 Training Loss')

plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('Training Loss Over Epochs for All Models')
plt.legend()
plt.show()


[7.3] **TODO** Display the confusion matrix on the testing set predictions

In [None]:
from sklearn.metrics import confusion_matrix
import seaborn as sns
import matplotlib.pyplot as plt
import torch
from torch.utils.data import DataLoader, TensorDataset

def plot_confusion_matrix(model, x_test, y_test, batch_size, num_classes):
    model.eval()  # Set the model to evaluation mode
    y_true = [label.argmax().item() for label in y_test]
    y_pred = []

    with torch.no_grad():
        for images, _ in DataLoader(TensorDataset(x_test, y_test), batch_size=batch_size):
            outputs = model(images)
            _, predicted = torch.max(outputs, 1)
            y_pred.extend(predicted.tolist())

    cm = confusion_matrix(y_true, y_pred, labels=range(num_classes))

    plt.figure(figsize=(10, 10))
    # Class labels run from 0 to num_classes - 1, matching the rows and columns of cm
    sns.heatmap(cm, annot=True, fmt='g', cmap='Blues', xticklabels=range(num_classes), yticklabels=range(num_classes))
    plt.xlabel('Predicted')
    plt.ylabel('True')
    plt.title(f'Confusion Matrix for {model.__class__.__name__}')
    plt.show()

# Assuming x_test, y_test, batch_size, and num_classes are defined and correct
# Call the function for each model
plot_confusion_matrix(model1, x_test, y_test, batch_size, num_classes)
plot_confusion_matrix(model2, x_test, y_test, batch_size, num_classes)
plot_confusion_matrix(model3, x_test, y_test, batch_size, num_classes)
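
To make the heatmap easier to read, it can help to see how a confusion matrix is built and what can be derived from it. The sketch below constructs one by hand with NumPy from small made-up `y_true` and `y_pred` arrays (3 classes, 7 samples) and computes per-class recall and overall accuracy.

```python
import numpy as np

# Made-up labels and predictions for a 3-class problem.
y_true = np.array([0, 0, 1, 1, 2, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0, 2])
num_classes = 3

# Rows are true classes, columns are predicted classes.
cm = np.zeros((num_classes, num_classes), dtype=int)
np.add.at(cm, (y_true, y_pred), 1)
print(cm)
# [[1 1 0]
#  [0 2 0]
#  [1 0 2]]

# Diagonal entries are correct predictions.
per_class_recall = cm.diagonal() / cm.sum(axis=1)
accuracy = cm.trace() / cm.sum()
print(per_class_recall)  # approximately [0.5, 1.0, 0.667]
print(accuracy)          # approximately 0.714
```

Off-diagonal cells with large counts point to the pairs of characters that a model confuses most often.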
