# D2Lab-C. Building a Model

## About this notebook

This notebook was used in the 50.039 Deep Learning course at the Singapore University of Technology and Design.

**Author:** Matthieu DE MARI (matthieu_demari@sutd.edu.sg)

**Version:** 1.0 (01/02/2025)

**Requirements:**
- Python 3
- Matplotlib
- Numpy
- Pandas
- Torch
- Torchmetrics

## 0. Imports and CUDA

In [None]:
# Matplotlib
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
# Numpy
import numpy as np
# Pandas
import pandas as pd
# Torch
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
from torchmetrics.classification import BinaryAccuracy
# Helper functions (additional file)
from helper_functions import *
#from hidden_functions import *

In [None]:
# Use GPU if available, else use CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)

## 0. Before we start

Please copy-paste your code from Notebook A and Notebook B for the *CustomDataset* and *WeirdLayer* classes below.

In [None]:
class CustomDataset(Dataset):
    pass

In [None]:
class WeirdLayer(nn.Module):
    pass

We will also reload the dataset used earlier in Notebook A.

In [None]:
# Dataset parameters
np.random.seed(17)
min_val = -1
max_val = 1
n_points = 1000
# Load dataset from file
excel_file_path = 'dataset.xlsx'
val1_list, val2_list, inputs, outputs = load_dataset(excel_file_path = excel_file_path)
# Visualize data in arrays
print(inputs.shape, outputs.shape)
print("Number of samples with class 0:", len(outputs) - sum(outputs))
print("Number of samples with class 1:", sum(outputs))
# Visualize the dataset
plot_dataset(min_val, max_val, val1_list, val2_list, outputs)

The cell below also needs to be updated with your solution for the DataLoader part of Notebook A.

In [None]:
# Create Dataset object
pt_dataset = CustomDataset()
# Define batch size
batch_size = 128
# Create DataLoader object
pt_dataloader = DataLoader(None)
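
As a rough illustration of what the DataLoader will produce with these settings, the sketch below batches 1000 toy samples with a batch size of 128. The names `toy_inputs`, `toy_outputs`, `toy_dataset` and `toy_dataloader` are made up for this example, and `TensorDataset` only stands in for your own *CustomDataset*; it is not the expected solution for the cell above.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Toy stand-in data (assumption): 1000 samples, 2 features in [-1, 1], binary labels
toy_inputs = torch.rand(1000, 2) * 2 - 1
toy_outputs = (toy_inputs[:, 0] > 0).float()

# TensorDataset is only a placeholder for CustomDataset here
toy_dataset = TensorDataset(toy_inputs, toy_outputs)
toy_dataloader = DataLoader(toy_dataset, batch_size=128, shuffle=True)

# Collect the size of every mini-batch drawn during one pass over the data
batch_sizes = [len(x) for x, _ in toy_dataloader]
print(len(toy_dataloader), batch_sizes[-1])
```

With 1000 samples and a batch size of 128, one pass yields 8 mini-batches, and the last one is smaller (104 samples) because 1000 is not a multiple of 128.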

## 5. Defining a Neural Network for this task - Part 1: Architecture and Forward Propagation

In this section, we will establish our Neural Network model for this task. The architecture will consist of:
- WeirdOperation Layer (weird_layer): Takes 2 inputs and produces 32 outputs.
- First Linear Layer (hidden1): Takes 32 inputs and produces 64 outputs, followed by a GELU activation function.
- Second Linear Layer (hidden2): Takes 64 inputs and produces 32 outputs, followed by a GELU activation function.
- Third Linear Layer (hidden3): Takes 32 inputs and produces 16 outputs, followed by a GELU activation function.
- Final Linear Layer (fc): Takes 16 inputs and produces 1 output.
- Final Activation (sigmoid): Applies a Sigmoid activation to map the output, making it suitable for binary classification.

Our Neural Network will use Binary Cross-Entropy as its loss function (to be stored in the self.loss attribute) and the binary accuracy metric from the torchmetrics library (to be stored in the self.accuracy attribute).
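
To get a feel for how these pieces interact, here is a minimal sketch on made-up numbers. The `logits` and `targets` tensors are hypothetical, and the accuracy is computed by hand with a 0.5 threshold, which is assumed here to mirror the default behaviour of `BinaryAccuracy`. It is not the NeuralNetwork class itself.

```python
import torch
import torch.nn as nn

# Hypothetical raw scores from a final linear layer, with their labels
logits = torch.tensor([[-2.0], [0.5], [3.0], [-1.0]])
targets = torch.tensor([[0.0], [1.0], [1.0], [1.0]])

# Sigmoid squashes each score into (0, 1), so it can be read as P(class 1)
probs = torch.sigmoid(logits)

# nn.BCELoss expects probabilities in [0, 1] and float targets
bce = nn.BCELoss()
loss_value = bce(probs, targets)

# Same quantity written out by hand: mean of -[y log p + (1 - y) log(1 - p)]
manual = -(targets * probs.log() + (1 - targets) * (1 - probs).log()).mean()

# Hand-made accuracy: predict class 1 when the probability exceeds 0.5
preds = (probs > 0.5).float()
accuracy = (preds == targets).float().mean()
print(loss_value.item(), manual.item(), accuracy.item())
```

The two loss values agree, and three of the four toy samples are classified correctly, so the accuracy is 0.75.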

<div class="alert alert-block alert-info">
<b>Question 10:</b> Show your final code for the NeuralNetwork class in your report.
</div>

The code below describes our neural network model. A few None variables need to be replaced.

In [None]:
class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.weird_layer = None
        self.hidden1 = None 
        self.hidden2 = None
        self.hidden3 = None 
        self.fc = None 
        self.activation = None
        self.sigmoid = None
        self.loss = None
        self.accuracy = None

    def forward(self, x):
        return None

In [None]:
# Create Neural Network model
model = NeuralNetwork()

<div class="alert alert-block alert-info">
<b>Question 11:</b> What is the purpose of the Sigmoid activation used in the final layer? Why can't we simply use a Weird layer as the final operation in the forward method?
</div>

In [None]:
print(model)

## 6. Defining a Neural Network for this task - Part 2: Backpropagation and Training

Have a look at the code below. The code:
- Initializes our model.
- Initializes an Adam optimizer and runs 15 epochs of forward and backward passes.
- For each mini-batch of data drawn in each epoch, it produces predictions, calculates a loss and calculates the binary accuracy on said samples.
- It then backpropagates on the *loss_value* and adjusts the model parameters using *optimizer.step()*.
- Finally, after each epoch, it prints the loss and accuracy of the last mini-batch.

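
The forward/backward/step sequence described above can be sketched on a single mini-batch as follows. This is only an illustration under stated assumptions: `toy_model`, `toy_loss`, `toy_optimizer`, the data and the learning rate are all made up for this example and are not the settings of the training cell.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in model: one linear layer followed by a sigmoid
toy_model = nn.Sequential(nn.Linear(2, 1), nn.Sigmoid())
toy_loss = nn.BCELoss()
# Arbitrary learning rate chosen for this toy example only
toy_optimizer = torch.optim.Adam(toy_model.parameters(), lr=1e-2)

# One made-up mini-batch of 4 samples
x = torch.tensor([[0.5, -0.2], [-0.7, 0.9], [0.1, 0.3], [-0.4, -0.6]])
y = torch.tensor([[1.0], [0.0], [1.0], [0.0]])

weights_before = toy_model[0].weight.detach().clone()

pred = toy_model(x)              # forward pass
loss_value = toy_loss(pred, y)   # scalar loss
loss_value.backward()            # fills .grad on every parameter
grad_was_set = toy_model[0].weight.grad is not None
toy_optimizer.step()             # updates the parameters using .grad
toy_optimizer.zero_grad()        # clears gradients before the next batch

weights_after = toy_model[0].weight.detach().clone()
print(grad_was_set, not torch.equal(weights_before, weights_after))
```

The weights only change when *optimizer.step()* is called; *backward()* alone just stores the gradients.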
<div class="alert alert-block alert-info">
<b>Question 12:</b> Why do we use nn.BCELoss() instead of the Mean Squared Error (MSE) loss for binary classification?
</div>

In [None]:
# Create Neural Network model
model = NeuralNetwork().to(device)

# Gradient descent parameters: optimizers, repetitions, etc.
num_epochs = 15
optimizer = torch.optim.Adam(model.parameters(),
                             lr = 1000,
                             betas = (0.9, 0.999),
                             eps = 1e-08)
optimizer.zero_grad()

for epoch in range(num_epochs):
    for batch in pt_dataloader:
        # Unpack the mini-batch data
        inputs_batch, outputs_batch = batch
        outputs_re = outputs_batch.to(device).reshape(-1, 1)
        inputs_re = inputs_batch.to(device)
        
        # Forward pass
        pred = model(inputs_re)
        loss_value = model.loss(pred.float(), outputs_re.float())
        # Compute binary accuracy
        binary_accuracy_value = model.accuracy(pred, outputs_re)
    
        # Backward pass and optimization
        loss_value.backward()
        optimizer.step()
        optimizer.zero_grad()
        
    # Print loss and accuracy
    print(f'Epoch [{epoch+1}/{num_epochs}], Training Loss: {loss_value.item():.4f}, Training Accuracy: {binary_accuracy_value.item():.4f}')

<div class="alert alert-block alert-info">
<b>Question 13:</b> When running the cell above, it seems the model is not capable of achieving good accuracy. If anything, it seems to remain stuck at an accuracy of 30% or so. Should we adjust one of our hyperparameters (e.g. the learning rate)? Why is it important to test several values of the hyperparameters?
</div>

<div class="alert alert-block alert-info">
<b>Question 14:</b> Assuming you figured out what was wrong in Question 13, show in your report how you resolved the problem in the code above.
Your model should be able to produce a final training accuracy above 90%.
</div>

<div class="alert alert-block alert-info">
<b>Question 15:</b> How can we improve the generalization ability of a model? Please list at least two methods and explain their principles.
</div>

<div class="alert alert-block alert-info">
<b>Question 16:</b> Having figured out how to prove/disprove generalization in Question 15, we leave the rest of the notebook cells for you to play with the code and figure out how to prove that your model is indeed capable of generalization (or not). Show your code, your results and how they match the reasoning you described in Question 15. Conclude and show the performance of your final model!
</div>