# NNTI Assignment 8 (Q8.4)

Name 1: <br>
Student id 1: <br>
Email 1: <br>

Name 2: <br>
Student id 2:  <br>
Email 2:  <br>

Name 3: <br>
Student id 3:  <br>
Email 3: <br>

**Instructions:** Read each question carefully. <br/>
Make sure you comment your code appropriately wherever required. Your final submission should contain the completed notebook and the respective files for any additional exercises. There is no need to resubmit the data files if they were provided separately. <br>


Upload the zipped folder to CMS. Please follow the naming convention **Name1_id1_Name2_id2_Name3_id3.zip**. Only one member of the group should make the submission.


In this exercise you will build your own neural networks, but this time you need to add regularization in the form of dropout, weight-decay and early-stopping.

Each layer should have the option of using dropout. Your code needs to allow for this flexibility.

Additionally, adding weight-decay and early-stopping should also be optional upon creation.

**NOTE**:
1. You are allowed to use built-in functions from PyTorch to incorporate this functionality.

2. We recommend the use of GPUs or Google Colab for this exercise.

3. During training and validation, remember when to use `model.train()` and `model.eval()`
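
As a small illustration of why note 3 matters, the sketch below applies a standalone `nn.Dropout` layer to a tensor of ones in both modes. The tensor `x` and the seed are arbitrary choices made for this example, not part of the assignment.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # fixed seed so the random dropout mask is repeatable

drop = nn.Dropout(p=0.5)
x = torch.ones(1, 8)  # arbitrary example input

drop.train()          # training mode: elements are zeroed at random
train_out = drop(x)   # surviving elements are scaled by 1 / (1 - p) = 2

drop.eval()           # evaluation mode: dropout acts as the identity
eval_out = drop(x)

print(train_out)
print(eval_out)
```

Calling `model.train()` or `model.eval()` on a full model switches every dropout layer inside it in the same way.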

Use the imports below. As usual, you may import additional packages, but state why you are using them.

In [None]:
import torch
import torch.nn as nn
from torchvision import transforms
from torchvision import datasets
import matplotlib.pyplot as plt
import numpy as np

## a. Implement a regularized model [0.5 points]

In this task, you will implement a custom neural network model using PyTorch. The model should incorporate key features such as **dropout** to improve generalization and prevent overfitting.

**Tasks to implement**:

1. Define the Model Architecture:
  - The model consists of a series of fully connected (FC) layers with ReLU activations in between.
  - Dropout layers are added after each hidden layer, with the probability of dropout specified by the `dropout_p` parameter.
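  - The final output layer produces a result that is passed through a Softmax activation for multi-class classification tasks. (Keep in mind that `nn.CrossEntropyLoss` applies log-softmax internally, so if you use it as the criterion it expects the raw outputs rather than Softmax probabilities.)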

**Hint**:
Since you are implementing a simple fully connected network rather than a CNN, it is recommended to flatten your input images before passing them to the network.
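
One possible way to flatten a batch is sketched below. The tensor `batch` is a made-up stand-in with the shape of an MNIST batch (64 single-channel 28x28 images); `torch.flatten` with `start_dim=1` keeps the batch dimension and merges the rest.

```python
import torch

batch = torch.zeros(64, 1, 28, 28)         # hypothetical batch shaped like MNIST
flat = torch.flatten(batch, start_dim=1)   # keep dim 0, merge channel/height/width
print(flat.shape)                          # torch.Size([64, 784])
```

The resulting feature size of 784 matches an `input_dim` of `28 * 28`.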

In [None]:
class Model(nn.Module):
    """
    A neural network model incorporating dropout.

    Args:
        input_dim (int): Dimensionality of the input features.
        hidden_dim (int): Number of units in each hidden layer.
        out_dim (int): Number of output units (number of classes).
        num_layers (int): Number of hidden layers.
        dropout (list of bool): Specifies which hidden layers will have dropout.
        dropout_p (float): Dropout probability used for the Dropout layers.
    """

    def __init__(self, input_dim, hidden_dim, out_dim, num_layers, dropout, dropout_p):
      #TODO
      pass

    def forward(self, x):
      #TODO
      pass

## b. Data and code setup [1 + 0.25 + 0.25 = 1.5 points]

You will use the MNIST dataset for these experiments. The data setup has been provided for you.<br> **DO NOT CHANGE THE CODE HERE.**

In [None]:
# Load the data
# DO NOT CHANGE THE CODE IN THIS CELL
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

mnist_train = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
train_size = int(0.8 * len(mnist_train))  # 80% for training
val_size = len(mnist_train) - train_size  # 20% for validation

# Split the dataset into training and validation
train_dataset, val_dataset = torch.utils.data.random_split(mnist_train, [train_size, val_size])

train_dl = torch.utils.data.DataLoader(train_dataset, batch_size=64, shuffle=True)
val_dl = torch.utils.data.DataLoader(val_dataset, batch_size=64, shuffle=False)

mnist_test = datasets.MNIST(root='./data', train=False, download=True, transform=transform)
test_dl = torch.utils.data.DataLoader(mnist_test, batch_size=64, shuffle=False)

#### Training code
The `trainer()` function trains a model using the provided data loaders, criterion (loss function), optimizer, and various options for regularization and early stopping. You will implement this function for training models for the experiments.

A few things to keep in mind:
- The function should accept model, data loaders, loss function, optimizer, and training configurations (epochs, early stopping).
- The training loop should include forward pass, loss computation, backward pass, and weight update.
- Track and return average training and validation losses for each epoch.
- Use tqdm for progress bars during training and validation. (**optional**, but recommended)
- Implement **early stopping** to halt training if the validation loss does not improve for a set number of epochs. Provide a `patience` parameter: the number of consecutive epochs without improvement to tolerate before stopping.
  - Make it optional by passing a boolean param `early_stopping`.
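
The patience bookkeeping described above can be sketched in plain Python, independent of any model. The `EarlyStopper` class and the loss values are hypothetical and only illustrate the counting logic; how you integrate it into `trainer()` is up to you.

```python
class EarlyStopper:
    """Hypothetical helper: tracks the best validation loss seen so far."""

    def __init__(self, patience=10):
        self.patience = patience
        self.best_loss = float("inf")
        self.bad_epochs = 0  # consecutive epochs without improvement

    def should_stop(self, val_loss):
        if val_loss < self.best_loss:
            # Improvement: remember it and reset the counter.
            self.best_loss = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


stopper = EarlyStopper(patience=2)
val_losses = [0.9, 0.7, 0.8, 0.75, 0.6]  # made-up per-epoch validation losses
stopped_at = None
for epoch, loss in enumerate(val_losses):
    if stopper.should_stop(loss):
        stopped_at = epoch
        break

print(stopped_at, stopper.best_loss)
```

With a patience of 2, the loop stops at epoch index 3, after two consecutive epochs that fail to beat the best loss of 0.7. The later value of 0.6 is never reached, which is the trade-off a small patience makes.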

In [None]:
def trainer(model, train_loader, val_loader, criterion, optimizer, epochs=50, early_stopping=False, patience=10):
    """
    Train the model with optional early stopping.

    Args:
        model (torch.nn.Module): The model to be trained.
        train_loader (DataLoader): The training data loader.
        val_loader (DataLoader): The validation data loader.
        criterion (loss function): The loss function.
        optimizer (Optimizer): The optimizer to use.
        epochs (int, optional): The number of epochs to train. Default is 50.
        early_stopping (bool, optional): Whether to apply early stopping. Default is False.
        patience (int, optional): The patience for early stopping. Default is 10.

    Returns:
        model (torch.nn.Module): The trained model.
        train_losses (list): List of average training losses per epoch.
        val_losses (list): List of average validation losses per epoch.
    """
    #TODO
    pass

#### Evaluation code

Complete the `plot_losses()` and `evaluate_model()` functions to visualize the training and validation losses and to evaluate the model on the test set.

**NOTE**:
1. Add a legend, title, and grid to improve plot readability for `plot_losses()`
2. Report the average test loss, accuracy, and F1 score metrics using `evaluate_model()`.
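
For the metrics in note 2, a minimal sketch with scikit-learn follows. The label lists are invented toy data, and macro averaging is an assumption: the assignment does not specify an averaging mode, but `f1_score` needs one for multi-class labels because its default, `average="binary"`, only handles two classes.

```python
from sklearn.metrics import accuracy_score, f1_score

# Toy labels for three classes (illustrative only)
y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 1, 2, 1, 1, 0]

acc = accuracy_score(y_true, y_pred)
macro_f1 = f1_score(y_true, y_pred, average="macro")  # unweighted mean of per-class F1

print(acc, macro_f1)
```

In `evaluate_model()`, the predictions and targets would first need to be collected across all test batches before calling these functions.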


In [None]:
import matplotlib.pyplot as plt
from sklearn.metrics import accuracy_score, f1_score

def plot_losses(train_losses, val_losses):
    """
    Plot training and validation losses.

    Args:
        train_losses (list): List of average training losses per epoch.
        val_losses (list): List of average validation losses per epoch.
    """
    #TODO
    pass


def evaluate_model(model, test_loader, criterion):
    """
    Evaluate the model on the test set and report accuracy and F1 score.

    Args:
        model (torch.nn.Module): The trained model to be evaluated.
        test_loader (DataLoader): The test data loader.
        criterion (loss function): The loss function to use for evaluation.

    Returns:
        float: The average test loss.
        float: The accuracy of the model on the test set.
        float: The F1 score of the model on the test set.
    """
    #TODO
    pass

## c. Experiments: [0.25+0.25+0.25+0.25 = 1 point]
Build a deep network with 3 hidden layers; counting the input and output layers, it should be a 5-layer network. Run the following 4 experiments on this network with the given configurations:

1. Deep network (at least 3 hidden layers)
2. Deep regularized network (with weight-decay enabled)
3. Deep regularized network (with weight-decay and dropout)
4. Deep regularized network (with weight-decay and early-stopping)

Report accuracy and $F_1$ metrics on the `test set` for your experiments and discuss your results. What did you expect to see, and what did you end up seeing?

**NOTE**:
- You can choose how you use regularization. Ideally you would experiment with various parameters for this regularization; the 4 listed variants are merely the minimum you must cover. You are free to run more experiments if you want to.
- Report the results of all your experiments on the test set concisely in a table at the end.
- Use the Adam optimizer for all of your experiments.
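
As a hedged sketch of how weight decay can be enabled with Adam, the built-in `weight_decay` argument of `torch.optim.Adam` can be used. The `nn.Linear` layer is only a stand-in for your model; the values `5e-5` and `1e-4` are the ones used in the experiment configurations.

```python
import torch.nn as nn
import torch.optim as optim

layer = nn.Linear(4, 2)  # stand-in for the real model

# weight_decay=0 disables the penalty; a positive value adds an L2 term to the gradients
optimizer = optim.Adam(layer.parameters(), lr=5e-5, weight_decay=1e-4)

print(optimizer.param_groups[0]["weight_decay"])
```

Note that `Adam` applies the decay as an L2 penalty on the gradients, whereas `AdamW` decouples it from the gradient update. This assignment asks for Adam.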

### Experiment 1: Deep network (at least 3 hidden layers) (No Regularization)

Use the given model configs and hyperparams to run the experiments.

In [None]:
import torch.optim as optim

# Deep network (3 hidden layers) with no dropout and no weight-decay
model_1_config = {
    "input_dim": 28 * 28,
    "hidden_dim": 400,
    "out_dim": 10,
    "num_layers": 3,
    "dropout": [False, False, False],
    "dropout_p": 0.5
}


learning_rate = 5e-5
weight_decay = 0  # Use this only if weight-decay is needed

In [None]:
# Train the model


# Plot the training and validation losses


# Evaluate the model on the test set

### Experiment 2: Deep regularized network (with weight-decay enabled)

Use the given model configs to run the experiments.

In [None]:
# Deep network (3 hidden layers) with weight-decay but no dropout
model_2_config = {
    "input_dim": 28 * 28,
    "hidden_dim": 400,
    "out_dim": 10,
    "num_layers": 3,
    "dropout": [False, False, False],
    "dropout_p": 0.5
}


learning_rate = 5e-5
weight_decay = 1e-4  # Use this only if weight-decay is needed

In [None]:
# Train the model


# Plot the training and validation losses


# Evaluate the model on the test set

### Experiment 3: Deep regularized network (with weight-decay and dropout)

Use the given model configs to run the experiments.

In [None]:
# Deep regularized network (3 hidden layers) with weight-decay and dropout after every layer
model_3_config = {
    "input_dim": 28 * 28,
    "hidden_dim": 400,
    "out_dim": 10,
    "num_layers": 3,
    "dropout": [True, True, True],
    "dropout_p": 0.5
}

learning_rate = 5e-5
weight_decay = 1e-4  # Use this only if weight-decay is needed

In [None]:
# Train the model


# Plot the training and validation losses


# Evaluate the model on the test set

### Experiment 4: Deep regularized network (with weight-decay and early-stopping)

Use the given model configs to run the experiments.

In [None]:
# Deep regularized network (3 hidden layers) with weight-decay and early stopping
model_4_config = {
    "input_dim": 28 * 28,
    "hidden_dim": 400,
    "out_dim": 10,
    "num_layers": 3,
    "dropout": [False, False, False],
    "dropout_p": 0.5
}

learning_rate = 5e-5
weight_decay = 1e-4  # Use this only if weight-decay is needed

In [None]:
# Train the model


# Plot the training and validation losses


# Evaluate the model on the test set

In [None]:
# Report the model accuracies and F1 scores on the test set