# Assignment 2 (due Dec 9th at 11:59pm)

This assignment explores the impact of mini-batch training, and in particular of the mini-batch size, on Multilayer Perceptron (MLP) networks. The assignment is worth a total of **100 points**. Students shall submit a copy of this notebook (`.ipynb`) via [Gradescope](https://gradescope.com), including **well-documented answers** (commented code and a description of any use of GenAI tools).

We expect students to run the assignment in Google Colab notebooks with a **GPU runtime**. To enable it, go to *Edit/Notebook settings* and set the *Hardware accelerator* to GPU.

>  **Important**: Make sure all cells are executed before saving/downloading a copy of the notebook you will submit.

```python
#######################################################
# TODO: Add all your imports here under this comment
#######################################################
```

## Question 1: MLP Class (20 points)
Define a class `MLP` that implements a Multilayer Perceptron network.
The class should have the following methods:
- `__init__(self, in_dim, out_dim, hid_layers)`

  Initializes the MLP with the given parameters. `in_dim` is the input dimension, `out_dim` is the output dimension, and `hid_layers` is a list of integers giving the number of neurons in each hidden layer, so the number of hidden layers equals the length of this list. All hidden layers should use a `ReLU` activation function, while the output layer should use a linear activation (i.e., no activation, so the network outputs raw logits).

- `forward(self, x)`

  Performs the forward pass and returns the output of the network for a given mini-batch.
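The class can be sketched as follows. This is one possible implementation, not the required one; it assumes PyTorch and adds an `nn.Flatten` layer (a design choice of this sketch) so that `forward` accepts either flattened inputs of shape `(B, in_dim)` or FashionMNIST image batches of shape `(B, 1, 28, 28)`.

```python
import torch
import torch.nn as nn


class MLP(nn.Module):
    """Multilayer perceptron: ReLU hidden layers, linear (no activation) output layer."""

    def __init__(self, in_dim, out_dim, hid_layers):
        super().__init__()
        # Flatten is an assumption of this sketch: (B, 1, 28, 28) -> (B, 784);
        # inputs that are already (B, in_dim) pass through unchanged.
        layers = [nn.Flatten()]
        prev = in_dim
        for width in hid_layers:  # one Linear + ReLU pair per hidden layer
            layers += [nn.Linear(prev, width), nn.ReLU()]
            prev = width
        # Output layer: no activation, so the network returns raw logits
        layers.append(nn.Linear(prev, out_dim))
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        # x: a mini-batch; returns logits of shape (B, out_dim)
        return self.net(x)


# Usage example (shapes only; the weights are random)
model = MLP(28 * 28, 10, [256, 64])
print(model(torch.randn(8, 1, 28, 28)).shape)  # torch.Size([8, 10])
```

Because the output layer has no activation, its logits can be passed directly to `nn.CrossEntropyLoss`, which applies log-softmax internally.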

```python
#######################################################
# TODO: Add your class definition under this comment
#######################################################
```

## Question 2: Test function (15 points)

Define a function that evaluates a trained MLP using mini-batches. The function should have the following signature:

```python
def test(model, criterion, loader, device)
```

- `model` is a trained model
- `criterion` is the loss function
- `loader` is a data loader
- `device` is the device where the model is stored

The function should return the average loss and average accuracy of the model using the given data loader.

Note that in PyTorch, `model.eval()` sets the model to evaluation mode, and wrapping inference in `torch.no_grad()` is recommended to disable gradient computation.

```python
#######################################################
# TODO: Add your function definition under this comment
#######################################################
```

## Question 3: Train function (25 points)

Define a function that trains the MLP using mini-batches. The function should have the following signature:

```python
def train(model, criterion, optimizer, tr_loader, va_loader, n_epochs, device)
```

- `model` is an instance of the MLP class
- `criterion` is the loss function
- `optimizer` is the optimization algorithm
- `tr_loader` is the training data loader
- `va_loader` is the validation data loader
- `n_epochs` is the number of epochs to train the model
- `device` is the device where the model should be trained

The function should return four lists containing, respectively, the training loss, training accuracy, validation loss, and validation accuracy at each epoch. Make sure to print these values at the end of every epoch of the training loop (one line per epoch).

Note that in PyTorch, `model.train()` sets the model to training mode. If you call `test` inside the training loop, set the model back to training mode afterwards, since `test` switches it to evaluation mode. Additionally, move every mini-batch (both inputs and targets) to the device with the `.to(device)` method.
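The training loop might look like the sketch below. To keep the block self-contained, it uses a hypothetical helper, `run_epoch`, that either trains (when an optimizer is passed) or only evaluates; in the notebook, the evaluation call could equally be the `test` function from Question 2. Note that the training loss and accuracy here are running averages collected while the weights are being updated, not a separate evaluation pass after each epoch.

```python
import torch
import torch.nn as nn


def run_epoch(model, criterion, loader, device, optimizer=None):
    """Hypothetical helper: one pass over `loader`.

    Trains when `optimizer` is given, otherwise only evaluates.
    Returns (average loss, average accuracy) over all samples seen.
    """
    training = optimizer is not None
    model.train(training)  # model.train(False) is equivalent to model.eval()
    total_loss, total_correct, total_seen = 0.0, 0, 0
    with torch.set_grad_enabled(training):  # track gradients only when training
        for inputs, targets in loader:
            inputs, targets = inputs.to(device), targets.to(device)
            logits = model(inputs)
            loss = criterion(logits, targets)
            if training:
                optimizer.zero_grad()  # clear gradients from the previous step
                loss.backward()
                optimizer.step()
            n = targets.size(0)
            total_loss += loss.item() * n  # undo the batch mean
            total_correct += (logits.argmax(dim=1) == targets).sum().item()
            total_seen += n
    return total_loss / total_seen, total_correct / total_seen


def train(model, criterion, optimizer, tr_loader, va_loader, n_epochs, device):
    tr_losses, tr_accs, va_losses, va_accs = [], [], [], []
    for epoch in range(1, n_epochs + 1):
        tr_loss, tr_acc = run_epoch(model, criterion, tr_loader, device, optimizer)
        va_loss, va_acc = run_epoch(model, criterion, va_loader, device)
        tr_losses.append(tr_loss)
        tr_accs.append(tr_acc)
        va_losses.append(va_loss)
        va_accs.append(va_acc)
        # One line per epoch
        print(f"Epoch {epoch:3d}/{n_epochs} | train loss {tr_loss:.4f}, acc {tr_acc:.4f}"
              f" | val loss {va_loss:.4f}, acc {va_acc:.4f}")
    model.train()  # leave the model in training mode after the final evaluation
    return tr_losses, tr_accs, va_losses, va_accs
```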

```python
#######################################################
# TODO: Add your function definition under this comment
#######################################################
```

## Question 4: Dataset loading function (15 points)

Define a function that loads the FashionMNIST dataset. The function should have the following signature:

```python
def load_dataset(batch_size)
```

- `batch_size` is the size of the mini-batches

The function should download/load the FashionMNIST dataset, preprocess it, and return the training, validation, and test data loaders.  

The training data loader should only include a *subset of 10000 datapoints* drawn randomly from the *train* partition of the FashionMNIST dataset (which contains 60000 samples). The function should also split the *test* partition of FashionMNIST (10000 samples) evenly into a validation set (50%) and a test set (50%).

The training data loader should shuffle the data, while the validation and test data loaders should not shuffle the data.  

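Optionally, you may divide all pixel values by 255 inside this function to normalize the data to [0, 1], since FashionMNIST pixel values range from 0 to 255. Note that `torchvision.transforms.ToTensor()` already performs this scaling, so do not divide twice if you use it.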

At the end of the function, add a `print` statement that reports the shapes of the input and target tensors for each data loader. We expect to see 10000, 5000, and 5000 samples in the training, validation, and test data loaders, respectively.

```python
#######################################################
# TODO: Add your function definition under this comment
#######################################################
```

## Question 5: Experiment 1 (5 points)

1. Create a variable that determines the device to be used (CPU or GPU) as follows:

- `device = 'cuda' if torch.cuda.is_available() else 'cpu'`

2. Create a configuration dictionary `config` with the following key-value pairs:

- `input_size`: 28 * 28
- `output_size`: 10
- `hidden_layers`: [256, 64]
- `batch_size`: 64
- `n_epochs`: 20
- `learning_rate`: # Tune a few values and choose the best one by looking at the validation accuracies (use a small number of epochs and perhaps a small subset of the data to speed up this process)

3. Define a `criterion` and an `optimizer` as follows (the optimizer needs `model.parameters()`, so create the model from step 4 first):

- `criterion = nn.CrossEntropyLoss()`
- `optimizer = optim.Adam(model.parameters(), lr=config['learning_rate'])`

4. Load the FashionMNIST data, create a model, send the model to the proper device, and finally train the model.  For all these steps, you MUST use the class and functions you created in the previous questions.

5. Create one plot of the training and validation losses and a separate plot of the training and validation accuracies. The x-axis of both plots should be the epoch number (from 1 to `n_epochs`).

6. Finally, evaluate the model on the test set and print the returned average loss and accuracy.
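For step 5, a small plotting helper can produce both figures. This is a sketch assuming matplotlib; `plot_curves` is a hypothetical name, and the lists it receives are the ones returned by `train`.

```python
import matplotlib.pyplot as plt


def plot_curves(tr_values, va_values, ylabel, title=None):
    """Hypothetical helper: plot training and validation curves against the epoch number."""
    epochs = range(1, len(tr_values) + 1)  # x-axis: epoch 1..n_epochs
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.plot(epochs, tr_values, marker="o", label=f"training {ylabel.lower()}")
    ax.plot(epochs, va_values, marker="o", label=f"validation {ylabel.lower()}")
    ax.set_xlabel("Epoch")
    ax.set_ylabel(ylabel)
    if title:
        ax.set_title(title)
    ax.legend()
    ax.grid(True, alpha=0.3)
    return fig


# Usage inside the notebook (train comes from Question 3):
# tr_loss, tr_acc, va_loss, va_acc = train(model, criterion, optimizer,
#                                          tr_loader, va_loader, config['n_epochs'], device)
# plot_curves(tr_loss, va_loss, "Loss", f"Batch size {config['batch_size']}")
# plot_curves(tr_acc, va_acc, "Accuracy", f"Batch size {config['batch_size']}")
# plt.show()
```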

```python
#######################################################
# TODO: Add your code under this comment
#######################################################
```

## Question 6: Experiment 2 (5 points)

Repeat the experiment above with the following configuration:

- `input_size`: 28 * 28
- `output_size`: 10
- `hidden_layers`: [256, 64]
- `batch_size`: 512
- `n_epochs`: 20
- `learning_rate`: # Tune a few values ...
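Since each experiment needs its own learning rate, the tuning can be wrapped in a small loop. The sketch below is hypothetical: `tune_lr` and the callable `run_fn` are names introduced here, and choosing the learning rate with the highest validation accuracy over all epochs is one reasonable selection rule among several. Only the validation set is used, so the test set stays untouched.

```python
def tune_lr(run_fn, learning_rates):
    """Hypothetical helper: return (best_lr, results), results mapping lr -> best val accuracy.

    `run_fn(lr)` is assumed to build a fresh model and optimizer, train for a few
    epochs, and return the list of validation accuracies (one per epoch).
    """
    results = {}
    for lr in learning_rates:
        va_accs = run_fn(lr)
        results[lr] = max(va_accs)  # best epoch for this learning rate
        print(f"lr={lr:g}: best validation accuracy {results[lr]:.4f}")
    best_lr = max(results, key=results.get)  # lr with the highest validation accuracy
    return best_lr, results


# Possible usage inside the notebook (relies on Questions 1-4; values are illustrative):
# tr_loader, va_loader, _ = load_dataset(config['batch_size'])
# def run_fn(lr):
#     model = MLP(config['input_size'], config['output_size'], config['hidden_layers']).to(device)
#     optimizer = optim.Adam(model.parameters(), lr=lr)
#     return train(model, nn.CrossEntropyLoss(), optimizer, tr_loader, va_loader, 5, device)[3]
# best_lr, _ = tune_lr(run_fn, [1e-2, 3e-3, 1e-3, 3e-4])
```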

```python
#######################################################
# TODO: Add your code under this comment
#######################################################
```

## Question 7: Experiment 3 (5 points)

Repeat the experiment above with the following configuration:

- `input_size`: 28 * 28
- `output_size`: 10
- `hidden_layers`: [256, 64]
- `batch_size`: 4096
- `n_epochs`: 20
- `learning_rate`: # Tune a few values ...
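When comparing the three experiments, it may help to count how many optimizer updates each configuration performs on the 10000-sample training subset. A quick sketch, assuming the `DataLoader` default `drop_last=False`, so the final, smaller batch of each epoch still triggers an update; `optimizer_steps` is a hypothetical helper:

```python
import math


def optimizer_steps(n_train, batch_size, n_epochs):
    """Hypothetical helper: return (updates per epoch, total updates) for one run."""
    steps_per_epoch = math.ceil(n_train / batch_size)  # last batch may be partial
    return steps_per_epoch, steps_per_epoch * n_epochs


for batch_size in (64, 512, 4096):
    per_epoch, total = optimizer_steps(10000, batch_size, 20)
    print(f"batch size {batch_size:4d}: {per_epoch:3d} updates/epoch, {total:4d} in 20 epochs")
# Expected: 64 -> 157 / 3140, 512 -> 20 / 400, 4096 -> 3 / 60
```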

```python
#######################################################
# TODO: Add your code under this comment
#######################################################
```

## Question 8: Analysis (10 points)
Provide a comprehensive analysis of the results obtained in the experiments above. Discuss the impact of different mini-batch sizes and of different learning rates on the training process, in particular how the best learning rate depends on the batch size. Finally, share any final thoughts on the experiments, state which configurations you prefer, and explain why.


\<TODO: ADD your answer here -- double click to edit this cell>