# Multilayer Perceptron

### Libraries and Variables

In [1]:
import os
import pandas as pd
import matplotlib.pyplot as plt
from tqdm import tqdm
import numpy as np

import torch
from torch.utils.data import Dataset, DataLoader
from torch.optim import AdamW
from torch import nn
import torch.nn.functional as nnf

home_dir = os.path.expanduser('~')
raw_data_dir = os.path.join(home_dir, 'repos/DaNuMa2024/data/raw_data')
output_data_dir = os.path.join(home_dir, 'repos/DaNuMa2024/data/output_data')

### Overview

In this notebook, you will implement and train the simplest neural network architecture there is: a multilayer perceptron (MLP). You will see how it can approximate non-linear functions, and you will demonstrate how validation during model training helps to prevent overfitting.\
Finally, the convolution layer will be introduced.

### Data

First, we load the toy dataset we use for model training and validation:

In [2]:
train_path = os.path.join(raw_data_dir, '3_mlp/train.csv')
val_path = os.path.join(raw_data_dir, '3_mlp/val.csv')

train_data = pd.read_csv(train_path, header=None)
val_data = pd.read_csv(val_path, header=None)

x_train = train_data.iloc[:, 0].to_numpy()
y_train = train_data.iloc[:, 1].to_numpy()
x_val = val_data.iloc[:, 0].to_numpy()
y_val = val_data.iloc[:, 1].to_numpy()

Visualize the data! (It is very unusual for the validation set to be larger than the training set. Don't do this in practice! :P It serves demonstration purposes here: a larger validation set gives a smoother validation loss later.)

In [None]:
######### YOUR CODE HERE:

### Dataset and model

Now define the dataset that returns the x and y values as tuples.
* The dataset is usually the point where other data types are converted to tensors.
* Remember that a dataset is an object, so it needs a constructor (\_\_init\_\_), as well as a \_\_len\_\_ and a \_\_getitem\_\_ method.
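
The three methods listed above could look roughly like the following minimal sketch. The class name `XYDataset` and the tiny demo arrays are hypothetical and only serve as an illustration; reshaping to `(N, 1)` is an assumption that matches a model with a one-dimensional input and output.

```python
import numpy as np
import torch
from torch.utils.data import Dataset


class XYDataset(Dataset):  # hypothetical name, used only for this sketch
    def __init__(self, x, y):
        # Convert to float32 tensors once, up front.
        # Shape (N, 1): every sample is a feature vector with one entry.
        self.x = torch.as_tensor(x, dtype=torch.float32).reshape(-1, 1)
        self.y = torch.as_tensor(y, dtype=torch.float32).reshape(-1, 1)

    def __len__(self):
        # Number of samples in the dataset.
        return len(self.x)

    def __getitem__(self, idx):
        # Return one (x, y) pair as a tuple of tensors.
        return self.x[idx], self.y[idx]


# Made-up demo data, not the notebook's CSV files.
demo = XYDataset(np.array([0.0, 1.0, 2.0]), np.array([0.0, 2.0, 4.0]))
first_x, first_y = demo[0]
print(len(demo), first_x.shape, first_x.dtype)
```

A `DataLoader` wrapped around such a dataset stacks the individual tuples into batches of shape `(batch_size, 1)`.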

In [4]:
####################### dataset
class MLPDataset(Dataset):
    ######### YOUR CODE HERE:
    pass

Now define the MLP. 
* Remember that a model is an object, so it needs a constructor (\_\_init\_\_), as well as a forward method that calculates the model output.
* The layer sizes should be [1, 32, 64, 128, 256, 128, 64, 32, 1].
* The 1s at the beginning and end denote the input and output sizes of the model, which are both one-dimensional in our case.
* The numbers in between denote the hidden layer sizes, so the network has 7 hidden layers.
* Bonus: If you want to define the network more elegantly, you can also pass the layer sizes to the constructor and define the network dynamically based on what layer sizes you provide as an argument to the network. You can make use of a loop together with a list and nn.Sequential to define the layers this way.
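
For the bonus, one possible way to build the layers in a loop is sketched below. The class name `DynamicMLP` is hypothetical, and ReLU is an assumed activation, since the exercise does not prescribe one.

```python
import torch
from torch import nn


class DynamicMLP(nn.Module):  # hypothetical name, used only for this sketch
    def __init__(self, layer_sizes):
        super().__init__()
        layers = []
        # Pair up consecutive sizes: (1, 32), (32, 64), ..., (32, 1).
        for in_features, out_features in zip(layer_sizes[:-1], layer_sizes[1:]):
            layers.append(nn.Linear(in_features, out_features))
            layers.append(nn.ReLU())  # assumed activation function
        layers.pop()  # drop the activation after the output layer
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)


mlp = DynamicMLP([1, 32, 64, 128, 256, 128, 64, 32, 1])
n_linear = sum(isinstance(m, nn.Linear) for m in mlp.net)
out = mlp(torch.zeros(4, 1))  # batch of 4 one-dimensional inputs
print(n_linear, out.shape)
```

Nine layer sizes yield eight linear layers: seven that produce the hidden representations and one that produces the output.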

In [5]:
####################### model

########### elegant solution
class MLP(nn.Module):
    ######### YOUR CODE HERE:
    pass

### Training loop

Implement a simple training loop without validation. 
* Use the mean squared error as the loss function (nnf.mse_loss) and AdamW as the optimizer.
* Store the average loss of every epoch in a list so that you can plot it later.
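
The overall pattern of such a loop is sketched below on made-up data. The synthetic targets (y = 2x), the single linear layer, and the hyperparameters are assumptions chosen to keep the example small; they are not the notebook's dataset or model.

```python
import torch
from torch import nn
import torch.nn.functional as nnf
from torch.optim import AdamW
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic stand-in data: y = 2x on 64 points.
x = torch.linspace(-1, 1, 64).reshape(-1, 1)
y = 2 * x
loader = DataLoader(TensorDataset(x, y), batch_size=16, shuffle=True)

model = nn.Linear(1, 1)  # stand-in model
optimizer = AdamW(model.parameters(), lr=0.05)

epoch_losses = []
for epoch in range(50):
    model.train()
    running_loss = 0.0
    for xb, yb in loader:
        optimizer.zero_grad()               # reset gradients from the previous step
        loss = nnf.mse_loss(model(xb), yb)  # mean squared error on this batch
        loss.backward()                     # backpropagate
        optimizer.step()                    # update the parameters
        running_loss += loss.item()
    # Average of the batch losses in this epoch.
    epoch_losses.append(running_loss / len(loader))

print(epoch_losses[0], epoch_losses[-1])
```

On a GPU, the model and each batch would additionally have to be moved to the device with `.to(device)`.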

In [None]:
####################### Training loop without validation
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"The model is running on {device}.")

# training parameters
epochs = 4000
batch_size = 32
lr = 0.001

# instantiate dataset and dataloader
######### YOUR CODE HERE:

# instantiate model and optimizer
######### YOUR CODE HERE:

# training loop
all_train_losses = []
for epoch in tqdm(range(epochs)):
    ######### YOUR CODE HERE:
    pass

Now plot the training losses

In [None]:
######### YOUR CODE HERE:

Now plot the data together with the predictions 
* Hint: you can use the torch.linspace function to create a sequence of x values for which to generate model predictions for plotting.
* What do you observe? Is the model you obtained in the last epoch a good approximation for the underlying function?
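
The `torch.linspace` hint could be used along the following lines. The untrained stand-in model and the range [-1, 1] are assumptions; in the notebook, the trained model and the range of the actual data would take their place.

```python
import torch
from torch import nn

# Untrained stand-in model, only to make the sketch runnable.
model = nn.Sequential(nn.Linear(1, 8), nn.ReLU(), nn.Linear(8, 1))
model.eval()

# 200 evenly spaced x values; unsqueeze adds the feature dimension -> (200, 1).
x_grid = torch.linspace(-1.0, 1.0, steps=200).unsqueeze(1)

with torch.no_grad():  # no gradients needed for plotting
    y_pred = model(x_grid)

# Back to flat NumPy arrays, e.g. for plt.plot(x_plot, y_plot).
x_plot = x_grid.squeeze(1).numpy()
y_plot = y_pred.squeeze(1).numpy()
print(x_plot.shape, y_plot.shape)
```

If the model lives on a GPU, the grid has to be moved to the same device before the forward pass and the predictions moved back with `.cpu()` before converting to NumPy.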

In [None]:
######### YOUR CODE HERE:

### Training loop with validation

To prevent overfitting, it is often useful to periodically calculate the loss on a held-out validation set that is not used for training. Your task is to implement this in the training loop.
* For clearer code, implement the training of one epoch and the validation in separate functions.
* Implement logic that saves the parameters of the model (torch.save(model.state_dict(), save_path)) whenever the validation loss falls below its previous minimum. This way you can later use the model with the best validation loss.
* This time, plot both the training and validation loss curve. What do you observe?
* Take the model with the best validation loss to make a model prediction. You can load saved parameters to a model via model.load_state_dict(torch.load(save_path)). How does the prediction look compared to the model obtained without validation?
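
The save-on-improvement idea can be sketched in isolation as follows. The list of validation losses is made up and stands in for the output of a real validation pass; the temporary save location and the stand-in model are likewise hypothetical.

```python
import os
import tempfile

import torch
from torch import nn

model = nn.Linear(1, 1)  # stand-in model
# Hypothetical save location inside a fresh temporary directory.
save_path = os.path.join(tempfile.mkdtemp(), "best_model_state_dict.pth")

# Made-up validation losses, one per epoch.
fake_val_losses = [0.9, 0.5, 0.7, 0.4, 0.6]

min_val_loss = float("inf")
best_epoch = None
for epoch, val_loss in enumerate(fake_val_losses):
    if val_loss < min_val_loss:
        # New best value: remember it and overwrite the checkpoint.
        min_val_loss = val_loss
        best_epoch = epoch
        torch.save(model.state_dict(), save_path)

# Later: load the best parameters into a freshly created model.
restored = nn.Linear(1, 1)
restored.load_state_dict(torch.load(save_path))
print(best_epoch, min_val_loss)
```

Note that `torch.save` does not create missing directories, so the target folder has to exist before the first checkpoint is written.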

In [9]:
def train_one_epoch(model, trainloader, optimizer, device):
    ######### YOUR CODE HERE:
    pass

def validate(model, valloader, device):
    ######### YOUR CODE HERE:
    pass

In [None]:
####################### Training loop with modular definition of training and validation
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"The model is running on {device}.")

# training parameters
epochs = 4000
batch_size = 32
lr = 0.001
val_interval = 1
save_path_state_dict = os.path.join(output_data_dir, '3_mlp/best_model_state_dict.pth')

# instantiate dataset and dataloader (train and val this time!)
######### YOUR CODE HERE:
trainloader = None
valloader = None

# instantiate model and optimizer
######### YOUR CODE HERE:
model = None
optimizer = None

# training loop
all_train_losses = []
all_val_losses = []
min_val_loss = float('inf')

for epoch in tqdm(range(epochs)):
    train_loss = train_one_epoch(model, trainloader, optimizer, device)
    all_train_losses.append(train_loss)

    if epoch % val_interval == 0:
        val_loss = validate(model, valloader, device)
        all_val_losses.append(val_loss)

    ######### YOUR CODE HERE:
    # Implement logic that saves the model state dict whenever val_loss drops below min_val_loss

In [None]:
####################### plot training and validation losses.
######### YOUR CODE HERE:

In [None]:
####################### plot train and val data together with the model predictions from the model with the best validation loss
######### YOUR CODE HERE:

# Introduction to the Convolution Layer

Let's take a look at a more complex model component, the so-called convolution layer. \
The convolution layer is an essential module when processing images with neural networks. \
Once again, we disable the bias (bias=False) for simplicity and set all parameter values of the layer to 1. \
Besides the bias flag, the convolution layer takes three required arguments: in_channels, out_channels and kernel_size:

In [None]:
conv_layer = nn.Conv2d(in_channels=8, out_channels=8, kernel_size=3, bias=False)
torch.nn.init.constant_(conv_layer.weight, 1.0)
print(conv_layer.weight.shape)

The parameters of the convolution layer have a shape of out_channels x in_channels x kernel_size x kernel_size (in this case 8x8x3x3). \
However, it is easier to think of these parameters as 8 separate "kernels", each of size 8x3x3. \
When you apply the convolution layer to an input, these kernels "slide" over the input. At each position, a kernel is multiplied element-wise with the respective part of the input and the products are summed. \
Since we initialized the weights of the kernels with all ones, each output value is simply the sum of the input values in the respective part of the input. \
Also notice that the spatial size shrinks from 7 in the input to 5 in the output, since no padding is applied by default. Padding would have to be specified as a separate argument when initializing the conv_layer. \
A really nice animated example of a convolution layer with the same input/output channels and kernel size can be found here: https://animatedai.github.io/

In [None]:
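# 8 channels of 7x7 ones; named x to avoid shadowing Python's built-in input()
x = torch.ones(8, 7, 7)
output = conv_layer(x)
print(f'Shape of the output: {output.shape}')
# each output value sums 8 channels * 3 * 3 ones = 72
print(f'As expected, the output values are just the sums of the values of the respective part of the input: \n{output}')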

Bonus: Feel free to explore the documentation of the conv module or build your own mini convnet! Is there an argument that prevents the shrinking of the input size?
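
As a starting point for the bonus, the sketch below compares the output-size formula given in the `nn.Conv2d` documentation with what the layer actually returns. The helper `conv_output_size` is a hypothetical name introduced here, and `padding=1` is just one value to try out.

```python
import torch
from torch import nn


def conv_output_size(size, kernel_size, padding=0, stride=1, dilation=1):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


x = torch.ones(1, 8, 7, 7)  # batch of one 8-channel 7x7 input

# Same setup as above: no padding, all weights set to 1.
plain = nn.Conv2d(8, 8, kernel_size=3, bias=False)
nn.init.constant_(plain.weight, 1.0)

# Variant with one pixel of zero padding on every side.
padded = nn.Conv2d(8, 8, kernel_size=3, padding=1, bias=False)
nn.init.constant_(padded.weight, 1.0)

with torch.no_grad():
    plain_out = plain(x)
    padded_out = padded(x)

# Compare the measured sizes with the formula.
print(plain_out.shape[-1], conv_output_size(7, 3))
print(padded_out.shape[-1], conv_output_size(7, 3, padding=1))
```

With zero padding, the values near the border are smaller than those in the center, because part of the kernel then overlaps padded zeros instead of input values.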
