# Module 1 - Implementing and training a neural network

## Environment verification
Start by confirming you have PyTorch, TorchVision and TensorBoard installed.


In [1]:
import torch
import torchvision
from torch.utils.data import DataLoader
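The cell above only checks that the imports succeed. The sketch below also reports installed versions and GPU availability using only the standard library. It assumes the PyPI distribution names `torch`, `torchvision` and `tensorboard`, which may differ for some install methods. The helper name `installed_versions` is hypothetical.

```python
import importlib.metadata
import importlib.util

# PyPI distribution names, assumed to match how the packages were installed
REQUIRED_PACKAGES = ("torch", "torchvision", "tensorboard")


def installed_versions(names=REQUIRED_PACKAGES):
    """Map each distribution name to its installed version string, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name}: {version or 'NOT INSTALLED'}")

# The training log later in this notebook was produced on a CUDA device;
# training on the CPU also works, only more slowly.
if importlib.util.find_spec("torch") is not None:
    import torch
    print("CUDA available:", torch.cuda.is_available())
```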

## Dataset
The dataset used is the well-known MNIST, which consists of grayscale images of handwritten digits, each 28 pixels wide and 28 pixels high.

The goal of most models trained on this dataset, including ours, is to classify which digit (0 to 9) an image shows.

Download the training and validation datasets (the validation set is MNIST's official test split, selected with `train=False`):

In [2]:
training_set: torch.utils.data.Dataset = torchvision.datasets.MNIST("./data", train=True, download=True, transform=torchvision.transforms.ToTensor())
validation_set: torch.utils.data.Dataset = torchvision.datasets.MNIST("./data", train=False, download=True, transform=torchvision.transforms.ToTensor())
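According to the TorchVision documentation, `ToTensor` turns a PIL image (or a `uint8` NumPy array of shape H x W x C) with values in [0, 255] into a float tensor of shape C x H x W with values in [0.0, 1.0]. MNIST images are single-channel, so each sample becomes a 1 x 28 x 28 tensor. The NumPy sketch below mimics that conversion on a synthetic image; `to_tensor_like` is a hypothetical helper, not part of TorchVision.

```python
import numpy as np


def to_tensor_like(image: np.ndarray) -> np.ndarray:
    """Mimic ToTensor for a single-channel uint8 image of shape (H, W).

    Returns a float32 array of shape (1, H, W) with values scaled to [0.0, 1.0].
    """
    if image.dtype != np.uint8 or image.ndim != 2:
        raise ValueError("expected a 2-D uint8 grayscale image")
    scaled = image.astype(np.float32) / 255.0
    # Add the leading channel dimension: (H, W) -> (C=1, H, W)
    return scaled[np.newaxis, :, :]


# A synthetic 28x28 "digit": black background with a white vertical stroke
fake_digit = np.zeros((28, 28), dtype=np.uint8)
fake_digit[4:24, 13:15] = 255

x = to_tensor_like(fake_digit)
print(x.shape, x.dtype, x.min(), x.max())  # (1, 28, 28) float32 0.0 1.0
```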

## Question 1 - MLP evaluation

Import the example MLP:

In [3]:
from bobnet import BobNet

Create an instance of this model:

In [4]:
model1 = BobNet()

Define the hyperparameters for this model:

In [5]:
# batch size
MLP_BATCH_SIZE=64

# learning rate
MLP_LEARNING_RATE=0.001

# momentum
MLP_MOMENTUM=0.9

# training epochs to run
MLP_EPOCHS=10
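The learning rate and momentum interact through the update rule. PyTorch's documentation describes SGD with momentum (default `dampening=0`, `nesterov=False`) as keeping a velocity buffer `v <- momentum * v + grad` and then updating `theta <- theta - lr * v`. The plain-Python sketch below applies that rule to the toy objective f(theta) = theta², an assumption chosen only because its gradient (2 · theta) is easy to write down.

```python
def sgd_momentum(grad_fn, theta0, lr, momentum, steps):
    """Plain-Python sketch of SGD with momentum (dampening=0, no Nesterov).

    velocity <- momentum * velocity + grad
    theta    <- theta - lr * velocity
    """
    theta = theta0
    velocity = 0.0  # starting at 0 makes the first step equal to a plain SGD step
    history = [theta]
    for _ in range(steps):
        grad = grad_fn(theta)
        velocity = momentum * velocity + grad
        theta = theta - lr * velocity
        history.append(theta)
    return history


# Toy objective f(theta) = theta**2, whose gradient is 2 * theta
def grad(t):
    return 2.0 * t


plain = sgd_momentum(grad, theta0=1.0, lr=0.001, momentum=0.0, steps=100)
with_momentum = sgd_momentum(grad, theta0=1.0, lr=0.001, momentum=0.9, steps=100)
print(f"no momentum:  theta after 100 steps = {plain[-1]:.4f}")
print(f"momentum=0.9: theta after 100 steps = {with_momentum[-1]:.4f}")
```

With the same small learning rate, the momentum run gets much closer to the minimum at 0, because consecutive gradients pointing the same way accumulate in the velocity.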

Create the training and validation dataloaders from the datasets downloaded earlier:

In [6]:
# create the training loader
mlp_training_loader = DataLoader(training_set, batch_size=MLP_BATCH_SIZE, shuffle=True) 

# create the validation loader (shuffling is unnecessary for evaluation)
mlp_validation_loader = DataLoader(validation_set, batch_size=MLP_BATCH_SIZE, shuffle=False)
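The denominators 938 and 157 in the training log come from the number of batches per epoch. With `DataLoader`'s default `drop_last=False`, the last, smaller batch is kept, so the count is rounded up. The sketch assumes the standard MNIST split sizes of 60,000 training and 10,000 test images; `num_batches` is a hypothetical helper.

```python
import math

TRAIN_SAMPLES = 60_000       # MNIST training split
VALIDATION_SAMPLES = 10_000  # MNIST test split, loaded with train=False
BATCH_SIZE = 64              # same value as MLP_BATCH_SIZE


def num_batches(num_samples, batch_size, drop_last=False):
    """Number of batches a DataLoader yields per epoch."""
    if drop_last:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)


print(num_batches(TRAIN_SAMPLES, BATCH_SIZE))       # 938
print(num_batches(VALIDATION_SAMPLES, BATCH_SIZE))  # 157
```

The last training batch therefore holds 60,000 - 937 × 64 = 32 images, and the last validation batch 16.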

Define the loss function and the optimizer:

In [7]:
mlp_loss_fn = torch.nn.CrossEntropyLoss()

mlp_optimizer = torch.optim.SGD(model1.parameters(), lr=MLP_LEARNING_RATE, momentum=MLP_MOMENTUM)
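`torch.nn.CrossEntropyLoss` takes raw, unnormalized logits plus class indices, applies log-softmax internally, and averages over the batch by default. The plain-Python sketch below computes the same quantity for a single sample. It also explains the starting value in the log: a model that assigns equal probability to all 10 digits has a loss of ln(10) ≈ 2.303, close to the ~2.3 training loss at the start of epoch 0.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy for one sample from raw logits: -log(softmax(logits)[target]).

    Uses the log-sum-exp trick for numerical stability.
    """
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


# With 10 classes and identical logits, every class gets probability 1/10
print(cross_entropy([0.0] * 10, target=7))  # about 2.302585, i.e. ln(10)

# A confident, correct prediction gives a loss close to 0
print(cross_entropy([10.0, 0.0, 0.0], target=0))
```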

Run the training and validation:

In [8]:
import utils

# how many batches between logs
LOGGING_INTERVAL=100

utils.train_model(model1, MLP_EPOCHS, mlp_optimizer, mlp_loss_fn, mlp_training_loader, mlp_validation_loader, LOGGING_INTERVAL)

Epoch 0 (99/938): training_loss = 2.3245476737166895
Epoch 0 (199/938): training_loss = 2.311334980193095
Epoch 0 (299/938): training_loss = 2.3059090244331486
Epoch 0 (399/938): training_loss = 2.3022876287761487
Epoch 0 (499/938): training_loss = 2.299366777072211
Epoch 0 (599/938): training_loss = 2.2966790517701927
Epoch 0 (699/938): training_loss = 2.2939833336122044
Epoch 0 (799/938): training_loss = 2.2911752616061136
Epoch 0 (899/938): training_loss = 2.288015873466636
Epoch 0 (99/157): validation_loss = 2.273643732070923
Epoch 1 (99/938): training_loss = 2.2649861995619958
Epoch 1 (199/938): training_loss = 2.2487360820099336
Epoch 1 (299/938): training_loss = 2.2371010915890186
Epoch 1 (399/938): training_loss = 2.227154220853533
Epoch 1 (499/938): training_loss = 2.2175696302272514
Epoch 1 (599/938): training_loss = 2.2068267899482996
Epoch 1 (699/938): training_loss = 2.19608653938992
Epoch 1 (799/938): training_loss = 2.183740225840868
Epoch 1 (899/938): training_loss = 2.

tensor(1.6506, device='cuda:0')
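The `utils` module is not shown, so exactly what `train_model` does internally (including what the returned tensor represents) is not known here. The sketch below is a typical PyTorch training and validation loop of the kind it likely wraps. `train_one_epoch` and `evaluate` are hypothetical names, and the smoke test uses random MNIST-shaped tensors rather than real digits.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def train_one_epoch(model, optimizer, loss_fn, loader, device="cpu"):
    """Hypothetical single training epoch; returns the mean training loss."""
    model.train()  # training-mode behaviour (e.g. dropout, batch norm)
    total_loss, total_samples = 0.0, 0
    for inputs, targets in loader:
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad()  # clear gradients from the previous step
        loss = loss_fn(model(inputs), targets)
        loss.backward()        # backpropagate
        optimizer.step()       # update the parameters
        total_loss += loss.item() * inputs.size(0)
        total_samples += inputs.size(0)
    return total_loss / total_samples


@torch.no_grad()
def evaluate(model, loss_fn, loader, device="cpu"):
    """Hypothetical validation pass without gradient tracking; returns the mean loss."""
    model.eval()
    total_loss, total_samples = 0.0, 0
    for inputs, targets in loader:
        inputs, targets = inputs.to(device), targets.to(device)
        total_loss += loss_fn(model(inputs), targets).item() * inputs.size(0)
        total_samples += inputs.size(0)
    return total_loss / total_samples


# Smoke test on random MNIST-shaped data
torch.manual_seed(0)
fake = TensorDataset(torch.rand(128, 1, 28, 28), torch.randint(0, 10, (128,)))
loader = DataLoader(fake, batch_size=32, shuffle=True)
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()

train_loss = train_one_epoch(model, optimizer, loss_fn, loader)
val_loss = evaluate(model, loss_fn, loader)
print(f"train_loss={train_loss:.4f} val_loss={val_loss:.4f}")
```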

### Questions
Explore the architecture in the script `mod1/bobnet.py`.
1. Why does the input layer have 784 input features?
2. Why does the output layer have 10 output features?
3. What would happen if the dataset contained 100 samples of the digit 7 for every sample of the digit 1?
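One way to reason about the imbalance question: with 100 sevens per one, a model that always answers "7" already looks very accurate, and the loss is dominated by the majority class. A common mitigation is to weight the loss per class (`torch.nn.CrossEntropyLoss` accepts a `weight` tensor with one entry per class); resampling with `torch.utils.data.WeightedRandomSampler` is another. The plain-Python sketch below computes inverse-frequency weights for a hypothetical 100:1 label list. The normalization scheme and the `inverse_frequency_weights` helper are assumptions, one choice among several.

```python
from collections import Counter


def inverse_frequency_weights(labels, num_classes):
    """Per-class weights proportional to 1 / class count, normalised to average 1
    over the classes that appear. Absent classes get 0.0 (a choice of this sketch)."""
    counts = Counter(labels)
    raw = [1.0 / counts[c] if counts[c] else 0.0 for c in range(num_classes)]
    present = [w for w in raw if w > 0]
    scale = len(present) / sum(present)
    return [w * scale for w in raw]


# Hypothetical skewed dataset: 100 sevens for every one
labels = [7] * 100 + [1] * 1

# A model that always answers "7" is right on 100 of the 101 samples
majority_acc = labels.count(7) / len(labels)
print(f"{majority_acc:.1%}")  # 99.0%

weights = inverse_frequency_weights(labels, num_classes=10)
print(round(weights[7], 4), round(weights[1], 4))  # 0.0198 1.9802
```

Each "1" then contributes 100 times more to the weighted loss than each "7", which offsets the 100:1 ratio.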

## Question 2 - CNN implementation

Head over to the `cnn.py` file and implement a convolutional architecture that outperforms the MLP.
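When designing the convolutional layers, it helps to track the spatial size of the feature maps so the first fully connected layer gets the right number of inputs. The PyTorch `Conv2d` documentation gives the output size along one dimension as floor((size + 2·padding − dilation·(kernel_size − 1) − 1) / stride) + 1, and `MaxPool2d` follows the same formula. The layer stack traced below is a hypothetical example, not a prescribed design.

```python
def conv2d_output_size(size, kernel_size, stride=1, padding=0, dilation=1):
    """Spatial output size of a Conv2d/MaxPool2d layer along one dimension."""
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


# Trace a hypothetical stack on a 28x28 MNIST image
size = 28
size = conv2d_output_size(size, kernel_size=3, padding=1)  # conv 3x3, padding 1 -> 28
size = conv2d_output_size(size, kernel_size=2, stride=2)   # max-pool 2x2       -> 14
size = conv2d_output_size(size, kernel_size=3, padding=1)  # conv 3x3, padding 1 -> 14
size = conv2d_output_size(size, kernel_size=2, stride=2)   # max-pool 2x2       -> 7
print(size)  # 7

# With 32 channels in the last conv layer, the flattened feature vector has:
print(32 * size * size)  # 1568
```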

Now, import the model:

In [9]:
from cnn import CNN

Create an instance of this model:

In [10]:
model2 = CNN()

NotImplementedError: Define the layers here!
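The error above is expected until `cnn.py` is filled in. For reference, the class below is a hypothetical minimal CNN for 1 x 28 x 28 inputs and 10 classes. It is named `SmallCNN` to avoid clashing with the course's `CNN`, it is not the expected solution, and no claim is made about how it compares with BobNet. Note that it still needs a `Flatten` step, but only between the convolutional feature extractor and the final linear layer.

```python
import torch
from torch import nn


class SmallCNN(nn.Module):
    """Hypothetical example CNN for 1x28x28 inputs and 10 classes."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),   # -> 16 x 28 x 28
            nn.ReLU(),
            nn.MaxPool2d(2),                              # -> 16 x 14 x 14
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # -> 32 x 14 x 14
            nn.ReLU(),
            nn.MaxPool2d(2),                              # -> 32 x 7 x 7
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                                 # -> 32 * 7 * 7 = 1568
            nn.Linear(32 * 7 * 7, num_classes),           # raw logits for CrossEntropyLoss
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))


model = SmallCNN()
dummy = torch.zeros(4, 1, 28, 28)  # a fake batch of 4 MNIST-sized images
print(model(dummy).shape)  # torch.Size([4, 10])
```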

Train the model:

In [None]:
# TODO: run training

### Questions

1. The tanh and softmax activation functions are differentiable. Can you explain why non-differentiable functions are not used?
2. What changed in the results compared with the MLP? Can you guess why?
3. What results would you expect if you used an attention mechanism like CBAM (Convolutional Block Attention Module)? What do these mechanisms do?
4. Why does the MLP implementation start with a `torch.nn.Flatten` layer? Was it needed in the CNN?
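Backpropagation updates each weight using derivatives of the loss, which pass through the derivative of every activation. The plain-Python sketch below estimates derivatives by central finite differences for tanh and for a step function, a piecewise-constant activation chosen here as an example of a non-differentiable one. The step function's derivative is zero everywhere except at the jump (where it is undefined), so it gives gradient descent no signal. Functions such as ReLU, which fail to be differentiable at only a single point, are still widely used in practice.

```python
import math


def numerical_derivative(f, x, eps=1e-6):
    """Central finite-difference estimate of f'(x)."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)


def step(x):
    """Heaviside-style step: piecewise constant, with a jump at 0."""
    return 1.0 if x > 0 else 0.0


for x in (-2.0, -0.5, 0.5, 2.0):
    print(f"x={x:+.1f}  d/dx tanh = {numerical_derivative(math.tanh, x):.4f}  "
          f"d/dx step = {numerical_derivative(step, x):.4f}")
```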