<!--NAVIGATION-->
# < [Regression](3_Regression_Gradient_Descent.ipynb) | MLP for Digit Classification | [CNN on CIFAR10](5_CNN_CIFAR.ipynb) >

# PyTorch Modules

### What is this notebook about?

In this notebook, we will learning about PyTorch modules and the great functionalities they provide. Later on, we'll create a small a multilayer perceptron to perform image classification on MNIST.

___

In [1]:
import torch
import torch.nn as nn

print("Torch version:", torch.__version__)

Torch version: 1.10.0+cpu


In [2]:
import matplotlib.pyplot as plt

In PyTorch, there are many predefined layer like Convolutions, RNN, Pooling, Linear, etc.

These functions are wrapped in **modules** and inherit from the **torch.nn.Module** base class.

When designing a custom model in PyTorch, you should follow this strategy and derive your class from **torch.nn.Module**.

For more information about PyTorch Modules see:
<!--NAVIGATION-->
## [Modules](App-PyTorch-Modules.ipynb)

___

## Simple network for image classification

### Let's create a Multi Layer Perceptron Network (MLP)

Implement a simple multilayer perceptron with two hidden layers and the following structure:

![](https://raw.githubusercontent.com/ledell/sldm4-h2o/master/mlp_network.png)

- Input-size: *input_size*
- 1st hidden layer: 75
- 2nd hidden layer: 50
- Output layer: *num_classes*

Additionally, we use `ReLU`s as activation functions.

You will need some PyTorch NN modules - Find them in the [PyTorch doc](https://pytorch.org/docs/master/nn.html) (especially nn.Linear)!

In [3]:
from torch.nn import Parameter
import torch.nn.functional as F  # provides some helper functions like Relu's, Sigmoids, Tanh, etc.


class MyMultilayerPerceptron(nn.Module):
    def __init__(self, input_size, h1_dim, h2_dim, num_classes):
        super(MyMultilayerPerceptron, self).__init__()
        
        self.input_size = input_size
        self.num_classes = num_classes
        
        self.linear_1 = nn.Linear(input_size, 75)
        self.linear_2 = nn.Linear(h1_dim, h2_dim)
        self.linear_3 = nn.Linear(50, num_classes)
        
    
    def forward(self, x):
        x = F.relu(self.linear_1(x))
        x = F.relu(self.linear_2(x))
        x = F.relu(self.linear_3(x))
        return F.softmax(x)

### Print your network's parameters

In [4]:
model = MyMultilayerPerceptron(784, 75, 50, 10)
print(model)

MyMultilayerPerceptron(
  (linear_1): Linear(in_features=784, out_features=75, bias=True)
  (linear_2): Linear(in_features=75, out_features=50, bias=True)
  (linear_3): Linear(in_features=50, out_features=10, bias=True)
)


### Feed an input to your network

In [5]:
x = torch.rand(1, 784)  # the first dimension is reserved for the 'batch_size'
out = model(x)  # equivalent to model.forward(x)
out[0, :]



tensor([0.1174, 0.1020, 0.1117, 0.0969, 0.0941, 0.0941, 0.0941, 0.0941, 0.0986,
        0.0971], grad_fn=<SliceBackward0>)

___

## Training a model

Most of the functions to train a model follow a similar pattern in PyTorch.
In most of the cases in consists of the following steps:
- Loop over data (in batches)
- Forward pass
- Zero gradients!
- Backward pass
- Parameter update (Optimizer)

In [6]:
def train(model, num_epochs, data_loader, device):
    model = model.to(device)
    
    # Define the Loss function and Optimizer that you want to use
    criterion = nn.CrossEntropyLoss()  
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # NOTE: model.parameters()
    
    # Outter training loop
    for epoch in range(num_epochs):
        # Inner training loop
        cum_loss = 0
        for (inputs, labels) in data_loader:
            # Prepare inputs and labels for processing by the model (e.g. reshape, move to device, ...)
            inputs = inputs.to(device)
            labels = labels.to(device)
            
            # original shape is [batch_size, 28, 28] because it's an image of size 28x28
            inputs = inputs.view(-1, 28*28)

            # Do Forward -> Loss Computation -> Backward -> Optimization
            optimizer.zero_grad()
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            
            cum_loss += loss.item()
        print("Epoch %d, Loss=%.4f" % (epoch+1, cum_loss/len(train_loader)))

Note:
- we can use the `.to` function on the model directly. Indeed, since PyTorch knows all the model parameters, it can put all the parameters on the correct device.
- we use `model.parameters()` to get all the parameters of the model and we can instantiate an optimizer that will optimize these parameters `torch.optim.SGD(model.parameters())`.
- to apply the forward function of the module, we write `model(input)`. In most cases, `model.forward(inputs)` would also work, but there is a slight difference : PyTorch allows you to register hook functions for a model that are automatically called when you do a forward pass on your model. Using `model(input)` will call these hooks and then call the forward function, while using `model.forward(inputs)` will just silently ignore them.

Do you feel the power of Modules ?

## Loss functions

PyTorch comes with a lot of predefined loss functions :
- L1Loss
- MSELoss
- CrossEntropyLoss
- NLLLoss
- PoissonNLLLoss
- KLDivLoss
- BCELoss
- MarginRankingLoss
- HingeEmbeddingLoss
- MultiLabelMarginLoss
- CosineEmbeddingLoss
- TripletMarginLoss
- ...

Check out the [PyTorch Documentation](https://pytorch.org/docs/master/nn.html#loss-functions).

___

## Let's train our model on the MNIST digit classification task


![MNIST](figures/mnist.jpeg)

First, we have to load the training and test images. MNIST is a widely used dataset, therefore the torchvision package provides simple functionalities to load images from it.

In [7]:
import torchvision.datasets as datasets
import torchvision.transforms as transforms

batch_size = 64

# MNIST Dataset (Images and Labels)
train_dataset = datasets.MNIST(root='../data', train=True, transform=transforms.ToTensor(), download=True)
test_dataset = datasets.MNIST(root='../data', train=False, transform=transforms.ToTensor())

# Dataset Loader (Input Batcher)
train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True)
test_loader = torch.utils.data.DataLoader(dataset=test_dataset, batch_size=batch_size, shuffle=False)

In PyTorch, `Dataset` and `Dataloaders` are classes that can help to quickly define how to access and iterate over your data. This is specially interesting when your data is distributed over several files (for instance, if you have several images in some directory structure).

In [8]:
# Number of parameters, without bias
784*75+75*50+50*10

63050

### Call the actual training function

In [9]:
%%time
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = MyMultilayerPerceptron(input_size=784, h1_dim=75, h2_dim=50, num_classes=10)
num_epochs = 50

train(model, num_epochs, train_loader, device)



Epoch 1, Loss=1.7481
Epoch 2, Loss=1.6947
Epoch 3, Loss=1.6770
Epoch 4, Loss=1.6630
Epoch 5, Loss=1.6575
Epoch 6, Loss=1.6528
Epoch 7, Loss=1.6498
Epoch 8, Loss=1.6465
Epoch 9, Loss=1.6447
Epoch 10, Loss=1.6431
Epoch 11, Loss=1.6413
Epoch 12, Loss=1.6399
Epoch 13, Loss=1.6388
Epoch 14, Loss=1.6373
Epoch 15, Loss=1.6369
Epoch 16, Loss=1.6359
Epoch 17, Loss=1.6353
Epoch 18, Loss=1.6346
Epoch 19, Loss=1.6343
Epoch 20, Loss=1.6336
Epoch 21, Loss=1.6332
Epoch 22, Loss=1.6332
Epoch 23, Loss=1.6326
Epoch 24, Loss=1.6327
Epoch 25, Loss=1.6317
Epoch 26, Loss=1.6315
Epoch 27, Loss=1.6314
Epoch 28, Loss=1.6316
Epoch 29, Loss=1.6304
Epoch 30, Loss=1.6306
Epoch 31, Loss=1.6308
Epoch 32, Loss=1.6302
Epoch 33, Loss=1.6301
Epoch 34, Loss=1.6301
Epoch 35, Loss=1.6298
Epoch 36, Loss=1.6301
Epoch 37, Loss=1.6297
Epoch 38, Loss=1.6297
Epoch 39, Loss=1.6296
Epoch 40, Loss=1.6294
Epoch 41, Loss=1.6298
Epoch 42, Loss=1.6290
Epoch 43, Loss=1.6295
Epoch 44, Loss=1.6288
Epoch 45, Loss=1.6293
Epoch 46, Loss=1.62

### How can we now assess the model's performance?

This function loops over another `data_loader` (usually containing test/validation data) and computes the model's accuracy on it.

In [10]:
def accuracy(model, data_loader, device):
    with torch.no_grad(): # during model evaluation, we don't need the autograd mechanism (speeds things up)
        correct = 0
        total = 0
        for inputs, labels in data_loader:
            inputs = inputs.to(device)     
            inputs = inputs.view(-1, 28*28)
            
            outputs = model(inputs)
            _, predicted = outputs.max(1)
            
            correct += (predicted.cpu() == labels).sum().item()
            total += labels.size(0)
            
    acc = correct / total
    return acc

In [11]:
accuracy(model, test_loader, device)  # look at: accuracy(model, train_loader, device)



0.7935

____

## How can we now store our trained model?

In [12]:
# torch.save(model, "my_model.pt")

In [13]:
# my_model_loaded = torch.load("my_model.pt")

In [14]:
# model.linear_3.bias, my_model_loaded.linear_3.bias

____

<!--NAVIGATION-->
# < [Regression](3_Regression_Gradient_Descent.ipynb) | MLP for Digit Classification | [CNN on CIFAR10](5_CNN_CIFAR.ipynb) >