# PyTorch Modules

### What is this notebook about?

In this notebook, we will learn about PyTorch modules and the functionality they provide. Later on, we'll create a small multilayer perceptron to perform image classification on MNIST.

___

## Google Colab only!

In [1]:
# execute only if you're using Google Colab
!wget -q https://raw.githubusercontent.com/ahug/amld-pytorch-workshop/master/binder/requirements.txt -O requirements.txt
!pip install -qr requirements.txt



____

In [2]:
import torch
import torch.nn as nn

print("Torch version:", torch.__version__)

Torch version: 0.4.1.post2


In [3]:
import matplotlib.pyplot as plt

In PyTorch, there are many predefined layers such as convolutions, RNNs, pooling, linear layers, etc.

These functions are wrapped in **modules** and inherit from the **torch.nn.Module** base class.

When designing a custom model in PyTorch, you should follow this strategy and derive your class from **torch.nn.Module**.

## Modules

In [4]:
print(torch.nn.Module.__doc__)

Base class for all neural network modules.

    Your models should also subclass this class.

    Modules can also contain other Modules, allowing to nest them in
    a tree structure. You can assign the submodules as regular attributes::

        import torch.nn as nn
        import torch.nn.functional as F

        class Model(nn.Module):
            def __init__(self):
                super(Model, self).__init__()
                self.conv1 = nn.Conv2d(1, 20, 5)
                self.conv2 = nn.Conv2d(20, 20, 5)

            def forward(self, x):
               x = F.relu(self.conv1(x))
               return F.relu(self.conv2(x))

    Submodules assigned in this way will be registered, and will have their
    parameters converted too when you call `.cuda()`, etc.
    


### Modules do a lot of "magic" under the hood.

- They register all the parameters of your model.
- They simplify the saving/loading of your model.
- They provide helper functions to reset/freeze/update the gradients.
- They provide helper functions to put all parameters on a device (GPU).
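
The four points above can be sketched with a single `nn.Linear` layer standing in for a full model. The layer sizes are arbitrary and chosen only for illustration, and the freezing loop is just one simple way to do it:

```python
import torch
import torch.nn as nn

layer = nn.Linear(4, 2)  # arbitrary sizes, for illustration only

# 1. Parameters are registered automatically.
param_names = [name for name, _ in layer.named_parameters()]
print(param_names)  # ['weight', 'bias']

# 2. state_dict() collects everything needed to save/load the model.
state_keys = list(layer.state_dict().keys())
print(state_keys)  # ['weight', 'bias']

# 3. Freeze / unfreeze parameters and reset gradients.
for p in layer.parameters():
    p.requires_grad = False   # frozen: no gradients are computed
for p in layer.parameters():
    p.requires_grad = True    # trainable again
layer(torch.rand(3, 4)).sum().backward()
layer.zero_grad()             # reset the gradients of all parameters

# 4. Move all parameters to a device in one call.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
layer = layer.to(device)
print(layer.weight.device)
```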

### What is a torch.nn.Parameter?

A Parameter is a Tensor with `requires_grad` set to `True` by default, which is automatically added to a module's list of parameters when it is assigned as an attribute of that module.

Let's have a look at the documentation ([torch.nn.Parameter](https://pytorch.org/docs/stable/_modules/torch/nn/parameter.html))

In [5]:
print(torch.nn.Parameter.__doc__)

A kind of Tensor that is to be considered a module parameter.

    Parameters are :class:`~torch.Tensor` subclasses, that have a
    very special property when used with :class:`Module` s - when they're
    assigned as Module attributes they are automatically added to the list of
    its parameters, and will appear e.g. in :meth:`~Module.parameters` iterator.
    Assigning a Tensor doesn't have such effect. This is because one might
    want to cache some temporary state, like last hidden state of the RNN, in
    the model. If there was no such class as :class:`Parameter`, these
    temporaries would get registered too.

    Arguments:
        data (Tensor): parameter tensor.
        requires_grad (bool, optional): if the parameter requires gradient. See
            :ref:`excluding-subgraphs` for more details. Default: `True`
    


In [6]:
mod = nn.Conv1d(10, 2, 3)
print(mod.weight)

Parameter containing:
tensor([[[ 0.0236,  0.0067,  0.0104],
         [ 0.0949,  0.0441, -0.0942],
         [ 0.0506, -0.1818, -0.0155],
         [-0.0487,  0.0186, -0.0384],
         [ 0.1200,  0.0932, -0.0640],
         [ 0.0726,  0.0050,  0.0149],
         [-0.1086,  0.0171,  0.0924],
         [ 0.0247,  0.0874,  0.1692],
         [-0.1381,  0.1730, -0.0163],
         [-0.1236,  0.0842,  0.1532]],

        [[ 0.0724,  0.1428, -0.0615],
         [ 0.1243,  0.1041,  0.1055],
         [ 0.1430, -0.1725, -0.0646],
         [-0.0439,  0.1180, -0.0785],
         [ 0.0135,  0.1214,  0.1648],
         [-0.0340,  0.1622,  0.1039],
         [ 0.0864,  0.0921,  0.0904],
         [-0.0386, -0.1658,  0.1679],
         [ 0.0398, -0.0278,  0.1036],
         [-0.0038, -0.0723,  0.0673]]], requires_grad=True)
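
The difference between a `Parameter` and a plain `Tensor` attribute can be checked directly. The sketch below uses a hypothetical `Demo` class that exists only for this illustration. It also prints the shape of the `Conv1d` weight, which is laid out as `(out_channels, in_channels, kernel_size)`:

```python
import torch
import torch.nn as nn


class Demo(nn.Module):  # hypothetical class, for illustration only
    def __init__(self):
        super(Demo, self).__init__()
        self.registered = nn.Parameter(torch.zeros(3))  # appears in parameters()
        self.cache = torch.zeros(3)                      # plain Tensor: not registered


demo = Demo()
names = [name for name, _ in demo.named_parameters()]
print(names)                          # ['registered']
print(demo.registered.requires_grad)  # True
print(demo.cache.requires_grad)       # False

conv_shape = tuple(nn.Conv1d(10, 2, 3).weight.shape)
print(conv_shape)                     # (2, 10, 3)
```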


___

## Very simple example of a module

A module has to implement the `forward` function, which is executed during the forward pass.

All your model's submodules and parameters should be instantiated in the `__init__` function. This way PyTorch knows that they exist and registers them.

In [7]:
# A simple module
class MySuperSimpleModule(nn.Module):
    def __init__(self, input_size, num_classes):
        super(MySuperSimpleModule, self).__init__()
        self.linear = nn.Linear(input_size, num_classes)
    
    def forward(self, x):
        out = self.linear(x)
        return out
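
Registration happens when a module or parameter is assigned as an attribute. One common pitfall, sketched below with two hypothetical classes, is storing layers in a plain Python list: PyTorch does not see them. `nn.ModuleList` is one way to keep a list of layers registered.

```python
import torch.nn as nn


class HiddenInList(nn.Module):  # hypothetical, for illustration only
    def __init__(self):
        super(HiddenInList, self).__init__()
        # Plain Python list: the layers are NOT registered
        self.layers = [nn.Linear(4, 4), nn.Linear(4, 2)]


class Registered(nn.Module):  # hypothetical, for illustration only
    def __init__(self):
        super(Registered, self).__init__()
        # nn.ModuleList registers every layer it contains
        self.layers = nn.ModuleList([nn.Linear(4, 4), nn.Linear(4, 2)])


n_hidden = len(list(HiddenInList().parameters()))
n_registered = len(list(Registered().parameters()))
print(n_hidden, n_registered)  # 0 4
```

An optimizer built from `HiddenInList().parameters()` would have nothing to update, and `.to(device)` would leave those layers where they are.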

You can use the print function to list a model's submodules/parameters:

In [8]:
model = MySuperSimpleModule(input_size=20, num_classes=5)
print(model)

MySuperSimpleModule(
  (linear): Linear(in_features=20, out_features=5, bias=True)
)


You can use **`model.parameters()`** to iterate over the parameters of your model, which PyTorch has registered automatically.

In [9]:
for name, p in model.named_parameters():  # Here we use a slightly different version of the parameters() function
    print(name, ":\n", p)                 # which also returns the parameter name

linear.weight :
 Parameter containing:
tensor([[ 0.2004,  0.0563,  0.1745,  0.0697,  0.0947,  0.1832, -0.0452, -0.0017,
         -0.1560,  0.1838,  0.1933, -0.1326,  0.1312,  0.1696, -0.1486,  0.1382,
         -0.2122,  0.0248,  0.0662,  0.0797],
        [ 0.1165,  0.0055, -0.0324,  0.1490,  0.1233, -0.0380,  0.0742,  0.1925,
         -0.0545,  0.0063,  0.0238, -0.1700, -0.2219,  0.0376, -0.1953, -0.0789,
          0.1992,  0.1166,  0.0563,  0.0777],
        [-0.1723,  0.0720,  0.2057,  0.1241, -0.0255, -0.0383, -0.2165,  0.1208,
          0.1533, -0.2165,  0.1670, -0.0550, -0.0251, -0.0367, -0.0623, -0.1304,
         -0.1005,  0.0034, -0.0318, -0.0436],
        [ 0.2082, -0.1200,  0.1100,  0.1590, -0.0658, -0.0524, -0.1643,  0.2146,
         -0.0387,  0.1262,  0.1354,  0.1369, -0.0160,  0.1223,  0.0821,  0.0104,
         -0.2074, -0.0318, -0.0194,  0.0273],
        [-0.0614, -0.1570,  0.1563, -0.1889,  0.0329, -0.0987, -0.1242, -0.0954,
         -0.1158,  0.0420, -0.0477, -0.1129, -0.
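
Printing full tensors gets long quickly. A more compact summary, sketched here on a stand-alone `nn.Linear` with the same sizes as the layer above, lists only the shape of each parameter and the total number of values:

```python
import torch.nn as nn

layer = nn.Linear(20, 5)  # same sizes as the linear layer above

shapes = {name: tuple(p.shape) for name, p in layer.named_parameters()}
total = sum(p.numel() for p in layer.parameters())

print(shapes)  # {'weight': (5, 20), 'bias': (5,)}
print(total)   # 105
```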

___

## Simple network for image classification

![We need to go deeper](figures/deeper.jpeg)

## Your turn!

### Let's create a more complicated model.

Implement a simple multilayer perceptron with two hidden layers and the following structure:

![](https://raw.githubusercontent.com/ledell/sldm4-h2o/master/mlp_network.png)

- Input-size: *input_size*
- 1st hidden layer: 75
- 2nd hidden layer: 50
- Output layer: *num_classes*

Additionally, we use `ReLU`s as activation functions.

You will need some PyTorch NN modules - Find them in the [PyTorch doc](https://pytorch.org/docs/master/nn.html) (especially nn.Linear)!

In [10]:
from torch.nn import Parameter
import torch.nn.functional as F  # provides helper functions such as relu, sigmoid, tanh, etc.


class MyMultilayerPerceptron(nn.Module):
    def __init__(self, input_size, num_classes):
        super(MyMultilayerPerceptron, self).__init__()
        
        self.input_size = input_size
        self.num_classes = num_classes
        
        self.linear_1 = nn.Linear(input_size, 75)
        self.linear_2 = nn.Linear(75, 50)
        self.linear_3 = nn.Linear(50, num_classes)
        
    
    def forward(self, x):
        out = F.relu(self.linear_1(x))
        out = F.relu(self.linear_2(out))
        out = self.linear_3(out)
        return out

### Print your network's parameters

In [11]:
model = MyMultilayerPerceptron(784, 10)
print(model)

MyMultilayerPerceptron(
  (linear_1): Linear(in_features=784, out_features=75, bias=True)
  (linear_2): Linear(in_features=75, out_features=50, bias=True)
  (linear_3): Linear(in_features=50, out_features=10, bias=True)
)


### Feed an input to your network

In [12]:
x = torch.rand(16, 784)  # the first dimension is reserved for the 'batch_size'
out = model(x)  # equivalent to model.forward(x)
out[0, :]  # the output for the first sample in the batch

tensor([-0.0186,  0.0303,  0.1133,  0.0693,  0.1682, -0.0290,  0.3097,  0.2018,
        -0.1038, -0.1881], grad_fn=<SelectBackward>)
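
The values above are raw scores (logits), one per class, not probabilities. A minimal sketch of how to turn them into probabilities and predicted classes follows. It uses an `nn.Sequential` stand-in with the same layer sizes as the perceptron, so the block runs on its own:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-in with the same layer sizes as MyMultilayerPerceptron(784, 10)
net = nn.Sequential(
    nn.Linear(784, 75), nn.ReLU(),
    nn.Linear(75, 50), nn.ReLU(),
    nn.Linear(50, 10),
)

x = torch.rand(16, 784)
logits = net(x)
print(logits.shape)               # torch.Size([16, 10])

probs = F.softmax(logits, dim=1)  # normalise over the class dimension
print(probs.sum(dim=1))           # each row sums to 1 (up to rounding)

pred = probs.argmax(dim=1)        # most likely class for each sample
print(pred.shape)                 # torch.Size([16])
```

The model itself does not apply a softmax because `nn.CrossEntropyLoss`, used for training later on, expects raw scores.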

___

## Training a model

Most training functions in PyTorch follow a similar pattern.
In most cases, it consists of the following steps:
- Loop over data (in batches)
- Forward pass
- Zero gradients!
- Backward pass
- Parameter update (Optimizer)
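
To see why the "Zero gradients!" step matters, here is a small sketch on a single tensor instead of a full model. Calling `backward()` adds to the existing `.grad` instead of overwriting it:

```python
import torch

w = torch.ones(2, requires_grad=True)

(3 * w).sum().backward()
first = w.grad.clone()
print(first)    # tensor([3., 3.])

(3 * w).sum().backward()   # no zeroing in between: gradients add up
second = w.grad.clone()
print(second)   # tensor([6., 6.])

w.grad.zero_()             # what optimizer.zero_grad() does for every parameter
(3 * w).sum().backward()
third = w.grad.clone()
print(third)    # tensor([3., 3.])
```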

In [13]:
def train(model, num_epochs, data_loader, device):
    model = model.to(device)
    
    # Define the Loss function and Optimizer that you want to use
    criterion = nn.CrossEntropyLoss()  
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # NOTE: model.parameters()
    
    # Outer training loop
    for epoch in range(num_epochs):
        # Inner training loop
        cum_loss = 0
        for (inputs, labels) in data_loader:
            # Prepare inputs and labels for processing by the model (e.g. reshape, move to device, ...)
            inputs = inputs.to(device)
            labels = labels.to(device)
            
            # original shape is [batch_size, 1, 28, 28]: single-channel images of size 28x28
            inputs = inputs.view(-1, 28*28)

            # Do Forward -> Loss Computation -> Backward -> Optimization
            optimizer.zero_grad()
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            
            cum_loss += loss.item()
        print("Epoch %d, Loss=%.4f" % (epoch+1, cum_loss/len(data_loader)))

Note:
- we can use the `.to` function on the model directly. Since PyTorch knows all the model parameters, it can move all of them to the correct device.
- we use `model.parameters()` to get all the parameters of the model, which lets us instantiate an optimizer that updates exactly these parameters, e.g. `torch.optim.SGD(model.parameters(), lr=0.01)`.
- to apply the forward function of the module, we write `model(inputs)`. In most cases, `model.forward(inputs)` would also work, but there is a slight difference: PyTorch allows you to register hook functions on a model, which are called automatically during a forward pass. Calling `model(inputs)` runs these hooks along with the forward function, while calling `model.forward(inputs)` directly silently skips them.
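
The last point can be verified with a forward hook. In this sketch, `log_call` is a hypothetical hook function that records the output shape each time it runs:

```python
import torch
import torch.nn as nn

layer = nn.Linear(3, 1)
calls = []


def log_call(module, inputs, output):  # hypothetical hook, for illustration only
    calls.append(tuple(output.shape))


handle = layer.register_forward_hook(log_call)

x = torch.rand(2, 3)
layer(x)           # goes through __call__: the hook runs
layer.forward(x)   # bypasses __call__: the hook is skipped
print(calls)       # [(2, 1)]

handle.remove()    # detach the hook again
```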

Do you feel the power of Modules?

## Loss functions

PyTorch comes with a lot of predefined loss functions:
- L1Loss
- MSELoss
- CrossEntropyLoss
- NLLLoss
- PoissonNLLLoss
- KLDivLoss
- BCELoss
- MarginRankingLoss
- HingeEmbeddingLoss
- MultiLabelMarginLoss
- CosineEmbeddingLoss
- TripletMarginLoss
- ...

Check out the [PyTorch Documentation](https://pytorch.org/docs/master/nn.html#loss-functions).
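
Loss functions are modules too: you instantiate them and then call them on predictions and targets. The sketch below uses random scores and made-up labels to show that `CrossEntropyLoss` gives the same value as a log-softmax followed by `NLLLoss`:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 3)              # 4 samples, 3 classes (made-up data)
targets = torch.tensor([0, 2, 1, 2])    # made-up class labels

ce = nn.CrossEntropyLoss()(logits, targets)
nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), targets)

print(torch.allclose(ce, nll))  # True
```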

___

## Let's train our model on the MNIST digit classification task


![MNIST](figures/mnist.jpeg)

First, we have to load the training and test images. MNIST is a widely used dataset, therefore the torchvision package provides simple functionalities to load images from it.

In [14]:
import torchvision.datasets as datasets
import torchvision.transforms as transforms

batch_size = 64

# MNIST Dataset (Images and Labels)
train_dataset = datasets.MNIST(root='./data', train=True, transform=transforms.ToTensor(), download=True)
test_dataset = datasets.MNIST(root='./data', train=False, transform=transforms.ToTensor())

# Dataset Loader (Input Batcher)
train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True)
test_loader = torch.utils.data.DataLoader(dataset=test_dataset, batch_size=batch_size, shuffle=False)
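
To get a feel for what a `DataLoader` yields without downloading anything, here is a sketch with a synthetic stand-in dataset. The random tensors only mimic the shape of MNIST images after `ToTensor()`, and the dataset size of 150 is arbitrary:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in: 150 random "images" of shape [1, 28, 28] with random labels
fake_images = torch.rand(150, 1, 28, 28)
fake_labels = torch.randint(0, 10, (150,))
loader = DataLoader(TensorDataset(fake_images, fake_labels), batch_size=64, shuffle=False)

batch_sizes = [images.shape[0] for images, _ in loader]
print(len(loader))    # 3 (number of batches, not number of samples)
print(batch_sizes)    # [64, 64, 22] (the last batch is smaller)

images, labels = next(iter(loader))
print(images.shape)                # torch.Size([64, 1, 28, 28])
flat = images.view(-1, 28 * 28)    # flatten each image for the perceptron
print(flat.shape)                  # torch.Size([64, 784])
```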

### Call the actual training function

In [15]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = MyMultilayerPerceptron(input_size=784, num_classes=10)
num_epochs = 5

train(model, num_epochs, train_loader, device)

Epoch 1, Loss=0.3895
Epoch 2, Loss=0.1762
Epoch 3, Loss=0.1273
Epoch 4, Loss=0.0979
Epoch 5, Loss=0.0791


### How can we now assess the model's performance?

This function loops over another `data_loader` (usually containing test/validation data) and computes the model's accuracy on it.

In [16]:
def accuracy(model, data_loader, device):
    with torch.no_grad(): # during model evaluation, we don't need the autograd mechanism (speeds things up)
        correct = 0
        total = 0
        for inputs, labels in data_loader:
            inputs = inputs.to(device)     
            inputs = inputs.view(-1, 28*28)
            
            outputs = model(inputs)
            _, predicted = outputs.max(1)
            
            correct += (predicted.cpu() == labels).sum().item()
            total += labels.size(0)
            
    acc = correct / total
    return acc

In [17]:
accuracy(model, test_loader, device)  # look at: accuracy(model, train_loader, device)

0.9717

### We get an accuracy of ~97.2%, can we do better?

____

## How can we now store our trained model?

In [18]:
torch.save(model, "my_model.pt")



In [19]:
my_model_loaded = torch.load("my_model.pt")

In [20]:
model.linear_3.bias, my_model_loaded.linear_3.bias

(Parameter containing:
 tensor([-0.0806, -0.0041,  0.0795, -0.2120, -0.0777,  0.2158,  0.0327,  0.0568,
         -0.1043,  0.0588], requires_grad=True), Parameter containing:
 tensor([-0.0806, -0.0041,  0.0795, -0.2120, -0.0777,  0.2158,  0.0327,  0.0568,
         -0.1043,  0.0588], requires_grad=True))
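
An alternative that is commonly recommended is to save only the model's `state_dict` (its parameters) and load it into a freshly constructed model. A minimal sketch, using an `nn.Linear` stand-in for the trained model and a hypothetical file name in a temporary directory:

```python
import os
import tempfile

import torch
import torch.nn as nn

net = nn.Linear(50, 10)  # stand-in for the trained model

# Hypothetical file name, placed in a temporary directory
path = os.path.join(tempfile.mkdtemp(), "my_model_weights.pt")
torch.save(net.state_dict(), path)           # save the parameters only

restored = nn.Linear(50, 10)                 # rebuild the architecture first
restored.load_state_dict(torch.load(path))   # then load the saved parameters

same = torch.equal(net.bias, restored.bias)
print(same)  # True
```

With this approach the file holds only tensors, so loading it does not depend on the class definition being importable under the same name, but you do need the code that builds the model.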

____

## Don't forget to download the notebook, otherwise your changes may be lost!

![Download the notebook](figures/notebook-download.png)