<!--NAVIGATION-->
# < [Regression](3-Regression_Gradient_Descent.ipynb) | MLP for Digit Recog | [CNN on CIFAR10](5-CNN-CIFAR.ipynb) >

# PyTorch Modules

### What is this notebook about?

In this notebook, we will learn about PyTorch modules and the functionality they provide. Later on, we'll create a small multilayer perceptron to perform image classification on MNIST.

___

In [2]:
import torch
import torch.nn as nn

print("Torch version:", torch.__version__)

Torch version: 1.5.1


In [3]:
import matplotlib.pyplot as plt

In PyTorch, there are many predefined layers, such as convolution, RNN, pooling, and linear layers.

These layers are wrapped in **modules** and inherit from the **torch.nn.Module** base class.

When designing a custom model in PyTorch, you should follow this strategy and derive your class from **torch.nn.Module**.
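As a minimal sketch of what deriving from `nn.Module` gives you: a layer assigned as an attribute in `__init__` is registered as a submodule, so its parameters show up automatically. The class name `TinyNet` and the layer sizes are made up for illustration.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):  # hypothetical example class
    def __init__(self):
        super().__init__()
        # Assigning a module as an attribute registers it as a submodule
        self.fc = nn.Linear(3, 2)

    def forward(self, x):
        return self.fc(x)


tiny = TinyNet()
# The parameters of the registered submodule are visible on the parent module
names = [name for name, _ in tiny.named_parameters()]
print(names)
out = tiny(torch.rand(4, 3))
print(out.shape)
```

This registration is what later lets `.to(device)` and `model.parameters()` reach every layer of a model.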

For more information about PyTorch Modules see:
<!--NAVIGATION-->
## [Modules](App-PyTorch-Modules.ipynb)

___

## Simple network for image classification

### Let's create a Multi Layer Perceptron Network (MLP)

Implement a simple multilayer perceptron with two hidden layers and the following structure:

![](https://raw.githubusercontent.com/ledell/sldm4-h2o/master/mlp_network.png)

- Input-size: *input_size*
- 1st hidden layer: 75
- 2nd hidden layer: 50
- Output layer: *num_classes*

Additionally, we use `ReLU`s as activation functions.

You will need some PyTorch NN modules. You can find them in the [PyTorch doc](https://pytorch.org/docs/master/nn.html) (especially `nn.Linear`)!

In [16]:
import torch.nn.functional as F  # functional helpers such as relu, sigmoid, tanh, softmax


class MyMultilayerPerceptron(nn.Module):
    def __init__(self, input_size, h1_dim, h2_dim, num_classes):
        super(MyMultilayerPerceptron, self).__init__()
        
        self.input_size = input_size
        self.num_classes = num_classes
        
        # Use the constructor arguments for the layer sizes instead of hard-coded values
        self.linear_1 = nn.Linear(input_size, h1_dim)
        self.linear_2 = nn.Linear(h1_dim, h2_dim)
        self.linear_3 = nn.Linear(h2_dim, num_classes)
        
    
    def forward(self, x):
        x = F.relu(self.linear_1(x))
        x = F.relu(self.linear_2(x))
        x = F.relu(self.linear_3(x))
        # dim=1 normalizes over the class dimension; relying on the implicit dim is deprecated
        return F.softmax(x, dim=1)

### Print your network's parameters

In [17]:
model = MyMultilayerPerceptron(784, 75, 50, 10)
print(model)

MyMultilayerPerceptron(
  (linear_1): Linear(in_features=784, out_features=75, bias=True)
  (linear_2): Linear(in_features=75, out_features=50, bias=True)
  (linear_3): Linear(in_features=50, out_features=10, bias=True)
)


### Feed an input to your network

In [18]:
x = torch.rand(16, 784)  # the first dimension is reserved for the 'batch_size'
out = model(x)  # equivalent to model.forward(x)
out[0, :]



tensor([0.0957, 0.0957, 0.1084, 0.0968, 0.1146, 0.0992, 0.0957, 0.0981, 0.0974,
        0.0985], grad_fn=<SliceBackward>)

___

## Training a model

Most training functions in PyTorch follow a similar pattern.
In most cases, it consists of the following steps:
- Loop over data (in batches)
- Forward pass
- Zero gradients!
- Backward pass
- Parameter update (Optimizer)

In [19]:
def train(model, num_epochs, data_loader, device):
    model = model.to(device)
    
    # Define the loss function and optimizer that you want to use.
    # NOTE: nn.CrossEntropyLoss applies log-softmax internally, so it is
    # normally given raw scores (logits) rather than softmax outputs.
    criterion = nn.CrossEntropyLoss()  
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # NOTE: model.parameters()
    
    # Outer training loop
    for epoch in range(num_epochs):
        # Inner training loop
        cum_loss = 0
        for (inputs, labels) in data_loader:
            # Prepare inputs and labels for processing by the model (e.g. reshape, move to device, ...)
            inputs = inputs.to(device)
            labels = labels.to(device)
            
            # original shape is [batch_size, 1, 28, 28]: one grayscale channel, 28x28 pixels
            inputs = inputs.view(-1, 28*28)

            # Do Forward -> Loss Computation -> Backward -> Optimization
            optimizer.zero_grad()
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            
            cum_loss += loss.item()
        # Average over the batches of the loader that was passed in (not a global loader)
        print("Epoch %d, Loss=%.4f" % (epoch+1, cum_loss/len(data_loader)))

Note:
- We can call `.to(device)` on the model directly. Since the module keeps track of all its parameters, PyTorch can move every one of them to the requested device.
- `model.parameters()` returns all the parameters of the model, so we can hand them to an optimizer, e.g. `torch.optim.Adam(model.parameters(), lr=1e-3)`.
- To apply the forward function of the module, we write `model(inputs)`. In most cases, `model.forward(inputs)` would also work, but there is a slight difference: PyTorch allows you to register hook functions on a module that are called automatically during a forward pass. Using `model(inputs)` runs these hooks around the call to `forward`, while `model.forward(inputs)` silently skips them.
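The difference between `model(inputs)` and `model.forward(inputs)` can be checked with a small sketch. It uses a single `nn.Linear` layer as a stand-in model and a forward hook that records output shapes; the names `record_shape` and `calls` are made up for illustration.

```python
import torch
import torch.nn as nn

layer = nn.Linear(4, 2)  # stand-in for a full model
calls = []               # filled in every time the hook fires


def record_shape(module, inputs, output):
    # A forward hook receives the module, a tuple of its inputs, and its output
    calls.append(tuple(output.shape))


handle = layer.register_forward_hook(record_shape)

x = torch.rand(3, 4)
layer(x)          # goes through __call__, so the hook runs
layer.forward(x)  # bypasses __call__, so the hook is skipped
handle.remove()   # detach the hook once it is no longer needed

print(calls)
```

Only the first call is recorded, which is why calling the module itself is the safer habit.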

Do you feel the power of Modules?

## Loss functions

PyTorch comes with many predefined loss functions:
- L1Loss
- MSELoss
- CrossEntropyLoss
- NLLLoss
- PoissonNLLLoss
- KLDivLoss
- BCELoss
- MarginRankingLoss
- HingeEmbeddingLoss
- MultiLabelMarginLoss
- CosineEmbeddingLoss
- TripletMarginLoss
- ...

Check out the [PyTorch Documentation](https://pytorch.org/docs/master/nn.html#loss-functions).
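Some of these losses are closely related. As a small sketch on random data (the tensor sizes are arbitrary), `CrossEntropyLoss` on raw scores gives the same value as `NLLLoss` on log-softmax outputs, because `CrossEntropyLoss` applies the log-softmax itself:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(5, 10)           # raw scores for 5 samples and 10 classes
targets = torch.randint(0, 10, (5,))  # integer class labels

# CrossEntropyLoss takes raw scores directly
ce = nn.CrossEntropyLoss()(logits, targets)
# NLLLoss expects log-probabilities, so log-softmax is applied by hand
nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), targets)

print(ce.item(), nll.item())
```

This is worth keeping in mind when choosing the output activation of a classifier.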

___

## Let's train our model on the MNIST digit classification task


![MNIST](figures/mnist.jpeg)

First, we have to load the training and test images. MNIST is a widely used dataset, therefore the torchvision package provides simple functionalities to load images from it.

In [20]:
import torchvision.datasets as datasets
import torchvision.transforms as transforms

batch_size = 64

# MNIST Dataset (Images and Labels)
train_dataset = datasets.MNIST(root='../data', train=True, transform=transforms.ToTensor(), download=True)
test_dataset = datasets.MNIST(root='../data', train=False, transform=transforms.ToTensor())

# Dataset Loader (Input Batcher)
train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True)
test_loader = torch.utils.data.DataLoader(dataset=test_dataset, batch_size=batch_size, shuffle=False)

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ./data\MNIST\raw\train-images-idx3-ubyte.gz


100.1%

Extracting ./data\MNIST\raw\train-images-idx3-ubyte.gz to ./data\MNIST\raw
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to ./data\MNIST\raw\train-labels-idx1-ubyte.gz


113.5%

Extracting ./data\MNIST\raw\train-labels-idx1-ubyte.gz to ./data\MNIST\raw
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to ./data\MNIST\raw\t10k-images-idx3-ubyte.gz


100.4%

Extracting ./data\MNIST\raw\t10k-images-idx3-ubyte.gz to ./data\MNIST\raw
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to ./data\MNIST\raw\t10k-labels-idx1-ubyte.gz




Extracting ./data\MNIST\raw\t10k-labels-idx1-ubyte.gz to ./data\MNIST\raw
Processing...
Done!


In PyTorch, `Dataset` and `DataLoader` are classes that help you quickly define how to access and iterate over your data. This is especially useful when your data is spread over several files (for instance, images stored in a directory structure).
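A custom dataset only needs `__len__` and `__getitem__`. The sketch below uses a made-up class, `RandomDigits`, that serves random tensors in place of real images, so the names and sizes are assumptions for illustration only:

```python
import torch
from torch.utils.data import Dataset, DataLoader


class RandomDigits(Dataset):
    """Hypothetical toy dataset: random 1x28x28 'images' with random labels."""

    def __init__(self, num_samples):
        self.images = torch.rand(num_samples, 1, 28, 28)
        self.labels = torch.randint(0, 10, (num_samples,))

    def __len__(self):
        # Number of samples in the dataset
        return len(self.labels)

    def __getitem__(self, idx):
        # Return one (input, label) pair; the DataLoader stacks them into batches
        return self.images[idx], self.labels[idx]


toy_loader = DataLoader(RandomDigits(100), batch_size=32, shuffle=True)
batch_shapes = [(x.shape, y.shape) for x, y in toy_loader]
print(len(toy_loader), batch_shapes[0])
```

With 100 samples and a batch size of 32, the loader yields four batches, the last one holding the remaining 4 samples.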

In [None]:
# Number of parameters, without bias
784*75+75*50+50*10
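The same count can be read off a module directly. This sketch uses an `nn.Sequential` stand-in with the same layer sizes as the MLP above (activations are omitted since they have no parameters) and also reports the total including biases:

```python
import torch.nn as nn

# Stand-in with the same layer sizes as the MLP above (784 -> 75 -> 50 -> 10)
layers = nn.Sequential(nn.Linear(784, 75), nn.Linear(75, 50), nn.Linear(50, 10))

# Weights only: parameter names end in "weight" for the weight matrices
num_weights = sum(p.numel() for name, p in layers.named_parameters()
                  if name.endswith("weight"))
# All trainable parameters, biases included
num_total = sum(p.numel() for p in layers.parameters())

print(num_weights, num_total)
```

The biases add 75 + 50 + 10 = 135 parameters on top of the weights.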

### Call the actual training function

In [28]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = MyMultilayerPerceptron(input_size=784, h1_dim=75, h2_dim=50, num_classes=10)
num_epochs = 50

train(model, num_epochs, train_loader, device)



Epoch 1, Loss=1.7563
Epoch 2, Loss=1.6533
Epoch 3, Loss=1.6112
Epoch 4, Loss=1.5978
Epoch 5, Loss=1.5831
Epoch 6, Loss=1.5770
Epoch 7, Loss=1.5734
Epoch 8, Loss=1.5709
Epoch 9, Loss=1.5683
Epoch 10, Loss=1.5664
Epoch 11, Loss=1.5643
Epoch 12, Loss=1.5633
Epoch 13, Loss=1.5620
Epoch 14, Loss=1.5612
Epoch 15, Loss=1.5605
Epoch 16, Loss=1.5591
Epoch 17, Loss=1.5589
Epoch 18, Loss=1.5584
Epoch 19, Loss=1.5581
Epoch 20, Loss=1.5569
Epoch 21, Loss=1.5567
Epoch 22, Loss=1.5563
Epoch 23, Loss=1.5564
Epoch 24, Loss=1.5564
Epoch 25, Loss=1.5553
Epoch 26, Loss=1.5554
Epoch 27, Loss=1.5553
Epoch 28, Loss=1.5550
Epoch 29, Loss=1.5542
Epoch 30, Loss=1.5543
Epoch 31, Loss=1.5538
Epoch 32, Loss=1.5541
Epoch 33, Loss=1.5534
Epoch 34, Loss=1.5541
Epoch 35, Loss=1.5532
Epoch 36, Loss=1.5530
Epoch 37, Loss=1.5534
Epoch 38, Loss=1.5536
Epoch 39, Loss=1.5530
Epoch 40, Loss=1.5527
Epoch 41, Loss=1.5523
Epoch 42, Loss=1.5527
Epoch 43, Loss=1.5522
Epoch 44, Loss=1.5527
Epoch 45, Loss=1.5526
Epoch 46, Loss=1.55

### How can we now assess the model's performance?

This function loops over another `data_loader` (usually containing test/validation data) and computes the model's accuracy on it.

In [29]:
def accuracy(model, data_loader, device):
    with torch.no_grad(): # during model evaluation, we don't need the autograd mechanism (speeds things up)
        correct = 0
        total = 0
        for inputs, labels in data_loader:
            inputs = inputs.to(device)     
            inputs = inputs.view(-1, 28*28)
            
            outputs = model(inputs)
            _, predicted = outputs.max(1)
            
            correct += (predicted.cpu() == labels).sum().item()
            total += labels.size(0)
            
    acc = correct / total
    return acc

In [30]:
accuracy(model, test_loader, device)  # look at: accuracy(model, train_loader, device)



0.9738

### We get an accuracy of ~97%, can we do better?
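One thing that may be worth trying: `nn.CrossEntropyLoss` applies log-softmax itself, while the network above already ends with ReLU and softmax. When the loss is fed probabilities in [0, 1] for 10 classes, it cannot drop below log(1 + 9/e), about 1.46, which would be consistent with the plateau near 1.55 seen during training. The sketch below checks that bound and defines a hypothetical variant, `LogitMLP`, that returns raw scores. Whether it actually improves test accuracy has to be verified by training it.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


class LogitMLP(nn.Module):  # hypothetical variant of the MLP above
    def __init__(self, input_size, h1_dim, h2_dim, num_classes):
        super().__init__()
        self.linear_1 = nn.Linear(input_size, h1_dim)
        self.linear_2 = nn.Linear(h1_dim, h2_dim)
        self.linear_3 = nn.Linear(h2_dim, num_classes)

    def forward(self, x):
        x = F.relu(self.linear_1(x))
        x = F.relu(self.linear_2(x))
        return self.linear_3(x)  # raw scores: no ReLU or softmax on the output


# Best case when the loss receives probabilities: a perfect one-hot prediction
one_hot = torch.zeros(1, 10)
one_hot[0, 3] = 1.0
floor = nn.CrossEntropyLoss()(one_hot, torch.tensor([3])).item()
print(floor, math.log(1 + 9 / math.e))

logit_model = LogitMLP(784, 75, 50, 10)
scores = logit_model(torch.rand(16, 784))
probs = F.softmax(scores, dim=1)  # convert to probabilities only when they are needed
```

The `accuracy` function works unchanged with such a model, since the arg-max of the raw scores is the same as the arg-max of the softmax.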

____

## How can we now store our trained model?

In [31]:
# torch.save(model, "my_model.pt")

In [32]:
# my_model_loaded = torch.load("my_model.pt")

In [33]:
# model.linear_3.bias, my_model_loaded.linear_3.bias
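An alternative to saving the whole model object is to save only its `state_dict` and load it into a freshly built model. The sketch below is self-contained: it uses a single `nn.Linear` layer as a stand-in for the trained model and an in-memory buffer in place of a file such as `my_model.pt`.

```python
import io
import torch
import torch.nn as nn

net = nn.Linear(50, 10)  # stand-in for a trained model
buffer = io.BytesIO()    # in-memory file; a path such as "my_model.pt" works the same way
torch.save(net.state_dict(), buffer)

buffer.seek(0)
restored = nn.Linear(50, 10)  # the architecture has to be rebuilt in code first
restored.load_state_dict(torch.load(buffer))
restored.eval()  # switch to evaluation mode before running inference

print(torch.equal(net.bias, restored.bias))
```

Saving the `state_dict` stores only tensors, so the file does not depend on how the class definition is pickled.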

____
