<a href="https://colab.research.google.com/github/kreshuklab/teaching-dl-course-2019/blob/master/Webinars/exercise2/example_mnist_pytorch.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This is an example notebook that trains a neural network on [MNIST](https://en.wikipedia.org/wiki/MNIST_database) with PyTorch.

Adapted from the Keras example of Pejman Rasti and the PyTorch example of Constantin Pape.

In [0]:
# to show images directly in the notebook
%matplotlib inline
import numpy as np    # scientific computing 
import matplotlib.pyplot as plt   # plotting and visualisation
# import torch and its libraries
import torch  # main torch library for tensor functionality
import torch.nn as nn  # layers, activation functions, etc
import torch.nn.functional as F  # functions
import torch.optim as optim  # optimizers
from torchvision import datasets, transforms  # standard datasets, transformations
from torch.utils.tensorboard import SummaryWriter # keeping track of training
# to display tensorboard directly in the notebook
%load_ext tensorboard

In [0]:
# load (downloaded if needed) the MNIST dataset
mnist_train = datasets.MNIST('./mnist_data', train=True, download=True)

In [0]:
# plot 3 images as gray scale
f, axarr = plt.subplots(1, 3)   # three images in a row
axarr[0].imshow(np.asarray(mnist_train[0][0]), cmap='gray')
axarr[1].imshow(np.asarray(mnist_train[1][0]), cmap='gray')
axarr[2].imshow(np.asarray(mnist_train[2][0]), cmap='gray')
_ = [ax.axis('off') for ax in axarr]   # remove the axis ticks
# show the plot
plt.show()


Now we apply some transformations to the dataset.
First, we convert the images to a `torch.Tensor` with `ToTensor`, which also rescales the pixel values to the range [0, 1]. Then we normalize the data to have (roughly) zero mean and unit standard deviation with `Normalize`.
The second step is a common data normalization in deep learning.

In more advanced training pipelines we would normally also apply data augmentation, which we covered in the first exercise (e.g., rotation, translation, rescaling, or other affine transformations).

Afterwards we create a data loader: a class that fetches `batch_size` images and labels at a time (in random order if `shuffle=True`) and applies the given transformations.

In [0]:
# the values here represent the mean value (0.1307) and standard deviation (0.3081) of the whole dataset
trafos = transforms.Compose([transforms.ToTensor(),
                             transforms.Normalize((0.1307,), (0.3081,))])

# the train loader that fetches data from the 60,000 MNIST training examples
train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('./mnist_data', train=True, download=True, transform=trafos),
    batch_size=100, shuffle=True
)

# the validation loader that fetches data from the 10,000 MNIST validation examples
val_loader = torch.utils.data.DataLoader(
    datasets.MNIST('./mnist_data', train=False, transform=trafos),
    batch_size=100, shuffle=True
)
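
To make the two transformations concrete, here is a minimal sketch of the per-pixel arithmetic, assuming 8-bit grayscale input as in MNIST. `normalize_pixel` is a hypothetical helper written for illustration only; it is not part of torchvision.

```python
# Sketch of the per-pixel arithmetic behind ToTensor + Normalize.
# MEAN and STD are the MNIST statistics quoted in the cell above;
# normalize_pixel is a hypothetical helper, not a torchvision function.
MEAN, STD = 0.1307, 0.3081


def normalize_pixel(raw_value):
    """Map a raw 8-bit pixel value (0..255) to its normalized value."""
    scaled = raw_value / 255.0    # ToTensor rescales 8-bit images to [0, 1]
    return (scaled - MEAN) / STD  # Normalize subtracts the mean and divides by the std


for raw in (0, 128, 255):
    print(f"raw={raw:3d} -> normalized={normalize_pixel(raw):+.4f}")
```

Black background pixels end up at roughly -0.42 and fully white pixels at roughly 2.82, so the network sees inputs centered near zero.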



Now, we define the main functions that will perform the training and validation loops. Both functions are given the model (= network) and a loader that provides data (input and target). The train function iterates over the training batches and performs gradient descent. For this purpose, it also needs an optimizer, which performs some flavor of (stochastic) gradient descent. The validation function iterates over the validation batches and measures validation loss and accuracy.

In general, torch code reads similarly to numpy code; instead of `np.ndarray`, the main data structure is `torch.Tensor`. For debugging, we can simply use prints in the code or set arbitrary breakpoints.

Additionally, to log our results in a nice way we use a tool called [TensorBoard](https://www.tensorflow.org/tensorboard). It is developed by the TensorFlow team, but can be used with PyTorch as well. There is a tutorial on what can be visualised with TensorBoard in PyTorch [here](https://pytorch.org/tutorials/intermediate/tensorboard_tutorial.html). For now we are only interested in tracking our metrics (scalars).

In [0]:
def train(model, loader, optimizer, epoch, log_interval=100, input_is_1d=False, tb_logger=None):
    # set model to train mode
    model.train()
    for batch_id, (x, y) in enumerate(loader):
        if input_is_1d:
            # if we have a fully connected network
            # the input has to be reshaped to be 1d instead of 2d
            x = x.view(-1, 784)

        # set the gradients to zero, to start with "clean" gradients
        # in this training iteration
        optimizer.zero_grad()
        
        # apply the model and get our prediction
        prediction = model(x)
        
        # calculate the loss (negative log likelihood loss)
        loss = F.nll_loss(prediction, y)
        # calculate the gradients (`loss.backward()`) and apply them (`optimizer.step()`)
        loss.backward()
        optimizer.step()
        
        # logging
        if batch_id % log_interval == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                  epoch, batch_id * len(x),
                  len(loader.dataset),
                  100. * batch_id / len(loader), loss.item()))
        
        # if we have a valid tb summary writer, we also log the loss there
        if tb_logger is not None:
            tb_logger.add_scalar(tag='train_loss', # the name of the scalar to log
                                 scalar_value=loss.item(), # the value of the scalar at this step
                                 # the step number is (epoch number * number of batches per epoch + batch index)
                                 global_step=epoch * len(loader) + batch_id)
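
The step bookkeeping used for the TensorBoard x-axis can be sketched in plain Python. This assumes the loader keeps the last, possibly smaller batch (the `DataLoader` default), so that `len(loader)` equals the rounded-up number of batches. Both helper names are hypothetical.

```python
import math


def batches_per_epoch(n_samples, batch_size):
    """Number of batches per epoch when the last, possibly smaller batch is kept."""
    return math.ceil(n_samples / batch_size)


def global_step(epoch, batch_id, n_batches):
    """Running count of training iterations across epochs."""
    return epoch * n_batches + batch_id


n_batches = batches_per_epoch(60000, 100)
print(n_batches)                        # batches in one pass over the MNIST training set
print(global_step(0, 599, n_batches))   # last iteration of the first epoch
print(global_step(1, 0, n_batches))     # first iteration of the second epoch
```

Because the step counts iterations rather than samples, consecutive epochs continue the same curve without gaps.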

In [0]:
def validate(model, loader, step=None, input_is_1d=False, tb_logger=None):
    # set model to evaluation mode
    model.eval()
    test_loss = 0
    correct = 0
    # we don't need gradients during the validation, so we
    # disable them via `with torch.no_grad()` to save compute
    with torch.no_grad():
        for x, y in loader:
            if input_is_1d:
                x = x.view(-1, 784)
            prediction = model(x)
            
            # during validation, we sum up the loss of all batches
            test_loss += F.nll_loss(prediction, y, reduction='sum').item()
            
            # we also compute the accuracy. To this end, we compute the
            # predictions with highest likelihood and compare with the actual 
            # target classes
            class_pred = prediction.argmax(dim=1, keepdim=True)
            correct += class_pred.eq(y.view_as(class_pred)).sum().item()
    # log validation results
    test_loss /= len(loader.dataset)
    accuracy = 100 * correct / len(loader.dataset)

    print('\nValidate: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
          test_loss, correct, len(loader.dataset), accuracy))
    
    if tb_logger is not None:
        assert step is not None, "Need to know the current step to log validation results"
        tb_logger.add_scalar(tag='val_loss', scalar_value=test_loss, global_step=step)
        tb_logger.add_scalar(tag='val_accuracy', scalar_value=accuracy, global_step=step)
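
As a rough illustration of the two validation metrics, the following sketch recomputes them by hand on two made-up samples with three classes. It takes log-probabilities as input, just like `F.nll_loss`; the helper name and the toy numbers are assumptions for this example.

```python
import math


def summed_nll_and_correct(log_probs, targets):
    """Return the summed negative log likelihood and the number of correct predictions.

    log_probs: one list of class log-probabilities per sample
    targets:   one integer class label per sample
    """
    total_loss = 0.0
    correct = 0
    for row, target in zip(log_probs, targets):
        total_loss += -row[target]  # negative log likelihood of the true class
        predicted = max(range(len(row)), key=row.__getitem__)  # index of the largest entry
        correct += int(predicted == target)
    return total_loss, correct


# two toy samples with three classes each
log_probs = [[math.log(0.7), math.log(0.2), math.log(0.1)],
             [math.log(0.1), math.log(0.3), math.log(0.6)]]
targets = [0, 1]

loss_sum, n_correct = summed_nll_and_correct(log_probs, targets)
print("average loss:", loss_sum / len(targets))
print("accuracy in %:", 100 * n_correct / len(targets))
```

The first sample is classified correctly, while the second one is assigned to class 2 instead of class 1, so one of the two predictions counts as correct.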

Now we create [a summary writer](https://pytorch.org/tutorials/intermediate/tensorboard_tutorial.html#tensorboard-setup) that writes all the TensorBoard logs to a specified folder, and launch TensorBoard from inside the notebook.


In [0]:
writer = SummaryWriter('runs/Softmax_model')
%tensorboard --logdir runs

## Linear Softmax Model

Next, we define the model / network that we are going to train. Here, we define a single fully connected layer followed by a log-softmax. We can chain the individual layers with `nn.Sequential`, which pipes the output of each layer into the next one. A fully connected layer is instantiated via `nn.Linear`, which takes the number of input / output features as first / second argument. At the end, we apply `nn.LogSoftmax`, which outputs the logarithm of the class probabilities; this is the input that `F.nll_loss` expects.

Later, we will see how to define more complex models.


In [0]:
model = nn.Sequential(nn.Linear(784, 10),
                      nn.LogSoftmax(dim=1))

# we use the Adam optimizer that performs a version of stochastic gradient descent
# to update the model parameters during training
optimizer = optim.Adam(model.parameters())
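
To see what this model computes, here is a minimal pure-Python sketch of a linear layer followed by a log-softmax. It uses toy sizes (2 inputs, 3 classes) and made-up weights; the function names are hypothetical stand-ins for `nn.Linear` and `nn.LogSoftmax`, not the PyTorch implementations.

```python
import math


def linear(x, weight, bias):
    """Affine map y = W x + b; weight has shape (out_features, in_features)."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def log_softmax(scores):
    """Numerically stable logarithm of the softmax of a list of scores."""
    top = max(scores)
    log_norm = top + math.log(sum(math.exp(s - top) for s in scores))
    return [s - log_norm for s in scores]


# toy sizes: 2 input features and 3 classes instead of 784 and 10
x = [1.0, 2.0]
weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
bias = [0.0, 0.0, 0.5]

scores = linear(x, weight, bias)
log_probs = log_softmax(scores)
print(scores)
print(sum(math.exp(lp) for lp in log_probs))  # the probabilities sum to 1 up to rounding
```

The class with the largest score also has the largest log-probability, so the prediction can be read off either quantity.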

And now we come to the actual training.

In [0]:
# we train our model for 10 epochs
# 1 epoch consists of iterating once over the training set
# and running validation on the complete validation set
n_epochs = 10
for epoch in range(n_epochs):
    train(model, train_loader, optimizer, epoch, input_is_1d=True, tb_logger=writer)
    # in addition, we need to keep track of the current step for the
    # validation tensorboard logs; we count it in training iterations,
    # the same unit that train() uses for its global_step
    step = epoch * len(train_loader)
    validate(model, val_loader, step=step, input_is_1d=True, tb_logger=writer)

## Fully Connected Softmax Network

In this example, we extend the previous model and train a fully connected network with five linear layers, i.e. four hidden layers and one output layer. After each hidden layer we use a sigmoid as non-linear activation function.

In [0]:
model = nn.Sequential(nn.Linear(784, 200),
                      nn.Sigmoid(),
                      nn.Linear(200, 100),
                      nn.Sigmoid(),
                      nn.Linear(100, 60),
                      nn.Sigmoid(),
                      nn.Linear(60, 30),
                      nn.Sigmoid(),
                      nn.Linear(30, 10),
                      nn.LogSoftmax(dim=1))

# we can reuse the same tensorboard session, but we need to create a new writer
writer = SummaryWriter('runs/FC_Softmax_model')

optimizer = optim.Adam(model.parameters())

# train the model for 10 epochs
n_epochs = 10
for epoch in range(n_epochs):
    train(model, train_loader, optimizer, epoch,
          input_is_1d=True, tb_logger=writer)
    step = epoch * len(train_loader)
    validate(model, val_loader, step=step, input_is_1d=True,
             tb_logger=writer)
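
To get a feeling for how much larger this network is than the single linear layer, we can count the trainable parameters by hand. The sketch below assumes every linear layer has a bias term (the `nn.Linear` default) and that the activations have no parameters; `count_linear_parameters` is a hypothetical helper.

```python
def count_linear_parameters(layer_sizes):
    """Count weights and biases of a stack of fully connected layers.

    layer_sizes lists the feature sizes from input to output, e.g. [784, 10].
    """
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix plus bias vector
    return total


print(count_linear_parameters([784, 10]))                     # single linear layer
print(count_linear_parameters([784, 200, 100, 60, 30, 10]))   # layer sizes of the network above
```

Most of the parameters sit in the first layer, because it connects all 784 input pixels to 200 units.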



## Fully Connected Relu Softmax

In this example, we extend the fully connected softmax network and use rectified linear units (ReLU) instead of sigmoids as activation functions. ReLUs are simpler activations that return 0 for negative values and the identity for positive values. In practice, this activation often yields better performance than sigmoids.

In [0]:
# use nn.ReLU instead of nn.Sigmoid
model = nn.Sequential(nn.Linear(784, 200),
                      nn.ReLU(),
                      nn.Linear(200, 100),
                      nn.ReLU(),
                      nn.Linear(100, 60),
                      nn.ReLU(),
                      nn.Linear(60, 30),
                      nn.ReLU(),
                      nn.Linear(30, 10),
                      nn.LogSoftmax(dim=1))

writer = SummaryWriter('runs/FC_Relu_Softmax_model')

optimizer = optim.Adam(model.parameters())

# train the model for 10 epochs
n_epochs = 10
for epoch in range(n_epochs):
    train(model, train_loader, optimizer, epoch,
          input_is_1d=True, tb_logger=writer)
    step = epoch * len(train_loader)
    validate(model, val_loader, step=step, input_is_1d=True,
             tb_logger=writer)
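
The two activation functions and their derivatives can be compared with a few lines of plain Python. This is only an illustrative sketch with hypothetical function names. One commonly cited reason for the difference in practice is that the sigmoid derivative is at most 0.25 and shrinks quickly for large inputs, whereas the ReLU derivative is exactly 1 for positive inputs.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)


def relu(x):
    return max(0.0, x)


def relu_grad(x):
    # derivative is 0 for negative inputs and 1 for positive inputs
    return 1.0 if x > 0 else 0.0


for x in (-4.0, 0.5, 4.0):
    print(f"x={x:+.1f}  relu={relu(x):.2f}  relu'={relu_grad(x):.0f}  "
          f"sigmoid={sigmoid(x):.4f}  sigmoid'={sigmoid_grad(x):.4f}")
```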


## Fully Connected Relu Dropout Softmax

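In this example, we extend the fully connected ReLU network and add [dropout](https://machinelearningmastery.com/dropout-for-regularizing-deep-neural-networks/), which randomly sets a fraction of the activations to zero during training to reduce overfitting.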

In [0]:
# the probability to drop neurons with dropout
p_drop = .2
# use nn.Dropout after relu activations
model = nn.Sequential(nn.Linear(784, 200),
                      nn.ReLU(),
                      nn.Dropout(p=p_drop),
                      nn.Linear(200, 100),
                      nn.ReLU(),
                      nn.Dropout(p=p_drop),
                      nn.Linear(100, 60),
                      nn.ReLU(),
                      nn.Dropout(p=p_drop),
                      nn.Linear(60, 30),
                      nn.ReLU(),
                      nn.Dropout(p=p_drop),
                      nn.Linear(30, 10),
                      nn.LogSoftmax(dim=1))

writer = SummaryWriter('runs/FC_Relu_Dropout_Softmax_model')

optimizer = optim.Adam(model.parameters())

# train the model for 10 epochs
n_epochs = 10
for epoch in range(n_epochs):
    train(model, train_loader, optimizer, epoch,
          input_is_1d=True, tb_logger=writer)
    step = epoch * len(train_loader)
    validate(model, val_loader, step=step, input_is_1d=True,
             tb_logger=writer)
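
The behaviour of a dropout layer can be sketched in pure Python. The version below is a simplified stand-in for `nn.Dropout`, assuming the usual "inverted dropout" scheme: in training mode each activation is zeroed with probability `p` and the survivors are scaled by `1 / (1 - p)`; in evaluation mode the input passes through unchanged. The function name and the explicit `rng` argument are assumptions made for this example.

```python
import random


def dropout(values, p, training, rng):
    """Inverted dropout on a list of activations."""
    if not training or p == 0.0:
        return list(values)  # evaluation mode: identity
    keep_scale = 1.0 / (1.0 - p)
    # each value is zeroed with probability p, survivors are scaled up
    return [0.0 if rng.random() < p else v * keep_scale for v in values]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
activations = [1.0] * 10
print(dropout(activations, p=0.2, training=True, rng=rng))
print(dropout(activations, p=0.2, training=False, rng=rng))
```

The rescaling keeps the expected value of each activation the same in both modes, which is why switching between `model.train()` and `model.eval()` matters in the training and validation functions.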



## CNN Relu Dropout SoftMax

In this example, we will use a [Convolutional](https://towardsdatascience.com/intuitively-understanding-convolutions-for-deep-learning-1f6f42faee1) Network instead of a fully connected network.

For more complex models, we stop using `nn.Sequential`. Instead, we implement models via classes that inherit from `nn.Module` and implement a `forward` method, in which the sub-modules are applied.



In [0]:
class ConvNet(nn.Module):
    def __init__(self):
        # we need to call the super constructor before adding any
        # nn.modules as members
        super().__init__()
        # add the convolutional block
        self.convs = nn.Sequential(nn.Conv2d(in_channels=1, out_channels=4, kernel_size=3),
                                   nn.ReLU(),
                                   nn.MaxPool2d(2),
                                   nn.Conv2d(in_channels=4, out_channels=8, kernel_size=3),
                                   nn.ReLU(),
                                   nn.Conv2d(in_channels=8, out_channels=12, kernel_size=3))
        # add the fully connected block
        self.fc = nn.Sequential(nn.Linear(972, 200),
                                nn.ReLU(),
                                nn.Dropout(0.2),
                                nn.Linear(200, 10),
                                nn.LogSoftmax(dim=1))

    def forward(self, input):
        x = self.convs(input)
        # convs returns output of size (batches, 12, 9, 9)
        # to feed it into the fully connected layer, we need to reshape
        # to 1d input (batches, 12 * 9 * 9 = 972)
        x = x.view(-1, 972)
        return self.fc(x)


model = ConvNet()

writer = SummaryWriter('runs/CNN_Relu_Dropout_Softmax_model')

optimizer = optim.Adam(model.parameters())

# train the model for 10 epochs
n_epochs = 10
for epoch in range(n_epochs):
    train(model, train_loader, optimizer, epoch,
          tb_logger=writer)
    step = epoch * len(train_loader)
    validate(model, val_loader, step=step,
             tb_logger=writer)
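
The value 972 used in the first fully connected layer follows from the spatial size of the feature maps. The sketch below retraces that arithmetic, assuming the layer defaults used above (stride 1 and no padding for the convolutions, stride equal to the kernel size for the pooling). `conv_out` is a hypothetical helper.

```python
def conv_out(size, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer along one axis."""
    return (size + 2 * padding - kernel_size) // stride + 1


size = 28                                        # MNIST images are 28 x 28
size = conv_out(size, kernel_size=3)             # first Conv2d
size = conv_out(size, kernel_size=2, stride=2)   # MaxPool2d(2)
size = conv_out(size, kernel_size=3)             # second Conv2d
size = conv_out(size, kernel_size=3)             # third Conv2d
n_features = 12 * size * size                    # 12 output channels
print(size, n_features)
```

If the convolutional block is changed, for example by adding padding or another pooling layer, the input size of the first `nn.Linear` has to be recomputed in the same way.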

