# Introduction to PyTorch
TP time: 3 hours

In this TP we will introduce the PyTorch framework. We will see how to define a neural network, how to train it and how to use it to make predictions.
We will implement two types of neural networks: a simple multi-layer perceptron and a convolutional neural network.
Training will be done on the MNIST dataset of handwritten digits. The goal is to recognize the digit shown in a 28x28-pixel grayscale image.

## I - Multi-Layer Perceptron (MLP)

An MLP is a class of feedforward artificial neural network (ANN). An MLP consists of at least three layers of nodes: an input layer, a hidden layer and an output layer. Except for the input nodes, each node is a neuron that uses a nonlinear activation function.
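For the single-hidden-layer MLP used in this lab (sigmoid activation, inputs flattened to 784 values, 10 output classes), the computation can be written as follows. The hidden size $H$ is a symbol introduced here for illustration:

$$
h = \sigma(W_1 x + b_1), \qquad \hat{y} = W_2 h + b_2, \qquad \sigma(z) = \frac{1}{1 + e^{-z}},
$$

with $x \in \mathbb{R}^{784}$, $W_1 \in \mathbb{R}^{H \times 784}$, $b_1 \in \mathbb{R}^{H}$, $W_2 \in \mathbb{R}^{10 \times H}$ and $b_2 \in \mathbb{R}^{10}$. The output $\hat{y}$ holds raw class scores (logits); no softmax is applied at this stage.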

Let's start by importing the libraries we will need for this lab.

In [None]:
import torch
import torchvision
from torch.utils.data import DataLoader, random_split
import torch.optim as optim
import torch.nn as nn


### 1 - Dataset and DataLoader

Now that we have the tools, let us define a function that loads the MNIST data. To do so, we need a dataset (`torch.utils.data.Dataset`) and a loader (`torch.utils.data.DataLoader`) that lets us loop over the dataset in batches. For MNIST, torchvision already provides a dataset definition, which you can find [here](https://pytorch.org/docs/stable/torchvision/datasets.html#mnist). The default data loader can be found [here](https://pytorch.org/docs/stable/data.html?highlight=dataloader#torch.utils.data.DataLoader). We must create train, validation and test sets, and a loader for each of them.
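To see what a `Dataset`/`DataLoader` pair produces without downloading anything, the sketch below uses a hypothetical stand-in for MNIST: random tensors wrapped in `torch.utils.data.TensorDataset`. It also shows `random_split` with a seeded `torch.Generator`, an optional addition (not used in the lab's code) that makes the train/validation split reproducible across runs.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader, random_split

# Hypothetical stand-in for MNIST: 100 random 1x28x28 "images" with labels in 0..9
images = torch.rand(100, 1, 28, 28)
labels = torch.randint(0, 10, (100,))
dataset = TensorDataset(images, labels)

# 80/20 split; the seeded generator makes the split identical on every run
train_set, valid_set = random_split(dataset, [80, 20],
                                    generator=torch.Generator().manual_seed(0))

loader = DataLoader(train_set, batch_size=32, shuffle=True)

batch_shapes = []
for inputs, targets in loader:
    # Each iteration yields one (inputs, targets) pair of batched tensors
    batch_shapes.append((tuple(inputs.shape), tuple(targets.shape)))

print(len(train_set), len(valid_set))  # 80 20
print(batch_shapes)
```

With 80 training samples and `batch_size=32`, the loader yields two full batches of 32 and a final, smaller batch of 16, because `drop_last` defaults to `False`.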

In [None]:
def get_data(batch_size, test_batch_size=256):
  # ToTensor converts the PIL images to float tensors with values in [0, 1]
  transform = torchvision.transforms.Compose([torchvision.transforms.ToTensor()])

  # Load data
  mnist_dataset = torchvision.datasets.MNIST(
        root='./data',
        train=True,
        transform=transform,
        download=True
    )
  mnist_test = torchvision.datasets.MNIST(
        root='./data',
        train=False,
        transform=transform,
        download=True
    )


  # Create train and validation splits
  dataset_size = len(mnist_dataset)
  # We will use 80% of the data for training and 20% for validation
  train_size = int(0.8 * dataset_size)
  valid_size = dataset_size - train_size

  mnist_train, mnist_valid = torch.utils.data.random_split(mnist_dataset, [train_size, valid_size])

  # Initialize dataloaders
  train_loader = DataLoader(mnist_train, batch_size=batch_size, shuffle=True)
  val_loader = DataLoader(mnist_valid, batch_size=batch_size, shuffle=False)
  test_loader = DataLoader(mnist_test, batch_size=test_batch_size, shuffle=False)


  return train_loader, val_loader, test_loader


### 2 - Network definition

Now that we have the data, we need a network. For now, let us define an MLP:
1. Two fully-connected layers (input-to-hidden and hidden-to-output), defined with `torch.nn.Linear`.
2. Between the two layers we need a non-linear activation. For now, let us use a sigmoid (`torch.nn.Sigmoid`).
3. For other layers and activation functions, please have a look at the [doc](https://pytorch.org/docs/stable/nn.html).
4. Do not forget that a network must extend a `torch.nn.Module`.
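Before writing the class, it can help to trace the tensor shapes through such a network. The sketch below builds an equivalent MLP with `nn.Sequential` and `nn.Flatten` (a swapped-in, more compact style; `nn.Sequential` is itself a `torch.nn.Module` subclass). The hidden size of 100 is assumed from the trainer's default further below.

```python
import torch
import torch.nn as nn

# Assumed sizes: 28x28 inputs, 100 hidden units (the trainer's default), 10 classes
input_dim, hidden_dim, output_dim = 28 * 28, 100, 10

mlp = nn.Sequential(
    nn.Flatten(),                       # (N, 1, 28, 28) -> (N, 784)
    nn.Linear(input_dim, hidden_dim),   # (N, 784) -> (N, 100)
    nn.Sigmoid(),                       # element-wise, shape unchanged
    nn.Linear(hidden_dim, output_dim),  # (N, 100) -> (N, 10) raw scores (logits)
)

x = torch.rand(4, 1, 28, 28)  # a dummy batch of 4 images
logits = mlp(x)
print(logits.shape)  # torch.Size([4, 10])

# Learnable parameters: 784*100 weights + 100 biases + 100*10 weights + 10 biases
n_params = sum(p.numel() for p in mlp.parameters())
print(n_params)  # 79510
```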

In [None]:
# Our network
class MLP(torch.nn.Module):
  def __init__(self, input_dim, hidden_dim, output_dim):
    super(MLP, self).__init__()
    # Input-to-hidden layer, sigmoid activation, hidden-to-output layer
    self.fc1 = nn.Linear(input_dim, hidden_dim)
    self.sigmoid = nn.Sigmoid()
    self.fc2 = nn.Linear(hidden_dim, output_dim)

  def forward(self, x):
    # Flatten the input from (N, 1, 28, 28) to (N, 784)
    x = x.view(x.shape[0], -1)

    # Forward pass: linear -> sigmoid -> linear (raw logits, no softmax)
    x = self.fc1(x)
    x = self.sigmoid(x)
    x = self.fc2(x)

    return x

### 3 - Loss / cost function

To train the network, we need a loss function. The task is multi-class classification, so a suitable loss is the cross-entropy applied to the softmax of the network outputs. Again, `torch.nn` contains several losses, among which `torch.nn.CrossEntropyLoss`.

Note that this loss already includes the softmax: it combines `LogSoftmax` and `NLLLoss` and expects the raw, unnormalized scores (logits) of the network. Therefore, we must not apply a softmax to the output of our network.
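The claim that `CrossEntropyLoss` already contains the softmax can be checked numerically. This sketch uses random logits and labels (purely illustrative) and recomputes the loss by hand as log-softmax followed by the negative log-likelihood of the correct class.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(5, 10)            # raw network outputs for 5 samples, 10 classes
targets = torch.randint(0, 10, (5,))   # integer class labels

ce = nn.CrossEntropyLoss()(logits, targets)

# Same quantity by hand: log-softmax, then pick the log-probability of the true class
log_probs = F.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(5), targets].mean()

print(torch.allclose(ce, manual))  # True

# Pitfall: feeding softmax outputs instead of logits gives a different loss value
double = nn.CrossEntropyLoss()(F.softmax(logits, dim=1), targets)
```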

In [None]:
def get_cost_function():
  cost_function = nn.CrossEntropyLoss()
  return cost_function

### 4 - Optimizer

Now we need a way to update the parameters of our network. [`torch.optim`](https://pytorch.org/docs/stable/optim.html) contains a large variety of optimizers; here we use stochastic gradient descent (`torch.optim.SGD`).
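Plain SGD applies the update $w \leftarrow w - \eta \, \nabla_w L$, where $\eta$ is the learning rate (`lr`). The sketch below checks one `optim.SGD` step on a toy loss $L(w) = \sum_i w_i^2$, chosen only because its gradient, $2w$, is easy to verify by hand.

```python
import torch
import torch.optim as optim

# A single learnable parameter vector and a toy loss L(w) = sum(w^2), so dL/dw = 2w
w = torch.tensor([1.0, -2.0], requires_grad=True)
optimizer = optim.SGD([w], lr=0.1)

loss = (w ** 2).sum()
loss.backward()    # w.grad = [2.0, -4.0]
optimizer.step()   # w <- w - 0.1 * w.grad, so w is now [0.8, -1.6]

print(w.detach())
```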

In [None]:
def get_optimizer(model, lr, **kwargs):
  optimizer = optim.SGD(model.parameters(), lr=lr, **kwargs)
  return optimizer

### 5 - Train and test functions

We are now ready to put everything together by writing a training function and a test function. Both of them must:

1. Loop over the data (using the data loader, which is simply an iterable)
2. Forward the data through the network
3. Compare the output with the target labels to compute the loss (train), the accuracy (test) or both.
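One detail when computing the loss and accuracy over a whole epoch: `nn.CrossEntropyLoss` returns the mean loss of the batch by default (`reduction='mean'`), and the last batch is often smaller than the others. The sketch below, using two hypothetical random batches, weights each batch mean by its batch size and checks that this gives the same per-sample average as computing all losses at once.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Two hypothetical batches of different sizes (outputs, targets)
batches = [(torch.randn(4, 10), torch.randint(0, 10, (4,))),
           (torch.randn(2, 10), torch.randint(0, 10, (2,)))]

cost_function = nn.CrossEntropyLoss()  # returns the MEAN loss over the batch

samples, cumulative_loss, correct = 0, 0.0, 0
for outputs, targets in batches:
    loss = cost_function(outputs, targets)
    # Weight the batch mean by the batch size to recover the batch sum
    cumulative_loss += loss.item() * targets.size(0)
    correct += (outputs.argmax(dim=1) == targets).sum().item()
    samples += targets.size(0)

epoch_loss = cumulative_loss / samples      # mean loss per sample
epoch_accuracy = 100.0 * correct / samples  # percentage of correct predictions

# Reference: per-sample losses over all 6 samples, averaged directly
all_outputs = torch.cat([o for o, _ in batches])
all_targets = torch.cat([t for _, t in batches])
reference = nn.CrossEntropyLoss(reduction='none')(all_outputs, all_targets).mean().item()
print(abs(epoch_loss - reference) < 1e-5)  # True
```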

Additionally, during training we must:

1. Compute the gradients with the backward pass (`loss.backward()`)
2. Use the optimizer to update the weights (`optimizer.step()`)
3. Reset the gradients of the weights so that they do not accumulate across iterations (`optimizer.zero_grad()`)

With these steps in mind, we are ready to define everything.
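Step 3 matters because PyTorch adds new gradients into `.grad` instead of overwriting them. This sketch, on a toy function $f(w) = 2w$ chosen only for illustration, shows the gradient doubling when two backward passes run without clearing it.

```python
import torch

w = torch.tensor([3.0], requires_grad=True)

# Two backward passes WITHOUT clearing: the gradients are summed into w.grad
(2 * w).sum().backward()
(2 * w).sum().backward()
accumulated = w.grad.clone()   # 2 + 2 = 4

# Clear the gradient (optimizer.zero_grad() does this for every parameter it manages)
w.grad.zero_()
(2 * w).sum().backward()
fresh = w.grad.clone()         # 2

print(accumulated.item(), fresh.item())  # 4.0 2.0
```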

In [None]:
import tqdm

def train(model, data_loader, optimizer, cost_function, device='cuda'):
  samples = 0.
  cumulative_loss = 0.
  cumulative_accuracy = 0.

  # Set the model in train mode
  model.train()

  # Wrap the data loader in a tqdm progress bar; the running loss and
  # accuracy are displayed in its description
  progress_bar = tqdm.tqdm(data_loader, desc='Training', leave=False)

  # Loop over the dataset
  for batch_idx, (inputs, targets) in enumerate(progress_bar):

      # Move data to the selected device (GPU by default)
      inputs = inputs.to(device)
      targets = targets.to(device)

      # Zero the gradients
      optimizer.zero_grad()

      # Forward pass
      outputs = model(inputs)

      # Apply the loss
      loss = cost_function(outputs, targets)

      # Backward pass
      loss.backward()

      # Update parameters
      optimizer.step()

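      # Update running statistics; loss.item() is the batch mean,
      # so weight it by the batch size before summing
      samples += inputs.size(0)
      cumulative_loss += loss.item() * inputs.size(0)
      _, predicted = outputs.max(1)
      cumulative_accuracy += predicted.eq(targets).sum().item()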

      # Update progress bar description with loss
      progress_bar.set_description(f'Training (Loss: {cumulative_loss / samples:.4f}, Acc: {cumulative_accuracy / samples * 100:.2f}%)')

  return cumulative_loss/samples, cumulative_accuracy/samples*100


def test(model, data_loader, cost_function, device='cuda'):
  samples = 0.
  cumulative_loss = 0.
  cumulative_accuracy = 0.

  # Set the model in eval mode
  model.eval()

  with torch.no_grad(): # torch.no_grad() disables the autograd machinery, thus not saving the intermediate activations
    # Wrap the data loader in a tqdm progress bar; the running loss and
    # accuracy are displayed in its description
    progress_bar = tqdm.tqdm(data_loader, desc='Testing', leave=False)
    for batch_idx, (inputs, targets) in enumerate(progress_bar):

      # Move data to the selected device (GPU by default)
      inputs = inputs.to(device)
      targets = targets.to(device)

      # Forward pass
      outputs = model(inputs)

      # Calculate the loss
      loss = cost_function(outputs, targets)

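      # Update running statistics; loss.item() is the batch mean,
      # so weight it by the batch size before summing
      samples += inputs.size(0)
      cumulative_loss += loss.item() * inputs.size(0)
      _, predicted = outputs.max(1)
      cumulative_accuracy += predicted.eq(targets).sum().item()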

      # Update progress bar description with loss
      progress_bar.set_description(f'Testing (Loss: {cumulative_loss / samples:.4f}, Acc: {cumulative_accuracy / samples * 100:.2f}%)')


  return cumulative_loss/samples, cumulative_accuracy/samples*100
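
The test function combines two different switches. `model.eval()` changes the behaviour of layers such as dropout and batch normalization (neither the MLP nor the LeNet of this lab uses them, so here it is good practice rather than a necessity), while `torch.no_grad()` stops autograd from recording the computation graph. The sketch below illustrates the second point with a small hypothetical `nn.Linear` model.

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)
x = torch.randn(3, 4)

out_train = model(x)
print(out_train.requires_grad)  # True: autograd records the graph for backward()

model.eval()                    # switches Dropout/BatchNorm-type layers to inference behaviour
with torch.no_grad():
    out_eval = model(x)
print(out_eval.requires_grad)   # False: no graph is built, so activations are not kept
```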

### 6 - Trainer

Finally, we need a main trainer function that initializes the data, the model, the optimizer and the cost function with the chosen hyperparameters, then loops over multiple epochs and prints the results.
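The trainer below defaults to `device='cuda:0'`, which fails on a machine without a GPU. A common pattern (an addition, not part of the original lab) is to choose the device at run time and fall back to the CPU.

```python
import torch

# Fall back to the CPU when no CUDA device is available
device = 'cuda:0' if torch.cuda.is_available() else 'cpu'

# Tensors (and models) are moved to the chosen device with .to(device)
x = torch.ones(2, 3).to(device)
print(device, x.device)
```

The resulting string can then be passed as the `device` argument of the training and test functions.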

In [None]:
def trainer(batch_size=128, input_dim=28*28, hidden_dim=100, output_dim=10, device='cuda:0', learning_rate=0.01, epochs=10):
  # Initialize the dataset, model, optimizer and cost function
  train_loader, val_loader, test_loader = get_data(batch_size=batch_size)
  model = MLP(input_dim, hidden_dim, output_dim).to(device)
  optimizer = get_optimizer(model, lr=learning_rate)
  cost_function = get_cost_function()

  for e in range(epochs):
    # Pass the device explicitly so that data and model live on the same device
    train_loss, train_accuracy = train(model, train_loader, optimizer, cost_function, device=device)
    val_loss, val_accuracy = test(model, val_loader, cost_function, device=device)
    print('Epoch {:d}: train loss {:.5f}, train acc {:.2f} | val loss {:.5f}, val acc {:.2f}'.format(
        e + 1, train_loss, train_accuracy, val_loss, val_accuracy))

  print('After training:')
  train_loss, train_accuracy = test(model, train_loader, cost_function, device=device)
  val_loss, val_accuracy = test(model, val_loader, cost_function, device=device)
  test_loss, test_accuracy = test(model, test_loader, cost_function, device=device)

  print('\t Training loss {:.5f}, Training accuracy {:.2f}'.format(train_loss, train_accuracy))
  print('\t Validation loss {:.5f}, Validation accuracy {:.2f}'.format(val_loss, val_accuracy))
  print('\t Test loss {:.5f}, Test accuracy {:.2f}'.format(test_loss, test_accuracy))
  print('-----------------------------------------------------')

## II - Convolutional Neural Networks

In this section, we will train a CNN from scratch to classify MNIST digits.

In [None]:
# import necessary libraries
import torch
import torchvision
from torchvision import transforms as T
import torch.nn.functional as F

### 1 - Define LeNet

Here we define our first CNN, **LeNet**. A LeNet consists of a few convolutional layers followed by some fully-connected layers. The convolutional layers are defined with the `torch.nn.Conv2d` module of the `torch.nn` package; details can be found [here](https://pytorch.org/docs/stable/nn.html#conv2d). Moreover, we use a pooling operation to reduce the spatial size of the convolutional feature maps, namely `torch.nn.functional.max_pool2d`. Details about max pooling can be found [here](https://pytorch.org/docs/stable/nn.html#max-pool2d).

Unlike the MLP in Part I, we use the Rectified Linear Unit (ReLU), $\mathrm{ReLU}(x) = \max(0, x)$, as activation function instead of `torch.nn.Sigmoid`, through `torch.nn.functional.relu`. Details about ReLU can be found [here](https://pytorch.org/docs/stable/nn.html#id26).
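The input size of the first fully-connected layer, `16 * 4 * 4`, follows from the spatial size of the feature maps. For a convolution or pooling window of size $K$, stride $S$ and padding $P$, an input of size $H$ gives $\lfloor (H + 2P - K)/S \rfloor + 1$ outputs. The sketch below applies this formula (via a helper, `conv_out`, introduced here for illustration) and checks it against the actual layers on a dummy 28x28 input.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_out(size, kernel, stride=1, padding=0):
    # Output size of a convolution or pooling window along one spatial dimension
    return (size + 2 * padding - kernel) // stride + 1

# 28 -> conv 5x5 -> 24 -> pool 2x2 -> 12 -> conv 5x5 -> 8 -> pool 2x2 -> 4
s = conv_out(28, 5)
s = conv_out(s, 2, stride=2)
s = conv_out(s, 5)
s = conv_out(s, 2, stride=2)

# Check against the real layers with a dummy MNIST-sized batch of one image
conv1 = nn.Conv2d(1, 6, kernel_size=5)
conv2 = nn.Conv2d(6, 16, kernel_size=5)
x = torch.zeros(1, 1, 28, 28)
x = F.max_pool2d(F.relu(conv1(x)), kernel_size=2, stride=2)  # (1, 6, 12, 12)
x = F.max_pool2d(F.relu(conv2(x)), kernel_size=2, stride=2)  # (1, 16, 4, 4)

print(s, tuple(x.shape), x.flatten(1).shape[1])  # 4 (1, 16, 4, 4) 256
```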

In [None]:
class LeNet(nn.Module):
    def __init__(self):
        super(LeNet, self).__init__()

        # Define the convolutional layers
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)  # First Convolutional Layer
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=16, kernel_size=5)  # Second Convolutional Layer

        # Define the fully connected layers
        self.fc1 = nn.Linear(in_features=16 * 4 * 4, out_features=120)  # Fully Connected Layer 1
        self.fc2 = nn.Linear(in_features=120, out_features=84)  # Fully Connected Layer 2
        self.fc3 = nn.Linear(in_features=84, out_features=10)  # Output Layer

    def forward(self, x):
        # Convolutional Layer 1 with ReLU activation and max pooling
        x = F.relu(self.conv1(x))
        x = F.max_pool2d(x, kernel_size=2, stride=2)

        # Convolutional Layer 2 with ReLU activation and max pooling
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, kernel_size=2, stride=2)

        # Flatten the feature maps into a long vector
        x = x.view(x.size(0), -1)

        # Fully Connected Layers with ReLU activations
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))

        # Output Layer
        x = self.fc3(x)

        return x


### 2 - Define cost function

In [None]:
def get_cost_function():
  cost_function = nn.CrossEntropyLoss()
  return cost_function

### 3 - Define the optimizer

In [None]:
def get_optimizer(model, lr, wd, momentum):
  optimizer = optim.SGD(
        model.parameters(),
        lr=lr,
        weight_decay=wd,
        momentum=momentum
    )
  return optimizer
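
Compared with Part I, this optimizer adds `momentum` and `weight_decay`. As a sketch of what they do, the code below re-implements the update rule of `torch.optim.SGD` (without dampening or Nesterov momentum, the defaults) and checks it against the library over three steps, on a toy loss $L(w) = \sum_i w_i^2$ chosen only for illustration: the weight-decay term $\lambda w$ is added to the gradient, a velocity buffer accumulates the result, and the parameters move against the velocity.

```python
import torch
import torch.optim as optim

lr, momentum, wd = 0.1, 0.9, 0.01

# Toy problem: L(w) = sum(w^2), so the gradient is 2w
w = torch.tensor([1.0, -2.0], dtype=torch.float64, requires_grad=True)
optimizer = optim.SGD([w], lr=lr, momentum=momentum, weight_decay=wd)

# Manual re-implementation of the same update rule
w_manual = w.detach().clone()
buf = None

for _ in range(3):
    optimizer.zero_grad()
    (w ** 2).sum().backward()
    optimizer.step()

    d_p = 2 * w_manual + wd * w_manual            # gradient plus weight-decay term
    buf = d_p.clone() if buf is None else momentum * buf + d_p  # velocity buffer
    w_manual = w_manual - lr * buf                # step against the velocity

print(torch.allclose(w.detach(), w_manual))  # True
```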

### 4 - Train and test functions

In [None]:
import tqdm

def train(model, data_loader, optimizer, cost_function, device='cuda'):
    samples = 0.
    cumulative_loss = 0.
    cumulative_accuracy = 0.

    # Set the model in train mode
    model.train()

    # Create a tqdm progress bar
    progress_bar = tqdm.tqdm(data_loader, desc='Training', leave=False)

    for batch_idx, (inputs, targets) in enumerate(progress_bar):
        # Move data to the selected device (GPU by default)
        inputs = inputs.to(device)
        targets = targets.to(device)

        # Zero the gradients
        optimizer.zero_grad()

        # Forward pass
        outputs = model(inputs)

        # Calculate the loss
        loss = cost_function(outputs, targets)

        # Backward pass
        loss.backward()

        # Update parameters
        optimizer.step()

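        # Update running statistics; loss.item() is the batch mean,
        # so weight it by the batch size before summing
        samples += inputs.size(0)
        cumulative_loss += loss.item() * inputs.size(0)
        _, predicted = outputs.max(1)
        cumulative_accuracy += predicted.eq(targets).sum().item()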

        # Update progress bar description with loss
        progress_bar.set_description(f'Training (Loss: {cumulative_loss / samples:.4f}, Acc: {cumulative_accuracy / samples * 100:.2f}%)')

    return cumulative_loss / samples, cumulative_accuracy / samples * 100


def test(model, data_loader, cost_function, device='cuda'):
    samples = 0.
    cumulative_loss = 0.
    cumulative_accuracy = 0.

    # Set the model in evaluation mode
    model.eval()

    with torch.no_grad():
        # Create a tqdm progress bar
        progress_bar = tqdm.tqdm(data_loader, desc='Testing', leave=False)

        for batch_idx, (inputs, targets) in enumerate(progress_bar):
            # Move data to the selected device (GPU by default)
            inputs = inputs.to(device)
            targets = targets.to(device)

            # Forward pass
            outputs = model(inputs)

            # Calculate the loss
            loss = cost_function(outputs, targets)

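            # Update running statistics; loss.item() is the batch mean,
            # so weight it by the batch size before summing
            samples += inputs.size(0)
            cumulative_loss += loss.item() * inputs.size(0)
            _, predicted = outputs.max(1)
            cumulative_accuracy += predicted.eq(targets).sum().item()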

            # Update progress bar description with loss
            progress_bar.set_description(f'Testing (Loss: {cumulative_loss / samples:.4f}, Acc: {cumulative_accuracy / samples * 100:.2f}%)')

    return cumulative_loss / samples, cumulative_accuracy / samples * 100


### 5 - Dataset and Dataloader

This version of the data-loading function introduces something new: normalizing the inputs given to the network.

***Why is normalization needed?***

Training tends to be faster and more stable when the network inputs are centred around zero and have a similar scale. Here we rescale the pixel values, which `ToTensor` maps to \[0, 1\], to the range \[-1, 1\].

***How can it be done?***

This can be done with the `torchvision.transforms.Normalize()` transform, which computes `(x - mean) / std` for each channel. Details can be found [here](https://pytorch.org/docs/stable/torchvision/transforms.html#torchvision.transforms.Normalize).

In [None]:
import torchvision
import torchvision.transforms as T
import torch.utils.data

def get_data(batch_size, test_batch_size=256):
    # Prepare data transformations and then combine them sequentially
    transform = [
        T.ToTensor(),  # Converts the PIL image to a float tensor with values in [0, 1]
        T.Normalize(mean=[0.5], std=[0.5])  # (x - 0.5) / 0.5 maps [0, 1] to [-1, 1]
    ]
    transform = T.Compose(transform)  # Composes the above transformations into one.

    # Load data
    mnist_dataset = torchvision.datasets.MNIST('./data', train=True, transform=transform, download=True)
    mnist_test = torchvision.datasets.MNIST('./data', train=False, transform=transform, download=True)

    # Create train and validation splits, we will take 80% of the training data for training and 20% for validation
    dataset_size = len(mnist_dataset)
    train_size = int(0.8 * dataset_size)
    valid_size = dataset_size - train_size

    mnist_train, mnist_valid = torch.utils.data.random_split(mnist_dataset, [train_size, valid_size])

    # Initialize data loaders
    train_loader = torch.utils.data.DataLoader(mnist_train, batch_size=batch_size, shuffle=True)
    val_loader = torch.utils.data.DataLoader(mnist_valid, batch_size=batch_size, shuffle=False)
    test_loader = torch.utils.data.DataLoader(mnist_test, batch_size=test_batch_size, shuffle=False)

    return train_loader, val_loader, test_loader

### 6 - Trainer

Finally, we need a main function that initializes the data, the model, the optimizer and the cost function with the chosen hyperparameters, then loops over multiple epochs.
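Before launching a 50-epoch run, a cheap sanity check (a common debugging practice, not part of the original lab) is to verify that the loss decreases when the model repeatedly trains on one fixed batch. The sketch below uses an `nn.Sequential` model with the same layer sizes as the `LeNet` class above, the same optimizer settings as `main`, and random data in \[-1, 1\] as a stand-in for normalized MNIST images. If the loss does not go down here, the training loop probably has a bug.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Compact equivalent of the LeNet class above (same layer sizes)
model = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2, 2),
    nn.Conv2d(6, 16, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2, 2),
    nn.Flatten(),
    nn.Linear(16 * 4 * 4, 120), nn.ReLU(),
    nn.Linear(120, 84), nn.ReLU(),
    nn.Linear(84, 10),
)
optimizer = optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-6, momentum=0.9)
cost_function = nn.CrossEntropyLoss()

# One fixed random batch standing in for normalized MNIST data (values in [-1, 1])
inputs = torch.rand(16, 1, 28, 28) * 2 - 1
targets = torch.randint(0, 10, (16,))

model.train()
losses = []
for step in range(200):
    optimizer.zero_grad()
    loss = cost_function(model(inputs), targets)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(f'first loss {losses[0]:.3f}, last loss {losses[-1]:.3f}')
```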

In [None]:
def main(batch_size=128,
         device='cuda:0',
         learning_rate=0.01,
         weight_decay=0.000001,
         momentum=0.9,
         epochs=50):

    # Load the dataset
    train_loader, val_loader, test_loader = get_data(batch_size=batch_size)

    # Define LeNet-5
    model = LeNet().to(device)

    # Define the optimizer
    optimizer = get_optimizer(model, lr=learning_rate, wd=weight_decay, momentum=momentum)

    # Define the cost function
    cost_function = get_cost_function()

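    for e in range(epochs):
        train_loss, train_accuracy = train(model, train_loader, optimizer, cost_function, device=device)
        val_loss, val_accuracy = test(model, val_loader, cost_function, device=device)
        print('Epoch {:d}: train loss {:.5f}, train acc {:.2f} | val loss {:.5f}, val acc {:.2f}'.format(
            e + 1, train_loss, train_accuracy, val_loss, val_accuracy))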

    print('After training:')
    train_loss, train_accuracy = test(model, train_loader, cost_function, device=device)
    val_loss, val_accuracy = test(model, val_loader, cost_function, device=device)
    test_loss, test_accuracy = test(model, test_loader, cost_function, device=device)

    print('\t Training loss {:.5f}, Training accuracy {:.2f}'.format(train_loss, train_accuracy))
    print('\t Validation loss {:.5f}, Validation accuracy {:.2f}'.format(val_loss, val_accuracy))
    print('\t Test loss {:.5f}, Test accuracy {:.2f}'.format(test_loss, test_accuracy))
    print('-----------------------------------------------------')