# Exercise: Training a convolutional neural network on CIFAR-10
---

In this exercise, you will build a convolutional neural network and experiment with **batch normalization**, **dropout**, **regularization** and **augmentation** to improve accuracy on the CIFAR-10 dataset.

**Important!**
For the program to run, you only need to add code at the locations marked "ToDo". However, feel free to change anything you like.

**Note:**
If you want to use the "ML servers", remember to change the kernel to Python 3.6 (Kernel->Change kernel->python 3.6(Conda)). Do the programming and the initial tests on the CPU, then select a GPU for efficient training. To select a particular GPU, use the keys "cuda"->"device_idx" within the config dictionary.

Software version:
- Python 3.6
- PyTorch 1.0
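
The device selection mentioned in the note above can be sketched as follows. This is a minimal illustration, not part of the exercise code: the helper name `select_device` is hypothetical, and the fallback to the CPU when no GPU is visible is an added assumption.

```python
import torch

def select_device(cuda_config):
    """Hypothetical helper: map the 'cuda' entry of the config to a torch.device."""
    # Fall back to the CPU when CUDA is disabled or no GPU is available
    if cuda_config['use_cuda'] and torch.cuda.is_available():
        return torch.device(f"cuda:{cuda_config['device_idx']}")
    return torch.device('cpu')

# CPU run for programming and initial tests
cpu_device = select_device({'use_cuda': False, 'device_idx': 0})
print(cpu_device)  # cpu
```

A model or a batch of images can then be moved with `.to(device)`, which keeps the CPU and GPU code paths identical.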


In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import numpy as np
import matplotlib.pyplot as plt
from sklearn import metrics
from torchvision import datasets, transforms

%matplotlib inline
%load_ext autoreload
%autoreload 2
np.random.seed(1)

---
### Step 0: Check CPU, memory and GPU usage on the ML server

If you use one of the ML servers and want to know which resources are currently in use, run the cell below.

In [None]:
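using_ML_server = True  # Set to False when not running on an ML server
if using_ML_server:
    from IPython.display import Image, display
    # Shows the resource-usage image found at this path on the ML server
    display(Image('/tmp/gpu.png'))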

### Step 1: Configuration
---
To keep track of important parameters, we use the dictionary "config". You should experiment with the values.


In [None]:
config = {
          'batch_size':64,          # Training batch size
          'cuda': {'use_cuda':True,  # Use_cuda=True: use GPU
                   'device_idx': 0}, # Select gpu index: 0,1,2,3
          'log_interval':20,         # How often to display (batch) loss during training
          'epochs': 50,              # Number of epochs
          'learningRate': 0.001,     # learning rate to the optimizer
          'momentum': 0.9,           # momentum in the SGD optimizer
          'use_augmentation': True,  # Use augmentation
          'weight_decay': 0.0001     # weight_decay value
         }


---
### Step 2: The Cifar-10 dataset

Torchvision is a PyTorch package that provides popular datasets, model architectures, and common image transformations for computer vision. Torchvision includes a dataset class for CIFAR-10, which we will use together with a PyTorch "dataloader". We will also use torchvision's "transforms" module to perform augmentation and normalization.


The CIFAR-10 dataset has 10 classes: ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck'].

The training set consists of 50,000 images and the test set consists of 10,000 images. The images are of size [3,32,32].
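
To make the normalization step concrete, here is a small sketch of the per-channel arithmetic that `transforms.Normalize` applies. It uses a random tensor as a stand-in for an image (an assumption, so that nothing has to be downloaded); the mean and standard deviation values are the ones used in the transforms of this exercise.

```python
import torch

# CIFAR-10 per-channel statistics, copied from the transforms used in this exercise
mean = torch.tensor([0.4914, 0.4822, 0.4465]).view(3, 1, 1)
std  = torch.tensor([0.2023, 0.1994, 0.2010]).view(3, 1, 1)

# Stand-in for one image after ToTensor(): shape [3, 32, 32], values in [0, 1]
torch.manual_seed(0)
image = torch.rand(3, 32, 32)

# Normalize(mean, std) computes (x[c] - mean[c]) / std[c] for every channel c
normalized = (image - mean) / std

# Undo the normalization, e.g. before displaying an image
restored = normalized * std + mean

print(normalized.shape)                            # torch.Size([3, 32, 32])
print(torch.allclose(restored, image, atol=1e-5))  # True
```

With these statistics the normalized values are not confined to [-1, 1]; an input in [0, 1] maps to roughly -2.4 to 2.8, depending on the channel.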



In [None]:
# torchvision datasets return PIL images; ToTensor() converts them to tensors in the range [0, 1].
# Normalize() then standardizes each channel with the given per-channel mean and standard deviation.

#train transforms
train_transform_list = []
if config['use_augmentation']:
    train_transform_list.append(transforms.RandomCrop(32, padding=4))
    train_transform_list.append(transforms.RandomHorizontalFlip())
train_transform_list.append(transforms.ToTensor()) 
train_transform_list.append(transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)))
transform_train = transforms.Compose(train_transform_list)

#test transforms
transform_val = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])

#Path to the Cifar-10 dataset
dataPath = './data/Cifar-10/'

# Create dataset objects
train_dataset = datasets.CIFAR10(root=dataPath, train=True, download=True, transform=transform_train)
val_dataset   = datasets.CIFAR10(root=dataPath, train=False, download=True, transform=transform_val)

# Create dataLoaders
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=config['batch_size'], shuffle=True, num_workers=2)
val_loader   = torch.utils.data.DataLoader(val_dataset, batch_size=config['batch_size'], shuffle=False, num_workers=2)

In [None]:
# Visualize some examples from the dataset.
classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
num_classes = len(classes)
samples_per_class = 7
plt.figure(figsize=(18, 16), dpi=80)
labels = np.array([x[1] for x in val_dataset])
for y, cls in enumerate(classes):              
    idxs = np.flatnonzero(labels == y)
    idxs = np.random.choice(idxs, samples_per_class, replace=False)
    for i, idx in enumerate(idxs):
        plt_idx = i * num_classes + y + 1
        plt.subplot(samples_per_class, num_classes, plt_idx)
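        # Approximately undo the normalization (std ~0.2, mean ~0.5) and rescale to [0, 255]
        img = (val_dataset[idx][0]*0.2 + 0.5)*255
        img = img.permute(1, 2, 0).numpy()   # [C, H, W] -> [H, W, C], as expected by imshow
        img = np.clip(img, 0, 255)
        plt.imshow(img.astype(np.uint8))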
        plt.axis('off')
        if i == 0:
            plt.title(cls)
plt.show()

---
### Step 3: Build the model

The input has shape [batch size, 3,32,32]. Use what we have learnt to build a convolutional neural network. Have a look at these useful classes:

- nn.Conv2d
- nn.MaxPool2d
- nn.Linear
- nn.BatchNorm2d
- nn.Dropout2d



Note that the model inherits from "torch.nn.Module", which requires the two class methods "__init__" and "forward". As discussed in the lecture, the former defines the layers used by the model, while the latter defines how the layers are stacked inside the model.
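
As an illustration of how "__init__" and "forward" fit together, here is one possible small network. It is only a sketch and not the reference solution: the class name `ExampleModel`, the layer sizes, and the dropout probability are all assumptions chosen for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ExampleModel(nn.Module):
    """Hypothetical example network, not the reference solution."""

    def __init__(self, num_classes=10, dropout_p=0.25):
        super().__init__()
        # __init__ only declares the layers
        self.conv1   = nn.Conv2d(3, 32, kernel_size=3, padding=1)
        self.bn1     = nn.BatchNorm2d(32)
        self.conv2   = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.bn2     = nn.BatchNorm2d(64)
        self.pool    = nn.MaxPool2d(2)
        self.dropout = nn.Dropout2d(p=dropout_p)
        self.fc      = nn.Linear(64 * 8 * 8, num_classes)

    def forward(self, x):
        # forward defines the order in which the layers are applied
        x = self.pool(F.relu(self.bn1(self.conv1(x))))   # [N, 32, 16, 16]
        x = self.pool(F.relu(self.bn2(self.conv2(x))))   # [N, 64, 8, 8]
        x = self.dropout(x)
        x = x.view(x.size(0), -1)                        # flatten to [N, 64*8*8]
        return self.fc(x)                                # raw class scores (logits)

# Quick shape check on a random batch of four images
example_model = ExampleModel()
example_model.eval()
with torch.no_grad():
    scores = example_model(torch.randn(4, 3, 32, 32))
print(scores.shape)  # torch.Size([4, 10])
```

The network returns raw scores without a softmax, which matches what a softmax cross entropy loss expects as input.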


In [None]:
class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        #ToDO

    def forward(self, x):
        #ToDO       

        return y


In [None]:
# Create an instance of Model
model = Model()
if config['cuda']['use_cuda']:
    model.to(f'cuda:{config["cuda"]["device_idx"]}')

---
### Step 4: Define optimizer and loss function

Instantiate an optimizer, e.g. stochastic gradient descent, from the "torch.optim" module (https://pytorch.org/docs/stable/optim.html) with your model. Remember that we have defined "learning rate" inside the config-dictionary.
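
A minimal sketch of creating an SGD optimizer and taking a single update step is shown here. The tiny linear layer `toy_model` is a stand-in assumption for your own model, and the hyperparameter values simply mirror the 'learningRate', 'momentum' and 'weight_decay' entries of the config dictionary.

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Tiny stand-in model; in the exercise this would be your Model instance
toy_model = nn.Linear(4, 2)

# The optimizer receives the parameters it is allowed to update
sgd = optim.SGD(toy_model.parameters(), lr=0.001, momentum=0.9, weight_decay=0.0001)

weights_before = toy_model.weight.detach().clone()

# One update: clear old gradients, backpropagate, then step
loss = toy_model(torch.ones(1, 4)).sum()
sgd.zero_grad()
loss.backward()
sgd.step()

weights_changed = not torch.equal(weights_before, toy_model.weight.detach())
print(weights_changed)  # True
```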


In [None]:
# Create an optimizer instance. Adam is used here; the commented-out line shows the SGD alternative.

#optimizer = optim.SGD(model.parameters(), lr=config['learningRate'], momentum=config['momentum'])
optimizer = optim.Adam(model.parameters(), lr=config['learningRate'], weight_decay=config['weight_decay'])

---
Here we want to define the loss function (often called criterion). As we are dealing with a classification problem, you should use the softmax cross entropy loss.

Hint, have a look here: (https://pytorch.org/docs/stable/nn.html#torch-nn-functional)
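
To see what "softmax cross entropy" means in practice, the following sketch compares `F.cross_entropy` with a manual computation based on `F.log_softmax`. The random scores and the labels are made-up illustration data.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
scores = torch.randn(5, 10)              # raw network outputs (logits) for 5 images
labels = torch.tensor([3, 0, 9, 1, 7])   # integer class labels

# Built-in loss: applies log-softmax internally and averages over the batch
builtin = F.cross_entropy(scores, labels)

# Manual version: mean negative log-probability of the true class
log_probs = F.log_softmax(scores, dim=1)
manual = -log_probs[torch.arange(5), labels].mean()

print(torch.allclose(builtin, manual, atol=1e-6))  # True
```

Because the softmax is applied inside the loss, the model itself should return raw scores rather than probabilities.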


In [None]:
def loss_fn(prediction, labels):
    """Returns softmax cross entropy loss."""
    loss = F.cross_entropy(input=prediction, target=labels)
    return loss

---
### Step 5: Set up the training process and train the model




In [None]:
def run_epoch(model, epoch, data_loader, optimizer, is_training, config):
    """
    Args:
        model        (obj): The neural network model
        epoch        (int): The current epoch
        data_loader  (obj): A pytorch data loader "torch.utils.data.DataLoader"
        optimizer    (obj): A pytorch optimizer "torch.optim"
        is_training (bool): Whether to train the model (update the weights) or only evaluate it.
        config      (dict): Configuration parameters

    Intermediate:
        total_loss (float): The accumulated loss from all batches.
                            Hint: Should be a Python/numpy scalar and not a pytorch tensor

    Returns:
        loss_avg         (float): The average loss of the dataset
        accuracy         (float): The average accuracy of the dataset
        confusion_matrix (ndarray): A 10x10 matrix, normalized by the number of samples
    """
    
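    if is_training:
        model.train()   # dropout is active and batch normalization uses batch statistics
    else:
        model.eval()    # dropout is disabled and batch normalization uses running statistics

    total_loss       = 0
    correct          = 0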
    confusion_matrix = np.zeros(shape=(10,10))
    labels_list      = [0,1,2,3,4,5,6,7,8,9]

    for batch_idx, data_batch in enumerate(data_loader):
        if config['cuda']['use_cuda']:
            images = data_batch[0].to(f'cuda:{config["cuda"]["device_idx"]}') # send data to GPU
            labels = data_batch[1].to(f'cuda:{config["cuda"]["device_idx"]}') # send data to GPU
        else:
            images = data_batch[0]
            labels = data_batch[1]

        # Forward pass; gradients are only tracked while training
        with torch.set_grad_enabled(is_training):
            prediction = model(images)
            loss       = loss_fn(prediction, labels)

        # loss.item() returns a Python float, so total_loss is not attached to the graph
        total_loss += loss.item()

        if is_training:
            # Take a gradient update
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        # Count the correct classifications and update the confusion matrix
        predicted_label  = prediction.argmax(dim=1)
        correct          += predicted_label.eq(labels).sum().item()
        confusion_matrix += metrics.confusion_matrix(labels.cpu().numpy(), predicted_label.cpu().numpy(), labels=labels_list)

        # Print statistics (cross_entropy already averages the loss over the batch)
        if batch_idx % config['log_interval'] == 0:
            print(f'Epoch={epoch} | {batch_idx/len(data_loader)*100:.2f}% | loss = {loss.item():.5f}')

    loss_avg         = total_loss / len(data_loader)
    accuracy         = correct / len(data_loader.dataset)
    confusion_matrix = confusion_matrix / len(data_loader.dataset)

    return loss_avg, accuracy, confusion_matrix


---
Here is where the action takes place!

In [None]:
# training the model
train_loss = np.zeros(shape=config['epochs'])
train_acc  = np.zeros(shape=config['epochs'])
val_loss   = np.zeros(shape=config['epochs'])
val_acc    = np.zeros(shape=config['epochs'])
val_confusion_matrix   = np.zeros(shape=(10,10,config['epochs']))
train_confusion_matrix = np.zeros(shape=(10,10,config['epochs']))

for epoch in range(config['epochs']):
    train_loss[epoch], train_acc[epoch], train_confusion_matrix[:,:,epoch] = \
                               run_epoch(model, epoch, train_loader, optimizer, is_training=True, config=config)

    val_loss[epoch], val_acc[epoch], val_confusion_matrix[:,:,epoch]     = \
                               run_epoch(model, epoch, val_loader, optimizer, is_training=False, config=config)

---
### Step 6. Show results
Plot the loss and the accuracy as a function of epochs to monitor the training.


In [None]:
# Plot the training accuracy and the training loss
#plt.figure()
plt.figure(figsize=(18, 16), dpi= 80, facecolor='w', edgecolor='k')
ax = plt.subplot(2, 1, 1)
# plt.subplots_adjust(hspace=2)
ax.plot(train_loss, 'b', label='train loss')
ax.plot(val_loss, 'r', label='validation loss')
ax.grid()
plt.ylabel('Loss', fontsize=18)
plt.xlabel('Epochs', fontsize=18)
ax.legend(loc='upper right', fontsize=16)

ax = plt.subplot(2, 1, 2)
plt.subplots_adjust(hspace=0.4)
ax.plot(train_acc, 'b', label='train accuracy')
ax.plot(val_acc, 'r', label='validation accuracy')
ax.grid()
plt.ylabel('Accuracy', fontsize=18)
plt.xlabel('Epochs', fontsize=18)
val_acc_max = np.max(val_acc)
val_acc_max_ind = np.argmax(val_acc)
plt.axvline(x=val_acc_max_ind, color='g', linestyle='--', label='Highest validation accuracy')
plt.title('Highest validation accuracy = %0.1f %%' % (val_acc_max*100), fontsize=16)
ax.legend(loc='lower right', fontsize=16)
plt.ion()

---
Let us study the accuracy per class on the validation dataset. We use the result from the epoch with highest validation accuracy.
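
The per-class computation can be illustrated on a toy example. The 3x3 matrix below is made-up data; it follows the convention of `sklearn.metrics.confusion_matrix`, where rows are true classes and columns are predicted classes.

```python
import numpy as np

# Toy 3-class confusion matrix: rows are true classes, columns are predicted classes
toy_cm = np.array([[8, 1, 1],
                   [2, 6, 2],
                   [0, 5, 5]], dtype=float)

# Per-class accuracy: correctly classified samples divided by all samples of that class
per_class_acc = np.diag(toy_cm) / toy_cm.sum(axis=1)

# Overall accuracy: all correct samples divided by all samples (19 of 30 here)
overall_acc = np.trace(toy_cm) / toy_cm.sum()

print(per_class_acc)  # [0.8 0.6 0.5]
```

Dividing every entry by the same constant, as done for the matrices returned by `run_epoch`, does not change the per-class ratios.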


In [None]:
ind = np.argmax(val_acc)
class_accuracy = val_confusion_matrix[:,:,ind]
for ii in range(len(classes)):
    acc = val_confusion_matrix[ii,ii,ind] / np.sum(val_confusion_matrix[ii,:,ind])
    print(f'Accuracy of {str(classes[ii]).ljust(15)}: {acc*100:.01f}%')

---
In order to see how the network learns to distinguish the different classes as the training progresses we can plot the confusion matrices as heatmaps. 

In [None]:
from mpl_toolkits.axes_grid1 import make_axes_locatable

epoch_step                  = 2    
set_colorbar_max_percentage = 10 
    
# Plot confusion matrices
ticks = np.linspace(0,9,10)
gridspec_kwargs = dict(top=0.9, bottom=0.1, left=0.0, right=0.9, wspace=0.5, hspace=0.2)
for i in range(0, config['epochs'], epoch_step):
    f, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 16), gridspec_kw=gridspec_kwargs)
    im = ax1.imshow(val_confusion_matrix[:, :, i]*100)
    ax1.set_title(f'Validation: Epoch #{i}', fontsize=18)
    ax1.set_xticks(ticks=ticks)
    ax1.set_yticks(ticks=ticks)
    ax1.set_yticklabels(classes)
    im.set_clim(0.0, set_colorbar_max_percentage)
    ax1.set_xticklabels(classes, rotation=45)
    ax1.set_ylabel('Ground truth', fontsize=16)   # rows of the sklearn confusion matrix are true classes
    ax1.set_xlabel('Prediction', fontsize=16)     # columns are predicted classes
    divider = make_axes_locatable(ax1)
    cax     = divider.append_axes('right', size='5%', pad=0.15)
    f.colorbar(im, cax=cax, orientation='vertical')
    
    im = ax2.imshow(train_confusion_matrix[:, :, i]*100)
    ax2.set_title(f'Train: Epoch #{i}', fontsize=18)
    ax2.set_xticks(ticks=ticks)
    ax2.set_yticks(ticks=ticks)
    ax2.set_yticklabels(classes)
    im.set_clim(0.0, set_colorbar_max_percentage)
    ax2.set_xticklabels(classes, rotation=45)
    ax2.set_ylabel('Ground truth', fontsize=16)   # rows of the sklearn confusion matrix are true classes
    ax2.set_xlabel('Prediction', fontsize=16)     # columns are predicted classes
    divider = make_axes_locatable(ax2)
    cax     = divider.append_axes('right', size='5%', pad=0.15)
    f.colorbar(im, cax=cax, orientation='vertical')    