**Important Note:** 
- You should have first uploaded the **entire** folder for Problem Sheet 2 to your Google Drive and opened this file with Google Colab through your Drive. (You are advised against opening this file directly in Google Colab.)
- Remember to use a GPU in your runtime.

# Question 2: Classifying images in the CIFAR10 dataset

We would now like to classify images from the CIFAR10 dataset. The data come as pairs $(x_i, y_i)$, where each $x_i \in \mathbb{R}^{3\times d\times d}$ ($d=32$, three colour channels) is an image showing one of ten object classes, encoded by the label $y_i \in \mathbb{R}^K$ ($K=10$). Here we will implement a deep convolutional neural network with the architecture specified on pages 81-82 of the slides for lectures 4-6.

Once again, we have set up the environment for you; this is done by running the following cell. Remember to <font color=red>**always**</font> run the code cell below before you attempt this question.

In [None]:
# Import the necessary packages
import numpy as np
import time
import copy
import torch
import torchvision
import torchvision.transforms as transforms
import torch.optim as optim
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
import matplotlib.pyplot as plt
import seaborn as sns
from tqdm import tqdm
from random import randint

**Part (1).** Run the following cell to import our CIFAR10 dataset. Note that the first statement in the cell defines the recipe used to augment the training dataset.
- Briefly describe what kind of data augmentation we have performed on the training dataset. (*Hint.* Look at the `PyTorch` documentation)
- Briefly explain why we want to augment the data.

*You may edit this cell to include your response.*
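
As a rough illustration of the kind of operations that appear in the pipeline of the next cell, the sketch below reproduces the arithmetic by hand on a tiny made-up tensor. It is only a sketch under stated assumptions: `torch.flip` stands in for the random flips (which the real transforms apply with probability 0.5 each), and the 1×2×3 "image" is hypothetical.

```python
import torch

# Hypothetical 1-channel, 2x3 "image" with values in [0, 1], laid out as (C, H, W)
img = torch.tensor([[[0.0, 0.25, 0.5],
                     [0.75, 1.0, 0.5]]])

# A horizontal flip reverses the width axis (last dimension)
h_flipped = torch.flip(img, dims=[-1])

# A vertical flip reverses the height axis (second-to-last dimension)
v_flipped = torch.flip(img, dims=[-2])

# Normalising with mean 0.5 and std 0.5 computes (x - mean) / std per channel,
# which maps the range [0, 1] onto [-1, 1]
normalized = (img - 0.5) / 0.5

print(h_flipped)
print(v_flipped)
print(normalized.min().item(), normalized.max().item())
```

Because the flips are random, the same training image can reach the network in up to four orientations over the course of training, while the test set is only converted to a tensor and normalised.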

In [None]:
# Defining the data augmentation transformations for the training set
transformations = transforms.Compose([transforms.RandomHorizontalFlip(p=0.5),
                                      transforms.RandomVerticalFlip(p=0.5), 
                                      transforms.ToTensor(),
                                      transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

# Batch size
batch_size = 100

train = torchvision.datasets.CIFAR10(root='./CIFARdata', train=True,
                                        download=True, transform=transformations) #data augmentation transformations
data_train = torch.utils.data.DataLoader(train, batch_size=batch_size, #Batch size = 100
                                          shuffle=True, num_workers=0)

#Transforming data and normalizing for the test set
transform_testset = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

test = torchvision.datasets.CIFAR10(root='./CIFARdata', train=False,
                                       download=True, transform=transform_testset)
data_test = torch.utils.data.DataLoader(test, batch_size=batch_size,
                                         shuffle=False, num_workers=0)

Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./CIFARdata/cifar-10-python.tar.gz


Extracting ./CIFARdata/cifar-10-python.tar.gz to ./CIFARdata
Files already downloaded and verified


**Part (2).** Edit the following cell to define our CNN architecture. You are free to choose the dropout parameters.

In [None]:
class CIFAR10Model(nn.Module):
  def __init__(self, num_outputs=10):
    super(CIFAR10Model, self).__init__() #Specifying the model parameters
    # input is 3x32x32
    "your architecture here"

  def forward(self, x): #Specifying the NN architecture
    "your answer here"
    return x
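
To show how the two methods of the template fit together, here is a minimal sketch of a small convolutional network. It is **not** the architecture from the lecture slides: the class name `TinyCNN`, the layer sizes and the dropout probability are all hypothetical and chosen only to keep the shape bookkeeping easy to follow.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyCNN(nn.Module):
    """Hypothetical small CNN; NOT the architecture from the lecture slides."""

    def __init__(self, num_outputs=10, p_drop=0.25):
        super().__init__()
        # Layers with parameters are created once, in __init__
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3, padding=1)   # 3x32x32  -> 16x32x32
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, padding=1)  # 16x16x16 -> 32x16x16
        self.pool = nn.MaxPool2d(2)                               # halves height and width
        self.drop = nn.Dropout(p_drop)
        self.fc = nn.Linear(32 * 8 * 8, num_outputs)

    def forward(self, x):
        # forward() describes how an input batch flows through those layers
        x = self.pool(F.relu(self.conv1(x)))   # -> 16x16x16
        x = self.pool(F.relu(self.conv2(x)))   # -> 32x8x8
        x = self.drop(torch.flatten(x, 1))     # -> 2048 features per image
        return self.fc(x)                      # raw scores (logits), one per class

net = TinyCNN()
logits = net(torch.randn(4, 3, 32, 32))  # a fake batch of 4 images
print(logits.shape)
```

The network returns raw logits rather than probabilities because `nn.CrossEntropyLoss`, used in Part (3), applies the log-softmax itself.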

**Part (3).** Fill in the following code to train our model. (*Hint.* Again, remember to use the `.cuda()` command at appropriate places to speed up your calculations.)

*Hint.* To achieve a decent accuracy we will need to set `num_epoch` sufficiently high. You may wish to first test the code with a smaller number of epochs before performing the actual experiment with a higher number of epochs.
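
The routine that the training cell asks for (zero the gradients, forward pass, loss, backward pass, optimiser step, then accuracy) can be sketched on a toy problem as follows. Everything here is a stand-in: `toy_model` is a hypothetical linear classifier, and the batch is random noise, so the numbers say nothing about CIFAR10. The sketch also falls back to the CPU when no GPU is available, which is an assumption that the template itself does not make.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-ins: a linear classifier and one random batch of 100 "images"
toy_model = nn.Linear(3 * 32 * 32, 10).to(device)
data = torch.randn(100, 3 * 32 * 32, device=device)
target = torch.randint(0, 10, (100,), device=device)

loss_func = nn.CrossEntropyLoss()
optimizer = optim.Adam(toy_model.parameters(), lr=0.001)

losses = []
for step in range(20):
    optimizer.zero_grad()             # reset the gradients to zero
    output = toy_model(data)          # forward propagation
    loss = loss_func(output, target)  # compute the loss ...
    loss.backward()                   # ... and its gradient
    optimizer.step()                  # step the optimiser
    losses.append(loss.item())

# Accuracy on the batch, in percent: compare the arg-max class with the target
prediction = output.argmax(dim=1)
accuracy = (prediction == target).float().mean().item() * 100.0
print(losses[0], losses[-1], accuracy)
```

Repeating the step on the same batch, as done here, only demonstrates that the loss goes down; in the real loop each step sees a fresh batch from `data_train`.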

In [None]:
model = CIFAR10Model(num_outputs=10)
# model.load_state_dict(torch.load('params_cifar10_dcnn_LR001.ckpt')) #To load a saved model
# print ('\nModel Architecture is:\n', model)

model.cuda()                      # Sending the model to the GPU
loss_func = nn.CrossEntropyLoss() # Cross entropy loss function
model.train()
LR = 0.001 #Learning rate
train_accuracy = []
num_epoch = 250

# Train Mode
# Create the batch -> Zero the gradients -> Forward Propagation -> Calculating the loss
# -> Backpropagation -> Optimizer updating the parameters -> Prediction

start_time = time.time()
for epoch in range(num_epoch):  # loop over the dataset multiple times

    # Defining the learning rate based on the no. of epochs
    if (epoch > 50):
        LR = 0.001
    if (epoch > 100):
        LR = 0.0001
    if (epoch > 200):
        LR = 0.00001

    # This time we use the Adam optimiser here.
    optimizer = "your answer here"

    running_loss = 0.0
    for i, batch in tqdm(enumerate(data_train, 0)):
        data, target = batch
        data, target = Variable(data).cuda(), Variable(target).cuda()

        # Implement the following routine:
        # - reset the gradient to zero, 
        # - perform forward propagation, 
        # - compute the loss and its gradient, and
        # - step the optimiser
        "your answer here"

        # Compute the prediction and accuracy here (refer to the code in question 1 part b), and 
        # append the accuracy in the train_accuracy list
        "your answer here"

    # Average the train_accuracy list. Note that, unless the list is reset at the start of each epoch,
    # this is the running mean over all batches seen so far, not the accuracy of this epoch alone.
    accuracy_epoch = np.mean(train_accuracy)
    print('\nIn epoch ', epoch,' the accuracy of the training set =', accuracy_epoch)

end_time = time.time()
# torch.save(model.state_dict(), 'params_cifar10_dcnn_LR001.ckpt') # To save the trained model

500it [00:38, 13.04it/s]
In epoch  0  the accuracy of the training set = 34.802
500it [00:38, 13.16it/s]
In epoch  1  the accuracy of the training set = 40.649
500it [00:37, 13.32it/s]
In epoch  2  the accuracy of the training set = 44.95133333333333
500it [00:37, 13.35it/s]
In epoch  3  the accuracy of the training set = 48.3905
500it [00:40, 12.39it/s]
In epoch  4  the accuracy of the training set = 51.2012
500it [00:41, 11.94it/s]
In epoch  5  the accuracy of the training set = 53.548
500it [00:41, 11.93it/s]
In epoch  6  the accuracy of the training set = 55.44914285714286
500it [00:38, 13.11it/s]
In epoch  7  the accuracy of the training set = 57.0665
500it [00:37, 13.18it/s]
In epoch  8  the accuracy of the training set = 58.507333333333335
500it [00:38, 13.12it/s]
In epoch  9  the accuracy of the training set = 59.779
500it [00:37, 13.22it/s]
In epoch  10  the accuracy of the training set = 60.912727272727274
500it [00:37, 13.24it/s]
In epoch  11  the accuracy of the training set = 61.94616666666667
500it [00:37, 13.31it/s]
In epoch  12  the accuracy of the training set = 62.90692307692308
500it [00:37, 13.40it/s]
In epoch  13  the accuracy of the training set = 63.77314285714286
500it [00:37, 13.42it/s]
In epoch  14  the accuracy of the training set = 64.57173333333333
500it [00:37, 13.41it/s]
In epoch  15  the accuracy of the training set = 65.295375
500it [00:37, 13.24it/s]
In epoch  16  the accuracy of the training set = 65.97164705882354
500it [00:37, 13.47it/s]
In epoch  17  the accuracy of the training set = 66.615
500it [00:37, 13.45it/s]
In epoch  18  the accuracy of the training set = 67.20242105263158
500it [00:37, 13.45it/s]
In epoch  19  the accuracy of the training set = 67.7647
500it [00:37, 13.45it/s]
In epoch  20  the accuracy of the training set = 68.28038095238095
500it [00:37, 13.31it/s]
In epoch  21  the accuracy of the training set = 68.76509090909092
500it [00:37, 13.45it/s]
In epoch  22  the accuracy of the training set = 69.23173913043479
500it [00:37, 13.43it/s]
In epoch  23  the accuracy of the training set = 69.67275
30it [00:02, 13.44it/s]
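
If the optimiser is constructed inside the epoch loop, as the training template suggests, its internal state (Adam's running moment estimates) starts from scratch every epoch. One possible alternative, sketched below, is to build the optimiser once and let `torch.optim.lr_scheduler.MultiStepLR` apply the same step-wise learning-rate schedule. This is a sketch rather than the intended solution: the single parameter `w` and the dummy loss are hypothetical placeholders for the model and the loop over batches.

```python
import torch
import torch.optim as optim

# Hypothetical single parameter so that the optimiser has something to manage
w = torch.nn.Parameter(torch.zeros(1))
optimizer = optim.Adam([w], lr=0.001)

# Multiply the learning rate by 0.1 once 101 and then 201 epochs have finished,
# i.e. LR = 1e-3 for epochs 0-100, 1e-4 for epochs 101-200, 1e-5 for epochs 201-249
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[101, 201], gamma=0.1)

lr_per_epoch = []
for epoch in range(250):
    lr_per_epoch.append(optimizer.param_groups[0]["lr"])

    # Dummy update standing in for the usual loop over batches
    optimizer.zero_grad()
    (w ** 2).sum().backward()
    optimizer.step()

    scheduler.step()  # advance the schedule once per epoch

print(lr_per_epoch[100], lr_per_epoch[101], lr_per_epoch[201])
```

The milestones are chosen to mirror the `epoch > 100` and `epoch > 200` conditions of the template, so the learning rate used in each epoch is unchanged; only the optimiser state is now carried across epochs.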

**Part (4).** Fill in the following code to test our trained model.
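
The evaluation loop has the same shape as the training loop, minus the gradient and optimiser steps. A minimal sketch on made-up data is given below; `toy_model` and `toy_batches` are hypothetical stand-ins for the trained network and for `data_test`, and the CPU fallback is an assumption of the sketch.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-ins for the trained model and the test DataLoader
toy_model = nn.Linear(8, 10).to(device)
toy_batches = [(torch.randn(5, 8), torch.randint(0, 10, (5,))) for _ in range(3)]

correct, total = 0, 0
toy_model.eval()           # put layers such as dropout into inference mode
with torch.no_grad():      # gradients are not needed for evaluation
    for data, target in toy_batches:
        data, target = data.to(device), target.to(device)
        prediction = toy_model(data).argmax(dim=1)       # predicted class per sample
        correct += (prediction == target).sum().item()   # count the matches
        total += target.size(0)

accuracy_test = 100.0 * correct / total
print(accuracy_test)
```

Counting correct predictions over the whole set and dividing once at the end gives the exact test accuracy even when the last batch is smaller than the others.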

In [None]:
# Calculate accuracy of trained model on the Test Set
# Create the batch ->  Forward Propagation -> Prediction

correct = 0
total = 0
test_accuracy = []
model.eval()

for batch in data_test:
  data, target = batch

  # compute the test accuracy by following the code in question 1
  "your answer here"

print('\nAccuracy on the test set = ', accuracy_test)

Final test accuracy: 97.65%




**Part (5).** Run the following cell to plot some images together with their true and predicted labels (shown as true/predicted). Comment on which kinds of images are often misclassified.

In [None]:
X_train = torchvision.datasets.CIFAR10(root='./CIFARdata', train=True, download=False)
classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog','frog', 'horse', 'ship', 'truck')

# Predictions
normalize = transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])

# setting the figure parameters
num_row = 4
num_col = 4 # to get 4 * 4 = 16 images together
imageId = np.random.randint(0, len(X_train), num_row * num_col)
fig, axes = plt.subplots(num_row, num_col)
for i in range(0, num_row):
    for j in range(0, num_col):
        k = (i*num_col)+j
        img, target = X_train[imageId[k]]
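        x = normalize(transforms.ToTensor()(img)).unsqueeze(dim=0)
        x = x.to(next(model.parameters()).device)  # the model was moved to the GPU in Part (3)
        pred = torch.argmax(model(x)).item()       # index of the highest-scoring class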
        axes[i,j].imshow(img)
        axes[i,j].set_title(classes[target] + "/" + classes[pred])
        axes[i,j].axis('off')
        
plt.subplots_adjust(left=0.1, bottom=0.0,  
                    right=0.9, top=1.0,  
                    wspace=0.1, hspace=0.4) 