# INFO8010: Homework 2

Last week you learned how to program your first neural network starting from the very first principles of deep learning. If you managed to solve last week's assignment without any problems, **congratulations!** If that was not the case, **don't worry**: here's a second assignment which you can use to get better at deep learning.

In this homework we will see some slightly more complicated deep learning concepts: we will start by taking a look at some of PyTorch's functionalities that are necessary for training deep networks efficiently. We will then train our first neural networks for tackling different image classification tasks, learn to build custom datasets and explore how to train a CNN.

The structure of the notebook is identical to that of the previous assignment. As last time, you will have to hand in the notebook with your solutions to the exercises. When you encounter the instruction <span style="color:red; font-style: italic"> Your code comes below </span> you will again have to write some code yourself, while when you see the instruction <span style="color:green; font-style: italic"> Your discussion comes here </span> you will just have to discuss the results you obtained.

Without further ado let's start by importing the libraries we will need throughout this assignment!

In [None]:
import numpy as np
import torch, torchvision
import torch.nn as nn
from torchvision import datasets
import torchvision.transforms as transforms
import matplotlib.pyplot as plt 
from PIL import Image

## 1. Dataloaders

Today's first concept is PyTorch's dataloaders. As you have seen during the theoretical lectures, one of the main ingredients for successfully training deep learning models is data: **lots of data!**

As you can easily imagine, loading datasets of millions of images into the memory of your machine all at once is not feasible. Furthermore, these images do not come in a form that lets us exploit the tensor operations we have seen in the previous assignment.

To deal with these issues (and many more of them) we can use [dataloaders](https://pytorch.org/docs/stable/data.html), a data loading utility that allows us to deal with large datasets very efficiently. In what follows you are given a first example of a dataloader, which will use the popular [CIFAR10](https://www.cs.toronto.edu/~kriz/cifar.html) dataset.

In [None]:
transform = transforms.Compose([transforms.ToTensor()])
trainset = datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)
testset = datasets.CIFAR10(root="./data", train=False, download=True, transform=transform)

Let's explain what we just did. Thanks to the PyTorch [Torchvision](https://pytorch.org/vision/stable/index.html) package we just downloaded the CIFAR10 dataset to our machine. The dataset was stored in the `./data` folder and comes in two different forms thanks to the `train` flag: a version that can be used as training set, and a version that can be used as testing set. These two datasets are abstracted through Torchvision `Dataset` sub-classes; we will see later what exactly this `Dataset` class consists of. Torchvision also allows us to define a set of image transformations, which we did at the beginning of this cell: in this case we would like to convert our images to tensors, see the [doc](https://pytorch.org/vision/stable/transforms.html#torchvision.transforms.ToTensor) for an exact description of this transformation.
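
To make the `ToTensor` transformation more concrete, here is a small sketch that reproduces its documented effect by hand with NumPy. The random image is a made-up stand-in for a CIFAR10 picture, and this is only an illustration of the layout and range change, not Torchvision's actual implementation.

```python
import numpy as np

# Hypothetical 32x32 RGB image stored the way PIL does it:
# height x width x channels, integer values in [0, 255]
rng = np.random.default_rng(0)
hwc_uint8 = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)

# The documented effect of ToTensor, written by hand:
# move the channels first and rescale to floats in [0, 1]
chw_float = np.transpose(hwc_uint8, (2, 0, 1)).astype(np.float32) / 255.0

print(chw_float.shape, chw_float.dtype)
print(chw_float.min() >= 0.0, chw_float.max() <= 1.0)
```

The channels-first layout is also why `show_images` has to transpose the array back to height x width x channels before calling `plt.imshow`.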

Now that we have defined which dataset we would like to use, and the form in which we would like to have our images, we can create our very first dataloader. It will load, transform and return mini-batches of 4 images at a time that we can later use for training. The advantage of dataloaders is that they can pre-process the data in parallel, e.g. while a batch is being processed in another thread or by your GPU.

In [None]:
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4, shuffle=True, num_workers=2)
testloader = torch.utils.data.DataLoader(testset, batch_size=4, shuffle=False, num_workers=2)
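
If you want to see what a dataloader returns without downloading anything, the sketch below may help. It uses random tensors as a hypothetical stand-in for CIFAR10 (the names `fake_dataset` and `fake_loader` are made up for this example) and `num_workers=0` so that everything runs in the main process.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for CIFAR10: 10 random RGB images of 32x32 pixels
fake_images = torch.rand(10, 3, 32, 32)
fake_labels = torch.randint(0, 10, (10,))
fake_dataset = TensorDataset(fake_images, fake_labels)

# num_workers=0 loads the data in the main process
fake_loader = DataLoader(fake_dataset, batch_size=4, shuffle=False, num_workers=0)

# Iterating over the loader yields (images, labels) mini-batches
batch_shapes = [tuple(x.shape) for x, y in fake_loader]
print(batch_shapes)  # the last batch is smaller: 10 = 4 + 4 + 2
```

Note that the last batch only contains the 2 remaining images; `DataLoader` has a `drop_last` argument if you prefer to discard incomplete batches.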

Before training anything however, let's take a look at the images we just downloaded!

In [None]:
classes = ('plane', 'car', 'bird', 'cat', 'deer', 
           'dog', 'frog', 'horse', 'ship', 'truck')

def show_images(img):
    img = img 
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()
    
dataiter = iter(trainloader)
images, labels = next(dataiter)  # dataloader iterators follow the standard Python iterator protocol
show_images(torchvision.utils.make_grid(images))
print(' '.join('%5s' % classes[labels[j]] for j in range(4)))

There we go: here's your first four images of the CIFAR10 dataset!

**Some additional data transformations**

The `transforms` module also comes in handy for other types of data transformations: here's an example which converts the CIFAR10 images to grayscale.

In [None]:
transform = transforms.Compose([transforms.Grayscale(), transforms.ToTensor()])
gray_scaled_trainset = datasets.CIFAR10(root="./data", train=True, download=False, transform=transform)
gray_scaled_trainloader = torch.utils.data.DataLoader(gray_scaled_trainset, batch_size=4, 
                                                      shuffle=True, num_workers=2)

dataiter = iter(gray_scaled_trainloader)
images, labels = next(dataiter)
show_images(torchvision.utils.make_grid(images))
print(' '.join('%5s' % classes[labels[j]] for j in range(4)))
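
For intuition about what a grayscale conversion computes, here is a sketch by hand with NumPy. It assumes the ITU-R 601 luma weights (0.299, 0.587, 0.114) that PIL documents for its `L` mode conversion; the tiny 2x2 image is made up for illustration.

```python
import numpy as np

# Hypothetical 2x2 RGB image (height x width x channels): red, green, blue and white pixels
rgb = np.array([[[255, 0, 0], [0, 255, 0]],
                [[0, 0, 255], [255, 255, 255]]], dtype=np.float32)

# Assumed luma weights (ITU-R 601): a weighted sum of the three channels
weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
gray = rgb @ weights

print(gray.shape)  # a single value per pixel instead of three
print(gray)
```

The main practical consequence is that grayscale images have one channel instead of three, which matters for any later transformation or layer that expects a fixed number of channels.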

### Find the bug

Al remembered from the theoretical lectures that one way to make neural networks more robust to overfitting can be based on data augmentation. Therefore he programmed this code snippet for performing random horizontal flips on his training set. However he encountered the following bug:

In [None]:
transform = transforms.Compose([transforms.ToTensor(), transforms.RandomHorizontalFlip()])
bugged_trainset = datasets.CIFAR10(root="./data", train=True, download=False, transform=transform)
bugged_trainloader = torch.utils.data.DataLoader(bugged_trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

dataiter = iter(bugged_trainloader)
images, labels = next(dataiter)
show_images(torchvision.utils.make_grid(images))
print(' '.join('%5s' % classes[labels[j]] for j in range(4)))

Fix his mistake and discuss what he did wrong: put
<span style="color:red; font-style: italic">your code below</span> together <span style="color:green; font-style: italic"> with your explanation</span>:

In [None]:
# your code

However, these images are not completely ready for a neural network: Al forgot to perform a **very important** pre-processing step, and skipping it could lead to vanishing gradient problems!

What did Al not do? Fix his code snippet and put yourself forward as the next deep learning teaching assistant!

<span style="color:red; font-style: italic"> Your code comes below </span>

In [None]:
transform = transforms.Compose() # write the sequence of appropriate transformations
trainset = datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4, shuffle=True, num_workers=2)

### Running operations on a GPU
As you may know, one important aspect of deep learning is that large models can be trained efficiently on specialized hardware such as Graphics Processing Units (GPUs) or Tensor Processing Units (TPUs). PyTorch allows you to perform operations on GPUs very easily by simply transferring the relevant models and/or tensors to the GPU, see the following examples.

First you need to check that you have access to a GPU.

In [None]:
print(torch.cuda.is_available())

If you see `True`, everything is ready to run on the GPU and you can go to the next cell. Otherwise it means you do not have any GPU that is compatible with the torch version that is installed on your machine. We invite you to use [Google Colab](https://colab.research.google.com/) to do the rest of this homework. Do not forget to ask Colab for a GPU.
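
A common pattern, sketched below, is to pick the device once and fall back to the CPU when no GPU is available, so that the same code runs on both kinds of machine. This is only a convenience; the timing comparisons that follow still need a GPU to be meaningful.

```python
import torch

# Use the first GPU if there is one, otherwise fall back to the CPU
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

# Tensors can be created directly on the chosen device
x = torch.randn(3, 3, device=device)
print(x.device)
```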

Let's compare the speed of tensor operations on GPU and CPU. 

In [None]:
%%timeit
# On CPU:
A = torch.randn(1000, 100000)
B = torch.randn(100000, 1)
x = A @ B

In [None]:
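%%timeit
# On GPU (same work as the CPU cell above, so that the timings are comparable):
device = 'cuda:0'
# We directly create random tensors on the GPU
A = torch.randn((1000, 100000), device=device)
B = torch.randn((100000, 1), device=device)
x = A @ B
# CUDA kernels are launched asynchronously: wait for them to finish before the timer stops
torch.cuda.synchronize()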

Instead of directly creating a tensor on the GPU, you may also transfer a model or a tensor to the GPU. For example, we can transfer a simple MLP to the GPU and then back to the CPU as follows.

In [None]:
# Create MLP on CPU
mlp = nn.Sequential(nn.Linear(2, 200), nn.ReLU(), 
                    nn.Linear(200, 200), nn.ReLU(), 
                    nn.Linear(200, 200), nn.ReLU(), 
                    nn.Linear(200, 200), nn.ReLU(), 
                    nn.Linear(200, 200), nn.ReLU(), 
                    nn.Linear(200, 1), nn.Sigmoid())

In [None]:
%%timeit
# Forward pass on CPU
y_pred = mlp(torch.randn(100, 2))

In [None]:
%%timeit
# Forward pass on GPU, be careful the input data must be on the GPU as well
mlp.to('cuda:0')  # nn.Module.to() moves the parameters in place, no re-assignment needed
y_pred = mlp(torch.randn((100, 2), device='cuda:0'))

In [None]:
%%timeit
# Forward pass back on CPU, be careful the input data must be on the CPU as well
mlp.to('cpu')  # nn.Module.to() moves the parameters in place, no re-assignment needed
y_pred = mlp(torch.randn((100, 2), device='cpu'))

As you may notice, computations are more efficient on the GPU. However, data transfer between GPU and CPU (in either direction) may be very slow, so we recommend that you reduce it as much as possible. For example, when you want to save your loss after each iteration, storing the loss tensor as it is would keep its whole computation graph alive and leak memory. To avoid this, prefer `.detach()` (which stays on the GPU without any data transfer) over `.cpu()`, `.item()` or `.cpu().numpy()` (which transfer the data back to the CPU).
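
The recommendation above can be sketched as follows: detach each batch loss, keep it on the device, and convert to a Python number only once at the end. The tiny linear model and the random data are placeholders chosen for illustration, and the code falls back to the CPU if no GPU is available.

```python
import torch
import torch.nn as nn

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
model = nn.Linear(2, 1).to(device)   # placeholder model
criterion = nn.MSELoss()

batch_losses = []
for _ in range(5):
    x = torch.randn(8, 2, device=device)
    y = torch.randn(8, 1, device=device)
    loss = criterion(model(x), y)
    # detach() drops the autograd graph but keeps the value on the same device
    batch_losses.append(loss.detach())

# A single transfer to the CPU at the end instead of one per iteration
mean_loss = torch.stack(batch_losses).mean().item()
print(mean_loss)
```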

## 2.  Classifying the CIFAR10 dataset with an MLP

Now that you have all the necessary information about how to handle datasets, we are ready to properly train today's first deep learning model on the CIFAR10 dataset. Before we dive into it, **do not underestimate** the importance of properly pre-processing the data before training neural networks. This step is as important as defining the neural architectures themselves, yet it is very often overlooked. Al Dente forgets to do it every year, and each year he complains about the bad grade he receives for the course.
    
In this exercise you will be provided with an already defined multi-layer perceptron that you can train to classify CIFAR10 images. The structure of the network is already defined, yet some crucial hyperparameters for training are missing. It is your job to fill them in and successfully train the network. As part of the exercise you are also required to monitor the evolution of training: this usually consists of keeping track of the accuracy that the model obtains on the training and testing sets, and of checking how the training and testing losses evolve while optimizing the network. Report these statistics with some plots. In addition, transfer the model and the data to the GPU in order to speed up training.


<span style="color:red; font-style: italic"> Fill in the code below </span> and <span style="color:green; font-style: italic"> describe </span> why you chose some parameters over others. Also discuss your results: are you satisfied with the final accuracy of the model?

In [None]:
input_dim =              # fill in with appropriate values
hidden_dim = 
output_dim = 
learning_rate = 
num_epochs = 

class net(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(net, self).__init__()
        self.input_dim = input_dim
        self.net = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU(),
                                 nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
                                 nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
                                 nn.Linear(hidden_dim, output_dim))

    def forward(self, x):
        return self.net(x.view(x.size(0), self.input_dim))
    
device = 'cuda:0'
model = net(input_dim, hidden_dim, output_dim).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
transform = transforms.Compose([transforms.Resize((32, 32)),
                                transforms.ToTensor(), 
                                transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]) 

trainset = datasets.CIFAR10(root = "./data", train=True, download=True, transform=transform)
testset = datasets.CIFAR10(root = "./data", train=False, download=True, transform=transform)

trainloader = torch.utils.data.DataLoader(trainset, batch_size=100, shuffle=True, num_workers=2)
testloader = torch.utils.data.DataLoader(testset, batch_size=100, shuffle=True, num_workers=2)

def train(num_epochs):
    epochs_train_loss = []
    epochs_test_loss = []
    for i in range(num_epochs):
        # Evaluate on the testing set before each training epoch
        with torch.no_grad():
            correct = 0
            total = 0
            tmp_loss = []
            for inputs, targets in testloader:
                inputs, targets = inputs.to(device), targets.to(device)
                outputs = model(inputs)
                tmp_loss.append(criterion(outputs, targets))
                _, predicted = outputs.max(1)
                total += targets.size(0)
                correct += predicted.eq(targets).sum().item()
            # Average test loss over all batches (one GPU-to-CPU transfer per epoch)
            epochs_test_loss.append(torch.stack(tmp_loss).mean().item())

        print('Accuracy of the model on the testing images: %d %%' % (100 * correct / total))

        tmp_loss = []
        for (x, y) in trainloader:
            outputs = model(x.to(device))
            loss = criterion(outputs, y.to(device))
            tmp_loss.append(loss.detach())
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        # Average training loss over all batches of this epoch
        epochs_train_loss.append(torch.stack(tmp_loss).mean().item())

    return epochs_train_loss, epochs_test_loss

In [None]:
epochs_train_loss, epochs_test_loss = train(num_epochs)
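
To make the loss and the accuracy computation in `train` easier to follow, here is a small worked example on hand-written numbers. The logits and targets are made up; the point is only that `nn.CrossEntropyLoss` takes raw scores (no softmax) together with integer class indices, and that `outputs.max(1)` returns the index of the highest score for each row.

```python
import torch
import torch.nn as nn

# Made-up raw scores for 3 samples and 3 classes, with their true classes
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0],
                       [1.5, 1.0, 0.0]])
targets = torch.tensor([0, 2, 1])

# CrossEntropyLoss applies log-softmax internally and averages over the batch
loss = nn.CrossEntropyLoss()(logits, targets)
manual_loss = -torch.log_softmax(logits, dim=1)[torch.arange(3), targets].mean()

# The predicted class is the index of the largest score in each row
_, predicted = logits.max(1)
accuracy = predicted.eq(targets).sum().item() / targets.size(0)

print(loss.item(), manual_loss.item())
print(predicted.tolist(), accuracy)  # the third sample is misclassified
```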

Plot the train and test loss here below:

<span style="color:red; font-style: italic"> Your code comes below </span> 

In [None]:
# your code

## 3.  Create a custom dataset!

Sometimes you would like to train a model on your own dataset, which will very likely not be part of the Torchvision package. To overcome this you can create a custom dataset class which will handle the data for you. This can be done by inheriting from the `torch.utils.data.Dataset` class.

In this exercise your goal is to program a custom dataset class which we will later use for training two different models. We will use the Kaggle Cats and Dogs Dataset which you can download from [here](https://www.microsoft.com/en-us/download/details.aspx?id=54765). 

When programming a custom dataset class you have to start by defining the constructor, which takes as input the location of your dataset, whether the returned images will serve for training or testing, and possibly some other attributes. For this exercise we will use 20000 images for training and 5000 for testing. For the `__getitem__` function you may find `PIL.Image.open` useful; do not forget to return the item class as well ($0$ or $1$).
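
Before writing the Cats and Dogs class, it may help to look at the general shape of a custom dataset on a deliberately unrelated toy problem. In the sketch below, `SquaresDataset` is a hypothetical class made up for this illustration: it only shows the three methods a dataloader relies on, not how to read images from disk.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class SquaresDataset(Dataset):
    """Hypothetical toy dataset: item i is the pair (i, i**2)."""

    def __init__(self, size):
        super().__init__()
        self.size = size

    def __len__(self):
        # Number of items, used by the dataloader to know when to stop
        return self.size

    def __getitem__(self, index):
        # Build and return the index-th (input, target) pair
        x = torch.tensor(float(index))
        return x, x ** 2

toy_loader = DataLoader(SquaresDataset(6), batch_size=3, shuffle=False)
batches = [(x.tolist(), y.tolist()) for x, y in toy_loader]
print(batches)
```

The dataloader takes care of batching: `__getitem__` only ever returns a single item, and the items of a mini-batch are stacked automatically.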


<span style="color:red; font-style: italic"> Your code comes below</span>:

In [None]:
from PIL import Image
from numpy import asarray
import os

class CatAndDogsDataset(torch.utils.data.Dataset):
    def __init__(self, root_dir, train=True, transform=None):
        """Initializes a dataset containing images and labels."""
        super().__init__()
        # Your code

    def __len__(self):
        """Returns the size of the dataset."""
        # Your code

    def __getitem__(self, index):
        """Returns the index-th data item of the dataset."""
        # Your code
    
transform = transforms.Compose([transforms.Resize((32, 32)), 
                                transforms.ToTensor(),
                                transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

Let us have a quick look at these samples.

In [None]:
def show_images(img):
    img = img 
    npimg = img.numpy() * .5 + .5
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()

my_dataset = CatAndDogsDataset('kagglecatsanddogs_3367a/PetImages/', transform=transform) # training directory
my_loader = torch.utils.data.DataLoader(my_dataset, batch_size=4, shuffle=True, num_workers=0)

dataiter = iter(my_loader)
images, labels = next(dataiter)
show_images(torchvision.utils.make_grid(images))
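
The `* .5 + .5` in this version of `show_images` undoes the `Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))` transformation before plotting. The arithmetic can be checked on a few made-up pixel values:

```python
import torch

mean, std = 0.5, 0.5

# Made-up pixel values as produced by ToTensor, in [0, 1]
pixels = torch.tensor([0.0, 0.25, 0.5, 1.0])

# What Normalize computes for each channel: values now lie in [-1, 1]
normalized = (pixels - mean) / std

# The inverse operation used before plotting: back to [0, 1]
restored = normalized * std + mean

print(normalized)
print(restored)
```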

## 4. Classifying the Cats and Dogs dataset with a CNN!

As we have seen in class, classifying images with a multi-layer perceptron isn't really a good idea. Convolutional Neural Networks (CNNs) are in fact a much better option for this task. It is now your job to create your custom CNN and train it on the Cats and Dogs Dataset.
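
When designing a CNN it is useful to track how the tensor shapes change from layer to layer. The sketch below does so for a single convolution and pooling stage; the layer sizes are arbitrary choices made for illustration, not a recommended architecture.

```python
import torch
import torch.nn as nn

# Hypothetical mini-batch: 4 RGB images of 32x32 pixels
images = torch.randn(4, 3, 32, 32)

# One convolution + pooling stage (sizes chosen only for illustration)
conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3, padding=1)
pool = nn.MaxPool2d(kernel_size=2)

# padding=1 with a 3x3 kernel keeps the 32x32 resolution, pooling halves it
features = pool(torch.relu(conv(images)))

# Weights are shared across all positions: 8 * (3 * 3 * 3) weights + 8 biases
n_conv_params = sum(p.numel() for p in conv.parameters())

print(features.shape)
print(n_conv_params)
```

Knowing the final feature map shape tells you the input size of the first fully connected layer that follows the convolutional part.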

Similarly to what you have done when classifying the CIFAR10 dataset you are again required to report and discuss the performance of your model.

<span style="color:red"> Your code comes below </span> together <span style="color:green"> with an explanation </span> of what motivated some of your design choices.

In [None]:
# your code

## Feedback

Now that you are done with this final deep-learning assignment here are some final questions about the exercises you were required to solve:

<span style="color:blue">How much time did you spend on this homework?</span>

<span style="color:blue">Do you feel comfortable with what it means to define a neural network and train it?</span>

<span style="color:blue">Do you think you now have enough preliminary knowledge for successfully starting to work on your course final project?</span>

<span style="color:blue">If you had to go through the two homeworks again, is there something you would have liked to explore further or have explained in more detail?</span>