# Assignment 3 (15 points)
(due on Nov. 25, 11pm)

In the third assignment, we will write a **Convolutional Neural Network (CNN)** together with training and evaluation routines.

The task description can be found [below](#Tasks).

**Important**: I strongly recommend using *Google Colab (GC)* for this assignment. Make yourself familiar with running Jupyter notebooks on GC (especially selecting the right runtime, i.e., Python 3 + GPU). This will make your life a lot easier, as training will be faster and you can easily debug problems in your model.

In [1]:
%matplotlib inline
import matplotlib.pyplot as plt

# PyTorch imports
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F

# Torchvision imports
import torchvision
import torchvision.transforms as transforms
from torchvision.datasets import MNIST, CIFAR10
from torch.utils.data.dataset import Subset
from torch.utils.data import DataLoader

# Numpy and other stuff
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit
from collections import Counter

torch.manual_seed(1234);
np.random.seed(1234);

# Check if we have a CUDA-capable device; if so, use it
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print('Will train on {}'.format(device))

Will train on cpu


In [3]:
# CIFAR10 training transforms (random horizontal flipping + mean/std. dev. normalize)
cifar10_transforms = transforms.Compose(
    [transforms.RandomHorizontalFlip(p=0.5),
     transforms.ToTensor(),
     transforms.Normalize((0.4914, 0.4822, 0.4465), (0.247, 0.243, 0.261))])

# CIFAR10 testing transforms (no random flipping, so evaluation is deterministic)
cifar10_test_transforms = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.4914, 0.4822, 0.4465), (0.247, 0.243, 0.261))])

# Load full training data
ds_train = CIFAR10('/tmp/cifar', 
                 train=True, 
                 transform=cifar10_transforms, 
                 target_transform=None, 
                 download=True)

# Load full testing data 
ds_test = CIFAR10('/tmp/cifar', 
                 train=False, 
                 transform=cifar10_test_transforms,
                 target_transform=None, 
                 download=True)

# Labels of all training images (read directly, without loading/transforming the images)
lab = list(ds_train.targets)

Files already downloaded and verified
Files already downloaded and verified


In [4]:
def generate_train_indices(n_splits, train_size, lab):
    s = StratifiedShuffleSplit(
        n_splits=n_splits, 
        train_size=train_size, 
        test_size=None)
    
    return [i.tolist() for i, _ in s.split(lab, lab)]

In [5]:
classes = ['plane', 
           'car', 
           'bird', 
           'cat',
           'deer', 
           'dog', 
           'frog', 
           'horse', 
           'ship', 
           'truck']

In [6]:
def show_images(ds: torchvision.datasets.cifar.CIFAR10, 
                indices: list):
    
    assert np.max(indices) < len(ds)
    
    plt.figure(figsize=(9, len(indices)));
    for j,idx in enumerate(indices):
        plt.subplot(1,len(indices),j+1)
        plt.imshow(ds[idx][0].permute(1,2,0).numpy())
        plt.title('Label={}'.format(classes[ds[idx][1]]),fontsize=9)

## Tasks

The assignment is split into 4 parts: 

1. Writing the model definition
2. Writing the training code
3. Writing the testing code
4. Writing the *glue* code for training/testing

*First*, implement the following **convolutional neural network (CNN)**: It consists of 3 blocks and a simple linear classifier at the end.

My notation denotes the following:

- `Conv2D(in_channels, out_channels, kernel_size, padding)` - 2D Convolution
- `MaxPool(kernel_size, stride, padding)` - Max. pooling
- `AvgPool(kernel_size, stride, padding)` - Avg. pooling
- `Dropout(dropout_probability)` - Dropout layer
- `BatchNorm2D` - 2D batch normalization
- `LeakyReLU(negative_slope)` - Leaky ReLU activation

All these operations can also be found in the [PyTorch documentation](https://pytorch.org/docs/stable/index.html).
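
As a rough illustration of how this notation maps onto PyTorch modules, one "convolution, batch normalization, LeakyReLU" line could be sketched as shown here. The helper name `conv_unit` is hypothetical and not part of the assignment; this is one possible way to organize the layers, not the required one.

```python
import torch
import torch.nn as nn


def conv_unit(in_channels, out_channels, kernel_size, padding):
    """Hypothetical helper: 2D convolution -> 2D batch norm -> LeakyReLU(0.1)."""
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size, padding=padding),
        nn.BatchNorm2d(out_channels),
        nn.LeakyReLU(0.1),
    )


# Conv2D(3,128,3,1) followed by MaxPool(2,2,0) and Dropout(0.5)
unit = conv_unit(3, 128, 3, 1)
pool = nn.MaxPool2d(kernel_size=2, stride=2, padding=0)
drop = nn.Dropout(0.5)

x = torch.rand(5, 3, 32, 32)
print(unit(x).size())              # torch.Size([5, 128, 32, 32])
print(drop(pool(unit(x))).size())  # torch.Size([5, 128, 16, 16])
```

A 3x3 convolution with padding 1 keeps the spatial resolution, while the 2x2 max. pooling with stride 2 halves it.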

**Block 1**

```
Conv2D(  3,128,3,1) -> Batchnorm2D -> LeakyReLU(0.1)
Conv2D(128,128,3,1) -> Batchnorm2D -> LeakyReLU(0.1)
Conv2D(128,128,3,1) -> Batchnorm2D -> LeakyReLU(0.1)
MaxPool(2,2,0)
Dropout(0.5)
```
The output size at that point should be $N \times 128 \times 16 \times 16$.

**Block 2**

```
Conv2D(128,256,3,1) -> Batchnorm2D -> LeakyReLU(0.1)
Conv2D(256,256,3,1) -> Batchnorm2D -> LeakyReLU(0.1)
Conv2D(256,256,3,1) -> Batchnorm2D -> LeakyReLU(0.1)
MaxPool(2,2,0)
Dropout(0.5)
```
The output size at that point should be $N \times 256 \times 8 \times 8$.

**Block 3**

```
Conv2D(256,512,3,0) -> Batchnorm2D -> LeakyReLU(0.1)
Conv2D(512,256,1,0) -> Batchnorm2D -> LeakyReLU(0.1)
Conv2D(256,128,1,0) -> Batchnorm2D -> LeakyReLU(0.1)
AvgPool(6,2,0)
Dropout(0.5)
```
The output size at that point should be $N \times 128 \times 1 \times 1$.
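
The spatial sizes stated after each block follow from the standard size formula for convolution and pooling layers, $\lfloor (n + 2p - k)/s \rfloor + 1$, for input size $n$, kernel size $k$, stride $s$ and padding $p$. The following plain-Python sketch traces a $32 \times 32$ CIFAR10 image through the three blocks; the helper name `out_size` is made up for illustration.

```python
def out_size(n, kernel_size, stride=1, padding=0):
    """Spatial output size of a conv/pool layer applied to an n x n input."""
    return (n + 2 * padding - kernel_size) // stride + 1


n = 32  # CIFAR10 images are 32x32

# Block 1: three 3x3 convolutions with padding 1, then MaxPool(2,2,0)
for _ in range(3):
    n = out_size(n, 3, padding=1)
n = out_size(n, 2, stride=2)
block1 = n

# Block 2: same layout as Block 1
for _ in range(3):
    n = out_size(n, 3, padding=1)
n = out_size(n, 2, stride=2)
block2 = n

# Block 3: one 3x3 convolution without padding, two 1x1 convolutions, AvgPool(6,2,0)
n = out_size(n, 3)
n = out_size(n, 1)
n = out_size(n, 1)
n = out_size(n, 6, stride=2)
block3 = n

print(block1, block2, block3)  # 16 8 1
```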

**Classifier**

View the output of the last block as an $N \times 128$ tensor (one 128-dimensional vector per input image) and add a 
linear layer mapping from $\mathbb{R}^{128} \rightarrow \mathbb{R}^{10}$
(include bias).
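
A minimal sketch of this last step, assuming a random tensor as a stand-in for the output of Block 3 (the variable names are illustrative only):

```python
import torch
import torch.nn as nn

features = torch.rand(5, 128, 1, 1)          # stand-in for the output of Block 3
classifier = nn.Linear(128, 10, bias=True)   # R^128 -> R^10, with bias

flat = features.view(features.size(0), -1)   # drop the two singleton dimensions
logits = classifier(flat)                    # one score per class and image
print(flat.size(), logits.size())            # torch.Size([5, 128]) torch.Size([5, 10])
```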

In [7]:
class ConvNet(nn.Module): 
    def __init__(self, num_classes=10):
        super(ConvNet, self).__init__()
        # YOUR CODE GOES HERE

    def forward(self, x):
        # YOUR CODE GOES HERE
        pass

You can quickly test your network definition with the snippet below; for a batch of 5 inputs, it should print `torch.Size([5, 10])`.

In [None]:
net = ConvNet(10)
out = net(torch.rand(5,3,32,32))
print(out.size())

Write a **training method** which takes the `model`, the `device`, the current loader for the training data, the `optimizer` and the current epoch as parameters. The training method should also print the accumulated cross-entropy loss over each epoch.

In [8]:
def train(model, device, train_loader, optimizer, epoch):
    # YOUR CODE GOES HERE
    pass
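
For orientation, the general shape of such a training routine might look like the sketch below. It is not a reference solution: the function name `train_sketch`, the toy model and the random toy data are assumptions made only so that the snippet runs on its own, and "accumulated loss" is interpreted here as the sum of the per-batch mean losses.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset


def train_sketch(model, device, train_loader, optimizer, epoch):
    """Run one epoch of training and return the accumulated loss."""
    model.train()                        # enable dropout and batch-norm updates
    epoch_loss = 0.0
    for x, y in train_loader:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()            # clear gradients of the previous batch
        loss = F.cross_entropy(model(x), y)
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()        # sum of per-batch mean losses
    print('Epoch {}: loss={:.4f}'.format(epoch, epoch_loss))
    return epoch_loss


# Toy stand-ins (assumptions, only to make the sketch runnable)
toy_data = TensorDataset(torch.rand(64, 3, 32, 32), torch.randint(0, 10, (64,)))
toy_loader = DataLoader(toy_data, batch_size=32, shuffle=True)
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
toy_opt = torch.optim.SGD(toy_model.parameters(), lr=0.01, momentum=0.9)

loss_value = train_sketch(toy_model, 'cpu', toy_loader, toy_opt, epoch=1)
```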

Then, write a **testing method** which takes the `model`, the `device` and the testing data loader as parameters and evaluates the model on the testing split of CIFAR10, i.e., computes its classification accuracy.

In [9]:
def test(model, device, test_loader):
    # YOUR CODE GOES HERE
    pass
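
An evaluation routine typically switches the model to evaluation mode, disables gradient tracking and counts correct predictions. The sketch below illustrates this under some assumptions: the name `evaluate_sketch` is hypothetical, and an identity "model" on one-hot scores replaces the real network so that the result is easy to check by hand.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def evaluate_sketch(model, device, test_loader):
    """Return the accuracy (in percent) of `model` on `test_loader`."""
    model.eval()                           # disable dropout, use running BN statistics
    correct, total = 0, 0
    with torch.no_grad():                  # no gradients needed for evaluation
        for x, y in test_loader:
            x, y = x.to(device), y.to(device)
            pred = model(x).argmax(dim=1)  # class with the largest score
            correct += (pred == y).sum().item()
            total += y.size(0)
    accuracy = 100.0 * correct / total
    print('Accuracy: {:.2f}%'.format(accuracy))
    return accuracy


# Toy check (assumption): row i of the identity matrix has its maximum at class i
scores = torch.eye(10)
labels = torch.arange(10)
loader_ok = DataLoader(TensorDataset(scores, labels), batch_size=4)
acc_all = evaluate_sketch(nn.Identity(), 'cpu', loader_ok)     # 100.0

wrong = labels.clone()
wrong[:5] = (wrong[:5] + 1) % 10           # corrupt half of the labels
loader_half = DataLoader(TensorDataset(scores, wrong), batch_size=4)
acc_half = evaluate_sketch(nn.Identity(), 'cpu', loader_half)  # 50.0
```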

In [10]:
train_indices = generate_train_indices(10, 500, lab)
ds_train_subset = Subset(ds_train, train_indices[1])
print(Counter([ds_train_subset[i][1] for i in range(len(ds_train_subset))]))

train_loader = torch.utils.data.DataLoader(
    ds_train_subset,
    batch_size=32,
    shuffle=True)

test_loader = torch.utils.data.DataLoader(
    ds_test, 
    batch_size=64, 
    shuffle=False)

Counter({2: 50, 3: 50, 8: 50, 6: 50, 9: 50, 0: 50, 4: 50, 7: 50, 1: 50, 5: 50})


Finally, write the *glue* code which iterates over `n_epochs` (e.g., 100) and calls `train(...)` in each epoch. Train the model using **SGD** with a learning rate of 0.01 and momentum of 0.9 for 100 epochs.
After every 10th epoch, call `test(...)` to evaluate the current model on the testing data and print the current accuracy.

**Bonus (2 points)**: Add a learning rate scheduler which halves the learning rate every 30 epochs.
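
One way to realize such a schedule is PyTorch's `torch.optim.lr_scheduler.StepLR`. The sketch below only tracks the learning rate over 100 epochs; the dummy parameter and the placeholder `optimizer.step()` call are assumptions that stand in for the real model and the real training step.

```python
import torch
import torch.nn as nn
from torch.optim.lr_scheduler import StepLR

param = nn.Parameter(torch.zeros(1))   # dummy parameter (assumption)
optimizer = torch.optim.SGD([param], lr=0.01, momentum=0.9)
scheduler = StepLR(optimizer, step_size=30, gamma=0.5)

lrs = []
for epoch in range(1, 101):
    lrs.append(optimizer.param_groups[0]['lr'])  # rate used in this epoch
    # train(...) would run here, and test(...) whenever epoch % 10 == 0
    optimizer.step()    # placeholder for the updates done during training
    scheduler.step()    # advance the schedule once per epoch

# Learning rates in epochs 1, 31, 61 and 91
print(lrs[0], lrs[30], lrs[60], lrs[90])
```

With these settings, epochs 1 to 30 use 0.01, epochs 31 to 60 use 0.005, epochs 61 to 90 use 0.0025, and the remaining epochs use 0.00125.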

In [None]:
n_epochs = 100

model = ConvNet().to(device)

# YOUR CODE GOES HERE

If you train with reasonable settings, you should get a testing accuracy somewhere between 45% and 50% (chance level is 10%). Keep in mind that the model only sees 500 training images (50 per class).