# Assignment 3 (15 points)
(due on Nov. 25, 11pm)

In the third assignment, we will write a Convolutional Neural Network (CNN) together with training and evaluation routines.

The task description can be found below.

**Important**: I strongly recommend to use *Google Collab (GC)* for this assignment. Make yourself familiar with running Jupyter notebooks on GC (especially selecting the right runtime, i.e., Python 3 + GPU). This will make your life a lot easier, as training will be faster and you can easily debug problems in your model.

In [1]:
%matplotlib inline
import matplotlib.pyplot as plt

# PyTorch imports
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F

# Torchvision imports
import torchvision
import torchvision.transforms as transforms
from torchvision.datasets import MNIST, CIFAR10
from torch.utils.data.dataset import Subset
from torch.utils.data import DataLoader

# Numpy and other stuff
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit
from collections import Counter

torch.manual_seed(1234);
np.random.seed(1234);

# Check if we have a CUDA-capable device; if so, use it
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print('Will train on {}'.format(device))

Will train on cuda


In [2]:
print(torch.__version__)

1.3.1


In [3]:
# CIFAR10 transforms (random horizontal flipping + mean/std. dev. normalize)
cifar10_transforms = transforms.Compose(
    [transforms.RandomHorizontalFlip(p=0.5),
     transforms.ToTensor(),
     transforms.Normalize((0.4914, 0.4822, 0.4465), (0.247, 0.243, 0.261))])
         
# Load full training data
ds_train = CIFAR10('./cifar', 
                 train=True, 
                 transform=cifar10_transforms, 
                 target_transform=None, 
                 download=True)

# Load full testing data 
ds_test = CIFAR10('./cifar', 
                 train=False, 
                 transform=cifar10_transforms,
                 target_transform=None, 
                 download=True)

lab = [ds_train[x][1] for x in range(len(ds_train))]

0it [00:00, ?it/s]

Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./cifar/cifar-10-python.tar.gz


 99%|█████████▊| 168026112/170498071 [00:11<00:00, 17343285.75it/s]

Extracting ./cifar/cifar-10-python.tar.gz to ./cifar
Files already downloaded and verified


In [0]:
def generate_train_indices(n_splits, train_size, lab):
    s = StratifiedShuffleSplit(
        n_splits=n_splits, 
        train_size=train_size, 
        test_size=None)
    
    return [i.tolist() for i, _ in s.split(lab, lab)]

In [0]:
classes = ['plane', 
           'car', 
           'bird', 
           'cat',
           'deer', 
           'dog', 
           'frog', 
           'horse', 
           'ship', 
           'truck']

In [0]:
def show_images(ds: torchvision.datasets.cifar.CIFAR10, 
                indices: list):
    
    assert np.max(indices) < len(ds)
    
    plt.figure(figsize=(9, len(indices)));
    for j,idx in enumerate(indices):
        plt.subplot(1,len(indices),j+1)
        plt.imshow(ds[idx][0].permute(1,2,0).numpy())
        plt.title('Label={}'.format(classes[ds[idx][1]]),fontsize=9)

## Tasks

The assignment is split into 4 parts: 

1. Writing the model definition
2. Writing the training code
3. Writing the testing code
4. Writing the *glue* code for training/testing

*First*, implement the following **convolutional neural network (CNN)**: It consists of 3 blocks and a simple linear classifier at the end.

My notation denotes the following:

- `Conv2D(in_channels, out_channels, kernel_size, padding)` - 2D Convolution
- `MaxPool(kernel_size, stride, padding)` - Max. pooling
- `AvgPool(kernel_size, stride, padding)` - Avg. pooling
- `Dropout(dropout_probability)` - Dropout layer
- `BatchNorm2D` - 2D batch normalization

All these operations can also be found in the [PyTorch documentaton](https://pytorch.org/docs/stable/index.html).

**Block 1**

```
Conv2D(  3,128,3,1) -> Batchnorm2D -> LeakyReLU(0.1)
Conv2D(128,128,3,1) -> Batchnorm2D -> LeakyReLU(0.1)
Conv2D(128,128,3,1) -> Batchnorm2D -> LeakyReLU(0.1)
MaxPool(2,2,0)
Dropout(0.5)
```
The output size at that point should be $N \times 128 \times 16 \times 16$.

**Block 2**

```
Conv2D(128,256,3,1) -> Batchnorm2D -> LeakyReLU(0.1)
Conv2D(256,256,3,1) -> Batchnorm2D -> LeakyReLU(0.1)
Conv2D(256,256,3,1) -> Batchnorm2D -> LeakyReLU(0.1)
MaxPool(2,2,0)
Dropout(0.5)
```
The output size at that point should be $N \times 128 \times 8 \times 8$.

**Block 3**

```
Conv2D(256,512,3,0) -> Batchnorm2D -> LeakyReLU(0.1)
Conv2D(512,256,1,0) -> Batchnorm2D -> LeakyReLU(0.1)
Conv2D(256,128,1,0) -> Batchnorm2D -> LeakyReLU(0.1)
AvgPool(6,2,0)
Dropout(0.5)
```
The output size at that point should be $N \times 128 \times 1 \times 1$.

**Classifier**

View the output of the last block as a $1 \times 128$ tensor and add a 
linear layer mapping from $\mathbb{R}^{128} \rightarrow \mathbb{R}^{10}$
(include bias).

```python
class ConvNet(nn.Module): 
    def __init__(self, num_classes=10):
      super(ConvNet, self).__init__()
      # YOUR CODE GOES HERE
      
    def forward(self, x):
      # YOUR CODE GOES HERE
      pass
```

In [0]:
class ConvNet(nn.Module): 
    def __init__(self, num_classes=10):
        super(ConvNet, self).__init__()
        
        def make_block(conv_config, pooling_op=None, use_dropout=False):
            mlist = nn.ModuleList()
            for in_c, out_c, k_size, pad in conv_config:
                mlist.extend([
                    nn.Conv2d(in_c, out_c, k_size, padding=pad),
                    nn.BatchNorm2d(out_c),
                    nn.LeakyReLU(0.1)
                ])
            mlist.append(pooling_op)
            if use_dropout:
                mlist.append(nn.Dropout(0.5))
            return mlist

        self.block1 = make_block([
            [  3,128,3,1],
            [128,128,3,1],
            [128,128,3,1]], 
            nn.MaxPool2d(2,stride=2,padding=0), 
            use_dropout=True)
        
        self.block2 = make_block([
            [128,256,3,1],
            [256,256,3,1],
            [256,256,3,1]], 
            nn.MaxPool2d(2,stride=2,padding=0),
            use_dropout=True)

        self.block3 = make_block([
            [256,512,3,0],
            [512,256,1,0],
            [256,128,1,0]], 
            nn.AvgPool2d(6,stride=2,padding=0),
            use_dropout=False)
        
        self.classifier = nn.Linear(128,10)
    
    def forward(self, x):
        for l in self.block1: x = l(x)
        for l in self.block2: x = l(x)
        for l in self.block3: x = l(x)
        x = x.view(x.size(0),-1)
        x = self.classifier(x)
        return x

In [8]:
net = ConvNet(10)
out = net(torch.rand(5,3,32,32))
print(out.size())

torch.Size([5, 10])


Write a **training method**

```python
def train(model, device, train_loader, optimizer, epoch):
  # your code goes here
```

which takes the `model`, the `device`, the current loader for the training data, the `optimizer` and the current epoch as parameters.

The training method should also print the accumulated cross-entropy loss over each epoch.

Then, write a **testing method**

```python
def test(model, device, test_loader):
  # your code goes here
```

which takes the `model`, the `device` and the testing data loader as parameters and evaluates the model on the testing split of CIFAR10.

*For both methods, you can use my MNIST Jupyter notebook 
as a template.*









In [0]:
def train(model, device, train_loader, optimizer, epoch):
    
    model.train()
    
    epoch_loss = 0
    for batch_idx, (data, target) in enumerate(train_loader):
        
        data, target = data.to(device), target.to(device)
        
        optimizer.zero_grad()
        output = model(data)
        loss = F.cross_entropy(output, target)
        loss.backward()
        optimizer.step()
        
        epoch_loss += loss.item()

    print('Train Epoch: {:2d} \tLoss: {:.6f}'.format(epoch, epoch_loss))

In [0]:
def test(model, device, test_loader):
    model.eval()
    test_loss, correct = 0, 0
    
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += F.cross_entropy(output, target, reduction='sum').item()
            pred = output.argmax(dim=1, keepdim=True)
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))

Finally, write the *glue* code which iterates over `n_epochs` (e.g., 100) and, in each epoch, calls `train(...)` and `test(...)`.

In [12]:
train_indices = generate_train_indices(10, 500, lab)
ds_train_subset = Subset(ds_train, train_indices[1])
print(Counter([ds_train_subset[i][1] for i in range(len(ds_train_subset))]))

train_loader = torch.utils.data.DataLoader(
    ds_train_subset,
    batch_size=32,
    shuffle=True)

test_loader = torch.utils.data.DataLoader(
    ds_test, 
    batch_size=64, 
    shuffle=False)

Counter({2: 50, 3: 50, 8: 50, 6: 50, 9: 50, 0: 50, 4: 50, 7: 50, 1: 50, 5: 50})


Train the model using **SGD** with a learning rate of 0.01 and momentum of 0.9 for 100 epochs.

After every 10th epoch, evaluate the current model on the testing data and print the current accuracy.

In [0]:
n_epochs = 100

model = ConvNet().to(device)

optimizer = optim.SGD(
    model.parameters(), 
    lr=0.01, 
    momentum=0.9,
    weight_decay=1e-3)

scheduler = torch.optim.lr_scheduler.StepLR(optimizer, 30, 0.1)

for epoch in range(1,n_epochs + 1):
    train(model, device, train_loader, optimizer, epoch)
    if epoch % 10 == 0:
      test(model, device, test_loader)
    scheduler.step()

Train Epoch:  1 	Loss: 35.070380
Train Epoch:  2 	Loss: 31.259755
Train Epoch:  3 	Loss: 29.924989
Train Epoch:  4 	Loss: 28.908141
Train Epoch:  5 	Loss: 27.579933
Train Epoch:  6 	Loss: 25.884531
Train Epoch:  7 	Loss: 25.010489
Train Epoch:  8 	Loss: 24.257061
Train Epoch:  9 	Loss: 22.806054
Train Epoch: 10 	Loss: 22.769443

Test set: Average loss: 1.8360, Accuracy: 3483/10000 (35%)

Train Epoch: 11 	Loss: 22.033722
Train Epoch: 12 	Loss: 20.440476
Train Epoch: 13 	Loss: 20.216068
Train Epoch: 14 	Loss: 19.459319
Train Epoch: 15 	Loss: 18.544853
Train Epoch: 16 	Loss: 18.764824
Train Epoch: 17 	Loss: 19.586009
Train Epoch: 18 	Loss: 19.042756
Train Epoch: 19 	Loss: 17.903930
Train Epoch: 20 	Loss: 16.870466

Test set: Average loss: 1.8044, Accuracy: 3938/10000 (39%)

Train Epoch: 21 	Loss: 16.183145
Train Epoch: 22 	Loss: 16.309255
Train Epoch: 23 	Loss: 15.619540
Train Epoch: 24 	Loss: 13.994100
Train Epoch: 25 	Loss: 13.106648
Train Epoch: 26 	Loss: 12.964822
Train Epoch: 27 	Los

In [0]:
test(model, device, test_loader)


Test set: Average loss: 1.8020, Accuracy: 4961/10000 (50%)



If you train with reasonable settings, you should get a testing accuracy somewhere between 45% and 50% (random chance is 1/10 obviously).