<div class="alert alert-block alert-info">
<b>Number of points for this notebook:</b> 1
<br>
<b>Deadline:</b> March 23, 2020 (Monday). 23:00
</div>

# Exercise 4.2. Convolutional networks. VGG-style network.

In the second part you need to train a convolutional neural network with an architecture inspired by a VGG-network [(Simonyan \& Zisserman, 2015)](https://arxiv.org/abs/1409.1556).

In [2]:
skip_training = True  # Set this flag to True before validation and submission

In [3]:
# During evaluation, this cell sets skip_training to True
# skip_training = True

In [4]:
import os
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

import torch
import torchvision
import torchvision.transforms as transforms

import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

import tools
import tests

In [5]:
# When running on your own computer, you can specify the data directory by:
# data_dir = tools.select_data_dir('/your/local/data/directory')
data_dir = tools.select_data_dir()

The data directory is /coursedata


In [6]:
# Select the device for training (use GPU if you have one)
#device = torch.device('cuda:0')
device = torch.device('cpu')

In [7]:
if skip_training:
    # The models are always evaluated on CPU
    device = torch.device("cpu")

## FashionMNIST dataset

Let us use the FashionMNIST dataset. It consists of 60,000 training images of 10 classes: 'T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat', 'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot'.

In [8]:
transform = transforms.Compose([
    transforms.ToTensor(),  # Transform to tensor
    transforms.Normalize((0.5,), (0.5,))  # Min-max scaling to [-1, 1]
])

trainset = torchvision.datasets.FashionMNIST(root=data_dir, train=True, download=True, transform=transform)
testset = torchvision.datasets.FashionMNIST(root=data_dir, train=False, download=True, transform=transform)

classes = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat', 'Sandal',
           'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

trainloader = torch.utils.data.DataLoader(trainset, batch_size=32, shuffle=True)
testloader = torch.utils.data.DataLoader(testset, batch_size=5, shuffle=False)

# VGG-style network

Let us now define a convolution neural network with an architecture inspired by the [VGG-net](https://arxiv.org/abs/1409.1556):

<img src="vgg-style.png" width=600 style="float: left;">

The architecture:
* A block of three convolutional layers with:
    * 3x3 kernel
    * 16 output channels
    * one pixel zero-pading on both sides
    * 2d batch normalization after each convolutional layer
    * ReLU nonlinearity after each 2d batch normalization layer
* Max pooling layer with 2x2 kernel and stride 2.
* A block of three convolutional layers with:
    * 3x3 kernel
    * 32 output channels
    * one pixel zero-pading on both sides
    * 2d batch normalization after each convolutional layer
    * ReLU nonlinearity after each 2d batch normalization layer
* Max pooling layer with 2x2 kernel and stride 2.
* One convolutional layer with:
    * 3x3 kernel
    * 48 output channels
    * *no padding*
    * 2d batch normalization after the convolutional layer
    * ReLU nonlinearity after the 2d batch normalization layer
* One convolutional layer with:
    * 1x1 kernel
    * 32 output channels
    * *no padding*
    * 2d batch normalization after the convolutional layer
    * ReLU nonlinearity after the 2d batch normalization layer
* One convolutional layer with:
    * 1x1 kernel
    * 16 output channels
    * *no padding*
    * 2d batch normalization after the convolutional layer
    * ReLU nonlinearity after the 2d batch normalization layer
* Global average pooling (compute the average value of each channel across all the input locations):
    * 5x5 kernel (the input of the layer should be 5x5)
* A fully-connected layer with 10 outputs (no nonlinearity)

Notes:
* Batch normalization is expected to be right after a convolutional layer, before nonlinearity.
* We recommend that you check the number of modules with trainable parameters in your network.

In [28]:
class VGGNet(nn.Module):
    def __init__(self, n_channels=16):
        """
        Args:
          n_channels (int): Number of channels in the first convolutional layer. The number of channels in the following layers are the multipliers of n_channels. Hence, parameters of the layers to follow can be defined using this variable.
        """
        super(VGGNet, self).__init__()
        self.model= nn.Sequential(
            # First Block
            nn.Conv2d(in_channels=1,out_channels=16,kernel_size=3, padding=1),
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.Conv2d(in_channels=16,out_channels=16,kernel_size=3, padding=1),
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.Conv2d(in_channels=16,out_channels=16,kernel_size=3, padding=1),
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            
            # Second Block
            nn.Conv2d(in_channels=16,out_channels=32,kernel_size=3, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.Conv2d(in_channels=32,out_channels=32,kernel_size=3, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.Conv2d(in_channels=32,out_channels=32,kernel_size=3, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            
            nn.Conv2d(in_channels=32,out_channels=48,kernel_size=3),
            nn.BatchNorm2d(48),
            nn.ReLU(),
            
            nn.Conv2d(in_channels=48,out_channels=32,kernel_size=1),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            
            nn.Conv2d(in_channels=32,out_channels=16,kernel_size=1),
            nn.BatchNorm2d(16),
            nn.ReLU(),
            
            nn.AvgPool2d(kernel_size=5, stride=1),
            nn.Flatten(),
            nn.Linear(in_features=16, out_features=10)
        )

    def forward(self, x, verbose=False):
        """
        Args:
          x of shape (batch_size, 1, 28, 28): Input images.
          verbose: True if you want to print the shapes of the intermediate variables.
        
        Returns:
          y of shape (batch_size, 10): Outputs of the network.
        """
        # YOUR CODE HERE
        return self.model(x)

In [29]:
def test_VGGNet_shapes():
    net = VGGNet()
    net.to(device)

    # Feed a batch of images from the training data to test the network
    with torch.no_grad():
        images, labels = iter(trainloader).next()
        images = images.to(device)
        print('Shape of the input tensor:', images.shape)

        y = net(images, verbose=True)
        assert y.shape == torch.Size([trainloader.batch_size, 10]), f"Bad y.shape: {y.shape}"

    print('Success')

test_VGGNet_shapes()

Shape of the input tensor: torch.Size([32, 1, 28, 28])
Success


In [30]:
tests.test_vgg_net(VGGNet)

y: tensor([[ 8.0032,  8.0032,  8.0032,  8.0032,  8.0032, -8.0032, -8.0032, -8.0032,
         -8.0032, -8.0032]], grad_fn=<AddmmBackward>)
expected: tensor([ 8.0032,  8.0032,  8.0032,  8.0032,  8.0032, -8.0032, -8.0032, -8.0032,
        -8.0032, -8.0032])
Success


# Train the network

In [31]:
# This function computes the accuracy on the test dataset
def compute_accuracy(net, testloader):
    net.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for images, labels in testloader:
            images, labels = images.to(device), labels.to(device)
            outputs = net(images)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    return correct / total

### Training loop

Your task is to implement the training loop. The recommended hyperparameters:
* Adam optimizer with learning rate 0.01.
* Cross-entropy loss. Note that we did not use softmax nonlinearity in the final layer of our network. Therefore, we need to use a loss function with log_softmax implemented, such as [`nn.CrossEntropyLoss`](https://pytorch.org/docs/stable/nn.html#torch.nn.CrossEntropyLoss).
* Number of epochs: 10

We recommend you to use function `compute_accuracy()` defined above to track the accaracy during training. The test accuracy should be above 0.87.

**Note: function `compute_accuracy()` sets the network into the evaluation mode which changes the way the batch statistics are computed in batch normalization. You need to set the network into the training mode (by calling `net.train()`) when you want to perform training.**

In [32]:
net = VGGNet()

In [35]:
# Implement the training loop in this cell
net.train()
if not skip_training:
    print('Start Training')
    optimizer = torch.optim.Adam(net.parameters(), lr=0.01)
    criterion= nn.CrossEntropyLoss()
    running_loss=0
    for epoch in range(10):
        print('Epoch: {}'.format(epoch))
        for index, data in enumerate(trainloader,1):
            print(index,end='')
            inputs, labels = data
            inputs.to(device)
            optimizer.zero_grad()
            outputs = net(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            running_loss += loss.item()
            if index % 200 == 0:  # print every 200 mini-batches
                print("\n",end='')
                print('e{}, i{} loss: {}'.format(epoch + 1, index , running_loss / 200))
                running_loss = 0.0
            else:
                print('\r',end='')
        #print("\n",end='')
        #print('e{}, acc: {}'.format(epoch + 1,  compute_accuracy(net,testloader)))
                

Start Training
Epoch: 0
200
e1, i200 loss: 0.36217107236385343
400
e1, i400 loss: 0.37921126548200845
600
e1, i600 loss: 0.3616892295703292
800
e1, i800 loss: 0.3465536125376821
1000
e1, i1000 loss: 0.32729655753821135
1200
e1, i1200 loss: 0.3349766008928418
1400
e1, i1400 loss: 0.2914083111286163
1600
e1, i1600 loss: 0.3249150077998638
1800
e1, i1800 loss: 0.31037201832979916
Epoch: 1
200
e2, i200 loss: 0.3799090321734548
400
e2, i400 loss: 0.2896515525504947
600
e2, i600 loss: 0.28101243149489163
800
e2, i800 loss: 0.2803269878216088
1000
e2, i1000 loss: 0.2561722061224282
1200
e2, i1200 loss: 0.2910097647458315
1400
e2, i1400 loss: 0.2759807527810335
1600
e2, i1600 loss: 0.27103545527905226
1800
e2, i1800 loss: 0.2751238760072738
Epoch: 2
200
e3, i200 loss: 0.3614684865344316
400
e3, i400 loss: 0.24263677155598998
600
e3, i600 loss: 0.2580601344071329
800
e3, i800 loss: 0.26429131504148246
1000
e3, i1000 loss: 0.2568614980764687
1200
e3, i1200 loss: 0.2378591691888869
1400
e3, i1400

In [36]:
# Save the model to disk (the pth-files will be submitted automatically together with your notebook)
if not skip_training:
    tools.save_model(net, '4_vgg_net.pth')
else:
    net = VGGNet()
    tools.load_model(net, '4_vgg_net.pth', device)

Do you want to save the model (type yes to confirm)? yes
Model saved to 4_vgg_net.pth.


In [37]:
# Compute the accuracy on the test set
accuracy = compute_accuracy(net, testloader)
print('Accuracy of the VGG net on the test images: %.3f' % accuracy)
assert accuracy > 0.87, "Poor accuracy ({:.3f})".format(accuracy)
print('Success')

Accuracy of the VGG net on the test images: 0.923
Success
