# Exercise Sheet 3 - Convolutional Neural Networks on MNIST + Batch Normalization

 * Deep Learning – Winter term 2019/20
 * Instructor: Prof. Dr. Alexander Ecker
 * Tutors: Pronaya Prosun Das, Samaneh Sadegh and Muhammad Jazib Zafar
 * Due date: Jan 13, 2020 at noon

In this assignment you will learn how to train a Convolutional Neural Network to classify images. We will work with the MNIST hand-written digits dataset.
The goals of this assignment are as follows:

*   Exploring the architecture of CNNs like number of features, kernel sizes and pooling.
*   Understanding the impact of batch normalization.
*   Creating custom `nn.Module` in PyTorch.

### IMPORTANT SUBMISSION INSTRUCTIONS

- When you're done, download the notebook and rename it to \<surname1\>_\<surname2\>_\<surname3\>.ipynb
- Only submit the ipynb file, no other file is required
- Submit only once
- The deadline is strict
- You are required to present your solution in the tutorial; submission of the notebook alone is not sufficient

Implementation
- Only change code to replace placeholders. Leave the other code as is.



### **PART 1**

**Importing required libraries.**

In [16]:
import torch
import torch.nn as nn
import torchvision.datasets as ds
import torchvision.transforms as T
import pathlib
import torch.optim as optim
from torch.utils.data import DataLoader
from torch.utils.data import sampler
from torch.autograd import Variable
import numpy as np
import matplotlib.pyplot as plt
import time

**load dataset.**

We use the MNIST dataset. This might take a couple minutes the first time you do it. Use appropriate training and validation samples.
 

In [19]:
#seed is important for reproducibility
seed = 42
np.random.seed(seed)
print(torch.manual_seed(seed))

mnist_transforms = T.Compose([T.ToTensor(), T.Normalize((0.1307,), (0.3081,))])

batch_size = 256
# Load MNIST dataset
mnist_trainset = ds.MNIST(root='./data', train=True, download=True, transform=mnist_transforms)
train_size = int(0.8 * len(mnist_trainset))
val_size = len(mnist_trainset) - train_size
train_set, val_set = torch.utils.data.random_split(mnist_trainset, [train_size, val_size])

trainloader = DataLoader(train_set, batch_size=batch_size,
                         shuffle=True, num_workers=2)
valloader = DataLoader(val_set, batch_size=batch_size,
                        shuffle=True, num_workers=2)
mnist_testset = ds.MNIST(root='./data', train=False, download=True, transform=mnist_transforms)
testloader = DataLoader(mnist_testset, batch_size=batch_size,
                        shuffle=True, num_workers=2)

<torch._C.Generator object at 0x000002959BABE030>


**Define a model.**

The first step to training a model is defining its architecture. 
Use `nn.Sequential` to define a model with following structure:
![Imgur](https://i.imgur.com/7LfRN2y.jpg)
*   Convolutional layer with 32 filters, kernel size of 5*5 and stride of 1.
*   Max Pooling layer with kernel size of 2*2 and default stride (2).
*   ReLU activation function.
*   Linear layer with output of 512.
*   ReLU activation function.
*   A linear layer with output of 10.
*   At the end put a softmax activation.

In [11]:
# TODO
class MOD1(nn.Module): 
    def _init_(self):
        super()._init_()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=5, stride=1, filters=32) #nn.Conv2d(input chanels, output chanels, kernel)
        #torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros')
        #output shape: M=(N-k)/s+1=(28-5)/1+1=23+1=24
        #out_channels= 32 because of 32 filters!
        self.layer1 = nn.Linear(12*12*32,512) #input_Size maybe 12*12*32 because of the image size and the number of filters, but what ist the impact of the pooling 
        self.layer2 = nn.Linear(512,10)        
        self.sll = nn.Sequential(
            self.conv1(),
            nn.functional.max_pool2d(32, (2,2)),
            nn.functional.ReLU(),
            self.layer1(),
            nn.functional.ReLU(),
            self.layer2(),
            nn.functional.Softmax() #maybe better use LogSoftmax, because later we should use the cross.entropy
        )
        
    def forward(self, x):
        x = self.sll(x)
        return x
        
    """In nn.Sequential, the nn.Module's stored inside are connected in a cascaded way. For instance, 
    in the example that I gave, I define a neural network that receives as input an image with 3 channels and 
    outputs 10 neurons. That network is composed by the following blocks, in the following order: Conv2D -> 
    ReLU -> Linear layer. Moreover, an object of type nn.Sequential has a forward() method, so if I have an 
    input image x I can directly call y = simple_cnn(x) to obtain the scores for x. When you define an 
    nn.Sequential you must be careful to make sure that the output size of a block matches the input size of 
    the following block. Basically, it behaves just like a nn.Module"""
    
model = MOD1()
model

MOD1()

In [20]:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.relu1 = nn.ReLU()
        self.fc1 = nn.Linear(32 * 12 * 12, 512)
        self.relu2 = nn.ReLU()
        self.fc2 = nn.Linear(512, 10)
        self.softmax = nn.Softmax()

    def forward(self, x):
        x = self.relu1(self.pool(self.conv1(x)))
        # Changing view because for executing the 3d-tensor as a 1d tensor in the linear layers
        x = x.view(x.shape[0], 32 * 12 * 12)
        x = self.relu2(self.fc1(x))
        x = self.softmax(self.fc2(x))
        return x


net = Net()
if os.path.exists("./mnist_classifier_model.pth"):
    net.load_state_dict(torch.load("./mnist_classifier_model.pth"))

NameError: name 'os' is not defined

**Train the model**

Use the cros-entropy loss and set up an optimizer with appropriate parameters.

In [0]:
# TODO
loss_fct = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=1e-4, momentum=9e-01)

Train the model using datasets. Run the training for at least 10 epoch. Show validation accuracy and loss in each epoch. Also monitor training accuracy and loss.

In [0]:
def accurancy_fn(correct, total): return float(correct)/total

def train(dataloader, model, deviceoptimizer, loss_fct, train_loss):
    # TODO
    epoch_loss = [] #to store the loss for each epoch
    #to store the correct preticeted values and calc the accurancy later, for each epoch seperatly
    epoch_correct = 0
    epoch_total = 0 
    #loop over the data
    for x, y in dataloader:
        optimizer.zero_grad()
        model.train()
        y_pred = model(x.to(device))
        hits = y.to(device) == y_pred.argmax(dim=1).to(device)
        epoch_correct += sum(hits)
        epoch_total += len(hits)
        loss = loss_fct(y_pred.zo(device), y.to(device))
        loss.backward()
        optimizer.step()
        epoch_loss.append(loss.item())
        average_loss = sum(epoch_loss[-50:])/len(epoch_loss[-50:])
    train_loss.extend(epoch_loss)
    epoch_accurancy = accurancy_fn(epoch_correct, epoch_total)
    return epoch_loss, epoch_accurancy 

  


In [0]:
start_time = time.time() 
(...) = train(...)
print("--- execution time in seconds : %s ---" % (time.time() - start_time))

**Show plots**

Show the Epoch Vs Accuracy and Epoch Vs Validation Loss plot from the previous training.

In [13]:
# TODO

NameError: name 'loader_test' is not defined

**Check the accuracy of the model.**

Check the accuracy of the model using test dataset: loader_test.

In [14]:
def check_accuracy(data, model):
    #nutzen des trainierten Models und dann vergleichen der 
    #split the data to the images and the output
    x, y = data
    
    #get the size of inputs/ predictions
    total = len(y)
    #predict the output with the trained model
    y_pred = model(x)
    
    #calculate the number of correct predicted numbers so y==y_pred
    #but with respect to the probability, because the model just returns probs for the outputs and not just a singel value
    correct = sum(y==y_pred.argmax(dim=1))
    return(accurancy(correct, total))
    
  # TODO


In [0]:
check_accuracy(...)

### **PART 2**

We now add batch normalization to the convolutional layer and dropout to the fully-connected layer. Both should improve performance of the model.

**Batch Normalization**

Normalization is done to adjust and scale the activations. For example, when we have features from 0 to 1 and some from 1 to 1000, we should normalize them to speed up learning. If the input layer is benefiting from it, the same should also hold for the values in the hidden layers. Batch normalization improves training speed and stabilizes training by avoiding vanishing or exploding gradients.

**Dropout**

Dropout is a regularization method. It temporarily removes a unit from the network, along with all its incoming and outgoing connections. 

**Task**

Add batch normalization after convolutional layer and put a dropout layer with probality 0.6 after the first linear layer.


In [12]:
# TODO
#took the model from Part 1
class MOD2(nn.Module): 
    def _init_(self):
        super()._init_()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=5, stride=1, filters=32) #nn.Conv2d(input chanels, output chanels, kernel)
        #torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros')
        #output shape: M=(N-k)/s+1=(28-5)/1+1=23+1=24
        #out_channels= 32 because of 32 filters!
        self.bn1 = nn.BatchNorm2d(32) #batchnorm of 32, because of 32 filters
        self.layer1 = nn.Linear(12*12*32,512) #input_Size maybe 12*12*32 because of the image size and the number of filters, but what ist the impact of the pooling 
        self.layer2 = nn.Linear(512,10)        
        self.sll = nn.Sequential(
            self.conv1(),
            self.bn1(),
            nn.functional.max_pool2d(32, (2,2)),
            nn.functional.ReLU(),
            self.layer1(),
            nn.Dropout(p=0.6),
            nn.functional.ReLU(),
            self.layer2(),
            nn.functional.Softmax() #maybe better use LogSoftmax, because later we should use the cross.entropy
        )
        
    def forward(self, x):
        x = self.sll(x)
        return x

Train the model again with the same loss function and optimizer. Just call the train function you have written earlier.

In [0]:
# Your loss function and optimizer
# TODO

In [0]:
start_time = time.time() 
(...) = train(...)
print("--- execution time in seconds : %s ---" % (time.time() - start_time))

Is there any change in train, validation and test accuracy?

In [0]:
check_accuracy(...)

### **PART 3**

In this section we implement a model with multiple convolutional layers and train using GPU (if available).

Implement the following architecture by subclassing `nn.Module`:

![Imgur](https://imgur.com/rpBqY43.png)

**Setting Free GPU**

Go to Edit > Notebook settings or Runtime > Change runtime type and select GPU as Hardware accelerator.

![Imgur](https://imgur.com/wGchqmj.png)

For details, please read the 
[this](https://medium.com/deep-learning-turkey/google-colab-free-gpu-tutorial-e113627b9f5d)
article.

Also you have to move all the tensors to the GPU, otherwise it will run on CPU. Train the model for at least 10 epoch.

In [0]:
# Define the model
class CustomModel(torch.nn.Module):
    def __init__(self):
        super(CustomModel, self).__init__()
        # TODO


    def forward(self, x):
        # TODO


# Instantiate the model
cnn = CustomModel()

# Utilize GPU
device = torch.device('cuda:0')
cnn = cnn.to(device)

Define the loss function and optimizer,

In [0]:
# TODO

Call the train function you have written earlier.

In [0]:
start_time = time.time() 
(...) = train(...)
print("--- execution time in seconds : %s ---" % (time.time() - start_time))

Show the accuracy by calling check_accuracy(...)


In [0]:
check_accuracy(...)

#### [OPTIONAL]
Can you get the performance of the CNN to 99.5% with the above architecture? If not, can you change it to improve performance? Note that state-of-the-art performance on MNIST with CNNs is around 99.8% accuracy.

In [0]:
# TODO