# Convolutional Neural Network - part B

In this notebook we will take a very quick look at what convolutional layers learn.

The goal of this lab is to:

* Visualize and understand what a convolutional layer learns

# Optional project - reproducibility challenge

You are welcome to work on it during the lab if you have finished the exercises. https://www.cs.mcgill.ca/~jpineau/ICLR2018-ReproducibilityChallenge.html

* Counts as 4 points (roughly the worth of the exercises from 2 labs)
* Pick an ICLR paper, then consult us about your choice
* The deadline for submitting the report is 20.06.2018, 23:59:59.
* Don't pick papers submitted by GMUM :)

# Whiteboard exercises

( + Any exercise from the previous labs )

* (0.5) What is the most common way to apply batch normalization to the output of a convolutional layer? Explain the reasoning behind it. Reference: https://arxiv.org/pdf/1502.03167v3.pdf.

* (0.5) Look up instance normalization, a variant of batch normalization. Explain: (i) how it works, (ii) when and why it might be preferable to batch normalization.

* (0.5) Should we apply batch normalization before or after a Dense layer? Provide an explanation. Reference: https://arxiv.org/pdf/1502.03167v3.pdf.
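
As a starting point for experimenting with the normalization questions above, here is a minimal sketch. It assumes a random 4D tensor standing in for convolutional activations (the shapes are made up and not part of the lab code) and applies `torch.nn.BatchNorm2d` and `torch.nn.InstanceNorm2d` to it, then checks over which axes each output ends up with zero mean. It is meant to support your explanation, not to replace it.

```python
import torch

torch.manual_seed(0)
# Hypothetical conv-layer activations: (batch, channels, height, width).
x = torch.randn(16, 8, 7, 7) * 3.0 + 5.0

bn = torch.nn.BatchNorm2d(8)        # modules are in training mode by default
inorm = torch.nn.InstanceNorm2d(8)

y_bn = bn(x)
y_in = inorm(x)

# Mean over batch and spatial axes: one value per channel.
per_channel_mean_bn = y_bn.mean(dim=(0, 2, 3))
# Mean over spatial axes only: one value per (sample, channel) feature map.
per_map_mean_bn = y_bn.mean(dim=(2, 3))
per_map_mean_in = y_in.mean(dim=(2, 3))

print(per_channel_mean_bn.abs().max())
print(per_map_mean_bn.abs().max())
print(per_map_mean_in.abs().max())
```

Comparing the three printed values should hint at which statistics each layer computes.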

# Setup

In [None]:
# Boilerplate code to get started

%load_ext autoreload
%autoreload 
%matplotlib inline

import json
import matplotlib as mpl
from src import fmnist_utils
from src.fmnist_utils import *

from torchvision import utils
import numpy as np
import matplotlib.pyplot as plt

def plot(H):
    plt.title(max(H['test_acc']))
    plt.plot(H['acc'], label="acc")
    plt.plot(H['test_acc'], label="test_acc")
    plt.legend()

mpl.rcParams['lines.linewidth'] = 2
mpl.rcParams['figure.figsize'] = (7, 7)
mpl.rcParams['axes.titlesize'] = 12
mpl.rcParams['axes.labelsize'] = 12

(x_train, y_train), (x_test, y_test) = fmnist_utils.get_data(which="mnist")

x_train_4d = x_train.view(-1, 1, 28, 28)
x_test_4d = x_test.view(-1, 1, 28, 28)

# Exercise 1: Convolution vs FC on FMNIST

* Fill in the blanks in the code
* Train an MLP and a ConvNet with the supplied hyperparameters. Which one generalizes better?

In [None]:
def build_conv(input_dim, output_dim, n_filters=32, maxpool=4, hidden_dims=[32], dropout=0.0):
    model = torch.nn.Sequential()
    
    # Convolution part
    model.add_module("conv2d", torch.nn.Conv2d(input_dim[0], n_filters, kernel_size=5, padding=??))
    model.add_module("relu", torch.nn.ReLU()) 
    model.add_module("maxpool", torch.nn.MaxPool2d(maxpool))
    model.add_module("dropout", torch.nn.Dropout2d(dropout))
    model.add_module("flatten", ??) # Add flattening from 4d -> 2d. 
    
    previous_dim = ??
    
    # Classifier
    for id, D in enumerate(hidden_dims):
        model.add_module("linear_{}".format(id), torch.nn.Linear(previous_dim, D, bias=True))
        model.add_module("nonlinearity_{}".format(id), torch.nn.ReLU())
        previous_dim = D
    # Use previous_dim rather than D so that an empty hidden_dims list also works
    model.add_module("final_layer", torch.nn.Linear(previous_dim, output_dim, bias=True))
    return model
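
When filling in `previous_dim`, it helps to track how each layer changes the spatial size of the input. The following is a minimal sketch, assuming square inputs, dilation 1 and the output-size formula documented for `torch.nn.Conv2d` and `torch.nn.MaxPool2d`. The helper name `out_size` is hypothetical and not part of `fmnist_utils`, and the unpadded convolution is only an example, not the intended value for the blank.

```python
def out_size(size, kernel_size, stride=1, padding=0):
    """Spatial output size of a conv or pooling layer (dilation 1, floor rounding)."""
    return (size + 2 * padding - kernel_size) // stride + 1

# Example: a 28x28 input, an unpadded 5x5 convolution, then 4x4 max pooling
# (MaxPool2d uses stride = kernel_size by default).
after_conv = out_size(28, kernel_size=5)
after_pool = out_size(after_conv, kernel_size=4, stride=4)
print(after_conv, after_pool)
```

The number of features after flattening is then channels × height × width of the last feature map.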

In [None]:
## Starting code for training a ConvNet.
input_dim = (1, 28, 28)

model = build_conv(input_dim, 10, n_filters=32, dropout=0.5)

loss = torch.nn.CrossEntropyLoss()  # averages over the batch by default; size_average is deprecated
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

H = train(loss=loss, model=model, x_train=x_train_4d, y_train=y_train,
          x_test=x_test_4d, y_test=y_test,
          optim=optimizer, batch_size=128, n_epochs=50)

plot(H)

In [None]:
## Starting code for training an MLP.

model = build_mlp(784, 10)
loss = torch.nn.CrossEntropyLoss()  # averages over the batch by default; size_average is deprecated
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
H_mlp = train(loss=loss, model=model, x_train=x_train, y_train=y_train,
          x_test=x_test, y_test=y_test, optim=optimizer, batch_size=128, n_epochs=100)

# Exercise 2: examine filters

1. Plot the filters from the *best validation epoch* of the model in Exercise 1. Please save them to the file "9b_1.png".

2. Plot the filters from different epochs. How do they change? In which epochs do the filters stabilize? Please save your answer to "9b_2.txt".
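
One possible way to compare filters across epochs is to store a copy of the convolution weights after every epoch. The sketch below is self-contained and trains a single stand-in `torch.nn.Conv2d` layer on random data. The `train` helper from `fmnist_utils` may or may not expose a per-epoch hook, so the loop here is an assumption and not the lab's API.

```python
import torch

torch.manual_seed(0)
# Stand-in for the first layer of a ConvNet plus random data (assumptions, not the lab's model).
conv = torch.nn.Conv2d(1, 4, kernel_size=5)
x = torch.randn(32, 1, 28, 28)
target = torch.randn(32, 4, 24, 24)
optimizer = torch.optim.SGD(conv.parameters(), lr=0.1)

snapshots = []  # one detached copy of the filters per epoch
for epoch in range(5):
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(conv(x), target)
    loss.backward()
    optimizer.step()
    # clone() matters: without it every entry would alias the live weights.
    snapshots.append(conv.weight.detach().clone())

# Size of the change between consecutive epochs; small values suggest the filters have stabilized.
changes = [(b - a).norm().item() for a, b in zip(snapshots[:-1], snapshots[1:])]
print(changes)
```

Each entry of `snapshots` has the same shape as the layer's weight tensor, so it can be passed to a plotting function that expects a 4D tensor of filters.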

In [None]:
## Starting code for Ex2.1

def vistensor(tensor, ch=0, allkernels=False, nrow=8, padding=1):
    '''
    vistensor: visualize a 4D tensor of filters as an image grid
        @ch: channel to visualize (used when allkernels is False)
        @allkernels: visualize all kernels, one grid tile per (filter, channel) pair
        @nrow: number of tiles per grid row
        @padding: padding between tiles, in pixels
    '''
    n, c, w, h = tensor.shape
    if allkernels:
        tensor = tensor.view(n * c, -1, w, h)
    elif c != 3:
        tensor = tensor[:, ch, :, :].unsqueeze(dim=1)

    rows = np.min((tensor.shape[0] // nrow + 1, 64))
    grid = utils.make_grid(tensor, nrow=nrow, normalize=True, padding=padding)
    plt.figure(figsize=(nrow, rows))
    plt.imshow(grid.numpy().transpose((1, 2, 0)))

filters = ?? # Get Torch Tensor corresponding to filters from the best validation point in training
vistensor(filters, ch=0, allkernels=True)