In [None]:
%run supportvectors-common.ipynb

In this exercise, we compare the performance of a convolutional neural network (CNN) against a fully connected neural network on the MNIST dataset.

The MNIST database of handwritten digits has a training set of 60,000 examples and a test set of 10,000 examples. It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image.
It is a good database for people who want to try learning techniques and pattern recognition methods on real-world data while spending minimal effort on preprocessing and formatting.

SOURCE: 
[http://yann.lecun.com/exdb/mnist/](http://yann.lecun.com/exdb/mnist/)

The goal of the exercise is to classify the handwritten digit images.
We used Chapter 3 of [the Inside Deep Learning book](https://www.manning.com/books/inside-deep-learning) as a reference to create this exercise.

In [None]:
import numpy as np

#torch
import torch
import torch.nn as nn
import torchvision
from torchvision import transforms
from torch.utils.data import DataLoader

# Sklearn
from sklearn.metrics import accuracy_score

# custom functions
from svlearn.train.simple_trainer import train_simple_network

#Visualization
import matplotlib.pyplot as plt

from svlearn.config.configuration import ConfigurationMixin
from svlearn.common.utils import ensure_directory

config = ConfigurationMixin().load_config()

results_dir = config['mnist-classification']['results']
ensure_directory(results_dir)
data_dir = config['mnist-classification']['data']
ensure_directory(data_dir)



plot_style = config['plot_style']
plt.style.use('default')
plt.rcParams.update(plot_style)  # apply the custom settings after the style so they are not reset
device = torch.device(config['device'])  # <------------------ If you don't have "cuda", set it to "cpu"

## Load Dataset and Explore
The MNIST dataset is available through torchvision, which downloads it for us in this exercise.

In [4]:
mnist_data_train = torchvision.datasets.MNIST(data_dir, train=True, download=True, transform=transforms.ToTensor())
mnist_data_test = torchvision.datasets.MNIST(data_dir, train=False, download=True, transform=transforms.ToTensor())

Let's view a sample

In [None]:
inputs , target = mnist_data_train[0]
print(inputs.shape)

From the shape of the input, we can infer that the image has 1 channel and a size of 28 by 28 pixels. Next, let's visualize this tensor as an image. We detach the tensor from gradient tracking, drop the extra channel dimension with `squeeze()`, and convert it to a NumPy array.

In [None]:
detached_input = inputs.detach().cpu().squeeze().numpy()
plt.figure(figsize=(3,3))
plt.xticks([])
plt.yticks([])
plt.imshow(detached_input, cmap='gray')
plt.title(target);

Next, let's check the minimum and maximum pixel values.

In [None]:
np.min(detached_input) , np.max(detached_input)

The pixel values range between 0 and 1, so they are already normalized. Let's proceed to defining the neural networks.
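
The [0, 1] range comes from `transforms.ToTensor()`, which rescales 8-bit pixel intensities (0 to 255) to floats and puts the channel axis first. Below is a minimal NumPy sketch of that rescaling. The array `raw_pixels` is a made-up 2 x 2 image used only for illustration, not MNIST data.

```python
import numpy as np

# hypothetical 2 x 2 grayscale image with 8-bit intensities
raw_pixels = np.array([[0, 128], [64, 255]], dtype=np.uint8)

# rescale to [0, 1] and add a leading channel axis: (H, W) -> (1, H, W)
scaled = (raw_pixels.astype(np.float32) / 255.0)[np.newaxis, ...]

print(scaled.shape)                # (1, 2, 2)
print(scaled.min(), scaled.max())  # 0.0 1.0
```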

## Define Model Architecture

We will now create two simple models: 
1. a fully connected neural network - with 2 linear layers
2. a convolutional neural network - with a convolutional layer and a linear layer

In [8]:
input_dim = 28*28 # Width * Height

channels = 1 # channels

classes = 10 # number of target classes

# metrics
score_funcs = {'acc': accuracy_score }

# ------------------------------------------------------------------------
# try modifying these hyper-parameters to improve the model

filters = 16 # number of convolution filters

filter_dim = 3 # dimension of the filter (K x K)

batch_size = 32

# ------------------------------------------------------------------------

model_linear = nn.Sequential(
  nn.Flatten(), # (Batch, Channel, Height, Width) is flattened to (Batch, Channel*Height*Width) = (Batch, input_dim)
  nn.Linear(input_dim, 256), 
  nn.Tanh(), 
  nn.Linear(256, classes),
)

# ------------------------------------------------------------------------

model_cnn = nn.Sequential(
  nn.Conv2d(channels, filters, filter_dim, padding=filter_dim//2), # this padding keeps the 28 x 28 size for odd filter_dim
  nn.Tanh(),
  nn.Flatten(), # (Batch, filters, Height, Width) is flattened to (Batch, filters*Height*Width) = (Batch, filters*input_dim)
  nn.Linear(filters*input_dim, classes),
)

# ------------------------------------------------------------------------

The layer `nn.Conv2d(channels, filters, filter_dim, padding=filter_dim//2)` uses 16 filters of size 3 x 3, with a padding of `filter_dim//2`, which is 1 for a 3 x 3 filter.

Therefore, an input batch of shape (32, 1, 28, 28) is transformed to (32, 16, 28, 28).
Here we pad the image, which adds extra zero-valued pixels along its border to control the size of the convolution layer's output. A padding of 1 keeps the 28 x 28 image size in the output.
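
The shape arithmetic above can be checked with the standard output-size formula for a convolution, floor((size + 2 * padding - kernel) / stride) + 1. The helper `conv_output_size` below is a hypothetical name introduced for this sketch. It is not part of PyTorch, and it assumes a dilation of 1.

```python
def conv_output_size(size: int, kernel: int, padding: int = 0, stride: int = 1) -> int:
    """Spatial size of a convolution output along one dimension (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1

same = conv_output_size(28, kernel=3, padding=1)    # padding = kernel // 2
no_pad = conv_output_size(28, kernel=3, padding=0)  # no padding shrinks the image

# number of features the final linear layer receives: filters * height * width
flattened = 16 * same * same

print(same, no_pad, flattened)  # 28 26 12544
```

Without padding, each 3 x 3 convolution would trim one pixel from every border, and the input size of the final linear layer would have to change accordingly.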

To see how this works, try this [online tool](https://ezyang.github.io/convolution-visualizer/), which visualizes the convolution operation.

What do the filters look like before training?

In [None]:
param_tensor = '0.weight' # weights of the first layer
print(param_tensor)
filters = model_cnn.state_dict()[param_tensor]


plt.figure(figsize=(5,5))

for idx, filter in enumerate(filters):
    plt.subplot(4, 4, idx+1)
    plt.imshow(filter.detach().cpu().squeeze().numpy(), cmap='gray')
    plt.title(idx, fontdict={"size": 8}, pad=1)
    plt.xticks([])
    plt.yticks([])

Next we create dataloaders for the training and test datasets

In [10]:
mnist_train_loader = DataLoader(mnist_data_train, batch_size=batch_size, shuffle=True)
mnist_test_loader = DataLoader(mnist_data_test, batch_size=batch_size)

### Training the Convolutional Neural Network & Fully Connected Network

In [None]:
results_linear = train_simple_network(model=model_linear,
                           loss_func=nn.CrossEntropyLoss(),
                           train_loader=mnist_train_loader,
                           test_loader=mnist_test_loader,
                           epochs=20,
                           score_funcs=score_funcs,
                           classify=True,
                           checkpoint_file=f"{results_dir}/fc-model.pt")

In [None]:
results_linear

In [None]:
results_cnn = train_simple_network(model=model_cnn,
                           loss_func=nn.CrossEntropyLoss(),
                           train_loader=mnist_train_loader,
                           test_loader=mnist_test_loader,
                           epochs=20,
                           score_funcs=score_funcs,
                           classify=True,
                           checkpoint_file=f"{results_dir}/cnn-model.pt")

In [None]:
results_cnn

### Comparing the performance

Let's plot the test accuracies of the two models to compare them.

In [None]:
import seaborn as sns
plt.rcParams.update(plot_style)
plt.style.use('ggplot')
sns.lineplot(x='epoch', y='test acc', data=results_cnn, label='CNN')
sns.lineplot(x='epoch', y='test acc', data=results_linear, label='Fully Connected');


This simple CNN performs better than the fully connected network just by swapping a linear layer for a convolutional layer.

### Visualize the filters after training

In [None]:

param_tensor = '0.weight' # weights of the first layer
print(param_tensor)
filters = model_cnn.state_dict()[param_tensor]

plt.style.use('default')
plt.figure(figsize=(5,5))


for idx, filter in enumerate(filters):
    plt.subplot(4, 4, idx+1)
    plt.imshow(filter.detach().cpu().squeeze().numpy(), cmap='gray')
    plt.title(idx, fontdict={"size": 8}, pad=1)
    plt.xticks([])
    plt.yticks([])
    

It is not obvious what each filter is looking for in the image, so let's recreate the convolution operation from scratch using NumPy to see what happens under the hood.
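
Before applying the learned filters, the sliding-window arithmetic can be checked by hand on a tiny array. The sketch below uses a made-up 4 x 5 image and a hand-crafted vertical-edge filter. Both are assumptions for illustration, not values taken from the trained model, and no padding is applied.

```python
import numpy as np

# toy 4 x 5 image: dark on the left (0), bright on the right (1)
toy_image = np.array([[0, 0, 1, 1, 1]] * 4, dtype=float)

# hand-made vertical-edge filter (illustrative, not a learned one)
edge_filter = np.array([[-1, 0, 1]] * 3, dtype=float)

k = edge_filter.shape[0]
out_h = toy_image.shape[0] - k + 1
out_w = toy_image.shape[1] - k + 1
toy_output = np.zeros((out_h, out_w))

# slide the filter over the image; each output is the sum of an element-wise product
for i in range(out_h):
    for j in range(out_w):
        window = toy_image[i:i + k, j:j + k]
        toy_output[i, j] = np.sum(window * edge_filter)

print(toy_output)
# [[3. 3. 0.]
#  [3. 3. 0.]]
```

The response is large where the window straddles the dark-to-bright edge and zero where the window covers a uniformly bright region.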

In [None]:
# taking a sample
sample = mnist_data_test[1][0][0]
sample = sample.detach().cpu().numpy()


plt.figure(figsize=(3, 3))
plt.imshow(sample, cmap='gray')
plt.xticks([])
plt.yticks([]);

In [None]:

def convolve(sample: np.ndarray, filter: np.ndarray) -> np.ndarray:
    """Applies the convolution filter to the sample.

    Args:
        sample (np.ndarray): the 2D input image (pad it beforehand to keep the original size)
        filter (np.ndarray): the square 2D filter

    Returns:
        np.ndarray: output image after convolution
    """
    results = []
    H = sample.shape[0]
    W = sample.shape[1]

    filter_size = len(filter)

    # output size for stride 1 and no additional padding
    out_h = H - (filter_size - 1)
    out_w = W - (filter_size - 1)

    for i in range(out_h):
        for j in range(out_w):
            # window of the image currently under the filter
            segment = sample[i:i+filter_size, j:j+filter_size]
            result = np.sum(segment * filter)
            results.append(result)

    return np.array(results).reshape((out_h, out_w))

# padding the sample 
padded_sample = np.pad(sample , pad_width=1)

plt.figure(figsize=(5,5))

# for each filter apply the convolution
for idx, filter in enumerate(filters):
    filter = filter.detach().cpu().squeeze().numpy()
    convolved = convolve(padded_sample , filter)

    plt.subplot(4, 4, idx+1)
    plt.imshow(convolved, cmap='gray')
    plt.title(idx, fontdict={"size": 8}, pad=1)
    plt.xticks([])
    plt.yticks([])


We can see that each filter captures a different spatial feature of the image. Edges, horizontal lines, vertical lines, and many combinations of other features are highlighted by these filters. Fascinatingly, this ability to learn spatial patterns helps CNNs outperform fully connected networks with fewer weights!
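
The "fewer weights" claim can be verified by counting the learnable parameters of the two models defined earlier, using the default hyper-parameters (16 filters of size 3 x 3). The helpers `linear_params` and `conv2d_params` are hypothetical names introduced for this sketch. They count weights plus biases, as `nn.Linear` and `nn.Conv2d` do by default.

```python
def linear_params(in_features: int, out_features: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features

def conv2d_params(in_channels: int, out_channels: int, kernel_size: int) -> int:
    """Weights plus biases of a 2D convolution with square filters."""
    return out_channels * in_channels * kernel_size * kernel_size + out_channels

# fully connected model: 784 -> 256 -> 10
fc_total = linear_params(28 * 28, 256) + linear_params(256, 10)

# CNN model: 16 filters of size 3 x 3, then 16*28*28 -> 10
cnn_total = conv2d_params(1, 16, 3) + linear_params(16 * 28 * 28, 10)

print(fc_total, cnn_total)  # 203530 125610
```

Only 160 of the CNN's parameters belong to the convolutional layer. The rest sit in the final linear layer.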