# 5a. Convolutional neural networks

A traditional multilayer perceptron has several limitations. It does not take the spatial structure of the data into account. Moreover, fully connected layers require a huge number of weights for high-resolution images, which makes processing such data inefficient.
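To get a feeling for the scale of the problem, the sketch below counts trainable parameters for one fully connected layer and one convolutional layer applied to the same image. The image size (224x224 RGB), the 4096 hidden units and the 16 filters of size 3x3 are illustrative assumptions, not values taken from a particular model in this notebook.

```python
# Compare the number of trainable parameters in a fully connected layer
# and in a convolutional layer for the same RGB image (illustrative sizes).
height, width, channels = 224, 224, 3
hidden_units = 4096  # assumed width of the first dense layer

# Dense layer: every input value connects to every hidden unit, plus one bias each.
dense_params = height * width * channels * hidden_units + hidden_units

# Convolutional layer: 16 filters of size 3x3 over 3 channels, plus one bias each.
# This count does not depend on the image resolution.
out_channels, kernel_size = 16, 3
conv_params = out_channels * channels * kernel_size * kernel_size + out_channels

print(f"dense layer parameters: {dense_params:,}")  # 616,566,784
print(f"conv layer parameters:  {conv_params:,}")   # 448
```

The convolutional layer reuses the same small set of weights at every image position, which is why its size stays constant when the resolution grows.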

That's where convolutional neural networks (CNNs) come into play. They are inspired by the way the visual cortex analyzes images: the network creates and adapts filters that extract features. In contrast to classical image classifiers, the filters are not hand-engineered by experts but learned automatically during training.

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

import torch
from torch import nn
import torchvision

%matplotlib inline
import seaborn as sns
sns.set(style="darkgrid")

## CNN structure

Let's take a look at one of the built-in CNN models available in the `torchvision` library:

In [None]:
import torchvision.models as models
alexnet = models.alexnet()
alexnet

We can see that it contains many layers of different types. It may seem complicated at first, but don't worry: these building blocks are explained in this section.

### Convolutional layers

It is no surprise that convolutional layers are the core concept of CNNs. A convolution can be seen as applying a filter to an image. One of the most popular convolution-based filters is the Sobel filter, which detects edges.

Convolutional layers 'learn' by themselves which filters need to be applied. There is no need to hand-engineer and adapt the filters anymore: the network adapts its weights during training.
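To make the idea of a hand-engineered filter concrete, here is a minimal NumPy sketch that slides a horizontal Sobel kernel over a small synthetic image. The 6x6 image and the helper name `apply_filter` are made up for this illustration. Like `nn.Conv2d`, the helper does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
import numpy as np

# Synthetic 6x6 grayscale image: dark left half, bright right half.
image = np.zeros((6, 6), dtype=np.float32)
image[:, 3:] = 1.0

# Horizontal Sobel kernel: responds to intensity changes along the x axis.
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=np.float32)


def apply_filter(img, kernel):
    """Slide a square kernel over the image (no padding, stride 1)."""
    k = kernel.shape[0]
    out_h, out_w = img.shape[0] - k + 1, img.shape[1] - k + 1
    out = np.zeros((out_h, out_w), dtype=img.dtype)
    for i in range(out_h):
        for j in range(out_w):
            # Element-wise product of the window and the kernel, summed up.
            out[i, j] = np.sum(img[i:i + k, j:j + k] * kernel)
    return out


edges = apply_filter(image, sobel_x)
print(edges)  # every row is [0, 4, 4, 0]: non-zero only around the vertical edge
```

A convolutional layer performs the same sliding-window computation, except that the kernel values are trainable weights rather than fixed numbers.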

In [None]:
convolutional_layer = torch.nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=0, stride=1)
print(convolutional_layer)
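The `kernel_size`, `padding` and `stride` arguments determine the spatial size of the layer's output. The helper below is a small sketch of the usual formula for one dimension (assuming dilation 1); the name `conv_output_size` and the 32x32 input size are assumptions made for this example.

```python
def conv_output_size(input_size, kernel_size, padding=0, stride=1):
    """Spatial output size of a convolution along one dimension (dilation 1)."""
    return (input_size + 2 * padding - kernel_size) // stride + 1


# Settings of the layer declared above, applied to an assumed 32x32 input.
print(conv_output_size(32, kernel_size=3, padding=0, stride=1))  # 30

# With padding=1 a 3x3 kernel preserves the spatial size.
print(conv_output_size(32, kernel_size=3, padding=1, stride=1))  # 32
```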

In [None]:
torch.nn.Conv2d?

## Pooling layers

Stacking convolutional layers can increase the dimensionality of the data flowing through the network. To reduce its size, and with it the number of parameters in the later layers, we use pooling layers. These layers down-sample their inputs using a selected function (e.g. the maximum value within the frame). This process is presented in the picture below:

![pooling layer](image "Max pooling layer")
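Down-sampling can also be reproduced by hand. The NumPy sketch below splits a 4x4 array into non-overlapping 2x2 frames and reduces each frame with a chosen function (maximum or mean). The reshape trick is just one convenient way to do this and only works when the frame size divides the input size.

```python
import numpy as np

# 4x4 input with values from -8 to 7.
x = np.arange(-8, 8, dtype=np.float32).reshape(4, 4)

# Split the array into non-overlapping 2x2 frames. The axes are:
# (frame row, row inside frame, frame column, column inside frame).
frames = x.reshape(2, 2, 2, 2)

# Reduce every frame to a single value with the selected function.
max_pooled = frames.max(axis=(1, 3))   # values: -3, -1, 5, 7
avg_pooled = frames.mean(axis=(1, 3))  # values: -5.5, -3.5, 2.5, 4.5

print(max_pooled)
print(avg_pooled)
```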

In [None]:
input_array = np.arange(-8, 8, dtype=np.float32).reshape((1,4,4))
input_layer = torch.from_numpy(input_array)

pd.DataFrame(data=input_layer.numpy().reshape((4,4)))

### Max

One of the most popular types of pooling is max pooling. As its name suggests, it takes the highest value within the frame. PyTorch provides several max pooling layers:

- `nn.MaxPool2d` - simplest to use, you specify the size of the kernel, stride and padding yourself,
- `nn.AdaptiveMaxPool2d` - you specify only the size of the output; the kernel size and stride are derived from the input size,
- `nn.FractionalMaxPool2d` - applies fractional max pooling, you specify the size of the kernel and the size of the output/output ratio, described in detail in the paper [Fractional MaxPooling by Ben Graham](https://arxiv.org/abs/1412.6071)

### Exercise 2:

Declare max pooling layer, pass input layer through it and display results.

In [None]:
max_pool = nn.MaxPool2d(2)
pd.DataFrame(data=max_pool(input_layer).numpy().reshape((2,2)))

In [None]:
adp_max_pool = nn.AdaptiveMaxPool2d(output_size=(2,2))
pd.DataFrame(data=adp_max_pool(input_layer).numpy().reshape((2,2)))

In [None]:
frac_max_pool = nn.FractionalMaxPool2d(2, output_size=(2, 2))
pd.DataFrame(data=frac_max_pool(input_layer).numpy().reshape((2,2)))

### Average

Another type of pooling layer is average pooling. Again, PyTorch provides several options to use it:

- `nn.AvgPool2d` - simplest to use, you specify the size of the kernel, stride and padding yourself,
- `nn.AdaptiveAvgPool2d` - you specify only the size of the output; the kernel size and stride are derived from the input size.

### Exercise 3:

Declare average pooling layer, pass input layer through it and display results.

In [None]:
avg_pool = nn.AvgPool2d(2)
pd.DataFrame(data=avg_pool(input_layer).numpy().reshape((2,2)))

In [None]:
adp_avg_pool = nn.AdaptiveAvgPool2d(output_size=(2,2))
pd.DataFrame(data=adp_avg_pool(input_layer).numpy().reshape((2,2)))

### Power average

The last type of built-in pooling in PyTorch is power average pooling, which calculates the output according to this formula:

\begin{equation}
f(X)=\sqrt[p]{\sum_{x \in X}x^p}
\end{equation}

With $p=1$ you get sum pooling; as $p \to \infty$ the result approaches max pooling.
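The effect of $p$ is easy to check on a single frame. The plain-Python sketch below evaluates the formula for a few values of $p$; the helper `lp_pool_frame` and the example values are hypothetical, and the values are chosen non-negative to keep the limit behaviour clear.

```python
def lp_pool_frame(frame, p):
    """Power-average pooling of one frame: p-th root of the sum of p-th powers."""
    return sum(value ** p for value in frame) ** (1.0 / p)


frame = [1.0, 2.0, 3.0, 4.0]  # illustrative non-negative values

print(lp_pool_frame(frame, 1))   # 10.0, the plain sum
print(lp_pool_frame(frame, 2))   # about 5.477, the Euclidean norm
print(lp_pool_frame(frame, 50))  # very close to 4.0, the maximum
```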

### Exercise 4:

Declare power average pooling layer, pass input layer through it and display results.

In [None]:
lp_pool = nn.LPPool2d(norm_type=2, kernel_size=2)
pd.DataFrame(data=lp_pool(input_layer).numpy().reshape((2,2)))

## Activation layers

Another building block of CNNs is the activation layer, discussed in the [Neural networks notebook](../3_Neural_networks/3a_Neural_network_module.ipynb). If you need to refresh your memory, please refer to the Activation layers section there.

### Exercise 5:

Declare ReLU activation layer and pass output of the pooling layer of your choice through it. Display the results.

In [None]:
act_layer = nn.ReLU()
pd.DataFrame(data=act_layer(adp_avg_pool(input_layer)).numpy().reshape((2,2)))

## Classifier

The classifier is the last part of the network. It is a stack of fully connected (linear) layers that analyzes the features extracted by the previous blocks and predicts the class. Again, for more about linear networks, please refer to the [Neural networks notebook](../3_Neural_networks/3a_Neural_network_module.ipynb).

### Exercise 6:

Declare classifier similar to the one used in AlexNet.

In [None]:
classifier = nn.Sequential(
    nn.Dropout(),
    nn.Linear(in_features=9216, out_features=4096),
    nn.ReLU(),
    nn.Dropout(),
    nn.Linear(in_features=4096, out_features=4096),
    nn.ReLU(),
    nn.Linear(in_features=4096, out_features=1000),
)
classifier

## CIFAR10

In this notebook, we will be working with the CIFAR10 dataset available in `torchvision`. It provides images of objects from ten classes:

In [None]:
from workshop import data

train_cifar = torchvision.datasets.CIFAR10(data.DATA_PATH, download=True, train=True)
test_cifar = torchvision.datasets.CIFAR10(data.DATA_PATH, download=True, train=False)

### Exercise 7:

We discussed the basics of CNNs. Now it's time to implement your first network. 

In [None]:
class Network(nn.Module):
    def __init__(self):
        super().__init__()
        
        # input 32x32x3
        self.conv_1 = nn.Conv2d(3, 16, 3, padding=1)
        # input 16x16x16
        self.conv_2 = nn.Conv2d(16, 32, 3, padding=1)
        # input 8x8x32
        self.conv_3 = nn.Conv2d(32, 64, 3, padding=1)
        
        # reduces size by 2
        self.pool = nn.MaxPool2d(2)
        
        # input 4x4x64
        self.lin_1 = nn.Linear(4*4*64, 128)
        self.lin_2 = nn.Linear(128, 10)
        
        self.dropout = nn.Dropout()
        self.act = nn.ReLU()
        
    def forward(self, x):        
        x = self.pool(self.act(self.conv_1(x)))
        x = self.pool(self.act(self.conv_2(x)))
        x = self.pool(self.act(self.conv_3(x)))
        
        # flatten everything except the batch dimension
        x = x.flatten(start_dim=1)
        
        x = self.dropout(x)
        x = self.act(self.lin_1(x))
        x = self.dropout(x)
        # return raw class scores (logits), no activation on the output layer
        x = self.lin_2(x)
        return x

In [None]:
network = Network()
network
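A quick way to verify the layer sizes noted in the comments is to push a random batch through the convolutional part and inspect the shapes. The sketch below rebuilds that part as an `nn.Sequential` with the same layer sizes, so that it runs on its own; the batch of 8 random tensors is only a stand-in for real CIFAR10 images.

```python
import torch
from torch import nn

# Feature extractor with the same layer sizes as the network above.
features = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 32x32 -> 16x16
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 16x16 -> 8x8
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 8x8 -> 4x4
)

batch = torch.randn(8, 3, 32, 32)  # random stand-in for 8 CIFAR10 images
feature_maps = features(batch)
print(feature_maps.shape)  # torch.Size([8, 64, 4, 4])

# flatten(start_dim=1) keeps the batch dimension: one row of 4*4*64 values per image.
per_image = feature_maps.flatten(start_dim=1)
print(per_image.shape)  # torch.Size([8, 1024])

# flatten() without arguments would merge all images into a single vector.
print(feature_maps.flatten().shape)  # torch.Size([8192])
```

The second shape, 1024 values per image, is what a linear layer with `4*4*64` input features expects.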

## Visualize filters

## References

- [PyTorch NN module documentation](https://pytorch.org/docs/stable/nn.html)
- [Convolutional neural network](https://en.wikipedia.org/wiki/Convolutional_neural_network)
- [Convolutional layers for deep learning neural networks](https://machinelearningmastery.com/convolutional-layers-for-deep-learning-neural-networks/)