# CNNs in Depth

## Convolutional layers in depth

Definitions:
- dense layers = fully connected layers: each node is connected to every node in the previous layer
- locally-connected layers (convolutional layers) = each node is connected only to a small subset of the nodes in the previous layer

Kernels for color images:
- grayscale images use a 2d kernel of `kernel_size` x `kernel_size` numbers
- color images use a 3d filter of `kernel_size` x `kernel_size` x 3, with one slice for each of the RGB channels

"Channels" refers to the depth of the feature maps of a convolutional layer: a layer that takes feature maps with a depth of 64 and outputs 128 feature maps is said to have 64 input channels and 128 output channels.

### The number of parameters in a convolutional layer
The number of parameters in a convolutional layer is the number of filters times the number of parameters per filter, where each filter has one weight per kernel position and input feature map, plus one bias. With:
- $n_k$ : number of filters in the convolutional layer
- $k$ : height and width of the convolutional kernel
- $c$ : number of feature maps produced by the previous layer (or number of channels in input image)

There are $k \times k \times c$ weights per filter plus one bias per filter, so $c k^2 + 1$ parameters per filter. The convolutional layer is composed of $n_k$ filters, so the total number of parameters in the convolutional layer is: $n_p = n_k (c k^2 + 1)$
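
As a quick sanity check, the sketch below compares the formula with the parameter count that PyTorch reports for an `nn.Conv2d` layer. The sizes (64 input channels, 128 filters, 3x3 kernel) are illustrative values, and `conv_param_count` is a hypothetical helper name, not part of PyTorch.

```python
from torch import nn


def conv_param_count(n_k, k, c):
    """n_p = n_k * (c * k^2 + 1): weights plus one bias per filter."""
    return n_k * (c * k * k + 1)


# Illustrative sizes: 64 input channels, 128 filters, 3x3 kernel
conv = nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3)

# Count every learnable value (weights and biases) in the layer
n_params_torch = sum(p.numel() for p in conv.parameters())
n_params_formula = conv_param_count(n_k=128, k=3, c=64)

print(n_params_torch, n_params_formula)
```

Both numbers should agree, since `nn.Conv2d` uses one bias per filter by default.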


## Convolutional Layers in PyTorch

Example code to create a convolutional layer:
```python
from torch import nn
conv1 = nn.Conv2d(in_channels, out_channels, kernel_size)
```

We can also add an activation function and dropout. For CNNs we use the 2d version of dropout, which randomly zeroes out entire channels (feature maps) instead of individual values:
```python
conv_block = nn.Sequential(
  nn.Conv2d(in_channels, out_channels, kernel_size),
  nn.ReLU(),
  nn.Dropout2d(p=0.2)
)
```
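
To see what such a block does to a tensor, here is a minimal sketch that runs it on random data. The sizes (3 input channels, 16 filters, 3x3 kernel, a batch of eight 32x32 images) are assumptions chosen for illustration.

```python
import torch
from torch import nn

# Assumed sizes for illustration: RGB input, 16 filters, 3x3 kernel
conv_block = nn.Sequential(
    nn.Conv2d(3, 16, 3),
    nn.ReLU(),
    nn.Dropout2d(p=0.2),
)

x = torch.randn(8, 3, 32, 32)  # batch of 8 random "images"

conv_block.train()  # dropout is only active in training mode
y = conv_block(x)

# Without padding, a 3x3 kernel shrinks each side by 2: 32 -> 30
print(y.shape)

# A dropped channel is zero at every pixel of its feature map
channel_is_zero = (y == 0).flatten(2).all(dim=2)  # shape (8, 16)
print(int(channel_is_zero.sum()), "of", channel_is_zero.numel(), "feature maps are all zero")

conv_block.eval()  # in evaluation mode dropout does nothing
y_eval = conv_block(x)
```

Remember to call `.eval()` before validation or inference, otherwise feature maps keep being dropped.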

## Stride and padding

Padding and stride are hyperparameters, or configuration settings, for the filter:
- Padding: Expanding the size of an image by adding pixels at its border
- Stride: Amount by which a filter slides over an image.

There are multiple padding strategies:
- zero-padding strategy is by far the most common: padding values will all be 0
- reflect: padding pixels filled with copies of values in input image taken in opposite order, in a mirroring fashion
- replicate: padding pixels filled with value of closest pixel in input image
- circular: padding pixels are copied from the opposite edge of the input image, as if the image wrapped around (tiled) in both directions
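
The difference between the strategies is easiest to see on a tiny example. The sketch below uses `torch.nn.functional.pad` on a 1d signal; the input values are made up for illustration. Note that `F.pad` calls zero-padding `"constant"`, while the `padding_mode` argument of `nn.Conv2d` calls it `"zeros"`.

```python
import torch
import torch.nn.functional as F

# A tiny 1d "image" with shape (batch, channels, width)
x = torch.tensor([[[1.0, 2.0, 3.0, 4.0]]])

# Pad 2 values on the left and 2 on the right with each strategy
results = {}
for mode in ["constant", "reflect", "replicate", "circular"]:
    padded = F.pad(x, (2, 2), mode=mode)
    results[mode] = padded.flatten().tolist()
    print(f"{mode:>9}: {results[mode]}")

# constant : 0 0 | 1 2 3 4 | 0 0
# reflect  : 3 2 | 1 2 3 4 | 3 2
# replicate: 1 1 | 1 2 3 4 | 4 4
# circular : 3 4 | 1 2 3 4 | 1 2
```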

## Pooling layers

Pooling layers also have padding and stride parameters.

Different pooling strategies:
- max-pooling: take the maximum value in the window (the most common choice for images)
- average-pooling: take the mean of all the values in the window

Example code for max and average pooling:
```python
from torch import nn
nn.MaxPool2d(kernel_size, stride)
nn.AvgPool2d(window_size, stride)
```
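
As a small worked example, the sketch below applies both pooling types to a 4x4 input with a 2x2 window and a stride of 2. The input values are made up for illustration.

```python
import torch
from torch import nn

# 4x4 single-channel input with shape (batch, channels, height, width):
#  0  1  2  3
#  4  5  6  7
#  8  9 10 11
# 12 13 14 15
x = torch.arange(16, dtype=torch.float32).reshape(1, 1, 4, 4)

max_pool = nn.MaxPool2d(kernel_size=2, stride=2)
avg_pool = nn.AvgPool2d(kernel_size=2, stride=2)

max_out = max_pool(x)  # maximum of each 2x2 window
avg_out = avg_pool(x)  # mean of each 2x2 window

print(max_out[0, 0])  # [[5, 7], [13, 15]]
print(avg_out[0, 0])  # [[2.5, 4.5], [10.5, 12.5]]
```

With a window of 2 and a stride of 2 the windows do not overlap, so height and width are halved.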

Definitions:
- Kernel size: size of the side of the convolutional kernel
- Window size: size of the window considered during pooling
- Stride: Step size of the convolutional kernel or of the pooling window when moving over the input image
- Padding: Border to add to an input image before the convolution operation is performed

## Typical structure of CNNs

Example code of typical convolutional layer:
```python
conv1 = nn.Conv2d(
    depth_of_input_layer, # for an RGB image, the input depth is 3
    desired_depth_of_output, # if we want 16 filters, this will be 16
    kernel_size, # size of the filter
    stride, # default is 1
    padding, # default is 0
)
```

A classical CNN is made of two distinct parts, sometimes called the backbone and the head:
- The **backbone** is made of convolutional and pooling layers, and has the task of extracting information from the image.
- After the backbone there is a flattening layer that takes the output feature maps of the last convolutional layer and flattens them out into a 1d vector: for each feature map the rows are stacked together in a 1d vector, then all the 1d vectors are stacked together to form a long 1d vector called a feature vector or embedding.
- The **head** is made of fully connected layers that take the feature vector and produce the final output, for example one score per class.

### A simple CNN

Example of a simple CNN in PyTorch:
```python
import torch
import torch.nn as nn

class MyCNN(nn.Module):

  def __init__(self, n_classes):

    super().__init__()

    # Create layers: a convolutional backbone followed by an MLP head
    self.model = nn.Sequential(
      # First conv + maxpool + relu
      nn.Conv2d(3, 16, 3, padding=1),
      nn.MaxPool2d(2, 2),
      nn.ReLU(),
      nn.Dropout2d(0.2),

      # Second conv + maxpool + relu
      nn.Conv2d(16, 32, 3, padding=1),
      nn.MaxPool2d(2, 2),
      nn.ReLU(),
      nn.Dropout2d(0.2),

      # Third conv + maxpool + relu
      nn.Conv2d(32, 64, 3, padding=1),
      nn.MaxPool2d(2, 2),
      nn.ReLU(),
      nn.Dropout2d(0.2),

      # Flatten feature maps
      nn.Flatten(),

      # Fully connected layers. This assumes
      # that the input image was 32x32
      nn.Linear(1024, 128),
      nn.ReLU(),
      nn.Dropout(0.5),
      nn.Linear(128, n_classes)
    )

  def forward(self, x):

    # nn.Sequential will call the layers 
    # in the order they have been inserted
    return self.model(x)
```
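
The value 1024 in the first `nn.Linear` comes from the shape of the flattened backbone output. The sketch below reproduces it under the assumption of a 32x32 RGB input: it rebuilds only the conv and pooling layers of the network above and prints the resulting shapes.

```python
import torch
from torch import nn

# Same conv + pool structure as the network above. Activations and
# dropout are omitted because they do not change tensor shapes.
backbone = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.MaxPool2d(2, 2),   # 32x32 -> 16x16
    nn.Conv2d(16, 32, 3, padding=1), nn.MaxPool2d(2, 2),  # 16x16 -> 8x8
    nn.Conv2d(32, 64, 3, padding=1), nn.MaxPool2d(2, 2),  # 8x8 -> 4x4
)

x = torch.randn(1, 3, 32, 32)  # one random 32x32 RGB image
feature_maps = backbone(x)
embedding = nn.Flatten()(feature_maps)

print(feature_maps.shape)  # 64 feature maps of 4x4
print(embedding.shape)     # 64 * 4 * 4 = 1024 values per image
```

With a different input size the flattened length changes, and the first `nn.Linear` has to be adjusted to match.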

### Output volume for a convolutional layer

To compute the output size of a given convolutional layer we can perform the following calculation (taken from [Stanford's cs231n course](http://cs231n.github.io/convolutional-networks/#layers)):
- We can compute the spatial size of the output volume as a function of the input volume size ($W$), the kernel/filter size ($F$), the stride with which the filter is applied ($S$), and the amount of zero padding used on the border ($P$). The output width is given by: $\frac{W - F + 2P}{S} + 1$


For example, for a 7x7 input and a 3x3 filter with stride 1 and padding 0 we would get a 5x5 output:
- W = 7
- F = 3
- S = 1
- P = 0
- ((7-3+2*0)/1) + 1 = 5

With stride 2 we would get a 3x3 output.
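
The calculation can be wrapped in a small helper and checked against what `nn.Conv2d` actually produces. This is a minimal sketch: `conv_output_size` is a hypothetical helper name, and the integer division reflects that PyTorch rounds the result down when the division is not exact.

```python
import torch
from torch import nn


def conv_output_size(w, f, s=1, p=0):
    """Output side length: (W - F + 2P) / S + 1, rounded down."""
    return (w - f + 2 * p) // s + 1


# Reproduce the 7x7 input / 3x3 filter example for stride 1 and stride 2
for stride in (1, 2):
    expected = conv_output_size(w=7, f=3, s=stride, p=0)
    conv = nn.Conv2d(1, 1, kernel_size=3, stride=stride, padding=0)
    out = conv(torch.randn(1, 1, 7, 7))
    print(f"stride={stride}: formula={expected}, conv output={tuple(out.shape[-2:])}")
```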

# Misc

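A tensor can be copied to the GPU or back to the CPU with the following calls, which return a new tensor rather than modifying the original: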
```python
tensor.cuda()
tensor.cpu()
```

CUDA is NVIDIA's parallel computing platform, which PyTorch uses to run tensor operations on the GPU.
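
Calling `.cuda()` raises an error on a machine without a CUDA-capable GPU, so a common pattern is to pick the device once and use `.to(device)`. A minimal sketch, assuming nothing about the available hardware:

```python
import torch

# Use the GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.randn(2, 3)
x_dev = x.to(device)  # tensor on `device`; `x` itself is unchanged
x_back = x_dev.cpu()  # back on the CPU (returns the same tensor if already there)

print(x_dev.device, x_back.device)
```

The same `.to(device)` call works for models, so the rest of the code does not need to know which device is in use.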
