# Convolutional Neural Networks

## Building a CNN
The architecture will contain:
- `Conv2d`;
- `MaxPool2d`;
- rectified linear unit (ReLU);
- `view`;
- linear layer.

### Conv2d
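A convolution filter (kernel) is slid over the input. At each position, the kernel is multiplied element-wise with the patch of input it covers and the products are summed to give one output value. The kernel then moves to the next position according to the stride value.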
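The sliding-window computation can be sketched with an explicit loop and compared with PyTorch's `F.conv2d`. This is a minimal, single-channel, unpadded sketch for illustration; the helper `conv2d_naive` is hypothetical. Like PyTorch's convolution layers, it computes a cross-correlation (the kernel is not flipped).

```python
import torch
import torch.nn.functional as F


def conv2d_naive(x, k, stride=1):
    """Hypothetical helper: slide kernel k over a 2-D input x (no padding)."""
    kh, kw = k.shape
    out_h = (x.shape[0] - kh) // stride + 1
    out_w = (x.shape[1] - kw) // stride + 1
    out = torch.empty(out_h, out_w)
    for i in range(out_h):
        for j in range(out_w):
            # patch of the input currently covered by the kernel
            patch = x[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = (patch * k).sum()  # element-wise product, then sum
    return out


x = torch.arange(25, dtype=torch.float32).reshape(5, 5)
k = torch.tensor([[1., 0., -1.],
                  [1., 0., -1.],
                  [1., 0., -1.]])

manual = conv2d_naive(x, k, stride=1)
# F.conv2d expects (batch, channels, H, W) input and (out_ch, in_ch, kH, kW) weights
builtin = F.conv2d(x.view(1, 1, 5, 5), k.view(1, 1, 3, 3), stride=1)[0, 0]
print(manual)                          # 3x3 output, every entry is -6.0
print(torch.allclose(manual, builtin))  # True
```

With a stride of 2, the same helper visits only every other position and returns a $2\times 2$ output.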

_Stride_ is the number of positions the kernel moves at each step ($1$ by default); a stride of $2$ skips every other position and roughly halves the output size.

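_Padding_ adds extra values (zeros by default in PyTorch) around the borders of the input, on both ends of each spatial dimension, so that the kernel can also be applied to the elements at the edges. Without padding, the output is smaller than the input.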
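For an input of size $n$, kernel size $k$, padding $p$ and stride $s$ (with dilation $1$), the output size along each spatial dimension is $\lfloor (n + 2p - k)/s \rfloor + 1$. The sketch below checks this formula against `nn.Conv2d` on a $28\times 28$ input; the helper `conv_output_size` is hypothetical.

```python
import torch
import torch.nn as nn


def conv_output_size(n, kernel_size, stride=1, padding=0):
    # floor((n + 2 * padding - kernel_size) / stride) + 1, assuming dilation=1
    return (n + 2 * padding - kernel_size) // stride + 1


x = torch.zeros(1, 1, 28, 28)  # one 28x28 grayscale image
for stride, padding in [(1, 0), (1, 2), (2, 0), (2, 2)]:
    conv = nn.Conv2d(1, 1, kernel_size=5, stride=stride, padding=padding)
    actual = conv(x).shape[-1]
    expected = conv_output_size(28, 5, stride, padding)
    print(f"stride={stride} padding={padding}: {actual}x{actual}")
    assert actual == expected
```

With `kernel_size=5`, a padding of $2$ keeps the $28\times 28$ size at stride $1$, while no padding shrinks it to $24\times 24$.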

The values in the kernel are randomly initialized and learned through backpropagation and gradient descent.
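As a minimal sketch of this, the kernel weights of an `nn.Conv2d` are trainable tensors, and a single backward pass plus an optimizer step changes them. The random input, the target and the squared-error loss below are placeholders chosen only for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
conv = nn.Conv2d(1, 10, kernel_size=5)
print(conv.weight.shape)          # torch.Size([10, 1, 5, 5]): 10 randomly initialized 5x5 kernels
print(conv.weight.requires_grad)  # True

x = torch.randn(4, 1, 28, 28)        # placeholder batch standing in for images
target = torch.randn(4, 10, 24, 24)  # placeholder target with the conv output shape
optimizer = torch.optim.SGD(conv.parameters(), lr=0.1)

before = conv.weight.detach().clone()
loss = ((conv(x) - target) ** 2).mean()  # toy loss, for illustration only
optimizer.zero_grad()
loss.backward()   # backpropagation fills conv.weight.grad
optimizer.step()  # one gradient-descent update of the kernel values
print(torch.equal(before, conv.weight))  # False: the kernel values changed
```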

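Commonly used kernel sizes are 1, 3, 5, and 7 (odd sizes give the kernel a central element). The larger the kernel, the larger the area of the input it covers at each position, at the cost of more weights ($k\times k$ per pair of input and output channels). Larger kernels are sometimes used in the early layers, where they are applied directly to the input data.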

### MaxPool2d
It is common practice to add pooling layers after convolutional layers. They reduce the spatial size (height and width) of the feature maps produced by the convolutional layers, which lowers the amount of computation in the following layers.

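`MaxPool2d` acts independently on each feature map (the output of each filter in the previous layer). With a $2\times 2$ kernel, it looks at each $2\times 2$ window of the feature map and keeps the $\max$ of that window. In PyTorch the stride defaults to the kernel size, so the windows do not overlap and the height and width are halved (rounding down for odd sizes).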
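A small worked example, assuming a single $4\times 4$ feature map: `F.max_pool2d` with a $2\times 2$ kernel keeps the largest value of each non-overlapping window. The reshape-and-`amax` line is an equivalent manual computation included only as a cross-check.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([[1., 3., 2., 0.],
                  [4., 2., 1., 5.],
                  [0., 1., 7., 2.],
                  [6., 2., 3., 4.]])

# stride defaults to kernel_size, so the 2x2 windows do not overlap
pooled = F.max_pool2d(x.view(1, 1, 4, 4), kernel_size=2)[0, 0]
print(pooled)
# tensor([[4., 5.],
#         [6., 7.]])

# manual check: split the map into 2x2 blocks and take the max of each block
manual = x.reshape(2, 2, 2, 2).amax(dim=(1, 3))
print(torch.equal(pooled, manual))  # True
```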

Another common technique is __average pooling__, where the $\max$ is replaced by the average of each window.
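On the same $4\times 4$ example map, `F.avg_pool2d` replaces each $2\times 2$ window by its mean instead of its maximum. This is a minimal sketch for comparison with max pooling, not part of the network built below.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([[1., 3., 2., 0.],
                  [4., 2., 1., 5.],
                  [0., 1., 7., 2.],
                  [6., 2., 3., 4.]])

avg = F.avg_pool2d(x.view(1, 1, 4, 4), kernel_size=2)[0, 0]
print(avg)
# window means: (1+3+4+2)/4 = 2.5, (2+0+1+5)/4 = 2.0,
#               (0+1+6+2)/4 = 2.25, (7+2+3+4)/4 = 4.0
```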

### ReLU
It is common practice to add a nonlinear activation after the convolution, either before or after the pooling layer. The nonlinear function is applied element-wise to the feature maps.

A common choice is the rectifier, or rectified linear unit (ReLU), $f(x)=\max(0,x)$, which sets negative values to zero and leaves positive values unchanged.
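The sketch below applies `F.relu` to a few values and then checks one consequence of ReLU being monotonic: combined with *max* pooling, the order does not matter, because $\max_i \max(0, a_i) = \max(0, \max_i a_i)$. Applying ReLU after pooling is therefore equivalent and touches fewer elements. This equivalence is specific to max pooling; the random feature maps are placeholders.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([-2.0, -0.5, 0.0, 1.5, 3.0])
print(F.relu(x))  # negative entries become 0, positive entries are unchanged

torch.manual_seed(0)
fmap = torch.randn(8, 10, 24, 24)     # placeholder batch of feature maps
a = F.relu(F.max_pool2d(fmap, 2))     # pool first, then ReLU
b = F.max_pool2d(F.relu(fmap), 2)     # ReLU first, then pool
print(torch.equal(a, b))              # True for max pooling
```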

### View
It is common practice to use a fully connected (or linear) layer at the end of most networks for image classification problems. A two-dimensional convolution takes a multi-dimensional tensor (batch × channels × height × width) as input and outputs another tensor of the same rank. Hence, to apply a linear layer, which expects one feature vector per sample, we need to flatten the feature maps of each image into a one-dimensional vector.

```
x.view(-1, 320)
```

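The `view` method reshapes a tensor without copying its data. The input data, after batching, have dimension $32\times 1 \times 28 \times 28$ (i.e. batch size × channel (grayscale image) × height × width). After the two convolution and pooling stages of the network defined below, each image is represented by $20$ feature maps of size $4\times 4$, so the tensor reaching `view` has shape $32\times 20\times 4\times 4$. We do not want to mix data from different images, so `x.view(-1, 320)` produces a $32\times 320$ matrix with one row per image. The $-1$ tells PyTorch to infer that dimension from the total number of elements: $32\cdot 20\cdot 4\cdot 4 / 320 = 32$, i.e. the batch size.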
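A minimal sketch of the flattening step, using a random tensor in place of the real pooled feature maps. It also shows that `view` shares memory with the original tensor, and a pitfall of `view(-1, 320)`: if the per-image feature size is not 320 but the total still divides evenly, the batch dimension is silently wrong. Writing `x.view(x.size(0), -1)` keeps the batch size explicit instead.

```python
import torch

batch = torch.randn(32, 20, 4, 4)  # stand-in for the output of the last pooling layer
flat = batch.view(-1, 320)         # -1 is inferred: 32*20*4*4 / 320 = 32
print(flat.shape)                  # torch.Size([32, 320])

# row i holds exactly the values of image i, in order
print(torch.equal(flat[5], batch[5].flatten()))  # True

# view shares memory with the original tensor: no data is copied
flat[0, 0] = 123.0
print(batch[0, 0, 0, 0].item())  # 123.0

# keeping the batch dimension explicit gives the same result here
flat2 = batch.view(batch.size(0), -1)
print(torch.equal(flat, flat2))  # True

# pitfall: 5x5 feature maps give 32*20*5*5 = 16000 values, i.e. 50 rows of 320
wrong = torch.randn(32, 20, 5, 5).view(-1, 320)
print(wrong.shape)  # torch.Size([50, 320]): the batch size is no longer 32
```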

### Linear layer
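After flattening, the data pass through a linear layer followed by a nonlinear activation, and then through a second linear layer that outputs one score per digit class. The final activation is `log_softmax`, which turns these scores into log-probabilities over the 10 digits; the predicted digit is the class with the highest value.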

$$
\operatorname{logsoftmax}(x_i) = \log\left(\frac{\exp(x_i)}{\sum_j\exp(x_j)}\right)
$$
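Expanding the logarithm gives the identity $\operatorname{logsoftmax}(x_i) = x_i - \log\sum_j \exp(x_j)$, which avoids computing a ratio of possibly huge exponentials. The sketch below checks `F.log_softmax` against this identity with `torch.logsumexp`, on made-up scores for 2 samples and 3 classes, and reads off the predicted class with `argmax`.

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 1.0, 0.1],
                       [0.5, 3.0, -1.0]])  # made-up scores: 2 samples, 3 classes

log_probs = F.log_softmax(logits, dim=1)  # normalize over the class dimension
manual = logits - torch.logsumexp(logits, dim=1, keepdim=True)
print(torch.allclose(log_probs, manual))  # True

# exponentiating gives probabilities that sum to 1 for each sample
print(log_probs.exp().sum(dim=1))
# the predicted class is the one with the highest log-probability
print(log_probs.argmax(dim=1))  # tensor([0, 1])
```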

```
import torch
import torch.nn as nn
import torch.nn.functional as F
```

Neural networks can be constructed using the `torch.nn` package. An `nn.Module` contains layers and a method `forward(input)` that returns the output. `nn` relies on `autograd` to define models and differentiate them.

Once the `forward` function is defined, the `backward` function (which computes the gradients) is automatically defined by `autograd`. The learnable parameters of a model are returned by `net.parameters()`.
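A minimal sketch of this workflow: only `forward` is written by hand, `backward()` is provided by `autograd`, and `parameters()` exposes the learnable tensors. `TinyNet` is a hypothetical two-layer model used only for illustration, and summing the output stands in for a real loss.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):  # hypothetical model, for illustration only
    def __init__(self):
        super().__init__()
        self.l1 = nn.Linear(4, 3)
        self.l2 = nn.Linear(3, 2)

    def forward(self, x):
        # only the forward pass is defined by hand
        return self.l2(torch.relu(self.l1(x)))


net = TinyNet()
out = net(torch.randn(5, 4))
out.sum().backward()  # autograd computes a gradient for every parameter

for name, p in net.named_parameters():
    print(name, tuple(p.shape), p.grad is not None)
# l1.weight (3, 4) True
# l1.bias (3,) True
# l2.weight (2, 3) True
# l2.bias (2,) True
print(sum(p.numel() for p in net.parameters()))  # 23
```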

## Architecture

```
# CNN architecture:
# Conv2d -> MaxPool2d -> ReLU
# Conv2d -> Dropout2d -> MaxPool2d -> ReLU
# view (flatten) -> Linear -> ReLU
# dropout -> Linear -> log_softmax

class cnn(nn.Module):

    def __init__(self):
        super(cnn, self).__init__()
        # nn.Conv2d(in_channels, out_channels, kernel_size,
        #           stride=1, padding=0, dilation=1, groups=1, bias=True)
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)   # 1x28x28 -> 10x24x24 (28x28 input)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)  # 10x12x12 -> 20x8x8
        # nn.Dropout2d(p=0.5, inplace=False): zeroes whole channels during training
        self.drop = nn.Dropout2d()
        # nn.Linear(in_features, out_features, bias=True)
        # 320 = 20 channels x 4 x 4 pixels left after the second pooling layer
        self.l1 = nn.Linear(320, 50)
        self.l2 = nn.Linear(50, 10)  # one output per digit class (0-9)

    def forward(self, x):
        # F.max_pool2d(input, kernel_size, stride=None, padding=0, ...):
        # stride defaults to kernel_size, so a 2x2 pool halves height and width
        x = F.relu(F.max_pool2d(self.conv1(x), 2))             # -> 10x12x12
        x = F.relu(F.max_pool2d(self.drop(self.conv2(x)), 2))  # -> 20x4x4
        x = x.view(-1, 320)                                    # -> (batch, 320)
        x = F.relu(self.l1(x))
        # F.dropout(input, p=0.5, training=True, inplace=False): pass
        # self.training so that dropout is disabled after model.eval()
        x = F.dropout(x, training=self.training)
        x = self.l2(x)
        # log-probabilities over the 10 classes; dim=1 is the class dimension
        # (calling log_softmax without dim is deprecated)
        return F.log_softmax(x, dim=1)
```
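The `training=self.training` argument matters because `F.dropout` defaults to `training=True`. The sketch below isolates that pattern in a hypothetical `DropoutDemo` module: in training mode about half the entries are zeroed and the survivors are scaled by $1/(1-p)=2$, while after `.eval()` dropout becomes the identity. Module-based layers such as `nn.Dropout2d` switch automatically with `.train()`/`.eval()`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DropoutDemo(nn.Module):  # hypothetical module, for illustration only
    def forward(self, x):
        # same pattern as in cnn.forward: dropout active only in training mode
        return F.dropout(x, p=0.5, training=self.training)


m = DropoutDemo()
x = torch.ones(1, 10)

m.train()    # modules start in training mode; train() makes it explicit
print(m(x))  # roughly half the entries are 0.0, the others are scaled to 2.0

m.eval()     # sets self.training to False
print(torch.equal(m(x), x))  # True: no dropout at evaluation time
```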

```
model = cnn()
print(model)
# to perform computation on the GPU:
# device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# model.to(device)
```

```
cnn(
  (conv1): Conv2d(1, 10, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(10, 20, kernel_size=(5, 5), stride=(1, 1))
  (drop): Dropout2d(p=0.5)
  (l1): Linear(in_features=320, out_features=50, bias=True)
  (l2): Linear(in_features=50, out_features=10, bias=True)
)
```
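To confirm where the `320` input features of `l1` come from, the sketch below pushes a dummy batch of 32 images of size $28\times 28$ through layers with the same hyperparameters as `cnn` and prints the shape after each step. It re-creates the layers so that it runs on its own, and leaves out dropout, which does not change shapes.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
conv1 = nn.Conv2d(1, 10, kernel_size=5)
conv2 = nn.Conv2d(10, 20, kernel_size=5)
l1 = nn.Linear(320, 50)
l2 = nn.Linear(50, 10)

x = torch.randn(32, 1, 28, 28)  # dummy batch: 32 grayscale 28x28 images
x = conv1(x)
print("conv1:", tuple(x.shape))  # (32, 10, 24, 24): 28 - 5 + 1 = 24
x = F.relu(F.max_pool2d(x, 2))
print("pool1:", tuple(x.shape))  # (32, 10, 12, 12)
x = conv2(x)
print("conv2:", tuple(x.shape))  # (32, 20, 8, 8): 12 - 5 + 1 = 8
x = F.relu(F.max_pool2d(x, 2))
print("pool2:", tuple(x.shape))  # (32, 20, 4, 4): 20 * 4 * 4 = 320
x = x.view(-1, 320)
print("view: ", tuple(x.shape))  # (32, 320)
x = F.relu(l1(x))
print("l1:   ", tuple(x.shape))  # (32, 50)
x = F.log_softmax(l2(x), dim=1)
print("out:  ", tuple(x.shape))  # (32, 10)
```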


See https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html?highlight=convolution and Subramanian, V., *Deep Learning with PyTorch: A practical approach to building neural network models using PyTorch*, 2018.