# Deep Learning

## Conv Layers

- https://stackoverflow.com/questions/42883547/intuitive-understanding-of-1d-2d-and-3d-convolutions-in-convolutional-neural-n

### Conv1D

- https://datascience.stackexchange.com/questions/12830/how-are-1x1-convolutions-the-same-as-a-fully-connected-layer#:~:text=Instead%20of%20a%20single%20output,really%20act%20as%201x1%20convolutions.
- https://jdhao.github.io/2017/09/29/1by1-convolution-in-cnn/


In [3]:
import torch
from torch import nn

# Shows that a 1x1 conv gives the same final output as a FC layer applied at every spatial position

m = nn.Conv2d(16, 33, kernel_size=1, stride=1)
_input = torch.randn(20, 16, 50, 100)
output1 = m(_input)

fc = nn.Linear(16, 33)
fc.weight.data = m.weight.squeeze()
fc.bias.data = m.bias
output2 = fc(_input.transpose(1, 3)).transpose(1, 3)

print("--- max diff", (output2 - output1).abs().max())

--- max diff tensor(5.9605e-07, grad_fn=<MaxBackward1>)


### Conv2D

- https://stackoverflow.com/questions/55444120/understanding-the-output-shape-of-conv2d-layer-in-keras



### Conv3D

<img src="https://drive.google.com/uc?id=17MZ4evTef0ypWyALfTaGWj4xgd-BqHOf" alt="parameters" width="600" height="600">

## Batch Normalization

- https://machinelearningmastery.com/batch-normalization-for-training-of-deep-neural-networks/

## CNN Invariance

### CNN Conv

The number of weights in a convolutional layer does not depend on the image size (input shape): it depends only on the filter size, the number of input channels (in_channels), and the number of filters (out_channels). Pooling layers have no learnable weights at all. Please refer to my image in Deep Learning Notes on how to calculate the number of parameters (which is the number of weights plus biases). It is as simple as

$$f^{\ell} \times f^{\ell} \times n_{c}^{\ell-1} \times n_{c}^{\ell} + n_{b}^{\ell}$$

where we denote

$f^{\ell} = \text{filter size in current layer}$

$n_{c}^{\ell-1} = \text{number of filters/channels in previous layer}$

$n_{c}^{\ell} = \text{number of filters in current layer}$

$n_{b}^{\ell} = \text{number of biases in current layer (one per filter, so equal to } n_{c}^{\ell}\text{)}$

So, given an input of size (3, 224, 224), one can calculate the parameters/weights of the first layer (16 filters of size 3x3) as follows:

$$\text{number of weights/parameters} = 3 \times 3 \times 3 \times 16 + 16 = 448$$

---

and for the second layer it is:

$$\text{number of weights/parameters} = 5 \times 5 \times 16 \times 32 + 32 = 12832$$

---

and for the linear layer without bias it is:

$$\text{number of weights/parameters} = \text{number of input neurons} \times \text{number of output neurons} = 387200 \times 64 = 24780800$$
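
As a quick sanity check, the two convolutional counts above can be compared against what PyTorch reports. This is a minimal sketch that assumes square kernels and one bias per output channel; `conv_param_count` and `module_param_count` are helper names made up for this note, not part of PyTorch.

```python
from torch import nn


def conv_param_count(kernel_size: int, in_channels: int, out_channels: int) -> int:
    """Hypothetical helper: f * f * n_c_prev * n_c + n_b, assuming n_b = n_c."""
    return kernel_size * kernel_size * in_channels * out_channels + out_channels


def module_param_count(module: nn.Module) -> int:
    """Total number of elements across all parameters of a module."""
    return sum(p.numel() for p in module.parameters())


conv1 = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3)
conv2 = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=5)

# Formula on the left, PyTorch's own count on the right.
print(conv_param_count(3, 3, 16), module_param_count(conv1))   # 448 448
print(conv_param_count(5, 16, 32), module_param_count(conv2))  # 12832 12832
```

Note that neither helper takes an image size as an argument, which is the point being made here.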

The point of the calculations above is that when we count the weights/parameters of each CNN layer, there is absolutely no input shape or image size involved. Thus, the number of weights of a CNN layer is **invariant to the input shape**. However, the output shape of each CNN layer is not the same for varying image sizes, and this will pose a problem, which is explained in the next part. Before we go on, take a moment to run the code below with two input shapes, 224 and 448: the only thing that changes is the output shape of the convolutional layers; the number of weights and parameters stays the same.
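
A minimal sketch of that experiment, using only the convolutional part of the model defined later in these notes (the `Linear` layers are left out on purpose so that both image sizes run). The variable names are only for illustration.

```python
import torch
from torch import nn

# Convolutional part only: same conv/pool settings as the model in these notes.
features = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),
    nn.Conv2d(16, 32, kernel_size=5, stride=1, padding=1),
    nn.ReLU(),
)

# The parameter count is a property of the layers, not of the input.
num_params = sum(p.numel() for p in features.parameters())

with torch.no_grad():
    shape_224 = features(torch.randn(1, 3, 224, 224)).shape
    shape_448 = features(torch.randn(1, 3, 448, 448)).shape

print(num_params)  # 13280
print(shape_224)   # torch.Size([1, 32, 110, 110])
print(shape_448)   # torch.Size([1, 32, 222, 222])
```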

However, an issue arises because we mostly train with transfer learning. That is to say, we use a model that was already trained on a fixed image size. Take ImageNet for example: `VGG16` is trained with 224x224 images. Let us assume for a moment that the model we defined below is `VGG16`; then the number of weights and parameters is already fixed once the model is trained. If you **hardcoded** the `Linear()` layer, you will encounter an error if you run:

```python
input_image_448 = torch.randn((1, 3, 448, 448))
model(input_image_448)
```

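This is because at the first `Linear` layer, the model expects an input of 387200 features ($32 \times 110 \times 110$), whereas a 448x448 image produces $32 \times 222 \times 222 = 1577088$ features after flattening.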

The output of the convolutional layers has a different spatial size for differently sized images, and this causes an issue if a fully connected layer follows (since a fully connected layer requires a fixed-size input). Take **VGG16** pretrained on `imagenet` with 224x224 images: when you load the `state dict`, the number of weights and parameters is already fixed. To be more verbose, the weights of the convolutional layers fit any input image size, but the weights of the fully connected layers do not. For example, at the native resolution of 224x224, the layers right before the fully connected layer are a convolutional layer and a subsequent pooling layer, whose output has shape `(-1, 512, 7, 7)`. We need to flatten this output into a vector first; one can imagine, in a 3-dimensional perspective, that we squash a block of 3-dimensional neurons into a single vertical column of neurons. Refer to this image:

Now, some intuition needs to be provided here. For the absent-minded (me), look further after the image pasted above for the math behind the weights (pages after).

Continuing from above, the flattened layer has $512\times 7\times 7 = 25088$ features (it has no learnable weights of its own). Since it feeds a pre-defined fully connected layer of 4096 neurons, that fully connected layer has

$$25088\times 4096 + 4096 = 102764544$$

weights and biases. This number is fixed once the model is trained, and it only matches a 224x224 input: a different image size produces a different number of flattened features.

For example, if I were to input a 512x512 image, the output of the layer before the fully connected layer is actually `[-1, 512, 16, 16]`, which is not the same as $512\times 7 \times 7$. The flattened features then no longer match the weights of the fully connected layer. We can solve this problem by using `nn.AdaptiveAvgPool2d`.
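
The arithmetic behind the mismatch can be sketched in plain Python. The helper below is hypothetical and assumes a VGG16-style network: 512 output channels, convolutions that preserve the spatial size, and five 2x2 max-pools that each halve the height and width.

```python
def vgg16_flattened_features(image_size: int, channels: int = 512, num_pools: int = 5) -> int:
    """Hypothetical helper: number of features entering the first FC layer.

    Assumes size-preserving convolutions and `num_pools` 2x2 max-pools.
    """
    spatial = image_size // (2 ** num_pools)
    return channels * spatial * spatial


fc_out = 4096  # width of the first fully connected layer
for size in (224, 512):
    n_in = vgg16_flattened_features(size)
    # image size, flattened features, weights + biases the FC layer would need
    print(size, n_in, n_in * fc_out + fc_out)
# 224 25088 102764544
# 512 131072 536875008
```

The pretrained layer only has the 102764544 parameters of the first row, so a 512x512 input cannot be fed through it without first pooling the feature map back to 7x7.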

In [4]:
import torchsummary 
import torch
from torch import nn
from prettytable import PrettyTable

def get_output_shape(model, image_dim):
    return model(torch.rand(*(image_dim))).data.shape



def count_parameters(model):
    """Counts the number of learnable parameters in each layer of a PyTorch Model.

    Args:
        model (nn.Module): The model whose parameters are counted.

    Returns:
        int: Total number of trainable parameters.
    """
    table = PrettyTable(["Modules", "Parameters"])
    total_params = 0
    for name, parameter in model.named_parameters():
        if not parameter.requires_grad:
            continue
        param = parameter.numel()
        table.add_row([name, param])
        total_params += param
    print(table)
    print(f"Total Trainable Params: {total_params}")
    return total_params

In [24]:
class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels=3, out_channels=16, kernel_size=(3,3), stride=(1,1), padding=(1,1)),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=None, padding=0, dilation=1, return_indices=False, ceil_mode=False),
            nn.Conv2d(in_channels=16, out_channels=32, kernel_size=(5,5), stride=(1,1), padding=(1,1)),
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(387200, 64),
            nn.ReLU(),
            nn.Linear(64, 10),
        )

    def forward(self, x):
        for layer in self.net:
            x = layer(x)
            print(x.size())
        return x


model = Model()
input_image_224 = torch.randn((1, 3, 224, 224))
input_image_448 = torch.randn((1, 3, 448, 448))

# Let's print it
model(input_image_224)

torch.Size([1, 16, 224, 224])
torch.Size([1, 16, 224, 224])
torch.Size([1, 16, 112, 112])
torch.Size([1, 32, 110, 110])
torch.Size([1, 32, 110, 110])
torch.Size([1, 387200])
torch.Size([1, 64])
torch.Size([1, 64])
torch.Size([1, 10])


tensor([[-0.0094,  0.0795, -0.0394,  0.0066,  0.0128, -0.1164,  0.0741,  0.1371,
          0.1820, -0.0125]], grad_fn=<AddmmBackward>)

In [16]:
count_parameters(model)

+--------------+------------+
|   Modules    | Parameters |
+--------------+------------+
| net.0.weight |    432     |
|  net.0.bias  |     16     |
| net.3.weight |   12800    |
|  net.3.bias  |     32     |
| net.6.weight |  24780800  |
|  net.6.bias  |     64     |
| net.8.weight |    640     |
|  net.8.bias  |     10     |
+--------------+------------+
Total Trainable Params: 24794794


24794794

In [6]:
# model = model.to('cuda')
# model_summary = torchsummary.summary(model, (3,224,224))
# model_summary

In [None]:
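# model(input_image_448)  # raises RuntimeError: Flatten yields 1577088 features, Linear expects 387200
count_parameters(model)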

+--------------+------------+
|   Modules    | Parameters |
+--------------+------------+
| net.0.weight |    432     |
|  net.0.bias  |     16     |
| net.3.weight |   12800    |
|  net.3.bias  |     32     |
| net.6.weight |  24780800  |
|  net.6.bias  |     64     |
| net.8.weight |    640     |
|  net.8.bias  |     10     |
+--------------+------------+
Total Trainable Params: 24794794


24794794

In [None]:
class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels=3, out_channels=16, kernel_size=(3,3), stride=(1,1), padding=(1,1)),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=None, padding=0, dilation=1, return_indices=False, ceil_mode=False),
            nn.Conv2d(in_channels=16, out_channels=32, kernel_size=(5,5), stride=(1,1), padding=(1,1)),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d((110, 110)),  # output_size must be a single int or tuple
            nn.Flatten(),
            nn.Linear(387200, 64),
            nn.ReLU(),
            nn.Linear(64, 10),
        )

    def forward(self, x):
        for layer in self.net:
            x = layer(x)
            print(x.size())
        return x
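
To see the adaptive pooling fix in action, here is a smaller self-contained variant of the model above. Pooling to 7x7 instead of 110x110 is an arbitrary choice for this sketch; it keeps the `Linear` layer at $32 \times 7 \times 7 = 1568$ inputs so the example runs quickly.

```python
import torch
from torch import nn

# Same conv stack as above; the adaptive pool fixes the spatial size before Flatten.
net = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),
    nn.Conv2d(16, 32, kernel_size=5, stride=1, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d((7, 7)),  # always outputs (N, 32, 7, 7)
    nn.Flatten(),
    nn.Linear(32 * 7 * 7, 64),
    nn.ReLU(),
    nn.Linear(64, 10),
)

with torch.no_grad():
    out_224 = net(torch.randn(1, 3, 224, 224))
    out_448 = net(torch.randn(1, 3, 448, 448))

# Both image sizes now pass through the same hardcoded Linear layer.
print(out_224.shape, out_448.shape)  # torch.Size([1, 10]) torch.Size([1, 10])
```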

## Pooling Layers

### AdaptiveAvgPool

Mostly used in CNN models to make the network accept any input image size: it pools to a fixed output size regardless of the spatial size of its input.

- https://www.zhihu.com/question/282046628
- https://stackoverflow.com/questions/58692476/what-is-adaptive-average-pooling-and-how-does-it-work

In [4]:
# target output size of 7x7 (square)
m = nn.AdaptiveAvgPool2d(7)
input = torch.randn(1, 64, 32, 32)
output = m(input)
print(output.shape)

torch.Size([1, 64, 7, 7])
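
A short sketch of how this behaves for several input sizes. The sizes 32, 56, and 224 are arbitrary choices for illustration. It also checks two special cases: when the input size is divisible by the output size, the result matches a plain `nn.AvgPool2d` with kernel and stride equal to input size divided by output size, and an output size of 1 is the same as averaging over the whole feature map.

```python
import torch
from torch import nn

pool = nn.AdaptiveAvgPool2d(7)

# The output is 7x7 whatever the input's spatial size.
shapes = [tuple(pool(torch.randn(1, 64, side, side)).shape) for side in (32, 56, 224)]
print(shapes)  # [(1, 64, 7, 7), (1, 64, 7, 7), (1, 64, 7, 7)]

# 56 is divisible by 7, so each output cell averages a non-overlapping 8x8 window.
x = torch.randn(1, 64, 56, 56)
matches_avgpool = torch.allclose(pool(x), nn.AvgPool2d(kernel_size=8)(x), atol=1e-5)

# Output size 1 is global average pooling.
matches_mean = torch.allclose(
    nn.AdaptiveAvgPool2d(1)(x), x.mean(dim=(2, 3), keepdim=True), atol=1e-5
)
print(matches_avgpool, matches_mean)  # True True
```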
