# 05 - Input and output shapes of neural network layers

When defining neural networks, we have to make sure that consecutive layers are compatible. In practice, this means that the output shape of layer ``l`` must match the input shape required by layer ``l+1``.

We introduced PyTorch's standard dimension notations in the second tutorial, `02_using_nn_sequential_and_training_loop.ipynb`. Here, we will use a specific example to illustrate how important it is to:

1. carefully read the documentation
2. understand how input and output shapes are defined. 

You do not need to understand what a convolutional layer or a max-pooling layer is yet; you only need to read the documentation. We will study these layers in depth later on, so don't worry if you're struggling a bit now. If you are curious, you can still watch **Andrew Ng's videos about convolution and pooling for detailed info (especially from C4W1L02 to C4W1L11)**. But don't push it if they are more confusing than helpful for now! You can come back to this document once you are more familiar with deep learning concepts. :)

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import datasets, transforms

## Designing a specific example

### Loading data

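For this specific example, we will use a (preprocessed) batch from the CIFAR-10 dataset. The default preprocessor center-crops each image to `28x32` (height x width), so that the height and width differ and are easier to tell apart in the shapes below.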

In [None]:
def load_CIFAR10(data_path='../data/', preprocessor=None):
    
    if preprocessor is None:
        preprocessor = transforms.Compose([
            transforms.CenterCrop((28, 32)),
            transforms.ToTensor(),
        ])
    
    data_train_val = datasets.CIFAR10(
        data_path,       
        train=True,      
        download=True,
        transform=preprocessor)
    
    return data_train_val

data = load_CIFAR10()
loader = torch.utils.data.DataLoader(data, batch_size=512, shuffle=False)

# Take the first batch of our dataset
(batch, labels) = next(iter(loader))

print("\nShape of a batch for our specific example:\n", batch.shape)

### Defining a neural network

For this specific example, we define a very simple convolutional network. Note that we don't use the functional API here for non-trainable layers such as MaxPool2d or Flatten. This is only for clarity: declaring every layer in `__init__` makes it easier to refer to each of them below.

In [None]:
class MyNet(nn.Module):
    def __init__(self):
        super().__init__() 
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=(3, 4), padding=1)  
        self.pool1 = nn.MaxPool2d(2)
        self.conv2 = nn.Conv2d(16, 10, kernel_size=3)
        self.pool2 = nn.MaxPool2d(2)
        self.flat = nn.Flatten()
        self.fc1 = nn.Linear(in_features=10 * 6 * 6, out_features=32)
        self.fc2 = nn.Linear(32, 10)

    def forward(self, x):
        print("input shape:              ", x.shape)
        
        out = self.conv1(x)
        print("After 1st convolution:    ", out.shape)
        
        out = torch.relu(self.pool1(out))
        print("After 1st MaxPool2d:      ", out.shape)
        
        out = self.conv2(out)
        print("After 2nd convolution:    ", out.shape)
        
        out = torch.relu(self.pool2(out))
        print("After 2nd MaxPool2d:      ", out.shape)
        
        out = self.flat(out)
        print("After flattening:         ", out.shape)
        
        out = torch.relu(self.fc1(out))
        print("After 1st linear:         ", out.shape)
        
        out = self.fc2(out)
        print("After 2nd linear:         ", out.shape)
        return out
    
model = MyNet()
out = model(batch)

## Figuring out shapes when stacking layers

### Shape of the data

When defining `MyNet`, the objective is to deal with batches of `N=512` RGB images of size `28x32` (height x width), which means inputs have shape ``(512, 3, 28, 32)``. There are 10 classes in this dataset, so the output of `MyNet` must be of shape `(512, 10)`. Note that, in general, the batch size `N` is not changed by layers or activation functions.

### 1st layer: Convolution, [nn.Conv2d](https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html#conv2d)

Let's take a closer look at the following line (line 4 in the cell above):

```python
nn.Conv2d(in_channels=3, out_channels=16, kernel_size=(3, 4), padding=1)  
```

Here: 
- ``in_channels = 3`` because we have 3 channels in our data (RGB). You are not free to choose this value: it must match the number of channels of this layer's input (here, your data).
- ``out_channels = 16``. You are free to choose this value; it is the number of convolutional filters in the layer. It also defines `C_out`.
- ``kernel_size = (3, 4)``. The dimensions (height, width) of your convolutional filters. You are free to choose them, but they affect `H_out` and `W_out`.
- ``stride = 1`` (default value). You are free to choose this value, but it affects `H_out` and `W_out` as well.
- ``padding = 1``. You are also free to choose this value, but it affects `H_out` and `W_out` as well.

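To compute the values of ``H_out`` and ``W_out``, we use the formula given in the documentation, simplified here for the default ``dilation = 1``. Index 0 refers to the height and index 1 to the width; when a parameter is given as a single integer, the same value is used for both:

$$H_{out} = \Big\lfloor \frac{H_{in} + 2 \times padding[0] - kernel\_size[0]}{stride[0]} + 1 \Big\rfloor$$ 
$$W_{out} = \Big\lfloor \frac{W_{in} + 2 \times padding[1] - kernel\_size[1]}{stride[1]} + 1 \Big\rfloor$$ 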

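Therefore, the input shape ``(N, C_in, H_in, W_in)`` of this layer is `(512, 3, 28, 32)`. With `stride = 1` and `padding = 1`, we get `H_out = 28 + 2 - 3 + 1 = 28` and `W_out = 32 + 2 - 4 + 1 = 31`, so the output shape ``(N, C_out, H_out, W_out)`` of this layer is `(512, 16, 28, 31)`.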
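
The arithmetic for this layer can be double-checked with a few lines of plain Python. This is only a minimal sketch: `conv2d_output_size` is a hypothetical helper name (it is not a PyTorch function), and it assumes `dilation=1`, so it covers only the simplified formula used in this section.

```python
import math


def conv2d_output_size(size_in, kernel_size, stride=1, padding=0):
    """Output height or width of a convolution along ONE spatial axis.

    Hypothetical helper (not part of PyTorch); assumes dilation=1.
    """
    return math.floor((size_in + 2 * padding - kernel_size) / stride + 1)


# conv1: kernel_size=(3, 4), stride=1, padding=1, applied to 28x32 images
h_out = conv2d_output_size(28, kernel_size=3, padding=1)
w_out = conv2d_output_size(32, kernel_size=4, padding=1)

print((512, 16, h_out, w_out))  # (512, 16, 28, 31)
```

The height and the width are handled separately because the kernel is not square: the height uses `kernel_size[0] = 3` and the width uses `kernel_size[1] = 4`.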

### Intermediate layer: MaxPool, [nn.MaxPool2d](https://pytorch.org/docs/stable/generated/torch.nn.MaxPool2d.html#maxpool2d)

```python
nn.MaxPool2d(2)  #line 5
```

Which means: 

- ``kernel_size=2``. The dimensions of your pooling filters. You are free to choose them, but they affect `H_out` and `W_out`.
- ``stride=kernel_size`` (default value). You are free to choose this value, but it affects `H_out` and `W_out`. When ``stride=kernel_size``, the output dimensions `H_out` and `W_out` are the input dimensions `H_in` and `W_in` divided by ``kernel_size`` and rounded down (here `31 / 2 = 15.5` becomes `15`).

Therefore the input/output shapes are:

- `(N, C_in, H_in, W_in) = (512, 16, 28, 31)`
- `(N, C_out, H_out, W_out) = (512, 16, 14, 15)`

Note that for a `nn.MaxPool2d` layer, there are no requirements on any of the shape variables `N, C_in, H_in, W_in`, and you are free to choose its parameters. The number of channels is unchanged: `C_out = C_in`.
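
The rounding matters here because the width `31` is odd. The sketch below reproduces the computation in plain Python; `maxpool2d_output_size` is a hypothetical helper name, and it assumes the defaults `padding=0`, `dilation=1` and `ceil_mode=False`.

```python
def maxpool2d_output_size(size_in, kernel_size, stride=None):
    """Output height or width of a max-pooling layer along ONE spatial axis.

    Hypothetical helper (not part of PyTorch); assumes padding=0,
    dilation=1 and ceil_mode=False, which are the defaults.
    """
    if stride is None:
        stride = kernel_size  # same default as nn.MaxPool2d
    # Integer division rounds down, like the floor in the documentation
    return (size_in - kernel_size) // stride + 1


print(maxpool2d_output_size(28, 2))  # 14
print(maxpool2d_output_size(31, 2))  # 15 (the last column is left out)
```

With a window of size 2 moving by steps of 2, only 15 complete windows fit in 31 columns, so the last column does not contribute to the output.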

### 2nd layer: Convolution again

```python
self.conv2 = nn.Conv2d(16, 10, kernel_size=3) #line 6
```

Which means:

- ``in_channels = 16``. Which is indeed compatible with the previous layer where we had `C_out=16`.
- ``out_channels = 10``, ``kernel_size = 3``, ``stride = 1`` (default value), ``padding = 0`` (default value). We are free to choose all of these (but they affect ``H_out`` and ``W_out``).

Therefore the input/output shapes are:

- `(N, C_in, H_in, W_in)=(512, 16, 14, 15)`
- `(N, C_out, H_out, W_out)=(512, 10, 12, 13)`
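
As a worked check, we can plug the values into the same formula as for the first convolution. The code leaves `padding` at its default of 0 and `stride` at its default of 1, and the kernel is a square `3x3`:

$$H_{out} = \Big\lfloor \frac{14 + 2 \times 0 - 3}{1} + 1 \Big\rfloor = 12$$
$$W_{out} = \Big\lfloor \frac{15 + 2 \times 0 - 3}{1} + 1 \Big\rfloor = 13$$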

### 2nd Intermediate layer: MaxPool, [nn.MaxPool2d](https://pytorch.org/docs/stable/generated/torch.nn.MaxPool2d.html#maxpool2d)

```python
nn.MaxPool2d(2)  #line 7
```

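The parameters are the same as for the first max-pooling layer (``kernel_size=2``, ``stride=kernel_size``), so both spatial dimensions are divided by 2 and rounded down (`13 / 2 = 6.5` becomes `6`). Therefore the input/output shapes are: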

- `(N, C_in, H_in, W_in)=(512, 10, 12, 13)`
- `(N, C_out, H_out, W_out)=(512, 10, 6, 6)`

### 3rd layer: Fully connected layer [nn.Linear](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html#linear)

#### Flatten layer, [nn.Flatten](https://pytorch.org/docs/stable/generated/torch.nn.Flatten.html#flatten)

A fully connected layer requires a different type of input shape: `(N, in_features)`, and it outputs batches of shape `(N, out_features)`. So the inputs must be flattened first, see line 8.

So, for the flatten layer we have as input dimension `(N, C_in, H_in, W_in)=(512, 10, 6, 6)`, and as output dimension `(N, out_features)=(512, 10*6*6)`, that is, `(512, 360)`.
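
The effect of flattening on the shape can be sketched in plain Python. This only mimics the shape arithmetic of `nn.Flatten()` with its default arguments (keep the batch dimension, merge all the others); the variable names are illustrative.

```python
import math

# Shape after the second max-pooling layer: (N, C, H, W)
shape_before = (512, 10, 6, 6)

# Keep dimension 0 (the batch) and merge all remaining dimensions into one
n, *rest = shape_before
shape_after = (n, math.prod(rest))

print(shape_after)  # (512, 360)
```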

#### Linear layer

```python
nn.Linear(in_features=10 * 6 * 6, out_features=32)  #line 9
```

Which means:

- ``in_features = 10*6*6``. You are not free to choose this value: it must match the number of features of this layer's input (here, the output of the flatten layer). This is indeed compatible with the previous layer, where we had `out_features=10*6*6`.
- ``out_features = 32``. You are free to choose what you want here.

So, for this layer we have as input dimension `(N, in_features)=(512, 10*6*6)`, and as output dimension `(N, out_features)=(512, 32)`
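
As a sketch of why `in_features` is constrained: according to its documentation, `nn.Linear` computes $y = xW^{T} + b$, where the weight $W$ has shape `(out_features, in_features)`. The symbols $x$, $y$, $W$ and $b$ are notation introduced here for illustration. With the shape written under each term, the matrix product is only defined when the last dimension of $x$ equals `in_features`:

$$\underbrace{y}_{(512,\,32)} = \underbrace{x}_{(512,\,360)} \, \underbrace{W^{T}}_{(360,\,32)} + \underbrace{b}_{(32)}$$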

### 4th layer: Fully connected layer [nn.Linear](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html#linear)

```python
self.fc2 = nn.Linear(32, 10) # line 10
```

Which means:

- ``in_features = 32``. This is indeed compatible with the previous layer.
- ``out_features = 10``. You are not free to choose this value because it is the last layer of your model: it must match the number of classes in your dataset.

Therefore the input/output shapes are:

- `(N, in_features)=(512, 32)`
- `(N, out_features)=(512, 10)`
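
To summarize the whole walkthrough, the sketch below traces the shapes through every layer of `MyNet` using only the formulas above, without running PyTorch. The function names `sliding_window_out` and `trace_mynet_shapes` are hypothetical helpers written for this illustration, and the layer parameters are hard-coded copies of those in `MyNet` (with `dilation=1` assumed everywhere).

```python
def sliding_window_out(size_in, kernel_size, stride, padding=0):
    """Output size along one axis for Conv2d / MaxPool2d (dilation=1)."""
    return (size_in + 2 * padding - kernel_size) // stride + 1


def trace_mynet_shapes(n=512, h=28, w=32):
    """Return (layer name, output shape) pairs for the layers of MyNet."""
    shapes = [("input", (n, 3, h, w))]

    # conv1: 16 filters, kernel (3, 4), stride 1, padding 1
    h, w = sliding_window_out(h, 3, 1, 1), sliding_window_out(w, 4, 1, 1)
    shapes.append(("conv1", (n, 16, h, w)))

    # pool1: kernel 2, stride 2 (stride defaults to kernel_size)
    h, w = sliding_window_out(h, 2, 2), sliding_window_out(w, 2, 2)
    shapes.append(("pool1", (n, 16, h, w)))

    # conv2: 10 filters, kernel 3, stride 1, padding 0
    h, w = sliding_window_out(h, 3, 1), sliding_window_out(w, 3, 1)
    shapes.append(("conv2", (n, 10, h, w)))

    # pool2: kernel 2, stride 2
    h, w = sliding_window_out(h, 2, 2), sliding_window_out(w, 2, 2)
    shapes.append(("pool2", (n, 10, h, w)))

    # flat: merge channels, height and width into one feature dimension
    shapes.append(("flat", (n, 10 * h * w)))

    # fc1 and fc2 only change the feature dimension
    shapes.append(("fc1", (n, 32)))
    shapes.append(("fc2", (n, 10)))
    return shapes


for name, shape in trace_mynet_shapes():
    print(f"{name:<6} {shape}")
```

Calling `trace_mynet_shapes(h=32, w=32)` (uncropped CIFAR-10 images) gives `(512, 420)` after flattening, which no longer matches `in_features=10 * 6 * 6` of `fc1`. This is exactly the kind of incompatibility that the shape bookkeeping above is meant to catch.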
