## Convolution
- Steps: **Convolution** -> **Add bias** -> **ReLU**
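- Two padding modes:
    - Valid: no padding, so the output is smaller than the input.
    - Same: the input is padded so that the output size equals the input size (for stride 1, $pad = \frac{f-1}{2}$).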
- Filter
    - One filter contributes to **one feature map**.
    - Filter size is usually **odd**.
    - The filter and previous layer must have the **same number of channels**.

- The formulas relating the output shape of the convolution to the input shape are:
$$ n_H = \lfloor \frac{n_{H_{prev}} - f + 2 \times pad}{stride} \rfloor +1 $$
$$ n_W = \lfloor \frac{n_{W_{prev}} - f + 2 \times pad}{stride} \rfloor +1 $$
$$ n_C = \text{number of filters (i.e., number of channels) used in the convolution}$$

- As the network gets deeper, $n_H$ and $n_W$ typically shrink while $n_C$ grows.
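
The steps and shape formulas above can be sketched in NumPy. This is a deliberately naive, loop-based illustration for a single image, not an efficient implementation; the function names `conv_output_shape` and `conv_forward` and the `(height, width, channels)` layout are assumptions made for this example.

```python
import numpy as np

def conv_output_shape(n_h_prev, n_w_prev, f, pad, stride, n_filters):
    """Apply the output-shape formulas for a square f x f filter."""
    n_h = (n_h_prev - f + 2 * pad) // stride + 1
    n_w = (n_w_prev - f + 2 * pad) // stride + 1
    return n_h, n_w, n_filters

def conv_forward(x, w, b, pad=0, stride=1):
    """Convolution -> add bias -> ReLU for one image.

    x: (n_h_prev, n_w_prev, n_c_prev), w: (f, f, n_c_prev, n_c), b: (n_c,)
    """
    f, _, n_c_prev, n_c = w.shape
    # The filter and the previous layer must have the same number of channels.
    assert x.shape[2] == n_c_prev
    n_h, n_w, _ = conv_output_shape(x.shape[0], x.shape[1], f, pad, stride, n_c)
    # Zero-pad height and width only, never the channel axis.
    x_pad = np.pad(x, ((pad, pad), (pad, pad), (0, 0)), mode="constant")
    z = np.zeros((n_h, n_w, n_c))
    for i in range(n_h):
        for j in range(n_w):
            window = x_pad[i * stride:i * stride + f, j * stride:j * stride + f, :]
            for c in range(n_c):
                # One filter produces one feature map (one output channel).
                z[i, j, c] = np.sum(window * w[:, :, :, c]) + b[c]
    return np.maximum(z, 0)  # ReLU

x = np.random.randn(6, 6, 3)
w = np.random.randn(3, 3, 3, 8)  # eight 3x3 filters, each with 3 channels
b = np.random.randn(8)
print(conv_forward(x, w, b, pad=0).shape)  # valid: (4, 4, 8)
print(conv_forward(x, w, b, pad=1).shape)  # same:  (6, 6, 8)
```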

## Padding

The benefits of padding:
- It helps us keep more of the information at the border of an image. 
- It allows you to use a CONV layer without necessarily shrinking the height and width of the volumes.

In [19]:
import numpy as np

# Zero-pad the array `a` of shape (5, 5, 5, 5, 5): pad = 1 on the 2nd dimension, pad = 3 on the 4th dimension, pad = 0 elsewhere
a = np.random.randn(5, 5, 5, 5, 5)
a = np.pad(a, ((0,0), (1,1), (0,0), (3,3), (0,0)), 'constant')
print(a.shape) # (5, 7, 5, 11, 5)

(5, 7, 5, 11, 5)
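
For a batch of images, only the height and width axes are usually padded. The sketch below assumes a `(m, n_H, n_W, n_C)` layout and a hypothetical helper `zero_pad`; it checks that `pad = (f - 1) / 2` keeps the spatial size unchanged for stride 1 and an odd filter size.

```python
import numpy as np

def zero_pad(x, pad):
    """Zero-pad the height and width axes of a batch shaped (m, n_h, n_w, n_c)."""
    return np.pad(x, ((0, 0), (pad, pad), (pad, pad), (0, 0)),
                  mode="constant", constant_values=0)

f = 3                 # odd filter size
pad = (f - 1) // 2    # "same" padding for stride 1
x = np.random.randn(4, 5, 5, 2)
x_pad = zero_pad(x, pad)
print(x_pad.shape)    # (4, 7, 7, 2)

# Output height of a stride-1 convolution on the padded input.
n_h_out = (x.shape[1] - f + 2 * pad) // 1 + 1
print(n_h_out)        # 5, the same as the input height
```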


## Pooling
Pooling reduces computation and makes feature detectors **more invariant to their position** in the input.  

It operates on each channel independently, so it will **not reduce** the number of channels.

A pooling layer has **no trainable parameters**.

The two types of pooling are: 
- Max-pooling
- Average-pooling
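
Both pooling types can be sketched with the same sliding-window loop. The helper name `pool_forward` and the single-image `(n_H, n_W, n_C)` layout are assumptions for this illustration; note that nothing in the function is learned, and the channel count is unchanged.

```python
import numpy as np

def pool_forward(x, f=2, stride=2, mode="max"):
    """Max or average pooling over one image shaped (n_h_prev, n_w_prev, n_c)."""
    n_h_prev, n_w_prev, n_c = x.shape
    n_h = (n_h_prev - f) // stride + 1
    n_w = (n_w_prev - f) // stride + 1
    reduce_fn = np.max if mode == "max" else np.mean
    out = np.zeros((n_h, n_w, n_c))
    for i in range(n_h):
        for j in range(n_w):
            window = x[i * stride:i * stride + f, j * stride:j * stride + f, :]
            # Reduce over height and width only, so each channel is pooled independently.
            out[i, j, :] = reduce_fn(window, axis=(0, 1))
    return out

x = np.arange(16, dtype=float).reshape(4, 4, 1)
max_out = pool_forward(x, mode="max")
avg_out = pool_forward(x, mode="average")
print(max_out[:, :, 0])
print(avg_out[:, :, 0])

# The number of channels is preserved.
print(pool_forward(np.random.randn(8, 8, 3)).shape)  # (4, 4, 3)
```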

## Fully connected
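Flatten the output of the last convolution (or pooling) layer into a single vector, then feed it to the fully connected layers.

If the activation size (i.e., the length of the flattened vector) drops too quickly from layer to layer, model performance tends to suffer.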
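
A minimal sketch of the flatten step, assuming a batch of activations shaped `(m, n_H, n_W, n_C)`; the sizes and the 120-unit dense layer are arbitrary values chosen for illustration.

```python
import numpy as np

a = np.random.randn(10, 5, 5, 16)   # (m, n_h, n_w, n_c)

# Flatten everything except the batch axis: 5 * 5 * 16 = 400 values per example.
flat = a.reshape(a.shape[0], -1)
print(flat.shape)                   # (10, 400)

# A fully connected layer is then a matrix multiply, a bias, and an activation.
w = np.random.randn(400, 120) * 0.01
b = np.zeros(120)
fc = np.maximum(flat @ w + b, 0)    # ReLU
print(fc.shape)                     # (10, 120)
```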

## How to count the number of layers in a CNN?
**Layer** definition: a layer must have trainable parameters.

Count **[CONV + POOL]** as one layer, since the pooling layer has no trainable parameters.
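
As a small illustration of this counting convention, the sketch below counts only the stages that carry trainable parameters. The architecture list is a hypothetical example, not one taken from these notes.

```python
# Hypothetical architecture, listed stage by stage.
architecture = ["CONV", "POOL", "CONV", "POOL", "FC", "FC", "FC"]

# Only stages with trainable parameters count as layers.
HAS_PARAMS = {"CONV": True, "POOL": False, "FC": True}

n_layers = sum(HAS_PARAMS[kind] for kind in architecture)
print(n_layers)  # 5: two [CONV + POOL] layers plus three FC layers
```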

## What are the advantages of a CNN compared to a fully connected NN?
- **Parameter sharing**: a filter that is useful in one part of the image is probably useful in another part of the image.
- **Local connection**: each output value depends only on a small local region of the input, not on the whole input.
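
To make both points concrete, the sketch below compares parameter counts for mapping a 32x32x3 input to a 28x28x6 output with a convolution versus a fully connected layer. The sizes are illustrative assumptions, and `conv_params` and `dense_params` are hypothetical helper names.

```python
def conv_params(f, n_c_prev, n_c):
    """Each of the n_c filters has f * f * n_c_prev weights plus one bias."""
    return (f * f * n_c_prev + 1) * n_c

def dense_params(n_in, n_out):
    """Each of the n_out units has n_in weights plus one bias."""
    return (n_in + 1) * n_out

n_in = 32 * 32 * 3    # 3072 input values
n_out = 28 * 28 * 6   # 4704 output values (six 5x5 filters, valid, stride 1)

print(conv_params(f=5, n_c_prev=3, n_c=6))  # 456
print(dense_params(n_in, n_out))            # 14455392

# Local connection: each conv output reads only f * f * n_c_prev inputs.
print(5 * 5 * 3, "inputs per output value instead of", n_in)
```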

## Numpy syntax
Note that indexing with a **slice** (e.g. `1:2`) **retains the dimension**, whereas indexing with an integer drops it:

In [2]:
# a has two elements, and the shape of each element is (3, 3, 4)
a = np.random.randn(2,3,3,4)
print(a.shape) # (2, 3, 3, 4)

# second element of a (index 1); integer indexing drops the first dimension
print(a[1].shape) # (3, 3, 4)

# another way
print(a[1,:,:,:].shape) # (3, 3, 4)

# note that slicing will retain the dimension
print(a[1:2,:,:,:].shape) # (1, 3, 3, 4)


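# last dimension: `...` (Ellipsis) stands for as many `:` as needed, so both expressions select the same values
print(np.array_equal(a[..., 1], a[:, :, :, 1])) # True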