## Padding
The benefits of padding:
- It preserves more of the information at the border of an image. Without padding, border pixels are covered by fewer filter positions than interior pixels.
- It lets you use a CONV layer without necessarily shrinking the height and width of the volumes.

In [19]:
import numpy as np

# Pad the array "a" of shape (5, 5, 5, 5, 5) with pad = 1 on the 2nd dimension,
# pad = 3 on the 4th dimension, and pad = 0 on the rest.
# Each (before, after) pair gives the padding added on both sides of one dimension.
a = np.random.randn(5, 5, 5, 5, 5)
a = np.pad(a, ((0, 0), (1, 1), (0, 0), (3, 3), (0, 0)), mode='constant', constant_values=0)
print(a.shape) # (5, 7, 5, 11, 5)

(5, 7, 5, 11, 5)
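
For image data, the same call is usually applied only to the height and width dimensions. The following is a minimal sketch, assuming a batch laid out as `(m, n_H, n_W, n_C)`; the helper name `zero_pad` and that layout are assumptions for illustration, not something fixed by the notes above.

```python
import numpy as np

def zero_pad(x, pad):
    """Zero-pad height and width of a batch x with assumed shape (m, n_H, n_W, n_C)."""
    # No padding on the batch and channel dimensions, `pad` zeros on each side of H and W.
    return np.pad(x, ((0, 0), (pad, pad), (pad, pad), (0, 0)),
                  mode='constant', constant_values=0)

x = np.ones((4, 3, 3, 2))
x_pad = zero_pad(x, 2)
print(x_pad.shape)  # (4, 7, 7, 2)
```

Only `n_H` and `n_W` grow (by `2 * pad` each); the batch size and the number of channels are unchanged.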


## Convolution
- Each filter produces exactly one feature map (one channel of the output volume).

The formulas relating the output shape of the convolution to the input shape are:
$$ n_H = \left\lfloor \frac{n_{H_{prev}} - f + 2 \times pad}{stride} \right\rfloor + 1 $$
$$ n_W = \left\lfloor \frac{n_{W_{prev}} - f + 2 \times pad}{stride} \right\rfloor + 1 $$
$$ n_C = \text{number of filters used in the convolution}$$
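
The formulas above can be checked with a few lines of Python. This is a small sketch; the function name `conv_output_shape` and its argument names are chosen here for illustration. Integer division (`//`) plays the role of the floor.

```python
def conv_output_shape(n_h_prev, n_w_prev, f, pad, stride, n_filters):
    """Return (n_H, n_W, n_C) for a convolution with an f x f filter."""
    n_h = (n_h_prev - f + 2 * pad) // stride + 1
    n_w = (n_w_prev - f + 2 * pad) // stride + 1
    n_c = n_filters  # one output channel per filter
    return n_h, n_w, n_c

# f = 3, pad = 1, stride = 1 keeps height and width unchanged
print(conv_output_shape(5, 5, f=3, pad=1, stride=1, n_filters=8))  # (5, 5, 8)

# no padding and stride = 2 shrinks the volume
print(conv_output_shape(7, 7, f=3, pad=0, stride=2, n_filters=4))  # (3, 3, 4)
```

The first call illustrates the second benefit of padding: with `stride = 1` and an odd `f`, choosing `pad = (f - 1) / 2` leaves `n_H` and `n_W` equal to the input size.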

## Pooling
Pooling reduces computation and makes feature detectors **more invariant to the position** of a feature in the input.

It does not reduce the number of channels, because pooling is applied to each channel independently.

The two types of pooling are: 
- Max-pooling
- Average-pooling
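
Both types can be sketched with one loop over output positions. This is a minimal, unoptimized illustration for a single example of assumed shape `(n_H, n_W, n_C)`; the function name `pool_forward` and its parameters are hypothetical.

```python
import numpy as np

def pool_forward(x, f=2, stride=2, mode="max"):
    """Pool a single volume x of shape (n_H_prev, n_W_prev, n_C) with an f x f window."""
    n_h_prev, n_w_prev, n_c = x.shape
    # Same shape formula as for convolution, with pad = 0
    n_h = (n_h_prev - f) // stride + 1
    n_w = (n_w_prev - f) // stride + 1
    reduce_fn = np.max if mode == "max" else np.mean
    out = np.zeros((n_h, n_w, n_c))
    for h in range(n_h):
        for w in range(n_w):
            window = x[h * stride:h * stride + f, w * stride:w * stride + f, :]
            # Reduce over height and width only, so every channel is kept
            out[h, w, :] = reduce_fn(window, axis=(0, 1))
    return out

x = np.arange(32, dtype=float).reshape(4, 4, 2)
print(pool_forward(x, mode="max").shape)      # (2, 2, 2)
print(pool_forward(x, mode="average").shape)  # (2, 2, 2)
```

Height and width are halved here, while the channel count stays at 2.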

## Numpy syntax
Note that indexing with a **slice** (for example `1:2`) **retains the dimension**, whereas indexing with an integer drops it:

In [2]:
# a has two elements, and the shape of each element is (3, 3, 4)
a = np.random.randn(2,3,3,4)
print(a.shape) # (2, 3, 3, 4)

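# second element of a (index 1); integer indexing drops the first dimension
print(a[1].shape) # (3, 3, 4)

# equivalent, with explicit slices for the remaining dimensions
print(a[1,:,:,:].shape) # (3, 3, 4)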

# note that slicing will retain the dimension
print(a[1:2,:,:,:].shape) # (1, 3, 3, 4)


# indexing the last dimension: the ellipsis (...) stands for all leading dimensions
print(np.array_equal(a[..., 1], a[:,:,:,1])) # True