# 7.3. Padding and Stride

In [None]:
import torch
from torch import nn

### 7.3.1 Padding

In [None]:
# We define a helper function to calculate convolutions. It initializes the
# convolutional layer weights and performs corresponding dimensionality
# elevations and reductions on the input and output
def comp_conv2d(conv2d, X):
  # (1, 1) indicates that batch size and the number of channels are both 1
  X = X.reshape((1,1) + X.shape)
  Y = conv2d(X)
  # Strip the first two dimensions: examples and channels
  return Y.reshape(Y.shape[2:])
# 1 row and column is padded on either side, so a total of 2 rows or columns are added
conv2d = nn.LazyConv2d(1, kernel_size=3, padding=1)
X = torch.rand(size=(8,8))
comp_conv2d(conv2d, X).shape

torch.Size([8, 8])

When the height and width of the convolution kernel are different, we can make the output and input have the same height and width by setting different padding numbers for height and width.

In [None]:
# We use a convolution kernel with height 5 and width 3. The padding on either
# side of the height and width are 2 and 1, respectively
conv2d = nn.LazyConv2d(1, kernel_size=(5,3), padding=(2,1))
comp_conv2d(conv2d, X).shape

torch.Size([8, 8])

### 7.3.2 Stride

In [None]:
conv2d = nn.LazyConv2d(1, kernel_size=3, padding=1, stride=2)
comp_conv2d(conv2d, X).shape

torch.Size([4, 4])

In [None]:
conv2d = nn.LazyConv2d(1, kernel_size=(3,5), padding=(0,1), stride=(3,4))
comp_conv2d(conv2d, X).shape

torch.Size([2, 2])

## Discussions

- Assuming that the input shape is $n_h \times n_w$ and the convolution kernel shape is $k_h \times k_w$, the output shape will be $(n_h - k_h +1)\times (n_w-k_w + 1)$: we can only shift the convolution kernel until it runs out of pixels to cover.
- One issue when applying convolutional layers is that we tend to lose pixels on the perimeter of our image.
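
To make the shrinkage concrete, here is a minimal sketch that applies the no-padding formula $n - k + 1$ repeatedly, as if three $3 \times 3$ convolutional layers were stacked on an $8 \times 8$ input. The helper name `valid_output_size` is hypothetical and chosen only for this illustration.

```python
def valid_output_size(n, k):
    """Output length along one axis for input length n and kernel length k
    (no padding, stride 1). Hypothetical helper for illustration."""
    return n - k + 1

# Track the shape as three 3x3 convolutions are applied one after another.
h, w = 8, 8
shapes = [(h, w)]
for _ in range(3):
    h, w = valid_output_size(h, 3), valid_output_size(w, 3)
    shapes.append((h, w))

print(shapes)  # [(8, 8), (6, 6), (4, 4), (2, 2)]
```

Each layer removes $k - 1 = 2$ rows and columns, so the feature map disappears quickly unless padding is added.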

7.3.1 Padding
- A straightforward solution to the problem of losing pixels is to add extra pixels of filler around the boundary of our input image, thus increasing the effective size of the image. Typically, we set the values of the extra pixels to zero.
- In general, if we add a total of $p_h$ rows of padding (roughly half on top and half on bottom) and a total of $p_w$ columns of padding (roughly half on the left and half on the right), the output shape will be $$ (n_h - k_h + p_h +1)\times(n_w-k_w+p_w+1)$$
  - In many cases, we will want to set $p_h = k_h-1$ and $p_w = k_w-1$ to give the input and output the same height and width.
  - If $k_h$ is odd, then $p_h = k_h - 1$ is even, and we pad $p_h/2$ rows on both sides of the height. If $k_h$ is even, one possibility is to pad $\lceil p_h /2 \rceil$ rows on the top of the input and $\lfloor p_h /2 \rfloor$ rows on the bottom. We pad both sides of the width in the same way.
- CNNs commonly use convolution kernels with odd height and width values, such as 1, 3, 5, or 7.
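
The padding rule above can be sketched in plain Python. The helpers `same_padding` and `padded_output_size` are hypothetical names used only here; they compute the total padding $p = k - 1$, split it into a top/left part and a bottom/right part, and check that the output size equals the input size at stride 1.

```python
import math

def same_padding(k):
    """Return (total, before, after) padding along one axis so that the
    output size equals the input size at stride 1. Hypothetical helper."""
    total = k - 1
    before = math.ceil(total / 2)  # rows on top (or columns on the left)
    after = total // 2             # rows on bottom (or columns on the right)
    return total, before, after

def padded_output_size(n, k, p_total):
    """Output length along one axis with total padding p_total, stride 1."""
    return n - k + p_total + 1

for k in (3, 5, 4):
    total, before, after = same_padding(k)
    print(k, (before, after), padded_output_size(8, k, total))
# 3 (1, 1) 8
# 5 (2, 2) 8
# 4 (2, 1) 8
```

Note that the `padding` argument of `nn.Conv2d` is the amount added on each side, so `padding=1` corresponds to $p_h = p_w = 2$ in the notation used here. An odd kernel splits evenly, which is one reason odd kernel sizes are convenient.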

7.3.2 Stride
- Sometimes we move our window more than one element at a time, skipping the intermediate locations. This is useful if the convolution kernel is large since it captures a large area of the underlying image.
- We refer to the number of rows and columns traversed per slide as stride.
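- In general, when the stride for the height is $s_h$ and the stride for the width is $s_w$, the output shape is
$$
\lfloor(n_h-k_h+p_h+s_h)/s_h\rfloor \times \lfloor(n_w-k_w+p_w+s_w)/s_w\rfloor
$$
- If we set $p_h = k_h - 1$ and $p_w = k_w - 1$, and the input height and width are divisible by the strides, this simplifies to $(n_h/s_h)\times(n_w/s_w)$.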
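
As a sanity check, the general output-size rule $\lfloor (n - k + p + s)/s \rfloor$ can be evaluated by hand for the two strided layers in the code cells above. This is a minimal sketch: `conv_output_size` is a hypothetical helper, and it takes the per-side padding (as `nn.Conv2d` does) and doubles it to get the total $p$.

```python
def conv_output_size(n, k, pad_per_side, s):
    """Output length along one axis for input length n, kernel length k,
    per-side padding pad_per_side, and stride s. Hypothetical helper."""
    p_total = 2 * pad_per_side
    return (n - k + p_total + s) // s

# kernel_size=3, padding=1, stride=2 on an 8x8 input
first = (conv_output_size(8, 3, 1, 2), conv_output_size(8, 3, 1, 2))

# kernel_size=(3, 5), padding=(0, 1), stride=(3, 4) on an 8x8 input
second = (conv_output_size(8, 3, 0, 3), conv_output_size(8, 5, 1, 4))

print(first, second)  # (4, 4) (2, 2)
```

Both results agree with the `torch.Size` outputs printed by the notebook cells.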


- Padding can increase the height and width of the output. It is often used to give the output the same height and width as the input, which avoids undesirable shrinkage of the output. It also lets pixels near the boundary contribute to more outputs than they would without padding.
- Typically we pick symmetric padding on both sides of the input height and width, in which case we simply state that we choose padding $p$.

### Exercises

1. For audio signals, what does a stride of 2 correspond to?

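With stride 2, the filter moves two samples at a time, so it produces one output for every two input samples. The output is therefore downsampled by a factor of 2, which halves its effective sampling rate.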
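
A small 1-D illustration, assuming a toy signal and a two-tap averaging kernel (both made up for this sketch, as is the helper `conv1d`): the stride-2 output is exactly every other value of the stride-1 output.

```python
def conv1d(signal, kernel, stride=1):
    """Plain 1-D cross-correlation without padding. Hypothetical helper."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(0, len(signal) - k + 1, stride)
    ]

signal = [0, 1, 2, 3, 4, 5, 6, 7]  # toy "audio" samples
kernel = [0.5, 0.5]                # two-tap moving average

dense = conv1d(signal, kernel, stride=1)
strided = conv1d(signal, kernel, stride=2)

print(len(dense), len(strided))  # 7 4
print(strided == dense[::2])     # True
```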

2. What are the computational benefits of a stride larger than 1?

It reduces the computational cost of the layer: the kernel is applied at fewer positions, so fewer output values are computed. With stride $s$ in both dimensions, the number of outputs drops by roughly a factor of $s^2$.
Since fewer output activations are produced, memory usage is reduced as well.
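
The saving can be estimated by counting multiply-accumulate operations for a single-channel layer. This is a rough sketch under simplifying assumptions (one input channel, one output channel, no bias), and `conv_macs` is a hypothetical helper.

```python
def conv_macs(n_h, n_w, k_h, k_w, pad_per_side, stride):
    """Approximate multiply-accumulate count for a single-channel 2-D
    convolution: one k_h x k_w dot product per output position.
    Hypothetical helper for illustration."""
    out_h = (n_h - k_h + 2 * pad_per_side + stride) // stride
    out_w = (n_w - k_w + 2 * pad_per_side + stride) // stride
    return out_h * out_w * k_h * k_w

stride1 = conv_macs(8, 8, 3, 3, pad_per_side=1, stride=1)
stride2 = conv_macs(8, 8, 3, 3, pad_per_side=1, stride=2)

print(stride1, stride2, stride1 // stride2)  # 576 144 4
```

For the $8 \times 8$ input and $3 \times 3$ kernel used in this section, moving from stride 1 to stride 2 cuts both the work and the number of stored activations by a factor of 4.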