## CNN Lecture 4: Padding & Strides

### I. Padding

Padding is an operation designed to address two primary problems that arise during standard convolution operations.

#### A. Problems Solved by Padding
1.  **Size Reduction and Information Loss (Resolution Loss)**:
    *   When a filter is convolved over an image, the resulting **feature map** is smaller than the input image. For instance, a $5 \times 5$ image processed by a $3 \times 3$ filter results in a $3 \times 3$ feature map.
    *   Applying successive convolutional layers causes the image size to progressively shrink (e.g., $28 \times 28 \to 26 \times 26 \to 24 \times 24$, etc.). This continuous reduction leads to information loss, which is undesirable.
2.  **Uneven Feature Importance (Border Pixels)**:
    *   Pixels located on the side or border of the image participate in fewer convolution operations compared to pixels in the centre.
    *   For example, with a $3 \times 3$ filter on a $5 \times 5$ image, the central pixel is covered by all nine filter positions, giving it a greater "say" or importance in the final feature map. A corner pixel, however, is covered by only one.
    *   If crucial information resides in the border pixels, it may not be adequately captured and preserved in the feature map.
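
The uneven participation of pixels can be checked numerically. The following is a minimal sketch, assuming a stride of 1 and no padding; the helper name `participation_counts` is made up for illustration. It counts how many filter positions cover each pixel of the image.

```python
import numpy as np

def participation_counts(n, f):
    """Count, for each pixel of an n x n image, how many f x f windows
    (stride 1, no padding) include that pixel."""
    counts = np.zeros((n, n), dtype=int)
    positions = n - f + 1  # number of window positions along each axis
    for i in range(positions):
        for j in range(positions):
            # Every pixel under the current window takes part in one more operation.
            counts[i:i + f, j:j + f] += 1
    return counts

counts = participation_counts(5, 3)
print(counts)
```

For the $5 \times 5$ image and $3 \times 3$ filter, the corner entries of `counts` are 1 and the centre entry is 9, which is the imbalance described above.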

#### B. The Padding Mechanism
*   Padding involves adding extra columns and rows (a boundary) around the input image.
*   The pixel values added in padding are typically **zeroes**. This operation is therefore commonly referred to as **Zero Padding** in literature.
*   The ultimate goal of padding is often to ensure that the size of the feature map matches the size of the input image.
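
As a small illustration of the mechanism, the sketch below zero-pads a toy $5 \times 5$ array with NumPy's `np.pad`. The array contents and the variable names are arbitrary choices for the example, not part of the lecture.

```python
import numpy as np

image = np.arange(1, 26).reshape(5, 5)  # toy 5 x 5 "image" with values 1..25
p = 1                                   # padding width added on every side

# mode="constant" with constant_values=0 adds a border of zeros.
padded = np.pad(image, pad_width=p, mode="constant", constant_values=0)

print(image.shape)   # (5, 5)
print(padded.shape)  # (7, 7)
print(padded)
```

The original values sit untouched in the middle of `padded`, surrounded by one row or column of zeros on each side.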

#### C. Feature Map Size Calculation with Padding
The side length of the resulting feature map for a stride of 1, where $N$ is the image size, $F$ is the filter size, and $P$ is the padding added on each side, is:
$$\mathbf{(N + 2P - F + 1)}$$

*   *Example:* For a $5 \times 5$ image ($N=5$) and a $3 \times 3$ filter ($F=3$), if padding $P=1$ is applied, the output size is $5 + 2(1) - 3 + 1 = 5$, resulting in a $5 \times 5$ feature map.
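
The formula is easy to turn into a helper. This is a sketch assuming square inputs, square filters, and a stride of 1; the function names `output_size` and `same_padding` are hypothetical. Setting $N + 2P - F + 1 = N$ gives $P = (F - 1)/2$, which is the padding that preserves the input size for an odd filter size.

```python
def output_size(n, f, p=0):
    """Feature map side length for stride 1: N + 2P - F + 1."""
    return n + 2 * p - f + 1

def same_padding(f):
    """Padding that keeps output size equal to input size (stride 1, odd F).

    Derived from N + 2P - F + 1 = N, i.e. P = (F - 1) / 2.
    """
    if f % 2 == 0:
        raise ValueError("this sketch only handles odd filter sizes")
    return (f - 1) // 2

print(output_size(5, 3))       # 3  (no padding)
print(output_size(5, 3, p=1))  # 5  (padding of 1 restores the size)
print(same_padding(3))         # 1
print(same_padding(5))         # 2
```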

#### D. Keras Implementation of Padding
When working with Keras, two main padding options are available:
1.  **'Valid'**: This option applies **no padding** (it functions as standard convolution).
2.  **'Same'**: Keras automatically determines the padding required so that, with the default stride of 1, the output feature map has the **same** size as the input image. (With a larger stride $S$, the output size is $\lceil N / S \rceil$ instead.)
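
To see what the two options do to the output shape, here is a naive NumPy sketch. It is not how Keras implements convolution; it only mimics the `'valid'` and `'same'` behaviour for a stride of 1, a square image, and an odd square filter. The function name `conv2d_naive` is an assumption for this example.

```python
import numpy as np

def conv2d_naive(image, kernel, padding="valid"):
    """Stride-1 2D cross-correlation (what CNN layers call convolution)."""
    f = kernel.shape[0]
    if padding == "same":
        p = (f - 1) // 2  # padding that preserves the size for odd f
        image = np.pad(image, p, mode="constant", constant_values=0)
    elif padding != "valid":
        raise ValueError("padding must be 'valid' or 'same'")
    out_h = image.shape[0] - f + 1
    out_w = image.shape[1] - f + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Element-wise product of the window and the filter, then sum.
            out[i, j] = np.sum(image[i:i + f, j:j + f] * kernel)
    return out

image = np.ones((5, 5))
kernel = np.ones((3, 3))

valid_out = conv2d_naive(image, kernel, padding="valid")
same_out = conv2d_naive(image, kernel, padding="same")
print(valid_out.shape)  # (3, 3)
print(same_out.shape)   # (5, 5)
```

With all-ones inputs, the corner values of `same_out` are 4 rather than 9, because part of the window falls on the zero border there.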

### II. Strides

Strides define the movement of the convolutional filter across the input image.

#### A. Definition and Movement
*   The default stride value is **$1 \times 1$**. This means the filter moves one pixel to the right after each operation, and then one pixel downwards when moving to the next row.
*   Strides can be changed (e.g., to $2 \times 2$). A stride of $2 \times 2$ means the filter jumps two pixels horizontally and two pixels vertically for the next convolution operation.
*   If the stride value is greater than one, the process is called **Strided Convolution**.
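
The movement can be made concrete by listing where the filter's top-left corner lands along one axis. This is a minimal sketch with a hypothetical helper `window_starts`, assuming no padding.

```python
def window_starts(n, f, s):
    """Start indices along one axis where an f-wide filter fits inside
    an n-wide input when it moves s pixels at a time."""
    return list(range(0, n - f + 1, s))

print(window_starts(7, 3, 1))  # [0, 1, 2, 3, 4]
print(window_starts(7, 3, 2))  # [0, 2, 4]
```

With a stride of 2, every second position is skipped, so fewer convolution operations are performed along each axis.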

#### B. Effect on Feature Map Size
*   Increasing the stride value **decreases the size** of the resulting feature map.

#### C. Feature Map Size Calculation with Strides
The general formula for calculating the size of the feature map, including padding ($P$), filter size ($F$), image size ($N$), and stride ($S$), is:
$$\mathbf{(N + 2P - F) / S + 1}$$

If padding is zero ($P=0$), the formula simplifies to:
$$(N - F) / S + 1$$

#### D. Special Case: Non-Integer Results
When dividing by the stride ($S$) results in a decimal value, it signifies that there are insufficient pixels remaining to perform the final convolution operation.
*   In this scenario, a **floor operation** is applied to the result of the division (rounding down to the nearest integer) before adding 1. This ensures that only completed convolution operations are considered in the final size.
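
The general formula, including the floor operation, can be sketched as a small function. The name `conv_output_size` is hypothetical, and square inputs and filters are assumed.

```python
def conv_output_size(n, f, p=0, s=1):
    """Feature map side length: floor((N + 2P - F) / S) + 1."""
    if n + 2 * p < f:
        raise ValueError("filter is larger than the padded input")
    # Integer division (//) performs the floor for non-negative values.
    return (n + 2 * p - f) // s + 1

print(conv_output_size(7, 3, s=2))       # (7 - 3) / 2 = 2.0  -> 3
print(conv_output_size(6, 3, s=2))       # (6 - 3) / 2 = 1.5  -> floor 1 -> 2
print(conv_output_size(5, 3, p=1, s=1))  # 5
```

In the second call the division is not exact: the last column of the $6 \times 6$ input never sits under a complete filter window, so it does not contribute an output value.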

#### E. Why Strides Are Necessary
Using strided convolution means the filter is skipping information, taking "jumps" across the image. Two primary reasons for using strided convolution are:
1.  **High-Level Feature Extraction**: If the goal is to capture only **high-level, coarse features** and ignore low-level features (the fine details or subtleties of the image), a larger stride is used. A smaller stride allows the CNN to capture these low-level features.
2.  **Computational Efficiency**: When training is required on large datasets, increasing the stride speeds up the process because it rapidly reduces the size of the data being processed, saving computational power. (Note: This reason has become less critical recently due to advances in computing power, and models often use a default stride of 1.)
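
A rough way to see the efficiency gain is to count how many times the filter is applied for different strides. The sketch below assumes a $28 \times 28$ input, a $3 \times 3$ filter, and no padding; `num_filter_positions` is a made-up helper name.

```python
def num_filter_positions(n, f, s, p=0):
    """Number of filter applications for one filter on an n x n input."""
    side = (n + 2 * p - f) // s + 1  # feature map side length
    return side * side

for s in (1, 2, 3):
    print(s, num_filter_positions(28, 3, s))
# stride 1 -> 676 positions, stride 2 -> 169, stride 3 -> 81
```

Going from a stride of 1 to a stride of 2 cuts the number of filter applications to a quarter here, and every later layer also receives a smaller input.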

#### F. Keras Implementation of Strides
In Keras, strides are set with the `strides` parameter of `Conv2D`, which accepts an integer or a tuple (e.g., `(2, 2)`). The tuple specifies the step along the height (vertical direction, first number) and along the width (horizontal direction, second number). With `padding='same'`, a $2 \times 2$ stride on a $28 \times 28$ input image halves the size at each consecutive convolutional layer ($28 \times 28 \to 14 \times 14 \to 7 \times 7$). With `padding='valid'` and a $3 \times 3$ filter, the sizes would instead be $28 \times 28 \to 13 \times 13 \to 6 \times 6$.
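
The size progression across stacked strided layers can be traced without building a model. The sketch below uses the output-size rules documented for TensorFlow's padding modes (`'same'` gives $\lceil N / S \rceil$, `'valid'` gives $\lfloor (N - F) / S \rfloor + 1$). The helper names are hypothetical, and square inputs and filters are assumed.

```python
import math

def keras_like_output_size(n, f, s, padding):
    """Side length of a Conv2D output under 'same' or 'valid' padding."""
    if padding == "same":
        return math.ceil(n / s)
    if padding == "valid":
        return (n - f) // s + 1
    raise ValueError("padding must be 'valid' or 'same'")

def size_chain(n, f, s, padding, num_layers):
    """Sizes after each of num_layers identical convolutional layers."""
    sizes = [n]
    for _ in range(num_layers):
        sizes.append(keras_like_output_size(sizes[-1], f, s, padding))
    return sizes

print(size_chain(28, 3, 2, "same", 2))   # [28, 14, 7]
print(size_chain(28, 3, 2, "valid", 2))  # [28, 13, 6]
```

The $28 \to 14 \to 7$ progression therefore corresponds to `'same'` padding; without padding the same two layers give $28 \to 13 \to 6$.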

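```python
from tensorflow import keras
from tensorflow.keras import layers

# Example model: the comments give the output shape of each layer.
model = keras.Sequential()
model.add(keras.Input(shape=(50, 50, 3)))
# Default padding is 'valid': 50 - 3 + 1 = 48 -> (48, 48, 32)
model.add(layers.Conv2D(32, (3, 3), activation='relu'))
# Explicit 'valid': 48 - 3 + 1 = 46 -> (46, 46, 32)
model.add(layers.Conv2D(32, (3, 3), activation='relu', padding='valid'))
# 'same' with stride 1 keeps the size -> (46, 46, 32)
model.add(layers.Conv2D(32, (3, 3), activation='relu', padding='same'))
# 'same' with stride 2 gives ceil(46 / 2) = 23 -> (23, 23, 32)
model.add(layers.Conv2D(32, (3, 3), activation='relu', padding='same', strides=(2, 2)))
model.summary()
```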