## CNN Lecture 5: The Pooling Layer

### I. Introduction and Definition

The Pooling Layer is an essential component of a CNN architecture, typically added **right after a Convolutional Layer**.

*   **Operation Type:** Pooling is fundamentally a **downsampling operation**.
*   **Purpose:** The primary goal of pooling is to **reduce the size of the feature map**.
*   **Trainability:** Pooling layers have **zero trainable parameters** because they apply a fixed aggregation (such as taking the maximum or the average) instead of learned weights.

### II. Necessity: Problems Solved by Pooling

Pooling is used to address two major issues that arise after standard convolution operations:

#### A. Memory Issues
*   Convolution generates large **feature maps** that require a significant amount of memory storage (RAM).
*   Memory adds up quickly: a single $228 \times 228 \times 100$ feature map holds about 5.2 million values (roughly 21 MB as 32-bit floats) for just one training image. This is multiplied by the batch size and by every layer whose activations must be kept for backpropagation, which can exhaust RAM and cause the program to **crash**.
*   Pooling actively reduces the size of the feature map, thereby mitigating these storage demands.
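
A quick back-of-the-envelope check of these numbers. This is a sketch assuming 32-bit floats (4 bytes per value) and counting only one activation volume; the batch size of 32 is a hypothetical value chosen for illustration, and the pooled size assumes a $2 \times 2$, stride-2 pool.

```python
def feature_map_bytes(height, width, channels, bytes_per_value=4):
    """Memory needed to store one feature map volume (float32 = 4 bytes per value)."""
    return height * width * channels * bytes_per_value

before = feature_map_bytes(228, 228, 100)   # one image, before pooling
after = feature_map_bytes(114, 114, 100)    # after 2x2 pooling with stride 2

print(f"before pooling: {before / 1e6:.1f} MB")              # 20.8 MB
print(f"after pooling:  {after / 1e6:.1f} MB")               # 5.2 MB
print(f"batch of 32, before: {32 * before / 1e6:.0f} MB")    # 665 MB (hypothetical batch size)
```

Pooling with a $2 \times 2$ window and stride 2 cuts the per-image cost by a factor of 4, and that saving carries over to every image in the batch.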

#### B. Translation Variance
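*   Standard convolution is translation *equivariant*: when a pattern shifts in the input, its activation shifts by the same amount in the feature map. In this sense the output exhibits **Translation Variance**: the features detected are **location dependent**.
*   If a specific feature (like a cat's ear) shifts slightly within the image, its activation moves to a different position in the feature map, so subsequent layers receive a different input.
*   For tasks like **image classification**, the specific location of a feature does not matter; only the presence of the feature matters.
*   Pooling adds a degree of **Translation Invariance**. Because each pooled value summarizes a whole window, a feature that moves slightly within that window produces the same output, so the downsampled representation becomes less **location dependent**.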
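
A small NumPy sketch of this effect, assuming a $2 \times 2$, stride-2 max pool and a hypothetical $4 \times 4$ feature map containing one strong activation (standing in for an "ear detector" firing). Shifting the activation by one pixel changes the raw map but not the pooled map.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on a 2-D array whose sides are even."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# A single strong activation in a 4x4 feature map
a = np.zeros((4, 4))
a[0, 0] = 9.0

# The same activation shifted one pixel down and one pixel right
b = np.zeros((4, 4))
b[1, 1] = 9.0

print(np.array_equal(a, b))                              # False: the raw maps differ
print(np.array_equal(max_pool_2x2(a), max_pool_2x2(b)))  # True: the pooled maps match
```

The invariance is only local: had the activation moved to `[0, 2]`, it would land in a different window and the pooled map would change. Stacking several pooling layers widens the range of shifts the network tolerates.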

### III. How Pooling Works (Mechanism and Types)

Pooling is applied to each feature map individually, after the non-linear activation (such as ReLU).

<a href="https://deeplizard.com/resource/pavq7noze3">See demo here</a>
#### A. Required Configuration
To define a pooling operation, three elements must be specified:
1.  **Size of the window (Pool Size):** Commonly $2 \times 2$, but can be changed.
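2.  **Stride Value:** Often set to 2 so that $2 \times 2$ windows do not overlap, but customizable. In Keras, `strides` defaults to the pool size when left unset.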
3.  **Type of Pooling** (e.g., Max, Mean).
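
Pool size and stride together determine the output size. With no padding ("valid"), an input of length $n$ along one axis, pool size $f$, and stride $s$ gives $\lfloor (n - f)/s \rfloor + 1$ outputs. A minimal sketch of that formula (the function name `pooled_size` is illustrative, and the stride default mirrors Keras' behavior of falling back to the pool size):

```python
def pooled_size(n, pool_size=2, stride=None):
    """Output length along one spatial axis for 'valid' (unpadded) pooling."""
    if stride is None:
        stride = pool_size          # default: non-overlapping windows
    return (n - pool_size) // stride + 1

print(pooled_size(4))                         # 2   (4x4 -> 2x2)
print(pooled_size(226))                       # 113 (226 -> 113)
print(pooled_size(5))                         # 2   (the last row/column is dropped)
print(pooled_size(4, pool_size=3, stride=1))  # 2   (overlapping 3x3 windows)
```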

#### B. Max Pooling (The Most Used Type)
*   The filter (window) is placed over a section of the feature map.
*   The operation extracts the **single maximum value** from the numbers within that window. This maximum value is considered the most **dominant feature** in that small local receptive field.
*   Example: A $4 \times 4$ feature map processed by a $2 \times 2$ Max Pooling operation with a stride of 2 will result in a $2 \times 2$ feature map.
*   By keeping only the dominant features, Max Pooling eliminates low-level details, helping to achieve translation invariance.
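
The $4 \times 4 \to 2 \times 2$ example can be written out as a naive loop over windows. This is an illustrative NumPy sketch for a single 2-D feature map with "valid" padding; the input values are made up for the example.

```python
import numpy as np

def max_pool2d(fmap, pool_size=2, stride=2):
    """Naive max pooling over a single 2-D feature map ('valid' padding)."""
    h, w = fmap.shape
    out_h = (h - pool_size) // stride + 1
    out_w = (w - pool_size) // stride + 1
    out = np.empty((out_h, out_w), dtype=fmap.dtype)
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride               # top-left corner of the window
            window = fmap[r:r + pool_size, c:c + pool_size]
            out[i, j] = window.max()                    # keep only the dominant value
    return out

fmap = np.array([[1, 3, 2, 1],
                 [4, 6, 5, 0],
                 [7, 2, 9, 8],
                 [1, 0, 3, 4]])

print(max_pool2d(fmap))
# [[6 5]
#  [7 9]]
```

Each output cell is the maximum of one $2 \times 2$ window: 6 from the top-left, 5 from the top-right, 7 from the bottom-left and 9 from the bottom-right.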

#### C. Other Types of Pooling
*   **Mean/Average Pooling:** Calculates the **average** of the numbers within the pooling area.
*   **L2 Pooling:** Computes the L2 norm of the window, i.e. the square root of the sum of the squared values.
*   **Global Pooling Layers:** These operate on the **entire feature map** rather than on small windows, reducing each map to a single number. Global pooling is often used at the end of the convolutional part of a CNN, in place of flattening. Because each map collapses to one number, the following Fully Connected Layer needs far fewer weights, which helps **reduce overfitting**.
    *   **Global Max Pooling:** Extracts the maximum value from the entire map, resulting in a single number per map.
    *   **Global Average Pooling:** Calculates the average of all numbers in the map, resulting in a single number per map.
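
The pooling types differ only in the reduction applied to each window. A NumPy sketch, assuming $2 \times 2$ windows with stride 2 on a single 2-D map; the helper names `pool_2x2` and `l2_reduce` are illustrative, not a library API.

```python
import numpy as np

def pool_2x2(fmap, reduce):
    """2x2, stride-2 pooling of a 2-D map using any reduction over the window axes."""
    h, w = fmap.shape
    windows = fmap.reshape(h // 2, 2, w // 2, 2)   # axes 1 and 3 index inside a window
    return reduce(windows, axis=(1, 3))

def l2_reduce(x, axis):
    """L2 pooling: square root of the sum of squares inside each window."""
    return np.sqrt(np.sum(x ** 2, axis=axis))

fmap = np.array([[1., 3., 2., 1.],
                 [4., 6., 5., 0.],
                 [7., 2., 9., 8.],
                 [1., 0., 3., 4.]])

print(pool_2x2(fmap, np.max))     # max pooling
print(pool_2x2(fmap, np.mean))    # average pooling
print(pool_2x2(fmap, l2_reduce))  # L2 pooling
print(fmap.max())                 # global max pooling: 9.0
print(fmap.mean())                # global average pooling: 3.5
```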

#### D. Operation on Multi-Channel Data
If working with a volume (tensor) composed of multiple feature maps (e.g., 33 feature maps of size $4 \times 4$, i.e. a $4 \times 4 \times 33$ volume), pooling is applied **individually** to each feature map. The depth (number of channels) stays unchanged; only the height and width shrink (e.g., $4 \times 4 \times 33$ becomes $2 \times 2 \times 33$).
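
A NumPy sketch of per-channel pooling, assuming a channels-last layout (height × width × channels, the Keras default) and random values in place of real activations. Pooling the whole volume at once gives the same result as pooling each channel on its own.

```python
import numpy as np

def max_pool_2x2_channels_last(volume):
    """2x2, stride-2 max pooling applied independently to every channel of an H x W x C volume."""
    h, w, c = volume.shape
    return volume.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))

rng = np.random.default_rng(0)
volume = rng.random((4, 4, 33))        # 33 feature maps of size 4x4

pooled = max_pool_2x2_channels_last(volume)
print(pooled.shape)                    # (2, 2, 33): depth unchanged

# Pooling channel k on its own matches the corresponding slice of the pooled volume
k = 5
single = volume[:, :, k].reshape(2, 2, 2, 2).max(axis=(1, 3))
print(np.array_equal(pooled[:, :, k], single))  # True
```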

### IV. Advantages of Pooling

1.  **Significant Size Reduction:** Pooling shrinks the feature map substantially. A $2 \times 2$ pool with stride 2 halves the height and width, so an input feature map of $226 \times 226 \times 100$ (about 5.1 million values) becomes $113 \times 113 \times 100$ (about 1.28 million values), a 4× reduction.
2.  **Translation Invariance:** It helps the CNN model focus on **higher-level features** rather than low-level details, and makes it less sensitive to small shifts in a feature's exact position.
3.  **Enhanced Features (Specific to Max Pooling):** Because Max Pooling extracts the most dominant features, the resulting features in the map can appear **enhanced** (e.g., brighter).
4.  **No Training Required:** Pooling is a fixed aggregate function (max or average extraction), so it has **no weights to learn**. Gradients still pass through it during backpropagation (for max pooling, only to the position that held the maximum), but nothing in the layer is updated, which keeps it fast. It is also considered faster than relying on high stride values in convolution to reduce size.
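
To make the "no weights, but gradients still flow" point concrete, here is a NumPy sketch of the backward pass of $2 \times 2$, stride-2 max pooling using the usual argmax-routing rule. Resolving ties to the first maximum is an assumption of this sketch (frameworks may handle ties differently), and the upstream gradient of all ones is made up for illustration.

```python
import numpy as np

def max_pool_2x2_backward(fmap, grad_out):
    """Backward pass of 2x2, stride-2 max pooling on a 2-D map.

    Nothing is learned: each window's upstream gradient is routed to the
    position that held the maximum, and every other position receives 0.
    """
    h, w = fmap.shape
    grad_in = np.zeros_like(fmap, dtype=float)
    for i in range(h // 2):
        for j in range(w // 2):
            window = fmap[2 * i:2 * i + 2, 2 * j:2 * j + 2]
            r, c = np.unravel_index(np.argmax(window), window.shape)  # first max wins ties
            grad_in[2 * i + r, 2 * j + c] = grad_out[i, j]
    return grad_in

fmap = np.array([[1., 3., 2., 1.],
                 [4., 6., 5., 0.],
                 [7., 2., 9., 8.],
                 [1., 0., 3., 4.]])
grad_out = np.ones((2, 2))        # hypothetical gradient from the next layer

print(max_pool_2x2_backward(fmap, grad_out))
```

Only the four positions holding 6, 5, 7 and 9 receive a non-zero gradient, and no parameter is updated along the way.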

### V. Disadvantages of Pooling

1.  **Loss of Information:** Pooling discards a considerable amount of data. With $2 \times 2$ pooling, each window of four values is reduced to one, so a $4 \times 4$ area becomes $2 \times 2$: only 25% of the values are kept and the remaining **75% are discarded**.
2.  **Unsuitable for Location-Dependent Tasks:** For computer vision tasks where the exact location of a feature is critical (e.g., **Image Segmentation**), the use of pooling is limited or avoided because it sacrifices location information.
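
A NumPy sketch of why this loss is irreversible, assuming $2 \times 2$, stride-2 max pooling and two made-up $4 \times 4$ maps. The maps differ in most of their values and in where each window's maximum sits, yet they pool to the same output, so neither the discarded values nor the exact positions can be recovered afterwards.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2, stride-2 max pooling on a 2-D array with even height and width."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

a = np.array([[1., 3., 2., 1.],
              [4., 6., 5., 0.],
              [7., 2., 9., 8.],
              [1., 0., 3., 4.]])

# A different map with the same maximum in every window, at different positions
b = np.array([[6., 0., 0., 5.],
              [0., 0., 0., 0.],
              [0., 0., 0., 9.],
              [7., 0., 0., 0.]])

print(np.array_equal(a, b))                              # False
print(np.array_equal(max_pool_2x2(a), max_pool_2x2(b)))  # True: pooling cannot tell them apart
print(max_pool_2x2(a).size / a.size)                     # 0.25: 75% of the values are discarded
```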

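Example model with a max-pooling layer after each convolution (in `model.summary()`, both pooling layers report 0 parameters):

```python
import keras
from keras import layers

# Example model: each convolution is followed by a 2x2, stride-2 max-pooling layer
model = keras.Sequential()
model.add(keras.Input(shape=(50, 50, 3)))                                  # 50x50 RGB images
model.add(layers.Conv2D(32, (3, 3), activation='relu'))                    # -> 48x48x32
model.add(layers.MaxPool2D(pool_size=(2, 2), strides=2, padding='valid'))  # -> 24x24x32, 0 params
model.add(layers.Conv2D(32, (3, 3), activation='relu'))                    # -> 22x22x32
model.add(layers.MaxPool2D(pool_size=(2, 2), strides=2, padding='valid'))  # -> 11x11x32, 0 params
model.summary()
```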