# Pooling layers
- goal is to *subsample* (shrink) the input image in order to reduce the computational load, memory usage, and the number of parameters (to limit risk of overfitting)
    - whereas a conv. layer's goal is to recognize patterns in images
- Each neuron in a pooling layer is connected to a limited num. of neurons in the prev. layer (its receptive field); you must define its size, stride, and padding type, just like for conv. layers
- Difference from conv. layer: a pooling neuron has no weights; all it does is aggregate its inputs using an aggregation function such as max or mean
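
The "no weights, just aggregation" point can be made concrete with a short NumPy sketch of what a pooling layer computes on a single-channel input. It assumes "VALID" padding (no zero-padding). `pool2d` is a hypothetical helper written for illustration, not a TensorFlow or Keras function; the Keras layers do the same thing in vectorized form and also handle batches and channels.

```python
import numpy as np

def pool2d(x, size=2, stride=2, reduce_fn=np.max):
    """Slide a size x size window over a 2-D array with the given stride
    ("VALID" padding: windows that would run past the edge are skipped)
    and aggregate each window with reduce_fn. Nothing here is learned."""
    out_h = (x.shape[0] - size) // stride + 1
    out_w = (x.shape[1] - size) // stride + 1
    out = np.empty((out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            window = x[i * stride:i * stride + size, j * stride:j * stride + size]
            out[i, j] = reduce_fn(window)
    return out

x = np.array([[1, 3, 2, 0],
              [5, 4, 1, 1],
              [0, 2, 7, 6],
              [1, 1, 8, 3]], dtype=float)

print(pool2d(x, reduce_fn=np.max))   # max of each 2x2 window: [[5, 2], [2, 8]]
print(pool2d(x, reduce_fn=np.mean))  # mean of each 2x2 window: [[3.25, 1], [1, 6]]
```

With `size=2` and `stride=2`, the 4x4 input shrinks to 2x2. Swapping `np.max` for `np.mean` is the only difference between max and average pooling.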

## Max pooling
- max pooling layer: most common type of pooling layer
- pooling kernel - *does not have weights, just stateless sliding windows*
- Only the max input value in each receptive field makes it to the next layer, the others are dropped
- <img src="images/CNNPoolingLayers.jpeg" width=400/> pooling layer
    - (bottom left) only the max value of each receptive field is propagated to the next layer
    - a stride of 2 gives the output image (top layer) half the height and width of the previous layer (rounded down when there is no padding)
- introduces some level of *invariance*
    - Makes the system invariant to small translations: it can still detect objects even if they are shifted slightly within the image
    - Useful in tasks where the output does not depend on the position, such as classification
    - **max pooling introduces invariance to small translations (inserting a max pooling layer every few layers gives translation invariance at a larger scale), a small amount of rotational invariance and a slight scale invariance**
- downsides:
    - information is lost
        - e.g. a 2x2 pooling kernel (i.e. receptive field) with a stride of 2 drops 75% of the input values (only 1 of every 4 is kept)
    - invariance is not desirable in some cases
        - e.g. semantic segmentation (classifying each pixel in an image according to the object that pixel belongs to)
        - In this case, *equivariance* (a small change to the inputs should lead to a corresponding small change in the output) is the goal
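
The two effects listed above, invariance to small shifts and the 75% loss of input values, can be checked with a small NumPy sketch. It assumes a single-channel image with even height and width, so that 2x2 blocks with stride 2 tile it exactly. `max_pool_2x2` is a hypothetical helper, not a library API.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on a 2-D array whose sides are even."""
    h, w = x.shape
    # Group pixels into non-overlapping 2x2 blocks, then take each block's max
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

image = np.zeros((4, 4))
image[0, 0] = 1.0                             # a single bright "feature"

shifted_inside = np.roll(image, 1, axis=1)    # feature moves to (0, 1): same 2x2 window
shifted_across = np.roll(image, 2, axis=1)    # feature moves to (0, 2): next 2x2 window

print(np.array_equal(max_pool_2x2(image), max_pool_2x2(shifted_inside)))  # True
print(np.array_equal(max_pool_2x2(image), max_pool_2x2(shifted_across)))  # False

# Information loss: 16 input values in, 4 out, so 75% of the values are dropped
print(1 - max_pool_2x2(image).size / image.size)  # 0.75
```

Shifting the bright pixel within its 2x2 window leaves the pooled output unchanged. Shifting it into the neighboring window changes the output. So the invariance only holds for shifts that keep a feature inside the same pooling window.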

### TensorFlow Implementation

```python
from tensorflow import keras

# pool_size is the size of each pooling kernel (receptive field): here a 2x2 window
# whose max value is kept. strides defaults to pool_size, so this divides each
# spatial dimension by a factor of two
max_pool = keras.layers.MaxPool2D(pool_size=2)
# Same 2x2 windows, but computes the mean rather than the max
average_pool = keras.layers.AveragePooling2D(pool_size=2)
```

It might be surprising, but max pooling generally performs better than average pooling: it keeps only the most important information (the strongest activations), offers stronger translation invariance, and requires slightly fewer computations.
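
One way to see the "keeps only the most important information" argument is a sparse feature map with a single strong activation. This NumPy sketch only illustrates that intuition. It does not show that max pooling performs better on a real task, which is an empirical observation. `pool_2x2` is a hypothetical helper.

```python
import numpy as np

def pool_2x2(x, reduce_fn):
    """Non-overlapping 2x2 pooling (stride 2) of a 2-D array with even sides."""
    h, w = x.shape
    return reduce_fn(x.reshape(h // 2, 2, w // 2, 2), axis=(1, 3))

# A sparse feature map: one strong activation surrounded by weak noise
fmap = np.array([[0.9, 0.1, 0.0, 0.1],
                 [0.1, 0.0, 0.1, 0.0],
                 [0.0, 0.1, 0.0, 0.0],
                 [0.1, 0.0, 0.0, 0.1]])

print(pool_2x2(fmap, np.max))   # top-left output keeps the strong activation: 0.9
print(pool_2x2(fmap, np.mean))  # top-left output is diluted by the noise: ~0.275
```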

#### Depth-wise max pooling layer
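- allows the CNN to learn to be invariant to various features (e.g. the rotation of a handwritten digit): if it learns several filters that each detect a different rotation of the same pattern, depthwise max pooling over those feature maps ensures that the output is the same regardless of the rotation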
- <img src="images/CNNDepthwiseMaxPool.jpeg" width=400/> figure
- Keras has no depthwise max pooling layer, but TensorFlow's low-level API does, through `tf.nn.max_pool()`
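
Depthwise max pooling is a max over groups of consecutive feature maps at each pixel. Below is a NumPy sketch that assumes channels-last input of shape (batch, height, width, depth) and a depth that is a multiple of `k`. `depthwise_max_pool` is a hypothetical helper. It is meant to mirror `tf.nn.max_pool` with `ksize=(1, 1, 1, k)` and `strides=(1, 1, 1, k)`, not to reproduce TensorFlow's implementation. The two toy inputs assume that each of the 3 feature maps detects the same pattern at a different rotation.

```python
import numpy as np

def depthwise_max_pool(x, k=3):
    """Max-pool a (batch, height, width, depth) array along the depth axis,
    with kernel size and stride k. depth must be a multiple of k."""
    b, h, w, d = x.shape
    if d % k != 0:
        raise ValueError(f"depth {d} is not a multiple of k={k}")
    # Split the depth axis into d // k groups of k consecutive feature maps,
    # then keep the max of each group
    return x.reshape(b, h, w, d // k, k).max(axis=-1)

# Hypothetical: feature maps 0, 1, 2 each detect the same pattern at a different rotation.
# Image A triggers the detector in map 0, image B (rotated pattern) the one in map 2.
img_a = np.zeros((1, 2, 2, 3)); img_a[0, 0, 0, 0] = 1.0
img_b = np.zeros((1, 2, 2, 3)); img_b[0, 0, 0, 2] = 1.0

print(depthwise_max_pool(img_a).shape)                                      # (1, 2, 2, 1)
print(np.array_equal(depthwise_max_pool(img_a), depthwise_max_pool(img_b)))  # True
```

The divisibility check reflects the constraint noted in the code below. For example, 20 feature maps cannot be pooled in groups of 3.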

```python
import numpy as np
import tensorflow as tf
from sklearn.datasets import load_sample_image

# Two sample RGB images, scaled to [0, 1]
china = load_sample_image("china.jpg") / 255
flower = load_sample_image("flower.jpg") / 255
images = np.array([china, flower])  # shape: (batch, height, width, 3)

# ksize and strides are 4-tuples over (batch, height, width, depth).
# The first three values must be 1: no pooling along the batch, height, or width dimensions.
# The last value is the kernel size and stride along the **depth** dimension; it must be
# a divisor of the input depth (no. of feature maps), which is 3 here
output = tf.nn.max_pool(images, ksize=(1, 1, 1, 3), strides=(1, 1, 1, 3), padding="VALID")
```

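```python
# Wrap the depthwise pooling in a Lambda layer so it can be used inside a Keras model
depth_pool = keras.layers.Lambda(
    lambda X: tf.nn.max_pool(X, ksize=(1, 1, 1, 3), strides=(1, 1, 1, 3), padding="VALID"))
```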

## Global Average Pooling Layer
- computes the mean of each entire feature map; its receptive field is the size of the whole feature map
    - outputs a single number per feature map and per instance
- much information is lost, but it can be useful just before the output layer
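
Global average pooling amounts to a mean over the height and width axes. Below is a NumPy sketch that assumes channels-last input. `global_avg_pool` is a hypothetical helper. With its default settings, `keras.layers.GlobalAvgPool2D()` should produce the same values with the same (batch, channels) shape.

```python
import numpy as np

def global_avg_pool(x):
    """Average each feature map over its full height and width:
    (batch, height, width, channels) -> (batch, channels)."""
    return x.mean(axis=(1, 2))

# 2 instances, 3x3 feature maps, 4 channels
feature_maps = np.arange(2 * 3 * 3 * 4, dtype=float).reshape(2, 3, 3, 4)
pooled = global_avg_pool(feature_maps)

print(pooled.shape)  # (2, 4): one number per feature map and per instance
print(pooled)        # [[16, 17, 18, 19], [52, 53, 54, 55]]
```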

```python
global_avg_pool = keras.layers.GlobalAvgPool2D()
```

# <font color="orange">At this point the basic building blocks for CNNs have been covered</font>