# 7.5 Pooling

In [2]:
import torch
from torch import nn
from d2l import torch as d2l

### 7.5.1 Maximum Pooling and Average Pooling

In [3]:
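def pool2d(X, pool_size, mode='max'):
  """Pool a 2D tensor X with a (p_h, p_w) window, stride 1, no padding."""
  p_h, p_w = pool_size
  # The output shrinks by (window size - 1) along each dimension
  Y = torch.zeros((X.shape[0] - p_h + 1, X.shape[1] - p_w + 1))

  for i in range(Y.shape[0]):
    for j in range(Y.shape[1]):
      if mode == 'max':
        Y[i, j] = X[i: i + p_h, j: j + p_w].max()
      elif mode == 'avg':
        Y[i, j] = X[i: i + p_h, j: j + p_w].mean()
      else:
        raise ValueError(f"unknown mode: {mode!r}")
  return Y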

In [4]:
X = torch.tensor([[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]])
pool2d(X, (2, 2))

tensor([[4., 5.],
        [7., 8.]])

In [5]:
pool2d(X, (2, 2), 'avg')

tensor([[2., 3.],
        [5., 6.]])

### 7.5.2 Padding and Stride

In [6]:
X = torch.arange(16, dtype=torch.float32).reshape((1, 1, 4, 4))
X

tensor([[[[ 0.,  1.,  2.,  3.],
          [ 4.,  5.,  6.,  7.],
          [ 8.,  9., 10., 11.],
          [12., 13., 14., 15.]]]])

In [7]:
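# Note: this rebinds the name pool2d to a layer, shadowing the function above
pool2d = nn.MaxPool2d(3)
# Pooling has no model parameters, so it needs no initialization.
# With only the window size given, the stride defaults to the window size (3).
pool2d(X)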

tensor([[[[10.]]]])

In [8]:
# Stride and padding can be set manually to override the framework defaults
pool2d = nn.MaxPool2d(3, padding=1, stride=2)
pool2d(X)

tensor([[[[ 5.,  7.],
          [13., 15.]]]])

In [9]:
# A rectangular pooling window, with stride and padding given per dimension
pool2d = nn.MaxPool2d((2, 3), stride=(2, 3), padding=(0, 1))
pool2d(X)

tensor([[[[ 5.,  7.],
          [13., 15.]]]])

### 7.5.3 Multiple Channels

In [10]:
X = torch.cat((X, X + 1), 1)
X

tensor([[[[ 0.,  1.,  2.,  3.],
          [ 4.,  5.,  6.,  7.],
          [ 8.,  9., 10., 11.],
          [12., 13., 14., 15.]],

         [[ 1.,  2.,  3.,  4.],
          [ 5.,  6.,  7.,  8.],
          [ 9., 10., 11., 12.],
          [13., 14., 15., 16.]]]])

In [11]:
pool2d = nn.MaxPool2d(3, padding=1, stride=2)
pool2d(X)

tensor([[[[ 5.,  7.],
          [13., 15.]],

         [[ 6.,  8.],
          [14., 16.]]]])

## Discussions

- In many cases our ultimate task asks some global question about the image. The units of our final layer should be sensitive to the entire input.
- The deeper we go in the network, the larger the receptive field (relative to the input) to which each hidden node is sensitive. Reducing spatial resolution accelerates this process, since the convolution kernels cover a larger effective area.
- When detecting lower-level features, we often want our representations to be somewhat invariant to translation.
- Pooling layers serve two purposes: they reduce the sensitivity of convolutional layers to location, and they spatially downsample representations.
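
The two purposes named in the last bullet can be seen in a small sketch. The tensors `a` and `b` are made-up inputs (not from the notebook): a single active pixel, and the same pixel shifted one step to the right. The example assumes a 2x2 max-pooling window with the default stride, and it only illustrates the case where both pixels land in the same window.

```python
import torch
from torch import nn

# A single active pixel, and the same pixel shifted one step to the right
# (illustrative inputs, not taken from the notebook above).
a = torch.zeros(1, 1, 4, 4)
b = torch.zeros(1, 1, 4, 4)
a[0, 0, 0, 0] = 1.0
b[0, 0, 0, 1] = 1.0

pool = nn.MaxPool2d(2)  # 2x2 window; stride defaults to the window size

same_before = torch.equal(a, b)             # the raw inputs differ
same_after = torch.equal(pool(a), pool(b))  # both pixels fall in the same window
out_shape = tuple(pool(a).shape)            # 4x4 is downsampled to 2x2

print(same_before, same_after, out_shape)
```

The invariance is only partial: a shift that moves the pixel across a window boundary (for example from column 1 to column 2) would change the pooled output.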

7.5.1 Maximum Pooling and Average Pooling
- Pooling operators consist of a fixed-shape window (the pooling window) that slides over all regions of the input according to its stride, computing a single output at each location. Unlike the kernel of a convolutional layer, however, the pooling layer contains no parameters.
- Instead, pooling operators are deterministic, typically calculating either the maximum or the average value of the elements in the pooling window. These operations are called maximum pooling (max-pooling for short) and average pooling, respectively.
- In almost all cases, max-pooling is preferable to average pooling.
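
As a quick check of the points above, the built-in layers can be compared on the same 3x3 input used in the notebook. This is a minimal sketch: `stride=1` is chosen here so the window slides one element at a time, like the hand-written `pool2d`, and the parameter count is taken from `nn.Module.parameters()`.

```python
import torch
from torch import nn

# Same values as the 3x3 example in the notebook, with batch and channel axes added
X = torch.arange(9, dtype=torch.float32).reshape(1, 1, 3, 3)

# stride=1 makes the window slide one element at a time
max_pool = nn.MaxPool2d(2, stride=1)
avg_pool = nn.AvgPool2d(2, stride=1)

max_out = max_pool(X)  # maximum of each 2x2 window
avg_out = avg_pool(X)  # mean of each 2x2 window

# Neither layer has anything to learn
n_params = len(list(max_pool.parameters())) + len(list(avg_pool.parameters()))

print(max_out, avg_out, n_params)
```

The outputs match the hand-written results above (`[[4., 5.], [7., 8.]]` for max and `[[2., 3.], [5., 6.]]` for average), and the parameter count is zero.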

7.5.2 Padding and Stride
- Like convolutional layers, pooling layers change the output shape.
- We can adjust the operation to achieve a desired output shape by padding the input and adjusting the stride.
- By default, deep learning frameworks set the stride equal to the pooling window size.
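
The effect of padding and stride on the output shape can be sketched with a small helper. `pool_output_size` is a hypothetical function written for these notes, not part of any library. It assumes floor rounding (PyTorch's default, `ceil_mode=False`), no dilation, and a stride that defaults to the window size.

```python
def pool_output_size(n, k, stride=None, padding=0):
    """Output length along one dimension: floor((n + 2*padding - k) / stride) + 1."""
    if stride is None:
        stride = k  # assumed default: stride matches the window size
    return (n + 2 * padding - k) // stride + 1

# The three configurations used in the notebook, on a 4x4 input
default_case = pool_output_size(4, 3)                      # MaxPool2d(3)
padded_case = pool_output_size(4, 3, stride=2, padding=1)  # MaxPool2d(3, padding=1, stride=2)
rect_case = (pool_output_size(4, 2, stride=2, padding=0),  # height of the (2, 3) window case
             pool_output_size(4, 3, stride=3, padding=1))  # width of the (2, 3) window case

print(default_case, padded_case, rect_case)
```

These agree with the shapes printed in the notebook: 1x1 for the default case and 2x2 for the other two.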

7.5.3 Multiple Channels
- When processing multi-channel input data, the pooling layer pools each input channel separately rather than summing over channels as a convolutional layer does.
- The number of output channels for the pooling layer is therefore the same as the number of input channels.
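
A short sketch can confirm both points, reusing the two-channel input from the notebook. Slicing out one channel at a time and concatenating the results is only an illustration, not how the layer is implemented internally.

```python
import torch
from torch import nn

X = torch.arange(16, dtype=torch.float32).reshape(1, 1, 4, 4)
X = torch.cat((X, X + 1), dim=1)  # shape (1, 2, 4, 4)

pool = nn.MaxPool2d(3, padding=1, stride=2)
together = pool(X)                # shape (1, 2, 2, 2)

# Pool each channel on its own, then stack the results along the channel axis
separately = torch.cat([pool(X[:, c:c + 1]) for c in range(X.shape[1])], dim=1)

channels_kept = together.shape[1] == X.shape[1]
matches = torch.equal(together, separately)

print(channels_kept, matches)
```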

- A popular choice is a 2x2 pooling window with the default stride of 2, which halves the height and the width and so quarters the spatial resolution of the output.
