# Pooling

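Pooling serves two purposes:
- First, it reduces the convolution layer's sensitivity to position. If the input to a convolution layer is shifted or rotated slightly, its output changes noticeably (the feature map shifts or rotates along with it). Adding max pooling largely mitigates this for small displacements, because a strong response is still the output wherever it falls within its pooling window.
- Second, it downsamples the feature map, which shrinks its spatial size and lowers the overall complexity of the model.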
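
To make the first point concrete, here is a minimal sketch (not from the original notebook) that uses `torch.nn.functional.max_pool2d` on a hand-built 4x4 map with a single strong response. The tensors `a`, `b`, and `c` are illustrative assumptions. Shifting the response by one pixel inside the same 2x2 window leaves the pooled output unchanged, while a shift that crosses a window boundary does change it, so the invariance is only local.

```python
import torch
import torch.nn.functional as F

# One strong response at row 1, column 2 of a 4x4 feature map.
a = torch.zeros(1, 1, 4, 4)
a[0, 0, 1, 2] = 1.0

b = torch.roll(a, shifts=1, dims=3)   # response moves to column 3 (same 2x2 window)
c = torch.roll(a, shifts=-1, dims=3)  # response moves to column 1 (neighbouring window)

# kernel_size=2 with the default stride (= kernel_size) gives non-overlapping windows.
pooled_a = F.max_pool2d(a, kernel_size=2)
pooled_b = F.max_pool2d(b, kernel_size=2)
pooled_c = F.max_pool2d(c, kernel_size=2)

print(torch.equal(a, b))                # False: the raw maps differ
print(torch.equal(pooled_a, pooled_b))  # True: pooling absorbs the small shift
print(torch.equal(pooled_a, pooled_c))  # False: the shift crossed a window boundary
```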

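Like convolution, pooling slides a window over the input, so it has the same three hyperparameters: window size, stride, and padding. There are two common variants to choose from: max pooling and average (mean) pooling.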
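
The three hyperparameters determine the output size along each spatial dimension. The helper below is a small sketch of that arithmetic. The name `pool_output_size` is hypothetical, and it assumes that `padding` is counted per side (the convention `nn.MaxPool2d` uses) and that the result is rounded down.

```python
def pool_output_size(n: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output length along one dimension: floor((n + 2*padding - kernel) / stride) + 1."""
    return (n + 2 * padding - kernel) // stride + 1

print(pool_output_size(3, 2))                       # 2: a 3-wide input, 2-wide window, stride 1
print(pool_output_size(4, 2, stride=2))             # 2: non-overlapping windows halve the size
print(pool_output_size(3, 2, stride=1, padding=1))  # 4: one padded element on each side
```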

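Unlike convolution, pooling is usually applied to each channel of a multi-channel input separately, so the number of output channels equals the number of input channels.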
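
A quick shape check illustrates the difference. This is a sketch with arbitrary, assumed sizes (batch of 8, 3 channels, 32x32 input, 16 output channels for the convolution): the pooling layer keeps the channel count and has no learnable parameters, while the convolution mixes channels and changes their number.

```python
import torch
from torch import nn

x = torch.randn(8, 3, 32, 32)  # (batch, channels, height, width)

pool = nn.MaxPool2d(kernel_size=2, stride=2)
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)

pooled_shape = tuple(pool(x).shape)
conv_shape = tuple(conv(x).shape)
n_pool_params = len(list(pool.parameters()))

print(pooled_shape)   # (8, 3, 16, 16): channels unchanged, spatial size halved
print(conv_shape)     # (8, 16, 32, 32): channel count set by out_channels
print(n_pool_params)  # 0: pooling has nothing to learn
```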

In [2]:
import torch
from torch import nn

# Implementing from scratch

In [24]:
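from typing import Tuple

def pooling_2d(X: torch.Tensor, kernel_size: Tuple[int,int], padding: Tuple[int,int]=(0,0), stride: Tuple[int,int]=(1,1), mode: str='max'):
    # Validate the mode up front so an invalid value fails even for empty outputs
    if mode not in ('max', 'avg'):
        raise ValueError(f'unsupported mode: {mode}')
    nh, nw = X.shape
    kh, kw = kernel_size
    ph, pw = padding
    sh, sw = stride
    # Pad ph rows and pw columns on each side. Max pooling pads with -inf so the
    # padding can never win the max; average pooling pads with zeros.
    fill = float('-inf') if mode == 'max' else 0.0
    X_p = torch.full((nh + 2 * ph, nw + 2 * pw), fill)
    X_p[ph:ph + nh, pw:pw + nw] = X
    # Output size along each dimension: floor((n + 2p - k) / s) + 1
    Y = torch.zeros((X_p.shape[0] - kh) // sh + 1, (X_p.shape[1] - kw) // sw + 1)
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            window = X_p[i*sh:i*sh+kh, j*sw:j*sw+kw]
            Y[i, j] = window.max() if mode == 'max' else window.mean()
    return Y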

In [25]:
X = torch.tensor([[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]])
pooling_2d(X, (2, 2))

tensor([[4., 5.],
        [7., 8.]])
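
The double loop is easy to follow but slow for large inputs. As an alternative sketch (not part of the original notebook), the windows can be extracted in one step with `Tensor.unfold` and then reduced over the last two dimensions. The function name `pooling_2d_unfold` is hypothetical, and padding is left out to keep the example short.

```python
import torch

def pooling_2d_unfold(X, kernel_size, stride=(1, 1), mode='max'):
    kh, kw = kernel_size
    sh, sw = stride
    # After two unfolds, windows has shape (out_h, out_w, kh, kw)
    windows = X.unfold(0, kh, sh).unfold(1, kw, sw)
    if mode == 'max':
        return windows.amax(dim=(-2, -1))
    if mode == 'avg':
        return windows.mean(dim=(-2, -1))
    raise ValueError(f'unsupported mode: {mode}')

X = torch.arange(9.0).reshape(3, 3)  # same values as the 3x3 example above
Y_max = pooling_2d_unfold(X, (2, 2))
Y_avg = pooling_2d_unfold(X, (2, 2), mode='avg')
print(Y_max)  # tensor([[4., 5.], [7., 8.]])
print(Y_avg)  # tensor([[2., 3.], [5., 6.]])
```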

In [37]:
def pooling_2d_multi_channel(X, kernel_size, padding=(0,0), stride=(1,1), mode='max'):
    return torch.stack([pooling_2d(X[i], kernel_size, padding, stride, mode) for i in range(X.shape[0])])

In [38]:
pooling_2d_multi_channel(torch.stack([X,X,X]),(2,2))

tensor([[[4., 5.],
         [7., 8.]],

        [[4., 5.],
         [7., 8.]],

        [[4., 5.],
         [7., 8.]]])

# nn.MaxPool2d

- kernel_size: the size of the window to take a max over
- stride: the stride of the window. Default value is `kernel_size`
- padding: implicit negative-infinity padding to be added on both sides
- dilation: a parameter that controls the stride of elements in the window
- return_indices: if `True`, will return the max indices along with the outputs. Useful for `torch.nn.MaxUnpool2d` later
- ceil_mode: when True, will use `ceil` instead of `floor` to compute the output shape

Here `kernel_size`, `stride`, `padding`, and `dilation` can each be either a single number or a tuple.
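
The sketch below exercises a few of these parameters on a 3x3 input holding the values 0 to 8. It is an illustration under assumed inputs rather than a complete tour of the API: it compares floor and ceil rounding of the output shape, reads back the indices of the maxima, and uses an all-negative input to show that the padding never wins the max.

```python
import torch
from torch import nn

X = torch.arange(9.0).reshape(1, 1, 3, 3)  # (batch, channels, height, width)

# ceil_mode: with kernel 2 and stride 2, a 3x3 input yields 1x1 (floor) or 2x2 (ceil).
floor_out = nn.MaxPool2d(2, stride=2)(X)
ceil_out = nn.MaxPool2d(2, stride=2, ceil_mode=True)(X)
print(floor_out.shape, ceil_out.shape)

# return_indices: positions of the maxima, flattened over each H*W plane.
values, indices = nn.MaxPool2d(2, stride=1, return_indices=True)(X)
print(values)
print(indices)

# Tuple arguments and padding: one padded row/column per side gives a 4x4 output.
padded_out = nn.MaxPool2d(kernel_size=(2, 2), stride=1, padding=1)(X)
print(padded_out.shape)

# With an all -1 input every output is still -1, so the padding is not zero.
padded_neg = nn.MaxPool2d(2, stride=1, padding=1)(-torch.ones(1, 1, 3, 3))
print(bool((padded_neg == -1).all()))
```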

In [32]:
pool2d = nn.MaxPool2d(2,stride=1)
pool2d(X.reshape(1,1,X.shape[0],X.shape[1]))

tensor([[[[4., 5.],
          [7., 8.]]]])

In [39]:
pool2d(torch.stack([X,X,X]))

tensor([[[4., 5.],
         [7., 8.]],

        [[4., 5.],
         [7., 8.]],

        [[4., 5.],
         [7., 8.]]])