In [1]:
%config ZMQInteractiveShell.ast_node_interactivity = "all"
%pprint

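## 2D Convolution

A convolutional neural network is a neural network that contains convolutional layers.
- The most commonly used one is the 2D convolutional layer (two spatial dimensions: height and width)
- Convolutional layers with multiple input channels and multiple output channels are extensions of it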

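### Convolution and cross-correlation

The "convolution" used in convolutional neural networks is usually not a true convolution; it is normally a cross-correlation.
- A true convolution first flips the kernel vertically and horizontally, then applies cross-correlation to the input
- In deep learning the kernel parameters are learned, so using cross-correlation or true convolution makes no difference to the model's predictions
- A kernel is essentially a feature extractor; its output can be seen as a representation of the input, at some level, along the spatial dimensions (height and width)
- Consider a single input channel and a single output channel. Denote the input height and width by $I_h$ and $I_w$, the kernel height and width by $K_h$ and $K_w$, the output height and width by $O_h$ and $O_w$, the padding along height and width by $p_h$ and $p_w$, and the stride along height and width by $s_h$ and $s_w$. The output size in each case is:
    - No padding, stride 1: $O_h = I_h - K_h + 1$, $O_w = I_w - K_w + 1$
    - Padding, stride 1: $O_h = I_h - K_h + p_h + 1$, $O_w = I_w - K_w + p_w + 1$
    - No padding, with stride: $O_h = \lfloor (I_h - K_h)/s_h \rfloor + 1$, $O_w = \lfloor (I_w - K_w)/s_w \rfloor + 1$
    - Padding and stride: $O_h = \lfloor (I_h - K_h + p_h)/s_h \rfloor + 1$, $O_w = \lfloor (I_w - K_w + p_w)/s_w \rfloor + 1$
        - The fourth case covers the first three; they are listed separately only for detail. Here p is the total padding added across both sides; some books write 2p instead, where p is the padding on a single side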
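The general formula above can be checked with a few lines of plain Python. This is a minimal sketch; the helper name `conv_output_size` is made up for illustration, and `p` is assumed to be the total padding on one axis, as in the text.

```python
def conv_output_size(i, k, p=0, s=1):
    """Output length along one axis.

    i: input size, k: kernel size,
    p: TOTAL padding on this axis (both sides added together),
    s: stride.
    """
    return (i - k + p) // s + 1


# height and width are computed independently with the same formula
print(conv_output_size(3, 2))            # no padding, stride 1
print(conv_output_size(8, 3, p=2))       # total padding of k - 1 keeps the size
print(conv_output_size(8, 3, p=2, s=2))  # padding and stride together
```

The integer division plays the role of the floor in the formula: positions where the window would run past the padded input are dropped.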

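Usually we set the total padding to $K - 1$ (that is, $p_h = K_h - 1$ and $p_w = K_w - 1$) with stride 1, so that the output tensor has the same shape as the input (this is also called equal-width convolution)
- Kernel sizes are usually odd, so the padding is the same on both ends; with an even size, the padding on one side is rounded up and on the other side rounded down
- Small kernels (such as 1x1 and 3x3) are the most common choice today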
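As a small illustration of the odd/even remark, the sketch below splits the total padding $K - 1$ into the two ends of one axis. The helper name `same_padding` is hypothetical, and putting the smaller half first is an assumption; the text does not say which side is rounded down.

```python
def same_padding(k):
    """Split the total padding k - 1 into (before, after) for one axis."""
    total = k - 1
    before = total // 2       # rounded down
    after = total - before    # rounded up when the total is odd
    return before, after


for k in (1, 3, 4, 5):
    print(k, same_padding(k))
```

For odd kernel sizes both ends get the same amount; for an even size such as 4 the two ends differ by one.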

A deeper network structure enlarges the receptive field, which lets the network capture larger features of the input
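To make the effect of depth concrete, here is a rough sketch that computes the receptive field along one axis for a stack of convolutional layers. The function name and the `(kernel_size, stride)` input format are assumptions made for this example; padding and dilation are ignored.

```python
def receptive_field(layers):
    """layers: list of (kernel_size, stride) pairs, first layer first."""
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump   # each layer widens the field by (k - 1) steps of size `jump`
        jump *= s              # distance, in input elements, between neighbouring outputs
    return rf


print(receptive_field([(3, 1)]))                  # one 3x3 layer
print(receptive_field([(3, 1), (3, 1)]))          # two stacked 3x3 layers
print(receptive_field([(3, 1), (3, 1), (3, 1)]))  # three stacked 3x3 layers
```

Stacking small kernels therefore grows the receptive field by two elements per 3x3 layer, which is one reason small kernels in deep stacks can stand in for a single large kernel.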

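### Implementing convolution by hand

In the 2D cross-correlation operation (unless stated otherwise, "convolution" in deep learning means cross-correlation)
- The convolution window starts at the top-left corner of the input array and slides left to right, top to bottom, computing one output element at each position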
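The difference between cross-correlation and a true convolution can be seen on a tiny example. The sketch below uses its own helper names (`corr2d` and `true_conv2d`, both introduced here for illustration) and assumes no padding and stride 1.

```python
import torch


def corr2d(x, k):
    """Plain 2D cross-correlation (no padding, stride 1)."""
    h, w = k.shape
    y = torch.zeros(x.shape[0] - h + 1, x.shape[1] - w + 1)
    for i in range(y.shape[0]):
        for j in range(y.shape[1]):
            y[i, j] = (x[i:i + h, j:j + w] * k).sum()
    return y


def true_conv2d(x, k):
    """True convolution: flip the kernel along both axes, then cross-correlate."""
    return corr2d(x, torch.flip(k, dims=(0, 1)))


x = torch.arange(9.).view(3, 3)
k = torch.arange(4.).view(2, 2)
print(corr2d(x, k))       # cross-correlation
print(true_conv2d(x, k))  # differs, because k changes when it is flipped
```

For a fixed kernel the two results differ, but a learned kernel would simply converge to the flipped values, which is why the distinction does not matter for training.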

In [2]:
import torch
import torch.nn as nn

In [3]:
def conv2d(x, k):
    """
    function: 2D cross-correlation (no padding, stride 1)
    params x: input data (2-D tensor)
    params k: kernel (2-D tensor)
    """
    # kernel size
    h, w = k.shape
    # allocate the output with the expected shape
    y = torch.zeros((x.shape[0] - h + 1, x.shape[1] - w + 1))
    # slide the window over the input
    for i in range(y.shape[0]):
        for j in range(y.shape[1]):
            y[i, j] = (x[i:i+h, j:j+w] * k).sum()

    return y

In [4]:
x = torch.arange(9).view(3, 3)
k = torch.arange(4).view(2, 2)

# cross-correlation
conv2d(x, k)

tensor([[19., 25.],
        [37., 43.]])
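The double loop is easy to read but slow. One possible vectorized variant, sketched here under the same assumptions (2-D input, no padding, stride 1), uses `Tensor.unfold` to extract all windows at once; the function name `conv2d_unfold` is made up for this example.

```python
import torch


def conv2d_unfold(x, k):
    """Cross-correlation without Python loops (no padding, stride 1)."""
    h, w = k.shape
    # windows has shape (out_h, out_w, h, w): one h x w patch per output position
    windows = x.unfold(0, h, 1).unfold(1, w, 1)
    # multiply every patch by the kernel and sum over the patch dimensions
    return (windows * k).sum(dim=(-2, -1))


x = torch.arange(9.).view(3, 3)
k = torch.arange(4.).view(2, 2)
print(conv2d_unfold(x, k))
```

It should reproduce the result of the loop version on the same input.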

### A custom convolutional layer

In [5]:
import torch.nn as nn

In [6]:
class Conv2D(nn.Module):
    """A hand-written convolutional layer"""
    def __init__(self, kernel_size):
        super(Conv2D, self).__init__()
        self.weight = nn.Parameter(torch.rand(kernel_size))
        self.bias = nn.Parameter(torch.zeros(1))
        
    def forward(self, x):
        return conv2d(x, self.weight) + self.bias

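A convolutional layer whose window has shape pxq is called a pxq convolutional layer
- That is, the kernel has height p and width q

### Detecting object edges in an image

Use a convolutional layer to detect object edges in an image (find the positions where the pixel values change)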

In [7]:
# build a 6*8 image: the middle 4 columns are black (0), the rest are white (1)
x = torch.ones(6, 8)
x[:, 2:6] = 0
x

tensor([[1., 1., 0., 0., 0., 0., 1., 1.],
        [1., 1., 0., 0., 0., 0., 1., 1.],
        [1., 1., 0., 0., 0., 0., 1., 1.],
        [1., 1., 0., 0., 0., 0., 1., 1.],
        [1., 1., 0., 0., 0., 0., 1., 1.],
        [1., 1., 0., 0., 0., 0., 1., 1.]])

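What we are really testing is whether horizontally adjacent elements differ, so we can define a 1*2 kernel [[-1, 1]]: wherever two adjacent columns differ the output is non-zero, and where they are equal the output is 0

#### Using a hand-defined kernel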

In [8]:
k = torch.tensor([[-1, 1]])
y = conv2d(x, k)
y

tensor([[ 0., -1.,  0.,  0.,  0.,  1.,  0.],
        [ 0., -1.,  0.,  0.,  0.,  1.,  0.],
        [ 0., -1.,  0.,  0.,  0.,  1.,  0.],
        [ 0., -1.,  0.,  0.,  0.,  1.,  0.],
        [ 0., -1.,  0.,  0.,  0.,  1.,  0.],
        [ 0., -1.,  0.,  0.,  0.,  1.,  0.]])

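By reusing the same kernel at every position, a convolutional layer can represent local spatial patterns efficiently

For example:
```
[[1, 0, -1],
 [1, 0, -1],
 [1, 0, -1]]    vertical edge filter; it not only detects edges but also tells light-to-dark from dark-to-light

[[ 1,  1,  1],
 [ 0,  0,  0],
 [-1, -1, -1]]  horizontal edge filter

[[1, 0, -1],
 [2, 0, -2],
 [1, 0, -1]]    Sobel filter, which is more robust

[[ 3, 0,  -3],
 [10, 0, -10],
 [ 3, 0,  -3]]  Scharr filter
```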
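As a quick illustration, the Sobel filter can be applied to the same kind of 6*8 black-and-white image used earlier. This sketch uses `torch.nn.functional.conv2d` rather than the hand-written function; the variable names are chosen for this example only.

```python
import torch
import torch.nn.functional as F

# 6x8 image: middle 4 columns black (0), the rest white (1)
img = torch.ones(6, 8)
img[:, 2:6] = 0

sobel = torch.tensor([[1., 0., -1.],
                      [2., 0., -2.],
                      [1., 0., -1.]])

# F.conv2d expects (batch, channels, height, width) input
# and (out_channels, in_channels, kh, kw) weights
edges = F.conv2d(img.view(1, 1, 6, 8), sobel.view(1, 1, 3, 3)).view(4, 6)
print(edges)
```

The response is positive at the white-to-black edge and negative at the black-to-white edge, which is the sense in which such a filter also distinguishes light from dark.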

#### Learning the kernel by training

In [9]:
import sys
sys.path.append("../d2l_func/")
from optim import sgd
from sqdm import sqdm

In [10]:
def squared_loss(y_pred, y):
    return ((y_pred - y)**2).sum()

In [11]:
model  = Conv2D(k.shape)
loss = squared_loss
epoch_num = 100
lr = 0.01
weight_decay = 0

process_bar = sqdm()
for epoch in range(epoch_num):
    print(f"Epoch [{epoch+1}/{epoch_num}]")
    y_pred = model(x)
    l = loss(y_pred, y)
    l.backward()
    
    sgd([model.weight, model.bias], lr=lr, weight_decay=weight_decay)
#     _ = model.weight.grad.data.zero_()
#     _ = model.bias.grad.data.zero_()
    _ = model.weight.grad.fill_(0)
    _ = model.bias.grad.fill_(0)
    
    process_bar.show_process(data_num=1, batch_size=1, train_loss=l.item())
    print("\n")

Epoch [1/100]
1/1 [>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] - train_loss: 41.5285, train_score: -, test_loss: -, test_score: -

Epoch [2/100]
1/1 [>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] - train_loss: 16.2697, train_score: -, test_loss: -, test_score: -

Epoch [3/100]
1/1 [>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] - train_loss: 11.2531, train_score: -, test_loss: -, test_score: -

Epoch [4/100]
1/1 [>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] - train_loss: 8.3722, train_score: -, test_loss: -, test_score: -

Epoch [5/100]
1/1 [>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] - train_loss: 6.2939, train_score: -, test_loss: -, test_score: -

Epoch [6/100]
1/1 [>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] - train_loss: 4.7535, train_score: -, test_loss: -, test_score: -

Epoch [7/100]
1/1 [>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] - train_loss: 3.6036, train_score: -, test_loss: -, test_score: -

Epoch [8/100]
1/1 [>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] - train_loss: 2.7406, train_score: -, test_loss: -, test_score: -

Epoch [9/100]
1/1 [>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] - 

1/1 [>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] - train_loss: 0.0000, train_score: -, test_loss: -, test_score: -

Epoch [70/100]
1/1 [>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] - train_loss: 0.0000, train_score: -, test_loss: -, test_score: -

Epoch [71/100]
1/1 [>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] - train_loss: 0.0000, train_score: -, test_loss: -, test_score: -

Epoch [72/100]
1/1 [>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] - train_loss: 0.0000, train_score: -, test_loss: -, test_score: -

Epoch [73/100]
1/1 [>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] - train_loss: 0.0000, train_score: -, test_loss: -, test_score: -

Epoch [74/100]
1/1 [>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] - train_loss: 0.0000, train_score: -, test_loss: -, test_score: -

Epoch [75/100]
1/1 [>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] - train_loss: 0.0000, train_score: -, test_loss: -, test_score: -

Epoch [76/100]
1/1 [>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] - train_loss: 0.0000, train_score: -, test_loss: -, test_score: -

Epoch [77/100]
1/1 [>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] - train_los

In [12]:
# result
model.weight
model.bias

Parameter containing:
tensor([[-1.0000,  1.0000]], requires_grad=True)

Parameter containing:
tensor([-5.9854e-14], requires_grad=True)

The learned kernel matches the hand-defined [[-1, 1]]
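The training cell depends on the local helpers `sgd` and `sqdm`. For readers without those modules, the same experiment can be sketched with built-in PyTorch pieces only. Note the substitutions: `nn.Conv2d` replaces the custom `Conv2D` layer, `torch.optim.SGD` replaces the local `sgd`, and the step count of 300 is an arbitrary choice for this example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# same 6x8 image as above, in (batch, channel, height, width) layout
img = torch.ones(1, 1, 6, 8)
img[..., 2:6] = 0
# target: the output of the hand-defined kernel [[-1, 1]]
target = F.conv2d(img, torch.tensor([[[[-1., 1.]]]]))

layer = nn.Conv2d(1, 1, kernel_size=(1, 2))
optimizer = torch.optim.SGD(layer.parameters(), lr=0.01)

for step in range(300):
    optimizer.zero_grad()                       # clear gradients from the previous step
    loss = ((layer(img) - target) ** 2).sum()   # squared loss, summed over all positions
    loss.backward()
    optimizer.step()

print(layer.weight.data.view(-1), layer.bias.data)
```

With this learning rate the weights should approach [-1, 1] and the bias should approach 0, as in the run above.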

### Hand-written convolution with padding

In [13]:
import numpy as np

In [14]:
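def conv2d_padding(x, k, padding=0):
    """
    function: cross-correlation with zero padding
    params x: input tensor
    params k: kernel
    params padding: an int or a tuple (h, w): h rows of zeros are added to both the top
                    and the bottom, w columns to both the left and the right;
                    an int uses the same amount for height and width
    """
    # normalize padding first, so that tuples are also validated correctly
    if isinstance(padding, int):
        h = w = padding
    else:
        h, w = padding
    assert h >= 0 and w >= 0
    if h == 0 and w == 0:
        return conv2d(x, k)

    # the first pair pads top/bottom (rows), the second pair pads left/right (columns)
    x = torch.from_numpy(np.pad(x.numpy(), ((h, h), (w, w))))
    return conv2d(x, k)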

In [15]:
# equal-width convolution (output has the same shape as the input)
x = torch.rand(8, 8)
k = torch.rand(3, 3)

conv2d_padding(x, k, padding=1).shape

torch.Size([8, 8])
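The round trip through NumPy is not strictly needed for the padding step. As an alternative sketch, `torch.nn.functional.pad` pads a tensor directly; the values of `ph` and `pw` below are arbitrary example numbers.

```python
import torch
import torch.nn.functional as F

x = torch.arange(9.).view(3, 3)
ph, pw = 1, 2  # rows added on top and bottom, columns added on left and right

# F.pad lists the amounts starting from the LAST dimension: (left, right, top, bottom)
padded = F.pad(x, (pw, pw, ph, ph))
print(padded.shape)
print(padded)
```

The argument order is easy to get wrong: width comes first, then height, which is the reverse of the `((h, h), (w, w))` order used by `np.pad`.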

In [16]:
# compare with the result of PyTorch's built-in Conv2d
x = torch.arange(9.).view(3, 3)
k = torch.arange(4.).view(2, 2)
conv2d_padding(x, k, padding=1)

tensor([[ 0.,  3.,  8.,  4.],
        [ 9., 19., 25., 10.],
        [21., 37., 43., 16.],
        [ 6.,  7.,  8.,  0.]])

In [17]:
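# using PyTorch's built-in layer (here k is an nn.Conv2d layer, not a raw kernel)
def compute_conv2d(x, k):
    # nn.Conv2d works on 4-D tensors (batch, channel, height, width), so add two leading dimensions to x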
    x = x.view((1, 1) + x.shape)
    result = k(x)
    return result.view(result.shape[2:])

In [18]:
conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=2, padding=1)
conv.weight.data = k.view((1,1) + k.shape)
conv.bias.data = torch.zeros(1)

compute_conv2d(x, conv)

tensor([[ 0.,  3.,  8.,  4.],
        [ 9., 19., 25., 10.],
        [21., 37., 43., 16.],
        [ 6.,  7.,  8.,  0.]], grad_fn=<ViewBackward>)

### Hand-written convolution with stride

In [19]:
def conv2d_padding_stride(x, k, padding=0, stride=1):
    """
    function: cross-correlation with padding and stride
    params x: input tensor
    params k: kernel
    params padding: an int or a tuple (h, w); the same amount is padded on both ends
    params stride: defaults to 1; an int or a tuple (h, w)
    """
    if stride == 1:
        return conv2d_padding(x, k, padding)
    else:
        kh, kw = k.shape
        if isinstance(padding, int):
            ph = pw = padding
        else:
            ph, pw = padding
        
        if isinstance(stride, int):
            sh = sw = stride
        else:
            sh, sw = stride
        
        y = torch.zeros((x.shape[0]-kh+2*ph+sh)//sh, 
                        (x.shape[1]-kw+2*pw+sw)//sw)
        x = torch.from_numpy(np.pad(x.numpy(), ((ph, ph), (pw, pw))))
        for i in range(y.shape[0]):
            for j in range(y.shape[1]):
                y[i, j] = (x[(i*sh):(i*sh+kh), (j*sw):(j*sw+kw)] * k).sum()
                
        return y

In [20]:
# compare the hand-written implementation with the nn implementation
test_x = torch.rand(8, 8)
test_k = torch.rand(3, 3)
conv2d_padding_stride(test_x, test_k, 1, 2)
conv2d_padding_stride(test_x, test_k, 1, 2).shape


conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, padding=1, stride=2)
conv.weight.data = test_k.view((1, 1) + test_k.shape)
conv.bias.data = torch.zeros(1)
compute_conv2d(test_x, conv)

tensor([[0.4132, 0.4866, 1.4444, 0.9264],
        [0.6863, 1.1458, 2.2513, 1.7484],
        [1.1097, 1.0538, 1.7666, 1.6823],
        [1.2972, 1.3523, 1.7586, 1.7562]])

torch.Size([4, 4])

tensor([[0.4132, 0.4866, 1.4444, 0.9264],
        [0.6863, 1.1458, 2.2513, 1.7484],
        [1.1097, 1.0538, 1.7666, 1.6823],
        [1.2972, 1.3523, 1.7586, 1.7562]], grad_fn=<ViewBackward>)

In [21]:
# compare the hand-written implementation with the nn implementation (a more complex case)
test_x = torch.rand(8, 8)
test_k = torch.rand(3, 5)
padding = (0, 1)
stride = (3, 4)
conv2d_padding_stride(test_x, test_k, padding, stride)
conv2d_padding_stride(test_x, test_k, padding, stride).shape


conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=(3, 5), padding=padding, stride=stride)
conv.weight.data = test_k.view((1, 1) + test_k.shape)
conv.bias.data = torch.zeros(1)
compute_conv2d(test_x, conv)

tensor([[3.3842, 3.7897],
        [2.6026, 4.7549]])

torch.Size([2, 2])

tensor([[3.3842, 3.7897],
        [2.6026, 4.7549]], grad_fn=<ViewBackward>)
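The comparison can also be done without building an `nn.Conv2d` and overwriting its `weight.data`. The sketch below is one way to do it with `torch.nn.functional.conv2d`, which takes the kernel as an argument. The reference function `conv2d_loop` is written for this example and takes per-side padding, like PyTorch.

```python
import torch
import torch.nn.functional as F


def conv2d_loop(x, k, padding=(0, 0), stride=(1, 1)):
    """Reference loop: zero-pad, then slide the kernel with the given stride."""
    (ph, pw), (sh, sw) = padding, stride
    kh, kw = k.shape
    x = F.pad(x, (pw, pw, ph, ph))
    out_h = (x.shape[0] - kh) // sh + 1
    out_w = (x.shape[1] - kw) // sw + 1
    y = torch.zeros(out_h, out_w)
    for i in range(out_h):
        for j in range(out_w):
            y[i, j] = (x[i * sh:i * sh + kh, j * sw:j * sw + kw] * k).sum()
    return y


torch.manual_seed(0)
x, k = torch.rand(8, 8), torch.rand(3, 5)
ours = conv2d_loop(x, k, padding=(0, 1), stride=(3, 4))
# x[None, None] adds batch and channel dimensions; [0, 0] removes them again
ref = F.conv2d(x[None, None], k[None, None], padding=(0, 1), stride=(3, 4))[0, 0]
print(ours.shape, torch.allclose(ours, ref, atol=1e-5))
```

Because the kernel is passed in directly, its shape can never disagree with a `kernel_size` argument.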

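## Multiple channels

### Multiple input channels

With multiple input channels and multiple output channels the kernel is four-dimensional, as it is by default in PyTorch
- When the input data has multiple channels, we build a kernel whose number of input channels equals the number of channels of the input data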
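The 4-D layout can be inspected directly on a PyTorch layer. A small sketch, with channel counts and kernel size picked arbitrarily for the example:

```python
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=5, kernel_size=(2, 4))
# weight layout: (out_channels, in_channels, kernel_height, kernel_width)
print(conv.weight.shape)
# one bias per output channel
print(conv.bias.shape)
```

The second dimension of the weight is the one that has to match the number of channels of the input.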

In [22]:
def conv2d_multi_in(x, k, padding=0, stride=1):
    """
    function: cross-correlation with multiple input channels and a single output channel
    params x: multi-channel input (3-D)
    params k: multi-channel kernel (3-D)
    """
    result = conv2d_padding_stride(x[0], k[0], padding, stride)
    for i in range(1, x.shape[0]):
        result += conv2d_padding_stride(x[i], k[i], padding, stride)
        
    return result

In [23]:
x1, x2 = torch.arange(9).view(3, 3), torch.arange(1, 10).view(3, 3)
k1, k2 = torch.arange(4).view(2, 2), torch.arange(1, 5).view(2, 2)
X, K = torch.stack((x1, x2)), torch.stack((k1, k2))
X
K

conv2d_multi_in(X, K)

tensor([[[0, 1, 2],
         [3, 4, 5],
         [6, 7, 8]],

        [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]])

tensor([[[0, 1],
         [2, 3]],

        [[1, 2],
         [3, 4]]])

tensor([[ 56.,  72.],
        [104., 120.]])

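### Multiple output channels

With multiple input channels the per-channel results are summed, so the output always has a single channel. To get multiple output channels, create a separate 3-D kernel for each output channel and stack them into a 4-D kernel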

In [24]:
def conv2d_multi_in_out(X, K):
    """
    function: multiple output channels; K is a 4-D kernel
    """
    result = torch.stack([conv2d_multi_in(X, k) for k in K])
    return result

In [25]:
test_k = torch.stack((K, K+1, K+2))
test_k.shape

conv2d_multi_in_out(X, test_k)

torch.Size([3, 2, 2, 2])

tensor([[[ 56.,  72.],
         [104., 120.]],

        [[ 76., 100.],
         [148., 172.]],

        [[ 96., 128.],
         [192., 224.]]])
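The same multi-input, multi-output result can be cross-checked against PyTorch. This sketch rebuilds the tensors from the cells above as floats and calls `torch.nn.functional.conv2d`; the name `K4` is introduced here for the stacked 4-D kernel.

```python
import torch
import torch.nn.functional as F

x1, x2 = torch.arange(9.).view(3, 3), torch.arange(1., 10.).view(3, 3)
k1, k2 = torch.arange(4.).view(2, 2), torch.arange(1., 5.).view(2, 2)
X, K = torch.stack((x1, x2)), torch.stack((k1, k2))   # shapes (2, 3, 3) and (2, 2, 2)
K4 = torch.stack((K, K + 1, K + 2))                   # (3, 2, 2, 2): out, in, kh, kw

# X[None] adds the batch dimension, [0] removes it again
out = F.conv2d(X[None], K4)[0]
print(out.shape)
print(out)
```

Each of the three output channels is the sum over the two input channels of a 2-D cross-correlation, which is what the stacked list comprehension computes.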

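## 1x1 convolution

Because it uses the smallest possible window, a 1x1 convolution loses the ability to recognize patterns formed by neighbouring elements along height and width
- The main computation of a 1x1 convolution happens along the channel dimension
- If the elements along height and width are treated as samples and the channel dimension as features, a 1x1 convolution is equivalent to a fully connected layer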

In [26]:
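def conv2d_multi_in_out_1x1(x, k):
    """
    function: 1x1 convolution
    params x: multi-channel input (3-D: channels, height, width)
    params k: multi-channel 1x1 kernel (4-D: out_channels, in_channels, 1, 1)
    """
    c_in, h, w = x.shape
    c_out = k.shape[0]
    # treat each height/width position as a sample and the channels as its features
    x = x.reshape(c_in, -1)
    # repeat the input once per output channel
    x = torch.stack([x] * c_out)
    # reshape the kernel for batched matrix multiplication
    k = k.reshape(c_out, 1, c_in)
    # batched matrix multiplication
    result = torch.bmm(k, x)
    return result.reshape(c_out, h, w)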

In [27]:
# check that the 1x1 convolution matches the multi-channel input/output cross-correlation
x = torch.arange(27).view(3, 3, 3)
k = torch.arange(6).view(2, 3, 1, 1)
conv2d_multi_in_out_1x1(x, k)

conv2d_multi_in_out(x, k)

tensor([[[ 45,  48,  51],
         [ 54,  57,  60],
         [ 63,  66,  69]],

        [[126, 138, 150],
         [162, 174, 186],
         [198, 210, 222]]])

tensor([[[ 45.,  48.,  51.],
         [ 54.,  57.,  60.],
         [ 63.,  66.,  69.]],

        [[126., 138., 150.],
         [162., 174., 186.],
         [198., 210., 222.]]])
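The "fully connected layer over channels" view can be written very compactly. The sketch below expresses the 1x1 convolution as a single matrix product with `torch.einsum` and compares it with `torch.nn.functional.conv2d`; it uses float versions of the tensors above.

```python
import torch
import torch.nn.functional as F

x = torch.arange(27.).view(3, 3, 3)      # (in_channels, height, width)
k = torch.arange(6.).view(2, 3, 1, 1)    # (out_channels, in_channels, 1, 1)

# every pixel is a sample with 3 channel features;
# the same 2x3 weight matrix is applied to each of them
as_linear = torch.einsum("oc,chw->ohw", k.view(2, 3), x)
as_conv = F.conv2d(x[None], k)[0]
print(torch.allclose(as_linear, as_conv))
```

Here `o` indexes output channels and `c` input channels; the spatial indices `h` and `w` pass through untouched, which is exactly why no neighbouring pixels are mixed.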

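Summary:
- Using multiple channels extends the model parameters of a convolutional layer
- If channels are treated as the feature dimension and the elements along height and width as data samples, a 1x1 convolution acts like a fully connected layer
- 1x1 convolutions are typically used to adjust the number of channels between layers and to control model complexity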

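## Pooling layers

In real images, the objects we care about do not always appear at a fixed position
- Pooling: reduces the excessive sensitivity of convolutional layers to position
- Max pooling and average pooling are the usual choices
- In PyTorch the stride defaults to the size of the pooling window: a (3, 3) window gives a stride of (3, 3) unless a stride is passed explicitly
- With a 2x2 max-pooling window, for example, a pattern recognized by the convolutional layer can still be detected as long as it moves by no more than one element in height and width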
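A toy example of this tolerance to small shifts: two feature maps whose single activation differs by one element in both height and width give the same pooled output, provided the shift stays inside one pooling window. This is only a sketch; the tensors `a` and `b` are invented for the illustration.

```python
import torch
import torch.nn.functional as F

a = torch.zeros(1, 1, 4, 4)
b = torch.zeros(1, 1, 4, 4)
a[0, 0, 0, 0] = 1.0   # activation in the top-left corner
b[0, 0, 1, 1] = 1.0   # the same activation shifted one element down and right

# kernel_size=2 with the default stride (= kernel size) gives non-overlapping 2x2 windows
pa = F.max_pool2d(a, kernel_size=2)
pb = F.max_pool2d(b, kernel_size=2)
print(pa)
print(torch.equal(pa, pb))
```

If the shift crossed a window boundary the pooled outputs would differ, so the invariance is local rather than absolute.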

### Implementing pooling by hand

In [28]:
# this version uses the default stride: the stride equals the pooling window size
def pool2d(x, kernel_size, mode="max"):
    """
    function: 2D pooling
    params x: input data
    params kernel_size: size of the pooling window
    params mode: "max" or "mean"
    """
    x = x.float()
    if isinstance(kernel_size, int):
        h = w = kernel_size
    else:
        h, w = kernel_size
    y = torch.zeros(x.shape[0]//h, x.shape[1]//w)
    
    for i in range(y.shape[0]):
        for j in range(y.shape[1]):
            if mode.lower() == "max":
                y[i, j] = x[(i*h):(i*h+h), (j*w):(j*w+w)].max()
            else:
                y[i, j] = x[(i*h):(i*h+h), (j*w):(j*w+w)].mean()
                
    return y

In [38]:
x = torch.arange(9.).view(3, 3)
pool2d(x, (2, 2))
pool2d(x, (2, 2), mode="mean")

pool = nn.MaxPool2d(kernel_size=2)
pool(x.view((1, 1)+x.shape))

pool = nn.AvgPool2d(kernel_size=2)
pool(x.view((1, 1)+x.shape))

tensor([[4.]])

tensor([[2.]])

tensor([[[[4.]]]])

tensor([[[[2.]]]])

### Implementation with configurable padding and stride

In [30]:
def pool2d_padding_stride(x, kernel_size, padding=0, stride=None, mode="max"):
    """
    function: 2D pooling with configurable padding and stride
    params padding: an int or a tuple (h, w), applied to both ends
    params stride: an int or a tuple (h, w); defaults to the window size
    """
    if isinstance(kernel_size, int):
        kh = kw = kernel_size
    else:
        kh, kw = kernel_size

    # as in nn.MaxPool2d, the stride defaults to the window size
    if stride is None:
        sh, sw = kh, kw
    elif isinstance(stride, int):
        sh = sw = stride
    else:
        sh, sw = stride

    if isinstance(padding, int):
        ph = pw = padding
    else:
        ph, pw = padding

    is_max = mode.lower() == "max"
    # pad with -inf for max pooling (so padding never wins), with zeros for mean pooling
    pad_value = float("-inf") if is_max else 0.0
    y = torch.zeros((x.shape[0]-kh+2*ph+sh)//sh, (x.shape[1]-kw+2*pw+sw)//sw)
    x = torch.from_numpy(np.pad(x.float().numpy(), ((ph, ph), (pw, pw)),
                                constant_values=pad_value))

    for i in range(y.shape[0]):
        for j in range(y.shape[1]):
            window = x[(i*sh):(i*sh+kh), (j*sw):(j*sw+kw)]
            y[i, j] = window.max() if is_max else window.mean()

    return y

In [31]:
x = torch.arange(16.).view(4, 4)
pool2d_padding_stride(x, kernel_size=3)
pool2d_padding_stride(x, kernel_size=(3, 3), padding=1, stride=2)
pool2d_padding_stride(x, kernel_size=(2, 4), padding=(1, 2), stride=(2, 3))

tensor([[10.]])

tensor([[ 5.,  7.],
        [13., 15.]])

tensor([[ 1.,  3.],
        [ 9., 11.],
        [13., 15.]])

### Pooling with nn

A pooling layer can change the output shape through padding and stride
- In `nn.MaxPool2d`, the stride defaults to the shape of the pooling window

In [32]:
conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=(2, 4), 
                 padding=(1, 2), stride=(2, 3))
conv.weight.data = torch.arange(8.).view(1, 1, 2, 4)
conv.bias.data = torch.zeros(1)
conv(x.view((1, 1)+x.shape))

tensor([[[[  7.,  32.],
          [134., 172.],
          [ 63.,  44.]]]], grad_fn=<MkldnnConvolutionBackward>)

In [33]:
pool = nn.MaxPool2d(kernel_size=3)
pool(x.view((1, 1)+x.shape))

pool = nn.MaxPool2d(kernel_size=3, padding=1, stride=2)
pool(x.view((1, 1)+x.shape))

pool = nn.MaxPool2d(kernel_size=(2, 4), padding=(1, 2), stride=(2, 3))
pool(x.view((1, 1)+x.shape))

tensor([[[[10.]]]])

tensor([[[[ 5.,  7.],
          [13., 15.]]]])

tensor([[[[ 1.,  3.],
          [ 9., 11.],
          [13., 15.]]]])

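### Multiple channels

Unlike a convolutional layer, a pooling layer does not sum the results across channels; each channel is pooled separately
- The number of output channels equals the number of input channels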
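That pooling acts on each channel independently can be verified with `nn.MaxPool2d` itself. In this sketch the input is given an explicit batch dimension, and the comparison loop over channels is written only for the illustration.

```python
import torch
import torch.nn as nn

x = torch.arange(16.).view(4, 4)
X = torch.stack((x, x + 1, x + 2)).unsqueeze(0)   # (batch=1, channels=3, 4, 4)

pool = nn.MaxPool2d(kernel_size=2, padding=1, stride=2)
together = pool(X)
# pooling each channel on its own and concatenating gives the same result
separate = torch.cat([pool(X[:, c:c + 1]) for c in range(X.shape[1])], dim=1)
print(together.shape, torch.equal(together, separate))
```

The channel count of the output matches the input, and no information moves between channels.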

In [55]:
def pool2d_multi_in_out(x, kernel_size, padding, stride=None, mode="max"):
    result = [pool2d_padding_stride(x[k], kernel_size[1:], padding, stride, 
                                   mode) for k in range(kernel_size[0])]
    return torch.stack(result)

In [64]:
x = torch.arange(16.).view(4, 4)
X = torch.stack((x, x+1, x+2))

pool2d_multi_in_out(X, (3, 2, 2), padding=1, stride=2)
pool2d_multi_in_out(X, (3, 2, 2), padding=1, stride=2).shape

tensor([[[ 0.,  2.,  3.],
         [ 8., 10., 11.],
         [12., 14., 15.]],

        [[ 1.,  3.,  4.],
         [ 9., 11., 12.],
         [13., 15., 16.]],

        [[ 2.,  4.,  5.],
         [10., 12., 13.],
         [14., 16., 17.]]])

torch.Size([3, 3, 3])

In [65]:
pool = nn.MaxPool2d(kernel_size=2, padding=1, stride=2)
pool(X)

tensor([[[ 0.,  2.,  3.],
         [ 8., 10., 11.],
         [12., 14., 15.]],

        [[ 1.,  3.,  4.],
         [ 9., 11., 12.],
         [13., 15., 16.]],

        [[ 2.,  4.,  5.],
         [10., 12., 13.],
         [14., 16., 17.]]])

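Summary:
- Max pooling and average pooling output the maximum and the mean, respectively, of the input elements in the pooling window
- A main purpose of the pooling layer is to reduce the excessive sensitivity of convolutional layers to position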