# Convolution

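E.g., classifying pictures of cats and dogs

- A 12 MP camera produces an RGB image with 36M elements
- A single hidden layer of size 100 then has $3.6\times 10^9$ parameters ($\sim 14$ GB)
- In reality there are only about 900M dogs and 600M cats, fewer than the number of parameters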
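
The arithmetic above can be checked in a few lines. This is a minimal sketch that assumes 4-byte (float32) weights and ignores biases; the variable names are only illustrative.

```python
# Back-of-the-envelope parameter count for a fully connected layer
# on a 12 MP RGB image.
# Assumptions: float32 weights (4 bytes each), biases ignored.
pixels = 12_000_000            # 12 MP camera
inputs = pixels * 3            # three RGB values per pixel -> 36M input elements
hidden_units = 100             # single hidden layer of size 100

weights = inputs * hidden_units
size_gb = weights * 4 / 1e9    # bytes -> decimal gigabytes

print(f"{weights:,} weights, about {size_gb:.1f} GB")
```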

**Two principles**

- Translation invariance
- Locality

******

**From a 2D fully connected layer to a convolutional layer**

Extend the fully connected layer from 1D to 2D inputs and outputs:

$$h_{i,j}=\sum_{k,l}w_{i,j,k,l}x_{k,l}=\sum_{a,b}v_{i,j,a,b}x_{i+a,j+b}$$

where $v_{i,j,a,b}=w_{i,j,i+a,j+b}$

**Translation invariance**: $v_{i,j,a,b}=v_{a,b}$

$$h_{i,j}=\sum_{a,b}v_{a,b}x_{i+a,j+b}\qquad\text{(2D cross-correlation)}$$

**Locality**: $v_{a,b}=0$ whenever $|a|>\Delta$ or $|b|>\Delta$

$$h_{i,j}=\sum_{a=-\Delta}^{\Delta}\sum_{b=-\Delta}^{\Delta}v_{a,b}x_{i+a,j+b}$$
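
To make the two constraints concrete, the sketch below evaluates the last formula directly in plain Python. It assumes that out-of-range inputs count as zero, and the helper names (`local_shared_layer`, `shift_right`) are made up for this illustration. Because the same weights $v_{a,b}$ are used at every position, shifting the input shifts the output by the same amount.

```python
def local_shared_layer(x, v, delta):
    """h[i][j] = sum over a, b in [-delta, delta] of v_{a,b} * x[i+a][j+b].

    x is a list of lists; v is a (2*delta+1) x (2*delta+1) list of lists
    where v[a + delta][b + delta] stores v_{a,b}. Out-of-range entries of x
    are treated as 0 (an assumption of this sketch).
    """
    n_h, n_w = len(x), len(x[0])
    h = [[0.0] * n_w for _ in range(n_h)]
    for i in range(n_h):
        for j in range(n_w):
            for a in range(-delta, delta + 1):
                for b in range(-delta, delta + 1):
                    if 0 <= i + a < n_h and 0 <= j + b < n_w:
                        h[i][j] += v[a + delta][b + delta] * x[i + a][j + b]
    return h


def shift_right(x):
    """Shift every row one column to the right, filling with zeros."""
    return [[0.0] + row[:-1] for row in x]


# A single bright pixel, kept away from the borders so nothing falls off the edge.
x = [[0.0] * 6 for _ in range(5)]
x[2][2] = 1.0
v = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]  # arbitrary weights

h = local_shared_layer(x, v, delta=1)
h_shifted = local_shared_layer(shift_right(x), v, delta=1)

# Shared weights: the response moves together with the pixel.
print(h_shifted == shift_right(h))  # True
# Locality: positions far from the pixel get no response.
print(h[0][5])  # 0.0
```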

******

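**Convolutional layer**

- Input X: $n_h\times n_w$
- Kernel W: $k_h\times k_w$
- Bias $b\in\mathbb{R}$
- Output Y: $(n_h-k_h+1)\times(n_w-k_w+1)$

$$Y=X\star W+b$$

where $\star$ denotes the 2D cross-correlation operation.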

In [31]:
import torch

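def corr2d(X, K):
    """Compute the 2D cross-correlation of input X with kernel K."""
    h, w = K.shape
    Y = torch.zeros(size=(X.shape[0]-h+1,X.shape[1]-w+1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            # Elementwise product of the current window with K, then sum
            Y[i, j] = (X[i:i+h, j:j+w] * K).sum()
    return Y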

In [32]:
X = torch.arange(9, dtype=torch.float32).reshape(shape=(3, 3))
K = torch.arange(4, dtype=torch.float32).reshape(shape=(2, 2))
corr2d(X, K)

tensor([[19., 25.],
        [37., 43.]])
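
As a check on the first entry of the output above, the top-left $2\times 2$ window of `X` is multiplied elementwise with `K` and summed:

$$Y_{0,0}=0\cdot 0+1\cdot 1+3\cdot 2+4\cdot 3=19$$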

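In [33]:
""" Convolutional layer """
class Conv2D(torch.nn.Module):
    def __init__(self, kernel_size):
        super().__init__()
        # Learnable kernel (random init) and scalar bias (zero init)
        self.weight = torch.nn.Parameter(torch.rand(kernel_size))
        self.bias = torch.nn.Parameter(torch.zeros(1))

    def forward(self, X):
        # Cross-correlate the input with the kernel, then add the bias
        return corr2d(X, self.weight) + self.bias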

## Application

In [34]:
""" Test image: the transitions between 1 and 0 are the edges """
X = torch.ones(size = (6, 8))
X[:, 2:6] = 0
X

tensor([[1., 1., 0., 0., 0., 0., 1., 1.],
        [1., 1., 0., 0., 0., 0., 1., 1.],
        [1., 1., 0., 0., 0., 0., 1., 1.],
        [1., 1., 0., 0., 0., 0., 1., 1.],
        [1., 1., 0., 0., 0., 0., 1., 1.],
        [1., 1., 0., 0., 0., 0., 1., 1.]])

In [35]:
""" Detect the edges with kernel K """
K = torch.tensor([[1.0, -1.0]])
Y = corr2d(X, K)
Y

tensor([[ 0.,  1.,  0.,  0.,  0., -1.,  0.],
        [ 0.,  1.,  0.,  0.,  0., -1.,  0.],
        [ 0.,  1.,  0.,  0.,  0., -1.,  0.],
        [ 0.,  1.,  0.,  0.,  0., -1.,  0.],
        [ 0.,  1.,  0.,  0.,  0., -1.,  0.],
        [ 0.,  1.,  0.,  0.,  0., -1.,  0.]])

In [36]:
""" K can only detect vertical edges """
corr2d(X.T, K)

tensor([[0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.]])
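
The transposed image has horizontal edges, which the $1\times 2$ kernel misses. The sketch below, in plain Python with a hypothetical helper `corr2d_lists` that mirrors `corr2d` on nested lists, suggests that transposing the kernel to $2\times 1$ recovers them. The small test image is an assumption made for this example.

```python
def corr2d_lists(X, K):
    """2D cross-correlation on nested lists (same logic as corr2d)."""
    k_h, k_w = len(K), len(K[0])
    return [[sum(X[i + a][j + b] * K[a][b]
                 for a in range(k_h) for b in range(k_w))
             for j in range(len(X[0]) - k_w + 1)]
            for i in range(len(X) - k_h + 1)]


# Image with horizontal edges: rows of 1s, then 0s, then 1s.
X_t = [[1.0] * 3, [1.0] * 3, [0.0] * 3, [0.0] * 3, [1.0] * 3]
K_vertical = [[1.0, -1.0]]        # 1 x 2 kernel, detects vertical edges
K_horizontal = [[1.0], [-1.0]]    # 2 x 1 kernel, the transpose

vertical_response = corr2d_lists(X_t, K_vertical)      # all zeros
horizontal_response = corr2d_lists(X_t, K_horizontal)  # +1 / -1 at the edges

for row in horizontal_response:
    print(row)
```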

In [37]:
""" Learn the convolution kernel K """
conv2D = torch.nn.Conv2d(in_channels=1, out_channels=1, kernel_size=(1, 2), bias=False)

""" Add the batch and channel dimensions """
# (batch size, channels, height, width)
X = X.reshape(shape=[1, 1, 6, 8])
Y = Y.reshape(shape=[1, 1, 6, 7])

for i in range(20):
    Y_hat = conv2D(X)
    l = (Y_hat - Y) ** 2
    conv2D.zero_grad()
    l.sum().backward()
    conv2D.weight.data[:] -= 3e-2 * conv2D.weight.grad
    print(f"batch:{i+1}, loss:{l.sum():.6f}")

batch:1, loss:9.853525
batch:2, loss:4.652109
batch:3, loss:2.299811
batch:4, loss:1.194359
batch:5, loss:0.650718
batch:6, loss:0.369899
batch:7, loss:0.217664
batch:8, loss:0.131494
batch:9, loss:0.080956
batch:10, loss:0.050501
batch:11, loss:0.031784
batch:12, loss:0.020122
batch:13, loss:0.012788
batch:14, loss:0.008147
batch:15, loss:0.005199
batch:16, loss:0.003321
batch:17, loss:0.002123
batch:18, loss:0.001358
batch:19, loss:0.000869
batch:20, loss:0.000556


In [38]:
conv2D.weight.data

tensor([[[[ 0.9975, -1.0023]]]])

In [39]:
K

tensor([[ 1., -1.]])

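## Padding and Stride

**Applying a convolution kernel makes the output smaller than the input**

### Padding

**Pad the input with extra rows and columns (filled with zeros)**

Output shape, with $p_h$ and $p_w$ the total number of padded rows and columns: $(n_h-k_h+p_h+1)\times(n_w-k_w+p_w+1)$

- Choosing $p_h=k_h-1,\quad p_w=k_w-1$ gives an output of size $n_h\times n_w$
    - $k_h$ odd: pad $\frac{p_h}{2}$ rows on both the top and the bottom
    - $k_h$ even: pad $\lceil\frac{p_h}{2}\rceil$ rows on the top and $\lfloor\frac{p_h}{2}\rfloor$ rows on the bottom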
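
A small helper makes the odd/even split explicit. This is only a sketch: `split_padding` is a hypothetical name, and putting the extra row first (on top) for even kernel sizes is one common convention, not the only one.

```python
import math


def split_padding(k):
    """Split the total padding p = k - 1 into (before, after) amounts."""
    p = k - 1
    return math.ceil(p / 2), math.floor(p / 2)


for k in (3, 4, 5):
    before, after = split_padding(k)
    print(f"k={k}: pad {before} before, {after} after")
```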


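### Stride

- The stride is the step size used when sliding the window over rows and columns (a larger stride shrinks the output faster)
    - Output shape: $\lfloor(n_h-k_h+p_h+s_h)/s_h\rfloor\times\lfloor(n_w-k_w+p_w+s_w)/s_w\rfloor$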
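
The shape formula can be wrapped in a small function and compared with the PyTorch cells that follow. This is a sketch in plain Python; `conv_output_size` is a hypothetical helper, and its `padding` argument is the total padding $p$ from the formula, so a PyTorch `padding=1` (one row on each side) corresponds to `padding=2` here.

```python
def conv_output_size(n, k, padding=0, stride=1):
    """Output length along one axis: floor((n - k + p + s) / s)."""
    return (n - k + padding + stride) // stride


print(conv_output_size(8, 3, padding=2))            # 8
print(conv_output_size(8, 3, padding=2, stride=2))  # 4
print(conv_output_size(8, 3, padding=0, stride=3))  # 2 (height, kernel 3, stride 3)
print(conv_output_size(8, 5, padding=2, stride=4))  # 2 (width, kernel 5, stride 4)
```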

In [40]:
def comp_conv2d(conv2d, X):
    X = X.reshape((1, 1) + X.shape)
    Y = conv2d(X)
    return Y.reshape(Y.shape[2:])

conv2d = torch.nn.Conv2d(1, 1, kernel_size=3, padding=1)
X = torch.randn(size=(8, 8))
comp_conv2d(conv2d, X).shape

torch.Size([8, 8])

In [41]:
conv2d = torch.nn.Conv2d(1, 1, kernel_size=(5, 3), padding=(2, 1))
X = torch.randn(size=(8, 8))
comp_conv2d(conv2d, X).shape

torch.Size([8, 8])

In [42]:
conv2d = torch.nn.Conv2d(1, 1, kernel_size=3, padding=1, stride=2)
X = torch.randn(size=(8, 8))
comp_conv2d(conv2d, X).shape

torch.Size([4, 4])

In [43]:
conv2d = torch.nn.Conv2d(
    in_channels = 1, 
    out_channels = 1,
    kernel_size = (3, 5), 
    padding= (0, 1), 
    stride=(3, 4)) # the output size is rounded down
X = torch.randn(size=(8, 8))
comp_conv2d(conv2d, X).shape

torch.Size([2, 2])

- We want the final feature map to be at most $7\times 7$
- Machine learning is an extreme form of compression

## Multiple Channels

- A color image usually has three channels (RGB)
- Converting it to grayscale loses information


**Multiple input channels**:

- Input X: $c_i\times n_h\times n_w$
- Kernel W: $c_i\times k_h\times k_w$
- Output Y: $m_h\times m_w$

$$Y=\sum_{i=1}^{c_i}X_{i,:,:}\star W_{i,:,:}$$

where $\star$ denotes 2D cross-correlation.

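**Multiple output channels**:

Each input channel has $c_o$ kernels, one for every output channel

- Input X: $c_i\times n_h\times n_w$
- Kernel W: $c_o\times c_i\times k_h\times k_w$
- Output Y: $c_o\times m_h\times m_w$

$$Y_{i,:,:}=X\star W_{i,:,:,:}\qquad i=1,\dots,c_o$$

Here $\star$ is the multi-input-channel cross-correlation.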

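**What do channels do?**

- Each output channel can recognize a specific pattern (it detects local features)
- The kernels across input channels recognize and combine the patterns in the input (they combine local features)

**$1\times 1$ convolution kernel**

A kernel with $k_h=k_w=1$ **ignores spatial structure** and instead fuses the different channels.

*It is equivalent to a fully connected layer with an input of shape $n_hn_w\times c_i$ and a weight of shape $c_o\times c_i$.*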

******

### 2D Convolutional Layer

- Input X: $c_i\times n_h\times n_w$
- Kernel W: $c_o\times c_i\times k_h\times k_w$
- Bias B: $c_o$ (one scalar per output channel)
- Output Y: $c_o\times m_h\times m_w$

**Computational complexity**: $O(c_ic_ok_hk_wm_hm_w)$ FLOPs

> $CPU\sim 0.15GFLOP$
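
To get a feel for the complexity, the sketch below plugs illustrative numbers into the formula. The layer sizes are assumptions chosen for this example and do not come from the notes; the count treats one multiply-add as one operation, and the bias count follows `torch.nn.Conv2d`, which keeps one bias per output channel.

```python
def conv2d_cost(c_i, c_o, k_h, k_w, m_h, m_w):
    """Return (parameter count, multiply-add count) for one 2D conv layer."""
    # Weights plus one bias per output channel
    params = c_o * c_i * k_h * k_w + c_o
    # Every output element needs c_i * k_h * k_w multiply-adds
    mult_adds = c_i * c_o * k_h * k_w * m_h * m_w
    return params, mult_adds


# Hypothetical layer: 100 -> 100 channels, 5x5 kernel, 64x64 output
params, mult_adds = conv2d_cost(c_i=100, c_o=100, k_h=5, k_w=5, m_h=64, m_w=64)
print(f"{params:,} parameters, {mult_adds / 1e9:.2f} G multiply-adds")
```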

In [44]:
def corr2d_multi_in(X, K): # X: 3D (c_i, h, w), K: 3D (c_i, k_h, k_w)
    return sum(corr2d(x, k) for x, k in zip(X, K))

X = torch.arange(18, dtype=torch.float32).reshape(shape=(2, 3, 3))
K = torch.arange(8, dtype=torch.float32).reshape(shape=(2, 2, 2))
corr2d_multi_in(X, K)

tensor([[268., 296.],
        [352., 380.]])

In [45]:
def corr2d_multi_out(X, K): # X-3D K-4D
    return torch.stack([corr2d_multi_in(X, k) for k in K], 0)

K = torch.stack(tensors=(K, K+1, K+2), dim=0)
K.shape

torch.Size([3, 2, 2, 2])

In [46]:
corr2d_multi_out(X, K)

tensor([[[268., 296.],
         [352., 380.]],

        [[320., 356.],
         [428., 464.]],

        [[372., 416.],
         [504., 548.]]])

In [47]:
""" 1x1 Conv2D"""

def corr2d_multi_in_out_1x1(X, K):
    c_i, h, w = X.shape
    c_o = K.shape[0]
    X = X.reshape(c_i, h*w)
    K = K.reshape(c_o, c_i) # k_h * k_w = 1 * 1 = 1
    Y = torch.matmul(K, X)  # a fully connected layer applied at every pixel
    return Y.reshape(c_o, h, w)

X = torch.normal(mean=0, std=1, size=(3, 3, 3))
K = torch.normal(mean=0, std=1, size=(2, 3, 1, 1))

Y1 = corr2d_multi_in_out_1x1(X, K)
Y2 = corr2d_multi_out(X, K)

# The two results should agree up to floating-point error
assert float(torch.abs(Y1 - Y2).sum()) < 1e-6

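- Typically, when a layer keeps the height and width unchanged, the number of channels also stays the same; when the height and width are halved, the number of channels is usually doubled
- Kernels for different channels usually have the same size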

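## Pooling Layers

**Convolution is very sensitive to position**: a shift of a single pixel can turn a response into a 0 output

*The model needs some degree of translation invariance*

### 2D Max Pooling

Returns the maximum value within the sliding window. For a $2\times 2$ window:

$$\text{out}=\max_{i\in\{1,2,3,4\}}x_i$$

### Average Pooling Layer

Returns the mean value within the sliding window

**Pooling layers tolerate pixel shifts within a certain range**

- They have padding and stride
- They have no learnable parameters
- Pooling is applied to each channel separately
- Number of output channels = number of input channels

> Usually placed after a convolutional layer to reduce its sensitivity to position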
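
The tolerance to small shifts can be illustrated with a toy example. This is a sketch in plain Python with a hypothetical helper `max_pool2d` (stride 1, no padding); the two inputs are made-up edge responses that differ by a one-pixel shift.

```python
def max_pool2d(X, p_h, p_w):
    """Max pooling over nested lists with stride 1 and no padding."""
    rows, cols = len(X) - p_h + 1, len(X[0]) - p_w + 1
    return [[max(X[i + a][j + b] for a in range(p_h) for b in range(p_w))
             for j in range(cols)]
            for i in range(rows)]


edge = [[0, 1, 0, 0], [0, 1, 0, 0]]     # edge response in column 1
shifted = [[0, 0, 1, 0], [0, 0, 1, 0]]  # same response, one pixel to the right

pooled_edge = max_pool2d(edge, 1, 2)
pooled_shifted = max_pool2d(shifted, 1, 2)

print(pooled_edge)     # [[1, 1, 0], [1, 1, 0]]
print(pooled_shifted)  # [[0, 1, 1], [0, 1, 1]]
```

Before pooling, column 1 responds only to the unshifted input; after pooling, column 1 is active in both cases.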

In [48]:
def pool2d(X, pool_size, mode="max"):
    p_h, p_w = pool_size
    Y = torch.zeros((X.shape[0]-p_h+1, X.shape[1]-p_w+1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            if mode == "max":
                Y[i, j] = X[i:i+p_h, j:j+p_w].max()
            elif mode == "mean":
                Y[i, j] = X[i:i+p_h, j:j+p_w].mean()
    return Y

In [49]:
X = torch.arange(9, dtype=torch.float32).reshape(3, 3)
pool2d(X, (2, 2))

tensor([[4., 5.],
        [7., 8.]])

In [50]:
pool2d(X, (2, 2), 'mean')

tensor([[2., 3.],
        [5., 6.]])

In [51]:
X = torch.arange(16, dtype=torch.float32).reshape((1, 1, 4, 4))
pool2d = torch.nn.MaxPool2d(kernel_size=3)
X, pool2d(X) # in PyTorch the default stride equals kernel_size

(tensor([[[[ 0.,  1.,  2.,  3.],
           [ 4.,  5.,  6.,  7.],
           [ 8.,  9., 10., 11.],
           [12., 13., 14., 15.]]]]),
 tensor([[[[10.]]]]))

In [52]:
pool2d = torch.nn.MaxPool2d(kernel_size=3, 
                            padding=1,
                            stride=3)
X, pool2d(X)

(tensor([[[[ 0.,  1.,  2.,  3.],
           [ 4.,  5.,  6.,  7.],
           [ 8.,  9., 10., 11.],
           [12., 13., 14., 15.]]]]),
 tensor([[[[ 5.,  7.],
           [13., 15.]]]]))

In [53]:
""" Multiple channels """
X = torch.cat(tensors=(X, X+1), dim=1)
# cat keeps the number of dimensions; stack creates a new dimension
X.shape

torch.Size([1, 2, 4, 4])

In [54]:
pool2d(X)

tensor([[[[ 5.,  7.],
          [13., 15.]],

         [[ 6.,  8.],
          [14., 16.]]]])

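- To keep performance, do not use Python's built-in functions; use list
- Why are pooling layers used less and less?
    - Their purpose is 1. to reduce the effect of shifts and 2. to shrink the dimensions faster
    - With data augmentation and added perturbations, the convolution itself overfits less, so pooling is needed less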