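Bottlenecks of CNNs
1. The maximum depth that networks could reach was still shallow.
2. Deep networks were too hard to train: even with progress in optimization methods (momentum, stochastic gradient descent), deeper networks ended up with higher loss and lower accuracy.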

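Degradation: as the network gets deeper, accuracy drops instead of improving (even on the training set, so this is not overfitting).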

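Hypothesis: the function a deep network has to represent is inherently more complex, and harder to fit, than the function of a shallow network, so deep networks are inherently harder to optimize and train than shallow ones.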

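# Residual Network (ResNet, Microsoft Research)
Assume that the optimal mapping for the extra layers used to add depth is the identity function y = x. ResNet exploits this property: the structures used to deepen the network are designed so that they are easier to fit and train, which reduces the training difficulty of deep networks at its root.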

Like GoogLeNet, ResNet chains special structures after an ordinary convolutional stem.

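ResNet uses "residual blocks" (Residual unit, also called residual units). Many residual units are chained after ordinary convolutional layers, which achieves the goal of "stacking some structure on top of a shallow network to increase its depth".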

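- y = x  =>  y - x = 0
- Define the residual F(x) = H(x) - x. If the target H(x) is the identity, then F(x) -> 0: the layers only have to fit the relation between x and 0, i.e. push their output toward zero.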

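To obtain the output H(x) = F(x) + x (the fitted result plus x), a residual block runs two branches `in parallel`:
1. the skip connection, carrying x
2. the convolutional branch, producing F(x)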
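The two parallel branches can be sketched as a small wrapper module. This is a minimal illustration, not the notebook's `ResidualUnit` below: the class name `Residual` and the toy 8-channel branch are hypothetical, and BN plus the final ReLU are omitted. Zeroing the last layer of the branch shows why the identity is easy to reach: F(x) = 0 gives H(x) = x.

```python
import torch
from torch import nn


class Residual(nn.Module):
    """Hypothetical wrapper: H(x) = F(x) + x (no BN, no ReLU after the sum)."""

    def __init__(self, fit: nn.Module):
        super().__init__()
        self.fit = fit  # branch that learns the residual F(x)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fit(x) + x  # skip connection adds the input back


# F(x): conv -> ReLU -> conv; padding=1 and stride=1 keep the shape unchanged
fit = nn.Sequential(
    nn.Conv2d(8, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 8, kernel_size=3, padding=1),
)
# Zero the last conv so that F(x) == 0 and the block starts as the identity
nn.init.zeros_(fit[2].weight)
nn.init.zeros_(fit[2].bias)

block = Residual(fit)
x = torch.randn(2, 8, 16, 16)
h = block(x)
print(torch.equal(h, x))  # True: H(x) = 0 + x
```

During training the zeroed weights still receive gradients, so F(x) can move away from 0 when that helps.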

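Advantages:
1. Depth can be added almost for free: the identity skip connection itself adds no parameters.
2. Experimentally, residual networks are easier to train than plain convolutional networks; in principle, a deeper residual network should be no worse than its shallower counterpart, because the extra blocks can fall back to the identity.
3. Residual units can substantially speed up training convergence.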

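ResNet architecture:
1. 7x7 convolution, stride 2
2. 3x3 max pooling, stride 2
3. Several stages of residual blocks (3x3 convolutions with padding=1 keep the feature-map size unchanged)
4. Global average pooling, a linear layer, softmax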
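The feature-map sizes along this pipeline follow the usual conv/pool output formula, floor((n + 2p - k) / s) + 1 (ceil instead of floor when `ceil_mode=True`). A minimal sketch, assuming a 224x224 input and the max-pool settings used later in the notebook (no padding, `ceil_mode=True`); the helper name `out_size` is hypothetical.

```python
import math


def out_size(n: int, k: int, s: int, p: int = 0, ceil_mode: bool = False) -> int:
    """Output side length of a conv/pool layer: (n + 2p - k) / s + 1."""
    q = (n + 2 * p - k) / s
    return (math.ceil(q) if ceil_mode else math.floor(q)) + 1


sizes = [224]
sizes.append(out_size(sizes[-1], k=7, s=2, p=3))                  # 7x7 conv, stride 2
sizes.append(out_size(sizes[-1], k=3, s=2, p=0, ceil_mode=True))  # 3x3 max pool, stride 2
for _ in range(3):  # first 3x3 conv (padding=1, stride 2) of conv3_x, conv4_x, conv5_x
    sizes.append(out_size(sizes[-1], k=3, s=2, p=1))
print(sizes)                         # [224, 112, 56, 28, 14, 7]
print(out_size(56, k=3, s=1, p=1))   # 56: a 3x3 conv with padding=1, stride 1 keeps the size
```

conv2_x does not downsample, so the 7x7 map after conv5_x is what global average pooling collapses to 1x1.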

- 18/34 layers: basic residual unit (Residual Unit)
- 50/101/152 layers: bottleneck structure + skip connection (Bottleneck)
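The depth in each name counts weighted layers: the 7x7 stem conv, the convolutions inside the blocks (2 per residual unit, 3 per bottleneck) and the final linear layer; the 1x1 shortcut projections are not counted. A quick check, using the per-stage block counts from the ResNet paper (the [3, 4, 6, 3] and [3, 4, 23, 3] lists are the ones the notebook passes to `ResNet` further down):

```python
# Blocks per stage (conv2_x, conv3_x, conv4_x, conv5_x), as in the ResNet paper
configs = {
    18: ("basic", [2, 2, 2, 2]),
    34: ("basic", [3, 4, 6, 3]),
    50: ("bottleneck", [3, 4, 6, 3]),
    101: ("bottleneck", [3, 4, 23, 3]),
    152: ("bottleneck", [3, 8, 36, 3]),
}
convs_per_block = {"basic": 2, "bottleneck": 3}

for depth, (kind, blocks) in configs.items():
    # +1 for the 7x7 stem conv, +1 for the final fully connected layer
    counted = 1 + convs_per_block[kind] * sum(blocks) + 1
    print(depth, kind, blocks, counted)
    assert counted == depth
```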

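Notes:
1. In a residual unit, every convolution is followed by a BN layer; a ReLU follows the first conv's BN, and another ReLU is applied after the addition F(x) + x.
2. All residual blocks within one Layer (stage) share the same feature-map size.
3. In each Layer that downsamples (conv3_x to conv5_x), the first convolution of the first block has stride 2; conv2_x keeps stride 1 because the max pooling has already downsampled.
4. When the skip connection and F(x) differ in feature-map size or number of channels, a 1x1 convolution (with the same stride) is added on the skip connection so the two outputs can be added.
5. In each bottleneck, the three convolutions output (middle_out, middle_out, 4*middle_out) channels.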
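Notes 4 and 5 determine the parameter count of each block. A minimal pure-Python sketch, assuming bias-free convolutions followed by BatchNorm (2 learnable values per channel), as in the notebook code; the helper names are hypothetical. The results can be compared with the torchinfo summaries further down (230,144 for the first conv3_x residual unit, 75,008 and 136,448 for the conv2_x bottlenecks). The 136,448 figure matches a block that projects its shortcut even though shapes already match; with the identity shortcut the paper uses in that case, the block would have 70,400 parameters.

```python
def conv_bn(cin: int, cout: int, k: int) -> int:
    """Parameters of a bias-free k x k conv followed by BatchNorm (gamma + beta)."""
    return cin * cout * k * k + 2 * cout


def basic_block(cin: int, cout: int, projection: bool) -> int:
    # two 3x3 convs; optional 1x1 projection on the skip connection
    params = conv_bn(cin, cout, 3) + conv_bn(cout, cout, 3)
    return params + (conv_bn(cin, cout, 1) if projection else 0)


def bottleneck(cin: int, middle_out: int, projection: bool) -> int:
    # 1x1 reduce -> 3x3 -> 1x1 expand to 4 * middle_out channels
    cout = 4 * middle_out
    params = (conv_bn(cin, middle_out, 1)
              + conv_bn(middle_out, middle_out, 3)
              + conv_bn(middle_out, cout, 1))
    return params + (conv_bn(cin, cout, 1) if projection else 0)


print(basic_block(64, 64, projection=False))  # 73984
print(basic_block(64, 128, projection=True))  # 230144
print(bottleneck(64, 64, projection=True))    # 75008
print(bottleneck(256, 64, projection=True))   # 136448
print(bottleneck(256, 64, projection=False))  # 70400
```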

As a normalization method, BN transforms its input as follows (statistics are computed per channel):

$$\text{output} = \frac{x-\mathrm{E}[x]}{\sqrt{\mathrm{Var}[x]+\varepsilon}}\cdot\gamma+\beta$$
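- gamma and beta are learnable parameters; if both are set to 0, everything that passes through the BN layer becomes 0.
- In nn.BatchNorm2d, beta (`bias`) is initialized to 0 and gamma (`weight`) to 1 by default, so setting gamma to 0 is enough.
- Setting gamma to 0 in the BN layer after the last convolution of each residual block (residual unit / bottleneck) makes F(x) output 0 at initialization, so each block initially behaves like its skip connection.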
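A small check of the formula and of the zero-gamma trick on random data. This is a sketch assuming training mode, where `nn.BatchNorm2d` normalizes with the per-channel batch mean and the biased variance over (N, H, W):

```python
import torch
from torch import nn

torch.manual_seed(0)
x = torch.randn(4, 3, 5, 5)          # (N, C, H, W)
bn = nn.BatchNorm2d(3)               # training mode: uses batch statistics

print(bn.weight.data, bn.bias.data)  # default init: gamma = 1, beta = 0

# Apply the formula by hand: per-channel statistics over (N, H, W), biased variance
mean = x.mean(dim=(0, 2, 3), keepdim=True)
var = ((x - mean) ** 2).mean(dim=(0, 2, 3), keepdim=True)
gamma = bn.weight.view(1, -1, 1, 1)
beta = bn.bias.view(1, -1, 1, 1)
manual = (x - mean) / torch.sqrt(var + bn.eps) * gamma + beta
match = torch.allclose(bn(x), manual, atol=1e-5)
print(match)  # True

# Zero gamma (beta is already 0): every output of the BN layer becomes 0
nn.init.constant_(bn.weight, 0)
print(bn(x).abs().max().item())  # 0.0
```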

In [1]:
# Reimplementation

In [2]:
# basic conv + BN + relu (conv3x3, conv1x1)
# Residual Unit, Bottleneck

In [73]:
import torch
from torch import nn
from typing import Type,Union,List,Optional
from torchinfo import summary

In [4]:
def conv3x3(in_,out_,stride=1,initialzero = False):
    """3x3 convolution (padding=1, no bias) followed by BatchNorm2d."""
    bn = nn.BatchNorm2d(out_)
    # Should BN gamma be zero-initialized?
    # Yes for the last conv of a residual block; otherwise keep the default gamma and beta
    if initialzero:
        nn.init.constant_(bn.weight,0)
    return nn.Sequential(nn.Conv2d(in_,out_
                            ,kernel_size=3,padding=1,stride=stride
                            ,bias=False)
                         ,bn)

In [5]:
conv3x3(2,10)

Sequential(
  (0): Conv2d(2, 10, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  (1): BatchNorm2d(10, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)

In [6]:
def conv1x1(in_,out_,stride=1,initialzero = False):
    """1x1 convolution (no padding, no bias) followed by BatchNorm2d."""
    bn = nn.BatchNorm2d(out_)
    # Should BN gamma be zero-initialized?
    # Yes for the last conv of a residual block; otherwise keep the default gamma and beta
    if initialzero:
        nn.init.constant_(bn.weight,0)
    return nn.Sequential(nn.Conv2d(in_,out_
                            ,kernel_size=1,padding=0,stride=stride
                            ,bias=False)
                         ,bn)

In [7]:
conv1x1(2,10,1,True)[1].weight

Parameter containing:
tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], requires_grad=True)

In [48]:
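class ResidualUnit(nn.Module):
    # Basic residual unit (ResNet-18/34)
    # Is stride1 equal to 2? If so, the feature-map size changes,
    # and a 1x1 conv on the skip connection is needed to match F(x)
    # If stride1 equals 1, the skip connection is the identity
    def __init__(self,out_:int
                 ,stride1: int = 1
                 ,in_:Optional[int] = None
                ):
        super().__init__()
        
        self.stride1 = stride1
        
        # Infer in_ only when it is not given:
        # when the feature map is downsampled, out_ is twice the number of input channels in_
        # when it is not downsampled, out_ == in_
        if in_ is None:
            if stride1 != 1:
                in_ = out_ // 2
            else:
                in_ = out_
        
        # fitting branch, outputs F(x)
        self.fit_ = nn.Sequential(conv3x3(in_,out_,stride=stride1)
                                 ,nn.ReLU(inplace=True)
                                 ,conv3x3(out_,out_,initialzero=True)
                                 )
        # skip connection: x after a 1x1 conv; only used in forward when stride1 != 1
        # (it is still created when stride1 == 1, so its parameters show up in per-block counts)
        self.skipconv = conv1x1(in_,out_,stride=stride1)
        
        # separate ReLU applied to H(x) = F(x) + x
        self.relu = nn.ReLU(inplace=True)
        
    def forward(self,x):
        fx = self.fit_(x) # fitted result F(x)
        if self.stride1 != 1:
            x = self.skipconv(x) # projected skip connection
        # otherwise the skip connection is the identity: x is used unchanged
        hx = self.relu(fx + x)
        return hx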

In [9]:
data = torch.ones(10,64,56,56)

In [10]:
# Residual unit no. 0: halve the feature-map size, double the number of feature maps
conv3_x_18_0 = ResidualUnit(out_ = 128,stride1 = 2)

In [11]:
conv3_x_18_0(data).shape

torch.Size([10, 128, 28, 28])

In [12]:
conv2_x_18_0 = ResidualUnit(out_ = 64)

In [13]:
conv2_x_18_0(data).shape

torch.Size([10, 64, 56, 56])

In [28]:
class Bottleneck(nn.Module):
    def __init__(self,middle_out
                 ,stride1:int = 1
                 ,in_:Optional[int]=None):
        super().__init__()
        
        out_ = 4 * middle_out
        
        if in_ is None:
            # Infer in_: does this block downsample the feature map?
            # When conv2_x -> conv3_x -> conv4_x -> conv5_x are chained,
            # the feature-map size is halved each time, and middle_out = in_ / 2
            if stride1 != 1: # downsampling: the first bottleneck of each layer
                in_ = middle_out * 2
            else: # no downsampling: one of the repeated bottlenecks that follow
                in_ = middle_out * 4
        # otherwise keep the given in_ (e.g. in_=64 for the first block after conv1)
        
        self.fit_ = nn.Sequential(conv1x1(in_,middle_out,stride=stride1)
                                 ,nn.ReLU(inplace=True)
                                 ,conv3x3(middle_out,middle_out)
                                 ,nn.ReLU(inplace=True)
                                 ,conv1x1(middle_out,out_,initialzero=True))
        
        # Note: this 1x1 projection is applied in every block, even when in_ == out_
        # and stride1 == 1; the original paper uses an identity shortcut in that case
        self.skipconv = conv1x1(in_,out_,stride=stride1)
        
        self.relu = nn.ReLU(inplace=True)
        
    def forward(self,x):
        fx = self.fit_(x)
        
        # skip connection (always projected here)
        x = self.skipconv(x)
        hx = self.relu(fx + x)
        return hx

In [38]:
# Test
data1 = torch.ones(10,64,56,56) # input to conv2_x
conv2_x_101_0 = Bottleneck(in_=64,middle_out=64)
conv2_x_101_0(data1).shape
# feature-map size unchanged, number of channels x4

torch.Size([10, 256, 56, 56])

In [34]:
# Not the first bottleneck right after conv1, but the feature map must be downsampled
data2 = torch.ones(10,256,56,56)
conv3_x_101_0 = Bottleneck(middle_out=128,stride1=2)
conv3_x_101_0(data2).shape # number of channels x2, feature-map size halved

torch.Size([10, 512, 28, 28])

In [37]:
# Not the first bottleneck right after conv1, and no downsampling needed
data3 = torch.ones(10,512,28,28)
conv3_x_101_1 = Bottleneck(128)
conv3_x_101_1(data3).shape # number of channels and feature-map size both unchanged

torch.Size([10, 512, 28, 28])

In [39]:
num_blocks_conv4_x = 6

In [40]:
conv4_x_50 = []

In [41]:
# Add bottleneck block 0 to the list
conv4_x_50.append(Bottleneck(middle_out = 256,stride1 = 2))

for i in range(num_blocks_conv4_x - 1):
    conv4_x_50.append(Bottleneck(middle_out=256))

In [42]:
len(conv4_x_50) # 6 blocks: the first contains the stride-2 conv, the rest are repeated structures

6

In [43]:
# ResNet-50, conv2_x: three bottleneck blocks
bt0 = Bottleneck(middle_out=64,in_=64)
bt1 = Bottleneck(middle_out=64)
bt2 = Bottleneck(middle_out=64)

In [46]:
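# Build the first block separately; the remaining blocks are built in a loop
layers = []
num_blocks = 6
afterconv1 = True # is this the first layer right after conv1?

if afterconv1 == True:
    middle_out = 64
    layers.append(Bottleneck(middle_out=middle_out,in_=64))
else:
    middle_out = 128
    layers.append(Bottleneck(middle_out=middle_out,stride1=2))
    
# the repeated blocks must use the same middle_out as the first block
for i in range(num_blocks-1):
    layers.append(Bottleneck(middle_out=middle_out))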

In [47]:
len(layers)

6

In [49]:
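# Build the first block separately; the remaining blocks are built in a loop
layers = []
num_blocks = 6
afterconv1 = True # is this the first layer right after conv1?

if afterconv1 == True:
    out_ = 64
    layers.append(ResidualUnit(out_=out_,in_=64))
else:
    out_ = 128
    layers.append(ResidualUnit(out_=out_,stride1=2))
    
# the repeated blocks must use the same out_ as the first block
for i in range(num_blocks-1):
    layers.append(ResidualUnit(out_=out_))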

In [50]:
len(layers)

6

In [53]:
# Type[...] takes a class (not an instance); Union[...] lists several allowed classes: Union[Class1, ...]
def make_layers(block: Type[Union[ResidualUnit,Bottleneck]]
                ,the_out: int
                ,num_blocks: int
                ,afterconv1: bool=False):
    layers = []
    
    if afterconv1 == True:
        layers.append(block(the_out,in_=64))
    else:
        layers.append(block(the_out,stride1=2))

    for i in range(num_blocks-1):
        layers.append(block(the_out))
        
    return layers

In [54]:
layer_34_conv4_x = make_layers(ResidualUnit,
                              256,
                              6,
                              False)

In [55]:
layer_34_conv4_x

[ResidualUnit(
   (fit_): Sequential(
     (0): Sequential(
       (0): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
       (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
     )
     (1): ReLU(inplace=True)
     (2): Sequential(
       (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
       (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
     )
   )
   (skipconv): Sequential(
     (0): Conv2d(128, 256, kernel_size=(1, 1), stride=(2, 2), bias=False)
     (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
   )
   (relu): ReLU(inplace=True)
 ),
 ResidualUnit(
   (fit_): Sequential(
     (0): Sequential(
       (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
       (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
     )
     (1): ReLU(in

In [56]:
nn.Sequential(*layer_34_conv4_x) # the * unpacks the list into separate positional arguments (Python's star unpacking)

Sequential(
  (0): ResidualUnit(
    (fit_): Sequential(
      (0): Sequential(
        (0): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (1): ReLU(inplace=True)
      (2): Sequential(
        (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (skipconv): Sequential(
      (0): Conv2d(128, 256, kernel_size=(1, 1), stride=(2, 2), bias=False)
      (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (relu): ReLU(inplace=True)
  )
  (1): ResidualUnit(
    (fit_): Sequential(
      (0): Sequential(
        (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_
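The `*` in `nn.Sequential(*layer_34_conv4_x)` is Python's argument unpacking: `nn.Sequential` expects each module as its own positional argument, not a single list. A small standalone illustration with hypothetical toy modules:

```python
from torch import nn

blocks = [nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 2)]

# Equivalent to nn.Sequential(blocks[0], blocks[1], blocks[2])
model = nn.Sequential(*blocks)
print(len(model))  # 3

# Passing the list itself fails, because a list is not an nn.Module
raised = False
try:
    nn.Sequential(blocks)
except TypeError as err:
    raised = True
    print("TypeError:", err)


# The same mechanism with a plain function
def count_args(*args):
    return len(args)


print(count_args(*[10, 20, 30]))  # 3
```

This is also why the second version of `make_layers` can end with `return nn.Sequential(*layers)`.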

In [82]:
# Type[...] takes a class (not an instance); Union[...] lists several allowed classes: Union[Class1, ...]
def make_layers(block: Type[Union[ResidualUnit,Bottleneck]]
                ,the_out: int
                ,num_blocks: int
                ,afterconv1: bool=False):
    layers = []
    
    if afterconv1 == True:
        layers.append(block(the_out,in_=64))
    else:
        layers.append(block(the_out,stride1=2))

    for i in range(num_blocks-1):
        layers.append(block(the_out))
        
    return nn.Sequential(*layers)

In [83]:
# Test
# ResNet-34, conv2_x: the first stage, right after conv1
# no downsampling; every block outputs 64 channels; 3 blocks

In [84]:
datashape = (10,64,56,56)
conv2_x_34 = make_layers(ResidualUnit,
                              64,
                              3,
                              afterconv1=True)
summary(conv2_x_34,datashape,depth=1,device="cpu")

Layer (type:depth-idx)                   Output Shape              Param #
Sequential                               --                        --
├─ResidualUnit: 1-1                      [10, 64, 56, 56]          78,208
├─ResidualUnit: 1-2                      [10, 64, 56, 56]          78,208
├─ResidualUnit: 1-3                      [10, 64, 56, 56]          78,208
Total params: 221,952
Trainable params: 221,952
Non-trainable params: 0
Total mult-adds (G): 6.94
Input size (MB): 8.03
Forward/backward pass size (MB): 192.68
Params size (MB): 0.89
Estimated Total Size (MB): 201.59

In [85]:
conv2_x_101 = make_layers(Bottleneck,
            64,
            3,
            afterconv1=True)

In [87]:
summary(conv2_x_101,datashape,depth=3,device="cpu")

Layer (type:depth-idx)                   Output Shape              Param #
Sequential                               --                        --
├─Bottleneck: 1-1                        [10, 256, 56, 56]         --
│    └─Sequential: 2-1                   [10, 256, 56, 56]         --
│    │    └─Sequential: 3-1              [10, 64, 56, 56]          4,224
│    │    └─ReLU: 3-2                    [10, 64, 56, 56]          --
│    │    └─Sequential: 3-3              [10, 64, 56, 56]          36,992
│    │    └─ReLU: 3-4                    [10, 64, 56, 56]          --
│    │    └─Sequential: 3-5              [10, 256, 56, 56]         16,896
│    └─Sequential: 2-2                   [10, 256, 56, 56]         --
│    │    └─Conv2d: 3-6                  [10, 256, 56, 56]         16,384
│    │    └─BatchNorm2d: 3-7             [10, 256, 56, 56]         512
│    └─ReLU: 2-3                         [10, 256, 56, 56]         --
├─Bottleneck: 1-2                        [10, 256, 56, 56]         --

In [88]:
conv4_x_101 = make_layers(Bottleneck,
            256,
            23)

In [89]:
datashape = (10,512,28,28)

In [90]:
summary(conv4_x_101,datashape,depth=1,device="cpu")

Layer (type:depth-idx)                   Output Shape              Param #
Sequential                               --                        --
├─Bottleneck: 1-1                        [10, 1024, 14, 14]        1,512,448
├─Bottleneck: 1-2                        [10, 1024, 14, 14]        2,167,808
├─Bottleneck: 1-3                        [10, 1024, 14, 14]        2,167,808
├─Bottleneck: 1-4                        [10, 1024, 14, 14]        2,167,808
├─Bottleneck: 1-5                        [10, 1024, 14, 14]        2,167,808
├─Bottleneck: 1-6                        [10, 1024, 14, 14]        2,167,808
├─Bottleneck: 1-7                        [10, 1024, 14, 14]        2,167,808
├─Bottleneck: 1-8                        [10, 1024, 14, 14]        2,167,808
├─Bottleneck: 1-9                        [10, 1024, 14, 14]        2,167,808
├─Bottleneck: 1-10                       [10, 1024, 14, 14]        2,167,808
├─Bottleneck: 1-11                       [10, 1024, 14, 14]        2,167,808
├─Bottle

In [100]:
class ResNet(nn.Module):
    def __init__(self,block: Type[Union[ResidualUnit,Bottleneck]]
                ,layers:List[int]
                ,num_classes: int):
        """
        block: the basic structure used to add depth (ResidualUnit or Bottleneck)
        layers: list with the number of blocks in each layer (stage)
        num_classes: number of classes in the true labels
        """
        super().__init__()
        
        # layer1: convolution + pooling
        self.layer1 = nn.Sequential(nn.Conv2d(3,64
                                              ,kernel_size=7,stride=2
                                              ,padding=3,bias=False)
                                   ,nn.BatchNorm2d(64)
                                   ,nn.ReLU(inplace=True)
                                   ,nn.MaxPool2d(kernel_size=3
                                                 ,stride=2
                                                 ,ceil_mode=True))
        
        # layer2 - layer5: residual units / bottlenecks
        self.layer2_x = make_layers(block,64,layers[0],afterconv1=True)
        self.layer3_x = make_layers(block,128,layers[1])
        self.layer4_x = make_layers(block,256,layers[2])
        self.layer5_x = make_layers(block,512,layers[3])
        
        # global average pooling
        self.avgpool = nn.AdaptiveAvgPool2d((1,1))
        
        # classifier: the last stage outputs 512 channels (ResidualUnit) or 4*512 = 2048 (Bottleneck)
        if block is ResidualUnit:
            self.fc = nn.Linear(512,num_classes)
        else:
            self.fc = nn.Linear(2048,num_classes)
            
    def forward(self,x):
        x = self.layer1(x) # layer1: plain convolution + pooling
        x = self.layer2_x(x)
        x = self.layer3_x(x)
        x = self.layer4_x(x)
        x = self.layer5_x(x)
        x = self.avgpool(x) # feature maps are now 1x1: (n_samples, channels, 1, 1)
        x = torch.flatten(x,1) # flatten to (n_samples, channels)
        x = self.fc(x) # logits; softmax is left to the loss function
        return x
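`forward` returns raw class scores (logits); the softmax from the architecture list is usually not part of the model. A minimal sketch of the head on a dummy feature map, assuming the ResNet-34 shapes (512 channels, 7x7 for a 224x224 input); the random tensors stand in for real features and labels:

```python
import torch
from torch import nn

torch.manual_seed(0)
features = torch.randn(10, 512, 7, 7)  # stand-in for a ResNet-34 conv5_x output

avgpool = nn.AdaptiveAvgPool2d((1, 1))
fc = nn.Linear(512, 1000)

pooled = torch.flatten(avgpool(features), 1)  # (10, 512): one average per channel
logits = fc(pooled)                           # (10, 1000): what forward returns
probs = torch.softmax(logits, dim=1)          # the "softmax" step of the architecture

print(pooled.shape, logits.shape)  # torch.Size([10, 512]) torch.Size([10, 1000])
print(torch.allclose(pooled, features.mean(dim=(2, 3)), atol=1e-5))  # True
print(torch.allclose(probs.sum(dim=1), torch.ones(10)))              # True

# For training, nn.CrossEntropyLoss takes the raw logits (it applies log-softmax itself)
labels = torch.randint(0, 1000, (10,))
loss = nn.CrossEntropyLoss()(logits, labels)
print(loss.item())
```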

In [101]:
datashape = (10,3,224,224) # a batch of 10 ImageNet-sized images: 3 x 224 x 224

In [102]:
res34 = ResNet(ResidualUnit,[3,4,6,3],num_classes=1000)

In [103]:
res101 = ResNet(Bottleneck,[3,4,23,3],num_classes=1000)

In [104]:
summary(res34,datashape,depth=2,device="cpu")

Layer (type:depth-idx)                        Output Shape              Param #
ResNet                                        --                        --
├─Sequential: 1-1                             [10, 64, 56, 56]          --
│    └─Conv2d: 2-1                            [10, 64, 112, 112]        9,408
│    └─BatchNorm2d: 2-2                       [10, 64, 112, 112]        128
│    └─ReLU: 2-3                              [10, 64, 112, 112]        --
│    └─MaxPool2d: 2-4                         [10, 64, 56, 56]          --
├─Sequential: 1-2                             [10, 64, 56, 56]          --
│    └─ResidualUnit: 2-5                      [10, 64, 56, 56]          78,208
│    └─ResidualUnit: 2-6                      [10, 64, 56, 56]          78,208
│    └─ResidualUnit: 2-7                      [10, 64, 56, 56]          78,208
├─Sequential: 1-3                             [10, 128, 28, 28]         --
│    └─ResidualUnit: 2-8                      [10, 128, 28, 28]         230,144

In [105]:
summary(res101,datashape,depth=2,device="cpu")

Layer (type:depth-idx)                        Output Shape              Param #
ResNet                                        --                        --
├─Sequential: 1-1                             [10, 64, 56, 56]          --
│    └─Conv2d: 2-1                            [10, 64, 112, 112]        9,408
│    └─BatchNorm2d: 2-2                       [10, 64, 112, 112]        128
│    └─ReLU: 2-3                              [10, 64, 112, 112]        --
│    └─MaxPool2d: 2-4                         [10, 64, 56, 56]          --
├─Sequential: 1-2                             [10, 256, 56, 56]         --
│    └─Bottleneck: 2-5                        [10, 256, 56, 56]         75,008
│    └─Bottleneck: 2-6                        [10, 256, 56, 56]         136,448
│    └─Bottleneck: 2-7                        [10, 256, 56, 56]         136,448
├─Sequential: 1-3                             [10, 512, 28, 28]         --
│    └─Bottleneck: 2-8                        [10, 512, 28, 28]         379,3