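Ways to enlarge the receptive field:
- Make the convolutional network deeper (as long as all earlier strides are 1, each additional convolutional layer grows the receptive field's width and height linearly by `kernel size - 1`)
- Use pooling layers or other techniques that quickly shrink the feature map
- Use richer convolution operations, such as dilated (atrous) convolution, residual connections, etc.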
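
To make the dilated-convolution bullet concrete, here is a minimal sketch in plain Python. It assumes stride-1 layers and uses the standard effective kernel span `dilation * (k - 1) + 1`. The helper names `effective_kernel` and `stacked_receptive_field` are hypothetical and not part of any library.

```python
def effective_kernel(k, dilation):
    """Span covered by a k-tap kernel whose taps sit `dilation` pixels apart."""
    return dilation * (k - 1) + 1


def stacked_receptive_field(kernels, dilations):
    """Receptive field of a stack of stride-1 convolutions."""
    r = 1  # a single input pixel sees only itself
    for k, d in zip(kernels, dilations):
        r += effective_kernel(k, d) - 1
    return r


# Three 3x3 layers: plain vs. dilation rates 1, 2, 4
plain = stacked_receptive_field([3, 3, 3], [1, 1, 1])
dilated = stacked_receptive_field([3, 3, 3], [1, 2, 4])
print(plain, dilated)  # 7 15
```

Both stacks hold the same number of weights, yet the dilated stack covers a 15-pixel span instead of 7.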

Computing the receptive field size:

$r_l = r_{l-1} + (k_l - 1) * \prod_{i=0}^{l-1}s_i$

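Here $r_0 = 1$, $k_l$ is the kernel size of layer $l$, and $s_i$ is the stride of layer $i$ (with $s_0 = 1$ for the input). The receptive field size depends only on the kernel sizes and the strides of the layers; it does not depend on padding.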
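
The recurrence can be checked by hand with a few lines of plain Python. This is a sketch, and `receptive_fields` is a hypothetical helper name; the `(kernel, stride)` pairs describe a LeNet-5-style stack of conv 5, pool 2/2, conv 5, pool 2/2. Padding is not an argument, which matches the statement that it plays no role.

```python
def receptive_fields(layers):
    """layers: (kernel_size, stride) pairs in forward order.

    Returns the receptive field after each layer, following
    r_l = r_{l-1} + (k_l - 1) * prod(s_0 .. s_{l-1}).
    """
    r, jump = 1, 1  # r_0 = 1; jump is the product of the strides seen so far
    result = []
    for k, s in layers:
        r += (k - 1) * jump
        jump *= s
        result.append(r)
    return result


lenet_like = [(5, 1), (2, 2), (5, 1), (2, 2)]
print(receptive_fields(lenet_like))  # [5, 6, 14, 16]
```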

In [4]:
import torch
import torch.nn as nn
from torch.nn import functional as F
from torch_receptive_field import receptive_field

# git clone https://github.com/Fangyh09/pytorch-receptive-field.git
# Copy the receptive-field folder into the site-packages folder of your Python or Anaconda installation
# Run `pip -V` to find that directory

In [3]:
class LeNet5(nn.Module):
    def __init__(self):
        super(LeNet5,self).__init__()
        
        self.conv1 = nn.Conv2d(1,6,5) # rl1 = 1 + (5 - 1 ) * (1) = 5
        self.pool1 = nn.AvgPool2d(kernel_size=2,stride=2) # rl2 = 5 + (2-1) * (1*1) = 6 
        self.conv2 = nn.Conv2d(6,16,5) # rl3 = 6 + (5-1) * (1*1*2) = 14
        self.pool2 = nn.AvgPool2d(2) # rl4 = 14 + (2-1) * (1*1*2*1) = 16
#         self.fc1 = nn.Linear(5*5*16,120) # weight(120,400)
#         self.fc2 = nn.Linear(120,84)
    
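    def forward(self,x):
        x = torch.tanh(self.conv1(x)) # F.tanh is deprecated in favor of torch.tanh
        x = self.pool1(x)
        x = torch.tanh(self.conv2(x))
        x = self.pool2(x)
        # Flatten the data before the linear layers
#         x = x.view(-1,5*5*16) # -1 is a placeholder; that dimension is inferred
#         x = torch.tanh(self.fc1(x)) 
#         output = F.softmax(self.fc2(x),dim=1) # (samples,features)
        return x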

In [5]:
# net = LeNet5().cuda()
net = LeNet5()

In [8]:
receptive_field(net,(1,32,32)) # input shape, excluding the batch dimension

------------------------------------------------------------------------------
        Layer (type)    map size      start       jump receptive_field 
        0               [32, 32]        0.5        1.0             1.0 
        1               [28, 28]        2.5        1.0             5.0 
        2               [14, 14]        3.0        2.0             6.0 
        3               [10, 10]        7.0        2.0            14.0 
        4                 [5, 5]        8.0        4.0            16.0 


OrderedDict([('0',
              OrderedDict([('j', 1.0),
                           ('r', 1.0),
                           ('start', 0.5),
                           ('conv_stage', True),
                           ('output_shape', [-1, 1, 32, 32])])),
             ('1',
              OrderedDict([('j', 1.0),
                           ('r', 5.0),
                           ('start', 2.5),
                           ('input_shape', [-1, 1, 32, 32]),
                           ('output_shape', [-1, 6, 28, 28])])),
             ('2',
              OrderedDict([('j', 2.0),
                           ('r', 6.0),
                           ('start', 3.0),
                           ('input_shape', [-1, 6, 28, 28]),
                           ('output_shape', [-1, 6, 14, 14])])),
             ('3',
              OrderedDict([('j', 2.0),
                           ('r', 14.0),
                           ('start', 7.0),
                           ('input_shape', [-1, 6, 14, 14]),
            

In [None]:
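# When a GPU is available on your machine, receptive_field runs on the GPU automatically, so the network passed to it must be moved to the GPU as well

# device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# net = Model().to(device)

# The package only recognizes Conv2d and MaxPool2d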

In [9]:
receptive_field_dict = receptive_field(net,(1,32,32))

------------------------------------------------------------------------------
        Layer (type)    map size      start       jump receptive_field 
        0               [32, 32]        0.5        1.0             1.0 
        1               [28, 28]        2.5        1.0             5.0 
        2               [14, 14]        3.0        2.0             6.0 
        3               [10, 10]        7.0        2.0            14.0 
        4                 [5, 5]        8.0        4.0            16.0 


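Improving model robustness:
- Translation invariance
- Data augmentation

    Slightly modify the existing data, or synthesize variants of it, to increase the amount of data
    e.g. rotation, blurring, raising saturation, zooming in, zooming out, raising brightness, warping, mirror flipping, texture removal, color removal (desaturation), edge enhancement, salient edge maps (edge detection)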

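How the architecture affects the performance of a convolutional neural network:
1. A factor that affects CNN performance: the number of feature maps after the last convolutional layer, also called the "number of channels at the largest receptive field". The more channels, the better the CNN tends to perform.
2. Pooling layers (debated): it has been argued that they do not provide invariance and may not even affect CNN performance. Their feature-map-shrinking role can be taken over by a convolutional layer with stride 2.

    Pooling layers cannot provide perfect translation invariance, so some information loss and some exceptions are unavoidable. From the standpoint of enlarging the receptive field, though, pooling should have some effect on the model. What matters most about a pooling layer is fast downsampling: quickly reducing the feature map size and hence the number of parameters the model needs.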
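
The claim that a stride-2 convolution can stand in for pooling as a downsampler can be sketched with the usual output-size formula `floor((n + 2p - k) / s) + 1`. The 28-pixel input, the 3x3 kernel with padding 1, and the 16 channels below are illustrative assumptions, and `out_size` is a hypothetical helper.

```python
def out_size(n, k, s=1, p=0):
    """Spatial output size of a conv or pooling layer on an n-pixel input."""
    return (n + 2 * p - k) // s + 1


n = 28
pooled = out_size(n, k=2, s=2)        # 2x2 pooling with stride 2
strided = out_size(n, k=3, s=2, p=1)  # 3x3 convolution with stride 2, padding 1
print(pooled, strided)  # 14 14

# The pooling layer has no parameters; the strided convolution does.
channels = 16
strided_conv_params = 3 * 3 * channels * channels + channels
print(strided_conv_params)  # 2320
```

Both layers halve the feature map, but only the convolution adds learnable weights, which is the trade-off behind the debate.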

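Two ways a layer can affect the parameter count of a convolutional neural network:
1. The layer has parameters of its own, and their number depends on the hyperparameters passed to the layer (fully connected layers, BN layers)
2. The layer changes the feature map size, which changes the total number of pixels and the amount of computation, and therefore the input size of the fully connected layers (pooling, padding, stride)

Convolutional layers do both.

Operations such as dropout and activation functions do not affect the parameter count.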
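
The second mechanism is easy to underestimate, so here is a small sketch. It assumes a 3-channel feature map feeding a 256-unit fully connected layer, once at 9x9 (after a final 2x2 pooling) and once at 19x19 (pooling removed). The numbers are illustrative and `linear_params` is a hypothetical helper.

```python
def linear_params(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


with_pool = linear_params(3 * 9 * 9, 256)       # 9x9 maps after pooling
without_pool = linear_params(3 * 19 * 19, 256)  # 19x19 maps, no pooling
print(with_pool, without_pool)  # 62464 277504
```

The pooling layer has no parameters of its own, yet removing it makes the first fully connected layer more than four times larger.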

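Parameter count of a convolutional layer:
$N_{parameters} = (K_H * K_W * C_{in}) * C_{out} + C_{out}$

- Parameters in one kernel: $K_H * K_W$
- One pass over $C_{in}$ input feature maps that produces one output feature map: $(K_H * K_W) * C_{in} + 1$
    - Weights: $(K_H * K_W) * C_{in}$
    - Bias: 1
- $C_{out}$ passes producing $C_{out}$ feature maps: $((K_H * K_W) * C_{in} + 1) * C_{out}$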
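
The formula translates directly into a small helper. This is a sketch in plain Python; `conv_params` is a hypothetical name, and stride and padding are deliberately absent because they do not enter the count.

```python
def conv_params(c_in, c_out, k_h, k_w=None, bias=True):
    """Parameter count of a standard 2-D convolution layer."""
    k_w = k_h if k_w is None else k_w  # square kernel unless stated otherwise
    weights = k_h * k_w * c_in * c_out
    return weights + (c_out if bias else 0)


print(conv_params(3, 6, 3))   # 168
print(conv_params(6, 4, 3))   # 220
print(conv_params(4, 16, 5))  # 1616
```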

In [11]:
import torch
from torch import nn

conv1 = nn.Conv2d(3,6,3) # 3*3*3*6 + 6 = 168
conv2 = nn.Conv2d(6,4,3) # 3*3*6*4 + 4 = 220

In [14]:
conv1.weight.numel(),conv1.bias.numel()

(162, 6)

In [16]:
conv2.weight.numel(),conv2.bias.numel()

(216, 4)

In [17]:
conv3 = nn.Conv2d(4,16,5,stride=2,padding=1) # 5*5*4*16 + 16 = 1616

In [18]:
conv3.weight.numel(),conv3.bias.numel()

(1600, 16)

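Ways to reduce the parameter count:
- Reduce the number of input feature maps
- Reduce the number of output feature maps
- Reduce the kernel size on each connection
- Reduce the number of connections between input and output feature maps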

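1. Bottleneck design: wrap other convolutional layers between two 1*1 convolutions (ResNet)
2. Grouped convolution (`groups`); grouping does not change the number of biases
    - Without bias: $parameters = N_{group} * groups = (K_H * K_W * \frac{C_{in}}{groups}) * \frac{C_{out}}{groups} * groups = \frac{1}{groups} (K_H*K_W*C_{in}*C_{out})$
    - With bias: $parameters = N_{group} * groups = ((K_H * K_W * \frac{C_{in}}{groups}) * \frac{C_{out}}{groups} + \frac{C_{out}}{groups}) * groups = \frac{1}{groups} (K_H*K_W*C_{in}*C_{out}) + C_{out}$
    - A grouped convolution with groups = C_in is called a "depthwise convolution": $parameters = K_H * K_W * C_{out} + C_{out}$
3. Depthwise separable convolution (separable convolution): apply a 1*1 convolution to the set of feature maps produced by a depthwise convolution, which linearly recombines those maps; the two convolutions are packaged together as one block. (Used in the Inception/GoogLeNet family, e.g. Xception.)  
Without bias, where the superscript $depth$ refers to the depthwise step and $pair$ to the 1*1 (pointwise) step:
$parameters = K_H * K_W * C_{out}^{depth} + C_{in}^{pair} * C_{out}^{pair}$

Assuming the 1 * 1 convolution does not change the number of feature maps, $C_{in}^{pair} = C_{out}^{pair} = C_{out}^{depth}$:

$ratio = \frac{parameters_{separable}}{parameters_{standard}} = \frac{1}{C_{in}^{depth}} + \frac{C_{out}^{pair}}{K_H*K_W*C_{in}^{depth}}$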
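
The grouped and separable formulas can be verified numerically without building any layers. The sketch below assumes a 4-to-8-channel 3x3 convolution as the running example; `grouped_conv_params` is a hypothetical helper.

```python
def grouped_conv_params(c_in, c_out, k, groups=1, bias=True):
    """Parameter count of a k x k convolution split into `groups` groups."""
    if c_in % groups or c_out % groups:
        raise ValueError("groups must divide both c_in and c_out")
    weights = k * k * (c_in // groups) * (c_out // groups) * groups
    return weights + (c_out if bias else 0)  # biases are unaffected by grouping


standard = grouped_conv_params(4, 8, 3, bias=False)            # 288
grouped = grouped_conv_params(4, 8, 3, groups=2, bias=False)   # 144
depthwise = grouped_conv_params(4, 8, 3, groups=4, bias=False) # 72
pointwise = grouped_conv_params(8, 8, 1, bias=False)           # 64

ratio = (depthwise + pointwise) / standard
formula = 1 / 4 + 8 / (3 * 3 * 4)  # 1/C_in + C_out / (K_H * K_W * C_in)
print(standard, grouped, depthwise, pointwise)  # 288 144 72 64
print(round(ratio, 4), round(formula, 4))       # 0.4722 0.4722
```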

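Small kernels reduce the parameter count:
- Two 3 * 3 kernels can replace one 5 * 5 kernel
- Three 3 * 3 kernels can replace one 7 * 7 kernel
- Two 1 * 1 kernels with a 3 * 3 in between can replace a 3 * 3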
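
A quick sketch of why the first two substitutions pay off: with stride 1, a stack of small kernels reaches the same receptive field as one large kernel while holding fewer weights. The 64 channels are an illustrative assumption, biases are ignored, and both helper names are hypothetical.

```python
def stack_weights(k, n_layers, channels):
    """Weights in n_layers stacked k x k convs that keep `channels` channels."""
    return n_layers * k * k * channels * channels


def stack_receptive_field(k, n_layers):
    """Receptive field of n_layers stacked stride-1 k x k convs."""
    return 1 + n_layers * (k - 1)


c = 64
print(stack_receptive_field(3, 2), stack_receptive_field(5, 1))  # 5 5
print(stack_weights(3, 2, c), stack_weights(5, 1, c))            # 73728 102400
print(stack_receptive_field(3, 3), stack_receptive_field(7, 1))  # 7 7
print(stack_weights(3, 3, c), stack_weights(7, 1, c))            # 110592 200704
```

The stacked version also inserts extra nonlinearities between the layers, so the saving in weights does not come at the cost of expressiveness.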

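The 1*1 kernel, also called "pointwise convolution" (MLP layer)
Purpose:
`Placed between convolutional layers to adjust the number of output channels, it helps cut computation and parameter count substantially and so helps make the network deeper. This role is also called "cross-channel information interaction".`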
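
As a rough illustration of how 1*1 convolutions cut the parameter count, the sketch below compares one 3x3 convolution on 256 channels with a bottleneck that squeezes to 64 channels, applies the 3x3 there, and expands back. The channel sizes are assumed for illustration (in the style of ResNet bottlenecks), biases are ignored, and `conv_weights` is a hypothetical helper.

```python
def conv_weights(c_in, c_out, k):
    """Weights of a k x k convolution, bias ignored."""
    return k * k * c_in * c_out


direct = conv_weights(256, 256, 3)
bottleneck = (
    conv_weights(256, 64, 1)    # 1x1: reduce channels
    + conv_weights(64, 64, 3)   # 3x3 on the narrow representation
    + conv_weights(64, 256, 1)  # 1x1: restore channels
)
print(direct, bottleneck)              # 589824 69632
print(round(bottleneck / direct, 3))   # 0.118
```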

In [19]:
conv1 = nn.Conv2d(4,8,3) # 3*3*4*8 + 8 = 288 + 8 = 296
conv1_group = nn.Conv2d(4,8,3,groups=2) # 1/2 * 288 + 8 = 152
# groups must divide both C_in and C_out, so it can be at most min(C_in, C_out)

In [20]:
conv1.weight.numel(),conv1_group.weight.numel()

(288, 144)

In [22]:
conv1 = nn.Conv2d(4,8,3,bias=False) # 3*3*4*8 = 288 (no bias term)

In [24]:
# Depthwise separable convolution
conv1_depthwise = nn.Conv2d(4,8,3,groups=4,bias=False) # 288/4 = 72
conv1_pairwise = nn.Conv2d(8,8,1,bias=False) # 8 * 8 = 64

In [25]:
# ratio = 1 / C_in_depth + C_out_pair / (K_H * K_W * C_in_depth)

ratio = 1/4 + 8 / (3*3 * 4)
ratio

0.4722222222222222

In [26]:
(conv1_depthwise.weight.numel() + conv1_pairwise.weight.numel()) / conv1.weight.numel()

0.4722222222222222

In [14]:
class Model(nn.Module):
    def __init__(self):
        super().__init__()
        
        # block1
        self.conv1 = nn.Conv2d(3,6,3)
        self.conv2 = nn.Conv2d(6,4,3)
        self.pool1 = nn.MaxPool2d(2)
        
        # block2
        self.conv3 = nn.Conv2d(4,16,5,stride=2,padding=1)
        self.conv4 = nn.Conv2d(16,3,5,stride=3,padding=2)
        self.pool2 = nn.MaxPool2d(2)
        
        # FC
        self.linear1 = nn.Linear(3*9*9,256)
        self.linear2 = nn.Linear(256,256)
        self.linear3 = nn.Linear(256,10)
        
    def forward(self,x):
        x = self.pool1(F.relu(self.conv2(F.relu(self.conv1(x)))))
        x = self.pool2(F.relu(self.conv4(F.relu(self.conv3(x)))))        
        
        x = x.view(-1,3*9*9)
        
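        # F.dropout defaults to training=True, so pass the module's mode to disable it under net.eval()
        x = F.relu(self.linear1(F.dropout(x,p=0.5,training=self.training)))
        x = F.relu(self.linear2(F.dropout(x,p=0.5,training=self.training)))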
        output = F.softmax(self.linear3(x),dim=1)
        
        return output

In [15]:
net = Model()

In [16]:
net

Model(
  (conv1): Conv2d(3, 6, kernel_size=(3, 3), stride=(1, 1))
  (conv2): Conv2d(6, 4, kernel_size=(3, 3), stride=(1, 1))
  (pool1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (conv3): Conv2d(4, 16, kernel_size=(5, 5), stride=(2, 2), padding=(1, 1))
  (conv4): Conv2d(16, 3, kernel_size=(5, 5), stride=(3, 3), padding=(2, 2))
  (pool2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (linear1): Linear(in_features=243, out_features=256, bias=True)
  (linear2): Linear(in_features=256, out_features=256, bias=True)
  (linear3): Linear(in_features=256, out_features=10, bias=True)
)

In [17]:
data = torch.ones(size=(10,3,229,229))

In [18]:
net(data)

tensor([[0.0984, 0.0941, 0.1055, 0.1049, 0.0960, 0.1052, 0.0999, 0.0970, 0.1038,
         0.0950],
        [0.0983, 0.0939, 0.1054, 0.1049, 0.0971, 0.1054, 0.0993, 0.0975, 0.1036,
         0.0947],
        [0.0989, 0.0934, 0.1029, 0.1066, 0.0959, 0.1039, 0.1002, 0.0990, 0.1046,
         0.0947],
        [0.0984, 0.0938, 0.1042, 0.1061, 0.0963, 0.1046, 0.0996, 0.0993, 0.1047,
         0.0930],
        [0.0985, 0.0934, 0.1039, 0.1065, 0.0965, 0.1040, 0.1005, 0.0986, 0.1047,
         0.0935],
        [0.0991, 0.0932, 0.1026, 0.1060, 0.0973, 0.1043, 0.0996, 0.0991, 0.1045,
         0.0943],
        [0.0982, 0.0953, 0.1034, 0.1057, 0.0962, 0.1045, 0.0989, 0.0978, 0.1052,
         0.0948],
        [0.0979, 0.0938, 0.1036, 0.1054, 0.0962, 0.1050, 0.1002, 0.0983, 0.1059,
         0.0936],
        [0.0996, 0.0937, 0.1045, 0.1062, 0.0955, 0.1045, 0.1006, 0.0971, 0.1056,
         0.0926],
        [0.0974, 0.0942, 0.1031, 0.1045, 0.0968, 0.1046, 0.1019, 0.0992, 0.1054,
         0.0928]], grad_fn=<

In [11]:
net = nn.Sequential(nn.Conv2d(3,6,3),
                    nn.ReLU(inplace=True), # inplace=True overwrites the input tensor with the result
                    nn.Conv2d(6,4,3),
                    nn.ReLU(inplace=True),
                    nn.MaxPool2d(2),
                    nn.Conv2d(4,16,5,stride=2,padding=1),
                    nn.ReLU(inplace=True),
                    nn.Conv2d(16,3,5,stride=3,padding=2),
                    nn.ReLU(inplace=True),
                    nn.MaxPool2d(2),
                   )

In [12]:
net(data).shape # size and number of the feature maps after the convolution + pooling stack

torch.Size([10, 3, 9, 9])
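
The 9 x 9 result can be reproduced by hand with the output-size formula `floor((n + 2p - k) / s) + 1`. The sketch below lists each conv and pooling layer of this stack as a `(kernel, stride, padding)` triple; `out_size` is a hypothetical helper, and ReLU layers are omitted because they do not change the shape.

```python
def out_size(n, k, s=1, p=0):
    """Spatial output size of a conv or pooling layer on an n-pixel input."""
    return (n + 2 * p - k) // s + 1


# (kernel, stride, padding): conv, conv, pool, conv, conv, pool
layers = [(3, 1, 0), (3, 1, 0), (2, 2, 0), (5, 2, 1), (5, 3, 2), (2, 2, 0)]

n = 229
sizes = []
for k, s, p in layers:
    n = out_size(n, k, s, p)
    sizes.append(n)
print(sizes)  # [227, 225, 112, 55, 19, 9]
```

The last entry times the 3 output channels gives the `3*9*9` input size used by the first linear layer.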

In [13]:
from torch_receptive_field import receptive_field

rfdict = receptive_field(net,(3,229,229))

------------------------------------------------------------------------------
        Layer (type)    map size      start       jump receptive_field 
        0             [229, 229]        0.5        1.0             1.0 
        1             [227, 227]        1.5        1.0             3.0 
        2             [227, 227]        1.5        1.0             3.0 
        3             [225, 225]        2.5        1.0             5.0 
        4             [225, 225]        2.5        1.0             5.0 
        5             [112, 112]        3.0        2.0             6.0 
        6               [55, 55]        5.0        4.0            14.0 
        7               [55, 55]        5.0        4.0            14.0 
        8               [19, 19]        5.0       12.0            30.0 
        9               [19, 19]        5.0       12.0            30.0 
        10                [9, 9]       11.0       24.0            42.0 


In [47]:
class VGG16(nn.Module):
    def __init__(self):
        super().__init__()
        self.features_ = nn.Sequential(nn.Conv2d(3,64,3,padding=1),nn.ReLU(inplace=True)
                                      ,nn.Conv2d(64,64,3,padding=1),nn.ReLU(inplace=True)
                                      ,nn.MaxPool2d(2)
                                      ,nn.Conv2d(64,128,3,padding=1),nn.ReLU(inplace=True)
                                      ,nn.Conv2d(128,128,3,padding=1),nn.ReLU(inplace=True)
                                      ,nn.MaxPool2d(2)
                                      ,nn.Conv2d(128,256,3,padding=1),nn.ReLU(inplace=True)
                                      ,nn.Conv2d(256,256,3,padding=1),nn.ReLU(inplace=True)
                                      ,nn.Conv2d(256,256,3,padding=1),nn.ReLU(inplace=True)
                                      ,nn.MaxPool2d(2)
                                      ,nn.Conv2d(256,512,3,padding=1),nn.ReLU(inplace=True)
                                      ,nn.Conv2d(512,512,3,padding=1),nn.ReLU(inplace=True)
                                      ,nn.Conv2d(512,512,3,padding=1),nn.ReLU(inplace=True)
                                      ,nn.MaxPool2d(2)
                                      ,nn.Conv2d(512,512,3,padding=1),nn.ReLU(inplace=True)
                                      ,nn.Conv2d(512,512,3,padding=1),nn.ReLU(inplace=True)
                                      ,nn.Conv2d(512,512,3,padding=1),nn.ReLU(inplace=True)
                                      ,nn.MaxPool2d(2)
                                    )
        self.clf_ = nn.Sequential(nn.Dropout(0.5)
                                  ,nn.Linear(512*7*7,4096),nn.ReLU(inplace=True)
                                  ,nn.Dropout(0.5)
                                  ,nn.Linear(4096,4096),nn.ReLU(inplace=True)
                                  ,nn.Linear(4096,1000),nn.Softmax(dim=1))
        
    def forward(self,x):
        x = self.features_(x) # extract features with the feature-extraction stack
        x = x.view(-1,512*7*7) # reshape: flatten the data
        output = self.clf_(x)
        return output

In [36]:
net = nn.Sequential(nn.Conv2d(3,64,3,padding=1),nn.ReLU(inplace=True)
                      ,nn.Conv2d(64,64,3,padding=1),nn.ReLU(inplace=True)
                      ,nn.MaxPool2d(2)
                      
                      ,nn.Conv2d(64,128,3,padding=1),nn.ReLU(inplace=True)
                      ,nn.Conv2d(128,128,3,padding=1),nn.ReLU(inplace=True)
                      ,nn.MaxPool2d(2)
                      
                      ,nn.Conv2d(128,256,3,padding=1),nn.ReLU(inplace=True)
                      ,nn.Conv2d(256,256,3,padding=1),nn.ReLU(inplace=True)
                      ,nn.Conv2d(256,256,3,padding=1),nn.ReLU(inplace=True)
                      ,nn.MaxPool2d(2)
                      
                      ,nn.Conv2d(256,512,3,padding=1),nn.ReLU(inplace=True)
                      ,nn.Conv2d(512,512,3,padding=1),nn.ReLU(inplace=True)
                      ,nn.Conv2d(512,512,3,padding=1),nn.ReLU(inplace=True)
                      ,nn.MaxPool2d(2)
                      
                      ,nn.Conv2d(512,512,3,padding=1),nn.ReLU(inplace=True)
                      ,nn.Conv2d(512,512,3,padding=1),nn.ReLU(inplace=True)
                      ,nn.Conv2d(512,512,3,padding=1),nn.ReLU(inplace=True)
                      ,nn.MaxPool2d(2)
                    )

In [37]:
data = torch.ones(size=(10,3,224,224))
net(data).shape # 512 feature maps, each 7*7

torch.Size([10, 512, 7, 7])
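
As a cross-check on the feature extractor, the per-layer parameter counts follow from `3*3*C_in*C_out + C_out` alone. The sketch below walks a configuration list that mirrors the layer order above (`"M"` marks a max-pooling layer, which has no parameters); the list name `cfg` is an assumption for illustration.

```python
cfg = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
       512, 512, 512, "M", 512, 512, 512, "M"]

c_in = 3
counts = []
for v in cfg:
    if v == "M":  # pooling: no parameters, channel count unchanged
        continue
    counts.append(3 * 3 * c_in * v + v)  # weights + biases of a 3x3 conv
    c_in = v

print(counts[:4])   # [1792, 36928, 73856, 147584]
print(sum(counts))  # 14714688
```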

In [48]:
vgg = VGG16()

In [49]:
from torchinfo import summary

In [50]:
summary(vgg,input_size=(10,3,224,224),device="cpu")

Layer (type:depth-idx)                   Output Shape              Param #
VGG16                                    --                        --
├─Sequential: 1-1                        [10, 512, 7, 7]           --
│    └─Conv2d: 2-1                       [10, 64, 224, 224]        1,792
│    └─ReLU: 2-2                         [10, 64, 224, 224]        --
│    └─Conv2d: 2-3                       [10, 64, 224, 224]        36,928
│    └─ReLU: 2-4                         [10, 64, 224, 224]        --
│    └─MaxPool2d: 2-5                    [10, 64, 112, 112]        --
│    └─Conv2d: 2-6                       [10, 128, 112, 112]       73,856
│    └─ReLU: 2-7                         [10, 128, 112, 112]       --
│    └─Conv2d: 2-8                       [10, 128, 112, 112]       147,584
│    └─ReLU: 2-9                         [10, 128, 112, 112]       --
│    └─MaxPool2d: 2-10                   [10, 128, 56, 56]         --
│    └─Conv2d: 2-11                      [10, 256, 56, 56]         29

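## FCN: object detection (sliding-window recognition)

Replace fully connected layers with 1 * 1 convolutions (the parameter count goes up)

Global average pooling: with a pooling kernel as large as the feature map, each map is reduced to a 1 * 1 output, which can replace a fully connected layer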
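
A minimal sketch of what global average pooling computes, in plain Python on nested lists: each channel collapses to a single mean. The comparison figure at the end assumes a VGG-style first fully connected layer (512 maps of 7x7 into 4096 units); `global_average_pool` is a hypothetical helper.

```python
def global_average_pool(feature_maps):
    """feature_maps: one 2-D list per channel. Returns one mean per channel."""
    return [
        sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
        for fmap in feature_maps
    ]


maps = [
    [[1.0] * 7 for _ in range(7)],  # channel 0: all ones
    [[2.0] * 7 for _ in range(7)],  # channel 1: all twos
]
pooled = global_average_pool(maps)
print(pooled)  # [1.0, 2.0]

# Global average pooling has no parameters; a fully connected layer on the same input does.
fc_params = 512 * 7 * 7 * 4096 + 4096
print(fc_params)  # 102764544
```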

In [51]:
data = torch.ones(10,7,7)

In [54]:
gap = nn.AvgPool2d(7)

gap(data).shape

torch.Size([10, 1, 1])

In [55]:
import torch
from torch import nn
from torchinfo import summary

In [57]:
data = torch.ones(size=(10,3,32,32))

In [60]:
class NiN(nn.Module):
    def __init__(self):
        super().__init__()
        self.block1 = nn.Sequential(nn.Conv2d(3,192,5,padding=2),nn.ReLU(inplace=True)
                                  ,nn.Conv2d(192,160,1),nn.ReLU(inplace=True)
                                  ,nn.Conv2d(160,96,1),nn.ReLU(inplace=True)
                                  ,nn.MaxPool2d(kernel_size=3,stride=2)
                                  ,nn.Dropout(0.25))
        self.block2 = nn.Sequential(nn.Conv2d(96,192,5,padding=2),nn.ReLU(inplace=True)
                                  ,nn.Conv2d(192,192,1),nn.ReLU(inplace=True)
                                  ,nn.Conv2d(192,192,1),nn.ReLU(inplace=True)
                                  ,nn.MaxPool2d(kernel_size=3,stride=2)
                                  ,nn.Dropout(0.25))
        self.block3 = nn.Sequential(nn.Conv2d(192,192,3,padding=1),nn.ReLU(inplace=True)
                                  ,nn.Conv2d(192,192,1),nn.ReLU(inplace=True)
                                  ,nn.Conv2d(192,10,1),nn.ReLU(inplace=True)
                                  ,nn.AvgPool2d(7,stride=1)
                                  ,nn.Softmax(dim=1))
        
    def forward(self,x):
        output = self.block3(self.block2(self.block1(x)))
        return output

In [61]:
net = NiN()

In [63]:
net(data).shape

torch.Size([10, 10, 1, 1])

In [67]:
summary(net,(10,3,32,32),device="cpu")

Layer (type:depth-idx)                   Output Shape              Param #
NiN                                      --                        --
├─Sequential: 1-1                        [10, 96, 15, 15]          --
│    └─Conv2d: 2-1                       [10, 192, 32, 32]         14,592
│    └─ReLU: 2-2                         [10, 192, 32, 32]         --
│    └─Conv2d: 2-3                       [10, 160, 32, 32]         30,880
│    └─ReLU: 2-4                         [10, 160, 32, 32]         --
│    └─Conv2d: 2-5                       [10, 96, 32, 32]          15,456
│    └─ReLU: 2-6                         [10, 96, 32, 32]          --
│    └─MaxPool2d: 2-7                    [10, 96, 15, 15]          --
│    └─Dropout: 2-8                      [10, 96, 15, 15]          --
├─Sequential: 1-2                        [10, 192, 7, 7]           --
│    └─Conv2d: 2-9                       [10, 192, 15, 15]         460,992
│    └─ReLU: 2-10                        [10, 192, 15, 15]         -