# State-of-the-Art Networks (SOTA)

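## GoogLeNet (Inception V1)

VGG has too many parameters, and the connections between its layers are too dense, so it is computationally expensive and prone to overfitting.

Common ways to address this:
1. Cut the number of parameters so that the network as a whole becomes "sparse".
2. Introduce random sparsity, for example Dropout-style methods that randomly set part of a feature matrix or weight matrix to 0.
3. Use GPUs for the computation.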
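As a rough illustration of item 2, the sketch below applies Dropout to a plain Python list. The function name `dropout` is hypothetical, and the rescaling of surviving values by 1/(1-p) follows the common "inverted dropout" convention; it is meant as an illustration, not as the library implementation.

```python
import random

def dropout(values, p, rng):
    """Zero each value with probability p; scale survivors by 1/(1-p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    keep = 1.0 - p
    return [v / keep if rng.random() >= p else 0.0 for v in values]

rng = random.Random(0)            # fixed seed only to make the run repeatable
activations = [1.0] * 10
dropped = dropout(activations, p=0.5, rng=rng)
print(dropped)                    # zeros land at random positions; survivors become 2.0
```

The zeros fall at irregular positions that change on every call, which is the kind of unstructured sparsity the notes describe as a poor fit for GPU hardware.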

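Drawbacks of these earlier approaches:
1. As a network goes from dense to sparse, its learning capacity fluctuates or even drops (dense networks learn better).
2. Random sparsity conflicts badly with GPU computation. Modern hardware is poor at computing on randomly or non-uniformly sparse data, and the weakness is especially pronounced in matrix computation.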

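 - VGG: adds Dropout on top of a dense architecture, which has the stronger learning capacity.
 - GoogLeNet: uses blocks built from dense components such as ordinary convolution and pooling layers to approximate a sparse architecture as closely as possible, yielding a dense network whose parameter count is similar to that of a sparse network.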

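The GoogLeNet team used a complex architecture-construction algorithm
-> trained it in the direction of "approximating a sparse architecture with dense components"
-> which produced several candidate dense architectures
-> and, after extensive experiments, selected the dense architecture (and its parameters) with the strongest learning capacity. That architecture is the Inception block; chaining Inception blocks together gives GoogLeNet.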

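Inception block:
Four branches are applied in parallel to the same input. Their outputs have different channel counts but the same feature-map size, and they are concatenated along the channel dimension.
1. 1 * 1 kernels: preserve the positional information between pixels as much as possible
2. 3 * 3 / 5 * 5 kernels: extract information from neighboring pixels
3. Max pooling: extract the most valuable local information

- The Inception block contains no Dropout or similar layers, so its structure is dense.
- The `1 * 1` convolution placed before each 3 * 3 / 5 * 5 kernel "clusters" (aggregates) information, which makes the architecture denser and also adds depth to the network.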
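A side effect of the `1 * 1` convolution that the notes above do not spell out is that it also shrinks the channel count seen by the larger kernel, which cuts the number of weights. The sketch below counts convolution weights only (no bias, no BN) for the 5 * 5 branch, using the channel numbers 192, 16 and 32 that appear in this document's first Inception configuration; the helper `conv_weights` is a hypothetical name.

```python
def conv_weights(in_ch, out_ch, k):
    """Weight count of a k x k convolution without bias."""
    return in_ch * out_ch * k * k

in_ch, red_ch, out_ch = 192, 16, 32   # in_channels, ch5x5red, ch5x5

# 5x5 convolution applied directly to the 192-channel input
direct = conv_weights(in_ch, out_ch, 5)

# 1x1 reduction to 16 channels, then the 5x5 convolution
reduced = conv_weights(in_ch, red_ch, 1) + conv_weights(red_ch, out_ch, 5)

print(direct)   # 153600
print(reduced)  # 15872
```

With these numbers the reduced branch needs roughly a tenth of the weights of the direct one.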

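`Auxiliary classifiers`: two additional softmax classifiers that exist alongside the softmax classifier of the main architecture.
Structure:
- average pooling layer
- convolutional layer + ReLU
- fully connected layer + ReLU
- Dropout (70%)
- fully connected layer + softmax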

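In the overall architecture, the inputs to these two classifiers are the outputs of inception4a and inception4d, respectively. In other words, an auxiliary classifier is attached after each of those two layers and produces its own softmax output. Together with the softmax of the main architecture, this gives three classification results and therefore three loss values. A weighted average of the three gives the final loss, which is then backpropagated.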
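The loss combination can be sketched in plain Python as follows. The weight 0.3 on each auxiliary loss is the value reported in the original GoogLeNet paper and is treated here as an assumption, since these notes do not state the weights; `cross_entropy` and `total_loss` are hypothetical helper names, and a real training loop would use the framework's own loss function.

```python
import math

def cross_entropy(logits, target):
    """Softmax cross-entropy for one sample, computed stably from raw logits."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

def total_loss(main_logits, aux1_logits, aux2_logits, target, aux_weight=0.3):
    """Main loss plus down-weighted auxiliary losses (aux_weight is an assumption)."""
    main = cross_entropy(main_logits, target)
    aux1 = cross_entropy(aux1_logits, target)
    aux2 = cross_entropy(aux2_logits, target)
    return main + aux_weight * (aux1 + aux2)

# Toy 3-class logits standing in for the three classifier outputs
loss = total_loss([2.0, 0.5, -1.0], [1.0, 1.0, 0.0], [0.2, 0.1, 0.0], target=0)
print(round(loss, 4))
```

This is a weighted sum rather than a normalized average. Dividing by the total weight (1.6 here) would turn it into an average and only rescale the gradient by a constant.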

Version 1 uses LRN (local response normalization) layers, which were later replaced by BN layers.

## GoogLeNet 复现

In [3]:
# conv + BN + ReLU -> BasicConv2d
# Inception
# AUXclf

In [4]:
import torch
from torch import nn
from torchinfo import summary

In [5]:
class BasicConv2d(nn.Module):
    def __init__(self,in_channels,out_channels,**kwargs): # in_channels could also be passed through **kwargs
        super().__init__()
        self.conv = nn.Sequential(nn.Conv2d(in_channels,out_channels,bias=False,**kwargs)
                                 ,nn.BatchNorm2d(out_channels)
                                 ,nn.ReLU(inplace=True))
    def forward(self,x):
        x = self.conv(x)
        return x

In [6]:
# Test; goal: no obvious errors
BasicConv2d(2,10,kernel_size=3)

BasicConv2d(
  (conv): Sequential(
    (0): Conv2d(2, 10, kernel_size=(3, 3), stride=(1, 1), bias=False)
    (1): BatchNorm2d(10, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace=True)
  )
)
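The convolution in `BasicConv2d` is created with `bias=False`. A likely reason, stated here as an assumption because the notes do not explain it, is that the following BatchNorm layer subtracts the per-channel batch mean, so any constant bias added by the convolution cancels out. The sketch below checks this on a handful of scalars; `batch_norm` is a simplified, hypothetical stand-in for the training-mode normalization step, without the learnable scale and shift.

```python
from statistics import fmean, pvariance

def batch_norm(xs, eps=1e-5):
    """Normalize a batch of scalars to zero mean and unit variance."""
    mean = fmean(xs)
    var = pvariance(xs, mu=mean)          # biased (population) variance
    return [(x - mean) / (var + eps) ** 0.5 for x in xs]

conv_out = [0.5, -1.2, 3.0, 0.7]          # pretend outputs of one channel
with_bias = [x + 2.5 for x in conv_out]   # same outputs plus a constant bias

a = batch_norm(conv_out)
b = batch_norm(with_bias)
print(max(abs(p - q) for p, q in zip(a, b)))  # close to 0: the bias has no effect
```

Dropping the bias therefore removes parameters that would have no effect on the output.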

In [9]:
class Inception(nn.Module):
    def __init__(self
                 ,in_channels : int
                 ,ch1x1 : int
                 ,ch3x3red : int
                 ,ch3x3 : int
                 ,ch5x5red : int
                 ,ch5x5 : int
                 ,pool_proj : int
                ):
        super().__init__()
        self.branch1 = BasicConv2d(in_channels,ch1x1,kernel_size=1)
        self.branch2 = nn.Sequential(BasicConv2d(in_channels,ch3x3red,kernel_size=1)
                                    ,BasicConv2d(ch3x3red,ch3x3,kernel_size=3,padding=1))
        self.branch3 = nn.Sequential(BasicConv2d(in_channels,ch5x5red,kernel_size=1)
                                    ,BasicConv2d(ch5x5red,ch5x5,kernel_size=5,padding=2))
        self.branch4 = nn.Sequential(nn.MaxPool2d(kernel_size=3,stride=1,padding=1)
                                    ,BasicConv2d(in_channels,pool_proj,kernel_size=1))   
        
    def forward(self,x):
        branch1 = self.branch1(x) # 28*28, ch1x1
        branch2 = self.branch2(x) # 28*28, ch3x3
        branch3 = self.branch3(x) # 28*28, ch5x5
        branch4 = self.branch4(x) # 28*28, pool_proj
        output = [branch1,branch2,branch3,branch4]
        return torch.cat(output,1) # concatenate along dim=1 (channels)

In [10]:
# Test
"""
 ,in_channels : int
 ,ch1x1 : int
 ,ch3x3red : int
 ,ch3x3 : int
 ,ch5x5red : int
 ,ch5x5 : int
 ,pool_proj : int
"""
Inception(192,64,96,128,16,32,32)

Inception(
  (branch1): BasicConv2d(
    (conv): Sequential(
      (0): Conv2d(192, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
    )
  )
  (branch2): Sequential(
    (0): BasicConv2d(
      (conv): Sequential(
        (0): Conv2d(192, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
      )
    )
    (1): BasicConv2d(
      (conv): Sequential(
        (0): Conv2d(96, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
      )
    )
  )
  (branch3): Sequential(
    (0): BasicConv2d(
      (conv): Sequential(
        (0): Conv2d(192, 16, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): 

In [11]:
data = torch.ones(10,192,28,28)

In [13]:
in3a = Inception(192,64,96,128,16,32,32)
in3a(data).shape

torch.Size([10, 256, 28, 28])
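The 256 output channels are the sum of the four branch widths (64 + 128 + 32 + 32). The sketch below applies the same arithmetic to every Inception configuration used in this document's GoogLeNet definition and checks that each block's `in_channels` matches the previous block's output. The helpers `inception_out_channels` and `check_chain` are hypothetical names. The max-pooling layers between stages are ignored because they do not change the channel count.

```python
# (in_channels, ch1x1, ch3x3red, ch3x3, ch5x5red, ch5x5, pool_proj)
CONFIGS = [
    ("3a", (192, 64, 96, 128, 16, 32, 32)),
    ("3b", (256, 128, 128, 192, 32, 96, 64)),
    ("4a", (480, 192, 96, 208, 16, 48, 64)),
    ("4b", (512, 160, 112, 224, 24, 64, 64)),
    ("4c", (512, 128, 128, 256, 24, 64, 64)),
    ("4d", (512, 112, 144, 288, 32, 64, 64)),
    ("4e", (528, 256, 160, 320, 32, 128, 128)),
    ("5a", (832, 256, 160, 320, 32, 128, 128)),
    ("5b", (832, 384, 192, 384, 48, 128, 128)),
]

def inception_out_channels(cfg):
    """Channels after concatenating the four branches."""
    _, ch1x1, _, ch3x3, _, ch5x5, pool_proj = cfg
    return ch1x1 + ch3x3 + ch5x5 + pool_proj

def check_chain(configs):
    """Verify each block's in_channels equals the previous block's output."""
    outputs = []
    prev = None
    for name, cfg in configs:
        if prev is not None and cfg[0] != prev:
            raise ValueError(f"inception{name} expects {cfg[0]} channels, got {prev}")
        prev = inception_out_channels(cfg)
        outputs.append((name, prev))
    return outputs

for name, ch in check_chain(CONFIGS):
    print(f"inception{name}: {ch} output channels")
```

The last block yields 1024 channels, which matches the input width of the final fully connected layer, and 4a and 4d yield 512 and 528, the input widths of the two auxiliary classifiers.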

In [14]:
# Implementation of the auxiliary classifier

In [17]:
class AuxClf(nn.Module):
    def __init__(self,in_channels:int,num_classes:int,**kwargs):
        super().__init__()
        self.feature_ = nn.Sequential(nn.AvgPool2d(kernel_size=5,stride=3)
                                     ,BasicConv2d(in_channels,128,kernel_size=1))
        self.clf_ = nn.Sequential(nn.Linear(4*4*128,1024)
                                 ,nn.ReLU(inplace=True)
                                 ,nn.Dropout(0.7)
                                 ,nn.Linear(1024,num_classes))
        
    def forward(self,x):
        x = self.feature_(x)
        x = x.view(-1,4*4*128)
        x = self.clf_(x)
        return x

In [18]:
# Auxiliary classifier after 4a
AuxClf(512,1000)

AuxClf(
  (feature_): Sequential(
    (0): AvgPool2d(kernel_size=5, stride=3, padding=0)
    (1): BasicConv2d(
      (conv): Sequential(
        (0): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
      )
    )
  )
  (clf_): Sequential(
    (0): Linear(in_features=2048, out_features=1024, bias=True)
    (1): ReLU(inplace=True)
    (2): Dropout(p=0.7, inplace=False)
    (3): Linear(in_features=1024, out_features=1000, bias=True)
  )
)
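The hard-coded `4*4*128` input of the first linear layer follows from the pooling arithmetic, assuming a 14 x 14 feature map at inception4a and inception4d (which is what a 224 x 224 input produces in this network). The sketch below derives that size and the parameter count of the classifier by hand; `pool_out` and `aux_params` are hypothetical helper names.

```python
def pool_out(size, kernel, stride):
    """Output size of a pooling layer with no padding and floor rounding."""
    return (size - kernel) // stride + 1

side = pool_out(14, kernel=5, stride=3)   # AvgPool2d(kernel_size=5, stride=3) on 14x14
flat = side * side * 128                  # 128 channels after the 1x1 convolution

def aux_params(in_channels, num_classes):
    """Trainable parameters of the auxiliary classifier."""
    conv = in_channels * 128 + 2 * 128    # 1x1 conv weights (no bias) + BN scale and shift
    fc1 = flat * 1024 + 1024              # weights + bias
    fc2 = 1024 * num_classes + num_classes
    return conv + fc1 + fc2

print(side, flat)             # 4 2048
print(aux_params(512, 1000))  # 3188968
```

The second number agrees with the 3,188,968 parameters that torchinfo reports for the first `AuxClf` in the summary later in this document. Because the flatten size is fixed, the classifier only works for inputs that reach it as 14 x 14 maps.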

In [29]:
class GoogLeNet(nn.Module):
    def __init__(self,num_classes: int=1000, blocks = None):
        super().__init__()
        
        if blocks is None:
            blocks = [BasicConv2d,Inception,AuxClf]
        conv_block = blocks[0]
        inception_block = blocks[1]
        aux_clf_block = blocks[2]
        
        # block1
        self.conv1 = conv_block(3,64,kernel_size=7,stride=2,padding=3)
        self.maxpool1 = nn.MaxPool2d(kernel_size=3,stride=2,ceil_mode=True)
        
        # block2
        self.conv2 = conv_block(64,64,kernel_size=1)
        self.conv3 = conv_block(64,192,kernel_size=3, padding=1)
        self.maxpool2 = nn.MaxPool2d(kernel_size=3,stride=2,ceil_mode=True) # ceil_mode=True rounds the output size up
        
        # block3
        self.inception3a = inception_block(192,64,96,128,16,32,32)
        self.inception3b = inception_block(256,128,128,192,32,96,64)
        self.maxpool3 = nn.MaxPool2d(kernel_size=3,stride=2,ceil_mode=True)
        
        # block4
        self.inception4a = inception_block(480,192,96,208,16,48,64)
        self.inception4b = inception_block(512,160,112,224,24,64,64)
        self.inception4c = inception_block(512,128,128,256,24,64,64)
        self.inception4d = inception_block(512,112,144,288,32,64,64)
        self.inception4e = inception_block(528,256,160,320,32,128,128)
        self.maxpool4 = nn.MaxPool2d(kernel_size=3,stride=2,ceil_mode=True)
        
        # block5
        self.inception5a = inception_block(832,256,160,320,32,128,128)
        self.inception5b = inception_block(832,384,192,384,48,128,128)
        
        # clf
        self.avgpool = nn.AdaptiveAvgPool2d((1,1)) # adaptive pooling takes the desired output feature-map size
        self.dropout = nn.Dropout(0.4)
        self.fc = nn.Linear(1024,num_classes)
        
        # auxclf
        self.aux1 = aux_clf_block(512, num_classes) # 4a
        self.aux2 = aux_clf_block(528, num_classes) # 4d
        
    def forward(self,x):
        # block1
        x = self.maxpool1(self.conv1(x))
        
        # block2
        x = self.maxpool2(self.conv3(self.conv2(x)))
        
        # block3
        x = self.inception3a(x)
        x = self.inception3b(x)
        x = self.maxpool3(x)
        
        # block4
        x = self.inception4a(x)
        aux1 = self.aux1(x)
        
        x = self.inception4b(x)
        x = self.inception4c(x)
        x = self.inception4d(x)
        aux2 = self.aux2(x)
        
        x = self.inception4e(x)
        x = self.maxpool4(x)
        
        # block5
        x = self.inception5a(x)
        x = self.inception5b(x)
        
        # clf
        x = self.avgpool(x) # after global average pooling the feature map is 1x1
        x = torch.flatten(x,1)
        x = self.dropout(x)
        x = self.fc(x)
        
        return x, aux2, aux1
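To see how the spatial size shrinks through the forward pass, the sketch below traces a 224 x 224 input through the size-changing layers by hand. The Inception blocks are left out because their padding keeps the size unchanged. The helpers `conv_out`, `pool_out_ceil` and `trace_sizes` are hypothetical names, and the ceil-mode rule is written from the output-shape formula in the PyTorch `MaxPool2d` documentation.

```python
import math

def conv_out(size, kernel, stride=1, padding=0):
    """Output size of a convolution (floor rounding)."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out_ceil(size, kernel, stride, padding=0):
    """Output size of max pooling with ceil_mode=True."""
    out = math.ceil((size + 2 * padding - kernel) / stride) + 1
    # The last window is dropped if it would start beyond the padded input.
    if (out - 1) * stride >= size + padding:
        out -= 1
    return out

STEPS = [
    ("conv1",    lambda s: conv_out(s, 7, stride=2, padding=3)),
    ("maxpool1", lambda s: pool_out_ceil(s, 3, 2)),
    ("conv2",    lambda s: conv_out(s, 1)),
    ("conv3",    lambda s: conv_out(s, 3, padding=1)),
    ("maxpool2", lambda s: pool_out_ceil(s, 3, 2)),
    ("maxpool3", lambda s: pool_out_ceil(s, 3, 2)),
    ("maxpool4", lambda s: pool_out_ceil(s, 3, 2)),
]

def trace_sizes(size, steps):
    """Apply each size rule in order and record the result."""
    trace = []
    for name, rule in steps:
        size = rule(size)
        trace.append((name, size))
    return trace

for name, size in trace_sizes(224, STEPS):
    print(f"{name}: {size} x {size}")
```

With `ceil_mode=True` each pooling layer halves the size exactly (112 -> 56 -> 28 -> 14 -> 7), whereas floor rounding would give 55 after the first pooling layer.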

In [30]:
# Test
data = torch.ones(10,3,224,224)

net = GoogLeNet(num_classes=1000)

In [31]:
fc2,fc1,fc0 = net(data)

In [32]:
for i in [fc2,fc1,fc0]:
    print(i.shape)

torch.Size([10, 1000])
torch.Size([10, 1000])
torch.Size([10, 1000])


In [33]:
summary(net,(10,3,224,224),device="cpu",depth=1)

Layer (type:depth-idx)                        Output Shape              Param #
GoogLeNet                                     --                        --
├─BasicConv2d: 1-1                            [10, 64, 112, 112]        9,536
├─MaxPool2d: 1-2                              [10, 64, 56, 56]          --
├─BasicConv2d: 1-3                            [10, 64, 56, 56]          4,224
├─BasicConv2d: 1-4                            [10, 192, 56, 56]         110,976
├─MaxPool2d: 1-5                              [10, 192, 28, 28]         --
├─Inception: 1-6                              [10, 256, 28, 28]         164,064
├─Inception: 1-7                              [10, 480, 28, 28]         389,376
├─MaxPool2d: 1-8                              [10, 480, 14, 14]         --
├─Inception: 1-9                              [10, 512, 14, 14]         376,800
├─AuxClf: 1-10                                [10, 1000]                3,188,968
├─Inception: 1-11                             [10, 512, 14, 14