## 含并行连结的网络(GoogLeNet)

* 2014
* 吸收了NiN网络中串联网络的思想,并做出很大改进.这里介绍这个系列模型的第一个版本.

### (1)Inception块

* GoogLeNet中的基础卷积块,得名于同名电影<<盗梦空间>>(Inception).
* Inception块里有4条并行的线路.  
前3条使用窗口大小为1, 3, 5的卷积层来``抽取不同空间尺寸下的信息``,  
中间2个线路会对输入先``做1x1卷积减少通道数``,以``降低模型复杂度``.  
第4条线路使用``3x3的最大池化层``,后接``1x1卷积层来改变通道数``.

Inception块中自定义的超参数是每个层的输出通道数,以此控制模型复杂度.

In [1]:
import time
import torch
from torch import nn, optim
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

In [8]:
class Inception(nn.Module):
    # c1-c4为每条线路里层的输出通道数
    def __init__(self, in_c, c1, c2, c3, c4):
        super(Inception, self).__init__()
        # 线路1,单1x1卷积层
        self.p1_1 = nn.Conv2d(in_c, c1, kernel_size=1)
        # 线路2,1x1卷积层后接3x3卷积层
        self.p2_1 = nn.Conv2d(in_c, c2[0], kernel_size=1)
        self.p2_2 = nn.Conv2d(c2[0], c2[1], kernel_size=3, padding=1)
        # 线路3,1x1卷积层后接5x5卷积层
        self.p3_1 = nn.Conv2d(in_c, c3[0], kernel_size=1)
        self.p3_2 = nn.Conv2d(c3[0], c3[1], kernel_size=5, padding=2)
        # 线路4,3x3最大池化层后接1x1卷积层
        self.p4_1 = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)
        self.p4_2 = nn.Conv2d(in_c, c4, kernel_size=1)

    def forward(self, x):
        p1 = F.relu(self.p1_1(x))
        p2 = F.relu(self.p2_2(F.relu(self.p2_1(x))))
        p3 = F.relu(self.p3_2(F.relu(self.p3_1(x))))
        p4 = F.relu(self.p4_2(self.p4_1(x)))
        return torch.cat((p1, p2, p3, p4), dim=1) # 在通道维度上连结输出.


### (2)GoogLeNet模型

主体卷积部分使用５个模块(block),模块之间使用步幅为2的最大池化层来减小输出高和宽.  

In [9]:
# 第一个模块使用一个64通道的7x7卷积.
b1 = nn.Sequential(
    nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
)

In [10]:
# 第二个模块使用2个卷积层:首先是64通道的1x1卷积层,然后是将通道增大3倍的3x3卷积层.对应Inception块中的第二条线路.

In [11]:
b2 = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=1),
    nn.Conv2d(64, 192, kernel_size=3, padding=1),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
)

第三个模块串联2个完整的Inception块.  
第一个Inception块的输出通道数为64+128+32+32=25664+128+32+32=256.  
第二个Inception块输出通道数增至128+192+96+64=480128+192+96+64=480.  


In [19]:
b3 = nn.Sequential(
    Inception(192, 64, (96, 128), (16, 32), 32),
    Inception(256, 128, (128, 192), (32, 96), 64),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
)

第四模块串联了5个Inception块.  
其输出通道数分别是  
192+208+48+64=512  
160+224+64+64=512    
128+256+64+64=512  
112+288+64+64=528112+288+64+64=528  
256+320+128+128=832256+320+128+128=832  

首先含3×33×3卷积层的``第二条线路输出最多通道``，其次是仅含1×11×1卷积层的第一条线路，之后是含5×55×5卷积层的第三条线路和含3×33×3最大池化层的第四条线路。其中第二、第三条线路都会先``按比例减小通道数``。这些``比例``在各个Inception块中都``略有不同``。

In [20]:
b4 = nn.Sequential(
    Inception(480, 192, (96, 208), (16, 48), 64),
    Inception(512, 160, (112, 224), (24, 64), 64),
    Inception(512, 128, (128, 256), (24, 64), 64),
    Inception(512, 112, (144, 288), (32, 64), 64),
    Inception(528, 256, (160, 320), (32, 128), 128),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
)

第五模块有两个Inception块.  
输出通道数为  
256+320+128+128=832  
384+384+128+128=1024  

第五模块的后面``紧跟输出层``，该模块同NiN一样使用``全局平均池化层``来将每个通道的高和宽变成1。最后我们将输出变成二维数组后接上一个``输出个数为标签类别数``的全连接层。

In [21]:
# 全局平均池化
class GloablAvgPool2d(nn.Module):
    def __init__(self):
        super(GloablAvgPool2d, self).__init__()
    def forward(self, x):
        return F.avg_pool2d(x, kernel_size=x.size()[2:])

In [22]:
b5 = nn.Sequential(
    Inception(832, 256, (160, 320), (32, 128), 128),
    Inception(832, 384, (192, 384), (48, 128), 128),
    GloablAvgPool2d()
)

GoogLeNet模型的计算复杂，而且不如VGG那样便于修改通道数。这里我们将输入的高和宽从224降到96来简化计算。

In [23]:
net = nn.Sequential(b1, b2, b3, b4, b5, nn.Flatten(), nn.Linear(1024, 10))
X = torch.rand(1, 1, 96, 96)
for blk in net.children():
    X = blk(X)
    print('output shape:', X.shape)

output shape: torch.Size([1, 64, 24, 24])
output shape: torch.Size([1, 192, 12, 12])
output shape: torch.Size([1, 480, 6, 6])
output shape: torch.Size([1, 832, 3, 3])
output shape: torch.Size([1, 1024, 1, 1])
output shape: torch.Size([1, 1024])
output shape: torch.Size([1, 10])


### (3)获取数据

In [24]:
# 使用高和宽均为96像素的图像来训练GoogLeNet模型
resize = 96
trans = []
trans.append(torchvision.transforms.Resize(size=resize))
trans.append(torchvision.transforms.ToTensor())
transform = torchvision.transforms.Compose(trans) # 将两个变换串联起来

mnist_train = torchvision.datasets.FashionMNIST(root='~/Datasets/FashionMNIST', train=True, download=True, transform=transform)
mnist_test = torchvision.datasets.FashionMNIST(root='~/Datasets/FashionMNIST', train=False, download=True, transform=transform)

batch_size = 128

train_iter = torch.utils.data.DataLoader(mnist_train, batch_size=batch_size, shuffle=True)
test_iter = torch.utils.data.DataLoader(mnist_test, batch_size=batch_size, shuffle=False)

### (4)训练模型

In [25]:
def train_ch5(net, train_iter, test_iter, batch_size, optimizer, device, num_epochs):
    net = net.to(device)
    print('train on', device)
    loss = torch.nn.CrossEntropyLoss()
    for epoch in range(num_epochs):
        train_l_sum, train_acc_sum, n, batch_count, start = 0.0, 0.0, 0, 0, time.time()
        for X, y in train_iter:
            X = X.to(device)
            y = y.to(device)
            # print("y.shape", y.shape) # [128]
            y_hat = net(X)
            l = loss(y_hat, y)
            optimizer.zero_grad()
            l.backward()
            optimizer.step()
            train_l_sum += l.cpu().item() # loss复制到cpu上
            train_acc_sum += (y_hat.argmax(dim=1) == y).sum().cpu().item()
            n += y.shape[0]
            batch_count += 1

        with torch.no_grad():
            test_acc_sum, n_test = 0.0, 0 # 创建在内存(CPU)
            for X_test, y_test in test_iter:
                net.eval() # 评估模式
                test_acc_sum += (net(X_test.to(device)).argmax(dim=1) == y_test.to(device)).sum().item()  # 对Tensor进行.item()取值后,得到的就是一个Python Scalar.
                net.train() # 训练模式
                n_test += y_test.shape[0]
            test_acc = test_acc_sum / n_test

        print('epoch %d, loss %.4f, train acc %.3f, test acc %.3f, time %.1f sec'
        % (epoch + 1, train_l_sum / batch_count, train_acc_sum / n, test_acc, time.time() - start))

### (5)训练

In [26]:
lr, num_epochs = 0.001, 5
optimizer = torch.optim.Adam(net.parameters(), lr=lr)
train_ch5(net, train_iter, test_iter, batch_size, optimizer, device, num_epochs)

train on cuda
epoch 1, loss 1.0218, train acc 0.609, test acc 0.823, time 44.5 sec
epoch 2, loss 0.4298, train acc 0.844, test acc 0.858, time 44.4 sec
epoch 3, loss 0.3484, train acc 0.872, test acc 0.876, time 44.5 sec
epoch 4, loss 0.3056, train acc 0.887, test acc 0.882, time 44.6 sec
epoch 5, loss 0.2737, train acc 0.900, test acc 0.893, time 44.6 sec


* Inception块相当于一个有4条线路的子网络.通过不同窗口形状的卷积层和最大池化层来并行抽取信息,并使用1x1卷积层来减少通道数从而降低模型复杂度.
* GoogLeNet将多个Inception块和其他层串联起来.其中Inception块的通道数分配之比是在ImageNet数据集上通过大量的实验得来的.
* GoogLeNet系列一度是ImageNet上最高效的模型之一:在类似的测试精度下,它们的计算复杂度往往更低.