## 使用重复元素的网络
``VGG``:  
在LeNet的基础上增加了3个卷积层.  
AlexNet指明了深度卷积神经网路可以取得出色的结果,但并没有提供简单的规则以指导后来的研究者如何设计新的网络.

* VGG提出了可以通过``重复使用简单的基础块``来构建深度模型的思路。  
``vgg_block``:填充为1,窗口形状为3x3的卷积层;步幅为2,窗口形状为2x2的最大池化层.

``思路``:  
采用``堆积的小卷积核``优于采用大的卷积核.  

这样可以``增加网络的深度``来保证学习更复杂的模式,而且代价还比较小(参数较少).  
在VGG中,使用了3个3x3的卷积核来代替7x7卷积核,使用了2个3x3卷积核代替5x5卷积核,这样做的主要目的是在保证具有相同感受野的条件下,提升了网络的深度,在一定程度上提升了神将网络的效果.

In [11]:
import time
import torch
from torch import nn, optim
import torchvision

device = torch.device('cuda' if torch.cuda.is_available() else "cpu")

In [12]:
# 可以指定卷积层的数量和输入输出通道数
def vgg_block(num_convs, in_channels, out_channels):
    blk = []
    for i in range(num_convs):
        if i == 0:
            blk.append(nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1))
        else:
            blk.append(nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1))
        blk.append(nn.ReLU())
    blk.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*blk)


### 构造VGG网络,它有5个卷积块,前2块使用``单卷积层``,后三块使用``双卷积层``.

第一块的输入输出为1和64,之后每次对输出通道数量翻倍,直到变为512.  
因为这个网络使用了8个卷积层和3个全连接层,所以称为VGG-11.

卷积层串联数个vgg_block,其超参数由变了conv_arch定义.该变量指定了每个VGG块里卷积层个数和输入输出通道数.

In [13]:
conv_arch = (1, 1, 64), (1, 64, 128), (2, 128, 256), (2, 256, 512), (2, 512, 512)
# 经过5个vgg_block,宽高都减半5次,变成24/32=7
fc_features = 512*7*7
fc_hidden_units = 4096 # 任意

In [16]:
def vgg(conv_arch, fc_features, fc_hidden_units=4096):
    net = nn.Sequential()
    for i, (num_convs, in_channels, out_channels) in enumerate(conv_arch):
        net.add_module("vgg_block" + str(i+1), vgg_block(num_convs, in_channels, out_channels))
    
    # 全连接层
    net.add_module("fc", nn.Sequential(
        nn.Flatten(),
        nn.Linear(fc_features, fc_hidden_units),
        nn.ReLU(),
        nn.Dropout(0.5),
        nn.Linear(fc_hidden_units, fc_hidden_units),
        nn.ReLU(),
        nn.Dropout(0.5),
        nn.Linear(fc_hidden_units, 10)
    ))
    return net
net = vgg(conv_arch, fc_features, fc_hidden_units=4096)
print(net)

Sequential(
  (vgg_block1): Sequential(
    (0): Conv2d(1, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU()
    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (vgg_block2): Sequential(
    (0): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU()
    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (vgg_block3): Sequential(
    (0): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU()
    (2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU()
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (vgg_block4): Sequential(
    (0): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU()
    (2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU()
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dil

In [17]:
X = torch.rand(1, 1, 224, 224)

# named_children获取一级子模块及其名字(named_modules会返回所有子模块,包括子模块的子模块)
for name, blk in net.named_children():
    X = blk(X)
    print(name, 'output shape:', X.shape)

vgg_block1 output shape: torch.Size([1, 64, 112, 112])
vgg_block2 output shape: torch.Size([1, 128, 56, 56])
vgg_block3 output shape: torch.Size([1, 256, 28, 28])
vgg_block4 output shape: torch.Size([1, 512, 14, 14])
vgg_block5 output shape: torch.Size([1, 512, 7, 7])
fc output shape: torch.Size([1, 10])


### 获取数据和训练模型

In [25]:
# 由于VGG-11计算比AlexNet更加复杂,这里构造一个通道数更小,或者说更窄的网络在Fashion-MNIST数据集上进行训练.
ratio =  8
small_conv_arch = [(1, 1, 64//ratio), (1, 64//ratio, 128//ratio), (2, 128//ratio, 256//ratio), (2, 256//ratio, 512//ratio), (2, 512//ratio, 512//ratio)]
net = vgg(small_conv_arch, fc_features // ratio, fc_hidden_units // ratio)
print(net)

Sequential(
  (vgg_block1): Sequential(
    (0): Conv2d(1, 8, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU()
    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (vgg_block2): Sequential(
    (0): Conv2d(8, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU()
    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (vgg_block3): Sequential(
    (0): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU()
    (2): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU()
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (vgg_block4): Sequential(
    (0): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU()
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU()
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ce

In [28]:
resize = 224
trans = []
trans.append(torchvision.transforms.Resize(size=resize))
trans.append(torchvision.transforms.ToTensor())
transform = torchvision.transforms.Compose(trans) # 将两个变换串联起来

mnist_train = torchvision.datasets.FashionMNIST(root='~/Datasets/FashionMNIST', train=True, download=True, transform=transform)
mnist_test = torchvision.datasets.FashionMNIST(root='~/Datasets/FashionMNIST', train=False, download=True, transform=transform)

batch_size = 64

train_iter = torch.utils.data.DataLoader(mnist_train, batch_size=batch_size, shuffle=True)
test_iter = torch.utils.data.DataLoader(mnist_test, batch_size=batch_size, shuffle=False)

In [27]:
def train_ch5(net, train_iter, test_iter, batch_size, optimizer, device, num_epochs):
    net = net.to(device)
    print('train on', device)
    loss = torch.nn.CrossEntropyLoss()
    for epoch in range(num_epochs):
        train_l_sum, train_acc_sum, n, batch_count, start = 0.0, 0.0, 0, 0, time.time()
        for X, y in train_iter:
            X = X.to(device)
            y = y.to(device)
            # print("y.shape", y.shape) # [128]
            y_hat = net(X)
            l = loss(y_hat, y)
            optimizer.zero_grad()
            l.backward()
            optimizer.step()
            train_l_sum += l.cpu().item() # loss复制到cpu上
            train_acc_sum += (y_hat.argmax(dim=1) == y).sum().cpu().item()
            n += y.shape[0]
            batch_count += 1
        with torch.no_grad():
            for X, y in test_iter:
                test_acc_sum, n = 0.0, 0 # 创建在内存(CPU)
                net.eval() # 评估模式
                test_acc_sum += (net(X.to(device)).argmax(dim=1) == y.to(device)).sum().item()  # 对Tensor进行.item()取值后,得到的就是一个Python Scalar.
                net.train() # 训练模式
                n += y.shape[0]
            test_acc = test_acc_sum / n

        print('epoch %d, loss %.4f, train acc %.3f, test acc %.3f, time %.1f sec'
        % (epoch + 1, train_l_sum / batch_count, train_acc_sum / n, test_acc, time.time() - start))

In [29]:
lr, num_epochs = 0.001, 5
optimizer = torch.optim.Adam(net.parameters(), lr=lr)
train_ch5(net, train_iter, test_iter, batch_size, optimizer, device, num_epochs)

train on cuda
epoch 1, loss 0.7733, train acc 2635.438, test acc 0.812, time 56.1 sec
epoch 2, loss 0.3277, train acc 3305.125, test acc 0.938, time 55.5 sec
epoch 3, loss 0.2847, train acc 3356.125, test acc 1.000, time 56.5 sec
epoch 4, loss 0.2577, train acc 3394.500, test acc 0.938, time 56.5 sec
epoch 5, loss 0.2383, train acc 3423.125, test acc 1.000, time 55.7 sec
