### Networks Using Repeated Elements (VGG)

VGG introduced the idea of building deep models by repeatedly stacking a simple basic block.

### VGG block

A VGG block follows a fixed pattern: several identical 3 × 3 convolutional layers with padding 1, followed by one 2 × 2 max-pooling layer with stride 2.

The convolutional layers preserve the height and width of the input, while the pooling layer halves them.

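By using small kernels, VGG makes the network deeper while keeping the receptive field the same (for example, two stacked 3 × 3 convolutions cover the same 5 × 5 input region as a single 5 × 5 convolution), which improves the network's performance to some extent.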
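
The receptive-field argument can be checked with a little arithmetic. The sketch below is plain Python and only illustrative: the helper names `stacked_receptive_field` and `conv_weights` and the channel count of 64 are assumptions made for this example, not part of the original code.

```python
def stacked_receptive_field(kernel, num_layers):
    """Receptive field of num_layers stacked stride-1 convolutions of the same kernel size."""
    return 1 + num_layers * (kernel - 1)


def conv_weights(kernel, channels, num_layers=1):
    """Weight count (biases ignored) of num_layers convolutions with `channels` inputs and outputs."""
    return num_layers * kernel * kernel * channels * channels


c = 64  # example channel count, chosen only for illustration

# Two stacked 3x3 convolutions see the same region as one 5x5 convolution
print(stacked_receptive_field(3, 2), stacked_receptive_field(5, 1))

# ... but need fewer weights, and allow a nonlinearity between the two layers
print(conv_weights(3, c, num_layers=2), conv_weights(5, c))
```

With equal input and output channels, two 3 × 3 layers use 18 weights per channel pair against 25 for a single 5 × 5 layer.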

**Each VGG block halves the height and width.**
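
The halving follows from the usual output-size formula for convolution and pooling layers, floor((n + 2p - k) / s) + 1. A minimal plain-Python sketch, where `out_size` and `vgg_block_out_size` are hypothetical helper names introduced only for this illustration:

```python
def out_size(n, kernel, stride=1, padding=0):
    """Output length along one spatial dimension of a convolution or pooling layer."""
    return (n + 2 * padding - kernel) // stride + 1


def vgg_block_out_size(n, num_convs):
    """Spatial size after num_convs 3x3 convs (padding 1) and one 2x2 max pool (stride 2)."""
    for _ in range(num_convs):
        n = out_size(n, kernel=3, stride=1, padding=1)  # size is unchanged
    return out_size(n, kernel=2, stride=2)  # size is halved


size = 224
for block in range(1, 6):
    size = vgg_block_out_size(size, num_convs=2)
    print("after block", block, "size", size)
```

The number of convolutions in a block does not affect the result, since each one preserves the spatial size; only the pooling layer changes it.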

### VGG network

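A VGG network consists of a convolutional module followed by a fully connected module. The convolutional module chains several vgg_block instances, whose hyperparameters are given by the variable conv_arch. This variable specifies, for each VGG block, the number of convolutional layers and the numbers of input and output channels.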
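
Since each entry of conv_arch is a (num_convs, in_channels, out_channels) tuple, the input size of the first fully connected layer can be derived from it. The sketch below is plain Python; `fc_features_from_arch` is a hypothetical helper, and it assumes a square input whose side stays even at every block.

```python
def fc_features_from_arch(conv_arch, input_size):
    """Check that the channels chain correctly and return the flattened feature count."""
    size = input_size
    prev_out = None
    for num_convs, in_channels, out_channels in conv_arch:
        # The input channels of a block must match the output channels of the previous one
        if prev_out is not None and in_channels != prev_out:
            raise ValueError(f"expected {prev_out} input channels, got {in_channels}")
        size //= 2  # each block halves the height and width
        prev_out = out_channels
    return prev_out * size * size


conv_arch = ((1, 1, 64), (1, 64, 128), (2, 128, 256), (2, 256, 512), (2, 512, 512))
print(fc_features_from_arch(conv_arch, input_size=224))
```

Computing the value this way avoids hard-coding it when conv_arch or the input resolution changes.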

### A simple implementation

In [1]:
import time
import torch
from torch import nn
import utils

In [2]:
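# Define a VGG block
def vgg_block(num_convs,in_channels,out_channels):
    blk=[]
    for i in range(num_convs):
        # Only the first convolution changes the number of channels
        if i==0:
            blk.append(nn.Conv2d(in_channels,out_channels,kernel_size=3,padding=1))
        else:
            blk.append(nn.Conv2d(out_channels,out_channels,kernel_size=3,padding=1))
    # Note: no activation follows the convolutions here; the original VGG applies ReLU after each one
    # Append the pooling layer
    blk.append(nn.MaxPool2d(kernel_size=2,stride=2)) # halves the height and width
    return nn.Sequential(*blk)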

In [3]:
# Define the VGG network
def vgg(conv_arch,fc_features,fc_hidden_units=4096):
    net=nn.Sequential()
    # Convolutional part
    for i,(num_convs,in_channels,out_channels) in enumerate(conv_arch):
        # Each vgg_block halves the height and width
        net.add_module('vgg_block_'+str(i+1),vgg_block(num_convs,in_channels,out_channels))
    # Fully connected part
    net.add_module('fc',nn.Sequential(
                        utils.FlattenLayer(),# first flatten the feature maps into a vector
                        nn.Linear(fc_features,fc_hidden_units),
                        nn.ReLU(),
                        nn.Dropout(0.5),
                        nn.Linear(fc_hidden_units,fc_hidden_units),
                        nn.ReLU(),
                        nn.Dropout(0.5),
                        nn.Linear(fc_hidden_units,10)
                    ))
    return net

In [4]:
# Inspect the output shape of each layer
conv_arch=((1,1,64),(1,64,128),(2,128,256),(2,256,512),(2,512,512))
# After 5 vgg_blocks the height and width have been halved 5 times: 224/32=7
fc_features=512*7*7
fc_hidden_units=4096 # arbitrary choice

net=vgg(conv_arch,fc_features,fc_hidden_units)
X=torch.rand(1,1,224,224)
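# named_children() yields the immediate child modules together with their names
# named_modules() yields all submodules recursively, including children of children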
for name,blk in net.named_children():
    X=blk(X)
    print(name,'output shape:',X.shape)

vgg_block_1 output shape: torch.Size([1, 64, 112, 112])
vgg_block_2 output shape: torch.Size([1, 128, 56, 56])
vgg_block_3 output shape: torch.Size([1, 256, 28, 28])
vgg_block_4 output shape: torch.Size([1, 512, 14, 14])
vgg_block_5 output shape: torch.Size([1, 512, 7, 7])
fc output shape: torch.Size([1, 10])
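
To see where the weights of this full-size configuration sit, the parameters can be counted by hand. The sketch below is plain Python and needs no PyTorch; `conv_params` and `linear_params` are hypothetical helpers that count weights plus biases, assuming 3 × 3 kernels and the 10-class output layer used above.

```python
def conv_params(in_channels, out_channels, kernel=3):
    """Weights plus biases of one convolutional layer."""
    return (in_channels * kernel * kernel + 1) * out_channels


def linear_params(in_features, out_features):
    """Weights plus biases of one fully connected layer."""
    return (in_features + 1) * out_features


conv_arch = ((1, 1, 64), (1, 64, 128), (2, 128, 256), (2, 256, 512), (2, 512, 512))

conv_total = 0
for num_convs, in_channels, out_channels in conv_arch:
    # The first convolution maps in_channels -> out_channels, the rest keep out_channels
    conv_total += conv_params(in_channels, out_channels)
    conv_total += (num_convs - 1) * conv_params(out_channels, out_channels)

fc_total = (linear_params(512 * 7 * 7, 4096)
            + linear_params(4096, 4096)
            + linear_params(4096, 10))

print("conv parameters:", conv_total)
print("fc parameters:  ", fc_total)
```

The fully connected part holds far more parameters than the convolutional part, mostly in the first linear layer that consumes the 512 × 7 × 7 flattened features.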


In [5]:
# Load the data and train
# To keep things simple and save time, train with fewer channels
ratio=8
conv_arch=((1,1,64//ratio),(1,64//ratio,128//ratio),(2,128//ratio,256//ratio),(2,256//ratio,512//ratio),(2,512//ratio,512//ratio))
net=vgg(conv_arch,fc_features//ratio,fc_hidden_units//ratio)
print(net)

Sequential(
  (vgg_block_1): Sequential(
    (0): Conv2d(1, 8, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (vgg_block_2): Sequential(
    (0): Conv2d(8, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (vgg_block_3): Sequential(
    (0): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (vgg_block_4): Sequential(
    (0): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (vgg_block_5): Sequential(
    (0): Conv2d(64, 64, kernel_size=(3, 3), 
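
The scaled-down conv_arch in In [5] is written out by hand; it could also be derived from the full architecture. A minimal sketch in plain Python, where `scale_arch` is a hypothetical helper that leaves the first block's input channels untouched because they are fixed by the image data:

```python
def scale_arch(conv_arch, ratio):
    """Divide every channel count by ratio, keeping the first block's input channels."""
    scaled = []
    for i, (num_convs, in_channels, out_channels) in enumerate(conv_arch):
        new_in = in_channels if i == 0 else in_channels // ratio
        scaled.append((num_convs, new_in, out_channels // ratio))
    return tuple(scaled)


conv_arch = ((1, 1, 64), (1, 64, 128), (2, 128, 256), (2, 256, 512), (2, 512, 512))
small_arch = scale_arch(conv_arch, ratio=8)
print(small_arch)
```

The result matches the channel counts shown in the printed module list, for example Conv2d(1, 8) in the first block and Conv2d(16, 32) in the third.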

In [6]:
# Training
batch_size=64
train_iter,test_iter=utils.load_data_fashion_mnist(batch_size,resize=224)

lr,num_epochs=0.001,5
optimizer=torch.optim.Adam(net.parameters(),lr=lr)
device=torch.device('cuda' if torch.cuda.is_available() else 'cpu')
utils.train_ch5(net,train_iter,test_iter,num_epochs,device,optimizer)

train on cuda
epoch 1,train loss 0.5094,train acc 0.8169,test acc 0.8763,time 64.3 sec
epoch 2,train loss 0.3326,train acc 0.8803,test acc 0.8921,time 63.5 sec
epoch 3,train loss 0.2892,train acc 0.8945,test acc 0.9007,time 63.2 sec
epoch 4,train loss 0.2608,train acc 0.9054,test acc 0.9022,time 63.3 sec
epoch 5,train loss 0.2408,train acc 0.9123,test acc 0.9050,time 62.6 sec
