## Residual Networks (ResNet)

ResNet was proposed in 2015.

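## 1. Residual block
In a residual block, the input can propagate forward faster through a data path that skips across layers.

ResNet ``follows VGG's`` design of using 3x3 convolutional layers throughout.  
A residual block starts with ``two`` ``3x3 convolutional layers`` that have the same number of output channels. Each convolutional layer is followed by a ``batch normalization layer`` and a ``ReLU activation``.  
The input skips these two convolutions and is added in just before the final ReLU activation.  
This design requires the ``output of the two convolutional layers to have the same shape as the input``, so that the two can be added.  
To change the number of channels, an extra 1x1 convolutional layer is needed to transform the input into the required shape before the addition.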
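
Written as a formula, the block described above can be sketched as follows. The symbols are introduced here only for illustration and are not part of the original notes: $\mathcal{F}$ stands for the main path (convolution, batch normalization, ReLU, convolution, batch normalization) and $W_s$ for the optional 1x1 convolution on the shortcut.

```latex
\begin{aligned}
Y &= \mathrm{ReLU}\bigl(\mathcal{F}(X) + X\bigr)     && \text{input and output shapes already match} \\
Y &= \mathrm{ReLU}\bigl(\mathcal{F}(X) + W_s X\bigr) && \text{1x1 convolution } W_s \text{ reshapes the shortcut}
\end{aligned}
```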


In [17]:
import time
import torch
from torch import nn, optim
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

In [18]:
# Residual block
class Residual(nn.Module):
    def __init__(self, in_channels, out_channels, use_1x1conv=False, stride=1):
        super(Residual, self).__init__()
        # Main path: two 3x3 convolutions; only the first one may use a stride other than 1
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1, stride=stride)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)
        # Shortcut path: optional 1x1 convolution that matches channels and spatial size
        if use_1x1conv:
            self.conv3 = nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride)
        else:
            self.conv3 = None
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.bn2 = nn.BatchNorm2d(out_channels)

    def forward(self, X):
        Y = F.relu(self.bn1(self.conv1(X)))
        Y = self.bn2(self.conv2(Y))
        if self.conv3 is not None:
            X = self.conv3(X)
        # Add the shortcut before the final ReLU
        return F.relu(Y + X)

In [19]:
blk = Residual(3, 3)
X = torch.rand((4, 3, 6, 6))
blk(X).shape # torch.Size([4, 3, 6, 6])

torch.Size([4, 3, 6, 6])

In [20]:
blk = Residual(3, 6, use_1x1conv=True, stride=2) # halves the output height and width
blk(X).shape

torch.Size([4, 6, 3, 3])
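
The `[4, 6, 3, 3]` result can be checked by hand with the usual convolution output-size formula, `floor((n + 2p - k) / s) + 1`. The sketch below is plain Python; the helper names `conv_out_size` and `residual_branch_sizes` are made up for this illustration. It confirms that the main path and the 1x1 shortcut of a stride-2 residual block end up with the same spatial size, which is what makes the addition valid.

```python
def conv_out_size(n, kernel_size, stride=1, padding=0):
    """Output height/width of a convolution applied to an input of size n."""
    return (n + 2 * padding - kernel_size) // stride + 1


def residual_branch_sizes(n, stride):
    """Spatial size produced by the main path and by the 1x1 shortcut."""
    # Main path: 3x3 conv (padding 1, given stride), then 3x3 conv (padding 1, stride 1)
    main = conv_out_size(conv_out_size(n, 3, stride, 1), 3, 1, 1)
    # Shortcut path: 1x1 conv with the same stride and no padding
    shortcut = conv_out_size(n, 1, stride, 0)
    return main, shortcut


print(residual_branch_sizes(6, stride=2))  # (3, 3)

# Both branches agree for every input size tried here, odd or even
assert all(len(set(residual_branch_sizes(n, 2))) == 1 for n in range(1, 65))
```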

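## 2. ResNet model

The first two layers are the same as in GoogLeNet, introduced earlier: a 7x7 ``convolutional layer`` with 64 output channels and stride 2, followed by a 3x3 ``max pooling layer`` with stride 2.  
The difference is that ResNet adds a ``batch normalization layer after each convolutional layer``.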

In [21]:
net = nn.Sequential(
    nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3),
    nn.BatchNorm2d(64),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
)

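GoogLeNet then stacks 4 modules made of Inception blocks.  
ResNet instead uses ``4 modules`` made of residual blocks, and ``each module`` uses ``several`` ``residual blocks`` with the same number of output channels.   

The first module keeps the number of channels equal to the number of input channels. Because a max pooling layer with stride 2 has already been applied, there is no need to reduce the height and width. Each later module doubles the previous module's number of channels in its first residual block and halves the height and width.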

In [22]:
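# A module made of residual blocks; the first module needs special handling
def resnet_block(in_channels, out_channels, num_residuals, first_block=False):
    if first_block:
        assert in_channels == out_channels # the first module keeps the number of channels equal to its input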
    blk = []
    for i in range(num_residuals):
        if i==0 and not first_block:
            blk.append(Residual(in_channels, out_channels, use_1x1conv=True, stride=2))
        else:
            blk.append(Residual(out_channels, out_channels))
    return nn.Sequential(*blk)

In [23]:
# Add all residual blocks to ResNet. Each module here uses two residual blocks.
net.add_module("resnet_block1", resnet_block(64, 64, 2, first_block=True))
net.add_module("resnet_block2", resnet_block(64, 128, 2))
net.add_module("resnet_block3", resnet_block(128, 256, 2))
net.add_module("resnet_block4", resnet_block(256, 512, 2))
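
The doubling and halving rule can be traced without building the network. The following is a minimal sketch in plain Python; `stage_shapes` and its parameters are hypothetical names chosen for this illustration, and the defaults assume a 224x224 input, which leaves a 64-channel 56x56 feature map after the stem.

```python
def stage_shapes(stem_channels=64, stem_size=56, num_stages=4):
    """Return (channels, height/width) at the output of each residual module."""
    shapes = []
    channels, size = stem_channels, stem_size
    for stage in range(num_stages):
        if stage > 0:
            channels *= 2               # the first residual block doubles the channels
            size = (size - 1) // 2 + 1  # its stride-2 convolution halves the size, rounding up
        shapes.append((channels, size))
    return shapes


print(stage_shapes())             # [(64, 56), (128, 28), (256, 14), (512, 7)]
print(stage_shapes(stem_size=7))  # [(64, 7), (128, 4), (256, 2), (512, 1)]
```

The second call corresponds to an unresized 28x28 image, which the stem reduces to 7x7. The last module then already outputs a 1x1 feature map.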

In [24]:
# Global average pooling
class GlobalAvgPool2d(nn.Module):
    def __init__(self):
        super(GlobalAvgPool2d, self).__init__()
    def forward(self, x):
        return F.avg_pool2d(x, kernel_size=x.size()[2:])

In [25]:
# Add the global average pooling layer, followed by the fully connected output layer.
net.add_module("global_avg_pool", GlobalAvgPool2d())
net.add_module("fc", nn.Sequential(nn.Flatten(), nn.Linear(512, 10)))
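
As a sanity check on the pooling step, the sketch below compares three ways of averaging over the full feature map: the `F.avg_pool2d` call used in `GlobalAvgPool2d`, PyTorch's built-in `nn.AdaptiveAvgPool2d`, and a plain mean over the spatial dimensions. The tensor shape is an arbitrary example that mirrors the 512-channel 7x7 output of the last module.

```python
import torch
from torch import nn
import torch.nn.functional as F

x = torch.rand(2, 512, 7, 7)

# Pooling window equal to the whole feature map, as in GlobalAvgPool2d
pooled = F.avg_pool2d(x, kernel_size=x.size()[2:])
# Built-in layer that pools any input down to a fixed 1x1 output
adaptive = nn.AdaptiveAvgPool2d((1, 1))(x)
# Plain mean over the two spatial dimensions
mean = x.mean(dim=(2, 3), keepdim=True)

print(pooled.shape)  # torch.Size([2, 512, 1, 1])
print(torch.allclose(pooled, adaptive), torch.allclose(pooled, mean))
```

If the three results agree, `nn.AdaptiveAvgPool2d((1, 1))` could stand in for the custom class. That is a design option, not something the original notes use.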

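Each module (resnet_block) contains 4 convolutional layers (not counting the 1x1 convolutional layers). Together with the first convolutional layer and the final fully connected layer, this gives 4 x 4 + 2 = 18 layers. This model is commonly called ``ResNet-18``.  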
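
The layer count can be reproduced with a few lines of arithmetic. This is a small sketch; the function name `count_weighted_layers` is invented for this example.

```python
def count_weighted_layers(residuals_per_module):
    """Count conv and fully connected layers, ignoring 1x1 shortcut convolutions."""
    convs_per_residual = 2  # each residual block has two 3x3 convolutions
    stem_conv = 1           # the initial 7x7 convolution
    fc = 1                  # the final fully connected layer
    return stem_conv + convs_per_residual * sum(residuals_per_module) + fc


print(count_weighted_layers([2, 2, 2, 2]))  # 18
```

With `[3, 4, 6, 3]` residual blocks per module, the same count gives 34, which is where the name ResNet-34 comes from.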

In [26]:
# Observe how the input shape changes across the different modules of ResNet
X = torch.rand((1, 1, 224, 224))
for name, layer in net.named_children():
    X = layer(X)
    print(name, 'output shape:\t', X.shape)

0 output shape:	 torch.Size([1, 64, 112, 112])
1 output shape:	 torch.Size([1, 64, 112, 112])
2 output shape:	 torch.Size([1, 64, 112, 112])
3 output shape:	 torch.Size([1, 64, 56, 56])
resnet_block1 output shape:	 torch.Size([1, 64, 56, 56])
resnet_block2 output shape:	 torch.Size([1, 128, 28, 28])
resnet_block3 output shape:	 torch.Size([1, 256, 14, 14])
resnet_block4 output shape:	 torch.Size([1, 512, 7, 7])
global_avg_pool output shape:	 torch.Size([1, 512, 1, 1])
fc output shape:	 torch.Size([1, 10])


## 3. Loading data and training the model

In [28]:
def train_ch5(net, train_iter, test_iter, batch_size, optimizer, device, num_epochs):
    net = net.to(device)
    print('train on', device)
    loss = torch.nn.CrossEntropyLoss()
    for epoch in range(num_epochs):
        train_l_sum, train_acc_sum, n, batch_count, start = 0.0, 0.0, 0, 0, time.time()
        for X, y in train_iter:
            X = X.to(device)
            y = y.to(device)
            # y has shape [batch_size]
            y_hat = net(X)
            l = loss(y_hat, y)
            optimizer.zero_grad()
            l.backward()
            optimizer.step()
            train_l_sum += l.item() # .item() copies the scalar loss to the CPU as a Python float
            train_acc_sum += (y_hat.argmax(dim=1) == y).sum().item()
            n += y.shape[0]
            batch_count += 1

        net.eval() # evaluation mode: batch normalization uses its running statistics
        with torch.no_grad():
            test_acc_sum, n_test = 0.0, 0 # plain Python numbers, kept on the CPU
            for X_test, y_test in test_iter:
                # .item() turns the one-element tensor into a Python scalar
                test_acc_sum += (net(X_test.to(device)).argmax(dim=1) == y_test.to(device)).sum().item()
                n_test += y_test.shape[0]
            test_acc = test_acc_sum / n_test
        net.train() # back to training mode for the next epoch

        print('epoch %d, loss %.4f, train acc %.3f, test acc %.3f, time %.1f sec'
        % (epoch + 1, train_l_sum / batch_count, train_acc_sum / n, test_acc, time.time() - start))

In [29]:
mnist_train = torchvision.datasets.FashionMNIST(root='~/Datasets/FashionMNIST', train=True, download=True, transform=transforms.ToTensor())
mnist_test = torchvision.datasets.FashionMNIST(root='~/Datasets/FashionMNIST', train=False, download=True, transform=transforms.ToTensor())

batch_size = 256

train_iter = torch.utils.data.DataLoader(mnist_train, batch_size=batch_size, shuffle=True)
test_iter = torch.utils.data.DataLoader(mnist_test, batch_size=batch_size, shuffle=False)
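
For orientation, the number of batches per epoch follows directly from the dataset sizes (60,000 training and 10,000 test images in Fashion-MNIST) and the batch size. The helper below is a hypothetical illustration in plain Python. It assumes the `DataLoader` default of keeping the final, smaller batch (`drop_last=False`).

```python
import math


def batches_per_epoch(num_examples, batch_size):
    """Number of batches when the last, smaller batch is kept."""
    return math.ceil(num_examples / batch_size)


print(batches_per_epoch(60000, 256))  # 235
print(batches_per_epoch(10000, 256))  # 40
```

The last training batch therefore holds only 60000 - 234 * 256 = 96 images. The reported loss, `train_l_sum / batch_count`, is an average of per-batch means, so that batch counts as much as a full one.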

In [30]:
lr, num_epochs = 0.001, 5
optimizer = torch.optim.Adam(net.parameters(), lr=lr)
train_ch5(net, train_iter, test_iter, batch_size, optimizer, device, num_epochs)

train on cuda
epoch 1, loss 0.4422, train acc 0.837, test acc 0.831, time 14.0 sec
epoch 2, loss 0.3016, train acc 0.889, test acc 0.873, time 13.9 sec
epoch 3, loss 0.2622, train acc 0.902, test acc 0.883, time 13.9 sec
epoch 4, loss 0.2374, train acc 0.912, test acc 0.895, time 14.0 sec
epoch 5, loss 0.2170, train acc 0.918, test acc 0.894, time 14.1 sec
