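ResNet follows VGG's design of using 3\*3 convolutional layers throughout.  
A residual block starts with two 3\*3 convolutional layers that have the same number of output channels. Each convolutional layer is followed by a batch normalization layer, and the first is also followed by a ReLU activation.  
The input then skips these two convolutions and is added to their output just before the final ReLU.  
This design requires the output of the two convolutional layers to have the same shape as the input, so that the two can be added. When the shapes differ, the shortcut is adjusted:  
1. Different number of channels -> use a 1\*1 convolutional layer on the shortcut to change it.  
2. Different spatial size -> introduce a linear mapping in the same way (here the same 1\*1 convolution, given a matching stride).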
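
The two cases above can be illustrated in isolation. The following is a minimal sketch, not the block used later in this notebook: the layer names `main` and `shortcut` are chosen here for illustration, and the tensor sizes are arbitrary.

```python
import torch
from torch import nn

x = torch.rand(4, 3, 6, 6)  # (batch, channels, height, width)

# Main path: a stride-2 3x3 convolution doubles the channels and halves H and W.
main = nn.Conv2d(3, 6, kernel_size=3, stride=2, padding=1)
# Shortcut: a 1x1 convolution with the same stride and output channels.
shortcut = nn.Conv2d(3, 6, kernel_size=1, stride=2)

y = main(x)
print(y.shape)            # shape produced by the main path
print(shortcut(x).shape)  # shape produced by the shortcut

# Without the projection the addition fails, because shapes (4, 6, 3, 3)
# and (4, 3, 6, 6) cannot be broadcast together.
try:
    y + x
    add_failed = False
except RuntimeError:
    add_failed = True
print(add_failed)

out = y + shortcut(x)     # shapes agree, so the sum is well defined
```

A single 1\*1 convolution handles both mismatches at once: its output channel count fixes the channels, and its stride fixes the spatial size.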

In [9]:
import torch
from torch import nn
import torch.nn.functional as f

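class Residual(nn.Module):
    """Residual block: two 3x3 convolutions plus a shortcut connection."""
    def __init__(self,inChannels,outChannels,use_1x1conv=False,stride=1):
        super().__init__()
        # First 3x3 convolution (may downsample via stride), then BN and ReLU
        self.conv1=nn.Sequential(
            nn.Conv2d(inChannels,outChannels,3,stride,1),
            nn.BatchNorm2d(outChannels),
            nn.ReLU()
        )
        # Second 3x3 convolution and BN; its ReLU is applied after the addition
        self.conv2=nn.Sequential(
            nn.Conv2d(outChannels,outChannels,3,padding=1),
            nn.BatchNorm2d(outChannels)
        )
        # Optional 1x1 convolution so the shortcut matches the main path's shape
        if use_1x1conv:
            self.conv3=nn.Conv2d(inChannels,outChannels,1,stride=stride)
        else:
            self.conv3=None
    def forward(self,input):
        outputConv1=self.conv1(input)
        outputConv2=self.conv2(outputConv1)
        if self.conv3 is not None:
            input=self.conv3(input)
        # Add the shortcut, then apply the final ReLU
        return f.relu(outputConv2+input)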

In [10]:
blk = Residual(3, 3)
X = torch.rand((4, 3, 6, 6))
blk(X).shape # torch.Size([4, 3, 6, 6])

torch.Size([4, 3, 6, 6])

In [11]:
blk = Residual(3, 6, use_1x1conv=True, stride=2)
blk(X).shape

torch.Size([4, 6, 3, 3])
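
The two output shapes above follow from the standard output-size formula for a convolution without dilation. As a worked check, with input size $n$, kernel size $k$, padding $p$ and stride $s$ (symbols introduced here for illustration, not taken from the code):

$$
n_{\text{out}} = \left\lfloor \frac{n + 2p - k}{s} \right\rfloor + 1
$$

For the 3\*3 convolution with $p = 1$ on a $6 \times 6$ input:

$$
s = 1:\ \left\lfloor \frac{6 + 2 - 3}{1} \right\rfloor + 1 = 6,
\qquad
s = 2:\ \left\lfloor \frac{6 + 2 - 3}{2} \right\rfloor + 1 = 3
$$

For the 1\*1 shortcut convolution with $p = 0$ and $s = 2$:

$$
\left\lfloor \frac{6 + 0 - 1}{2} \right\rfloor + 1 = 3
$$

so the main path and the shortcut agree on a $3 \times 3$ output.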

# ResNet model

In [12]:
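net=nn.Sequential(
    nn.Conv2d(1,64,7,2,3), # stem: 7x7 convolution, stride 2, padding 3, 1 input channel (grayscale)
    nn.BatchNorm2d(64),
    nn.ReLU(),
    nn.MaxPool2d(3, 2, 1) # 3x3 max pooling, stride 2, padding 1
)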

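GoogLeNet follows its stem with four modules made up of Inception blocks.  
ResNet instead uses four modules made up of residual blocks, and every residual block within a module has the same number of output channels.  
The first module keeps the number of channels equal to its number of input channels.  
Because a max-pooling layer with stride 2 has already been applied, the first module does not need to reduce the height and width. Each later module doubles the number of channels of the previous module in its first residual block and halves the height and width.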
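
The doubling and halving rule can be written out as a small schedule. This is only a sketch: `stage_schedule` is a hypothetical helper, and the starting values of 64 channels and a 56\*56 feature map assume a 224\*224 input to the stem.

```python
def stage_schedule(channels, size, num_stages=4):
    """Return (channels, height/width) at the output of each module."""
    schedule = []
    for stage in range(num_stages):
        if stage > 0:
            channels *= 2            # later modules double the channels
            size = (size + 1) // 2   # stride-2 3x3 conv with padding 1 rounds up
        schedule.append((channels, size))
    return schedule

schedule = stage_schedule(64, 56)
print(schedule)
# [(64, 56), (128, 28), (256, 14), (512, 7)]
```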

In [13]:
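def resnetBlock(inChannels,outChannels,numResiduals,first_block=False):
    """Stack numResiduals residual blocks into one module."""
    if first_block :
        assert inChannels == outChannels # the first module keeps the channel count
    blk = []
    for i in range(numResiduals):
        if i==0 and not first_block:
            # First block of a later module: change channels and halve height and width
            blk.append(Residual(inChannels, outChannels, use_1x1conv=True, stride=2))
        else:
            blk.append(Residual(outChannels, outChannels))
    return nn.Sequential(*blk)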

In [14]:
net.add_module("resnet_block1", resnetBlock(64, 64, 2, first_block=True))
net.add_module("resnet_block2", resnetBlock(64, 128, 2))
net.add_module("resnet_block3", resnetBlock(128, 256, 2))
net.add_module("resnet_block4", resnetBlock(256, 512, 2))
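
With two residual blocks per module, this layout corresponds to what is commonly called ResNet-18. The count below is a quick arithmetic check, assuming the usual convention that the 1\*1 shortcut convolutions are not counted and that the final fully connected layer is included. The variable names are illustrative only.

```python
blocks_per_stage = [2, 2, 2, 2]   # numResiduals passed to each resnetBlock call
convs_per_block = 2               # the two 3x3 convolutions in each Residual

stem_convs = 1                    # the 7x7 convolution in the stem
fc_layers = 1                     # the fully connected output layer
depth = stem_convs + convs_per_block * sum(blocks_per_stage) + fc_layers
print(depth)  # 18
```

Changing the entries of `blocks_per_stage` gives deeper variants built from the same residual block.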

As in GoogLeNet, a global average pooling layer is added, followed by a fully connected layer that produces the output.

In [15]:
class GlobalAvgPool2d(nn.Module):
    def __init__(self):
        super().__init__()
    def forward(self,x):
        return torch.nn.functional.adaptive_avg_pool2d(x, (1,1))
net.add_module("global_avg_pool", GlobalAvgPool2d()) # output of GlobalAvgPool2d: (Batch, 512, 1, 1)
net.add_module("fc", nn.Sequential(nn.Flatten(), nn.Linear(512, 10))) # 10 output classes
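
Global average pooling simply averages each channel over its whole height and width. The sketch below checks this on a random tensor whose shape is chosen to match the last module's output. PyTorch also provides the built-in `nn.AdaptiveAvgPool2d((1, 1))`, which could be used in place of a custom module.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.rand(2, 512, 7, 7)  # (batch, channels, height, width)

pooled = F.adaptive_avg_pool2d(x, (1, 1))
manual = x.mean(dim=(2, 3), keepdim=True)  # average over height and width

print(pooled.shape)
print(torch.allclose(pooled, manual, atol=1e-6))
```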

Observe how the shape of the input changes as it passes through each layer

In [16]:
X = torch.rand((1, 1, 224, 224))
for name, layer in net.named_children():
    X = layer(X)
    print(name, ' output shape:\t', X.shape)

0  output shape:	 torch.Size([1, 64, 112, 112])
1  output shape:	 torch.Size([1, 64, 112, 112])
2  output shape:	 torch.Size([1, 64, 112, 112])
3  output shape:	 torch.Size([1, 64, 56, 56])
resnet_block1  output shape:	 torch.Size([1, 64, 56, 56])
resnet_block2  output shape:	 torch.Size([1, 128, 28, 28])
resnet_block3  output shape:	 torch.Size([1, 256, 14, 14])
resnet_block4  output shape:	 torch.Size([1, 512, 7, 7])
global_avg_pool  output shape:	 torch.Size([1, 512, 1, 1])
fc  output shape:	 torch.Size([1, 10])


Load the data

In [17]:
import torchvision
from torchvision import transforms
# Download the dataset
mnist_train = torchvision.datasets.FashionMNIST(root='~/Datasets/FashionMNIST', train=True, download=True, transform=transforms.ToTensor())
mnist_test = torchvision.datasets.FashionMNIST(root='~/Datasets/FashionMNIST', train=False, download=True, transform=transforms.ToTensor())
# Wrap the dataset in data loaders
batchSize=256
trainIter=torch.utils.data.DataLoader(mnist_train,batch_size=batchSize,shuffle=True,num_workers=8)
testIter=torch.utils.data.DataLoader(mnist_test,batch_size=batchSize,shuffle=True,num_workers=8)
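
FashionMNIST images are 28\*28 grayscale, and the loaders above apply only `ToTensor()` with no resizing, so during training the network sees 28\*28 inputs rather than the 224\*224 tensor used in the shape check. The sketch below traces the height and width through the network under that assumption. `out_size` is a hypothetical helper, and only the layers that can change the spatial size are listed.

```python
def out_size(n, kernel, stride, padding):
    """Output height/width of a convolution or pooling layer without dilation."""
    return (n + 2 * padding - kernel) // stride + 1

# (name, kernel, stride, padding) of the one layer per stage that sets the size;
# the remaining 3x3 convolutions use stride 1 and padding 1 and keep it unchanged.
layers = [
    ("stem conv 7x7", 7, 2, 3),
    ("max pool 3x3", 3, 2, 1),
    ("resnet_block1", 3, 1, 1),
    ("resnet_block2", 3, 2, 1),
    ("resnet_block3", 3, 2, 1),
    ("resnet_block4", 3, 2, 1),
]

size = 28
trace = []
for name, k, s, p in layers:
    size = out_size(size, k, s, p)
    trace.append(size)
    print(name, size)
# sizes in order: 14, 7, 7, 4, 2, 1
```

The last module already outputs a 1\*1 feature map in this setting, so the global average pooling layer has nothing left to average spatially. The network still runs because adaptive pooling accepts any input size.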

Evaluation

In [18]:
def evaluate_accuracy(data_iter, net):
    device=next(net.parameters()).device # evaluate on whichever device the model is on
    acc_sum, n = 0.0, 0
    with torch.no_grad():
        net.eval() # evaluation mode: BatchNorm uses its running statistics
        for X, y in data_iter:
            acc_sum += (net(X.to(device)).argmax(dim=1) == y.to(device)).float().sum().item()
            n += y.shape[0]
        net.train() # switch back to training mode
    return acc_sum / n
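
The core of the accuracy computation is the comparison between `argmax` predictions and labels. A minimal sketch on made-up logits and labels (both hypothetical, chosen only to make the result easy to follow):

```python
import torch

# Hypothetical logits for 4 samples and 3 classes, with made-up labels.
logits = torch.tensor([[2.0, 0.1, 0.3],
                       [0.2, 1.5, 0.1],
                       [0.3, 0.2, 0.9],
                       [1.2, 0.4, 0.8]])
labels = torch.tensor([0, 1, 1, 0])

predictions = logits.argmax(dim=1)   # index of the largest logit in each row
correct = (predictions == labels).float().sum().item()
accuracy = correct / labels.shape[0]
print(predictions.tolist(), accuracy)  # [0, 1, 2, 0] 0.75
```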

Training

In [19]:
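lr,epochsNum = 0.001,10
net=net.cuda()
loss=torch.nn.CrossEntropyLoss() # default reduction='mean': returns the batch-average loss
optimizer=torch.optim.Adam(net.parameters(),lr)

for epoch in range(epochsNum):
    train_l_sum=n=train_acc_sum=0
    for x,y in trainIter:
        x=x.cuda()
        y=y.cuda()
        yP=net(x)
        l=loss(yP,y)
        l.backward()
        optimizer.step()
        optimizer.zero_grad() # clear gradients before the next batch
        train_l_sum += l.item()
        train_acc_sum += (yP.argmax(dim=1) == y).sum().item()
        n +=y.shape[0]
    # Prints: epoch, summed batch-mean losses divided by the sample count
    # (so about 1/batchSize of the per-sample loss), training accuracy, test accuracy
    print(epoch,train_l_sum/n,train_acc_sum/n,evaluate_accuracy(testIter,net))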

0 0.0017218314222991467 0.8382666666666667 0.8006
1 0.0011902463374038538 0.88765 0.8732
2 0.001012880956629912 0.9041833333333333 0.8889
3 0.0009097160746653875 0.9146666666666666 0.9003
4 0.0008314816568046809 0.92055 0.8758
5 0.0007617413983990748 0.9264 0.8972
6 0.0007065044568230709 0.9327833333333333 0.8932
7 0.0006406198677917322 0.93815 0.8954
8 0.00059957183363537 0.9425833333333333 0.9017
9 0.0005423044926176468 0.9472833333333334 0.903
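
The loss column in the log is small because `CrossEntropyLoss` returns the batch average by default, and the training loop sums those averages and then divides by the number of samples. The sketch below first checks the mean/sum relationship on made-up logits, then rescales the first logged value to an approximate per-sample loss. It assumes the 60000-image FashionMNIST training set and batches of 256, and the result is only approximate because the last batch is smaller than the others.

```python
import math
import torch
from torch import nn

torch.manual_seed(0)
logits = torch.randn(10, 3)            # made-up logits: 10 samples, 3 classes
targets = torch.randint(0, 3, (10,))   # made-up labels

mean_loss = nn.CrossEntropyLoss()(logits, targets)                 # batch average
sum_loss = nn.CrossEntropyLoss(reduction='sum')(logits, targets)   # batch total
print(torch.isclose(mean_loss * 10, sum_loss).item())

# Rescale the epoch-0 value from the log to an average per-sample loss.
num_samples, batch_size = 60000, 256
num_batches = math.ceil(num_samples / batch_size)
logged = 0.0017218314222991467
approx_mean_loss = logged * num_samples / num_batches
print(num_batches, round(approx_mean_loss, 2))  # 235 0.44
```

To log a per-sample average directly, one option is to accumulate `l.item() * y.shape[0]` in the training loop instead of `l.item()`.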
