# ResNet
ResNet alleviates the vanishing-gradient problem in backpropagation by introducing cross-layer (skip) connections.

![](https://ws1.sinaimg.cn/large/006tNc79ly1fmptq2snv9j30j808t74a.jpg)

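The figure above compares an ordinary layer-by-layer connection with a cross-layer residual connection. With ordinary connections, the gradient from the upper layers has to be passed back one layer at a time. With a residual connection there is effectively a shorter path through the block, and the gradient can flow back along it, which keeps the gradient from becoming too small.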
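As a rough numerical illustration of this argument (a toy scalar model, not the network built later in this section), assume every layer has the same small local derivative `w`. In a plain chain the backpropagated gradient is the product of these derivatives, while with identity shortcuts each factor becomes `1 + w`. The function names and the values `w = 0.1` and `depth = 10` are chosen here only for illustration.

```python
def plain_chain_gradient(w, depth):
    """Gradient through `depth` plain layers, each with local derivative w."""
    grad = 1.0
    for _ in range(depth):
        grad *= w
    return grad


def residual_chain_gradient(w, depth):
    """Gradient through `depth` residual layers y = x + f(x), with f'(x) = w."""
    grad = 1.0
    for _ in range(depth):
        grad *= 1.0 + w  # the identity shortcut contributes the constant 1
    return grad


w, depth = 0.1, 10
print("plain:    {:.3e}".format(plain_chain_gradient(w, depth)))
print("residual: {:.3e}".format(residual_chain_gradient(w, depth)))
```

With these values the plain chain shrinks the gradient to roughly 1e-10, while the residual chain keeps it above 1.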

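Suppose the input to a layer is x and the desired output is H(x). If we pass the input x straight through to the output as an initial result, we get a shallower network, which is easier to train. The part this shallow network has not learned can then be fitted by the deeper layers F(x), so what those layers need to fit is F(x) = H(x) - x. The block outputs F(x) + x, and this is the residual structure.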
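Written out, with $y$ denoting the block output (a symbol introduced here for convenience) and ignoring the final ReLU that the implementation in this section applies after the addition, the residual relation and the gradient it implies are:

$$
y = F(x) + x, \qquad F(x) = H(x) - x
$$

$$
\frac{\partial y}{\partial x} = \frac{\partial F(x)}{\partial x} + I
$$

The identity term $I$ is the shorter path for the gradient: even when $\partial F(x) / \partial x$ is small, the gradient that reaches $x$ does not vanish.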

A residual network is a stack of residual blocks like the one above. Let's implement a residual block.

In [1]:
import numpy as np
import torch
from torch import nn
from torchvision.datasets import CIFAR10
import torch.nn.functional as F

In [2]:
def conv_3x3(in_channels, out_channels, stride = 1):
    return nn.Conv2d(in_channels, out_channels, 3, stride = stride, padding = 1, bias = False)

In [3]:
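class residual_block(nn.Module):
    def __init__(self, in_channels, out_channels, same_shape = True):
        super(residual_block, self).__init__()
        self.same_shape = same_shape
        # a stride of 2 halves the spatial size when the shape changes
        stride = 1 if self.same_shape else 2

        self.conv1 = conv_3x3(in_channels, out_channels, stride = stride)
        self.bn1 = nn.BatchNorm2d(out_channels)

        self.conv2 = conv_3x3(out_channels, out_channels)
        self.bn2 = nn.BatchNorm2d(out_channels)

        if not self.same_shape:
            # 1x1 convolution on the shortcut so that x matches the shape of out
            self.conv3 = nn.Conv2d(in_channels, out_channels, 1, stride = stride)

    def forward(self, x):
        # main path F(x): conv -> BN -> ReLU, twice
        out = self.conv1(x)
        out = F.relu(self.bn1(out), inplace = True)
        out = self.conv2(out)
        # note: ReLU is applied here as well as after the addition;
        # the original ResNet block applies it only after the addition
        out = F.relu(self.bn2(out), inplace = True)

        if not self.same_shape:
            x = self.conv3(x)
        # shortcut: add the (possibly projected) input, then apply ReLU
        return F.relu(x + out, True)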

Let's test the output of a residual_block.

In [4]:
# input and output have the same shape
test_net = residual_block(32, 32)
test_x = torch.zeros(1, 32, 96, 96)
print("input:{}".format(test_x.shape))
test_y = test_net(test_x)
print("output:{}".format(test_y.shape))

input:torch.Size([1, 32, 96, 96])
output:torch.Size([1, 32, 96, 96])


In [5]:
# input and output have different shapes
test_net = residual_block(32, 32, False)
test_x = torch.zeros(1, 32, 96, 96)
print('input:{}'.format(test_x.shape))
test_y = test_net(test_x)
print('output:{}'.format(test_y.shape))

input:torch.Size([1, 32, 96, 96])
output:torch.Size([1, 32, 48, 48])


Now let's implement a ResNet, which is simply a stack of residual_block modules.

In [6]:
class ResNet(nn.Module):
    def __init__(self,in_channels, num_classes, verbose = False):
        super(ResNet, self).__init__()
        self.verbose = verbose
        
        self.block1 = nn.Conv2d(in_channels, 64, 7, 2)
        
        self.block2 = nn.Sequential(
                    nn.MaxPool2d(3, 2),
                    residual_block(64, 64),
                    residual_block(64, 64))
        
        self.block3 = nn.Sequential(
                    residual_block(64, 128, False),
                    residual_block(128, 128))
        
        self.block4 = nn.Sequential(
                    residual_block(128, 256, False),
                    residual_block(256, 256))
        
        self.block5 = nn.Sequential(
                    residual_block(256, 512, False),
                    residual_block(512, 512),
                    nn.AvgPool2d(3))
        
        self.classifier = nn.Linear(512, num_classes)
    
    def forward(self, x):
        x = self.block1(x)
        if self.verbose:
            print('block1 output:{}'.format(x.shape))
        x = self.block2(x)
        if self.verbose:
            print('block2 output:{}'.format(x.shape))
        x = self.block3(x)
        if self.verbose:
            print('block3 output:{}'.format(x.shape))
        x = self.block4(x)
        if self.verbose:
            print('block4 output:{}'.format(x.shape))
        x = self.block5(x)
        if self.verbose:
            print('block5 output:{}'.format(x.shape))
        x = x.view(x.shape[0], -1)
        x = self.classifier(x)
        return x
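One thing to keep in mind about the network above: `nn.AvgPool2d(3)` reduces the feature map to 1x1 only when block5 produces a 3x3 map before pooling, which is the case for 96x96 inputs. By the same arithmetic, a 224x224 input would give a 7x7 map, and the flattened features would no longer match `nn.Linear(512, num_classes)`. A possible variation, not part of the original code, is `nn.AdaptiveAvgPool2d(1)`, which pools to 1x1 regardless of the spatial size. The tensors below are dummy feature maps used only to show the shapes.

```python
import torch
from torch import nn

fixed_pool = nn.AvgPool2d(3)             # as used in block5 above
adaptive_pool = nn.AdaptiveAvgPool2d(1)  # hypothetical replacement, not in the original

for size in (3, 7):
    feat = torch.zeros(1, 512, size, size)  # dummy feature map before pooling
    print(size, tuple(fixed_pool(feat).shape), tuple(adaptive_pool(feat).shape))
```

For the 3x3 map both layers return a 1x1 result. For the 7x7 map only the adaptive layer does.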

Print the output size after each block.

In [7]:
test_net = ResNet(3, 10, True)
test_x = torch.zeros(1, 3, 96, 96)
test_y = test_net(test_x)
print('output:{}'.format(test_y.shape))

block1 output:torch.Size([1, 64, 45, 45])
block2 output:torch.Size([1, 64, 22, 22])
block3 output:torch.Size([1, 128, 11, 11])
block4 output:torch.Size([1, 256, 6, 6])
block5 output:torch.Size([1, 512, 1, 1])
output:torch.Size([1, 10])
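The spatial sizes printed above follow from the usual output-size formula for convolution and pooling layers, `floor((size + 2 * padding - kernel) / stride) + 1`. The sketch below traces a 96x96 input through the kernel, stride, and padding values used in the `ResNet` class. The helper names `conv_out` and `resnet_sizes` are introduced here for illustration and are not part of the original code.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1


def resnet_sizes(size):
    """Trace the spatial size through the five blocks of the ResNet above."""
    sizes = {}
    size = conv_out(size, 7, 2)                 # block1: 7x7 conv, stride 2, no padding
    sizes['block1'] = size
    size = conv_out(size, 3, 2)                 # block2: 3x3 max pool, stride 2
    sizes['block2'] = size                      # its stride-1 residual blocks keep the size
    for name in ('block3', 'block4', 'block5'):
        size = conv_out(size, 3, 2, padding=1)  # first residual block of the stage downsamples
        sizes[name] = size
    sizes['block5'] = conv_out(size, 3, 3)      # block5 ends with a 3x3 average pool, stride 3
    return sizes


print(resnet_sizes(96))
```

The 1x1 shortcut convolution with stride 2 gives the same size as the main path, for example `conv_out(22, 1, 2) == conv_out(22, 3, 2, padding=1)`, which is why the addition in `residual_block` is shape-compatible.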


In [8]:
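from utils import train  # local helper module, not shown in this section

def data_tf(x):
    x = x.resize((96, 96), 2)  # upscale the image to 96x96 (2 = PIL bilinear resampling)
    x = np.array(x, dtype = 'float32') / 255  # scale pixels to [0, 1]
    x = (x - 0.5) / 0.5  # normalize to [-1, 1]
    x = x.transpose((2, 0, 1))  # HWC -> CHW, the layout PyTorch expects
    x = torch.from_numpy(x)
    return x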

train_set = CIFAR10('./data', train = True, transform = data_tf, download = True)
train_data = torch.utils.data.DataLoader(train_set, batch_size = 64, shuffle = True)
test_set = CIFAR10('./data', train = False, transform = data_tf, download = True)
test_data = torch.utils.data.DataLoader(test_set, batch_size = 128, shuffle = False)

net = ResNet(3, 10)
optimizer = torch.optim.Adam(net.parameters(), lr = 0.01)
criterion = nn.CrossEntropyLoss()

Files already downloaded and verified
Files already downloaded and verified
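The normalization steps in `data_tf` can be checked in isolation. The sketch below skips the PIL resize and the conversion to a tensor, and applies the remaining steps to a small synthetic array. The 4x4 image and its pixel values are made up for illustration.

```python
import numpy as np

# Synthetic 4x4 RGB image in HWC layout with uint8 pixel values
img = np.zeros((4, 4, 3), dtype='uint8')
img[..., 0] = 255   # red channel at full intensity
img[..., 1] = 128   # green channel near mid-range; blue stays 0

x = np.array(img, dtype='float32') / 255   # scale to [0, 1]
x = (x - 0.5) / 0.5                        # shift and scale to [-1, 1]
x = x.transpose((2, 0, 1))                 # HWC -> CHW

print(x.shape)      # channels first
print(x[:, 0, 0])   # per-channel value of the top-left pixel
```

A pixel value of 255 maps to 1, a value of 0 maps to -1, and the channel axis moves to the front.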


In [9]:
train(net, train_data, test_data, 20, optimizer, criterion)


Epoch 0. Train Loss: 1.634481, Train Acc: 0.398178, Valid Loss: 1.518368, Valid Acc: 0.462816, Time 00:02:06
Epoch 1. Train Loss: 1.195239, Train Acc: 0.571451, Valid Loss: 1.334072, Valid Acc: 0.536689, Time 00:02:20
Epoch 2. Train Loss: 1.026437, Train Acc: 0.633672, Valid Loss: 1.127485, Valid Acc: 0.607397, Time 00:02:24
Epoch 3. Train Loss: 0.918508, Train Acc: 0.674732, Valid Loss: 1.059035, Valid Acc: 0.624901, Time 00:02:21
Epoch 4. Train Loss: 0.824177, Train Acc: 0.707121, Valid Loss: 1.045619, Valid Acc: 0.651404, Time 00:02:16
Epoch 5. Train Loss: 0.747524, Train Acc: 0.736493, Valid Loss: 0.864958, Valid Acc: 0.698774, Time 00:02:18
Epoch 6. Train Loss: 0.671451, Train Acc: 0.762728, Valid Loss: 0.839617, Valid Acc: 0.715091, Time 00:02:20
Epoch 7. Train Loss: 0.614223, Train Acc: 0.783088, Valid Loss: 0.893341, Valid Acc: 0.702828, Time 00:02:15
Epoch 8. Train Loss: 0.549866, Train Acc: 0.803269, Valid Loss: 0.824935, Valid Acc: 0.727650, Time 00:02:08
Epoch 9. Train Loss