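How should we understand what ResNet does?
Put simply, ResNet adds a shortcut connection around each new block. With the shortcut in place, the function the network had already learned remains available even if the new layers contribute nothing, because the block can fall back to passing its input through. Seen another way, whatever the new layers do learn is added on top of the existing function, so adding layers does not force the whole function to be relearned, and what was learned before is not discarded.
The underlying principle is that each additional layer should more easily contain the original function as one of its elements.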
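
The principle above can be checked on a toy case. The sketch below is only an illustration: `TinyResidual` is a hypothetical, stripped-down block (one convolution, no batch norm or ReLU), not the `Residual` class used in this notebook. When the residual branch is zeroed out, the block returns its input unchanged, so the original function is still contained in the deeper model.

```python
import torch
from torch import nn


class TinyResidual(nn.Module):
    """Hypothetical minimal block: output = x + g(x)."""

    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        # Shortcut (x) plus the learned residual (conv(x)).
        return x + self.conv(x)


block = TinyResidual(3)
# Zero the residual branch so the new layer contributes nothing.
nn.init.zeros_(block.conv.weight)
nn.init.zeros_(block.conv.bias)

x = torch.rand(2, 3, 6, 6)
with torch.no_grad():
    y = block(x)

# The block now computes exactly the identity mapping.
identity_recovered = torch.equal(y, x)
print(identity_recovered)  # True
```

A plain stacked layer would have to learn the identity mapping through its weights; with the shortcut, the identity is what the block computes when the residual branch outputs zero.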

In [2]:
import torch
from torch import nn
from torchvision import datasets,transforms
from torch.utils.data import DataLoader
from torchvision.transforms import ToTensor
from torch.nn import functional as F
from torch import optim
import time
import util_fei

In [4]:
class Residual(nn.Module):
    def __init__(self, input_channels, num_channels,
                 use_1x1conv=False, strides=1):
        super().__init__()
        # Main branch: two 3x3 convolutions; only the first may downsample.
        self.conv1 = nn.Conv2d(input_channels, num_channels,
                               kernel_size=3, padding=1, stride=strides)
        self.conv2 = nn.Conv2d(num_channels, num_channels,
                               kernel_size=3, padding=1)
        # Optional 1x1 convolution on the shortcut so that X matches the
        # main branch's channel count and spatial size before the addition.
        if use_1x1conv:
            self.conv3 = nn.Conv2d(input_channels, num_channels,
                                   kernel_size=1, stride=strides)
        else:
            self.conv3 = None
        self.bn1 = nn.BatchNorm2d(num_channels)
        self.bn2 = nn.BatchNorm2d(num_channels)

    def forward(self, X):
        Y = F.relu(self.bn1(self.conv1(X)))
        Y = self.bn2(self.conv2(Y))
        if self.conv3 is not None:
            X = self.conv3(X)
        Y += X  # add the shortcut to the main branch
        return F.relu(Y)
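
The role of `conv3` is easiest to see by trying the addition without it. The sketch below is a small standalone illustration, not part of the notebook's model: `main` and `shortcut` are hypothetical names for layers configured like `conv1` and `conv3` in a block that changes the channel count and uses stride 2.

```python
import torch
from torch import nn

X = torch.rand(4, 3, 6, 6)

# Configured like conv1 in a downsampling block (3 -> 6 channels, stride 2).
main = nn.Conv2d(3, 6, kernel_size=3, padding=1, stride=2)
# Configured like conv3: a 1x1 convolution with the same stride.
shortcut = nn.Conv2d(3, 6, kernel_size=1, stride=2)

Y = main(X)  # shape (4, 6, 3, 3)

# Adding the raw input fails: (4, 6, 3, 3) and (4, 3, 6, 6) do not broadcast.
try:
    Y + X
    shapes_compatible = True
except RuntimeError:
    shapes_compatible = False

# After the 1x1 projection the shortcut has the same shape as Y.
projected = shortcut(X)
print(shapes_compatible, projected.shape == Y.shape)
```

When the block keeps both the channel count and the stride, the shapes already match and the plain identity shortcut is enough.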
        

In [8]:
# Height and width unchanged
blk = Residual(3,3)
X = torch.rand(4, 3, 6, 6)
Y = blk(X)
print(Y.shape)

torch.Size([4, 3, 6, 6])


In [7]:
# Height and width halved
blk = Residual(3,6, use_1x1conv=True, strides=2)
X = torch.rand(4, 3, 6, 6)
Y = blk(X)
print(Y.shape)

torch.Size([4, 6, 3, 3])
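
The two shape checks above agree with the usual convolution output-size rule, floor((n + 2p - k) / s) + 1, for input size n, kernel size k, padding p, and stride s. The sketch below only redoes that arithmetic in plain Python; `conv_out` is a hypothetical helper, not a PyTorch function.

```python
def conv_out(n, kernel, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (n + 2 * padding - kernel) // stride + 1


# 3x3 conv, padding 1, stride 1: the 6x6 input keeps its size.
same = conv_out(6, kernel=3, stride=1, padding=1)
# 3x3 conv, padding 1, stride 2: the main branch halves 6 to 3.
halved = conv_out(6, kernel=3, stride=2, padding=1)
# 1x1 conv, stride 2: the shortcut is halved to the same size.
shortcut = conv_out(6, kernel=1, stride=2)

print(same, halved, shortcut)  # 6 3 3
```

Because the main branch and the 1x1 shortcut both end at 3x3, the addition inside the block is well defined.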


ResNet model

In [9]:
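def resnet_block(input_channels, num_channels,
                 num_residuals, first_block=False):
    blk = []
    for i in range(num_residuals):
        if i == 0 and not first_block:
            # First block of a later stage: change the channel count and
            # halve height and width, with a 1x1 conv on the shortcut.
            blk.append(Residual(input_channels, num_channels,
                                use_1x1conv=True, strides=2))
        else:
            # Remaining blocks keep channels and spatial size unchanged.
            blk.append(Residual(num_channels, num_channels))
    return blk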
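
To see which configuration each block in a stage receives, the same branching can be replayed without building any layers. This is only a sketch: `block_plan` is a hypothetical helper that mirrors the loop in `resnet_block` and records the arguments each `Residual` would get.

```python
def block_plan(input_channels, num_channels, num_residuals, first_block=False):
    """Return (in_channels, out_channels, stride, use_1x1conv) per block."""
    plan = []
    for i in range(num_residuals):
        if i == 0 and not first_block:
            # Downsampling block with a projected shortcut.
            plan.append((input_channels, num_channels, 2, True))
        else:
            # Shape-preserving block with an identity shortcut.
            plan.append((num_channels, num_channels, 1, False))
    return plan


first_stage = block_plan(64, 64, 2, first_block=True)
later_stage = block_plan(64, 128, 2)
print(first_stage)  # [(64, 64, 1, False), (64, 64, 1, False)]
print(later_stage)  # [(64, 128, 2, True), (128, 128, 1, False)]
```

With `first_block=True` no block downsamples. In this notebook that flag is used for the stage that follows the stem `b1`, which has already reduced the resolution with a stride-2 convolution and a max-pool.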

In [11]:
b1 = nn.Sequential(nn.Conv2d(1,64, kernel_size=7,
                    stride=2, padding=3),
                    nn.BatchNorm2d(64),
                    nn.ReLU(),
                    nn.MaxPool2d(kernel_size=3, stride=2, padding=1))

b2 = nn.Sequential(*resnet_block(64, 64, 2, first_block=True))
b3 = nn.Sequential(*resnet_block(64, 128, 2))
b4 = nn.Sequential(*resnet_block(128, 256, 2))
b5 = nn.Sequential(*resnet_block(256, 512, 2))

In [14]:
net = nn.Sequential(b1, b2, b3, b4, b5,
                    nn.AdaptiveAvgPool2d((1,1)),
                    nn.Flatten(),
                    nn.Linear(512,10))

In [15]:
X = torch.rand(size=(1, 1, 224, 224))
for layer in net:
    X = layer(X)
    print(layer.__class__.__name__,'output shape:\t', X.shape)


Sequential output shape:	 torch.Size([1, 64, 56, 56])
Sequential output shape:	 torch.Size([1, 64, 56, 56])
Sequential output shape:	 torch.Size([1, 128, 28, 28])
Sequential output shape:	 torch.Size([1, 256, 14, 14])
Sequential output shape:	 torch.Size([1, 512, 7, 7])
AdaptiveAvgPool2d output shape:	 torch.Size([1, 512, 1, 1])
Flatten output shape:	 torch.Size([1, 512])
Linear output shape:	 torch.Size([1, 10])
