## Densely Connected Networks (DenseNet)

DenseNet is inspired by the cross-layer (skip) connections in ResNet.

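**How does DenseNet differ from ResNet?**  
In ResNet, the two branches joined by a skip connection must produce outputs of the same shape, and those outputs are added element-wise.  
In DenseNet, the input and output are instead ``concatenated along the channel dimension``.  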
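
The difference between the two merge operations can be illustrated with a minimal sketch. The tensor shapes here are arbitrary choices for illustration, not values from the model.

```python
import torch

# Two feature maps with the same batch size, channels, height and width.
x = torch.rand(4, 3, 8, 8)
y = torch.rand(4, 3, 8, 8)

# ResNet-style merge: element-wise addition, so the shapes must match.
res_out = x + y

# DenseNet-style merge: concatenation along the channel dimension (dim=1).
# Only batch size, height and width must match; channel counts may differ.
z = torch.rand(4, 10, 8, 8)
dense_out = torch.cat((x, z), dim=1)

print(res_out.shape)    # torch.Size([4, 3, 8, 8])
print(dense_out.shape)  # torch.Size([4, 13, 8, 8])
```

Addition leaves the channel count unchanged, whereas concatenation sums the channel counts of its operands.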

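DenseNet's main building blocks are the ``dense block`` and the ``transition layer``.  
- Dense block: defines how inputs and outputs are concatenated.
- Transition layer: keeps the number of channels from growing too large.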

## 1. Dense Block
DenseNet uses the "batch normalization, activation, convolution" structure from the improved version of ResNet.

In [1]:
import time
import torch
from torch import nn, optim
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

In [2]:
def conv_block(in_channels, out_channels):
    blk = nn.Sequential(
        nn.BatchNorm2d(in_channels),
        nn.ReLU(),
        nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1)
    )
    return blk

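A dense block consists of multiple conv_block units, each with the same number of output channels. In the forward pass, each unit's input and output are concatenated along the channel dimension.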

In [3]:
class DenseBlock(nn.Module):
    def __init__(self, num_convs, in_channels, out_channels):
        super(DenseBlock, self).__init__()
        net = []
        for i in range(num_convs):
            in_c = in_channels + i * out_channels
            net.append(conv_block(in_c, out_channels))
        self.net = nn.ModuleList(net)
        self.out_channels = in_channels + num_convs * out_channels  # number of output channels

    def forward(self, X):
        for blk in self.net:
            Y = blk(X)
            X = torch.cat((X, Y), dim=1)  # concatenate input and output along the channel dimension
        return X

In [4]:
blk = DenseBlock(num_convs=2, in_channels=3, out_channels=10)
X = torch.rand(4, 3, 8, 8)
Y = blk(X)
Y.shape

torch.Size([4, 23, 8, 8])
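
The 23 output channels follow from simple bookkeeping: each conv block sees the original input plus everything produced before it. The sketch below reproduces that arithmetic without building any layers; `dense_block_channels` is a hypothetical helper introduced only for illustration.

```python
def dense_block_channels(num_convs, in_channels, growth):
    """Return the input channels seen by each conv block and the final output channels."""
    per_block_inputs = [in_channels + i * growth for i in range(num_convs)]
    out_channels = in_channels + num_convs * growth
    return per_block_inputs, out_channels

inputs, out = dense_block_channels(num_convs=2, in_channels=3, growth=10)
print(inputs, out)  # [3, 13] 23
```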

## 2. Transition Layer

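Each dense block increases the number of channels, so stacking too many of them makes the model overly complex.  
The transition layer is used to control model complexity.  
It reduces the number of channels with a 1x1 convolutional layer and halves the height and width with an average pooling layer of stride 2, further reducing model complexity.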

In [5]:
def transition_block(in_channels, out_channels):
    blk = nn.Sequential(
        nn.BatchNorm2d(in_channels),
        nn.ReLU(),
        nn.Conv2d(in_channels, out_channels, kernel_size=1),  # 1x1 convolution reduces the channels
        nn.AvgPool2d(kernel_size=2, stride=2)  # halves the height and width
    )
    return blk

In [6]:
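# Apply a transition layer with 10 output channels to the dense block output from the previous example. The channel count drops to 10, and the height and width are halved.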
blk = transition_block(in_channels=23, out_channels=10)
blk(Y).shape

torch.Size([4, 10, 4, 4])
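
One way to see why a 1x1 convolution is a cheap tool for reducing channels is to count its learnable parameters. The helper below is a hypothetical sketch that assumes a square kernel, `groups=1`, and a bias term. It compares a 1x1 kernel with a 3x3 kernel for the same 23-to-10 channel reduction used in the example above.

```python
def conv2d_param_count(in_channels, out_channels, kernel_size, bias=True):
    """Learnable parameters of a square-kernel 2D convolution (groups=1)."""
    weights = in_channels * out_channels * kernel_size * kernel_size
    return weights + (out_channels if bias else 0)

# Reducing 23 channels to 10, as in the transition layer example.
params_1x1 = conv2d_param_count(23, 10, kernel_size=1)
params_3x3 = conv2d_param_count(23, 10, kernel_size=3)
print(params_1x1)  # 240
print(params_3x3)  # 2080
```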

## 3. DenseNet Model

In [7]:
# Start with the same single convolutional layer and max pooling layer as ResNet.
net = nn.Sequential(
    nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3),
    nn.BatchNorm2d(64),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
)
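
The spatial effect of this stem can be checked by hand with the standard output-size formula for convolution and pooling layers (dilation 1, floor rounding). The sketch below assumes a 96x96 input, the size used later in this section; `conv_out_size` is a hypothetical helper.

```python
def conv_out_size(size, kernel_size, stride=1, padding=0):
    """Output length along one spatial axis for a conv or pooling layer (dilation 1)."""
    return (size + 2 * padding - kernel_size) // stride + 1

h0 = 96
h1 = conv_out_size(h0, kernel_size=7, stride=2, padding=3)  # 7x7 conv, stride 2
h2 = conv_out_size(h1, kernel_size=3, stride=2, padding=1)  # 3x3 max pool, stride 2
print(h1, h2)  # 48 24
```

Batch normalization and ReLU leave the spatial size unchanged, so the stem as a whole reduces height and width by a factor of 4.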

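Analogous to ResNet's 4 residual blocks, DenseNet next uses ``4 dense blocks``.  
As in ResNet, the number of convolutional layers per dense block is configurable; here it is set to 4.  
The number of channels of each convolutional layer in a dense block (the ``growth rate``) is set to 32, so ``each dense block adds 128 channels`` (4 × 32).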

In [8]:
num_channels, growth_rate = 64, 32  # num_channels tracks the current number of channels
num_convs_in_dense_block = [4, 4, 4, 4]
for i, num_convs in enumerate(num_convs_in_dense_block):
    DB = DenseBlock(num_convs, num_channels, growth_rate)
    net.add_module("DenseBlock_%d" % i, DB)
    # output channels of the dense block just added
    num_channels = DB.out_channels
    # insert a transition layer that halves the channels between dense blocks
    if i != len(num_convs_in_dense_block) - 1:
        net.add_module("transition_block_%d" % i, transition_block(num_channels, num_channels // 2))
        num_channels = num_channels // 2
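
The channel bookkeeping in the loop above can be traced without constructing any layers. The following sketch mirrors the same arithmetic; `trace_channels` is a hypothetical helper, and the stage names simply reuse the module names from the loop.

```python
def trace_channels(stem_channels, growth_rate, convs_per_block):
    """List (stage name, channels) after each dense block and transition layer."""
    trace = []
    channels = stem_channels
    last = len(convs_per_block) - 1
    for i, num_convs in enumerate(convs_per_block):
        channels += num_convs * growth_rate  # a dense block adds num_convs * growth_rate
        trace.append(("DenseBlock_%d" % i, channels))
        if i != last:
            channels //= 2  # a transition layer halves the channels
            trace.append(("transition_block_%d" % i, channels))
    return trace

trace = trace_channels(64, 32, [4, 4, 4, 4])
for name, c in trace:
    print(name, c)
# channels in order: 192, 96, 224, 112, 240, 120, 248
```

Because every transition layer halves the count while every dense block adds a fixed 128, the channel count converges rather than doubling per stage.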

In [9]:
# Global average pooling
class GlobalAvgPool2d(nn.Module):
    def __init__(self):
        super(GlobalAvgPool2d, self).__init__()
    def forward(self, x):
        return F.avg_pool2d(x, kernel_size=x.size()[2:])
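
As a sanity check, the pooling used in this class should agree with two other common ways of expressing global average pooling in PyTorch: `nn.AdaptiveAvgPool2d(1)` and a mean over the spatial dimensions. The sketch below compares them on a random tensor whose shape is chosen arbitrarily.

```python
import torch
from torch import nn
import torch.nn.functional as F

x = torch.rand(2, 5, 3, 3)

# The approach used in the class above: a pooling window covering the whole map.
a = F.avg_pool2d(x, kernel_size=x.size()[2:])
# Built-in module that averages each channel down to a 1x1 map.
b = nn.AdaptiveAvgPool2d(1)(x)
# Plain mean over the two spatial dimensions.
c = x.mean(dim=(2, 3), keepdim=True)

print(a.shape)  # torch.Size([2, 5, 1, 1])
print(torch.allclose(a, b, atol=1e-6), torch.allclose(a, c, atol=1e-6))
```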

In [10]:
# Finally, attach a global average pooling layer and a fully connected layer to produce the output
net.add_module("BN", nn.BatchNorm2d(num_channels))
net.add_module("relu", nn.ReLU())
net.add_module("global_avg_pool", GlobalAvgPool2d())
net.add_module("fc", nn.Sequential(nn.Flatten(), nn.Linear(num_channels, 10)))

In [11]:
# Print the output shape of each submodule to check that the network is wired correctly
X = torch.rand((1, 1, 96, 96))
for name, layer in net.named_children():
    X = layer(X)
    print(name, 'output shape:\t', X.shape)

0 output shape:	 torch.Size([1, 64, 48, 48])
1 output shape:	 torch.Size([1, 64, 48, 48])
2 output shape:	 torch.Size([1, 64, 48, 48])
3 output shape:	 torch.Size([1, 64, 24, 24])
DenseBlock_0 output shape:	 torch.Size([1, 192, 24, 24])
transition_block_0 output shape:	 torch.Size([1, 96, 12, 12])
DenseBlock_1 output shape:	 torch.Size([1, 224, 12, 12])
transition_block_1 output shape:	 torch.Size([1, 112, 6, 6])
DenseBlock_2 output shape:	 torch.Size([1, 240, 6, 6])
transition_block_2 output shape:	 torch.Size([1, 120, 3, 3])
DenseBlock_3 output shape:	 torch.Size([1, 248, 3, 3])
BN output shape:	 torch.Size([1, 248, 3, 3])
relu output shape:	 torch.Size([1, 248, 3, 3])
global_avg_pool output shape:	 torch.Size([1, 248, 1, 1])
fc output shape:	 torch.Size([1, 10])


## 4. Loading the Data and Training the Model

In [12]:
def train_ch5(net, train_iter, test_iter, batch_size, optimizer, device, num_epochs):
    net = net.to(device)
    print('train on', device)
    loss = torch.nn.CrossEntropyLoss()
    for epoch in range(num_epochs):
        train_l_sum, train_acc_sum, n, batch_count, start = 0.0, 0.0, 0, 0, time.time()
        for X, y in train_iter:
            X = X.to(device)
            y = y.to(device)
            # print("y.shape", y.shape) # [128]
            y_hat = net(X)
            l = loss(y_hat, y)
            optimizer.zero_grad()
            l.backward()
            optimizer.step()
            train_l_sum += l.cpu().item()  # copy the loss to the CPU and read it as a Python float
            train_acc_sum += (y_hat.argmax(dim=1) == y).sum().cpu().item()
            n += y.shape[0]
            batch_count += 1

        net.eval()  # evaluation mode: BatchNorm uses its running statistics
        with torch.no_grad():
            test_acc_sum, n_test = 0.0, 0  # plain Python numbers, held in CPU memory
            for X_test, y_test in test_iter:
                # .item() converts the one-element tensor into a Python scalar
                test_acc_sum += (net(X_test.to(device)).argmax(dim=1) == y_test.to(device)).sum().item()
                n_test += y_test.shape[0]
            test_acc = test_acc_sum / n_test
        net.train()  # back to training mode for the next epoch

        print('epoch %d, loss %.4f, train acc %.3f, test acc %.3f, time %.1f sec'
        % (epoch + 1, train_l_sum / batch_count, train_acc_sum / n, test_acc, time.time() - start))
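
The evaluation step inside the training function could also be factored into a standalone helper. The sketch below is one possible shape for it, not part of the original notebook; `evaluate_accuracy` is a hypothetical name, and the toy check uses `nn.Identity` so that the inputs act directly as class scores.

```python
import torch
from torch import nn

def evaluate_accuracy(data_iter, net, device):
    """Fraction of correctly classified examples in data_iter."""
    was_training = net.training
    net.eval()  # use running BatchNorm statistics, disable dropout
    correct, total = 0, 0
    with torch.no_grad():
        for X, y in data_iter:
            y_hat = net(X.to(device))
            correct += (y_hat.argmax(dim=1) == y.to(device)).sum().item()
            total += y.shape[0]
    net.train(was_training)  # restore the mode the network was in
    return correct / total

# Toy check: an identity "network" whose inputs are already class scores.
scores = torch.tensor([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]])
labels = torch.tensor([1, 0, 0])
acc = evaluate_accuracy([(scores, labels)], nn.Identity(), torch.device('cpu'))
print(acc)  # 2 of 3 predictions match the labels
```

Remembering and restoring `net.training` means the helper leaves the network in whatever mode the caller had set.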

In [13]:
resize = 96
trans = []
trans.append(torchvision.transforms.Resize(size=resize))
trans.append(torchvision.transforms.ToTensor())
transform = torchvision.transforms.Compose(trans)  # chain the two transforms together

mnist_train = torchvision.datasets.FashionMNIST(root='~/Datasets/FashionMNIST', train=True, download=True, transform=transform)
mnist_test = torchvision.datasets.FashionMNIST(root='~/Datasets/FashionMNIST', train=False, download=True, transform=transform)

batch_size = 256

train_iter = torch.utils.data.DataLoader(mnist_train, batch_size=batch_size, shuffle=True)
test_iter = torch.utils.data.DataLoader(mnist_test, batch_size=batch_size, shuffle=False)

In [14]:
lr, num_epochs = 0.001, 5
optimizer = torch.optim.Adam(net.parameters(), lr=lr)
train_ch5(net, train_iter, test_iter, batch_size, optimizer, device, num_epochs)

train on cuda
epoch 1, loss 0.4491, train acc 0.842, test acc 0.847, time 28.4 sec
epoch 2, loss 0.2741, train acc 0.901, test acc 0.868, time 27.8 sec
epoch 3, loss 0.2322, train acc 0.915, test acc 0.906, time 28.2 sec
epoch 4, loss 0.2099, train acc 0.924, test acc 0.893, time 28.9 sec
epoch 5, loss 0.1934, train acc 0.928, test acc 0.905, time 28.8 sec
