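The main difference between DenseNet and ResNet is that in DenseNet the output of module B is not added to the output of module A, as it is in ResNet, but concatenated with it along the channel dimension.

As a result, the output of module A is passed directly to the layers after module B. In this design, module A is connected directly to every layer that follows module B, which is why the architecture is described as "densely connected".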
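A minimal sketch of the difference, assuming two hypothetical feature maps `a` and `b` that stand in for the outputs of modules A and B (random tensors, not real activations):

```python
import torch

# Hypothetical outputs of "module A" and "module B": batch 2, 4 channels, 5x5 maps
a = torch.rand(2, 4, 5, 5)
b = torch.rand(2, 4, 5, 5)

# ResNet-style: element-wise addition, the channel count is unchanged
res_out = a + b
# DenseNet-style: concatenation along the channel dimension (dim=1)
dense_out = torch.cat((a, b), dim=1)

print(res_out.shape)    # torch.Size([2, 4, 5, 5])
print(dense_out.shape)  # torch.Size([2, 8, 5, 5])
# The first 4 channels of dense_out are exactly a, so later layers still see A's output
print(torch.equal(dense_out[:, :4], a))  # True
```

Addition mixes A's output into B's, while concatenation keeps it intact as separate channels, at the cost of a growing channel count.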

# Dense Block

In [1]:
import torch
from torch import nn
import torch.nn.functional as F

def convBlock(inChannels, outChannels):
    # BN -> ReLU -> 3x3 conv; padding=1 keeps the height and width unchanged
    return nn.Sequential(
        nn.BatchNorm2d(inChannels),
        nn.ReLU(),
        nn.Conv2d(inChannels, outChannels, kernel_size=3, padding=1)
    )
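Why `padding=1` matters: a dense block concatenates each conv block's input and output along the channel dimension, which only works if their height and width agree. A quick check with a random input (the shapes are arbitrary choices for illustration):

```python
import torch
from torch import nn

x = torch.rand(1, 3, 8, 8)

# padding=1 with a 3x3 kernel keeps the 8x8 spatial size, so concatenation works
same = nn.Conv2d(3, 10, kernel_size=3, padding=1)(x)
print(torch.cat((x, same), dim=1).shape)  # torch.Size([1, 13, 8, 8])

# without padding the output shrinks to 6x6 and torch.cat rejects the mismatch
valid = nn.Conv2d(3, 10, kernel_size=3, padding=0)(x)
try:
    torch.cat((x, valid), dim=1)
    cat_ok = True
except RuntimeError:
    cat_ok = False
print(valid.shape, cat_ok)  # torch.Size([1, 10, 6, 6]) False
```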

In [2]:
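class DenseBlock(nn.Module):
    def __init__(self, numConvs, inChannels, outChannels):
        super().__init__()
        net = []
        for i in range(numConvs):
            # the i-th conv block sees the input plus the i outputs concatenated before it
            inC = inChannels + i * outChannels
            net.append(convBlock(inC, outChannels))
        self.net = nn.ModuleList(net)
        # each conv block adds outChannels channels (the growth rate)
        self.outChannels = inChannels + numConvs * outChannels

    def forward(self, x):
        for block in self.net:
            y = block(x)
            # concatenate the input and the output along the channel dimension
            x = torch.cat((x, y), dim=1)
        return x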

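In the example below, we define a dense block with 2 convolution blocks, each with 10 output channels. With a 3-channel input, the output has 3 + 2 × 10 = 23 channels.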

In [3]:
blk=DenseBlock(2, 3, 10)
x=torch.rand(4, 3, 8, 8)
y=blk(x)
y.shape

torch.Size([4, 23, 8, 8])
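The channel arithmetic in this example generalizes directly. `dense_block_channels` below is a hypothetical helper (not part of the code above) that mirrors the bookkeeping in `DenseBlock.__init__`:

```python
def dense_block_channels(in_channels, num_convs, growth_rate):
    """Hypothetical helper: (input channels of each conv block, output channels of the block)."""
    # the i-th conv block sees the original input plus the i outputs concatenated before it
    conv_inputs = [in_channels + i * growth_rate for i in range(num_convs)]
    out_channels = in_channels + num_convs * growth_rate
    return conv_inputs, out_channels

print(dense_block_channels(3, 2, 10))   # ([3, 13], 23)
print(dense_block_channels(64, 4, 32))  # ([64, 96, 128, 160], 192)
```

The second call uses the settings of the first dense block in the full model below.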

# Transition Layer

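Each dense block increases the number of channels, so stacking too many of them produces an overly complex model.  
A transition layer controls model complexity: a 1×1 convolution reduces the number of channels, and an average pooling layer with stride 2 halves the height and width, reducing complexity further.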

In [4]:
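def transitionBlock(inChannels, outChannels):
    block = nn.Sequential(
        nn.BatchNorm2d(inChannels),
        nn.ReLU(),
        nn.Conv2d(inChannels, outChannels, kernel_size=1),  # 1x1 conv: change the channel count
        nn.AvgPool2d(kernel_size=2, stride=2)               # halve the height and width
    )
    return block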

In [5]:
blk=transitionBlock(23,10)
blk(y).shape

torch.Size([4, 10, 4, 4])
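To see what the two operations in the transition layer do, here is a small check on a random tensor (23 input channels and 10 output channels to match the example above; the seed and tensor values are arbitrary):

```python
import torch
from torch import nn

torch.manual_seed(0)
x = torch.rand(1, 23, 8, 8)

# A 1x1 convolution is a linear map across channels applied at every pixel
conv = nn.Conv2d(23, 10, kernel_size=1)
out = conv(x)
w = conv.weight[:, :, 0, 0]  # shape (10, 23)
manual = torch.einsum('oc,bchw->bohw', w, x) + conv.bias.view(1, -1, 1, 1)
print(out.shape)                               # torch.Size([1, 10, 8, 8])
print(torch.allclose(out, manual, atol=1e-5))  # True

# 2x2 average pooling with stride 2 halves height and width;
# each output value is the mean of one non-overlapping 2x2 window
pool = nn.AvgPool2d(kernel_size=2, stride=2)
p = pool(x)
print(p.shape)                                                 # torch.Size([1, 23, 4, 4])
print(torch.allclose(p[0, 0, 0, 0], x[0, 0, :2, :2].mean()))  # True
```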

# DenseNet Model

In [6]:
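net = nn.Sequential(
    # stem: 7x7 conv (stride 2, padding 3), BN, ReLU, then 3x3 max pooling (stride 2, padding 1)
    nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3),
    nn.BatchNorm2d(64),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
)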

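This DenseNet uses 4 dense blocks, each with 4 convolutional layers. The number of output channels of each convolutional layer (the growth rate) is set to 32, so each dense block adds 4 × 32 = 128 channels.  
Between dense blocks, transition layers halve the height and width as well as the number of channels.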

In [7]:
numChannels,growthRate=64,32
numConvsInDenseBlock=[4,4,4,4]

for i in range(len(numConvsInDenseBlock)):
    denseBlock = DenseBlock(numConvsInDenseBlock[i],numChannels,growthRate)
    net.add_module('DenseBlock_%d' % i,denseBlock)
    # output channels of the dense block just added
    numChannels=denseBlock.outChannels
    # between dense blocks, add a transition layer that halves the channel count
    if i != len(numConvsInDenseBlock) - 1:
        net.add_module('transitionBlock_%d' % i,transitionBlock(numChannels,numChannels//2))
        numChannels = numChannels//2
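The channel counts produced by this loop can be traced without building any layers. A plain-Python sketch of the same bookkeeping (variable names are adapted for illustration; nothing here touches `net`):

```python
# re-computation of the channel counts produced by the loop above
num_channels, growth_rate = 64, 32
num_convs_in_dense_block = [4, 4, 4, 4]

schedule = []
for i, num_convs in enumerate(num_convs_in_dense_block):
    # each dense block adds num_convs * growth_rate channels
    num_channels += num_convs * growth_rate
    schedule.append(('DenseBlock_%d' % i, num_channels))
    if i != len(num_convs_in_dense_block) - 1:
        # each transition layer halves the channel count
        num_channels //= 2
        schedule.append(('transitionBlock_%d' % i, num_channels))

for name, c in schedule:
    print(name, c)
```

It prints 192, 96, 224, 112, 240, 120 and 248, the same channel numbers that appear in the shape printout of the next cells.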

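Finally, add batch normalization, ReLU, a global average pooling layer, and a fully connected layer to produce the output.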

In [8]:
class GlobalAvgPool2d(nn.Module):
    # global average pooling: average each channel over the full height and width -> 1x1
    def forward(self, x):
        return F.adaptive_avg_pool2d(x, (1, 1))

net.add_module('BN',nn.BatchNorm2d(numChannels))
net.add_module('relu',nn.ReLU())
net.add_module('GlobalAvgPool2d',GlobalAvgPool2d())
net.add_module("fc", nn.Sequential(nn.Flatten(), nn.Linear(numChannels, 10))) 
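Global average pooling simply averages each channel over its whole height and width. A quick check on a random tensor shaped like the last dense block's output (248 channels, 3×3) shows that `adaptive_avg_pool2d(x, (1, 1))` equals a plain mean over the spatial dimensions, and that PyTorch's built-in `nn.AdaptiveAvgPool2d` module could stand in for the custom `GlobalAvgPool2d` class:

```python
import torch
import torch.nn.functional as F
from torch import nn

x = torch.rand(2, 248, 3, 3)

gap = F.adaptive_avg_pool2d(x, (1, 1))     # shape (2, 248, 1, 1)
mean = x.mean(dim=(2, 3), keepdim=True)    # same values, computed directly
builtin = nn.AdaptiveAvgPool2d((1, 1))(x)  # built-in module equivalent

print(gap.shape)                                                # torch.Size([2, 248, 1, 1])
print(torch.allclose(gap, mean), torch.allclose(gap, builtin))  # True True
print(nn.Flatten()(gap).shape)                                  # torch.Size([2, 248])
```

The last line is what the `fc` stage sees after `nn.Flatten()`: one 248-dimensional vector per sample, matching `nn.Linear(numChannels, 10)`.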

In [9]:
x = torch.rand((1, 1, 96, 96))
for name, layer in net.named_children():
    x = layer(x)
    print(name, '\t', x.shape)

0 	 torch.Size([1, 64, 48, 48])
1 	 torch.Size([1, 64, 48, 48])
2 	 torch.Size([1, 64, 48, 48])
3 	 torch.Size([1, 64, 24, 24])
DenseBlock_0 	 torch.Size([1, 192, 24, 24])
transitionBlock_0 	 torch.Size([1, 96, 12, 12])
DenseBlock_1 	 torch.Size([1, 224, 12, 12])
transitionBlock_1 	 torch.Size([1, 112, 6, 6])
DenseBlock_2 	 torch.Size([1, 240, 6, 6])
transitionBlock_2 	 torch.Size([1, 120, 3, 3])
DenseBlock_3 	 torch.Size([1, 248, 3, 3])
BN 	 torch.Size([1, 248, 3, 3])
relu 	 torch.Size([1, 248, 3, 3])
GlobalAvgPool2d 	 torch.Size([1, 248, 1, 1])
fc 	 torch.Size([1, 10])
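The spatial sizes in this printout follow from the usual output-size formula floor((H + 2p - k) / s) + 1. A plain-Python sketch (`out_size` and `trace` are hypothetical helpers) reproduces them for a 96×96 input, and also shows that a raw 28×28 Fashion-MNIST image would shrink to size 0 at the last transition layer, which suggests why the inputs are upscaled:

```python
def out_size(size, kernel, stride=1, padding=0):
    # output size of a conv/pooling layer (no dilation, floor rounding)
    return (size + 2 * padding - kernel) // stride + 1

def trace(h):
    """Hypothetical helper: spatial size after the stem and after each transition layer."""
    h = out_size(h, 7, 2, 3)   # 7x7 conv, stride 2, padding 3
    h = out_size(h, 3, 2, 1)   # 3x3 max pool, stride 2, padding 1
    sizes = [h]
    for _ in range(3):         # three transition layers: 2x2 avg pool, stride 2
        h = out_size(h, 2, 2)
        sizes.append(h)
    return sizes

print(out_size(24, 3, 1, 1))  # 24: 3x3 convs with padding 1 inside dense blocks keep the size
print(trace(96))              # [24, 12, 6, 3]
print(trace(28))              # [7, 3, 1, 0]
```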


# Loading the Data

In [10]:
import torchvision

transform = torchvision.transforms.Compose([
    # upscale the 28x28 Fashion-MNIST images to 96x96, the input size used in the shape check above
    torchvision.transforms.Resize(size=96),
    torchvision.transforms.ToTensor()
])

# download the datasets
mnist_train = torchvision.datasets.FashionMNIST(root='~/Datasets/FashionMNIST', train=True, download=True, transform=transform)
mnist_test = torchvision.datasets.FashionMNIST(root='~/Datasets/FashionMNIST', train=False, download=True, transform=transform)
# build the data loaders (the test set does not need shuffling)
batchSize = 128
trainIter = torch.utils.data.DataLoader(mnist_train, batch_size=batchSize, shuffle=True, num_workers=8)
testIter = torch.utils.data.DataLoader(mnist_test, batch_size=batchSize, shuffle=False, num_workers=8)

# Evaluation

In [11]:
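def evaluate_accuracy(data_iter, net):
    # evaluate on whatever device the model's parameters live on
    device = next(net.parameters()).device
    acc_sum, n = 0.0, 0
    with torch.no_grad():
        net.eval()  # evaluation mode (BatchNorm uses its running statistics)
        for X, y in data_iter:
            acc_sum += (net(X.to(device)).argmax(dim=1) == y.to(device)).float().sum().item()
            n += y.shape[0]
        net.train()  # switch back to training mode
    return acc_sum / n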
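The accuracy expression in `evaluate_accuracy` takes the arg-max over the class dimension and counts matches with the labels. A tiny worked example with made-up logits for 4 samples and 3 classes:

```python
import torch

# hypothetical logits for 4 samples and 3 classes, plus their true labels
logits = torch.tensor([[2.0, 0.1, 0.3],
                       [0.2, 1.5, 0.1],
                       [0.9, 0.8, 0.1],
                       [0.1, 0.2, 3.0]])
y = torch.tensor([0, 1, 1, 2])

pred = logits.argmax(dim=1)                 # predicted class per sample: [0, 1, 0, 2]
correct = (pred == y).float().sum().item()  # number of correct predictions: 3.0
print(correct / y.shape[0])                 # 0.75
```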

# Training

In [12]:
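lr, epochsNum = 0.001, 10
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
net = net.to(device)
loss = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters(), lr)

for epoch in range(epochsNum):
    train_l_sum = n = train_acc_sum = 0
    for x, y in trainIter:
        x = x.to(device)
        y = y.to(device)
        yP = net(x)
        l = loss(yP, y)  # mean loss over the batch
        l.backward()
        optimizer.step()
        optimizer.zero_grad()
        # l.item() is already a batch mean, so train_l_sum/n below is roughly
        # the per-sample loss divided by the batch size
        train_l_sum += l.item()
        train_acc_sum += (yP.argmax(dim=1) == y).sum().item()
        n += y.shape[0]
    # columns: epoch, training loss (see note above), training accuracy, test accuracy
    print(epoch, train_l_sum/n, train_acc_sum/n, evaluate_accuracy(testIter, net))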

0 0.003327920101583004 0.8473333333333334 0.8121
1 0.0021146094039082527 0.8996666666666666 0.8697
2 0.001800314412266016 0.91495 0.8578
3 0.0016219776230553787 0.92335 0.909
4 0.0014782136036703983 0.92945 0.9179
5 0.001370954975237449 0.9348166666666666 0.9208
6 0.0012583134833723307 0.9390166666666667 0.9279
7 0.001189739281994601 0.9441333333333334 0.9171
8 0.0010990946197882295 0.94725 0.9127
9 0.001010560538309316 0.9519833333333333 0.925
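The logged training losses are small (roughly 0.001 to 0.003) because `l` is already the mean loss over a batch (the default `reduction='mean'` of `CrossEntropyLoss`), yet the loop sums these batch means and then divides by the number of samples. A sketch with made-up batch losses shows how far apart the two quantities are; weighting each batch mean by its batch size gives the actual per-sample average:

```python
# hypothetical mean losses returned by CrossEntropyLoss for three batches
batch_means = [0.5, 0.4, 0.3]
batch_sizes = [128, 128, 96]
n = sum(batch_sizes)  # 352 samples in total

# what the training loop logs: sum of batch means divided by the sample count
logged = sum(batch_means) / n
# true average loss per sample: weight each batch mean by its batch size
per_sample = sum(m * s for m, s in zip(batch_means, batch_sizes)) / n

print(round(logged, 6), round(per_sample, 6))  # 0.003409 0.409091
```

In the loop itself, accumulating `l.item() * y.shape[0]` instead of `l.item()` would make `train_l_sum / n` the per-sample loss.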
