<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Densely-Connected-Networks-(DenseNet)" data-toc-modified-id="Densely-Connected-Networks-(DenseNet)-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Densely Connected Networks (DenseNet)</a></span></li><li><span><a href="#Transition-Layers" data-toc-modified-id="Transition-Layers-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Transition Layers</a></span></li><li><span><a href="#DenseNet-Model" data-toc-modified-id="DenseNet-Model-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>DenseNet Model</a></span></li></ul></div>

## Densely Connected Networks (DenseNet)

In [1]:
import d2l
from mxnet import np, npx
from mxnet.gluon import nn
npx.set_np()

### Dense Blocks

DenseNet uses the modified "batch normalization, activation, and convolution" structure of ResNet. We implement this structure in the `conv_block` function.

In [2]:
def conv_block(num_channels):
    blk = nn.Sequential()
    blk.add(nn.BatchNorm(),
            nn.Activation('relu'),
            nn.Conv2D(num_channels, kernel_size=3, padding=1))
    return blk

A dense block consists of multiple convolution blocks, each using the same number of output channels. In the forward propagation, however, we concatenate the input and output of each convolution block on the channel dimension.

In [3]:
class DenseBlock(nn.Block):
    def __init__(self, num_convs, num_channels):
        super().__init__()
        self.net = nn.Sequential()
        for _ in range(num_convs):
            self.net.add(conv_block(num_channels))

    def forward(self, X):
        for blk in self.net:
            Y = blk(X)
            X = np.concatenate((X, Y), axis=1)
        return X
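
The channel bookkeeping done by `forward` can be traced without running any convolutions. The following is a minimal sketch in plain Python; `dense_block_channel_trace` is a hypothetical helper introduced here for illustration and is not part of d2l or MXNet. It assumes, as in `DenseBlock` above, that every convolution block emits the same number of channels and that its output is concatenated onto its input.

```python
def dense_block_channel_trace(in_channels, num_convs, growth_rate):
    """Return (input channels seen by each conv block, final output channels)."""
    seen = []
    channels = in_channels
    for _ in range(num_convs):
        # Each conv block receives everything concatenated so far...
        seen.append(channels)
        # ...and its `growth_rate` output channels are appended on axis 1.
        channels += growth_rate
    return seen, channels


seen, out_channels = dense_block_channel_trace(3, 2, 10)
print(seen, out_channels)  # [3, 13] 23
```

The second convolution block sees 13 channels (the 3 input channels plus the 10 produced by the first block), which is why the output grows linearly with the number of blocks.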

In the following example,
we define a `DenseBlock` instance with 2 convolution blocks of 10 output channels.
When using an input with 3 channels, we will get an output with  $3+2\times 10=23$ channels. The number of convolution block channels controls the growth in the number of output channels relative to the number of input channels. This is also referred to as the *growth rate*.


In [4]:
blk = DenseBlock(2, 10)
blk.initialize()

In [5]:
X = np.random.uniform(size=(4, 3, 8, 8))
Y = blk(X)
Y.shape

(4, 23, 8, 8)

The same block configuration applied to an input with 6 channels gives an output with $6+2\times 10=26$ channels: the growth rate is unchanged, only the starting number of channels differs.


In [6]:
blkk = DenseBlock(2, 10)
blkk.initialize()

In [7]:
X = np.random.uniform(size=(4, 6, 8, 8))
Z = blkk(X)
Z.shape

(4, 26, 8, 8)

## Transition Layers

Since each dense block increases the number of channels, stacking too many of them leads to an excessively complex model. A *transition layer* is used to control the complexity of the model. It reduces the number of channels with a $1\times 1$ convolutional layer and roughly halves the height and width with an average pooling layer with a stride of 2, further reducing the complexity of the model.


In [8]:
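def transition_block(num_channels):
    blk = nn.Sequential()
    blk.add(nn.BatchNorm(),
            nn.Activation('relu'),
            # 1x1 convolution sets the number of output channels
            nn.Conv2D(num_channels, kernel_size=1),
            # Average pooling with stride 2 downsamples height and width
            nn.AvgPool2D(pool_size=3, strides=2))
    return blk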

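Apply a transition layer with 10 channels to the output `Y` of the first dense block example, which has 23 channels. This reduces the number of output channels to 10. Because the average pooling layer here uses a window of 3 with a stride of 2 and no padding, the height and width shrink from 8 to $\lfloor (8-3)/2 \rfloor + 1 = 3$, slightly less than half; a pooling window of 2 would halve them exactly.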


In [9]:
blk = transition_block(10)
blk.initialize()
blk(Y).shape

(4, 10, 3, 3)
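
The spatial size reported above follows from the usual output-size formula for pooling. The snippet below is a small sketch in plain Python, assuming floor rounding (which is consistent with the shapes printed in this notebook); `pool_output_size` is a hypothetical helper, not an MXNet function.

```python
def pool_output_size(size, pool_size, stride, padding=0):
    """Output height/width of a pooling layer, assuming floor rounding."""
    return (size + 2 * padding - pool_size) // stride + 1


# The transition block above: window 3, stride 2, no padding, 8x8 input.
print(pool_output_size(8, pool_size=3, stride=2))  # 3
# For comparison, a window of 2 would halve the input exactly.
print(pool_output_size(8, pool_size=2, stride=2))  # 4
```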

## DenseNet Model

In [10]:
desnet = nn.Sequential()
desnet.add(nn.Conv2D(64, kernel_size=7, strides=2, padding=3),
        nn.BatchNorm(), nn.Activation('relu'),
        nn.MaxPool2D(pool_size=3, strides=2, padding=1))
# `num_channels`: the current number of channels
num_channels, growth_rate = 64, 32
num_convs_in_dense_blocks = [4, 4, 4, 4]

for i, num_convs in enumerate(num_convs_in_dense_blocks):
    desnet.add(DenseBlock(num_convs, growth_rate))
    # This is the number of output channels in the previous dense block
    num_channels += num_convs * growth_rate
    # A transition layer that halves the number of channels is added between
    # the dense blocks
    if i != len(num_convs_in_dense_blocks) - 1:
        num_channels //= 2
        desnet.add(transition_block(num_channels))
        
desnet.add(nn.BatchNorm(),
        nn.Activation('relu'),
        nn.GlobalAvgPool2D(),
        nn.Dense(10))

In [11]:
desnet.initialize()

In [12]:
X = np.random.uniform(size=(1, 1, 224, 224))
for layer in desnet:
    X = layer(X)
    print(layer.name, 'output shape:\t', X.shape)

conv5 output shape:	 (1, 64, 112, 112)
batchnorm5 output shape:	 (1, 64, 112, 112)
relu5 output shape:	 (1, 64, 112, 112)
pool1 output shape:	 (1, 64, 56, 56)
denseblock2 output shape:	 (1, 192, 56, 56)
sequential13 output shape:	 (1, 96, 27, 27)
denseblock3 output shape:	 (1, 224, 27, 27)
sequential19 output shape:	 (1, 112, 13, 13)
denseblock4 output shape:	 (1, 240, 13, 13)
sequential25 output shape:	 (1, 120, 6, 6)
denseblock5 output shape:	 (1, 248, 6, 6)
batchnorm25 output shape:	 (1, 248, 6, 6)
relu25 output shape:	 (1, 248, 6, 6)
pool5 output shape:	 (1, 248, 1, 1)
dense0 output shape:	 (1, 10)
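
The channel counts and spatial sizes printed above can be cross-checked by hand. The following sketch replays the same arithmetic in plain Python, without MXNet. The helper names `conv_out` and `densenet_shape_schedule` are hypothetical, and the code assumes floor rounding for both convolution and pooling, plus the kernel sizes, strides, and padding used in the cells above.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output height/width of a convolution or pooling layer (floor rounding)."""
    return (size + 2 * padding - kernel) // stride + 1


def densenet_shape_schedule(size=224, num_channels=64, growth_rate=32,
                            num_convs_in_dense_blocks=(4, 4, 4, 4)):
    """Return (stage, channels, height/width) after each dense/transition stage."""
    size = conv_out(size, kernel=7, stride=2, padding=3)  # stem convolution
    size = conv_out(size, kernel=3, stride=2, padding=1)  # stem max pooling
    schedule = []
    last = len(num_convs_in_dense_blocks) - 1
    for i, num_convs in enumerate(num_convs_in_dense_blocks):
        # A dense block adds `growth_rate` channels per convolution block.
        num_channels += num_convs * growth_rate
        schedule.append(('dense', num_channels, size))
        if i != last:
            # The transition layer halves the channels and pools with
            # window 3, stride 2, no padding (as in `transition_block`).
            num_channels //= 2
            size = conv_out(size, kernel=3, stride=2)
            schedule.append(('transition', num_channels, size))
    return schedule


for stage, channels, hw in densenet_shape_schedule():
    print(stage, channels, hw)
```

The rows it produces (192 channels at 56, 96 at 27, 224 at 27, and so on up to 248 at 6) line up with the `denseblock` and `sequential` entries in the printed output.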
