<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#LENET-NETWORK" data-toc-modified-id="LENET-NETWORK-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>LENET NETWORK</a></span></li><li><span><a href="#ALEXNET" data-toc-modified-id="ALEXNET-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>ALEXNET</a></span></li><li><span><a href="#VGG-Blocks" data-toc-modified-id="VGG-Blocks-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>VGG Blocks</a></span></li><li><span><a href="#Network-in-Network-(NiN)-BLOCK" data-toc-modified-id="Network-in-Network-(NiN)-BLOCK-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Network in Network (NiN) BLOCK</a></span></li><li><span><a href="#INCEPTION" data-toc-modified-id="INCEPTION-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>INCEPTION</a></span></li></ul></div>

In [1]:
import mxnet
from mxnet import gluon,npx,np,autograd
from mxnet.gluon import nn
npx.set_np()

## LENET NETWORK
<img src='images/lenet.jpg'>
<img src='../images/lenet.svg'>

 (source: Dive into Deep Learning by Aston Zhang, Zachary C. Lipton, Mu Li, and Alexander J. Smola, pages 252-253)
  

To read more about LeNet, see:

<a href='http://yann.lecun.com/exdb/publis/pdf/lecun-98.pdf'>Gradient-Based Learning Applied to Document Recognition</a>

In [2]:
lenet = nn.Sequential()
lenet.add(nn.Conv2D(channels=6, padding=2, kernel_size=5, activation='sigmoid'),
          nn.AvgPool2D(pool_size=2, strides=2),
          nn.Conv2D(channels=16, kernel_size=5, activation='sigmoid'),
          nn.AvgPool2D(pool_size=2, strides=2),
          nn.Dense(120, activation='sigmoid'),
          nn.Dense(84, activation='sigmoid'),
          nn.Dense(10))

Compared to the original network, we took the liberty of replacing the Gaussian activation in
the last layer with a regular dense layer, which tends to be significantly more convenient to train.
Other than that, this network matches the historical definition of LeNet-5.
Next, let us look at an example. As shown in Fig. 6.6.2, we feed a single-channel example
of size 28 × 28 into the network and perform a forward computation layer by layer, printing the
output shape at each layer to make sure we understand what is happening.

In [3]:
X = np.random.uniform(size=(1, 1, 28, 28))
lenet.initialize()
for layer in lenet:
    X = layer(X)
    print(layer.name, 'output shape:\t', X.shape)

conv0 output shape:	 (1, 6, 28, 28)
pool0 output shape:	 (1, 6, 14, 14)
conv1 output shape:	 (1, 16, 10, 10)
pool1 output shape:	 (1, 16, 5, 5)
dense0 output shape:	 (1, 120)
dense1 output shape:	 (1, 84)
dense2 output shape:	 (1, 10)
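
The spatial sizes printed above can be checked by hand. The sketch below is a minimal pure-Python check, assuming the usual floor-mode rule for convolution and pooling windows, output = floor((n + 2p - k) / s) + 1. The helper name conv_out and the layer table are illustrative, not part of MXNet.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output height/width of a convolution or pooling window (floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1

# (name, kernel, stride, padding) for the four spatial layers of the LeNet above
lenet_layers = [
    ("conv0", 5, 1, 2),
    ("pool0", 2, 2, 0),
    ("conv1", 5, 1, 0),
    ("pool1", 2, 2, 0),
]

size = 28  # height and width of the input example
lenet_sizes = []
for name, kernel, stride, padding in lenet_layers:
    size = conv_out(size, kernel, stride, padding)
    lenet_sizes.append(size)
    print(name, 'height/width:', size)

# The first dense layer sees the flattened pool1 output: 16 channels of 5 x 5
flat_features = 16 * size * size
print('flattened features:', flat_features)
```

The sizes 28, 14, 10 and 5 agree with the shapes printed by the network, and the 400 flattened features are what dense0 receives.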


## ALEXNET
<img src='../images/alexnet.jpg'>
 (source: Dive into Deep Learning by Aston Zhang, Zachary C. Lipton, Mu Li, and Alexander J. Smola page 261)
  
To read more about AlexNet, see:

<a href='https://papers.nips.cc/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf'>ImageNet Classification with Deep Convolutional Neural Networks</a>

In [4]:
alexnet=nn.Sequential()
alexnet.add(nn.Conv2D(channels=96,kernel_size=11,strides=4,activation='relu'),
            nn.MaxPool2D(pool_size=3,strides=2),
            nn.Conv2D(channels=256,kernel_size=5,padding=2,activation='relu'),
            nn.MaxPool2D(pool_size=3,strides=2),
            nn.Conv2D(channels=384,kernel_size=3,padding=1,activation='relu'),
            nn.Conv2D(channels=384,kernel_size=3,padding=1,activation='relu'),
            nn.Conv2D(channels=256,kernel_size=3,padding=1,activation='relu'),
            nn.MaxPool2D(pool_size=3, strides=2),
            # Here, the number of outputs of the fully connected layer is several
            # times larger than that in LeNet. Use the dropout layer to mitigate
            # overfitting
            nn.Dense(4096, activation="relu"), nn.Dropout(0.5),
            nn.Dense(4096, activation="relu"), nn.Dropout(0.5),
            # Output layer. Since we are using Fashion-MNIST, the number of
            # classes is 10, instead of 1000 as in the paper
            nn.Dense(10)
           )

We construct a single-channel data instance with a height and width of 224 to observe the output shape of each layer. The shapes match the diagram above.

In [5]:
X = np.random.uniform(size=(1, 1, 224, 224))
alexnet.initialize()
for layer in alexnet:
    X = layer(X)
    print(layer.name, 'output shape:\t', X.shape)

conv2 output shape:	 (1, 96, 54, 54)
pool2 output shape:	 (1, 96, 26, 26)
conv3 output shape:	 (1, 256, 26, 26)
pool3 output shape:	 (1, 256, 12, 12)
conv4 output shape:	 (1, 384, 12, 12)
conv5 output shape:	 (1, 384, 12, 12)
conv6 output shape:	 (1, 256, 12, 12)
pool4 output shape:	 (1, 256, 5, 5)
dense3 output shape:	 (1, 4096)
dropout0 output shape:	 (1, 4096)
dense4 output shape:	 (1, 4096)
dropout1 output shape:	 (1, 4096)
dense5 output shape:	 (1, 10)
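
To put a number on the remark that AlexNet's fully connected layers are much larger than LeNet's, here is a rough parameter count. It is a sketch that assumes each dense layer holds one weight per input-output pair plus one bias per output; the helper dense_params is illustrative, and the flattened sizes are read off the printed shapes.

```python
def dense_params(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features

# Flattened feature counts taken from the printed shapes
lenet_flat = 16 * 5 * 5      # LeNet pool1 output (1, 16, 5, 5)
alexnet_flat = 256 * 5 * 5   # AlexNet pool4 output (1, 256, 5, 5)

lenet_first = dense_params(lenet_flat, 120)
alexnet_first = dense_params(alexnet_flat, 4096)
alexnet_second = dense_params(4096, 4096)
alexnet_output = dense_params(4096, 10)

print('LeNet first dense layer:   ', lenet_first)
print('AlexNet first dense layer: ', alexnet_first)
print('AlexNet second dense layer:', alexnet_second)
print('AlexNet output layer:      ', alexnet_output)
```

Under these assumptions the first AlexNet dense layer alone holds about 26 million parameters, against roughly 48 thousand in LeNet, which is why dropout is used here.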


## VGG Blocks
<img src='../images/vvg.jpg'>
The vgg_block function below takes two arguments: the number of convolutional layers, num_convs, and the number of output channels, num_channels.

In [6]:
def vgg_block(num_convs, num_channels):
    blk = nn.Sequential()
    for _ in range(num_convs):
        blk.add(nn.Conv2D(num_channels, kernel_size=3,padding=1, activation='relu'))
    blk.add(nn.MaxPool2D(pool_size=2, strides=2))
    return blk

The original VGG network has 5 convolutional blocks, of which the first two have one convolutional layer each and the latter three have two convolutional layers each. The first block has
64 output channels and each subsequent block doubles the number of output channels, until that
number reaches 512. Since this network uses 8 convolutional layers and 3 fully-connected layers,
it is often called VGG-11.

In [7]:
conv_arch = ((1, 64), (1, 128), (2, 256), (2, 512), (2, 512))

The following code implements VGG-11. This is a simple matter of executing a for loop over
conv_arch.

In [8]:
def vgg(conv_arch):
    net=nn.Sequential()
    for (num_convs, num_channels) in conv_arch:
        net.add(vgg_block(num_convs,num_channels))
    net.add(nn.Dense(4096, activation='relu'), nn.Dropout(0.5),
            nn.Dense(4096, activation='relu'), nn.Dropout(0.5),
            nn.Dense(10))
    return net

Next, we will construct a single-channel data example with a height and width of 224 to observe
the output shape of each layer.

In [9]:
vgg11 = vgg(conv_arch)
vgg11.initialize()

In [10]:

X = np.random.uniform(size=(1, 1, 224, 224))
for blk in vgg11:
    X = blk(X)
    print(blk.name, 'output shape:\t', X.shape)

sequential3 output shape:	 (1, 64, 112, 112)
sequential4 output shape:	 (1, 128, 56, 56)
sequential5 output shape:	 (1, 256, 28, 28)
sequential6 output shape:	 (1, 512, 14, 14)
sequential7 output shape:	 (1, 512, 7, 7)
dense6 output shape:	 (1, 4096)
dropout2 output shape:	 (1, 4096)
dense7 output shape:	 (1, 4096)
dropout3 output shape:	 (1, 4096)
dense8 output shape:	 (1, 10)
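
The name VGG-11 and the shapes above can be recovered from conv_arch alone. The sketch below assumes, as in vgg_block, that every 3 × 3 convolution with padding 1 keeps the height and width and that every 2 × 2 pooling with stride 2 halves them; the variable names other than conv_arch are illustrative.

```python
conv_arch = ((1, 64), (1, 128), (2, 256), (2, 512), (2, 512))

# Layers with weights: the convolutions in all blocks plus the 3 dense layers
num_conv_layers = sum(num_convs for num_convs, _ in conv_arch)
num_dense_layers = 3
total_layers = num_conv_layers + num_dense_layers
print('conv layers:', num_conv_layers, 'total:', total_layers)

# Each block ends in one pooling layer, so the spatial size halves per block
size = 224
block_sizes = []
for _, num_channels in conv_arch:
    size //= 2
    block_sizes.append((num_channels, size))
print('(channels, height/width) per block:', block_sizes)

# Number of features that reach the first dense layer
flat_features = conv_arch[-1][1] * size * size
print('flattened features:', flat_features)
```

Five halvings take 224 down to 7, so the first dense layer receives 512 × 7 × 7 = 25088 features.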


## Network in Network (NiN) BLOCK
<img src='../images/nin.jpg'/>
The NiN block consists of one convolutional layer followed by two 1 × 1 convolutional layers that
act as per-pixel fully-connected layers with ReLU activations. The window shape of the first
layer is typically set by the user. The subsequent window shapes are fixed to 1 × 1.

In [11]:
def nin_block(num_channels, kernel_size, strides, padding):
    blk = nn.Sequential()
    blk.add(nn.Conv2D(num_channels, kernel_size, strides, padding,activation='relu'),
            nn.Conv2D(num_channels, kernel_size=1, activation='relu'),
            nn.Conv2D(num_channels, kernel_size=1, activation='relu'))
    return blk
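
To see why a 1 × 1 convolution behaves like a fully-connected layer applied at every pixel, consider this small pure-Python sketch. It uses nested lists instead of MXNet arrays and leaves out the bias and the ReLU for brevity; the functions conv1x1 and dense and the toy values are illustrative assumptions.

```python
def conv1x1(x, w):
    """1 x 1 convolution. x is [c_in][h][w], w is [c_out][c_in]."""
    c_in, height, width = len(x), len(x[0]), len(x[0][0])
    return [[[sum(w[o][c] * x[c][i][j] for c in range(c_in))
              for j in range(width)]
             for i in range(height)]
            for o in range(len(w))]

def dense(vec, w):
    """Fully connected layer without bias: one dot product per output."""
    return [sum(w_oc * v for w_oc, v in zip(row, vec)) for row in w]

# Toy input with 2 channels and a 2 x 2 spatial grid
x = [[[1, 2], [3, 4]],
     [[5, 6], [7, 8]]]
# Weights mapping 2 input channels to 3 output channels
w = [[1, 0], [0, 1], [1, -1]]

y = conv1x1(x, w)

# Pick one pixel and apply the dense layer to its channel vector
i, j = 1, 0
pixel = [x[c][i][j] for c in range(len(x))]
from_dense = dense(pixel, w)
from_conv = [y[o][i][j] for o in range(len(w))]
print(from_dense, from_conv)
```

Both lists are the same for every pixel, because the 1 × 1 kernel mixes channels without looking at neighbouring positions.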

In [12]:
nin = nn.Sequential()
nin.add(nin_block(96, kernel_size=11, strides=4, padding=0),
        nn.MaxPool2D(pool_size=3, strides=2),
        nin_block(256, kernel_size=5, strides=1, padding=2),
        nn.MaxPool2D(pool_size=3, strides=2),
        nin_block(384, kernel_size=3, strides=1, padding=1),
        nn.MaxPool2D(pool_size=3, strides=2),
        nn.Dropout(0.5),
        # There are 10 label classes
        nin_block(10, kernel_size=3, strides=1, padding=1),
        # The global average pooling layer automatically sets the window shape
        # to the height and width of the input
        nn.GlobalAvgPool2D(),
        # Transform the four-dimensional output into two-dimensional output
        # with a shape of (batch size, 10)
        nn.Flatten())

In [13]:
X = np.random.uniform(size=(1, 1, 224, 224))
nin.initialize()
for layer in nin:
    X = layer(X)
    print(layer.name, 'output shape:\t', X.shape)

sequential9 output shape:	 (1, 96, 54, 54)
pool10 output shape:	 (1, 96, 26, 26)
sequential10 output shape:	 (1, 256, 26, 26)
pool11 output shape:	 (1, 256, 12, 12)
sequential11 output shape:	 (1, 384, 12, 12)
pool12 output shape:	 (1, 384, 5, 5)
dropout4 output shape:	 (1, 384, 5, 5)
sequential12 output shape:	 (1, 10, 5, 5)
pool13 output shape:	 (1, 10, 1, 1)
flatten0 output shape:	 (1, 10)
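
The last two lines of the output show global average pooling collapsing each 5 × 5 channel into a single value. A minimal pure-Python sketch of that reduction for one example follows; the function global_avg_pool and the toy input are illustrative and stand in for nn.GlobalAvgPool2D followed by nn.Flatten.

```python
def global_avg_pool(x):
    """Average every channel over its full height and width.

    x is [channels][h][w]; the result is a flat list with one mean per channel.
    """
    return [sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
            for channel in x]

# Toy input with 2 channels and a 2 x 2 spatial grid
x = [[[1.0, 2.0], [3.0, 4.0]],
     [[0.0, 0.0], [0.0, 8.0]]]

pooled = global_avg_pool(x)
print(pooled)
```

In NiN the last block has 10 channels, so this reduction yields one score per class without any dense layer.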


## INCEPTION
<img src="../images/inception.jpg" />

In [16]:
class Inception_block(nn.Block):
    def __init__(self,c1,c2,c3,c4,**kwargs):
        super().__init__(**kwargs)
        # Path 1 is a single 1 x 1 convolutional layer
        self.p1_1=nn.Conv2D(c1,kernel_size=1,activation='relu')
        # Path 2 is a 1 x 1 convolutional layer followed by a 3 x 3
        # convolutional layer
        self.p2_1=nn.Conv2D(c2[0],kernel_size=1,activation='relu')
        self.p2_2=nn.Conv2D(c2[1],kernel_size=3,padding=1,activation='relu')
        # Path 3 is a 1 x 1 convolutional layer followed by a 5 x 5
        # convolutional layer
        self.p3_1=nn.Conv2D(c3[0],kernel_size=1,activation='relu')
        self.p3_2=nn.Conv2D(c3[1],kernel_size=5,padding=2,activation='relu')
        # Path 4 is a 3 x 3 maximum pooling layer followed by a 1 x 1
        # convolutional layer
        self.p4_1=nn.MaxPool2D(pool_size=3,padding=1,strides=1)
        self.p4_2=nn.Conv2D(c4,kernel_size=1,activation='relu')
    def forward(self,x):
        p1=self.p1_1(x)
        p2=self.p2_2(self.p2_1(x))
        p3=self.p3_2(self.p3_1(x))
        p4=self.p4_2(self.p4_1(x))
        return np.concatenate((p1,p2,p3,p4),axis=1)

GoogLeNet uses a stack of 9 inception blocks in total and global average pooling to generate its estimates. Maximum pooling between inception blocks reduces the
dimensionality. The first part is similar to AlexNet and LeNet, the stack of blocks is inherited
from VGG, and the global average pooling avoids a stack of fully-connected layers at the end. The
architecture is depicted below.
<img src="../images/inception1.jpg" />
<img src="../images/inception2.jpg" />

In [17]:
inception=nn.Sequential()
inception.add(nn.Conv2D(64,kernel_size=7,strides=2,padding=1,activation='relu'),
              nn.MaxPool2D(pool_size=3,padding=1,strides=2),
              
              nn.Conv2D(64,kernel_size=1,activation='relu'),
              nn.Conv2D(192,kernel_size=3,padding=1,activation='relu'),
              nn.MaxPool2D(pool_size=3,padding=1,strides=2),
              # inception(3a)
              Inception_block(c1=64,c2=(96,128),c3=(16,32),c4=32),
              # inception(3b)
              Inception_block(c1=128,c2=(128,192),c3=(32,96),c4=64),
              nn.MaxPool2D(pool_size=3,strides=2,padding=1),
              # inception(4a)
              Inception_block(c1=192,c2=(96,208),c3=(16,48),c4=64),
              # inception(4b)
              Inception_block(c1=160,c2=(112,224),c3=(24,64),c4=64),
              # inception(4c)
              Inception_block(c1=128,c2=(128,256),c3=(24,64),c4=64),
              # inception(4d)
              Inception_block(112,(144,288),(32,64),64),
              # inception(4e)
              Inception_block(256,(160,320),(32,128),128),
              nn.MaxPool2D(pool_size=3,strides=2,padding=1),
              # inception(5a)
              Inception_block(256, (160, 320), (32, 128), 128),
              # inception(5b)
              Inception_block(384, (192, 384), (48, 128), 128),
              nn.GlobalAvgPool2D(),
              nn.Dense(10)
              )

In [18]:
X = np.random.uniform(size=(1, 1, 96, 96))
inception.initialize()
for layer in inception:
    X = layer(X)
    print(layer.name, 'output shape:\t', X.shape)

conv28 output shape:	 (1, 64, 46, 46)
pool14 output shape:	 (1, 64, 23, 23)
conv29 output shape:	 (1, 64, 23, 23)
conv30 output shape:	 (1, 192, 23, 23)
pool15 output shape:	 (1, 192, 12, 12)
inception_block0 output shape:	 (1, 256, 12, 12)
inception_block1 output shape:	 (1, 480, 12, 12)
pool18 output shape:	 (1, 480, 6, 6)
inception_block2 output shape:	 (1, 512, 6, 6)
inception_block3 output shape:	 (1, 512, 6, 6)
inception_block4 output shape:	 (1, 512, 6, 6)
inception_block5 output shape:	 (1, 528, 6, 6)
inception_block6 output shape:	 (1, 832, 6, 6)
pool24 output shape:	 (1, 832, 3, 3)
inception_block7 output shape:	 (1, 832, 3, 3)
inception_block8 output shape:	 (1, 1024, 3, 3)
pool27 output shape:	 (1, 1024, 1, 1)
dense9 output shape:	 (1, 10)
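
The channel counts in the output follow from the concatenation along axis 1 in Inception_block.forward: each block outputs c1 + c2[1] + c3[1] + c4 channels, since only the last layer of each path contributes. The sketch below recomputes them from the arguments used above; the helper inception_out_channels is illustrative.

```python
def inception_out_channels(c1, c2, c3, c4):
    """Channels after concatenating the four paths of an inception block."""
    return c1 + c2[1] + c3[1] + c4

# (c1, c2, c3, c4) for blocks 3a, 3b, 4a-4e, 5a, 5b, copied from the network above
blocks = [
    (64, (96, 128), (16, 32), 32),
    (128, (128, 192), (32, 96), 64),
    (192, (96, 208), (16, 48), 64),
    (160, (112, 224), (24, 64), 64),
    (128, (128, 256), (24, 64), 64),
    (112, (144, 288), (32, 64), 64),
    (256, (160, 320), (32, 128), 128),
    (256, (160, 320), (32, 128), 128),
    (384, (192, 384), (48, 128), 128),
]

channels = [inception_out_channels(*block) for block in blocks]
print(channels)
```

The resulting list matches the second dimension of the nine inception_block shapes printed above.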
