<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#LENET-NETWORK" data-toc-modified-id="LENET-NETWORK-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>LENET NETWORK</a></span></li><li><span><a href="#ALEXNET" data-toc-modified-id="ALEXNET-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>ALEXNET</a></span></li><li><span><a href="#VGG-Blocks" data-toc-modified-id="VGG-Blocks-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>VGG Blocks</a></span></li><li><span><a href="#Network-in-Network-(NiN)-BLOCK" data-toc-modified-id="Network-in-Network-(NiN)-BLOCK-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Network in Network (NiN) BLOCK</a></span></li><li><span><a href="#INCEPTION" data-toc-modified-id="INCEPTION-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>INCEPTION</a></span></li></ul></div>

In [2]:
import torch
from torchsummary import summary
import torch.nn.functional as F
from torch import nn

## LENET NETWORK
<img src='../images/lenet.jpg'>
<img src='../images/lenet.svg'>

 (source: Dive into Deep Learning by Aston Zhang, Zachary C. Lipton, Mu Li, and Alexander J. Smola, pages 252-253)
  

To read more about LeNet, see:

<a href='http://yann.lecun.com/exdb/publis/pdf/lecun-98.pdf'>Gradient-Based Learning Applied to Document Recognition</a>

In [2]:
class Reshape(nn.Module):
    def forward(self, x):
        # Reshape any input batch to (batch, 1, 28, 28)
        return x.view(-1, 1, 28, 28)

lenet = nn.Sequential(
    Reshape(),
    nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5, padding=2), nn.Sigmoid(),
    nn.AvgPool2d(kernel_size=2, stride=2),
    nn.Conv2d(6, 16, kernel_size=5), nn.Sigmoid(),
    nn.AvgPool2d(kernel_size=2, stride=2),
    nn.Flatten(),
    nn.Linear(16 * 5 * 5, 120), nn.Sigmoid(),
    nn.Linear(120, 84), nn.Sigmoid(),
    nn.Linear(84, 10))

In [3]:
class Lenet(nn.Module):
    def __init__(self):
        super().__init__()
        self.c1=nn.Conv2d(in_channels=1,out_channels=6,kernel_size=5,padding=2)
        self.avg1=nn.AvgPool2d(kernel_size=2,stride=2)
        self.c2=nn.Conv2d(6,16,kernel_size=5)
        self.avg2=nn.AvgPool2d(kernel_size=2,stride=2)
        self.f=nn.Flatten()
        self.l1=nn.Linear(16*5*5,120)
        self.l2=nn.Linear(120,84)
        self.l3=nn.Linear(84,10)
        self.relu=nn.ReLU()
    def forward(self,x):
        x=x.reshape(-1,1,28,28)
        h1=self.avg1(self.relu(self.c1(x)))
        h2=self.avg2(self.relu(self.c2(h1)))
        h2=self.f(h2)
        h3=self.relu(self.l1(h2))
        h4=self.relu(self.l2(h3))
        output=self.l3(h4)
        return output

In [4]:
lenet1=Lenet()

Compared with the original network, we took the liberty of replacing the Gaussian activation in
the last layer with a regular dense layer, which tends to be significantly more convenient to train.
The `Lenet` class above additionally swaps the sigmoid activations for ReLU, while the `lenet`
Sequential model keeps the sigmoids. Other than that, this network matches the historical
definition of LeNet-5.
Next, let us take a look at an example. As shown in Fig. 6.6.2, we feed a single-channel example
of size 28 × 28 into the network and perform a forward computation layer by layer, printing the
output shape at each layer to make sure we understand what is happening.

In [5]:
X = torch.randn(size=(1, 1, 28, 28), dtype=torch.float32)
summary(lenet1,X)

Layer (type:depth-idx)                   Output Shape              Param #
├─Conv2d: 1-1                            [-1, 6, 28, 28]           156
├─ReLU: 1-2                              [-1, 6, 28, 28]           --
├─AvgPool2d: 1-3                         [-1, 6, 14, 14]           --
├─Conv2d: 1-4                            [-1, 16, 10, 10]          2,416
├─ReLU: 1-5                              [-1, 16, 10, 10]          --
├─AvgPool2d: 1-6                         [-1, 16, 5, 5]            --
├─Flatten: 1-7                           [-1, 400]                 --
├─Linear: 1-8                            [-1, 120]                 48,120
├─ReLU: 1-9                              [-1, 120]                 --
├─Linear: 1-10                           [-1, 84]                  10,164
├─ReLU: 1-11                             [-1, 84]                  --
├─Linear: 1-12                           [-1, 10]                  850
Total params: 61,706
Trainable params: 61,706
Non-trainable params: 0
To
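The parameter counts in the summary table can be reproduced by hand. The snippet below is a minimal sketch in plain Python, assuming every layer has a bias term (the PyTorch default); the helper names `conv2d_params` and `linear_params` are hypothetical and exist only for this illustration.

```python
def conv2d_params(in_channels, out_channels, kernel_size):
    """k*k*in_channels weights plus one bias, for each output channel."""
    return (kernel_size * kernel_size * in_channels + 1) * out_channels


def linear_params(in_features, out_features):
    """in_features weights plus one bias, for each output unit."""
    return (in_features + 1) * out_features


# Layers of the LeNet model in the order they appear in the summary
counts = [
    conv2d_params(1, 6, 5),          # first convolution
    conv2d_params(6, 16, 5),         # second convolution
    linear_params(16 * 5 * 5, 120),  # first dense layer
    linear_params(120, 84),          # second dense layer
    linear_params(84, 10),           # output layer
]

print(counts)       # per-layer parameter counts
print(sum(counts))  # total number of trainable parameters
```

The per-layer values and their sum should agree with the `Param #` column and the `Total params` line of the table.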

In [7]:
X = torch.randn(size=(1, 1, 28, 28), dtype=torch.float32)
for layer in lenet:
    X = layer(X)
    print(layer.__class__.__name__,'output shape: \t',X.shape)

Reshape output shape: 	 torch.Size([1, 1, 28, 28])
Conv2d output shape: 	 torch.Size([1, 6, 28, 28])
Sigmoid output shape: 	 torch.Size([1, 6, 28, 28])
AvgPool2d output shape: 	 torch.Size([1, 6, 14, 14])
Conv2d output shape: 	 torch.Size([1, 16, 10, 10])
Sigmoid output shape: 	 torch.Size([1, 16, 10, 10])
AvgPool2d output shape: 	 torch.Size([1, 16, 5, 5])
Flatten output shape: 	 torch.Size([1, 400])
Linear output shape: 	 torch.Size([1, 120])
Sigmoid output shape: 	 torch.Size([1, 120])
Linear output shape: 	 torch.Size([1, 84])
Sigmoid output shape: 	 torch.Size([1, 84])
Linear output shape: 	 torch.Size([1, 10])


## ALEXNET
<img src='../images/alexnet.jpg'>
 (source: Dive into Deep Learning by Aston Zhang, Zachary C. Lipton, Mu Li, and Alexander J. Smola page 261)
  
To read more about AlexNet, see:

<a href='https://papers.nips.cc/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf'>ImageNet Classification with Deep Convolutional Neural Networks</a>

In [8]:
alexnet=nn.Sequential(
    nn.Conv2d(in_channels=1,out_channels=96,stride=4,kernel_size=11),nn.ReLU(),
    nn.MaxPool2d(kernel_size=3,stride=2),
    nn.Conv2d(96,256,padding=2,kernel_size=5),nn.ReLU(),
    nn.MaxPool2d(kernel_size=3,stride=2),
    nn.Conv2d(256,384,kernel_size=3,padding=1),nn.ReLU(),
    nn.Conv2d(384,384,kernel_size=3,padding=1),nn.ReLU(),
    nn.Conv2d(384,384,kernel_size=3,padding=1),nn.ReLU(),
    nn.MaxPool2d(kernel_size=3,stride=2),
    nn.Flatten(),
    nn.Linear(384*5*5,4096),nn.ReLU(),nn.Dropout(p=0.5),
    nn.Linear(4096,4096),nn.ReLU(),nn.Dropout(p=0.5),
    # Output layer. Since we are using Fashion-MNIST, the number of classes is
    # 10, instead of 1000 as in the paper
    nn.Linear(4096,10))

We construct a single-channel data instance with a height and width of 224 to observe the output shape of each layer. The shapes match the diagram above.

In [9]:
X = torch.rand(size=(1, 1, 224, 224))

for layer in alexnet:
    X = layer(X)
    print(layer.__class__.__name__, 'output shape:\t', X.shape)

Conv2d output shape:	 torch.Size([1, 96, 54, 54])
ReLU output shape:	 torch.Size([1, 96, 54, 54])
MaxPool2d output shape:	 torch.Size([1, 96, 26, 26])
Conv2d output shape:	 torch.Size([1, 256, 26, 26])
ReLU output shape:	 torch.Size([1, 256, 26, 26])
MaxPool2d output shape:	 torch.Size([1, 256, 12, 12])
Conv2d output shape:	 torch.Size([1, 384, 12, 12])
ReLU output shape:	 torch.Size([1, 384, 12, 12])
Conv2d output shape:	 torch.Size([1, 384, 12, 12])
ReLU output shape:	 torch.Size([1, 384, 12, 12])
Conv2d output shape:	 torch.Size([1, 384, 12, 12])
ReLU output shape:	 torch.Size([1, 384, 12, 12])
MaxPool2d output shape:	 torch.Size([1, 384, 5, 5])
Flatten output shape:	 torch.Size([1, 9600])
Linear output shape:	 torch.Size([1, 4096])
ReLU output shape:	 torch.Size([1, 4096])
Dropout output shape:	 torch.Size([1, 4096])
Linear output shape:	 torch.Size([1, 4096])
ReLU output shape:	 torch.Size([1, 4096])
Dropout output shape:	 torch.Size([1, 4096])
Linear output shape:	 torch.Size([1,
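The spatial sizes printed above follow from the usual output-size formula `floor((n + 2 * padding - kernel_size) / stride) + 1`. The snippet below is a minimal sketch that traces a 224 × 224 input through the convolution and pooling layers of `alexnet`, assuming floor rounding (the default for `Conv2d`, and for `MaxPool2d` when `ceil_mode` is not set). The names `out_size` and `layers` are hypothetical and used only for this illustration.

```python
def out_size(n, kernel_size, stride=1, padding=0):
    """Output height/width of a convolution or pooling layer (floor rounding)."""
    return (n + 2 * padding - kernel_size) // stride + 1


# (name, kernel_size, stride, padding) for each spatial layer of `alexnet`
layers = [
    ("Conv2d", 11, 4, 0),
    ("MaxPool2d", 3, 2, 0),
    ("Conv2d", 5, 1, 2),
    ("MaxPool2d", 3, 2, 0),
    ("Conv2d", 3, 1, 1),
    ("Conv2d", 3, 1, 1),
    ("Conv2d", 3, 1, 1),
    ("MaxPool2d", 3, 2, 0),
]

size = 224
sizes = []
for name, kernel_size, stride, padding in layers:
    size = out_size(size, kernel_size, stride, padding)
    sizes.append(size)
    print(f"{name}: {size} x {size}")

# Number of inputs to the first Linear layer: channels * height * width
flat_features = 384 * size * size
print(flat_features)
```

The final value is the reason the first fully-connected layer is declared as `nn.Linear(384*5*5, 4096)`.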

## VGG Blocks
<img src='../images/vvg.jpg'>
The `vgg_block` function defined below takes three arguments: the number of convolutional layers `num_convs`, the number of input channels `in_channels`, and the number of output channels `out_channels`.


To read more about VGG, see:

<a href='https://arxiv.org/pdf/1409.1556.pdf'>Very Deep Convolutional Networks for Large-Scale Image Recognition</a>

<img src='../images/vgg.png'>

In [10]:
vgg_1=nn.Sequential(
    nn.Conv2d(1,64,kernel_size=3,padding=1),nn.ReLU(),
    nn.Conv2d(64,64,kernel_size=3,padding=1),nn.ReLU(),nn.MaxPool2d(kernel_size=2, stride=2),
    
    nn.Conv2d(64,128,kernel_size=3,padding=1),nn.ReLU(),
    nn.Conv2d(128,128,kernel_size=3,padding=1),nn.ReLU(),nn.MaxPool2d(kernel_size=2, stride=2),
    
    nn.Conv2d(128,256,kernel_size=3,padding=1),nn.ReLU(),
    nn.Conv2d(256,256,kernel_size=3,padding=1),nn.ReLU(),
    nn.Conv2d(256,256,kernel_size=3,padding=1),nn.ReLU(),nn.MaxPool2d(kernel_size=2, stride=2),
    
    nn.Conv2d(256,512,kernel_size=3,padding=1),nn.ReLU(),
    nn.Conv2d(512,512,kernel_size=3,padding=1),nn.ReLU(),
    nn.Conv2d(512,512,kernel_size=3,padding=1),nn.ReLU(),nn.MaxPool2d(kernel_size=2, stride=2),
      
    nn.Conv2d(512,512,kernel_size=3,padding=1),nn.ReLU(),
    nn.Conv2d(512,512,kernel_size=3,padding=1),nn.ReLU(),
    nn.Conv2d(512,512,kernel_size=3,padding=1),nn.ReLU(),nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Flatten(),
        # The fully-connected part
    nn.Linear(512 * 7 * 7, 4096), nn.ReLU(), nn.Dropout(0.5),
    nn.Linear(4096, 4096), nn.ReLU(), nn.Dropout(0.5),
    nn.Linear(4096, 10))

In [11]:
def vgg_block(num_convs, in_channels,out_channels):
    layers=[]
    for _ in range(num_convs):
        layers.append(nn.Conv2d(in_channels,out_channels, kernel_size=3,padding=1))
        layers.append(nn.ReLU())
        in_channels=out_channels
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    blk=nn.Sequential(*layers)
    return blk

The original VGG-11 configuration has 5 convolutional blocks, of which the first two have one convolutional layer each and the last three have two each. Here we instead implement the variant shown in the diagram above, in which the first two blocks have two convolutional layers each and the last three have three each.

The first block has 64 output channels and each subsequent block doubles the number of output channels until that number reaches 512. Since this network uses 13 convolutional layers and 3 fully-connected layers, it is often called VGG-16.
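The layer count and the input size of the first fully-connected layer can be checked with a little arithmetic. This is a minimal sketch in plain Python, assuming the same `conv_arch` tuple that the notebook uses for this network, a 224 × 224 input, and one 2 × 2 max-pooling layer with stride 2 at the end of every block.

```python
# (number of convolutions, output channels) for each of the five VGG blocks
conv_arch = ((2, 64), (2, 128), (3, 256), (3, 512), (3, 512))

num_conv_layers = sum(num_convs for num_convs, _ in conv_arch)
num_fc_layers = 3
print(num_conv_layers + num_fc_layers)  # total number of weighted layers

# Every block ends with a pooling layer that halves the height and width
spatial = 224
for _ in conv_arch:
    spatial //= 2

# Inputs to the first Linear layer: channels * height * width
last_channels = conv_arch[-1][1]
flat_features = last_channels * spatial * spatial
print(spatial, flat_features)
```

The spatial size after the fifth block explains the `out_channels * 7 * 7` term in the first `nn.Linear` layer.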

In [12]:
conv_arch = ((2, 64), (2, 128), (3, 256), (3, 512), (3, 512))

In [13]:
def vgg(conv_arch):
    # The convolutional part
    conv_blks=[]
    in_channels=1
    for (num_convs, out_channels) in conv_arch:
        conv_blks.append(vgg_block(num_convs, in_channels, out_channels))
        in_channels = out_channels

    return nn.Sequential(
        *conv_blks, 
        nn.Flatten(),
        # The fully-connected part
        nn.Linear(out_channels * 7 * 7, 4096), nn.ReLU(), nn.Dropout(0.5),
        nn.Linear(4096, 4096), nn.ReLU(), nn.Dropout(0.5),
        nn.Linear(4096, 10))

Next, we will construct a single-channel data example with a height and width of 224 to observe
the output shape of each layer.

In [14]:
vgg_2 = vgg(conv_arch)

In [15]:
X = torch.randn(size=(1, 1, 224, 224))
summary(vgg_2,X)

Layer (type:depth-idx)                   Output Shape              Param #
├─Sequential: 1-1                        [-1, 64, 112, 112]        --
|    └─Conv2d: 2-1                       [-1, 64, 224, 224]        640
|    └─ReLU: 2-2                         [-1, 64, 224, 224]        --
|    └─Conv2d: 2-3                       [-1, 64, 224, 224]        36,928
|    └─ReLU: 2-4                         [-1, 64, 224, 224]        --
|    └─MaxPool2d: 2-5                    [-1, 64, 112, 112]        --
├─Sequential: 1-2                        [-1, 128, 56, 56]         --
|    └─Conv2d: 2-6                       [-1, 128, 112, 112]       73,856
|    └─ReLU: 2-7                         [-1, 128, 112, 112]       --
|    └─Conv2d: 2-8                       [-1, 128, 112, 112]       147,584
|    └─ReLU: 2-9                         [-1, 128, 112, 112]       --
|    └─MaxPool2d: 2-10                   [-1, 128, 56, 56]         --
├─Sequential: 1-3                        [-1, 256, 28, 28]         --
|

In [16]:
X = torch.randn(size=(1, 1, 224, 224))
summary(vgg_1,X)

Layer (type:depth-idx)                   Output Shape              Param #
├─Conv2d: 1-1                            [-1, 64, 224, 224]        640
├─ReLU: 1-2                              [-1, 64, 224, 224]        --
├─Conv2d: 1-3                            [-1, 64, 224, 224]        36,928
├─ReLU: 1-4                              [-1, 64, 224, 224]        --
├─MaxPool2d: 1-5                         [-1, 64, 112, 112]        --
├─Conv2d: 1-6                            [-1, 128, 112, 112]       73,856
├─ReLU: 1-7                              [-1, 128, 112, 112]       --
├─Conv2d: 1-8                            [-1, 128, 112, 112]       147,584
├─ReLU: 1-9                              [-1, 128, 112, 112]       --
├─MaxPool2d: 1-10                        [-1, 128, 56, 56]         --
├─Conv2d: 1-11                           [-1, 256, 56, 56]         295,168
├─ReLU: 1-12                             [-1, 256, 56, 56]         --
├─Conv2d: 1-13                           [-1, 256, 56, 56]        

## Network in Network (NiN) BLOCK
<img src='../images/nin.jpg'/>
The NiN block consists of one convolutional layer followed by two 1 × 1 convolutional layers that
act as per-pixel fully-connected layers with ReLU activations. The kernel size of the first
layer is typically set by the user; the subsequent kernel sizes are fixed to 1 × 1.
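To make the "per-pixel fully-connected layer" idea concrete, the snippet below is a minimal sketch in plain Python (no PyTorch) on a tiny hand-written input. The functions `conv1x1` and `dense` and all numeric values are hypothetical and chosen only for illustration: a 1 × 1 convolution with weights of shape `[out_channels][in_channels]` gives the same result as applying one dense layer independently to the channel vector at every pixel.

```python
def conv1x1(x, weight, bias):
    """1 x 1 convolution on x[c_in][h][w] with weight[c_out][c_in]."""
    c_in, height, width = len(x), len(x[0]), len(x[0][0])
    return [
        [
            [
                bias[o] + sum(weight[o][c] * x[c][i][j] for c in range(c_in))
                for j in range(width)
            ]
            for i in range(height)
        ]
        for o in range(len(weight))
    ]


def dense(pixel, weight, bias):
    """Fully-connected layer applied to the channel vector of one pixel."""
    return [b + sum(w * v for w, v in zip(row, pixel))
            for row, b in zip(weight, bias)]


x = [[[1.0, 2.0], [3.0, 4.0]],    # input channel 0 (2 x 2 image)
     [[0.5, -1.0], [2.0, 0.0]]]   # input channel 1
weight = [[1.0, 0.0], [0.5, 2.0], [-1.0, 1.0]]  # 3 output channels, 2 input channels
bias = [0.0, 1.0, -0.5]

y = conv1x1(x, weight, bias)

# Apply the dense layer at every pixel and compare with the convolution
matches = all(
    dense([x[c][i][j] for c in range(2)], weight, bias)
    == [y[o][i][j] for o in range(3)]
    for i in range(2)
    for j in range(2)
)
print(matches)
```

The same weights are shared across all pixel positions, which is what distinguishes this from an ordinary fully-connected layer over the flattened image.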

In [17]:
def nin_block(in_channels, out_channels, kernel_size, strides, padding):
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size, strides, padding),
        nn.ReLU(),
        nn.Conv2d(out_channels, out_channels, kernel_size=1), nn.ReLU(),
        nn.Conv2d(out_channels, out_channels, kernel_size=1), nn.ReLU())

In [18]:
nin=nn.Sequential(
  nin_block(in_channels=1,out_channels=96,kernel_size=11,strides=4,padding=0),
    nn.MaxPool2d(kernel_size=3,stride=2),
    nin_block(in_channels=96,out_channels=256,kernel_size=5,strides=1,padding=2),
    nn.MaxPool2d(kernel_size=3,stride=2),
    nin_block(in_channels=256,out_channels=384,kernel_size=5,strides=1,padding=1),
    nn.MaxPool2d(kernel_size=3,stride=2),
    nn.Dropout(0.5),
    nin_block(in_channels=384,out_channels=10,kernel_size=3,strides=1,padding=2),
    nn.AdaptiveAvgPool2d((1,1)),
    nn.Flatten()
    
)

In [19]:
X = torch.randn(size=(1, 1, 224, 224))
summary(nin,X)

Layer (type:depth-idx)                   Output Shape              Param #
├─Sequential: 1-1                        [-1, 96, 54, 54]          --
|    └─Conv2d: 2-1                       [-1, 96, 54, 54]          11,712
|    └─ReLU: 2-2                         [-1, 96, 54, 54]          --
|    └─Conv2d: 2-3                       [-1, 96, 54, 54]          9,312
|    └─ReLU: 2-4                         [-1, 96, 54, 54]          --
|    └─Conv2d: 2-5                       [-1, 96, 54, 54]          9,312
|    └─ReLU: 2-6                         [-1, 96, 54, 54]          --
├─MaxPool2d: 1-2                         [-1, 96, 26, 26]          --
├─Sequential: 1-3                        [-1, 256, 26, 26]         --
|    └─Conv2d: 2-7                       [-1, 256, 26, 26]         614,656
|    └─ReLU: 2-8                         [-1, 256, 26, 26]         --
|    └─Conv2d: 2-9                       [-1, 256, 26, 26]         65,792
|    └─ReLU: 2-10                        [-1, 256, 26, 26]        

In [20]:
X = torch.randn(size=(1, 1, 224, 224))
for layer in nin:
    X = layer(X)
    print(layer.__class__.__name__, 'output shape:\t', X.shape)

Sequential output shape:	 torch.Size([1, 96, 54, 54])
MaxPool2d output shape:	 torch.Size([1, 96, 26, 26])
Sequential output shape:	 torch.Size([1, 256, 26, 26])
MaxPool2d output shape:	 torch.Size([1, 256, 12, 12])
Sequential output shape:	 torch.Size([1, 384, 10, 10])
MaxPool2d output shape:	 torch.Size([1, 384, 4, 4])
Dropout output shape:	 torch.Size([1, 384, 4, 4])
Sequential output shape:	 torch.Size([1, 10, 6, 6])
AdaptiveAvgPool2d output shape:	 torch.Size([1, 10, 1, 1])
Flatten output shape:	 torch.Size([1, 10])


## INCEPTION
<img src="../images/inception.jpg" />

In [21]:
class Inception_block(nn.Module):
    def __init__(self,in_channels,c1,c2,c3,c4,**kwargs):
        super().__init__(**kwargs)
        # Path 1 is a single 1 x 1 convolutional layer
        self.p1_1=nn.Conv2d(in_channels,out_channels=c1,kernel_size=1)
        # Path 2 is a 1 x 1 convolutional layer followed by a 3 x 3
        # convolutional layer
        self.p2_1=nn.Conv2d(in_channels,out_channels=c2[0],kernel_size=1)
        self.p2_2=nn.Conv2d(c2[0],c2[1],kernel_size=3,padding=1)
        # Path 3 is a 1 x 1 convolutional layer followed by a 5 x 5
        # convolutional layer
        self.p3_1=nn.Conv2d(in_channels,out_channels=c3[0],kernel_size=1)
        self.p3_2=nn.Conv2d(c3[0],c3[1],kernel_size=5,padding=2)
        # Path 4 is a 3 x 3 maximum pooling layer followed by a 1 x 1
        # convolutional layer
        self.p4_1=nn.MaxPool2d(kernel_size=3,padding=1,stride=1)
        self.p4_2=nn.Conv2d(in_channels,out_channels=c4,kernel_size=1)
        self.relu=nn.ReLU()
    def forward(self,x):
        p1=self.relu(self.p1_1(x))
        p2=self.relu(self.p2_2(self.relu(self.p2_1(x))))
        p3=self.relu(self.p3_2(self.relu(self.p3_1(x))))
        p4=self.relu(self.p4_2(self.p4_1(x)))
        return torch.cat((p1, p2, p3, p4), dim=1)

GoogLeNet uses a stack of 9 Inception blocks followed by global average pooling to generate its estimates. Maximum pooling between Inception blocks reduces the
dimensionality. The first part is similar to AlexNet and LeNet, the stack of blocks is inherited
from VGG, and the global average pooling avoids a stack of fully-connected layers at the end. The
architecture is depicted below.
<img src="../images/inception1.jpg" />
<img src="../images/inception2.jpg" />

In [22]:
inception=nn.Sequential(
        nn.Conv2d(1,64,kernel_size=7,stride=2,padding=1),nn.ReLU(),
              nn.MaxPool2d(kernel_size=3,padding=1,stride=2),
    
              nn.Conv2d(64,64,kernel_size=1),nn.ReLU(),
              nn.Conv2d(64,192,kernel_size=3,padding=1),nn.ReLU(),
              nn.MaxPool2d(kernel_size=3,padding=1,stride=2),
    
              # inception(3a)
              Inception_block(in_channels=192,c1=64,c2=(96,128),c3=(16,32),c4=32),
               # inception(3b)
              Inception_block(in_channels=256,c1=128,c2=(128,192),c3=(32,96),c4=64),
              nn.MaxPool2d(kernel_size=3,stride=2,padding=1),
    
              # inception(4a)
              Inception_block(in_channels=480,c1=192,c2=(96,208),c3=(16,48),c4=64),
              # inception(4b)
              Inception_block(in_channels=512,c1=160,c2=(112,224),c3=(24,64),c4=64),
              # inception(4c)
              Inception_block(in_channels=512,c1=128,c2=(128,256),c3=(24,64),c4=64),
              # inception(4d)
              Inception_block(512,112,(144,288),(32,64),64),
              # inception(4e)
              Inception_block(528,256,(160,320),(32,128),128),
              nn.MaxPool2d(kernel_size=3,stride=2,padding=1),
    
              # inception(5a)
              Inception_block(832,256, (160, 320), (32, 128), 128),
              # inception(5b)
              Inception_block(832,384, (192, 384), (48, 128), 128),
              nn.AdaptiveAvgPool2d((1,1)),
              nn.Flatten(),

              nn.Dropout(0.4),
              nn.Linear(1024,10),
              nn.Softmax(dim=1)
              )
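Because each Inception block concatenates its four paths along the channel dimension, its output has `c1 + c2[1] + c3[1] + c4` channels, and that number must equal the `in_channels` of the next block. The snippet below is a minimal sketch in plain Python that checks this bookkeeping for the nine blocks above; the helper `inception_out_channels` and the list `blocks` are hypothetical names used only for illustration.

```python
def inception_out_channels(c1, c2, c3, c4):
    """Channels after concatenating the four paths along the channel axis."""
    return c1 + c2[1] + c3[1] + c4


# (in_channels, c1, c2, c3, c4) for each Inception block in the model above
blocks = [
    (192, 64, (96, 128), (16, 32), 32),       # 3a
    (256, 128, (128, 192), (32, 96), 64),     # 3b
    (480, 192, (96, 208), (16, 48), 64),      # 4a
    (512, 160, (112, 224), (24, 64), 64),     # 4b
    (512, 128, (128, 256), (24, 64), 64),     # 4c
    (512, 112, (144, 288), (32, 64), 64),     # 4d
    (528, 256, (160, 320), (32, 128), 128),   # 4e
    (832, 256, (160, 320), (32, 128), 128),   # 5a
    (832, 384, (192, 384), (48, 128), 128),   # 5b
]

outputs = [inception_out_channels(*block[1:]) for block in blocks]
print(outputs)

# Each block must receive exactly the channels produced by the previous one
# (pooling layers between blocks do not change the channel count)
chain_is_consistent = all(
    blocks[i + 1][0] == outputs[i] for i in range(len(blocks) - 1)
)
print(chain_is_consistent)
```

The last value in `outputs` is the input width of the final `nn.Linear` layer.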
    

In [23]:
X = torch.randn(size=(1, 1, 96, 96))
for layer in inception:
    X = layer(X)
    print(layer.__class__.__name__, 'output shape:\t', X.shape)

Conv2d output shape:	 torch.Size([1, 64, 46, 46])
ReLU output shape:	 torch.Size([1, 64, 46, 46])
MaxPool2d output shape:	 torch.Size([1, 64, 23, 23])
Conv2d output shape:	 torch.Size([1, 64, 23, 23])
ReLU output shape:	 torch.Size([1, 64, 23, 23])
Conv2d output shape:	 torch.Size([1, 192, 23, 23])
ReLU output shape:	 torch.Size([1, 192, 23, 23])
MaxPool2d output shape:	 torch.Size([1, 192, 12, 12])
Inception_block output shape:	 torch.Size([1, 256, 12, 12])
Inception_block output shape:	 torch.Size([1, 480, 12, 12])
MaxPool2d output shape:	 torch.Size([1, 480, 6, 6])
Inception_block output shape:	 torch.Size([1, 512, 6, 6])
Inception_block output shape:	 torch.Size([1, 512, 6, 6])
Inception_block output shape:	 torch.Size([1, 512, 6, 6])
Inception_block output shape:	 torch.Size([1, 528, 6, 6])
Inception_block output shape:	 torch.Size([1, 832, 6, 6])
MaxPool2d output shape:	 torch.Size([1, 832, 3, 3])
Inception_block output shape:	 torch.Size([1, 832, 3, 3])
Inception_block output sh

In [24]:
X = torch.randn(size=(1, 1, 96, 96))
summary(inception,X)

Layer (type:depth-idx)                   Output Shape              Param #
├─Conv2d: 1-1                            [-1, 64, 46, 46]          3,200
├─ReLU: 1-2                              [-1, 64, 46, 46]          --
├─MaxPool2d: 1-3                         [-1, 64, 23, 23]          --
├─Conv2d: 1-4                            [-1, 64, 23, 23]          4,160
├─ReLU: 1-5                              [-1, 64, 23, 23]          --
├─Conv2d: 1-6                            [-1, 192, 23, 23]         110,784
├─ReLU: 1-7                              [-1, 192, 23, 23]         --
├─MaxPool2d: 1-8                         [-1, 192, 12, 12]         --
├─Inception_block: 1-9                   [-1, 256, 12, 12]         --
|    └─Conv2d: 2-1                       [-1, 64, 12, 12]          12,352
|    └─ReLU: 2-2                         [-1, 64, 12, 12]          --
|    └─Conv2d: 2-3                       [-1, 96, 12, 12]          18,528
|    └─ReLU: 2-4                         [-1, 96, 12, 12]         



<p>To understand how ResNet works, read:</p>
<a href='https://arxiv.org/pdf/1512.03385.pdf'>Deep Residual Learning for Image Recognition</a>
    
<a href='https://arxiv.org/pdf/1603.05027.pdf'>Identity Mappings in Deep Residual Networks</a>
<img src="../images/resnet.jpg"  width='1000px'>
Source:  <a href='https://arxiv.org/pdf/1512.03385.pdf'>Deep Residual Learning for Image Recognition</a>


In [3]:

class Residual(nn.Module):
    def __init__(self, input_channels, num_channels,
                 downsample=False, strides=1):
        super().__init__()
        self.conv1 = nn.Conv2d(input_channels, num_channels, kernel_size=3, padding=1,
                               stride=strides)
        self.conv2 = nn.Conv2d(num_channels, num_channels, kernel_size=3, padding=1)
        # Optional 1 x 1 convolution so the shortcut matches the shape of the main path
        if downsample:
            self.downsample = nn.Conv2d(input_channels, num_channels, kernel_size=1,
                                        stride=strides)
        else:
            self.downsample = None

        self.bn1 = nn.BatchNorm2d(num_channels)
        self.bn2 = nn.BatchNorm2d(num_channels)
        self.relu = nn.ReLU()

    def forward(self, X):
        Y = self.relu(self.bn1(self.conv1(X)))
        Y = self.bn2(self.conv2(Y))
        if self.downsample is not None:
            X = self.downsample(X)
        return self.relu(Y + X)

In [4]:
def resnet_block(input_channels, num_channels, num_residuals,
                 first_block=False):
    layers = []
    for i in range(num_residuals):
        # The first residual of every stage except the first halves the
        # height and width and changes the number of channels
        if i == 0 and not first_block:
            layers.append(Residual(input_channels, num_channels,
                                   downsample=True, strides=2))
        else:
            layers.append(Residual(num_channels, num_channels))
    return nn.Sequential(*layers)

In [5]:
# Note: with 2 residual blocks per stage this is the ResNet-18 layout, despite the variable name
resnet34=nn.Sequential(nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3),
                   nn.BatchNorm2d(64), nn.ReLU(),
                   nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
                   resnet_block(64, 64, 2, first_block=True),
                   resnet_block(64, 128, 2),
                   resnet_block(128, 256, 2),
                   resnet_block(256, 512, 2),
                   nn.AdaptiveAvgPool2d((1,1)),
                   nn.Flatten(), nn.Linear(512, 10)
                 
                 )
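The depth in a ResNet name counts the weighted layers on the main path: the initial convolution, the two 3 × 3 convolutions in each residual block, and the final fully-connected layer (the 1 × 1 shortcut convolutions are conventionally left out). The snippet below is a minimal sketch of that count in plain Python; `count_layers` is a hypothetical helper written only for this illustration.

```python
def count_layers(residuals_per_stage, convs_per_residual=2):
    """Initial conv + convolutions inside residual blocks + final dense layer."""
    return 1 + convs_per_residual * sum(residuals_per_stage) + 1


# Residual blocks per stage in the model defined above
depth_of_model_above = count_layers([2, 2, 2, 2])

# Residual blocks per stage of the 34-layer configuration in the ResNet paper
depth_of_resnet34 = count_layers([3, 4, 6, 3])

print(depth_of_model_above, depth_of_resnet34)
```

Under this counting, passing 3, 4, 6 and 3 as `num_residuals` to the four `resnet_block` calls would give the 34-layer variant.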

In [6]:
X = torch.rand(size=(1, 1, 228, 228))
for layer in resnet34:
    X = layer(X)
    print(layer.__class__.__name__,'output shape:\t', X.shape)

Conv2d output shape:	 torch.Size([1, 64, 114, 114])
BatchNorm2d output shape:	 torch.Size([1, 64, 114, 114])
ReLU output shape:	 torch.Size([1, 64, 114, 114])
MaxPool2d output shape:	 torch.Size([1, 64, 57, 57])
Sequential output shape:	 torch.Size([1, 64, 57, 57])
Sequential output shape:	 torch.Size([1, 128, 29, 29])
Sequential output shape:	 torch.Size([1, 256, 15, 15])
Sequential output shape:	 torch.Size([1, 512, 8, 8])
AdaptiveAvgPool2d output shape:	 torch.Size([1, 512, 1, 1])
Flatten output shape:	 torch.Size([1, 512])
Linear output shape:	 torch.Size([1, 10])
