## Inception Architecture

### About GoogLeNet
- GoogLeNet is one particular incarnation of the Inception architecture
- The original paper **Going Deeper with Convolutions** can be found at <https://arxiv.org/abs/1409.4842>
- GoogLeNet, from Google Inc., is a 22-layer deep network that set new state-of-the-art results for classification and detection in ILSVRC 2014 (its main competitor was VGGNet)

### The Inception Module
- Motivated by this meme 
![](https://miro.medium.com/v2/resize:fit:1100/format:webp/1*hp93DT_YP2RfPs7eBDtPfw.jpeg "Inception Movie meme")

- Two versions of the Inception Module were proposed: 
![](https://media.geeksforgeeks.org/wp-content/uploads/20200429201304/Incepption-module.PNG "Inception Modules")

- A major drawback of the naïve version, in the paper's words: "even a modest number of 5x5 convolutions can be prohibitively expensive on top of a convolutional layer with a large number of filters"
- The pooling branch makes this worse: pooling preserves the number of input channels, so concatenating its output with the convolution outputs inevitably increases the depth of the output volume from stage to stage
- The fix is to insert 1x1 convolutions, which keep the spatial resolution while reducing the depth of the volume, before the expensive 3x3 and 5x5 convolutions and after the pooling layer in the Inception module
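
To make the saving concrete, here is a small back-of-the-envelope sketch that counts the weights of a 5x5 branch with and without the 1x1 reduction. The channel counts (192 in, 16 after reduction, 32 out) are borrowed from the `inception3a` configuration used later in this notebook; the helper `conv_weights` is a hypothetical name introduced only for this illustration, and biases are ignored.

```python
def conv_weights(kernel_size, in_channels, out_channels):
    """Number of weights in a square convolution layer (biases ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels


# Channel counts assumed from the inception3a block defined later on.
in_channels, red_5x5, out_5x5 = 192, 16, 32

# Naive version: the 5x5 convolution sees all 192 input channels.
naive = conv_weights(5, in_channels, out_5x5)

# Version with reduction: a 1x1 convolution first shrinks 192 -> 16 channels.
reduced = conv_weights(1, in_channels, red_5x5) + conv_weights(5, red_5x5, out_5x5)

print(naive)                      # 153600
print(reduced)                    # 15872
print(round(naive / reduced, 1))  # 9.7
```

Since both variants produce an output of the same spatial size, the number of multiply-adds shrinks by the same factor as the weight count.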


### Architecture of GoogLeNet

![](https://media.geeksforgeeks.org/wp-content/uploads/20200429201421/Inception-layer-by-layer.PNG "GoogLeNet Architecture")

### Importing the necessary libraries

In [1]:
import torch
import torch.nn as nn

### Implementing the GoogLeNet Architecture
1. First, we implement the `conv_block` class, which wraps a convolution layer followed by batch normalization and ReLU
   - Note that `**kwargs` in the constructor lets us build conv layers with different kernel sizes, strides and paddings
2. Then, we implement the `Inception_block` class, which instantiates `conv_block` as many times as needed
   - Note that the Inception module with reductions has 4 branches that are independent of each other, so they can be computed in parallel
   - The outputs of these branches are then concatenated along the depth (channel) dimension
3. Finally, we implement the `GoogLeNet` class, which builds the entire architecture out of Inception blocks (the auxiliary classifiers described in the paper are omitted here)
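
As a quick illustration of the concatenation step, the sketch below joins four dummy branch outputs along the channel dimension. The tensors are random stand-ins rather than real branch activations, and the channel counts (64, 128, 32, 32) are chosen for illustration only.

```python
import torch

# Dummy outputs of the four branches for one 28x28 feature map (N x C x H x W).
# Only the channel count differs between branches; N, H and W must match.
branch_outputs = [
    torch.randn(1, 64, 28, 28),   # 1x1 branch
    torch.randn(1, 128, 28, 28),  # 1x1 -> 3x3 branch
    torch.randn(1, 32, 28, 28),   # 1x1 -> 5x5 branch
    torch.randn(1, 32, 28, 28),   # pool -> 1x1 branch
]

# dim=1 is the channel axis, so the channel counts simply add up.
merged = torch.cat(branch_outputs, dim=1)
print(merged.shape)  # torch.Size([1, 256, 28, 28])
```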

In [2]:
class conv_block(nn.Module):
  def __init__(self, in_channels, out_channels, **kwargs):
    super(conv_block, self).__init__()

    self.relu = nn.ReLU()
    self.conv = nn.Conv2d(in_channels, out_channels, **kwargs) # kernel_size, stride and padding vary per layer, e.g. kernel_size=(1,1), (3,3) or (5,5)
    self.batchnorm = nn.BatchNorm2d(out_channels) # not part of the original GoogLeNet paper; added here as a common training aid

  def forward(self, x):
    return self.relu(self.batchnorm(self.conv(x)))
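
The `**kwargs` forwarding can be seen in isolation with a minimal sketch. `make_conv` is a hypothetical helper written only for this example (it is not part of the notebook's model); it passes whatever keyword arguments it receives straight to `nn.Conv2d`, the same way `conv_block` does.

```python
import torch
import torch.nn as nn


def make_conv(in_channels, out_channels, **kwargs):
    # Hypothetical helper: every extra keyword argument is forwarded to nn.Conv2d.
    return nn.Conv2d(in_channels, out_channels, **kwargs)


x = torch.randn(1, 192, 28, 28)
configs = [
    {"kernel_size": 1},
    {"kernel_size": 3, "padding": 1},
    {"kernel_size": 5, "padding": 2},
]

shapes = []
for cfg in configs:
    y = make_conv(192, 16, **cfg)(x)
    shapes.append(tuple(y.shape))
    print(cfg, tuple(y.shape))
```

With these paddings, all three layers return a tensor of shape `(1, 16, 28, 28)`.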

In [3]:
class Inception_block(nn.Module):
  def __init__(self, in_channels, out_1x1, red_3x3, out_3x3, red_5x5, out_5x5, out_1x1pool): # all params after in_channels are numbers of filters
    '''
    Args:
    in_channels : number of input channels
    out_1x1 : number of filters in the first (leftmost) 1x1 branch
    red_3x3 : number of 1x1 filters that reduce the depth before the 3x3 convolution
    out_3x3 : number of filters in the 3x3 convolution (output of the second branch)
    red_5x5 : number of 1x1 filters that reduce the depth before the 5x5 convolution
    out_5x5 : number of filters in the 5x5 convolution (output of the third branch)
    out_1x1pool : number of 1x1 filters applied after the pooling layer (output of the fourth branch)
    '''
    super(Inception_block, self).__init__()

    self.branch1 = conv_block(in_channels, out_1x1, kernel_size=1)
    self.branch2 = nn.Sequential(
        conv_block(in_channels, red_3x3, kernel_size=1),
        conv_block(red_3x3, out_3x3, kernel_size=3, stride=1, padding=1) # padding=1 keeps H x W unchanged (Conv2d defaults are stride=1, padding=0)
    )
    self.branch3 = nn.Sequential(
        conv_block(in_channels, red_5x5, kernel_size=1),
        conv_block(red_5x5, out_5x5, kernel_size=5, stride=1, padding=2)
    )
    self.branch4 = nn.Sequential(
        nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
        conv_block(in_channels, out_1x1pool, kernel_size=1)
    )
  
  def forward(self,x):
    return torch.cat([self.branch1(x), self.branch2(x), self.branch3(x), self.branch4(x)], dim=1) #concatenate along C in N x C x H x W...
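
Concatenation along the channel axis only works if every branch returns the same height and width. The sketch below checks this with the standard output-size formula for convolution and pooling layers (dilation 1, floor rounding); `conv_output_size` is a hypothetical helper name used just for this check.

```python
def conv_output_size(size, kernel_size, stride=1, padding=0):
    """Spatial output size of a conv or pooling layer (dilation 1, floor mode)."""
    return (size + 2 * padding - kernel_size) // stride + 1


size = 28  # example input height/width

branch_sizes = {
    "1x1 conv": conv_output_size(size, kernel_size=1),
    "3x3 conv, padding=1": conv_output_size(size, kernel_size=3, padding=1),
    "5x5 conv, padding=2": conv_output_size(size, kernel_size=5, padding=2),
    "3x3 max pool, padding=1": conv_output_size(size, kernel_size=3, padding=1),
}
print(branch_sizes)

# Every branch keeps the 28x28 resolution, so torch.cat along dim=1 is valid.
assert all(s == size for s in branch_sizes.values())

# Without padding, a 5x5 convolution would shrink the map and break the concat.
unpadded = conv_output_size(size, kernel_size=5)
print(unpadded)  # 24
```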


In [4]:
class GoogLeNet(nn.Module):
  def __init__(self, in_channels=3, num_classes=1000): #refer to the paper for the complete architecture to make sense...
    super(GoogLeNet, self).__init__()

    self.conv1 = conv_block(in_channels=in_channels, out_channels=64, kernel_size=(7,7), stride=(2,2), padding=(3,3))
    self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1) #can be reused wherever needed...
    self.conv2 = conv_block(64, 192, kernel_size=3, stride=1, padding=1) # simplification: the paper places a 1x1 reduction (64 filters) before this 3x3 layer

    #Now start the inception blocks...
    #Order of params for inception block: (in_channels, out_1x1, red_3x3, out_3x3, red_5x5, out_5x5, out_1x1pool)
    self.inception3a = Inception_block(192, 64, 96, 128, 16, 32, 32)
    self.inception3b = Inception_block(256, 128, 128, 192, 32, 96, 64)

    self.inception4a = Inception_block(480, 192, 96, 208, 16, 48, 64)
    self.inception4b = Inception_block(512, 160, 112, 224, 24, 64, 64)
    self.inception4c = Inception_block(512, 128, 128, 256, 24, 64, 64)
    self.inception4d = Inception_block(512, 112, 144, 288, 32, 64, 64)
    self.inception4e = Inception_block(528, 256, 160, 320, 32, 128, 128)

    self.inception5a = Inception_block(832, 256, 160, 320, 32, 128, 128)
    self.inception5b = Inception_block(832, 384, 192, 384, 48, 128, 128)
    self.avgpool = nn.AvgPool2d(kernel_size=7, stride=1)
    self.dropout = nn.Dropout(p=0.4)
    self.fc1 = nn.Linear(1024, num_classes) # use the constructor argument instead of a hard-coded 1000

  def forward(self, x):
    x = self.conv1(x)
    x = self.maxpool(x)
    x = self.conv2(x)
    x = self.maxpool(x)

    x = self.inception3a(x)
    x = self.inception3b(x)
    x = self.maxpool(x)

    x = self.inception4a(x)
    x = self.inception4b(x)
    x = self.inception4c(x)
    x = self.inception4d(x)
    x = self.inception4e(x)
    x = self.maxpool(x)

    x = self.inception5a(x)
    x = self.inception5b(x)
    x = self.avgpool(x)
    x = x.reshape(x.shape[0], -1)
    x = self.dropout(x)
    x = self.fc1(x)

    return x
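
The channel and spatial bookkeeping of the class above can be verified without building the model. This is a plain-Python sketch: the block tuples are copied from the constructor, while `out_size` and the `blocks` list are names introduced only for this check. It assumes a 224x224 input.

```python
def out_size(size, kernel_size, stride=1, padding=0):
    """Spatial output size of a conv or pooling layer (dilation 1, floor mode)."""
    return (size + 2 * padding - kernel_size) // stride + 1


# (name, in_channels, out_1x1, red_3x3, out_3x3, red_5x5, out_5x5, out_1x1pool)
blocks = [
    ("inception3a", 192, 64, 96, 128, 16, 32, 32),
    ("inception3b", 256, 128, 128, 192, 32, 96, 64),
    ("inception4a", 480, 192, 96, 208, 16, 48, 64),
    ("inception4b", 512, 160, 112, 224, 24, 64, 64),
    ("inception4c", 512, 128, 128, 256, 24, 64, 64),
    ("inception4d", 512, 112, 144, 288, 32, 64, 64),
    ("inception4e", 528, 256, 160, 320, 32, 128, 128),
    ("inception5a", 832, 256, 160, 320, 32, 128, 128),
    ("inception5b", 832, 384, 192, 384, 48, 128, 128),
]

# Each block must receive exactly the channels the previous block emitted.
channels = 192  # output channels of conv2
for name, in_ch, o1, _r3, o3, _r5, o5, o_pool in blocks:
    assert in_ch == channels, f"{name}: expected {channels} input channels"
    channels = o1 + o3 + o5 + o_pool  # the four branch outputs are concatenated
print(channels)  # 1024, matching the input size of the final linear layer

# Spatial size through the stride-2 layers and the final average pool.
size = out_size(224, 7, stride=2, padding=3)  # conv1
sizes = [size]
for _ in range(4):                            # the four max-pool calls
    size = out_size(size, 3, stride=2, padding=1)
    sizes.append(size)
size = out_size(size, 7, stride=1)            # avgpool
sizes.append(size)
print(sizes)  # [112, 56, 28, 14, 7, 1]
```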
     

### Sanity check...

In [5]:
x = torch.randn(3, 3, 224, 224)
model = GoogLeNet()
print(model(x).shape)

torch.Size([3, 1000])
