📝 **Author:** Amirhossein Heydari - 📧 **Email:** amirhosseinheydari78@gmail.com - 📍 **Linktree:** [linktr.ee/mr_pylin](https://linktr.ee/mr_pylin)

---

# Dependencies
   - torchvision models can be imported in two ways:
      - class (e.g. `GoogLeNet`)
         - Imports the model class itself.
         - Allows more control and customization since you are dealing directly with the class: you can subclass it, override methods, customize initialization, etc.
      - function (e.g. `googlenet`)
         - Imports a builder function that returns an instance of the model.
         - Easier and quicker to use, especially for standard models.
   - [pytorch.org/vision/stable/models.html](https://pytorch.org/vision/stable/models.html)

In [None]:
import torch
import torch.nn.functional as F
from torch import nn
from torchinfo import summary
from torchvision.models import GoogLeNet, googlenet

# GoogLeNet
   - GoogLeNet, also known as `Inception v1`, was developed in 2014 by [Christian Szegedy](https://scholar.google.com/citations?user=bnQMuzgAAAAJ) and collaborators from [Google Research](https://research.google/)
   - It is based on the [Going Deeper with Convolutions](https://research.google/pubs/going-deeper-with-convolutions/) paper
   - It was trained on the [ImageNet](https://www.image-net.org/) dataset (first resized to 256x256 then center cropped to 224x224) [[ImageNet viewer](https://navigu.net/#imagenet)]
   - Known for its innovative Inception modules (concatenating filters of different sizes within the same module)
   - The architecture includes multiple [auxiliary classifiers](https://serp.ai/auxiliary-classifier/) to improve gradient flow and provide additional regularization
   - The losses of the auxiliary classifiers were weighted by 0.3
   - The `winner` of the ImageNet Large Scale Visual Recognition Challenge ([ILSVRC](https://image-net.org/challenges/LSVRC/2014/)) in 2014

<figure style="text-align: center;">
    <img src="../../../assets/images/original/cnn/architectures/googlenet.svg" alt="googlenet-architecture.svg" style="width: 100%;">
    <figcaption>GoogLeNet (Inception v1) Architecture</figcaption>
</figure>
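
The 0.3 weighting of the auxiliary losses can be sketched as follows. The helper name `googlenet_loss`, the batch size, and the random logits are all hypothetical; the only value taken from the text above is the weight of 0.3.

```python
import torch
from torch import nn

criterion = nn.CrossEntropyLoss()


def googlenet_loss(main_logits, aux1_logits, aux2_logits, targets, aux_weight=0.3):
    """Main loss plus the weighted auxiliary losses (auxiliary outputs may be None)."""
    loss = criterion(main_logits, targets)
    if aux1_logits is not None:
        loss = loss + aux_weight * criterion(aux1_logits, targets)
    if aux2_logits is not None:
        loss = loss + aux_weight * criterion(aux2_logits, targets)
    return loss


torch.manual_seed(0)
targets = torch.randint(0, 1000, (4,))
main_logits, aux1_logits, aux2_logits = (torch.randn(4, 1000) for _ in range(3))

total_loss = googlenet_loss(main_logits, aux1_logits, aux2_logits, targets)
print(total_loss.item())
```

At inference time the auxiliary heads are discarded, so passing `None` for both auxiliary outputs reduces the helper to the plain classification loss.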

## Custom GoogLeNet
   - `Softmax` is omitted because `nn.CrossEntropyLoss` applies `LogSoftmax` internally, so the model outputs raw logits.

   - For better compatibility with various input sizes, `nn.AvgPool2d` is replaced with `nn.AdaptiveAvgPool2d`, which produces the same output size regardless of the input resolution.

   - Normalization:
      - In the original GoogLeNet paper, `Local Response Normalization` (LRN) was used [`nn.LocalResponseNorm`].
      - In many modern implementations including the PyTorch version, Batch Normalization (BN) is used instead [`nn.BatchNorm2d`].
      - BatchNorm generally leads to better performance and is more effective at stabilizing training.

   - Approximate number of parameters
      - without auxiliary classifiers: ~7 million
      - with auxiliary classifiers: ~13 million
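
Two of the design notes above can be checked numerically. The following is a small sketch with arbitrary tensor sizes (the batch size, class count, and spatial sizes are illustrative assumptions): it compares `nn.CrossEntropyLoss` on raw logits with `nn.NLLLoss` on `log_softmax` outputs, and shows that `nn.AdaptiveAvgPool2d` yields a fixed output size for different input resolutions.

```python
import torch
import torch.nn.functional as F
from torch import nn

torch.manual_seed(0)

# 1) CrossEntropyLoss on raw logits matches NLLLoss on log-softmax outputs
logits = torch.randn(4, 10)
targets = torch.randint(0, 10, (4,))
ce = nn.CrossEntropyLoss()(logits, targets)
nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), targets)
print(torch.allclose(ce, nll))

# 2) AdaptiveAvgPool2d produces a 1x1 map for any spatial input size
adaptive_pool = nn.AdaptiveAvgPool2d(output_size=(1, 1))
for size in (7, 8, 10):
    features = torch.randn(1, 1024, size, size)
    print(size, tuple(adaptive_pool(features).shape))

# for a 7x7 feature map, the adaptive pool agrees with a fixed 7x7 average pool
features_7x7 = torch.randn(1, 1024, 7, 7)
fixed_pool = nn.AvgPool2d(kernel_size=7)
same_result = torch.allclose(adaptive_pool(features_7x7), fixed_pool(features_7x7), atol=1e-6)
print(same_result)
```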

In [None]:
class BasicConv2d(nn.Module):
    def __init__(self, in_channels: int, out_channels: int, kernel_size: int, **kwargs) -> None:
        super(BasicConv2d, self).__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size, bias=False, **kwargs)
        self.bn = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.relu(self.bn(self.conv(x)))

In [None]:
class Inception(nn.Module):
    def __init__(
            self,
            in_channels: int,  # the number of input channels to the Inception module
            n1x1: int,         # the number of 1x1 convolution filters in the first branch (branch 1)
            n3x3red: int,      # the number of 1x1 convolution filters in the second branch (branch 2) before the 3x3 convolution
            n3x3: int,         # the number of 3x3 convolution filters in the second branch (branch 2)
            n5x5red: int,      # the number of 1x1 convolution filters in the third branch (branch 3) before the 5x5 convolution
            n5x5: int,         # the number of 5x5 convolution filters in the third branch (branch 3)
            pool_proj: int,    # the number of 1x1 convolution filters in the fourth branch (branch 4) after the max pooling
    ) -> None:

        super(Inception, self).__init__()

        # branch 1: 1x1 convolution
        self.branch1 = BasicConv2d(in_channels, n1x1, kernel_size=1)

        # branch 2: 1x1 convolution followed by 3x3 convolution
        self.branch2 = nn.Sequential(
            BasicConv2d(in_channels, n3x3red, kernel_size=1),
            BasicConv2d(n3x3red, n3x3, kernel_size=3, padding=1)
        )

        # branch 3: 1x1 convolution followed by 5x5 convolution
        self.branch3 = nn.Sequential(
            BasicConv2d(in_channels, n5x5red, kernel_size=1),
            BasicConv2d(n5x5red, n5x5, kernel_size=5, padding=2)
        )

        # branch 4: 3x3 max pooling followed by 1x1 convolution
        self.branch4 = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            BasicConv2d(in_channels, pool_proj, kernel_size=1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # depth1: <in_channels> -> <n1x1>
        branch1 = self.branch1(x)

        # depth2: <in_channels> -> <n3x3>
        branch2 = self.branch2(x)

        # depth3: <in_channels> -> <n5x5>
        branch3 = self.branch3(x)

        # depth4: <in_channels> -> <pool_proj>
        branch4 = self.branch4(x)

        # depth concatenate: <in_channels> -> [depth1 + depth2 + depth3 + depth4]
        return torch.cat([branch1, branch2, branch3, branch4], dim=1)

In [None]:
class InceptionAux(nn.Module):
    def __init__(self, in_channels: int, num_classes: int = 1000) -> None:
        super(InceptionAux, self).__init__()
        self.conv = BasicConv2d(in_channels, 128, kernel_size=1)
        self.fc1 = nn.Linear(2048, 1024)
        self.dropout = nn.Dropout(p=0.7)
        self.fc2 = nn.Linear(1024, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = F.adaptive_avg_pool2d(x, output_size=(4, 4))
        x = self.conv(x)
        x = torch.flatten(x, start_dim=1)
        x = F.relu(self.fc1(x))
        x = self.dropout(x)
        x = self.fc2(x)
        return x

In [None]:
class CustomGoogLeNet(nn.Module):
    def __init__(self, num_classes: int = 1000, use_aux: bool = True) -> None:
        super(CustomGoogLeNet, self).__init__()
        self.use_aux = use_aux

        # 3x224x224 -> 64x112x112
        self.conv1 = BasicConv2d(3, 64, kernel_size=7, padding=3, stride=2)

        # 64x112x112 -> 64x56x56
        self.maxpool1 = nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True)

        # 64x56x56 -> 64x56x56
        self.conv2 = BasicConv2d(64, 64, kernel_size=1)

        # 64x56x56 -> 192x56x56
        self.conv3 = BasicConv2d(64, 192, kernel_size=3, padding=1)

        # 192x56x56 -> 192x28x28
        self.maxpool2 = nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True)

        # 192x28x28 -> 256x28x28
        self.inception3a = Inception(192, 64, 96, 128, 16, 32, 32)

        # 256x28x28 -> 480x28x28
        self.inception3b = Inception(256, 128, 128, 192, 32, 96, 64)

        # 480x28x28 -> 480x14x14
        self.maxpool3 = nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True)

        # 480x14x14 -> 512x14x14
        self.inception4a = Inception(480, 192, 96, 208, 16, 48, 64)

        if self.use_aux:
            # 512x14x14 -> 1000
            self.aux1 = InceptionAux(in_channels=512, num_classes=num_classes)

        # 512x14x14 -> 512x14x14
        self.inception4b = Inception(512, 160, 112, 224, 24, 64, 64)

        # 512x14x14 -> 512x14x14
        self.inception4c = Inception(512, 128, 128, 256, 24, 64, 64)

        # 512x14x14 -> 528x14x14
        self.inception4d = Inception(512, 112, 144, 288, 32, 64, 64)

        if self.use_aux:
            # 528x14x14 -> 1000
            self.aux2 = InceptionAux(in_channels=528, num_classes=num_classes)

        # 528x14x14 -> 832x14x14
        self.inception4e = Inception(528, 256, 160, 320, 32, 128, 128)

        # 832x14x14 -> 832x7x7
        self.maxpool4 = nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True)

        # 832x7x7 -> 832x7x7
        self.inception5a = Inception(832, 256, 160, 320, 32, 128, 128)

        # 832x7x7 -> 1024x7x7
        self.inception5b = Inception(832, 384, 192, 384, 48, 128, 128)

        # 1024x7x7 -> 1024x1x1
        self.avgpool = nn.AdaptiveAvgPool2d(output_size=(1, 1))

        # flatten: 1024x1x1 -> 1024
        # 1024 -> 1024
        self.dropout = nn.Dropout(p=0.4)

        # 1024 -> 1000
        self.fc = nn.Linear(1024, num_classes)

    def forward(self, x: torch.Tensor) -> tuple:
        # returns (logits, aux1, aux2); aux1 and aux2 are None in eval mode or when use_aux=False
        aux1 = aux2 = None

        # feature extractor
        x = self.conv1(x)
        x = self.maxpool1(x)
        x = self.conv2(x)
        x = self.conv3(x)
        x = self.maxpool2(x)
        x = self.inception3a(x)
        x = self.inception3b(x)
        x = self.maxpool3(x)
        x = self.inception4a(x)

        if self.training and self.use_aux:
            aux1 = self.aux1(x)

        x = self.inception4b(x)
        x = self.inception4c(x)
        x = self.inception4d(x)

        if self.training and self.use_aux:
            aux2 = self.aux2(x)

        x = self.inception4e(x)
        x = self.maxpool4(x)
        x = self.inception5a(x)
        x = self.inception5b(x)
        x = self.avgpool(x)

        # flatten: 1024x1x1 -> 1024
        x = torch.flatten(x, start_dim=1)

        # classifier
        x = self.dropout(x)
        x = self.fc(x)

        return x, aux1, aux2

In [None]:
model_1 = CustomGoogLeNet()
model_1

In [None]:
summary(model_1, (1, 3, 224, 224), device='cpu')
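
The channel counts annotated in the comments of `CustomGoogLeNet` can be sanity-checked with plain arithmetic: each Inception module outputs `n1x1 + n3x3 + n5x5 + pool_proj` channels, and that sum must equal the `in_channels` of the next module. The sketch below copies the configurations from the class above; the dictionary and helper names are hypothetical.

```python
# (in_channels, n1x1, n3x3red, n3x3, n5x5red, n5x5, pool_proj), copied from CustomGoogLeNet
inception_configs = {
    "inception3a": (192, 64, 96, 128, 16, 32, 32),
    "inception3b": (256, 128, 128, 192, 32, 96, 64),
    "inception4a": (480, 192, 96, 208, 16, 48, 64),
    "inception4b": (512, 160, 112, 224, 24, 64, 64),
    "inception4c": (512, 128, 128, 256, 24, 64, 64),
    "inception4d": (512, 112, 144, 288, 32, 64, 64),
    "inception4e": (528, 256, 160, 320, 32, 128, 128),
    "inception5a": (832, 256, 160, 320, 32, 128, 128),
    "inception5b": (832, 384, 192, 384, 48, 128, 128),
}


def inception_out_channels(config):
    """Channels after concatenating the four branches (the reduction layers do not count)."""
    _, n1x1, _, n3x3, _, n5x5, pool_proj = config
    return n1x1 + n3x3 + n5x5 + pool_proj


out_channels = {}
previous_out = 192  # channels produced by conv3 (max pooling keeps the channel count)
for name, config in inception_configs.items():
    # the input of each module must match the output of the previous one
    assert config[0] == previous_out, name
    previous_out = inception_out_channels(config)
    out_channels[name] = previous_out
    print(f"{name}: {config[0]} -> {previous_out}")
```

The final value, 1024, is the input size of the last fully connected layer.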

## PyTorch GoogLeNet
   - GoogLeNet is available in PyTorch: [pytorch.org/vision/stable/models/googlenet.html](https://pytorch.org/vision/stable/models/googlenet.html)
   - There's a known bug in the `3rd branch` of the `Inception module`: the `kernel size` should be `5x5` but is `3x3` [[details](https://github.com/pytorch/vision/issues/906)]. Observed with:
      - `torch v2.4.0+cu124`
      - `torchvision v0.19.0+cu124`
      - `torchinfo v1.8.0`

In [None]:
model_2 = googlenet()
model_2

In [None]:
summary(model_2, (1, 3, 224, 224), device='cpu')