
## Building GoogleNet (Inception V1) to train on Cifar100

**Mission Statement**

1. Build GoogleNet (Inception V1) from scratch
2. Train GoogleNet on the CIFAR-100 dataset
3. Understand how the Inception model evolved from V1 to V3

## Introduction to GoogleNet (Inception V1)

GoogleNet, released in 2014, was proposed by researchers at Google (in collaboration with several universities) in the paper [Going Deeper with Convolutions](https://arxiv.org/pdf/1409.4842). Its innovative approach set a new benchmark for object classification and detection in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2014, where it achieved a top-5 classification error of 6.7%.

GoogleNet is best known for its Inception modules. These building blocks apply convolutions with several filter sizes (1x1, 3x3 and 5x5) in parallel within a single layer, and then concatenate the outputs of these filters. Fusing features extracted at different scales produces a richer representation than any single filter size would.
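
The parallel-branches-then-concatenate idea can be sketched in a few lines of PyTorch. This is a minimal illustration of the *naive* module (no 1x1 reductions yet), not the final GoogleNet block; the class name `NaiveInception` is hypothetical, and the channel counts (64, 128, 32) are borrowed from the 1x1, 3x3 and 5x5 outputs of inception (3a) purely for illustration.

```python
import torch
import torch.nn as nn


class NaiveInception(nn.Module):
    """Hypothetical sketch of the naive Inception module: parallel 1x1, 3x3 and
    5x5 convolutions plus 3x3 max pooling, concatenated along the channel axis."""

    def __init__(self, in_channels: int, c1: int, c3: int, c5: int) -> None:
        super().__init__()
        # Padding is chosen so every branch keeps the input height and width
        self.conv1 = nn.Conv2d(in_channels, c1, kernel_size=1)
        self.conv3 = nn.Conv2d(in_channels, c3, kernel_size=3, padding=1)
        self.conv5 = nn.Conv2d(in_channels, c5, kernel_size=5, padding=2)
        self.pool = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        branches = [
            torch.relu(self.conv1(x)),
            torch.relu(self.conv3(x)),
            torch.relu(self.conv5(x)),
            self.pool(x),  # pooling keeps all in_channels
        ]
        # Concatenate along dim=1 (channels); spatial sizes match thanks to padding
        return torch.cat(branches, dim=1)


x = torch.randn(1, 192, 28, 28)
out = NaiveInception(192, 64, 128, 32)(x)
print(out.shape)  # torch.Size([1, 416, 28, 28])
```

Because the pooling branch passes all 192 input channels straight through, the output (64 + 128 + 32 + 192 = 416 channels) is wider than the input, which is one reason stacking naive modules quickly becomes expensive.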

Although the architecture is relatively deep, with 22 layers that have parameters, it remains computationally efficient: the paper reports that it uses 12 times fewer parameters than AlexNet, the winning ILSVRC architecture from two years earlier.

The key features of GoogleNet are:

*   Inception modules with 1x1 convolutions
*   Global average pooling
*   Auxiliary classifiers for training
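
To see why global average pooling matters, compare two possible classifier heads on top of GoogleNet's last feature map. The sketch assumes the ImageNet setting from the paper (a 1024 x 7 x 7 feature map and 1000 classes); `n_params` is a small helper introduced here for illustration.

```python
import torch
import torch.nn as nn

# Assumed shapes: the last Inception block outputs 1024 x 7 x 7 feature maps
# for a 224 x 224 input, and ImageNet has 1000 classes.
C, H, W, num_classes = 1024, 7, 7, 1000

# Option A: flatten the whole feature map into one fully connected layer
fc_flatten = nn.Linear(C * H * W, num_classes)

# Option B (GoogleNet style): global average pooling, then a fully connected layer
gap_head = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),   # (N, C, H, W) -> (N, C, 1, 1): mean over each map
    nn.Flatten(),              # (N, C, 1, 1) -> (N, C)
    nn.Linear(C, num_classes),
)


def n_params(m: nn.Module) -> int:
    """Count the trainable parameters of a module (hypothetical helper)."""
    return sum(p.numel() for p in m.parameters())


print(n_params(fc_flatten))  # 50177000
print(n_params(gap_head))    # 1025000

x = torch.randn(2, C, H, W)
print(gap_head(x).shape)     # torch.Size([2, 1000])
```

Averaging each feature map to a single value cuts the head's parameters by roughly 49x here. Using `AdaptiveAvgPool2d(1)` also makes the head independent of the spatial size, which is convenient for the smaller 32 x 32 CIFAR-100 images, where only `num_classes` would change to 100.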

The figure below shows the GoogleNet architecture:

<img src="https://github.com/ronald-hk-chung/ssnotebook/blob/main/computer_vision/assets/googlenet_architecture.png?raw=true">
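
The auxiliary classifiers in the architecture are small heads attached to intermediate Inception outputs (inception 4a and 4d in the paper) that add extra gradient signal during training and are discarded at inference. The sketch below follows the paper's description (average pooling to 4x4, a 1x1 convolution with 128 filters, a 1024-unit fully connected layer, 70% dropout, a linear classifier, and a loss weight of 0.3). The class name `AuxClassifier` is hypothetical, and replacing the paper's fixed 5x5/stride-3 pooling with adaptive pooling is an assumption (torchvision makes the same choice) so the head also fits smaller feature maps. Random tensors stand in for real feature maps and main-classifier outputs.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AuxClassifier(nn.Module):
    """Hypothetical sketch of a GoogleNet auxiliary classifier head."""

    def __init__(self, in_channels: int, num_classes: int) -> None:
        super().__init__()
        self.conv = nn.Conv2d(in_channels, 128, kernel_size=1)  # 1x1 conv, 128 filters
        self.fc1 = nn.Linear(128 * 4 * 4, 1024)                # FC layer with 1024 units
        self.dropout = nn.Dropout(p=0.7)                       # 70% dropout
        self.fc2 = nn.Linear(1024, num_classes)                # linear classifier

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Assumption: adaptive pooling to 4x4 instead of a fixed 5x5, stride-3 pool
        x = F.adaptive_avg_pool2d(x, (4, 4))
        x = F.relu(self.conv(x))
        x = torch.flatten(x, 1)
        x = F.relu(self.fc1(x))
        x = self.dropout(x)
        return self.fc2(x)


torch.manual_seed(0)
num_classes = 100  # CIFAR-100
# Stand-ins for the outputs of inception 4a (512 channels) and 4d (528 channels)
feat_4a = torch.randn(8, 512, 14, 14)
feat_4d = torch.randn(8, 528, 14, 14)
aux1, aux2 = AuxClassifier(512, num_classes), AuxClassifier(528, num_classes)

main_logits = torch.randn(8, num_classes)  # stand-in for the main classifier output
targets = torch.randint(0, num_classes, (8,))

criterion = nn.CrossEntropyLoss()
# During training, auxiliary losses are added with a discount weight of 0.3
loss = criterion(main_logits, targets) + 0.3 * (
    criterion(aux1(feat_4a), targets) + criterion(aux2(feat_4d), targets)
)
print(aux1(feat_4a).shape)  # torch.Size([8, 100])
```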

## Inception Module

The Inception module is the building block of GoogleNet: apart from the initial convolution and pooling layers and the classifier head, the network is built by stacking nine Inception modules. Its key features are:

*   **Multi-Level Feature Extraction**:

    Objects in images can vary greatly in size, which makes choosing the right kernel size for a convolution difficult. A larger kernel is better suited to information that is distributed globally across the image, while a smaller kernel is better suited to information that is distributed locally.

    The usual way to improve a neural network's performance is to increase its size and depth, at the cost of a higher risk of overfitting and greater computational requirements. GoogleNet's novel solution, the Inception module, forms a *wider* network rather than a *deeper* one: instead of a single filter size, the module runs convolutions with different kernel sizes and a pooling operation in parallel. Below is the naive version of the Inception module.

    <img src="https://github.com/ronald-hk-chung/ssnotebook/blob/main/computer_vision/assets/inception_module_naive.png?raw=true">

    The naive Inception module applies convolutions with three kernel sizes (1x1, 3x3 and 5x5) to the output of the previous layer and, in parallel, performs 3x3 max pooling. The outputs of all four branches are concatenated along the channel dimension and passed to the next Inception module.

*   **Dimension Reduction**:

    Stacking many naive Inception modules increases computation significantly, because the 3x3 and 5x5 convolutions operate on every input channel and the concatenated output grows wider from module to module. To overcome this, the researchers inserted 1x1 convolutions before the 3x3 and 5x5 convolutions.

    <img src="https://github.com/ronald-hk-chung/ssnotebook/blob/main/computer_vision/assets/inception_module_reduction.png?raw=true">

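    The 1x1 convolutions reduce the number of channels (dimensionality) before the more expensive 3x3 and 5x5 convolutions are applied, and a 1x1 projection after the max pooling operation likewise limits the channels the pooling branch contributes.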

    The design also yields a better representation: by combining filters of varying sizes with additional layers, the network can capture a wider range of features in the input data.

    The 1x1 convolution is also referred to as "network in network" because it acts as a micro neural network that learns to abstract the data at each spatial position before the main convolution filters are applied.
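
The saving from dimension reduction can be checked with back-of-the-envelope arithmetic. The sketch below counts multiplications for the 5x5 branch of inception (3a) from the paper's table (28 x 28 x 192 input, 16 reduction filters, 32 output filters); `conv_mults` is a hypothetical helper that ignores biases and activations and assumes stride 1 with "same" padding.

```python
def conv_mults(h: int, w: int, c_in: int, c_out: int, k: int) -> int:
    """Multiplications for a stride-1, same-padded k x k convolution:
    every output value (h * w * c_out) needs k * k * c_in multiplications."""
    return h * w * c_out * k * k * c_in


H = W = 28
# 5x5 convolution applied directly to all 192 input channels
direct = conv_mults(H, W, 192, 32, 5)
# 1x1 reduction to 16 channels first, then the 5x5 convolution on those 16 channels
reduced = conv_mults(H, W, 192, 16, 1) + conv_mults(H, W, 16, 32, 5)

print(direct)                      # 120422400
print(reduced)                     # 12443648
print(round(direct / reduced, 1))  # 9.7
```

Under these assumptions the 1x1 bottleneck cuts the cost of this branch by almost 10x (about 120M versus 12M multiplications) while producing an output of the same shape.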



In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch import Tensor
from typing import Any, Callable, List, Optional

class BasicConv2d(nn.Module):
    """Conv2d -> BatchNorm2d -> ReLU (the conv bias is redundant before BatchNorm)."""
    def __init__(self, in_channels: int, out_channels: int, **kwargs: Any) -> None:
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, bias=False, **kwargs)
        self.bn = nn.BatchNorm2d(out_channels, eps=0.001)

    def forward(self, x: Tensor) -> Tensor:
        x = self.conv(x)
        x = self.bn(x)
        return F.relu(x, inplace=True)

class Inception(nn.Module):
    def __init__(self,
                 in_channels: int,
                 ch1x1: int,
                 ch3x3red: int,
                 ch3x3: int,
                 ch5x5red: int,
                 ch5x5: int,
                 pool_proj: int,
                 conv_block: Optional[Callable[..., nn.Module]] = None) -> None:
        super().__init__()
        if conv_block is None:
            conv_block = BasicConv2d    # Conv2d -> BatchNorm2d -> ReLU

        # Branch 1: 1x1 convolution
        self.branch1 = conv_block(in_channels, ch1x1, kernel_size=1)

        # Branch 2: 1x1 reduction, then 3x3 convolution (padding=1 keeps H x W)
        self.branch2 = nn.Sequential(
            conv_block(in_channels, ch3x3red, kernel_size=1),
            conv_block(ch3x3red, ch3x3, kernel_size=3, padding=1)
        )

        # Branch 3: 1x1 reduction, then 5x5 convolution (padding=2 keeps H x W)
        # Note: torchvision's GoogLeNet uses kernel_size=3, padding=1 here instead
        self.branch3 = nn.Sequential(
            conv_block(in_channels, ch5x5red, kernel_size=1),
            conv_block(ch5x5red, ch5x5, kernel_size=5, padding=2)
        )

        # Branch 4: 3x3 max pooling (stride 1 keeps H x W), then 1x1 projection
        self.branch4 = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1, ceil_mode=True),
            conv_block(in_channels, pool_proj, kernel_size=1)
        )

    def _forward(self, x: Tensor) -> List[Tensor]:
        branch1 = self.branch1(x)
        branch2 = self.branch2(x)
        branch3 = self.branch3(x)
        branch4 = self.branch4(x)

        return [branch1, branch2, branch3, branch4]

    def forward(self, x: Tensor) -> Tensor:
        # All branches share the same H x W, so they concatenate along channels (dim=1)
        outputs = self._forward(x)
        return torch.cat(outputs, 1)
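
Every `kernel_size=1` convolution in the module above is the "network in network" idea at work: a 1x1 convolution is a fully connected layer across channels, applied independently at every pixel. The following standalone sketch checks this numerically by copying the weights of an `nn.Conv2d` into an `nn.Linear`; the tensor sizes are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
N, C_in, C_out, H, W = 2, 192, 64, 28, 28

conv1x1 = nn.Conv2d(C_in, C_out, kernel_size=1)
linear = nn.Linear(C_in, C_out)

# Copy weights: conv weight is (C_out, C_in, 1, 1), linear weight is (C_out, C_in)
with torch.no_grad():
    linear.weight.copy_(conv1x1.weight.view(C_out, C_in))
    linear.bias.copy_(conv1x1.bias)

x = torch.randn(N, C_in, H, W)
y_conv = conv1x1(x)
# Move channels last, apply the same linear layer at every pixel, move channels back
y_linear = linear(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)

print(y_conv.shape)                               # torch.Size([2, 64, 28, 28])
print(torch.allclose(y_conv, y_linear, atol=1e-4))  # True
```

Seen this way, a 1x1 reduction layer mixes and compresses channels without looking at neighbouring pixels, which is why it is cheap and why it can shrink the input to the 3x3 and 5x5 branches.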