<a href="https://colab.research.google.com/github/gnoejh/ict1022/blob/main/Architectures/inception.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Inception/GoogLeNet Architecture

## Introduction

Inception, also known as GoogLeNet (named after LeNet but designed by Google), was introduced in 2014 by Szegedy et al. in their paper "Going Deeper with Convolutions." It won the ImageNet Large Scale Visual Recognition Challenge 2014 (ILSVRC2014) classification task with a top-5 error of 6.67% and marked a significant milestone in the development of convolutional neural networks (CNNs) for computer vision tasks.

The Inception architecture was revolutionary for two main reasons:
1. It introduced the concept of the "Inception Module" with parallel convolutional pathways
2. It achieved state-of-the-art performance at a lower computational cost than earlier and contemporary networks, using about 12 times fewer parameters than AlexNet and far fewer than VGG

## The Inception Module

The core innovation of the Inception architecture is the Inception module. Instead of having to choose between different filter sizes (1×1, 3×3, 5×5) for convolutions at each layer, the Inception module performs all of these convolutions in parallel and concatenates the outputs.

### Key components of an Inception module:

1. **1×1 Convolutions**: These act as dimensionality reduction modules and help reduce the computational cost.
2. **Parallel Pathways**: Several convolution operations (1×1, 3×3, 5×5) and a max pooling operation run in parallel.
3. **Concatenation**: The outputs from all pathways are concatenated along the channel dimension.

![Inception Module](https://miro.medium.com/max/1400/1*ZFPOSAted10TPd3hBQU8iQ.png)

### Computational Efficiency

A key insight was using 1×1 convolutions before the expensive 3×3 and 5×5 convolutions to reduce the input channel dimensions, making the module computationally efficient. For example:

- Input with 256 channels → 1×1 convolution to reduce to 64 channels → 3×3 convolution
- The 3×3 convolution then sees 4× fewer input channels, so its weights and multiply-adds shrink by about 4×, at the cost of one cheap 1×1 layer
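
The saving in the 256 → 64 example above can be checked by counting weights. The following is a minimal sketch, not a figure from the paper: the output width of 128 channels and the 28×28 input are assumptions chosen for illustration, and biases are omitted so the counts reduce to simple products.

```python
import torch
import torch.nn as nn


def count_weights(module):
    """Total number of learnable values in a module."""
    return sum(p.numel() for p in module.parameters())


# 256 -> 64 comes from the example above; out_ch = 128 is an assumed value.
in_ch, reduced_ch, out_ch = 256, 64, 128

# Option A: 3x3 convolution applied directly to all 256 input channels.
direct = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False)

# Option B: 1x1 reduction to 64 channels, then the 3x3 convolution.
bottleneck = nn.Sequential(
    nn.Conv2d(in_ch, reduced_ch, kernel_size=1, bias=False),
    nn.ReLU(inplace=True),
    nn.Conv2d(reduced_ch, out_ch, kernel_size=3, padding=1, bias=False),
)

direct_weights = count_weights(direct)          # 256 * 128 * 3 * 3
bottleneck_weights = count_weights(bottleneck)  # 256 * 64 + 64 * 128 * 3 * 3

# Both options produce the same output shape.
x = torch.randn(1, in_ch, 28, 28)
assert direct(x).shape == bottleneck(x).shape

print(f"Direct 3x3:        {direct_weights:,} weights")      # 294,912
print(f"1x1 then 3x3:      {bottleneck_weights:,} weights")  # 90,112
print(f"Reduction factor:  {direct_weights / bottleneck_weights:.2f}x")
```

Because both paths run at the same spatial resolution, the multiply-add count shrinks by the same factor as the weight count.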

## GoogLeNet Architecture

GoogLeNet, the first implementation of the Inception architecture, is a 22-layer deep network (27 if counting the pooling layers). Key features include:

1. **Network in Network Philosophy**: Embedding micro-networks (Inception modules) within the larger network
2. **No Large Fully Connected Layers**: Unlike earlier CNN architectures, GoogLeNet applies global average pooling to the final feature map and keeps only a single linear classification layer, significantly reducing parameters
3. **Auxiliary Classifiers**: During training, additional softmax classifiers are attached to intermediate layers to combat the vanishing gradient problem; they are discarded at inference time
4. **Nine Inception Modules**: Stacked with occasional max-pooling layers in between to reduce spatial dimensions
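
The pooling-based head and the auxiliary classifiers can be sketched in PyTorch as follows. This is an illustrative sketch rather than a reference implementation: the class names `GapHead` and `AuxClassifier` are hypothetical, the auxiliary layer sizes (5×5 average pool with stride 3, 128 1×1 filters, a 1024-unit layer, 70% dropout) and the 0.3 loss weight follow the description in the GoogLeNet paper, and the feature maps are random tensors standing in for real activations.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GapHead(nn.Module):
    """Global average pooling -> dropout -> one linear classifier."""

    def __init__(self, in_channels=1024, num_classes=1000, p_drop=0.4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.dropout = nn.Dropout(p_drop)
        self.fc = nn.Linear(in_channels, num_classes)

    def forward(self, x):
        x = torch.flatten(self.pool(x), 1)
        return self.fc(self.dropout(x))


class AuxClassifier(nn.Module):
    """Auxiliary head for a 14x14 intermediate feature map."""

    def __init__(self, in_channels, num_classes=1000):
        super().__init__()
        self.pool = nn.AvgPool2d(kernel_size=5, stride=3)  # 14x14 -> 4x4
        self.conv = nn.Conv2d(in_channels, 128, kernel_size=1)
        self.fc1 = nn.Linear(128 * 4 * 4, 1024)
        self.dropout = nn.Dropout(0.7)
        self.fc2 = nn.Linear(1024, num_classes)

    def forward(self, x):
        x = F.relu(self.conv(self.pool(x)))
        x = torch.flatten(x, 1)
        x = self.dropout(F.relu(self.fc1(x)))
        return self.fc2(x)


head = GapHead()
gap_params = sum(p.numel() for p in head.parameters())
# For comparison: a linear layer on the flattened 7x7x1024 map (weights + biases).
flat_fc_params = 7 * 7 * 1024 * 1000 + 1000
print(f"GAP head: {gap_params:,} parameters")          # 1,025,000
print(f"Flatten + FC: {flat_fc_params:,} parameters")  # 50,177,000

# Stand-in activations: Inception(4a) output and Inception(5b) output.
feat_4a = torch.randn(2, 512, 14, 14)
feat_5b = torch.randn(2, 1024, 7, 7)
targets = torch.tensor([3, 7])

aux = AuxClassifier(in_channels=512)
main_logits = head(feat_5b)
aux_logits = aux(feat_4a)

# Training loss: main loss plus the down-weighted auxiliary loss.
loss = F.cross_entropy(main_logits, targets) + 0.3 * F.cross_entropy(aux_logits, targets)
print(main_logits.shape, aux_logits.shape, loss.item())
```

At inference time only `GapHead` would be evaluated; the auxiliary branch exists purely to inject gradient signal during training.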

### Network Architecture Overview

```
Input Image (224×224×3)
│
├── Conv 7×7+2(S) (112×112×64)
│   └── MaxPool 3×3+2(S) (56×56×64)
│       └── Conv 1×1+1(V) (56×56×64)
│           └── Conv 3×3+1(S) (56×56×192)
│               └── MaxPool 3×3+2(S) (28×28×192)
│                   └── Inception(3a) (28×28×256)
│                       └── Inception(3b) (28×28×480)
│                           └── MaxPool 3×3+2(S) (14×14×480)
│                               └── Inception(4a) (14×14×512)
│                                   └── Inception(4b) (14×14×512)
│                                       └── Inception(4c) (14×14×512)
│                                           └── Inception(4d) (14×14×528)
│                                               └── Inception(4e) (14×14×832)
│                                                   └── MaxPool 3×3+2(S) (7×7×832)
│                                                       └── Inception(5a) (7×7×832)
│                                                           └── Inception(5b) (7×7×1024)
│                                                               └── AvgPool 7×7+1(V) (1×1×1024)
│                                                                   └── Dropout (40%)
│                                                                       └── Linear (1×1×1000)
│                                                                           └── Softmax output
```

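The notation "Conv 7×7+2(S)" indicates a 7×7 convolutional filter with stride 2 and same padding. "(V)" marks valid padding (no padding), as in the final 7×7 average pool that collapses the 7×7 feature map to 1×1. The sizes in parentheses are the output height × width × channels of each layer.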
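
The sizes in the diagram can be reproduced with a few lines of arithmetic. This sketch assumes that (S) means same padding, so the output size is ceil(input / stride), and that (V) means no padding. The helper names are made up for illustration. The Inception(3a) branch widths are the ones used in the test cell later in this notebook, and the Inception(3b) widths are taken from Table 1 of the GoogLeNet paper.

```python
import math


def same_out(size, stride):
    """Output size with 'same' padding: ceil(size / stride)."""
    return math.ceil(size / stride)


def valid_out(size, kernel, stride):
    """Output size with 'valid' (no) padding."""
    return (size - kernel) // stride + 1


# The stride-2 (S) layers in the diagram, in order; stride-1 layers keep the size.
stride2_layers = ["Conv 7x7", "MaxPool", "MaxPool", "MaxPool", "MaxPool"]

size = 224
sizes = [size]
for _ in stride2_layers:
    size = same_out(size, 2)
    sizes.append(size)

# Final AvgPool 7x7+1(V) on the 7x7 map.
final = valid_out(size, kernel=7, stride=1)

# Output channels of an Inception module = sum of the four branch widths.
inception_3a = {"1x1": 64, "3x3": 128, "5x5": 32, "pool_proj": 32}
inception_3b = {"1x1": 128, "3x3": 192, "5x5": 96, "pool_proj": 64}
channels_3a = sum(inception_3a.values())
channels_3b = sum(inception_3b.values())

print(sizes)                     # [224, 112, 56, 28, 14, 7]
print(final)                     # 1
print(channels_3a, channels_3b)  # 256 480
```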

## Evolution: Inception v2, v3, v4

The Inception architecture continued to evolve after the original GoogLeNet (Inception v1):

### Inception v2 and v3
- Factorized convolutions: Replaced larger convolutions with multiple smaller ones (e.g., 5×5 → two 3×3)
- Asymmetric convolutions: Factorized n×n convolutions into 1×n followed by n×1
- Expanded filter banks: Modules made wider rather than deeper on the coarsest grids to avoid representational bottlenecks
- Auxiliary classifiers as regularizers rather than for combating vanishing gradients
- Label smoothing: A regularization technique for the softmax classifier
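
The two factorizations above can be compared by weight count. The sketch below is illustrative only: the channel width of 64, the choice n = 7, and the 17×17 input are assumed values, and biases are omitted so the counts are exact products.

```python
import torch
import torch.nn as nn


def n_weights(module):
    """Total number of learnable values in a module."""
    return sum(p.numel() for p in module.parameters())


ch = 64  # assumed channel width, kept constant through every layer

# 5x5 convolution versus two stacked 3x3 convolutions (same receptive field).
conv5 = nn.Conv2d(ch, ch, kernel_size=5, padding=2, bias=False)
two_conv3 = nn.Sequential(
    nn.Conv2d(ch, ch, kernel_size=3, padding=1, bias=False),
    nn.ReLU(inplace=True),
    nn.Conv2d(ch, ch, kernel_size=3, padding=1, bias=False),
)

# n x n convolution versus 1 x n followed by n x 1.
n = 7
conv_nxn = nn.Conv2d(ch, ch, kernel_size=n, padding=n // 2, bias=False)
asymmetric = nn.Sequential(
    nn.Conv2d(ch, ch, kernel_size=(1, n), padding=(0, n // 2), bias=False),
    nn.ReLU(inplace=True),
    nn.Conv2d(ch, ch, kernel_size=(n, 1), padding=(n // 2, 0), bias=False),
)

# All four variants preserve the spatial size of the input.
x = torch.randn(1, ch, 17, 17)
for layer in (conv5, two_conv3, conv_nxn, asymmetric):
    assert layer(x).shape == x.shape

print(f"5x5:        {n_weights(conv5):,}")       # 102,400
print(f"two 3x3:    {n_weights(two_conv3):,}")   # 73,728
print(f"7x7:        {n_weights(conv_nxn):,}")    # 200,704
print(f"1x7 + 7x1:  {n_weights(asymmetric):,}")  # 57,344
```

With equal channel widths, two 3×3 layers use 18/25 of the weights of a 5×5 layer, and the asymmetric pair uses 2n/n² of the weights of the full n×n layer.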

### Inception v4 and Inception-ResNet
- Combined Inception modules with residual connections from ResNet
- More uniform simplified architecture
- Inception-ResNet showed faster training with similar accuracy to Inception v4
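
The idea of wrapping Inception-style branches in a residual connection can be sketched as below. This is a simplified illustration, not one of the blocks from the Inception-v4 paper: the class name `ResidualInceptionBlock`, the two-branch layout, and the channel widths are hypothetical. The linear 1×1 projection back to the input width and the scaling of the residual branch (0.2 here, within the 0.1 to 0.3 range the paper discusses) follow the paper's description.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ResidualInceptionBlock(nn.Module):
    """Parallel branches -> concat -> 1x1 projection -> scaled residual add."""

    def __init__(self, channels, branch_channels=32, scale=0.2):
        super().__init__()
        # Branch 1: 1x1 convolution.
        self.branch1 = nn.Conv2d(channels, branch_channels, kernel_size=1)
        # Branch 2: 1x1 reduction followed by a 3x3 convolution.
        self.branch2 = nn.Sequential(
            nn.Conv2d(channels, branch_channels, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(branch_channels, branch_channels, kernel_size=3, padding=1),
        )
        # Linear 1x1 projection back to the input width so the shapes match for addition.
        self.project = nn.Conv2d(2 * branch_channels, channels, kernel_size=1)
        self.scale = scale

    def forward(self, x):
        branches = torch.cat(
            [F.relu(self.branch1(x)), F.relu(self.branch2(x))], dim=1
        )
        # Scale the residual branch before adding it to the identity path.
        return F.relu(x + self.scale * self.project(branches))


block = ResidualInceptionBlock(channels=256)
x = torch.randn(1, 256, 35, 35)
y = block(x)
print(x.shape, y.shape)  # shapes are identical
```

Because the output shape equals the input shape, such blocks can be stacked repeatedly, which is part of what makes the residual variants more uniform than the original design.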

```python
# Implementing a basic Inception module in PyTorch
import torch
import torch.nn as nn
import torch.nn.functional as F


class InceptionModule(nn.Module):
    """Four parallel branches whose outputs are concatenated along the channel axis."""

    def __init__(self, in_channels, ch1x1, ch3x3red, ch3x3, ch5x5red, ch5x5, pool_proj):
        super().__init__()

        # 1x1 conv branch
        self.branch1 = nn.Conv2d(in_channels, ch1x1, kernel_size=1)

        # 1x1 -> 3x3 conv branch
        self.branch2 = nn.Sequential(
            nn.Conv2d(in_channels, ch3x3red, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch3x3red, ch3x3, kernel_size=3, padding=1)
        )

        # 1x1 -> 5x5 conv branch
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_channels, ch5x5red, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch5x5red, ch5x5, kernel_size=5, padding=2)
        )

        # 3x3 pool -> 1x1 conv branch
        self.branch4 = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_channels, pool_proj, kernel_size=1)
        )

    def forward(self, x):
        # ReLU is applied here to the last convolution of each branch
        branch1 = F.relu(self.branch1(x))
        branch2 = F.relu(self.branch2(x))
        branch3 = F.relu(self.branch3(x))
        branch4 = F.relu(self.branch4(x))

        # Concatenate along channel dimension
        outputs = [branch1, branch2, branch3, branch4]
        return torch.cat(outputs, dim=1)
```

```python
# Testing the Inception module with GoogLeNet's Inception (3a) parameters
batch_size = 1
in_channels = 192
height, width = 28, 28

# Parameters for Inception(3a)
ch1x1 = 64
ch3x3red = 96
ch3x3 = 128
ch5x5red = 16
ch5x5 = 32
pool_proj = 32

# Create a sample input tensor
x = torch.randn(batch_size, in_channels, height, width)
print(f"Input shape: {x.shape}")

# Create and apply the Inception module
inception_3a = InceptionModule(in_channels, ch1x1, ch3x3red, ch3x3, ch5x5red, ch5x5, pool_proj)
output = inception_3a(x)
print(f"Output shape: {output.shape}")
```

```
Input shape: torch.Size([1, 192, 28, 28])
Output shape: torch.Size([1, 256, 28, 28])
```


```python
# Using pre-trained Inception models with TensorFlow/Keras

# Note: this cell is for demonstration only; the commented-out lines require TensorFlow.
# from tensorflow.keras.applications.inception_v3 import InceptionV3
# model = InceptionV3(weights='imagenet', include_top=True)

# Instead of loading the model, print hard-coded reference figures.
# The top-5 accuracy corresponds to the 5.6% single-frame top-5 error
# reported in the Inception v3 paper (reference 2).
print("Model: InceptionV3")
print("Total Parameters: 23,851,784")
print("Imagenet Top-5 Accuracy: 94.4%")
```

```
Model: InceptionV3
Total Parameters: 23,851,784
Imagenet Top-5 Accuracy: 94.4%
```


## Applications and Impact

### Applications
Inception architectures have been successfully applied to a variety of computer vision tasks:

1. **Image Classification**: The original purpose for which it was designed
2. **Object Detection**: As a backbone in frameworks like SSD (Single Shot MultiBox Detector)
3. **Image Segmentation**: In DeepLab and other segmentation networks
4. **Transfer Learning**: Pre-trained Inception models are widely used for transferring to domain-specific tasks
5. **Computational Photography**: Used in image enhancement and computational imaging

### Historical Impact

The Inception architecture introduced several innovations that have widely influenced deep learning:

1. **Split-Transform-Merge Strategy**: Breaking down complex tasks into simpler, parallel operations
2. **Dimensionality Reduction**: Strategic use of 1×1 convolutions for computational efficiency
3. **Network Design Philosophy**: The idea that carefully crafted network modules can optimize both accuracy and computation
4. **Successor Influence**: Concepts from Inception influenced later architectures like ResNeXt and the design of efficient CNN models

### Limitations

Despite its success, Inception has some limitations:

1. **Complex Architecture**: Hard to modify, adapt, or interpret compared to more uniform designs
2. **Manual Design**: The module structures required significant hand-engineering
3. **Superseded Performance**: Later architectures like ResNets and EfficientNets have surpassed Inception in some benchmarks
4. **Higher Memory Usage**: Due to the parallel pathways, Inception can require more memory during training than sequential designs

## References

1. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., & Rabinovich, A. (2015). [Going deeper with convolutions](https://arxiv.org/abs/1409.4842). In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

2. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., & Wojna, Z. (2016). [Rethinking the Inception Architecture for Computer Vision](https://arxiv.org/abs/1512.00567). In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

3. Szegedy, C., Ioffe, S., Vanhoucke, V., & Alemi, A. A. (2017). [Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning](https://arxiv.org/abs/1602.07261). In AAAI Conference on Artificial Intelligence.

4. Chollet, F. (2017). [Xception: Deep Learning with Depthwise Separable Convolutions](https://arxiv.org/abs/1610.02357). In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. (A related architecture inspired by Inception)

5. Christian Szegedy's Google Scholar profile: [https://scholar.google.com/citations?user=3QeF7mAAAAAJ](https://scholar.google.com/citations?user=3QeF7mAAAAAJ)