<div style="display: flex; justify-content: space-between; align-items: center;">
    <div style="text-align: left; flex: 4">
        <strong>Author:</strong> Amirhossein Heydari -
        📧 <a href="mailto:amirhosseinheydari78@gmail.com">amirhosseinheydari78@gmail.com</a> -
        🐙 <a href="https://github.com/mr-pylin/pytorch-workshop" target="_blank" rel="noopener">github.com/mr-pylin</a>
    </div>
    <div style="text-align: right; flex: 1;">
        <a href="https://pytorch.org/" target="_blank" rel="noopener noreferrer">
            <img src="../../../assets/images/pytorch/logo/pytorch-logo-dark.svg" 
                 alt="PyTorch Logo"
                 style="max-height: 48px; width: auto; background-color: #ffffff; border-radius: 8px;">
        </a>
    </div>
</div>
<hr>


**Table of contents**<a id='toc0_'></a>    
- [Dependencies](#toc1_)    
- [Classic and Foundational Architectures](#toc2_)    
  - [LeNet-5](#toc2_1_)    
    - [Manual Implementation](#toc2_1_1_)    
  - [AlexNet](#toc2_2_)    
    - [Manual Implementation](#toc2_2_1_)    
    - [Using PyTorch](#toc2_2_2_)    

<!-- vscode-jupyter-toc-config
	numbering=false
	anchor=true
	flat=false
	minLevel=1
	maxLevel=6
	/vscode-jupyter-toc-config -->
<!-- THIS CELL WILL BE REPLACED ON TOC UPDATE. DO NOT WRITE YOUR TEXT IN THIS CELL -->

# <a id='toc1_'></a>[Dependencies](#toc0_)


In [None]:
import torch
import torch.nn.functional as F
from torch import nn
from torchinfo import summary
from torchvision.models import alexnet

# <a id='toc2_'></a>[Classic and Foundational Architectures](#toc0_)


## <a id='toc2_1_'></a>[LeNet-5](#toc0_)

- One of the pioneering **Convolutional Neural Network (CNN)** architectures developed in 1998 by [Yann LeCun](https://en.wikipedia.org/wiki/Yann_LeCun) and his colleagues at [AT&T Bell Labs](https://en.wikipedia.org/wiki/Bell_Labs)
- It was introduced in the landmark paper *[Gradient-based learning applied to document recognition](https://ieeexplore.ieee.org/document/726791)*, which demonstrated end-to-end learning directly from raw pixel data
- It was designed primarily for **handwritten digit recognition**, such as ZIP code and bank check processing
- It was trained on the *[MNIST dataset](http://yann.lecun.com/exdb/mnist/)*, where the original **28×28 images were padded to 32×32** to match the network architecture [[MNIST viewer](https://observablehq.com/@davidalber/mnist-browser)]
- LeNet-5 demonstrated that CNNs can automatically learn **spatial hierarchies of features**, eliminating the need for manual feature engineering and establishing the foundation for modern computer vision systems
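
The 28×28 to 32×32 padding mentioned above can be sketched with `F.pad`. This is a minimal illustration on a random tensor standing in for an MNIST batch; the batch size and the zero fill value are assumptions made for the example, not details taken from the original training setup.

```python
import torch
import torch.nn.functional as F

# stand-in for a batch of MNIST digits: (batch, channels, height, width)
digits = torch.rand(8, 1, 28, 28)

# pad 2 pixels on every side; the tuple order is (left, right, top, bottom)
padded = F.pad(digits, pad=(2, 2, 2, 2), mode="constant", value=0.0)

print(padded.shape)  # torch.Size([8, 1, 32, 32])
```

The original pixels end up centered in the larger canvas, so the 5×5 kernels of the first layer can still respond to strokes near the border of the digit.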

<div style="text-align: center; padding-top: 10px;">
    <img src="../../../assets/images/original/cnn/architectures/lenet5.svg" alt="lenet5.svg" style="min-width: 512px; width: 100%; height: auto; border-radius: 16px;">
    <p><em>Figure 1: LeNet-5 Architecture</em></p>
</div>


### <a id='toc2_1_1_'></a>[Manual Implementation](#toc0_)


In [None]:
# learnable subsampling layer (original LeNet-5 S2, S4)
class Subsampling(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.pool = nn.AvgPool2d(kernel_size=2, stride=2)
        self.scale = nn.Parameter(torch.ones(channels))
        self.bias = nn.Parameter(torch.zeros(channels))

    def forward(self, x):
        x = self.pool(x)
        x = self.scale.view(1, -1, 1, 1) * x + self.bias.view(1, -1, 1, 1)
        return x
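
In the 1998 paper, the subsampling layers sum each 2×2 neighborhood, multiply the sum by a trainable coefficient, and add a trainable bias. The class above uses average pooling instead, which differs only by a constant factor of 4 that the learnable scale can absorb. A minimal check of that equivalence, using hypothetical random coefficient and bias values:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(2, 6, 28, 28)  # shape of the C1 output for a 32x32 input

# hypothetical per-channel coefficient and bias, broadcast over (N, C, H, W)
coeff = torch.randn(6).view(1, -1, 1, 1)
bias = torch.randn(6).view(1, -1, 1, 1)

# sum-pooling: divisor_override=1 turns the 2x2 average into a 2x2 sum
sum_pooled = F.avg_pool2d(x, kernel_size=2, stride=2, divisor_override=1)
sum_style = coeff * sum_pooled + bias

# average-pooling with the coefficient multiplied by 4 gives the same values
avg_pooled = F.avg_pool2d(x, kernel_size=2, stride=2)
avg_style = (4 * coeff) * avg_pooled + bias

print(sum_style.shape)
print(torch.allclose(sum_style, avg_style, atol=1e-5))
```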

In [None]:
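# Gaussian RBF output layer (a variant of the original LeNet-5 output layer,
# whose RBF units return the squared Euclidean distance without the exponential)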
class RBFLayer(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(out_features, in_features))
        self.sigma = nn.Parameter(torch.ones(out_features))

    def forward(self, x):
        x = x.unsqueeze(1)  # (batch, 1, features)
        centers = self.centers.unsqueeze(0)  # (1, classes, features)
        dist = torch.sum((x - centers) ** 2, dim=2)
        return torch.exp(-dist / (2 * self.sigma**2))
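
The broadcasting in `forward` computes the squared Euclidean distance between every input vector and every center. The sketch below checks that computation against `torch.cdist` on random data; the sizes mirror F6 (84 features) and 10 classes, and all values are random placeholders rather than trained parameters.

```python
import torch

torch.manual_seed(0)
x = torch.randn(4, 84)         # batch of F6-sized activations
centers = torch.randn(10, 84)  # one center per class
sigma = torch.ones(10)

# broadcasting: (4, 1, 84) - (1, 10, 84) -> (4, 10, 84), then sum over features
dist_sq = ((x.unsqueeze(1) - centers.unsqueeze(0)) ** 2).sum(dim=2)

# reference: pairwise Euclidean distances, squared
dist_sq_ref = torch.cdist(x, centers, p=2) ** 2

# Gaussian response: equals 1 at the center and decays towards 0 with distance
response = torch.exp(-dist_sq / (2 * sigma**2))

print(dist_sq.shape)
print(torch.allclose(dist_sq, dist_sq_ref, rtol=1e-3, atol=1e-3))
```

With 84 features and unit `sigma`, the squared distances for random inputs are large, so the Gaussian responses are very close to zero. When training such a layer, the initialization of `centers` and `sigma` probably deserves some care.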

In [None]:
class LeNet5(nn.Module):

    def __init__(self):
        super().__init__()

        # C1
        self.conv1 = nn.Conv2d(1, 6, kernel_size=5)

        # S2
        self.sub2 = Subsampling(6)

        # C3 (connected to all 6 S2 maps; the original uses a sparse connection table)
        self.conv3 = nn.Conv2d(6, 16, kernel_size=5)

        # S4
        self.sub4 = Subsampling(16)

        # C5
        self.conv5 = nn.Conv2d(16, 120, kernel_size=5)

        # F6
        self.fc6 = nn.Linear(120, 84)

        # output (RBF)
        self.rbf = RBFLayer(84, 10)

    def forward(self, x):

        # C1
        x = torch.tanh(self.conv1(x))

        # S2
        x = torch.tanh(self.sub2(x))

        # C3
        x = torch.tanh(self.conv3(x))

        # S4
        x = torch.tanh(self.sub4(x))

        # C5
        x = torch.tanh(self.conv5(x))

        # flatten
        x = x.view(x.size(0), -1)

        # F6
        x = torch.tanh(self.fc6(x))

        # output
        x = self.rbf(x)

        return x

In [None]:
# initialize a model
lenet5_manual = LeNet5()
lenet5_manual

In [None]:
# model summary report
summary(lenet5_manual, input_size=(1, 1, 32, 32), device="cpu")
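
The spatial sizes reported by the summary can be reproduced by hand with the usual output-size formula `(size + 2 * padding - kernel) // stride + 1`. The helper `conv_out` below is a hypothetical name introduced for this sketch; the kernel sizes and strides are the ones used in `LeNet5` above.

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output length along one spatial axis of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


# (layer name, kernel size, stride) for the layers of LeNet5
layers = [("C1", 5, 1), ("S2", 2, 2), ("C3", 5, 1), ("S4", 2, 2), ("C5", 5, 1)]

size = 32  # padded input resolution
sizes = {}
for name, kernel, stride in layers:
    size = conv_out(size, kernel, stride)
    sizes[name] = size

print(sizes)  # {'C1': 28, 'S2': 14, 'C3': 10, 'S4': 5, 'C5': 1}
```

Because C5 reduces each of its 120 feature maps to a single value, flattening yields exactly the 120 inputs that `fc6` expects.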

## <a id='toc2_2_'></a>[AlexNet](#toc0_)

- A groundbreaking **deep Convolutional Neural Network (CNN)** architecture developed in 2012 by [Alex Krizhevsky](https://en.wikipedia.org/wiki/Alex_Krizhevsky), [Ilya Sutskever](https://en.wikipedia.org/wiki/Ilya_Sutskever), and [Geoffrey Hinton](https://en.wikipedia.org/wiki/Geoffrey_Hinton) at the [University of Toronto](https://en.wikipedia.org/wiki/University_of_Toronto)
- It was introduced in the landmark paper *[ImageNet Classification with Deep Convolutional Neural Networks](https://papers.nips.cc/paper_files/paper/2012/hash/c399862d3b9d6b76c8436e924a68c45b-Abstract.html)*, which revolutionized computer vision
- It won the 2012 *[ImageNet Large Scale Visual Recognition Challenge (ILSVRC)](https://image-net.org/challenges/LSVRC/2012/)* with a top-5 error rate of **15.3%, compared to 26.2%** for the second-best entry, far outperforming traditional methods
- It was trained on the *[ImageNet dataset](https://image-net.org/)*, which contains over **1.2 million high-resolution images across 1000 classes** [[ImageNet viewer](https://navigu.net/#imagenet)]
- AlexNet demonstrated that **deep CNNs trained on GPUs** can learn highly discriminative and hierarchical image representations, making deep learning the dominant approach in computer vision

**It combined and popularized several techniques that are still widely used today**
  - **ReLU activation function**, enabling faster training compared to sigmoid or tanh
  - **Dropout regularization**, reducing overfitting in fully connected layers
  - **GPU acceleration**, making large-scale deep network training practical
  - **Data augmentation**, improving generalization performance
  - **Overlapping max pooling**, improving feature robustness

> AlexNet marked the beginning of the **modern deep learning era in computer vision**, influencing nearly all subsequent CNN architectures such as VGG, GoogLeNet, and ResNet
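
Overlapping pooling means the pooling window is larger than the stride, so neighboring windows share inputs. The sketch below compares a 3×3 window with stride 2 (as used in AlexNet) to a non-overlapping 2×2 window with stride 2. The 96×55×55 input matches the output of the first convolution for a 227×227 image, and the random tensor is only a placeholder.

```python
import torch
from torch import nn

torch.manual_seed(0)
x = torch.randn(1, 96, 55, 55)

overlapping = nn.MaxPool2d(kernel_size=3, stride=2)      # window larger than stride
non_overlapping = nn.MaxPool2d(kernel_size=2, stride=2)  # window equal to stride

y_overlap = overlapping(x)
y_plain = non_overlapping(x)

print(y_overlap.shape)  # torch.Size([1, 96, 27, 27])
print(y_plain.shape)    # torch.Size([1, 96, 27, 27])
```

Both variants produce the same output size for this input. The difference is that each overlapping window covers 9 values instead of 4, so adjacent outputs are computed from partly shared inputs.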

<!-- <div style="text-align: center; padding-top: 10px;">
    <img src="../../../assets/images/original/cnn/architectures/alexnet.svg" alt="alexnet.svg" style="min-width: 512px; width: 100%; height: auto; border-radius: 16px;">
    <p><em>Figure 1: AlexNet Architecture</em></p>
</div> -->


### <a id='toc2_2_1_'></a>[Manual Implementation](#toc0_)


In [None]:
class AlexNetOriginal(nn.Module):
    """
    Original AlexNet implementation from the 2012 paper:
    "ImageNet Classification with Deep Convolutional Neural Networks"

    Key characteristics:
    - Input size: 227 x 227 x 3
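    - Uses Local Response Normalization (LRN); note that nn.LocalResponseNorm
      divides alpha by size, so alpha is not numerically identical to the paper's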
    - Uses grouped convolutions (GPU split in original paper)
    - Uses Dropout in classifier
    - Uses ReLU activation
    """

    def __init__(self, num_classes: int = 1000) -> None:
        super().__init__()

        # feature extractor
        self.features = nn.Sequential(
            # Conv1
            nn.Conv2d(in_channels=3, out_channels=96, kernel_size=11, stride=4, padding=0),
            nn.ReLU(inplace=True),
            nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2),
            nn.MaxPool2d(kernel_size=3, stride=2),
            # Conv2 (grouped)
            nn.Conv2d(in_channels=96, out_channels=256, kernel_size=5, stride=1, padding=2, groups=2),
            nn.ReLU(inplace=True),
            nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2),
            nn.MaxPool2d(kernel_size=3, stride=2),
            # Conv3
            nn.Conv2d(in_channels=256, out_channels=384, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            # Conv4 (grouped)
            nn.Conv2d(in_channels=384, out_channels=384, kernel_size=3, stride=1, padding=1, groups=2),
            nn.ReLU(inplace=True),
            # Conv5 (grouped)
            nn.Conv2d(in_channels=384, out_channels=256, kernel_size=3, stride=1, padding=1, groups=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )

        # classifier
        self.classifier = nn.Sequential(
            nn.Dropout(p=0.5),
            nn.Linear(in_features=256 * 6 * 6, out_features=4096),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(in_features=4096, out_features=4096),
            nn.ReLU(inplace=True),
            nn.Linear(in_features=4096, out_features=num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        x = torch.flatten(x, start_dim=1)
        x = self.classifier(x)
        return x
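
With `groups=2`, each half of the output channels only sees half of the input channels, which mirrors how the paper split the network across two GPUs. A minimal sketch comparing the Conv2 layer above with an ungrouped counterpart; the ungrouped layer and the helper `n_params` are hypothetical and exist only for this comparison.

```python
from torch import nn


def n_params(module: nn.Module) -> int:
    """Total number of elements across all parameters of a module."""
    return sum(p.numel() for p in module.parameters())


grouped = nn.Conv2d(96, 256, kernel_size=5, padding=2, groups=2)
dense = nn.Conv2d(96, 256, kernel_size=5, padding=2, groups=1)

print(grouped.weight.shape)  # torch.Size([256, 48, 5, 5])
print(dense.weight.shape)    # torch.Size([256, 96, 5, 5])
print(n_params(grouped), n_params(dense))  # 307456 614656
```

Grouping halves the number of weights in the layer while leaving the bias count and the output shape unchanged.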

In [None]:
# initialize a model
alexnet_manual = AlexNetOriginal()
alexnet_manual

In [None]:
# model summary report
summary(alexnet_manual, input_size=(1, 3, 227, 227), device="cpu")
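
The `256 * 6 * 6` input size of the first linear layer follows from the kernel sizes, strides, and paddings in `AlexNetOriginal.features`. The sketch below traces one spatial axis through those layers; `out_size` and the stage names are hypothetical labels introduced for this example.

```python
def out_size(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output length along one spatial axis of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


# (name, kernel, stride, padding) copied from AlexNetOriginal.features
stages = [
    ("conv1", 11, 4, 0),
    ("pool1", 3, 2, 0),
    ("conv2", 5, 1, 2),
    ("pool2", 3, 2, 0),
    ("conv3", 3, 1, 1),
    ("conv4", 3, 1, 1),
    ("conv5", 3, 1, 1),
    ("pool5", 3, 2, 0),
]

size = 227
trace = []
for name, kernel, stride, padding in stages:
    size = out_size(size, kernel, stride, padding)
    trace.append(size)

flat_features = 256 * size * size
print(trace)          # [55, 27, 27, 13, 13, 13, 13, 6]
print(flat_features)  # 9216
```

With a 224×224 input and `padding=0` in the first convolution, the same arithmetic ends at 5×5, which would not match the `256 * 6 * 6` input of the classifier.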

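### <a id='toc2_2_2_'></a>[Using PyTorch](#toc0_)

- AlexNet is available in torchvision as `torchvision.models.alexnet`: [docs.pytorch.org/vision/stable/models/alexnet.html](https://docs.pytorch.org/vision/stable/models/alexnet.html)
- The torchvision variant differs from the manual implementation above: it has no LRN and no grouped convolutions, and its first two convolutions use 64 and 192 output channels instead of 96 and 256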


In [None]:
# initialize a model
alexnet_pytorch = alexnet(weights=None)
alexnet_pytorch

In [None]:
# model summary report
summary(alexnet_pytorch, input_size=(1, 3, 224, 224), device="cpu")