📝 **Author:** Amirhossein Heydari - 📧 **Email:** amirhosseinheydari78@gmail.com - 📍 **Linktree:** [linktr.ee/mr_pylin](https://linktr.ee/mr_pylin)

---

# Dependencies
   - torchvision models:
      - class (e.g. `AlexNet`)
         - Imports the model class directly
         - Allows more control and customization, since you work with the class itself: you can subclass it, override methods, customize initialization, etc.
      - function (e.g. `alexnet`)
         - Imports a builder function that returns an instance of the model
         - Easier and quicker to use, especially for standard models
   - [pytorch.org/vision/stable/models.html](https://pytorch.org/vision/stable/models.html)

In [None]:
import torch
from torch import nn
from torchinfo import summary
from torchvision.models import AlexNet, alexnet

# AlexNet
   - One of the pioneering convolutional neural network architectures developed in 2012 by [Alex Krizhevsky](https://en.wikipedia.org/wiki/Alex_Krizhevsky), [Ilya Sutskever](https://en.wikipedia.org/wiki/Ilya_Sutskever), and [Geoffrey Hinton](https://en.wikipedia.org/wiki/Geoffrey_Hinton)
   - It was introduced in the paper [ImageNet Classification with Deep Convolutional Neural Networks](https://papers.nips.cc/paper/2012/hash/c399862d3b9d6b76c8436e924a68c45b-Abstract.html)
   - It was trained on the [ImageNet](https://www.image-net.org/) dataset (first resized to 256x256 then center cropped to 227x227) [[ImageNet viewer](https://navigu.net/#imagenet)]
   - The winner of the ImageNet Large Scale Visual Recognition Challenge ([ILSVRC](https://image-net.org/challenges/LSVRC/2012/)) in 2012

<figure style="text-align: center;">
    <img src="../../../assets/images/original/cnn/architectures/alexnet.svg" alt="alexnet-architecture.svg" style="width: 100%;">
    <figcaption>AlexNet Architecture</figcaption>
</figure>

## Custom AlexNet
   - The final `Softmax` layer is omitted because `CrossEntropyLoss` applies `LogSoftmax` internally, so the model should output raw logits.
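
The point about `CrossEntropyLoss` can be checked with a short sketch. The random logits and targets below are placeholder data (an assumption for illustration, not model outputs from this notebook).

```python
import torch
import torch.nn.functional as F
from torch import nn

torch.manual_seed(0)

logits = torch.randn(4, 1000)           # raw scores, no Softmax applied
targets = torch.randint(0, 1000, (4,))  # ground-truth class indices

# CrossEntropyLoss applied directly to the raw logits
loss_ce = nn.CrossEntropyLoss()(logits, targets)

# the same loss written out explicitly: LogSoftmax followed by NLLLoss
loss_manual = nn.NLLLoss()(F.log_softmax(logits, dim=1), targets)

print(torch.allclose(loss_ce, loss_manual))  # True

# probabilities are only needed at inference time
probs = F.softmax(logits, dim=1)
```

Adding a `Softmax` layer to the model and then using `CrossEntropyLoss` would therefore normalize the scores twice.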

In [None]:
class CustomAlexNet(nn.Module):
    def __init__(self, num_classes: int = 1000, dropout: float = 0.5) -> None:
        super().__init__()

        self.features = nn.Sequential(

            # 3x227x227 -> 64x56x56 (with padding=2; a 3x224x224 input gives 64x55x55)
            # trainable params: (3 * 11 * 11 + 1) * 64 = 23,296
            nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2),

            # 64x56x56 -> 64x56x56
            # trainable params: 0
            nn.ReLU(inplace=True),

            # 64x56x56 -> 64x27x27
            # trainable params: 0
            nn.MaxPool2d(kernel_size=3, stride=2),

            # 64x27x27 -> 192x27x27
            # trainable params: (64 * 5 * 5 + 1) * 192 = 307,392
            nn.Conv2d(64, 192, kernel_size=5, padding=2),

            # 192x27x27 -> 192x27x27
            # trainable params: 0
            nn.ReLU(inplace=True),

            # 192x27x27 -> 192x13x13
            # trainable params: 0
            nn.MaxPool2d(kernel_size=3, stride=2),

            # 192x13x13 -> 384x13x13
            # trainable params: (192 * 3 * 3 + 1) * 384 = 663,936
            nn.Conv2d(192, 384, kernel_size=3, padding=1),

            # 384x13x13 -> 384x13x13
            # trainable params: 0
            nn.ReLU(inplace=True),

            # 384x13x13 -> 256x13x13
            # trainable params: (384 * 3 * 3 + 1) * 256 = 884,992
            nn.Conv2d(384, 256, kernel_size=3, padding=1),

            # 256x13x13 -> 256x13x13
            # trainable params: 0
            nn.ReLU(inplace=True),

            # 256x13x13 -> 256x13x13
            # trainable params: (256 * 3 * 3 + 1) * 256 = 590,080
            nn.Conv2d(256, 256, kernel_size=3, padding=1),

            # 256x13x13 -> 256x13x13
            # trainable params: 0
            nn.ReLU(inplace=True),

            # 256x13x13 -> 256x6x6
            # trainable params: 0
            nn.MaxPool2d(kernel_size=3, stride=2),
        )

        # 256x6x6 -> 256x6x6
        # trainable params: 0
        self.avgpool = nn.AdaptiveAvgPool2d(output_size=(6, 6))

        # flatten : 256x6x6 -> 9216
        # 9216 -> 1000
        self.classifier = nn.Sequential(

            # 9216 -> 9216
            # trainable params: 0
            nn.Dropout(p=dropout),

            # 9216 -> 4096
            # trainable params: (9216 + 1) * 4096 = 37,752,832
            nn.Linear(256 * 6 * 6, 4096),

            # 4096 -> 4096
            # trainable params: 0
            nn.ReLU(inplace=True),

            # 4096 -> 4096
            # trainable params: 0
            nn.Dropout(p=dropout),

            # 4096 -> 4096
            # trainable params: (4096 + 1) * 4096 = 16,781,312
            nn.Linear(4096, 4096),

            # 4096 -> 4096
            # trainable params: 0
            nn.ReLU(inplace=True),

            # 4096 -> num_classes (1000 by default)
            # trainable params: (4096 + 1) * 1000 = 4,097,000 (for num_classes=1000)
            nn.Linear(4096, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:

        # feature extractor
        x = self.features(x)

        # adaptive average pooling
        x = self.avgpool(x)

        # flatten : 256x6x6 -> 9216
        x = torch.flatten(x, start_dim=1)

        # classifier
        x = self.classifier(x)

        return x
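
The spatial sizes through the feature extractor can be re-derived with the standard output-size formula `floor((size + 2 * padding - kernel) / stride) + 1`. The helper names below are hypothetical and the layer list is copied by hand from the class above; this is a sketch for checking the arithmetic, not part of the model.

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of Conv2d/MaxPool2d (dilation 1, floor rounding)."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_features(size: int) -> list:
    """Spatial size after each conv/pool layer of the feature extractor."""
    layers = [
        (11, 4, 2),  # conv1: kernel, stride, padding
        (3, 2, 0),   # maxpool1
        (5, 1, 2),   # conv2
        (3, 2, 0),   # maxpool2
        (3, 1, 1),   # conv3
        (3, 1, 1),   # conv4
        (3, 1, 1),   # conv5
        (3, 2, 0),   # maxpool3
    ]
    sizes = []
    for kernel, stride, padding in layers:
        size = conv_out(size, kernel, stride, padding)
        sizes.append(size)
    return sizes


print(trace_features(227))  # [56, 27, 27, 13, 13, 13, 13, 6]
print(trace_features(224))  # [55, 27, 27, 13, 13, 13, 13, 6]
```

With `padding=2`, a 227x227 input gives 56x56 after the first convolution and a 224x224 input gives 55x55; both reach 27x27 after the first max-pool and 6x6 at the end.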

In [None]:
model_1 = CustomAlexNet(num_classes=1000, dropout=0.5)
model_1

In [None]:
summary(model_1, (1, 3, 227, 227), device='cpu')
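
The total reported by `summary` can be cross-checked by hand using the per-layer formulas from the comments in the class. The helper names and dictionary keys below are hypothetical; the sketch assumes the default `num_classes=1000`.

```python
def conv_params(in_channels: int, out_channels: int, kernel: int) -> int:
    """Weights plus one bias per output channel of a square Conv2d."""
    return (in_channels * kernel * kernel + 1) * out_channels


def linear_params(in_features: int, out_features: int) -> int:
    """Weights plus one bias per output unit of a Linear layer."""
    return (in_features + 1) * out_features


counts = {
    "conv1": conv_params(3, 64, 11),
    "conv2": conv_params(64, 192, 5),
    "conv3": conv_params(192, 384, 3),
    "conv4": conv_params(384, 256, 3),
    "conv5": conv_params(256, 256, 3),
    "fc1": linear_params(256 * 6 * 6, 4096),
    "fc2": linear_params(4096, 4096),
    "fc3": linear_params(4096, 1000),
}

for name, count in counts.items():
    print(f"{name}: {count:,}")

total = sum(counts.values())
print(f"total: {total:,}")  # total: 61,100,840
```

Most of the parameters sit in the first fully connected layer, not in the convolutions.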

## PyTorch AlexNet
   - AlexNet is available in torchvision; calling `alexnet()` without a `weights` argument returns a randomly initialized model: [pytorch.org/vision/stable/models/alexnet.html](https://pytorch.org/vision/stable/models/alexnet.html)

In [None]:
model_2 = alexnet()
model_2

In [None]:
summary(model_2, (1, 3, 227, 227), device='cpu')