📝 **Author:** Amirhossein Heydari - 📧 **Email:** amirhosseinheydari78@gmail.com - 📍 **Linktree:** [linktr.ee/mr_pylin](https://linktr.ee/mr_pylin)

---

# Dependencies
   - torchvision models can be imported in two ways:
      - class (e.g. `EfficientNet`)
         - Imports the model class itself
         - Gives more control and customization, since you work with the class directly: you can subclass it, override methods, customize initialization, etc.
      - function (e.g. `efficientnet_b0`)
         - Imports a builder function that returns an instance of the model
         - Easier and quicker to use, especially for standard models
   - [pytorch.org/vision/stable/models.html](https://pytorch.org/vision/stable/models.html)

In [None]:
import math

import torch
from torch import nn
import torch.nn.functional as F
from torchinfo import summary
from torchvision.models import (EfficientNet, efficientnet_b0, efficientnet_b1,
                                efficientnet_b2, efficientnet_b3,
                                efficientnet_b4, efficientnet_b5,
                                efficientnet_b6, efficientnet_b7)

# EfficientNet
   - EfficientNet was developed in 2019 by researchers at [Google AI](https://research.google/)
   - It is based on the [EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks](https://arxiv.org/pdf/1905.11946) paper
   - It was trained on the [ImageNet](https://www.image-net.org/) dataset (for B0, images are first resized to 256x256 and then center cropped to 224x224; larger variants use higher input resolutions) [[ImageNet viewer](https://navigu.net/#imagenet)]
   - It is known for its balance of accuracy and efficiency: at the time of publication it achieved state-of-the-art ImageNet accuracy while being significantly smaller and cheaper to compute than previous models
   - The EfficientNet family includes the variants `EfficientNet-B0` through `EfficientNet-B7`, where a higher number means a larger, scaled-up model (wider, deeper, higher input resolution) and B0 is the baseline
   - These efficiency gains come from the compound scaling method, which scales network width, depth, and resolution together instead of one dimension at a time

<figure style="text-align: center;">
    <img src="../../../assets/images/third_party/efficientnet-architecture.svg" alt="efficientnet-architecture.svg" style="width: 100%;">
    <figcaption>Figure 2. Model Scaling. (a) is a baseline network example; (b)-(d) are conventional scaling that only increases one dimension of network width, depth, or resolution. (e) is our proposed compound scaling method that uniformly scales all three dimensions with a fixed ratio.<br>©️ Image: <a href= "https://arxiv.org/pdf/1905.11946">EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks</a></figcaption>
</figure>
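
The compound scaling rule from the paper can be sketched numerically. The paper scales depth, width, and resolution as α^φ, β^φ, and γ^φ with α = 1.2, β = 1.1, γ = 1.15 (found by a grid search on B0 under the constraint α·β²·γ² ≈ 2), so FLOPs grow by roughly 2^φ. The function names `compound_scale` and `flops_factor` are illustrative. The published B1 to B7 coefficients used later in this notebook do not match this formula exactly.

```python
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # depth, width, resolution bases from the paper


def compound_scale(phi):
    """Return (depth, width, resolution) multipliers for compound coefficient phi."""
    return ALPHA ** phi, BETA ** phi, GAMMA ** phi


def flops_factor(phi):
    """Approximate FLOPs growth: FLOPs scale with depth * width^2 * resolution^2."""
    d, w, r = compound_scale(phi)
    return d * w ** 2 * r ** 2


for phi in range(4):
    d, w, r = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution x{r:.2f}, FLOPs x{flops_factor(phi):.2f}")
```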

## Custom EfficientNet
   - The model has no final `Softmax` layer and returns raw logits, because `nn.CrossEntropyLoss` applies `LogSoftmax` internally.
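
The point about `CrossEntropyLoss` can be checked directly. The sketch below uses random tensors as stand-ins for model outputs and labels (both are made up for illustration) and compares the built-in loss against an explicit `log_softmax` followed by `nll_loss`.

```python
import torch
import torch.nn.functional as F
from torch import nn

torch.manual_seed(0)
logits = torch.randn(4, 1000)           # raw model outputs (no softmax applied)
targets = torch.randint(0, 1000, (4,))  # ground-truth class indices

# CrossEntropyLoss consumes raw logits directly
loss_ce = nn.CrossEntropyLoss()(logits, targets)

# equivalent two-step computation: LogSoftmax followed by NLLLoss
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

print(torch.allclose(loss_ce, loss_manual))

# probabilities are only needed at inference time
probs = F.softmax(logits, dim=1)
print(probs.sum(dim=1))
```

Applying `Softmax` inside the model and then passing the result to `CrossEntropyLoss` would normalize twice, which is why the network ends with a plain `Linear` layer.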

In [None]:
# swish activation function
class Swish(nn.Module):
    def forward(self, x):
        return x * torch.sigmoid(x)

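# squeeze-and-excitation (SE) block: reweights channels using globally pooled statistics
# note: the reference EfficientNet derives the squeeze width from the block's input channels and uses Swish here;
# this simplified version takes a fraction of the channels passed in (the expanded channels in `MBConvBlock`) and uses ReLU
class SEBlock(nn.Module):
    def __init__(self, in_channels, reduction=4):
        super().__init__()
        reduced_channels = in_channels // reduction
        # 1x1 convolutions act as the two fully connected layers
        self.fc1 = nn.Conv2d(in_channels, reduced_channels, 1)
        self.fc2 = nn.Conv2d(reduced_channels, in_channels, 1)

    def forward(self, x):
        out = F.adaptive_avg_pool2d(x, 1)  # squeeze: (N, C, H, W) -> (N, C, 1, 1)
        out = F.relu(self.fc1(out))
        out = torch.sigmoid(self.fc2(out))  # excitation: per-channel gates in (0, 1)
        return x * out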

# mobile inverted bottleneck convolution (MBConv) block
# the core building block of `EfficientNet`, originally introduced in `MobileNetV2`.
class MBConvBlock(nn.Module):
    def __init__(self, in_channels, out_channels, expand_ratio, kernel_size, stride, reduction=4, drop_connect_rate=0.2):
        super(MBConvBlock, self).__init__()
        self.drop_connect_rate = drop_connect_rate
        self.has_se = reduction > 0
        self.stride = stride
        mid_channels = in_channels * expand_ratio

        # expand phase
        if expand_ratio != 1:
            self.expand = nn.Conv2d(in_channels, mid_channels, 1, bias=False)
            self.bn0 = nn.BatchNorm2d(mid_channels)
            self.act0 = Swish()

        # depth-wise convolution phase
        self.depthwise = nn.Conv2d(mid_channels, mid_channels, kernel_size, stride, (kernel_size-1)//2, groups=mid_channels, bias=False)
        self.bn1 = nn.BatchNorm2d(mid_channels)
        self.act1 = Swish()

        # squeeze and excitation phase
        if self.has_se:
            self.se = SEBlock(mid_channels, reduction)

        # output phase
        self.project = nn.Conv2d(mid_channels, out_channels, 1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)

    def drop_connect(self, x):
        if not self.training:
            return x
        keep_prob = 1 - self.drop_connect_rate
        random_tensor = keep_prob + torch.rand((x.shape[0], 1, 1, 1), dtype=x.dtype, device=x.device)
        binary_tensor = torch.floor(random_tensor)
        return x / keep_prob * binary_tensor

    def forward(self, x):
        identity = x

        if hasattr(self, 'expand'):
            x = self.expand(x)
            x = self.bn0(x)
            x = self.act0(x)

        x = self.depthwise(x)
        x = self.bn1(x)
        x = self.act1(x)

        if self.has_se:
            x = self.se(x)

        x = self.project(x)
        x = self.bn2(x)

        if self.stride == 1 and x.shape == identity.shape:
            if self.drop_connect_rate:
                x = self.drop_connect(x)
            x += identity
        return x
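
The drop connect step (also called stochastic depth) in `MBConvBlock` can be checked in isolation. The sketch below re-implements the same logic as a standalone function (the function form and the explicit `training` flag are illustrative choices): each sample in the batch is either zeroed or rescaled by `1 / keep_prob`, so the expected value of the output matches the input.

```python
import torch


def drop_connect(x, drop_rate, training=True):
    """Randomly drop whole samples of the residual branch and rescale the survivors."""
    if not training:
        return x
    keep_prob = 1 - drop_rate
    # floor(keep_prob + U[0, 1)) is 1 with probability keep_prob, else 0
    mask = torch.floor(keep_prob + torch.rand((x.shape[0], 1, 1, 1), dtype=x.dtype, device=x.device))
    return x / keep_prob * mask


torch.manual_seed(0)
x = torch.ones(10000, 1, 2, 2)
out = drop_connect(x, drop_rate=0.2)

print(out.unique())       # each value is either 0 or 1 / keep_prob
print(out.mean().item())  # close to 1.0 on average
print(torch.equal(drop_connect(x, 0.2, training=False), x))  # identity in eval mode
```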

In [None]:
class CustomEfficientNet(nn.Module):
    def __init__(self, width_coefficient, depth_coefficient, dropout_rate=0.2, num_classes=1000):
        super(CustomEfficientNet, self).__init__()
        self.dropout_rate = dropout_rate

        def round_filters(filters, divisor=8):
            filters *= width_coefficient
            new_filters = max(divisor, int(filters + divisor / 2) // divisor * divisor)
            if new_filters < 0.9 * filters:
                new_filters += divisor
            return int(new_filters)

        def round_repeats(repeats):
            return math.ceil(depth_coefficient * repeats)

        # stem: the initial convolutional layer that processes the input image before it enters the main network architecture
        out_channels = round_filters(32)
        self.stem = nn.Sequential(
            nn.Conv2d(3, out_channels, 3, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(out_channels),
            Swish()
        )

        # blocks: seven stages of MBConv blocks; every stage (including the first) is width- and depth-scaled
        self.blocks = nn.ModuleList([])
        in_channels = out_channels
        block_args = [
            # (out_channels, num_repeats, kernel_size, stride, expand_ratio)
            (16, 1, 3, 1, 1),
            (24, 2, 3, 2, 6),
            (40, 2, 5, 2, 6),
            (80, 3, 3, 2, 6),
            (112, 3, 5, 1, 6),
            (192, 4, 5, 2, 6),
            (320, 1, 3, 1, 6)
        ]
        for (out_channels, num_repeats, kernel_size, stride, expand_ratio) in block_args:
            out_channels = round_filters(out_channels)
            repeats = round_repeats(num_repeats)
            for i in range(repeats):
                # only the first block of a stage may downsample; the remaining blocks use stride 1
                self.blocks.append(MBConvBlock(in_channels=in_channels, out_channels=out_channels, expand_ratio=expand_ratio, kernel_size=kernel_size, stride=stride if i == 0 else 1))
                in_channels = out_channels

        # head: the final layers of the network that process the high-level features extracted by previous layers and produce the final output
        out_channels = round_filters(1280)
        self.head = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, 1, bias=False),
            nn.BatchNorm2d(out_channels),
            Swish(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Dropout(dropout_rate),
            nn.Linear(out_channels, num_classes)
        )

    def forward(self, x):
        x = self.stem(x)
        for block in self.blocks:
            x = block(x)
        x = self.head(x)
        return x
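
How the width and depth coefficients turn into concrete channel counts and block repeats can be seen by running the two rounding helpers on their own. The standalone versions below mirror the nested functions in `CustomEfficientNet` but take the coefficient as an explicit argument (a change made for illustration). The lists `base_channels` and `base_repeats` collect the B0 baseline values for the stem, the seven stages, and the head.

```python
import math


def round_filters(filters, width_coefficient, divisor=8):
    """Scale a channel count and round it to the nearest multiple of `divisor`."""
    filters *= width_coefficient
    new_filters = max(divisor, int(filters + divisor / 2) // divisor * divisor)
    # never round down by more than 10%
    if new_filters < 0.9 * filters:
        new_filters += divisor
    return int(new_filters)


def round_repeats(repeats, depth_coefficient):
    """Scale the number of block repeats, rounding up."""
    return math.ceil(depth_coefficient * repeats)


base_channels = [32, 16, 24, 40, 80, 112, 192, 320, 1280]  # stem, 7 stages, head
base_repeats = [1, 2, 2, 3, 3, 4, 1]                       # 7 stages

# B0 (1.0, 1.0) leaves the baseline unchanged; B2 (1.1, 1.2) widens and deepens it
for name, w, d in [("B0", 1.0, 1.0), ("B2", 1.1, 1.2)]:
    channels = [round_filters(c, w) for c in base_channels]
    repeats = [round_repeats(r, d) for r in base_repeats]
    print(name, channels, repeats)
```

Rounding to a multiple of 8 keeps channel counts hardware-friendly, and rounding repeats up means a stage never loses blocks when the depth coefficient is greater than 1.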

### EfficientNet-B0

In [38]:
efficientnet_b0_1 = CustomEfficientNet(width_coefficient=1.0, depth_coefficient=1.0, dropout_rate=0.2, num_classes=1000)
efficientnet_b0_1

CustomEfficientNet(
  (stem): Sequential(
    (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
    (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): Swish()
  )
  (blocks): ModuleList(
    (0): MBConvBlock(
      (depthwise): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False)
      (bn1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act1): Swish()
      (se): SEBlock(
        (fc1): Conv2d(32, 8, kernel_size=(1, 1), stride=(1, 1))
        (fc2): Conv2d(8, 32, kernel_size=(1, 1), stride=(1, 1))
      )
      (project): Conv2d(32, 16, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn2): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): MBConvBlock(
      (expand): Conv2d(16, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn0): BatchNorm2d(96, eps=1e-05, momentum=0.1, affi

In [39]:
summary(efficientnet_b0_1, (1, 3, 224, 224), device='cpu')

Layer (type:depth-idx)                   Output Shape              Param #
CustomEfficientNet                       [1, 1000]                 --
├─Sequential: 1-1                        [1, 32, 112, 112]         --
│    └─Conv2d: 2-1                       [1, 32, 112, 112]         864
│    └─BatchNorm2d: 2-2                  [1, 32, 112, 112]         64
│    └─Swish: 2-3                        [1, 32, 112, 112]         --
├─ModuleList: 1-2                        --                        --
│    └─MBConvBlock: 2-4                  [1, 16, 112, 112]         --
│    │    └─Conv2d: 3-1                  [1, 32, 112, 112]         288
│    │    └─BatchNorm2d: 3-2             [1, 32, 112, 112]         64
│    │    └─Swish: 3-3                   [1, 32, 112, 112]         --
│    │    └─SEBlock: 3-4                 [1, 32, 112, 112]         552
│    │    └─Conv2d: 3-5                  [1, 16, 112, 112]         512
│    │    └─BatchNorm2d: 3-6             [1, 16, 112, 112]         32
│    └─MBCo

### EfficientNet-B1

In [None]:
efficientnet_b1_1 = CustomEfficientNet(width_coefficient=1.0, depth_coefficient=1.1, dropout_rate=0.2, num_classes=1000)
efficientnet_b1_1

In [None]:
summary(efficientnet_b1_1, (1, 3, 224, 224), device='cpu')

### EfficientNet-B2

In [None]:
efficientnet_b2_1 = CustomEfficientNet(width_coefficient=1.1, depth_coefficient=1.2, dropout_rate=0.3, num_classes=1000)
efficientnet_b2_1

In [None]:
summary(efficientnet_b2_1, (1, 3, 224, 224), device='cpu')

### EfficientNet-B3

In [None]:
efficientnet_b3_1 = CustomEfficientNet(width_coefficient=1.2, depth_coefficient=1.4, dropout_rate=0.3, num_classes=1000)
efficientnet_b3_1

In [None]:
summary(efficientnet_b3_1, (1, 3, 224, 224), device='cpu')

### EfficientNet-B4

In [None]:
efficientnet_b4_1 = CustomEfficientNet(width_coefficient=1.4, depth_coefficient=1.8, dropout_rate=0.4, num_classes=1000)
efficientnet_b4_1

In [None]:
summary(efficientnet_b4_1, (1, 3, 224, 224), device='cpu')

### EfficientNet-B5

In [None]:
efficientnet_b5_1 = CustomEfficientNet(width_coefficient=1.6, depth_coefficient=2.2, dropout_rate=0.4, num_classes=1000)
efficientnet_b5_1

In [None]:
summary(efficientnet_b5_1, (1, 3, 224, 224), device='cpu')

### EfficientNet-B6

In [None]:
efficientnet_b6_1 = CustomEfficientNet(width_coefficient=1.8, depth_coefficient=2.6, dropout_rate=0.5, num_classes=1000)
efficientnet_b6_1

In [None]:
summary(efficientnet_b6_1, (1, 3, 224, 224), device='cpu')

### EfficientNet-B7

In [None]:
efficientnet_b7_1 = CustomEfficientNet(width_coefficient=2.0, depth_coefficient=3.1, dropout_rate=0.5, num_classes=1000)
efficientnet_b7_1

In [None]:
summary(efficientnet_b7_1, (1, 3, 224, 224), device='cpu')

## PyTorch EfficientNet
   - EfficientNet is available in torchvision, PyTorch's computer vision library: [pytorch.org/vision/main/models/efficientnet.html](https://pytorch.org/vision/main/models/efficientnet.html)

### EfficientNet-B0

In [None]:
efficientnet_b0_2 = efficientnet_b0()
efficientnet_b0_2

In [None]:
summary(efficientnet_b0_2, (1, 3, 224, 224), device='cpu')

### EfficientNet-B1

In [None]:
efficientnet_b1_2 = efficientnet_b1()
efficientnet_b1_2

In [None]:
summary(efficientnet_b1_2, (1, 3, 224, 224), device='cpu')

### EfficientNet-B2

In [None]:
efficientnet_b2_2 = efficientnet_b2()
efficientnet_b2_2

In [None]:
summary(efficientnet_b2_2, (1, 3, 224, 224), device='cpu')

### EfficientNet-B3

In [None]:
efficientnet_b3_2 = efficientnet_b3()
efficientnet_b3_2

In [None]:
summary(efficientnet_b3_2, (1, 3, 224, 224), device='cpu')

### EfficientNet-B4

In [None]:
efficientnet_b4_2 = efficientnet_b4()
efficientnet_b4_2

In [None]:
summary(efficientnet_b4_2, (1, 3, 224, 224), device='cpu')

### EfficientNet-B5

In [None]:
efficientnet_b5_2 = efficientnet_b5()
efficientnet_b5_2

In [None]:
summary(efficientnet_b5_2, (1, 3, 224, 224), device='cpu')

### EfficientNet-B6

In [None]:
efficientnet_b6_2 = efficientnet_b6()
efficientnet_b6_2

In [None]:
summary(efficientnet_b6_2, (1, 3, 224, 224), device='cpu')

### EfficientNet-B7

In [None]:
efficientnet_b7_2 = efficientnet_b7()
efficientnet_b7_2

In [None]:
summary(efficientnet_b7_2, (1, 3, 224, 224), device='cpu')