📝 **Author:** Amirhossein Heydari - 📧 **Email:** <amirhosseinheydari78@gmail.com> - 📍 **Origin:** [mr-pylin/pytorch-workshop](https://github.com/mr-pylin/pytorch-workshop)

---


**Table of contents**<a id='toc0_'></a>    
- [Dependencies](#toc1_)    
- [EfficientNet](#toc2_)    
  - [Custom EfficientNet](#toc2_1_)    
    - [EfficientNet-B0](#toc2_1_1_)    
      - [Initialize the Model](#toc2_1_1_1_)    
      - [Model Summary](#toc2_1_1_2_)    
    - [EfficientNet-B1](#toc2_1_2_)    
      - [Initialize the Model](#toc2_1_2_1_)    
      - [Model Summary](#toc2_1_2_2_)    
    - [EfficientNet-B2](#toc2_1_3_)    
      - [Initialize the Model](#toc2_1_3_1_)    
      - [Model Summary](#toc2_1_3_2_)    
    - [EfficientNet-B3](#toc2_1_4_)    
      - [Initialize the Model](#toc2_1_4_1_)    
      - [Model Summary](#toc2_1_4_2_)    
    - [EfficientNet-B4](#toc2_1_5_)    
      - [Initialize the Model](#toc2_1_5_1_)    
      - [Model Summary](#toc2_1_5_2_)    
    - [EfficientNet-B5](#toc2_1_6_)    
      - [Initialize the Model](#toc2_1_6_1_)    
      - [Model Summary](#toc2_1_6_2_)    
    - [EfficientNet-B6](#toc2_1_7_)    
      - [Initialize the Model](#toc2_1_7_1_)    
      - [Model Summary](#toc2_1_7_2_)    
    - [EfficientNet-B7](#toc2_1_8_)    
      - [Initialize the Model](#toc2_1_8_1_)    
      - [Model Summary](#toc2_1_8_2_)    
  - [PyTorch EfficientNet](#toc2_2_)    
    - [EfficientNet-B0](#toc2_2_1_)    
      - [Initialize the Model](#toc2_2_1_1_)    
      - [Model Summary](#toc2_2_1_2_)    
    - [EfficientNet-B1](#toc2_2_2_)    
      - [Initialize the Model](#toc2_2_2_1_)    
      - [Model Summary](#toc2_2_2_2_)    
    - [EfficientNet-B2](#toc2_2_3_)    
      - [Initialize the Model](#toc2_2_3_1_)    
      - [Model Summary](#toc2_2_3_2_)    
    - [EfficientNet-B3](#toc2_2_4_)    
      - [Initialize the Model](#toc2_2_4_1_)    
      - [Model Summary](#toc2_2_4_2_)    
    - [EfficientNet-B4](#toc2_2_5_)    
      - [Initialize the Model](#toc2_2_5_1_)    
      - [Model Summary](#toc2_2_5_2_)    
    - [EfficientNet-B5](#toc2_2_6_)    
      - [Initialize the Model](#toc2_2_6_1_)    
      - [Model Summary](#toc2_2_6_2_)    
    - [EfficientNet-B6](#toc2_2_7_)    
      - [Initialize the Model](#toc2_2_7_1_)    
      - [Model Summary](#toc2_2_7_2_)    
    - [EfficientNet-B7](#toc2_2_8_)    
      - [Initialize the Model](#toc2_2_8_1_)    
      - [Model Summary](#toc2_2_8_2_)    

<!-- vscode-jupyter-toc-config
	numbering=false
	anchor=true
	flat=false
	minLevel=1
	maxLevel=6
	/vscode-jupyter-toc-config -->
<!-- THIS CELL WILL BE REPLACED ON TOC UPDATE. DO NOT WRITE YOUR TEXT IN THIS CELL -->

# <a id='toc1_'></a>[Dependencies](#toc0_)

- torchvision models can be imported in two ways:
  - class (e.g. `EfficientNet`)
    - Brings in the model class directly
    - Allows more control and customization, since you work with the class itself: you can subclass it, override methods, customize initialization, etc.
  - function (e.g. `efficientnet_b0`)
    - Returns a ready-made instance of the model (optionally with pretrained weights)
    - Easier and quicker to use, especially for standard architectures
- [docs.pytorch.org/vision/stable/models.html](https://docs.pytorch.org/vision/stable/models.html)


In [None]:
import math

import torch
from torch import nn
from torch.nn import functional as F
from torchinfo import summary
from torchvision.models import (
    EfficientNet,
    efficientnet_b0,
    efficientnet_b1,
    efficientnet_b2,
    efficientnet_b3,
    efficientnet_b4,
    efficientnet_b5,
    efficientnet_b6,
    efficientnet_b7,
)

# <a id='toc2_'></a>[EfficientNet](#toc0_)

- EfficientNet was developed in 2019 by researchers at [Google AI](https://research.google/)
- It is based on the [EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks](https://arxiv.org/pdf/1905.11946) paper
- It was trained on the [ImageNet](https://www.image-net.org/) dataset [[ImageNet viewer](https://navigu.net/#imagenet)]. The input resolution grows with the variant (224x224 for B0 up to 600x600 for B7 in the paper); torchvision's B0 weights expect images resized to 256 and then center-cropped to 224x224
- Known for its balance of accuracy and efficiency: at publication, EfficientNet-B7 reached 84.3% top-1 accuracy on ImageNet while being 8.4x smaller and 6.1x faster at inference than the best existing ConvNet
- The family includes the variants `EfficientNet-B0` through `EfficientNet-B7`. B0 is the baseline found by neural architecture search, and a higher number means a larger compound scaling coefficient (a deeper, wider, higher-resolution network)
- The efficiency gains come from compound scaling: depth, width, and input resolution are scaled together with a fixed ratio instead of scaling only one dimension (see the figure below)

<figure style="text-align: center;">
  <img src="../../../assets/images/third_party/cnn/architectures/efficientnet.svg" alt="efficientnet-architecture.svg" style="width: 100%;">
  <figcaption>Figure 2. Model Scaling. (a) is a baseline network example; (b)-(d) are conventional scaling that only increases one dimension of network width, depth, or resolution. (e) is our proposed compound scaling method that uniformly scales all three dimensions with a fixed ratio.<br>©️ Image: <a href= "https://arxiv.org/pdf/1905.11946">EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks</a></figcaption>
</figure>
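The compound scaling rule from the paper reduces to a few lines of arithmetic. The paper sets depth $d = \alpha^\phi$, width $w = \beta^\phi$ and resolution $r = \gamma^\phi$, subject to $\alpha \cdot \beta^2 \cdot \gamma^2 \approx 2$. It reports $\alpha = 1.2$, $\beta = 1.1$, $\gamma = 1.15$ from a grid search on B0, so FLOPS grow roughly by $2^\phi$.

This sketch only evaluates that rule for a few values of $\phi$. The released B1 to B7 coefficients used later in this notebook do not follow these powers exactly.

```python
# compound scaling (Tan & Le, 2019): d = alpha**phi, w = beta**phi, r = gamma**phi
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # values reported in the paper for the B0 baseline


def compound_scaling(phi: float) -> tuple[float, float, float]:
    """Return the (depth, width, resolution) multipliers for compound coefficient `phi`."""
    return ALPHA**phi, BETA**phi, GAMMA**phi


def flops_multiplier(phi: float) -> float:
    # FLOPS of a ConvNet scale roughly with depth * width^2 * resolution^2
    d, w, r = compound_scaling(phi)
    return d * w**2 * r**2


print(f"alpha * beta^2 * gamma^2 = {ALPHA * BETA**2 * GAMMA**2:.3f}")  # 1.920, i.e. roughly 2
for phi in range(4):
    d, w, r = compound_scaling(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution {round(224 * r)}px, FLOPS x{flops_multiplier(phi):.2f}")
```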


## <a id='toc2_1_'></a>[Custom EfficientNet](#toc0_)

- The models below return raw logits without a final `Softmax` layer, because `nn.CrossEntropyLoss` applies `LogSoftmax` internally.
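This small check shows why no `Softmax` layer is needed. `nn.CrossEntropyLoss` on raw logits gives the same value as `log_softmax` followed by `nn.NLLLoss`, and probabilities only need to be computed explicitly at inference time. The random `logits` and `targets` are placeholders standing in for a model's output and labels.

```python
import torch
import torch.nn.functional as F
from torch import nn

torch.manual_seed(0)
logits = torch.randn(4, 1000)  # placeholder for a model's raw output: (batch, num_classes)
targets = torch.tensor([1, 7, 42, 999])  # placeholder class indices

# CrossEntropyLoss = LogSoftmax + NLLLoss
loss_ce = nn.CrossEntropyLoss()(logits, targets)
loss_manual = nn.NLLLoss()(F.log_softmax(logits, dim=1), targets)
print(torch.allclose(loss_ce, loss_manual))  # True

# at inference time, apply softmax explicitly only if probabilities are needed
probs = logits.softmax(dim=1)
print(probs.sum(dim=1))  # each row sums to 1 (up to floating-point error)
print(probs.argmax(dim=1).equal(logits.argmax(dim=1)))  # True: softmax does not change the ranking
```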


In [None]:
# swish activation function: x * sigmoid(x) (the same function as `nn.SiLU`)
class Swish(nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * torch.sigmoid(x)


# squeeze-and-excitation (SE) block
# as in EfficientNet, the bottleneck width is passed in (derived from the MBConv *input* channels) and Swish is used inside
class SEBlock(nn.Module):
    def __init__(self, in_channels: int, squeeze_channels: int):
        super().__init__()
        self.fc1 = nn.Conv2d(in_channels, squeeze_channels, 1)
        self.fc2 = nn.Conv2d(squeeze_channels, in_channels, 1)
        self.act = Swish()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = F.adaptive_avg_pool2d(x, 1)  # squeeze: (N, C, H, W) -> (N, C, 1, 1)
        out = self.act(self.fc1(out))
        out = torch.sigmoid(self.fc2(out))  # excitation: one gate in (0, 1) per channel
        return x * out


# mobile inverted bottleneck convolution (MBConv) block
# the core building block of `EfficientNet`, originally introduced in `MobileNetV2`.
class MBConvBlock(nn.Module):
    def __init__(
        self,
        in_channels: int,
        out_channels: int,
        expand_ratio: int,
        kernel_size: int,
        stride: int,
        reduction: int = 4,
        drop_connect_rate: float = 0.2,
    ):
        super().__init__()
        # note: a constant rate is used for every block; torchvision instead increases it linearly with block depth
        self.drop_connect_rate = drop_connect_rate
        self.has_se = reduction > 0
        self.stride = stride
        mid_channels = in_channels * expand_ratio

        # expand phase: 1x1 convolution that widens the block to `mid_channels`
        if expand_ratio != 1:
            self.expand = nn.Conv2d(in_channels, mid_channels, 1, bias=False)
            self.bn0 = nn.BatchNorm2d(mid_channels)
            self.act0 = Swish()

        # depth-wise convolution phase: one filter per channel (groups=mid_channels)
        self.depthwise = nn.Conv2d(
            mid_channels, mid_channels, kernel_size, stride, (kernel_size - 1) // 2, groups=mid_channels, bias=False
        )
        self.bn1 = nn.BatchNorm2d(mid_channels)
        self.act1 = Swish()

        # squeeze and excitation phase: bottleneck = block input channels // reduction (SE ratio 0.25 for reduction=4)
        if self.has_se:
            self.se = SEBlock(mid_channels, max(1, in_channels // reduction))

        # output phase: linear 1x1 projection (no activation)
        self.project = nn.Conv2d(mid_channels, out_channels, 1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)

    # drop connect (stochastic depth): drops the whole residual branch for randomly chosen samples during training
    def drop_connect(self, x: torch.Tensor) -> torch.Tensor:
        if not self.training:
            return x
        keep_prob = 1 - self.drop_connect_rate
        # floor(keep_prob + U[0, 1)) is 1 with probability keep_prob and 0 otherwise
        random_tensor = keep_prob + torch.rand((x.shape[0], 1, 1, 1), dtype=x.dtype, device=x.device)
        binary_tensor = torch.floor(random_tensor)
        # rescale the kept samples so the expected output is unchanged
        return x / keep_prob * binary_tensor

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        identity = x

        if hasattr(self, "expand"):
            x = self.expand(x)
            x = self.bn0(x)
            x = self.act0(x)

        x = self.depthwise(x)
        x = self.bn1(x)
        x = self.act1(x)

        if self.has_se:
            x = self.se(x)

        x = self.project(x)
        x = self.bn2(x)

        # residual connection only when the block keeps the spatial size and the number of channels
        if self.stride == 1 and x.shape == identity.shape:
            if self.drop_connect_rate:
                x = self.drop_connect(x)
            x += identity
        return x

In [None]:
class CustomEfficientNet(nn.Module):
    def __init__(
        self,
        width_coefficient: float,
        depth_coefficient: float,
        dropout_rate: float = 0.2,
        num_classes: int = 1000,
    ):
        super().__init__()
        self.dropout_rate = dropout_rate

        # scale the number of channels by `width_coefficient` and round to the nearest multiple of `divisor`
        def round_filters(filters: int, divisor: int = 8) -> int:
            filters *= width_coefficient
            new_filters = max(divisor, int(filters + divisor / 2) // divisor * divisor)
            if new_filters < 0.9 * filters:  # make sure rounding down does not remove more than 10%
                new_filters += divisor
            return int(new_filters)

        # scale the number of blocks in a stage by `depth_coefficient` and always round up
        def round_repeats(repeats: int) -> int:
            return math.ceil(depth_coefficient * repeats)

        # stem: the initial convolutional layer that processes the input image before it enters the main network architecture
        in_channels = round_filters(32)
        self.stem = nn.Sequential(
            nn.Conv2d(3, in_channels, 3, stride=2, padding=1, bias=False), nn.BatchNorm2d(in_channels), Swish()
        )

        # blocks: 7 stages of MBConv blocks; the repeats of every stage (including the first) are scaled by `depth_coefficient`
        self.blocks = nn.ModuleList([])
        block_args = [
            # (out_channels, num_repeats, kernel_size, stride, expand_ratio)
            (16, 1, 3, 1, 1),
            (24, 2, 3, 2, 6),
            (40, 2, 5, 2, 6),
            (80, 3, 3, 2, 6),
            (112, 3, 5, 1, 6),
            (192, 4, 5, 2, 6),
            (320, 1, 3, 1, 6),
        ]
        for out_channels, num_repeats, kernel_size, stride, expand_ratio in block_args:
            out_channels = round_filters(out_channels)
            for i in range(round_repeats(num_repeats)):
                # only the first block of a stage downsamples (stride) and changes the number of channels
                self.blocks.append(
                    MBConvBlock(
                        in_channels=in_channels,
                        out_channels=out_channels,
                        expand_ratio=expand_ratio,
                        kernel_size=kernel_size,
                        stride=stride if i == 0 else 1,
                    )
                )
                in_channels = out_channels

        # head: the final layers of the network that process the high-level features extracted by previous layers and produce the final output
        out_channels = round_filters(1280)
        self.head = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, 1, bias=False),
            nn.BatchNorm2d(out_channels),
            Swish(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Dropout(dropout_rate),
            nn.Linear(out_channels, num_classes),  # raw logits (no softmax)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.stem(x)
        for block in self.blocks:
            x = block(x)
        x = self.head(x)
        return x
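To see what `width_coefficient` and `depth_coefficient` actually change, the two rounding helpers can be evaluated on their own. This sketch copies their logic out of `CustomEfficientNet.__init__` into standalone functions (plain `math`, no model is built). For each variant it tabulates the stem and head widths and the total number of MBConv blocks, scaling all seven B0 stage repeats `(1, 2, 2, 3, 3, 4, 1)`.

```python
import math

# (width_coefficient, depth_coefficient) per variant, as used in this notebook
COEFFICIENTS = {
    "b0": (1.0, 1.0), "b1": (1.0, 1.1), "b2": (1.1, 1.2), "b3": (1.2, 1.4),
    "b4": (1.4, 1.8), "b5": (1.6, 2.2), "b6": (1.8, 2.6), "b7": (2.0, 3.1),
}
BASE_REPEATS = (1, 2, 2, 3, 3, 4, 1)  # MBConv blocks per stage in B0


def round_filters(filters: int, width_coefficient: float, divisor: int = 8) -> int:
    # scale channels, then round to the nearest multiple of `divisor` without losing more than 10%
    filters *= width_coefficient
    new_filters = max(divisor, int(filters + divisor / 2) // divisor * divisor)
    if new_filters < 0.9 * filters:
        new_filters += divisor
    return int(new_filters)


def round_repeats(repeats: int, depth_coefficient: float) -> int:
    # scale the number of blocks in a stage and always round up
    return math.ceil(depth_coefficient * repeats)


summary_rows = {}
for name, (w, d) in COEFFICIENTS.items():
    stem = round_filters(32, w)
    head = round_filters(1280, w)
    num_blocks = sum(round_repeats(r, d) for r in BASE_REPEATS)
    summary_rows[name] = (stem, head, num_blocks)
    print(f"{name}: stem={stem:4d}  head={head:4d}  MBConv blocks={num_blocks}")
```

Rounding to multiples of 8 keeps channel counts hardware-friendly. Rounding repeats up ensures even small depth coefficients add at least one block to a stage.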

### <a id='toc2_1_1_'></a>[EfficientNet-B0](#toc0_)


#### <a id='toc2_1_1_1_'></a>[Initialize the Model](#toc0_)


In [None]:
efficientnet_b0_1 = CustomEfficientNet(width_coefficient=1.0, depth_coefficient=1.0, dropout_rate=0.2)

In [None]:
efficientnet_b0_1

#### <a id='toc2_1_1_2_'></a>[Model Summary](#toc0_)


In [None]:
summary(efficientnet_b0_1, (1, 3, 224, 224), device="cpu")

### <a id='toc2_1_2_'></a>[EfficientNet-B1](#toc0_)


#### <a id='toc2_1_2_1_'></a>[Initialize the Model](#toc0_)


In [None]:
efficientnet_b1_1 = CustomEfficientNet(width_coefficient=1.0, depth_coefficient=1.1, dropout_rate=0.2)

In [None]:
efficientnet_b1_1

#### <a id='toc2_1_2_2_'></a>[Model Summary](#toc0_)


In [None]:
summary(efficientnet_b1_1, (1, 3, 224, 224), device="cpu")

### <a id='toc2_1_3_'></a>[EfficientNet-B2](#toc0_)


#### <a id='toc2_1_3_1_'></a>[Initialize the Model](#toc0_)


In [None]:
efficientnet_b2_1 = CustomEfficientNet(width_coefficient=1.1, depth_coefficient=1.2, dropout_rate=0.3)

In [None]:
efficientnet_b2_1

#### <a id='toc2_1_3_2_'></a>[Model Summary](#toc0_)


In [None]:
summary(efficientnet_b2_1, (1, 3, 224, 224), device="cpu")

### <a id='toc2_1_4_'></a>[EfficientNet-B3](#toc0_)


#### <a id='toc2_1_4_1_'></a>[Initialize the Model](#toc0_)


In [None]:
efficientnet_b3_1 = CustomEfficientNet(width_coefficient=1.2, depth_coefficient=1.4, dropout_rate=0.3)

In [None]:
efficientnet_b3_1

#### <a id='toc2_1_4_2_'></a>[Model Summary](#toc0_)


In [None]:
summary(efficientnet_b3_1, (1, 3, 224, 224), device="cpu")

### <a id='toc2_1_5_'></a>[EfficientNet-B4](#toc0_)


#### <a id='toc2_1_5_1_'></a>[Initialize the Model](#toc0_)


In [None]:
efficientnet_b4_1 = CustomEfficientNet(width_coefficient=1.4, depth_coefficient=1.8, dropout_rate=0.4)

In [None]:
efficientnet_b4_1

#### <a id='toc2_1_5_2_'></a>[Model Summary](#toc0_)


In [None]:
summary(efficientnet_b4_1, (1, 3, 224, 224), device="cpu")

### <a id='toc2_1_6_'></a>[EfficientNet-B5](#toc0_)


#### <a id='toc2_1_6_1_'></a>[Initialize the Model](#toc0_)


In [None]:
efficientnet_b5_1 = CustomEfficientNet(width_coefficient=1.6, depth_coefficient=2.2, dropout_rate=0.4)

In [None]:
efficientnet_b5_1

#### <a id='toc2_1_6_2_'></a>[Model Summary](#toc0_)


In [None]:
summary(efficientnet_b5_1, (1, 3, 224, 224), device="cpu")

### <a id='toc2_1_7_'></a>[EfficientNet-B6](#toc0_)


#### <a id='toc2_1_7_1_'></a>[Initialize the Model](#toc0_)


In [None]:
efficientnet_b6_1 = CustomEfficientNet(width_coefficient=1.8, depth_coefficient=2.6, dropout_rate=0.5)

In [None]:
efficientnet_b6_1

#### <a id='toc2_1_7_2_'></a>[Model Summary](#toc0_)


In [None]:
summary(efficientnet_b6_1, (1, 3, 224, 224), device="cpu")

### <a id='toc2_1_8_'></a>[EfficientNet-B7](#toc0_)


#### <a id='toc2_1_8_1_'></a>[Initialize the Model](#toc0_)


In [None]:
efficientnet_b7_1 = CustomEfficientNet(width_coefficient=2.0, depth_coefficient=3.1, dropout_rate=0.5)

In [None]:
efficientnet_b7_1

#### <a id='toc2_1_8_2_'></a>[Model Summary](#toc0_)


In [None]:
summary(efficientnet_b7_1, (1, 3, 224, 224), device="cpu")

## <a id='toc2_2_'></a>[PyTorch EfficientNet](#toc0_)

- EfficientNet is available in PyTorch: [docs.pytorch.org/vision/stable/models/efficientnet.html](https://docs.pytorch.org/vision/stable/models/efficientnet.html)


### <a id='toc2_2_1_'></a>[EfficientNet-B0](#toc0_)


#### <a id='toc2_2_1_1_'></a>[Initialize the Model](#toc0_)


In [None]:
efficientnet_b0_2 = efficientnet_b0()

In [None]:
efficientnet_b0_2

#### <a id='toc2_2_1_2_'></a>[Model Summary](#toc0_)


In [None]:
summary(efficientnet_b0_2, (1, 3, 224, 224), device="cpu")

### <a id='toc2_2_2_'></a>[EfficientNet-B1](#toc0_)


#### <a id='toc2_2_2_1_'></a>[Initialize the Model](#toc0_)


In [None]:
efficientnet_b1_2 = efficientnet_b1()

In [None]:
efficientnet_b1_2

#### <a id='toc2_2_2_2_'></a>[Model Summary](#toc0_)


In [None]:
summary(efficientnet_b1_2, (1, 3, 224, 224), device="cpu")

### <a id='toc2_2_3_'></a>[EfficientNet-B2](#toc0_)


#### <a id='toc2_2_3_1_'></a>[Initialize the Model](#toc0_)


In [None]:
efficientnet_b2_2 = efficientnet_b2()

In [None]:
efficientnet_b2_2

#### <a id='toc2_2_3_2_'></a>[Model Summary](#toc0_)


In [None]:
summary(efficientnet_b2_2, (1, 3, 224, 224), device="cpu")

### <a id='toc2_2_4_'></a>[EfficientNet-B3](#toc0_)


#### <a id='toc2_2_4_1_'></a>[Initialize the Model](#toc0_)


In [None]:
efficientnet_b3_2 = efficientnet_b3()

In [None]:
efficientnet_b3_2

#### <a id='toc2_2_4_2_'></a>[Model Summary](#toc0_)


In [None]:
summary(efficientnet_b3_2, (1, 3, 224, 224), device="cpu")

### <a id='toc2_2_5_'></a>[EfficientNet-B4](#toc0_)


#### <a id='toc2_2_5_1_'></a>[Initialize the Model](#toc0_)


In [None]:
efficientnet_b4_2 = efficientnet_b4()

In [None]:
efficientnet_b4_2

#### <a id='toc2_2_5_2_'></a>[Model Summary](#toc0_)


In [None]:
summary(efficientnet_b4_2, (1, 3, 224, 224), device="cpu")

### <a id='toc2_2_6_'></a>[EfficientNet-B5](#toc0_)


#### <a id='toc2_2_6_1_'></a>[Initialize the Model](#toc0_)


In [None]:
efficientnet_b5_2 = efficientnet_b5()

In [None]:
efficientnet_b5_2

#### <a id='toc2_2_6_2_'></a>[Model Summary](#toc0_)


In [None]:
summary(efficientnet_b5_2, (1, 3, 224, 224), device="cpu")

### <a id='toc2_2_7_'></a>[EfficientNet-B6](#toc0_)


#### <a id='toc2_2_7_1_'></a>[Initialize the Model](#toc0_)


In [None]:
efficientnet_b6_2 = efficientnet_b6()

In [None]:
efficientnet_b6_2

#### <a id='toc2_2_7_2_'></a>[Model Summary](#toc0_)


In [None]:
summary(efficientnet_b6_2, (1, 3, 224, 224), device="cpu")

### <a id='toc2_2_8_'></a>[EfficientNet-B7](#toc0_)


#### <a id='toc2_2_8_1_'></a>[Initialize the Model](#toc0_)


In [None]:
efficientnet_b7_2 = efficientnet_b7()

In [None]:
efficientnet_b7_2

#### <a id='toc2_2_8_2_'></a>[Model Summary](#toc0_)


In [None]:
summary(efficientnet_b7_2, (1, 3, 224, 224), device="cpu")