📝 **Author:** Amirhossein Heydari - 📧 **Email:** <amirhosseinheydari78@gmail.com> - 📍 **Origin:** [mr-pylin/pytorch-workshop](https://github.com/mr-pylin/pytorch-workshop)

---


**Table of contents**<a id='toc0_'></a>    
- [Dependencies](#toc1_)    
- [Model Creation](#toc2_)    
  - [Built-in Models](#toc2_1_)    
    - [Torchvision Models](#toc2_1_1_)    
    - [Torchaudio Models](#toc2_1_2_)    
  - [Hugging Face](#toc2_2_)    
  - [Custom Models](#toc2_3_)    
    - [Generate Artificial Data](#toc2_3_1_)    
    - [Sequential Model](#toc2_3_2_)    
      - [Example 1: Using `nn.Sequential`](#toc2_3_2_1_)    
      - [Example 2: Using `nn.ModuleList`](#toc2_3_2_2_)    
      - [Example 3: Mix of `nn.Sequential` and `nn.ModuleList`](#toc2_3_2_3_)    
    - [Non-Sequential (Functional) Model](#toc2_3_3_)    
      - [`torch.Tensor` vs. `torch.nn.Parameter`](#toc2_3_3_1_)    
        - [`torch.Tensor`](#toc2_3_3_1_1_)    
        - [`torch.nn.Parameter`](#toc2_3_3_1_2_)    
      - [Example 1: Using `nn.Linear`, `nn.Conv2d`, ...](#toc2_3_3_2_)    
      - [Example 2: Mix of Sequential and Non-sequential methods](#toc2_3_3_3_)    
      - [Example 3: Separate Class for each Module](#toc2_3_3_4_)    

<!-- vscode-jupyter-toc-config
	numbering=false
	anchor=true
	flat=false
	minLevel=1
	maxLevel=6
	/vscode-jupyter-toc-config -->
<!-- THIS CELL WILL BE REPLACED ON TOC UPDATE. DO NOT WRITE YOUR TEXT IN THIS CELL -->

# <a id='toc1_'></a>[Dependencies](#toc0_)


In [None]:
import torch
from torch import nn
from torchaudio.models import hubert_base, wav2vec2_base
from torchinfo import summary
from torchvision import models

In [None]:
# set a seed for deterministic results
seed = 42
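
The seed only has an effect once it is passed to `torch.manual_seed`. A minimal sketch (the names `a` and `b` are illustrative, not part of the notebook) showing that re-seeding with the same value reproduces the same random tensor:

```python
import torch

seed = 42

# seeding the global RNG makes the following random draws reproducible
torch.manual_seed(seed)
a = torch.randn(3)

# re-seeding with the same value restarts the same random sequence
torch.manual_seed(seed)
b = torch.randn(3)

print(torch.equal(a, b))  # True
```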

# <a id='toc2_'></a>[Model Creation](#toc0_)

## <a id='toc2_1_'></a>[Built-in Models](#toc0_)

- PyTorch provides a variety of **pre-trained models** for different tasks, including **image classification**, **object detection**, **segmentation**, and **audio processing**.
- These models are available directly in libraries like `torchvision`, `torchaudio`, and `torchtext`, making it easier to leverage state-of-the-art architectures.

📝 **Docs**:

- Torchvision Models: [pytorch.org/vision/stable/models.html](https://pytorch.org/vision/stable/models.html)
- Torchaudio Models: [pytorch.org/audio/stable/models.html](https://pytorch.org/audio/stable/models.html)
- Torchtext Models: [pytorch.org/text/stable/models.html](https://pytorch.org/text/stable/models.html)
- Check Manual Implementations with details in [**../models/**](../models/) directory.


### <a id='toc2_1_1_'></a>[Torchvision Models](#toc0_)

- This is a **subset** of available pre-trained models in `torchvision`.

<table style="margin:0 auto;">
  <thead>
    <tr>
      <th>Task</th>
      <th>Model</th>
      <th>Type</th>
      <th>Input Format</th>
      <th>Import Path</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td rowspan="6">Image Classification</td>
      <td>ResNet-18, ResNet-50</td>
      <td>Residual Network</td>
      <td>RGB images (224x224)</td>
      <td style="font-family: monospace;">torchvision.models</td>
    </tr>
    <tr>
      <td>VGG-16, VGG-19</td>
      <td>CNN</td>
      <td>RGB images (224x224)</td>
      <td style="font-family: monospace;">torchvision.models</td>
    </tr>
    <tr>
      <td>DenseNet-121, DenseNet-161</td>
      <td>Dense Network</td>
      <td>RGB images (224x224)</td>
      <td style="font-family: monospace;">torchvision.models</td>
    </tr>
    <tr>
      <td>MobileNetV2, MobileNetV3</td>
      <td>Lightweight CNN</td>
      <td>RGB images (224x224)</td>
      <td style="font-family: monospace;">torchvision.models</td>
    </tr>
    <tr>
      <td>Inception v3</td>
      <td>Inception Network</td>
      <td>RGB images (299x299)</td>
      <td style="font-family: monospace;">torchvision.models</td>
    </tr>
    <tr>
      <td>EfficientNet</td>
      <td>Efficient Network</td>
      <td>RGB images (224x224)</td>
      <td style="font-family: monospace;">torchvision.models</td>
    </tr>
    <tr>
      <td rowspan="4">Object Detection &amp; Segmentation</td>
      <td>Faster R-CNN (ResNet-50)</td>
      <td>Region Proposal Network</td>
      <td>RGB images (varied sizes)</td>
      <td style="font-family: monospace;">torchvision.models.detection</td>
    </tr>
    <tr>
      <td>Mask R-CNN (ResNet-50)</td>
      <td>Instance Segmentation</td>
      <td>RGB images (varied sizes)</td>
      <td style="font-family: monospace;">torchvision.models.detection</td>
    </tr>
    <tr>
      <td>RetinaNet</td>
      <td>Single-stage Object Detection</td>
      <td>RGB images (varied sizes)</td>
      <td style="font-family: monospace;">torchvision.models.detection</td>
    </tr>
    <tr>
      <td>Keypoint R-CNN</td>
      <td>Keypoint Detection</td>
      <td>RGB images (varied sizes)</td>
      <td style="font-family: monospace;">torchvision.models.detection</td>
    </tr>
    <tr>
      <td rowspan="2">Semantic Segmentation</td>
      <td>DeepLabV3</td>
      <td>Atrous Convolution Network</td>
      <td>RGB images (varied sizes)</td>
      <td style="font-family: monospace;">torchvision.models.segmentation</td>
    </tr>
    <tr>
      <td>FCN (Fully Convolutional Network)</td>
      <td>Fully Convolutional Network</td>
      <td>RGB images (varied sizes)</td>
      <td style="font-family: monospace;">torchvision.models.segmentation</td>
    </tr>
    <tr>
      <td rowspan="2">Video Classification</td>
      <td>ResNet3D</td>
      <td>3D Convolution Network</td>
      <td>Video (varied sizes)</td>
      <td style="font-family: monospace;">torchvision.models.video</td>
    </tr>
    <tr>
      <td>Swin Transformer 3D</td>
      <td>Transformer-based 3D Video Classification</td>
      <td>Video (varied sizes)</td>
      <td style="font-family: monospace;">torchvision.models.video</td>
    </tr>
  </tbody>
</table>


In [None]:
resnet50 = models.resnet50(weights=None)

# log
print(resnet50)

In [None]:
fasterrcnn_resnet50 = models.detection.fasterrcnn_resnet50_fpn(weights=None)

# log
print(fasterrcnn_resnet50)

In [None]:
deeplabv3 = models.segmentation.deeplabv3_resnet50(weights=None)

# log
print(deeplabv3)

### <a id='toc2_1_2_'></a>[Torchaudio Models](#toc0_)

- This is a **subset** of available pre-trained models in `torchaudio`.

<table style="margin:0 auto;">
  <thead>
    <tr>
      <th>Task</th>
      <th>Model</th>
      <th>Type</th>
      <th>Input Format</th>
      <th>Import Path</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td rowspan="4">Speech Recognition</td>
      <td>Wav2Vec2</td>
      <td>Self-Supervised Speech Model</td>
      <td>Waveform (1D Tensor)</td>
      <td style="font-family: monospace;">torchaudio.models</td>
    </tr>
    <tr>
      <td>Hubert</td>
      <td>Self-Supervised Speech Model</td>
      <td>Waveform (1D Tensor)</td>
      <td style="font-family: monospace;">torchaudio.models</td>
    </tr>
    <tr>
      <td>DeepSpeech</td>
      <td>End-to-End Speech Recognition</td>
      <td>Acoustic features (time x feature frames)</td>
      <td style="font-family: monospace;">torchaudio.models.deepspeech</td>
    </tr>
    <tr>
      <td>Conformer</td>
      <td>Convolution-augmented Transformer</td>
      <td>Acoustic features (time x feature frames)</td>
      <td style="font-family: monospace;">torchaudio.models.conformer</td>
    </tr>
  </tbody>
</table>


In [None]:
wav2vec2 = wav2vec2_base()

# log
print(wav2vec2)

In [None]:
hubert = hubert_base()

# log
print(hubert)

## <a id='toc2_2_'></a>[Hugging Face](#toc0_)

- Hugging Face offers a vast collection of models and pretrained weights.
- It is renowned for state-of-the-art models in NLP, computer vision, and more.
- Models include transformers, BERT, GPT, and many others (pretrained on large datasets).
- They can be fine-tuned for specific tasks using regular PyTorch code.

📝 **Docs**:

- Documentation: [huggingface.co/docs](https://huggingface.co/docs)


## <a id='toc2_3_'></a>[Custom Models](#toc0_)

- PyTorch allows you to define **custom** models by extending the `torch.nn.Module` class.
- To create a custom model, subclass `torch.nn.Module` and implement two methods: `__init__`, which creates the layers, and `forward`, which defines the computation performed when the model is called.
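
A minimal sketch of that pattern, assuming a single linear layer (the class name `TinyModel` is hypothetical and used only for illustration):

```python
import torch
from torch import nn


class TinyModel(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, in_features: int, out_features: int):
        super().__init__()  # must run before any submodule is assigned
        self.fc = nn.Linear(in_features, out_features)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc(x)


model = TinyModel(6, 2)

# calling the model object runs forward() under the hood
out = model(torch.randn(4, 6))
print(out.shape)  # torch.Size([4, 2])
```

Calling `model(...)` rather than `model.forward(...)` is the usual convention, since the call also runs any registered hooks.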

📝 **Docs**:

- Building Models with PyTorch: [pytorch.org/tutorials/beginner/introyt/modelsyt_tutorial.html](https://pytorch.org/tutorials/beginner/introyt/modelsyt_tutorial.html)
- Build the Neural Network: [pytorch.org/tutorials/beginner/basics/buildmodel_tutorial.html](https://pytorch.org/tutorials/beginner/basics/buildmodel_tutorial.html)
- Neural Networks: [pytorch.org/tutorials/beginner/blitz/neural_networks_tutorial](https://pytorch.org/tutorials/beginner/blitz/neural_networks_tutorial)
- `torch.nn`: [docs.pytorch.org/docs/stable/nn.html](https://docs.pytorch.org/docs/stable/nn.html)
- `nn.Module`: [docs.pytorch.org/docs/stable/generated/torch.nn.Module.html](https://docs.pytorch.org/docs/stable/generated/torch.nn.Module.html)
- `nn.Sequential`: [docs.pytorch.org/docs/stable/generated/torch.nn.Sequential.html](https://docs.pytorch.org/docs/stable/generated/torch.nn.Sequential.html)
- `nn.ModuleList`: [docs.pytorch.org/docs/stable/generated/torch.nn.ModuleList.html](https://docs.pytorch.org/docs/stable/generated/torch.nn.ModuleList.html)


### <a id='toc2_3_1_'></a>[Generate Artificial Data](#toc0_)

In [None]:
batch_size, num_features, num_classes = 4, 6, 2

x = torch.randn(size=(batch_size, num_features))
y = torch.randint(low=0, high=num_classes, size=(batch_size,))

# log
print(f"x:\n{x}\n")
print(f"y:\n{y}")

### <a id='toc2_3_2_'></a>[Sequential Model](#toc0_)

- **Overview**:
  - Ideal for **simpler** models where layers are stacked in a **linear** sequence.
  - The `torch.nn.Sequential` class allows you to stack layers in a sequence, passing the output of one layer directly to the next.
  - Suitable for straightforward models like fully-connected neural networks or basic CNNs.
  - `nn.ModuleList` provides more flexibility, allowing dynamic layer configurations and custom forward passes.

- **Key Points**:
  - **nn.Sequential**:
    - Layers are defined in the order they are passed to `Sequential`.
    - No need to manually define the `forward` method; PyTorch handles it for you.
  - **nn.ModuleList**:
    - Layers are stored in a list-like structure.
    - You have full control over the forward pass, allowing for more complex architectures.
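
The difference can be sketched as follows: an `nn.Sequential` is callable out of the box, while an `nn.ModuleList` only stores layers and has no `forward` of its own. The variable names here are illustrative.

```python
import torch
from torch import nn

layers = [nn.Linear(6, 20), nn.ReLU(), nn.Linear(20, 2)]
x = torch.randn(4, 6)

# nn.Sequential chains the layers automatically
seq = nn.Sequential(*layers)
print(seq(x).shape)  # torch.Size([4, 2])

# nn.ModuleList is only a container: calling it directly fails
module_list = nn.ModuleList(layers)
try:
    module_list(x)
    called = True
except NotImplementedError:
    called = False
print(called)  # False

# the forward pass has to be written by hand
out = x
for layer in module_list:
    out = layer(out)

# both containers hold the same layer objects, so the results match
print(torch.equal(out, seq(x)))  # True
```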

#### <a id='toc2_3_2_1_'></a>[Example 1: Using `nn.Sequential`](#toc0_)

In [None]:
torch.manual_seed(seed)

In [None]:
sequential_model_1 = nn.Sequential(nn.Linear(num_features, 20), nn.ReLU(), nn.Linear(20, num_classes))

In [None]:
# log
print(sequential_model_1)

In [None]:
# summary
summary(sequential_model_1, input_size=(batch_size, num_features), device="cpu")

In [None]:
# feed-forward
sequential_model_1(x)
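
`nn.Sequential` also accepts an `OrderedDict`, which gives each layer a name instead of a numeric index. A small sketch (the layer names `hidden`, `act`, and `head` are arbitrary choices for this example):

```python
from collections import OrderedDict

import torch
from torch import nn

named_model = nn.Sequential(
    OrderedDict(
        [
            ("hidden", nn.Linear(6, 20)),
            ("act", nn.ReLU()),
            ("head", nn.Linear(20, 2)),
        ]
    )
)

# layers can be reached by position or by name
first = named_model[0]
same = named_model.hidden
print(first is same)  # True

# the forward pass still follows the insertion order
print(named_model(torch.randn(4, 6)).shape)  # torch.Size([4, 2])
```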

#### <a id='toc2_3_2_2_'></a>[Example 2: Using `nn.ModuleList`](#toc0_)

In [None]:
torch.manual_seed(seed)

In [None]:
sequential_model_2 = nn.ModuleList([nn.Linear(num_features, 20), nn.ReLU(), nn.Linear(20, num_classes)])

In [None]:
# log
print(sequential_model_2)

In [None]:
# feed-forward
def forward(layers: nn.ModuleList, x: torch.Tensor) -> torch.Tensor:
    for layer in layers:
        x = layer(x)
    return x


forward(sequential_model_2, x)
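
In practice, an `nn.ModuleList` usually lives inside an `nn.Module` subclass whose `forward` method does the looping. A sketch under that assumption (the class `MLP` and its `sizes` argument are hypothetical):

```python
import torch
from torch import nn


class MLP(nn.Module):  # hypothetical wrapper, for illustration only
    def __init__(self, sizes: list[int]):
        super().__init__()
        # one linear layer per consecutive pair of sizes
        self.layers = nn.ModuleList(
            [nn.Linear(i, o) for i, o in zip(sizes[:-1], sizes[1:])]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # ReLU after every layer except the last one
        for layer in self.layers[:-1]:
            x = torch.relu(layer(x))
        return self.layers[-1](x)


mlp = MLP([6, 20, 2])
out = mlp(torch.randn(4, 6))
print(out.shape)  # torch.Size([4, 2])

# two linear layers, each with a weight and a bias
print(len(list(mlp.parameters())))  # 4
```

Storing the layers in a plain Python list instead would hide them from `parameters()`, which is the main reason to use `nn.ModuleList`.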

#### <a id='toc2_3_2_3_'></a>[Example 3: Mix of `nn.Sequential` and `nn.ModuleList`](#toc0_)

In [None]:
sequential_model_3 = nn.ModuleList(
    [
        nn.Linear(num_features, 10),
        nn.Sequential(
            nn.ReLU(),
            nn.Linear(10, 20),
            nn.ReLU(),
        ),
        nn.Linear(20, num_classes),
    ]
)

In [None]:
# log
print(sequential_model_3)

In [None]:
# feed-forward
def forward(layers: nn.ModuleList, x: torch.Tensor) -> torch.Tensor:
    for layer in layers:
        x = layer(x)
    return x


forward(sequential_model_3, x)
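
Nested containers register their parameters recursively, and the parameter names reflect the nesting. The sketch below rebuilds the same structure with fixed sizes (6 input features, 2 classes) so that it runs on its own:

```python
from torch import nn

nested = nn.ModuleList(
    [
        nn.Linear(6, 10),
        nn.Sequential(nn.ReLU(), nn.Linear(10, 20), nn.ReLU()),
        nn.Linear(20, 2),
    ]
)

# names are built from the index at each nesting level, e.g. "1.1.weight"
names = [name for name, _ in nested.named_parameters()]
print(names)

# (6*10 + 10) + (10*20 + 20) + (20*2 + 2) = 332
total = sum(p.numel() for p in nested.parameters())
print(total)  # 332
```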

### <a id='toc2_3_3_'></a>[Non-Sequential (Functional) Model](#toc0_)

- **Overview**:

  - Allows for **complex** architectures whose layers are **not connected in a single chain** (e.g., skip connections in ResNet).
  - Models are created by subclassing `torch.nn.Module`.
  - Enables the definition of any neural network architecture, from simple feedforward networks to complex architectures like GANs or transformers.

- **Key Points**:

  - Use `torch.nn.Module` as the parent class and implement the `forward` method.
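
A skip connection is a simple case that `nn.Sequential` cannot express, because the input is reused later in the forward pass. A minimal sketch with fully-connected layers (the class `ResidualBlock` is a simplified illustration, not the block used in ResNet):

```python
import torch
from torch import nn


class ResidualBlock(nn.Module):  # simplified illustration of a skip connection
    def __init__(self, dim: int):
        super().__init__()
        self.fc1 = nn.Linear(dim, dim)
        self.fc2 = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = x  # keep the input for the skip path
        out = torch.relu(self.fc1(x))
        out = self.fc2(out)
        return torch.relu(out + residual)  # add the input back in


block = ResidualBlock(6)
x = torch.randn(4, 6)
print(block(x).shape)  # torch.Size([4, 6])
```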


#### <a id='toc2_3_3_1_'></a>[`torch.Tensor` vs. `torch.nn.Parameter`](#toc0_)


##### <a id='toc2_3_3_1_1_'></a>[`torch.Tensor`](#toc0_)

- **Definition**: A general-purpose tensor used to store data in PyTorch.
- **Gradient Tracking**: Gradients are only tracked if `requires_grad=True`.
- **Optimization**: It is not automatically registered as a parameter in a model when assigned as an attribute.
- **Use Case**: Storing data, intermediate computations, or tensors that do not need to be optimized during training.
- **Integration with Optimizer**: Must be explicitly added to the optimizer (if `requires_grad=True`).
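
The points above can be checked with a small sketch. The class `PlainTensorModel` is hypothetical; it stores a plain tensor with `requires_grad=True` as an attribute.

```python
import torch
from torch import nn


class PlainTensorModel(nn.Module):  # hypothetical, for illustration only
    def __init__(self):
        super().__init__()
        # a plain tensor attribute: tracked by autograd, but not registered
        self.weight = torch.randn(6, 2, requires_grad=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x @ self.weight


model = PlainTensorModel()
print(len(list(model.parameters())))  # 0
print(len(model.state_dict()))  # 0

# an optimizer built from model.parameters() has nothing to optimize
try:
    torch.optim.SGD(model.parameters(), lr=0.1)
    raised = False
except ValueError as err:
    raised = True
    print(err)

# the tensor has to be handed to the optimizer explicitly
optimizer = torch.optim.SGD([model.weight], lr=0.1)
before = model.weight.detach().clone()
model(torch.randn(4, 6)).sum().backward()
optimizer.step()
print(torch.equal(before, model.weight))  # False: the tensor was updated
```

Note that such a tensor is also skipped by `model.to(device)` and is not saved in the `state_dict`.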


In [None]:
class CustomModel1(nn.Module):
    def __init__(self, num_features, num_classes):
        super().__init__()

        # manually define weights as plain `torch.Tensor` attributes (not registered as parameters)
        self.weight1 = torch.randn(num_features, 20)
        self.bias1 = torch.randn(20)
        self.weight2 = torch.randn(20, num_classes)
        self.bias2 = torch.randn(num_classes)

    def forward(self, x):
        x = torch.matmul(x, self.weight1) + self.bias1  # x @ weight1 + bias1
        x = torch.relu(x)
        x = torch.matmul(x, self.weight2) + self.bias2
        return x


# initialization
functional_model_1 = CustomModel1(num_features, num_classes)

In [None]:
# log
print(functional_model_1)

In [None]:
# summary
summary(functional_model_1, input_size=(batch_size, num_features), device="cpu")

In [None]:
# feed-forward
functional_model_1(x)

##### <a id='toc2_3_3_1_2_'></a>[`torch.nn.Parameter`](#toc0_)

- **Definition**: A subclass of `torch.Tensor` specifically designed to represent learnable parameters in `torch.nn.Module`.
- **Gradient Tracking**: Tracks gradients by default (`requires_grad=True`); pass `requires_grad=False` to freeze it.
- **Optimization**: It is automatically registered as a parameter of the model if assigned as an attribute to a subclass of `torch.nn.Module`.
- **Use Case**: Learnable weights or biases of a model.
- **Integration with Optimizer**: Automatically included in `model.parameters()` when assigned as an attribute to an `nn.Module`.
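
A short sketch of the same idea with `nn.Parameter` (the class `ParamModel` is hypothetical): the attributes show up in `named_parameters()` and are updated by an optimizer built from `model.parameters()`.

```python
import torch
from torch import nn


class ParamModel(nn.Module):  # hypothetical, for illustration only
    def __init__(self):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(6, 2))
        self.bias = nn.Parameter(torch.zeros(2))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x @ self.weight + self.bias


model = ParamModel()

# both attributes were registered automatically
names = [name for name, _ in model.named_parameters()]
print(names)  # ['weight', 'bias']

# one optimization step on a dummy loss changes the weights
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
before = model.weight.detach().clone()
loss = model(torch.randn(4, 6)).pow(2).mean()
loss.backward()
optimizer.step()
print(torch.equal(before, model.weight))  # False

# gradient tracking can be switched off when creating a parameter
frozen = nn.Parameter(torch.zeros(2), requires_grad=False)
print(frozen.requires_grad)  # False
```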


In [None]:
class CustomModel2(nn.Module):
    def __init__(self, num_features, num_classes):
        super().__init__()

        # manually define weights as nn.Parameter
        self.weight1 = nn.Parameter(torch.randn(num_features, 20))
        self.bias1 = nn.Parameter(torch.randn(20))
        self.weight2 = nn.Parameter(torch.randn(20, num_classes))
        self.bias2 = nn.Parameter(torch.randn(num_classes))

    def forward(self, x):
        x = torch.matmul(x, self.weight1) + self.bias1  # x @ weight1 + bias1
        x = torch.relu(x)
        x = torch.matmul(x, self.weight2) + self.bias2
        return x


# initialization
functional_model_2 = CustomModel2(num_features, num_classes)

In [None]:
# log
print(functional_model_2)

In [None]:
# summary
summary(functional_model_2, input_size=(batch_size, num_features), device="cpu")

In [None]:
# feed-forward
functional_model_2(x)

#### <a id='toc2_3_3_2_'></a>[Example 1: Using `nn.Linear`, `nn.Conv2d`, ...](#toc0_)

In [None]:
class CustomModel3(nn.Module):
    def __init__(self, num_features, num_classes):
        super().__init__()

        # define linear transformation layers using `nn.Linear`
        self.fc1 = nn.Linear(num_features, 20)
        self.fc2 = nn.Linear(20, num_classes)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = self.fc2(x)
        return x


# initialization
functional_model_3 = CustomModel3(num_features, num_classes)

In [None]:
# log
print(functional_model_3)

In [None]:
# summary
summary(functional_model_3, input_size=(batch_size, num_features), device="cpu")

In [None]:
# feed-forward
functional_model_3(x)
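
`nn.Linear` wraps the same computation as the manual `nn.Parameter` version, with one difference worth knowing: its weight is stored as `(out_features, in_features)`, so the equivalent manual expression uses the transpose. A quick sketch to confirm this:

```python
import torch
from torch import nn

fc = nn.Linear(6, 20)
x = torch.randn(4, 6)

# weight layout is (out_features, in_features)
print(fc.weight.shape)  # torch.Size([20, 6])

# manual equivalent of the layer: x @ W^T + b
manual = x @ fc.weight.T + fc.bias
matches = torch.allclose(fc(x), manual, atol=1e-5)
print(matches)  # True
```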

#### <a id='toc2_3_3_3_'></a>[Example 2: Mix of Sequential and Non-sequential methods](#toc0_)

In [None]:
class CustomModel4(nn.Module):
    def __init__(self, num_classes):
        super().__init__()

        # feature extractor
        self.feature_extractor = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(16, 32, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )

        # classifier
        self.classifier = nn.Sequential(
            nn.Flatten(start_dim=1), nn.Linear(32 * 8 * 8, 128), nn.ReLU(), nn.Linear(128, num_classes)
        )

    def forward(self, x):
        x = self.feature_extractor(x)
        x = self.classifier(x)
        return x


# initialization
functional_model_4 = CustomModel4(10)

In [None]:
# log
print(functional_model_4)

In [None]:
# summary
summary(functional_model_4, input_size=(1, 3, 32, 32), device="cpu")
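
The `32 * 8 * 8` input size of the first linear layer follows from the feature extractor: with a 32x32 input, each convolution keeps the spatial size (`kernel_size=3`, `padding=1`) and each pooling layer halves it. The sketch below traces the shapes layer by layer, assuming the same 32x32 input used in the summary:

```python
import torch
from torch import nn

feature_extractor = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Conv2d(16, 32, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),
)

x = torch.randn(1, 3, 32, 32)

# print the output shape after every layer
for layer in feature_extractor:
    x = layer(x)
    print(f"{layer.__class__.__name__:<10} -> {tuple(x.shape)}")

# 32 channels * 8 * 8 spatial positions
flat_features = x.flatten(start_dim=1).shape[1]
print(flat_features)  # 2048
```

A different input resolution changes this number, so the first `nn.Linear` in the classifier would need to be adjusted accordingly.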

#### <a id='toc2_3_3_4_'></a>[Example 3: Separate Class for each Module](#toc0_)

In [None]:
class FeatureExtractor(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(16, 32, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )

    def forward(self, x):
        return self.fc(x)


class Classifier(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Flatten(start_dim=1), nn.Linear(32 * 8 * 8, 128), nn.ReLU(), nn.Linear(128, num_classes)
        )

    def forward(self, x):
        return self.fc(x)

In [None]:
class CustomModel5(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.feature_extractor = FeatureExtractor()
        self.classifier = Classifier(num_classes)

    def forward(self, x):
        x = self.feature_extractor(x)
        x = self.classifier(x)
        return x


# initialization
functional_model_5 = CustomModel5(10)

In [None]:
# log
print(functional_model_5)

In [None]:
# summary
summary(functional_model_5, input_size=(1, 3, 32, 32), device="cpu")
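
Splitting a model into separate classes also shapes how its parameters are named: each key in the `state_dict` is prefixed with the attribute path of the submodule that owns it. The sketch below redefines the three classes in compact form so that it runs on its own:

```python
from torch import nn


class FeatureExtractor(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(16, 32, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )

    def forward(self, x):
        return self.fc(x)


class Classifier(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Flatten(start_dim=1), nn.Linear(32 * 8 * 8, 128), nn.ReLU(), nn.Linear(128, num_classes)
        )

    def forward(self, x):
        return self.fc(x)


class CustomModel5(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.feature_extractor = FeatureExtractor()
        self.classifier = Classifier(num_classes)

    def forward(self, x):
        return self.classifier(self.feature_extractor(x))


model = CustomModel5(10)

# direct submodules, in registration order
children = [name for name, _ in model.named_children()]
print(children)  # ['feature_extractor', 'classifier']

# keys follow the pattern <attribute>.<attribute>.<index>.<parameter>
keys = list(model.state_dict().keys())
for key in keys:
    print(key)
```

Only layers with parameters appear in the keys, which is why the indices jump (for example from `fc.0` to `fc.3` in the feature extractor).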