### Creating a Custom Model by Inheriting from `nn.Module`

* Build a model based on a Linear layer with 784 input features and 100 output features, followed by a ReLU activation layer, and a final Linear layer with 10 output features.
* Declare these layers in the `__init__(self, ..)` method.
* In the `forward(self, x)` method, define the forward pass of the input tensor, connecting these layers accordingly.
* The model input must be a tensor, and the output will also be a tensor.


In [1]:
import torch
import torch.nn as nn
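# (equivalent import: from torch import nn)

# Create a custom model by subclassing nn.Module
class LinearModel(nn.Module):
    def __init__(self, num_classes=10):
        # Call the parent constructor first so nn.Module can register layers and parameters
        super().__init__()
        # Create the Linear and ReLU layers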
        self.linear_01 = nn.Linear(in_features=784, out_features=100)
        self.relu_01 = nn.ReLU()
        self.linear_02 = nn.Linear(in_features=100, out_features=num_classes)
        
    # forward propagation
    def forward(self, x):
        x = self.linear_01(x)
        x = self.relu_01(x)
        output = self.linear_02(x)
        return output

In [2]:
# Create a random input tensor
input_tensor = torch.randn(size=(64, 784))
print(input_tensor.size())

# Create a LinearModel object. Initialize it by passing the arguments declared in __init__(self, num_classes)
linear_model = LinearModel(num_classes=10)

# The LinearModel object is callable, so passing the input tensor like a function call invokes the forward() method
output_tensor = linear_model(input_tensor)
print(output_tensor.size())


torch.Size([64, 784])
torch.Size([64, 10])
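To confirm that calling the model runs `forward()`, here is a minimal check that compares `linear_model(x)` with an explicit layer-by-layer composition. It repeats the `LinearModel` class from the cell above so it runs on its own; the seed and the batch of 64 random inputs are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

class LinearModel(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.linear_01 = nn.Linear(in_features=784, out_features=100)
        self.relu_01 = nn.ReLU()
        self.linear_02 = nn.Linear(in_features=100, out_features=num_classes)

    def forward(self, x):
        x = self.linear_01(x)
        x = self.relu_01(x)
        return self.linear_02(x)

torch.manual_seed(0)
model = LinearModel(num_classes=10)
x = torch.randn(64, 784)

# Calling the module goes through nn.Module.__call__, which runs forward() (plus any registered hooks)
out_call = model(x)

# The same computation spelled out layer by layer
out_manual = model.linear_02(model.relu_01(model.linear_01(x)))

print(out_call.shape)                        # torch.Size([64, 10])
print(torch.allclose(out_call, out_manual))  # True
```

Calling `model(x)` is generally preferred over `model.forward(x)`, because only the former also runs any hooks registered on the module.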


### Exploring Layers (e.g., `nn.Linear`, `nn.Conv2d`, `nn.ReLU`, `nn.MaxPool2d`)

* Layers in PyTorch are intuitive building blocks designed to easily construct neural networks.
* Layers are themselves subclasses of `nn.Module`, so they support automatic differentiation and can be moved to a GPU with `.to(device)`.
* Layers define the structure and transformations applied to input data, performing tasks such as linear transformations, convolutions, activation functions, pooling, and normalization.
* Some layers have learnable parameters (e.g., `nn.Linear`, `nn.Conv2d`), while others primarily perform transformations without learnable parameters (e.g., `nn.ReLU`, `nn.MaxPool2d`).
* Using a layer takes two steps: create the layer object by passing its initialization arguments, then call the object like a function with the input tensor to transform. Because layer objects are callable, this call runs the layer's `forward()` method.
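A short sketch of the four layers named in the heading, showing which ones carry learnable parameters. The sizes (3 input channels, 16 filters, a 3x3 kernel, 2x2 pooling, a 32x32 input image) are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

# Create each layer by passing its initialization arguments
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
relu = nn.ReLU()
pool = nn.MaxPool2d(kernel_size=2)
linear = nn.Linear(in_features=16 * 16 * 16, out_features=10)

def count_params(layer):
    # Total number of elements across all learnable tensors of the layer
    return sum(p.numel() for p in layer.parameters())

x = torch.randn(8, 3, 32, 32)        # (batch, channels, height, width)
x = conv(x)                          # -> (8, 16, 32, 32); padding=1 keeps height/width
x = relu(x)                          # same shape, negative values clamped to 0
x = pool(x)                          # -> (8, 16, 16, 16); 2x2 max pooling halves height/width
x = linear(x.flatten(start_dim=1))   # -> (8, 10)

for name, layer in [("conv", conv), ("relu", relu), ("pool", pool), ("linear", linear)]:
    print(name, count_params(layer))
# conv 448 (16*3*3*3 weights + 16 biases), relu 0, pool 0, linear 40970
```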


In [3]:
import torch.nn as nn

linear_01 = nn.Linear(in_features=784, out_features=100)
# Has learnable parameters
print(linear_01.weight)
print(linear_01.bias)

# Learnable parameters are of type nn.parameter.Parameter, which is a special kind of tensor that supports training (automatic differentiation)
print(type(linear_01.weight))


Parameter containing:
tensor([[ 0.0230, -0.0069,  0.0003,  ..., -0.0336,  0.0287,  0.0182],
        [ 0.0054,  0.0207,  0.0081,  ..., -0.0138, -0.0024,  0.0080],
        [ 0.0187, -0.0319,  0.0180,  ..., -0.0018, -0.0234, -0.0066],
        ...,
        [-0.0132, -0.0209,  0.0346,  ...,  0.0260,  0.0187, -0.0124],
        [-0.0312,  0.0121,  0.0293,  ...,  0.0136, -0.0056,  0.0311],
        [ 0.0198,  0.0099,  0.0174,  ..., -0.0133,  0.0263, -0.0051]],
       requires_grad=True)
Parameter containing:
tensor([-3.0316e-02, -1.5920e-02, -1.1828e-02,  1.3264e-02,  5.5066e-05,
         1.9530e-02, -2.9638e-03,  3.5274e-02, -3.2051e-02,  7.7152e-03,
         1.0679e-02, -5.7659e-03,  2.6313e-02, -8.0426e-03, -2.8829e-02,
        -3.2332e-02,  4.2979e-03,  5.5229e-03, -9.2702e-03,  8.7338e-03,
         1.7945e-02, -6.8632e-03, -3.4427e-02,  1.2617e-02,  5.7750e-03,
         2.1583e-02,  6.9227e-04, -1.1774e-02, -3.2414e-02, -2.9523e-02,
        -1.2394e-02,  3.3678e-02,  1.9604e-02,  4.2619e-0

In [4]:
# linear_01 = nn.Linear(in_features=784, out_features=100)
# nn.Linear stores its weight with shape (out_features, in_features); the forward pass computes x @ weight.T + bias.
print(linear_01.weight.shape, linear_01.bias.shape)


torch.Size([100, 784]) torch.Size([100])
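Because the weight is stored as `(out_features, in_features)`, the forward pass of `nn.Linear` amounts to `y = x @ weight.T + bias`. A minimal check of that formula, using a fresh layer and an assumed batch of 64 random inputs; `allclose` is used because the fused and the manual computation may round slightly differently.

```python
import torch
import torch.nn as nn

linear_01 = nn.Linear(in_features=784, out_features=100)
x = torch.randn(64, 784)

y_layer = linear_01(x)
# weight has shape (100, 784), so it is transposed to (784, 100) before the matmul
y_manual = torch.matmul(x, linear_01.weight.T) + linear_01.bias

print(y_layer.shape)                                 # torch.Size([64, 100])
print(torch.allclose(y_layer, y_manual, atol=1e-5))  # True
```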


#### `nn.Parameter` is a special type of tensor that supports training (automatic differentiation)

* When an `nn.Parameter` is assigned as an attribute of an `nn.Module`, it is registered automatically and returned by `parameters()`, which is what is usually passed to an optimizer.
* A regular tensor can also have `requires_grad=True`, so gradients are computed for it, but assigning it to a module does not register it. It is therefore missing from `parameters()` and is not updated by an optimizer built from them.
* Registered parameters are tracked during backpropagation, so their gradients are computed by `backward()` and applied by the optimizer's `step()`.
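The difference between registered and unregistered tensors shows up in a small module. `TinyModule` below is a hypothetical example: `self.w` is an `nn.Parameter`, while `self.t` is a plain tensor with `requires_grad=True`. Both receive gradients, but only `w` is returned by `parameters()`, so only `w` is updated by an optimizer built from `module.parameters()`. The SGD learning rate of 0.1 is an arbitrary choice.

```python
import torch
import torch.nn as nn

class TinyModule(nn.Module):  # hypothetical module for illustration
    def __init__(self):
        super().__init__()
        # nn.Parameter assigned as an attribute is registered automatically
        self.w = nn.Parameter(torch.ones(3))
        # A plain tensor is NOT registered, even with requires_grad=True
        self.t = torch.ones(3, requires_grad=True)

    def forward(self, x):
        return (x * self.w * self.t).sum()

module = TinyModule()
print([name for name, _ in module.named_parameters()])  # ['w']

optimizer = torch.optim.SGD(module.parameters(), lr=0.1)
loss = module(torch.ones(3))
loss.backward()   # gradients are computed for both w and t
optimizer.step()  # but only w is updated: 1 - 0.1 * 1 = 0.9

print(module.w.data)  # tensor([0.9000, 0.9000, 0.9000])
print(module.t.data)  # tensor([1., 1., 1.])
```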


In [5]:
# The parameters() method of a layer returns an iterator over all the parameters the layer contains.
for parameter in linear_01.parameters():
    print(parameter)


Parameter containing:
tensor([[ 0.0230, -0.0069,  0.0003,  ..., -0.0336,  0.0287,  0.0182],
        [ 0.0054,  0.0207,  0.0081,  ..., -0.0138, -0.0024,  0.0080],
        [ 0.0187, -0.0319,  0.0180,  ..., -0.0018, -0.0234, -0.0066],
        ...,
        [-0.0132, -0.0209,  0.0346,  ...,  0.0260,  0.0187, -0.0124],
        [-0.0312,  0.0121,  0.0293,  ...,  0.0136, -0.0056,  0.0311],
        [ 0.0198,  0.0099,  0.0174,  ..., -0.0133,  0.0263, -0.0051]],
       requires_grad=True)
Parameter containing:
tensor([-3.0316e-02, -1.5920e-02, -1.1828e-02,  1.3264e-02,  5.5066e-05,
         1.9530e-02, -2.9638e-03,  3.5274e-02, -3.2051e-02,  7.7152e-03,
         1.0679e-02, -5.7659e-03,  2.6313e-02, -8.0426e-03, -2.8829e-02,
        -3.2332e-02,  4.2979e-03,  5.5229e-03, -9.2702e-03,  8.7338e-03,
         1.7945e-02, -6.8632e-03, -3.4427e-02,  1.2617e-02,  5.7750e-03,
         2.1583e-02,  6.9227e-04, -1.1774e-02, -3.2414e-02, -2.9523e-02,
        -1.2394e-02,  3.3678e-02,  1.9604e-02,  4.2619e-0

In [6]:
import torch

tensor_01 = torch.rand(size=(100, 784))
print(tensor_01.requires_grad)

param_01 = nn.Parameter(data=tensor_01)
print(param_01.shape, param_01.requires_grad)

False
torch.Size([100, 784]) True
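Since wrapping a tensor in `nn.Parameter` turns on `requires_grad`, gradients flow into it during `backward()`. A tiny check with an assumed 3-element parameter and the loss `sum(2 * p)`, whose gradient is 2 for every element:

```python
import torch
import torch.nn as nn

param = nn.Parameter(torch.rand(3))
loss = (2 * param).sum()
loss.backward()       # d(loss)/d(param) = 2 for each element
print(param.grad)     # tensor([2., 2., 2.])
```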


### Exploring Model Components (Layers and Submodules)

* A submodule is a component created by inheriting from `nn.Module` (layers are also submodules). Generally, it refers to a class or block composed of multiple layers, used to modularize complex model structures into smaller, manageable blocks.
* Submodules inherit various features from `nn.Module`, such as registering child modules, automatically registering parameters, and supporting automatic differentiation.
* When creating a submodule, you must implement both the constructor (`__init__`) and the `forward()` method.
* Terminology for the lecture will be defined as follows:

  * **Model**: The final neural network model.
  * **Submodule (or Block)**: A block module composed of multiple connected layers.
  * **Module**: Any object inheriting from `nn.Module`, including layers and submodules.
  * **Layer**: A single layer, e.g., `nn.Linear`.
  * **nn.Module**: The base class `nn.Module`.


In [7]:
import torch
import torch.nn as nn

# Create Submodule
class SimpleBlock(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.linear_01 = nn.Linear(in_features=in_features,
                                   out_features=out_features)
        self.relu_01 = nn.ReLU()

    def forward(self, x):
        x = self.linear_01(x)
        x = self.relu_01(x)
        return x

class LinearModel(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.simple_01 = SimpleBlock(in_features=784,
                                     out_features=100)
        self.linear_02 = nn.Linear(in_features=100, out_features=num_classes)
        
    def forward(self, x):
        x = self.simple_01(x)
        output = self.linear_02(x)
        return output

In [8]:
input_tensor = torch.randn(size=(64, 784))
print(input_tensor.size())

# Create a LinearModel object by passing the initialization arguments declared in __init__(self, num_classes)
linear_model = LinearModel(num_classes=10)

# The LinearModel object is callable, so passing the input tensor like a function call invokes the forward() method
output_tensor = linear_model(input_tensor)
print(output_tensor.size())


torch.Size([64, 784])
torch.Size([64, 10])


#### Inspecting All Modules (Layers and Submodules) in a Model

* The `modules()` method iterates over the module itself and all nested submodules (including layers) that inherit from `nn.Module`.
* The `named_modules()` method yields `(name, module)` pairs for the same modules; names are dotted paths such as `simple_01.linear_01`, and the module itself has the empty name `''`.
* The `children()` and `named_children()` methods return only the immediate child modules, without descending further.
* Internal member variables of a model (attributes assigned via `self`) can be accessed directly as `object_name.variable_name`.


In [9]:
print(linear_model)

LinearModel(
  (simple_01): SimpleBlock(
    (linear_01): Linear(in_features=784, out_features=100, bias=True)
    (relu_01): ReLU()
  )
  (linear_02): Linear(in_features=100, out_features=10, bias=True)
)


In [10]:
# Print all nested submodules, including the module itself
for module in linear_model.modules():
    print(module)


LinearModel(
  (simple_01): SimpleBlock(
    (linear_01): Linear(in_features=784, out_features=100, bias=True)
    (relu_01): ReLU()
  )
  (linear_02): Linear(in_features=100, out_features=10, bias=True)
)
SimpleBlock(
  (linear_01): Linear(in_features=784, out_features=100, bias=True)
  (relu_01): ReLU()
)
Linear(in_features=784, out_features=100, bias=True)
ReLU()
Linear(in_features=100, out_features=10, bias=True)


In [11]:
# children() returns only the immediate child submodules
for module in linear_model.children():
    print(module)

SimpleBlock(
  (linear_01): Linear(in_features=784, out_features=100, bias=True)
  (relu_01): ReLU()
)
Linear(in_features=100, out_features=10, bias=True)


In [12]:
# named_modules() yields (name, module) pairs for the module itself (name '') and all nested submodules
for name, module in linear_model.named_modules():
    print(f"Module Name: {name}, Module: {module}")


Module Name: , Module: LinearModel(
  (simple_01): SimpleBlock(
    (linear_01): Linear(in_features=784, out_features=100, bias=True)
    (relu_01): ReLU()
  )
  (linear_02): Linear(in_features=100, out_features=10, bias=True)
)
Module Name: simple_01, Module: SimpleBlock(
  (linear_01): Linear(in_features=784, out_features=100, bias=True)
  (relu_01): ReLU()
)
Module Name: simple_01.linear_01, Module: Linear(in_features=784, out_features=100, bias=True)
Module Name: simple_01.relu_01, Module: ReLU()
Module Name: linear_02, Module: Linear(in_features=100, out_features=10, bias=True)


In [13]:
# named_children() yields (name, module) pairs for the immediate child submodules only
for name, module in linear_model.named_children():
    print(f"Submodule Name: {name}, Submodule: {module}")


Submodule Name: simple_01, Submodule: SimpleBlock(
  (linear_01): Linear(in_features=784, out_features=100, bias=True)
  (relu_01): ReLU()
)
Submodule Name: linear_02, Submodule: Linear(in_features=100, out_features=10, bias=True)
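Because `named_modules()` walks the whole tree, it combines well with `isinstance` to find every layer of a given type, for example all `nn.Linear` layers. A sketch that repeats the `SimpleBlock`/`LinearModel` classes from above so it runs standalone:

```python
import torch.nn as nn

class SimpleBlock(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.linear_01 = nn.Linear(in_features=in_features, out_features=out_features)
        self.relu_01 = nn.ReLU()

    def forward(self, x):
        return self.relu_01(self.linear_01(x))

class LinearModel(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.simple_01 = SimpleBlock(in_features=784, out_features=100)
        self.linear_02 = nn.Linear(in_features=100, out_features=num_classes)

    def forward(self, x):
        return self.linear_02(self.simple_01(x))

model = LinearModel(num_classes=10)

# named_modules() visits every module, so the nested Linear layer is found too
linear_names = [name for name, m in model.named_modules() if isinstance(m, nn.Linear)]
print(linear_names)  # ['simple_01.linear_01', 'linear_02']

# named_children() only sees the top level, so the nested Linear layer is missed
top_linear = [name for name, m in model.named_children() if isinstance(m, nn.Linear)]
print(top_linear)    # ['linear_02']
```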


In [14]:
# Internal member variables of a model (assigned with self) can be accessed directly using object_name.variable_name
print('simple_01:', linear_model.simple_01)
print('linear_02:', linear_model.linear_02)
print('linear_01 in simple_01:', linear_model.simple_01.linear_01)


simple_01: SimpleBlock(
  (linear_01): Linear(in_features=784, out_features=100, bias=True)
  (relu_01): ReLU()
)
linear_02: Linear(in_features=100, out_features=10, bias=True)
linear_01 in simple_01: Linear(in_features=784, out_features=100, bias=True)
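When a submodule's path is only available as a dotted string (such as the names yielded by `named_modules()`), `nn.Module.get_submodule()` resolves it to the same object as the attribute chain. This method exists in recent PyTorch releases (roughly 1.9 onward); the sketch repeats the model classes from above so it runs on its own.

```python
import torch.nn as nn

class SimpleBlock(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.linear_01 = nn.Linear(in_features=in_features, out_features=out_features)
        self.relu_01 = nn.ReLU()

    def forward(self, x):
        return self.relu_01(self.linear_01(x))

class LinearModel(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.simple_01 = SimpleBlock(in_features=784, out_features=100)
        self.linear_02 = nn.Linear(in_features=100, out_features=num_classes)

    def forward(self, x):
        return self.linear_02(self.simple_01(x))

model = LinearModel(num_classes=10)

by_attr = model.simple_01.linear_01                   # attribute access
by_path = model.get_submodule("simple_01.linear_01")  # dotted-path lookup
print(by_path is by_attr)  # True: both refer to the same layer object
```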


#### Accessing All Parameters of a Model

* Any class inheriting from `nn.Module` can retrieve its registered parameter tensors with the `parameters()` method.
* `parameters()` also recurses into submodules, so calling it on a model returns the parameters of every nested layer.
* The `named_parameters()` method yields each parameter together with its qualified name, e.g. `simple_01.linear_01.weight`.


In [15]:
# for parameter in linear_model.parameters():
#     print(parameter)
for name, parameter in linear_model.named_parameters():
    print(name, parameter)

simple_01.linear_01.weight Parameter containing:
tensor([[ 0.0276,  0.0173, -0.0304,  ...,  0.0026, -0.0104,  0.0243],
        [ 0.0346, -0.0011, -0.0245,  ..., -0.0029,  0.0158,  0.0126],
        [-0.0193,  0.0229,  0.0281,  ..., -0.0354,  0.0326,  0.0218],
        ...,
        [-0.0233,  0.0152, -0.0297,  ...,  0.0322, -0.0297,  0.0162],
        [-0.0293, -0.0042,  0.0328,  ...,  0.0023, -0.0301,  0.0292],
        [ 0.0339, -0.0093, -0.0306,  ..., -0.0298,  0.0012, -0.0067]],
       requires_grad=True)
simple_01.linear_01.bias Parameter containing:
tensor([-3.3666e-02, -1.6048e-02, -2.0537e-02,  3.2391e-02,  4.7197e-03,
        -3.7419e-03,  3.1773e-02, -1.2888e-02, -1.6494e-02,  7.3672e-03,
         1.7864e-02,  1.2012e-02, -1.4381e-02, -2.9633e-02,  3.3953e-02,
         2.1953e-02,  1.3528e-04, -1.5345e-02,  1.7952e-02,  1.6095e-02,
        -1.3286e-02, -1.5059e-02, -7.5549e-03,  1.9864e-02, -1.4455e-02,
         1.0902e-02,  1.5050e-02, -2.3534e-02, -6.8009e-03,  1.3137e-02,
     

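Printing whole parameter tensors quickly becomes unreadable. A more compact view lists each qualified name with its shape, plus the total element count, which should agree with the `Total params` line of the `summary()` output in this section. The model classes are repeated from above so the sketch runs standalone.

```python
import torch.nn as nn

class SimpleBlock(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.linear_01 = nn.Linear(in_features=in_features, out_features=out_features)
        self.relu_01 = nn.ReLU()

    def forward(self, x):
        return self.relu_01(self.linear_01(x))

class LinearModel(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.simple_01 = SimpleBlock(in_features=784, out_features=100)
        self.linear_02 = nn.Linear(in_features=100, out_features=num_classes)

    def forward(self, x):
        return self.linear_02(self.simple_01(x))

model = LinearModel(num_classes=10)

# Map each qualified parameter name to its shape instead of printing the values
shapes = {name: tuple(p.shape) for name, p in model.named_parameters()}
for name, shape in shapes.items():
    print(f"{name:28s} {shape}")

total = sum(p.numel() for p in model.parameters())
print("total:", total)  # total: 79510
```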
### `torchinfo` `summary()`

* The `summary()` function from the `torchinfo` package provides a detailed overview of a model’s structure.

  * **model**: The model object.
  * **input\_size**: Size of the input tensor, usually including the batch dimension.
  * **col\_names**: List of column names to display in the summary output.

    * `input_size`: Size of the input tensor.
    * `output_size`: Size of the output tensor.
    * `num_params`: Number of learnable parameters.
    * `trainable`: Whether the parameters are trainable (`requires_grad`).
  * **row\_settings**: Controls what is shown in each row.

    * `var_names`: Names of module variables.
    * `depth`: Nesting depth and index of each module (e.g., `2-1`).
  * **depth**: Maximum depth of nested submodules to display.


In [16]:
from torchinfo import summary


In [17]:
input_tensor = torch.randn(size=(64, 784))
print(input_tensor.size())


linear_model = LinearModel(num_classes=10)

output_tensor = linear_model(input_tensor)
print(output_tensor.size())

torch.Size([64, 784])
torch.Size([64, 10])


In [18]:
from torchinfo import summary

summary(model=linear_model, input_size=(64, 784),
        col_names=['input_size', 'output_size', 'num_params'], #'trainable'
        row_settings=['var_names', 'depth'],
        depth=3
       )

Layer (type (var_name):depth-idx)        Input Shape               Output Shape              Param #
LinearModel (LinearModel)                [64, 784]                 [64, 10]                  --
├─SimpleBlock (simple_01): 1-1           [64, 784]                 [64, 100]                 --
│    └─Linear (linear_01): 2-1           [64, 784]                 [64, 100]                 78,500
│    └─ReLU (relu_01): 2-2               [64, 100]                 [64, 100]                 --
├─Linear (linear_02): 1-2                [64, 100]                 [64, 10]                  1,010
Total params: 79,510
Trainable params: 79,510
Non-trainable params: 0
Total mult-adds (Units.MEGABYTES): 5.09
Input size (MB): 0.20
Forward/backward pass size (MB): 0.06
Params size (MB): 0.32
Estimated Total Size (MB): 0.58
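The parameter counts in the table can be derived by hand: a `Linear` layer has `in_features * out_features` weights plus `out_features` biases, so 784 * 100 + 100 = 78,500 and 100 * 10 + 10 = 1,010. The sketch below checks that arithmetic without `torchinfo`, and, as an illustrative assumption not present in the original model, freezes `simple_01` with `requires_grad_(False)` to show how the trainable count would change.

```python
import torch.nn as nn

class SimpleBlock(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.linear_01 = nn.Linear(in_features=in_features, out_features=out_features)
        self.relu_01 = nn.ReLU()

    def forward(self, x):
        return self.relu_01(self.linear_01(x))

class LinearModel(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.simple_01 = SimpleBlock(in_features=784, out_features=100)
        self.linear_02 = nn.Linear(in_features=100, out_features=num_classes)

    def forward(self, x):
        return self.linear_02(self.simple_01(x))

def linear_params(in_features, out_features):
    # weight matrix (out x in) plus one bias per output feature
    return in_features * out_features + out_features

expected_total = linear_params(784, 100) + linear_params(100, 10)

model = LinearModel(num_classes=10)
total = sum(p.numel() for p in model.parameters())
print(total, expected_total)  # 79510 79510

# float32 uses 4 bytes per value: 79,510 * 4 = 318,040 bytes, about 0.32 MB
print(total * 4 / 1e6)

# Freeze the first block; its parameters no longer count as trainable
model.simple_01.requires_grad_(False)
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(trainable)  # 1010
```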