

---

## 🧠 What is `torch.nn.Module`?

In PyTorch, **`nn.Module`** is the **base class for all neural network components**: layers, models, and even custom building blocks.

Every neural network you define **inherits** from this class.

Think of it as a blueprint that gives you:

* A structure to define layers and parameters
* A way to organize forward computations
* Automatic tracking of trainable parameters
* Easy integration with optimizers and loss functions

---

## ⚙️ Basic Structure

Here's what a minimal PyTorch model looks like:

```python
import torch
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()  # initialize base class
        
        # define layers
        self.linear1 = nn.Linear(10, 20)
        self.relu = nn.ReLU()
        self.linear2 = nn.Linear(20, 1)

    def forward(self, x):
        # define forward pass
        x = self.linear1(x)
        x = self.relu(x)
        x = self.linear2(x)
        return x

# Instantiate the model
model = MyModel()

# Forward pass example
input_data = torch.randn(5, 10)
output = model(input_data)
print(output.shape)
```

---

## 🧩 Key Components of `nn.Module`

### 1. **`__init__()`**

* You define **layers** and **submodules** here.
* Layers like `nn.Linear`, `nn.Conv2d`, `nn.LSTM`, etc., automatically register their parameters.
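
As a small illustration of that automatic registration, the sketch below compares three ways of storing a layer. The class names `Registered`, `NotRegistered`, and `RegisteredList` are made up for this example. A layer assigned as an attribute (or held in an `nn.ModuleList`) shows up in `parameters()`, while a layer hidden in a plain Python list does not:

```python
import torch.nn as nn

class Registered(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)  # assigned as an attribute: registered

class NotRegistered(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = [nn.Linear(4, 2)]  # plain list: invisible to nn.Module

class RegisteredList(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.ModuleList([nn.Linear(4, 2)])  # registered container

# Each nn.Linear contributes two parameter tensors: weight and bias.
registered_count = len(list(Registered().parameters()))
hidden_count = len(list(NotRegistered().parameters()))
module_list_count = len(list(RegisteredList().parameters()))
print(registered_count, hidden_count, module_list_count)
```

Parameters that are not registered are never handed to the optimizer, so they would silently stay untrained.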

### 2. **`forward()`**

* Defines **how the input passes** through the network.
* This is where you describe your model's logic.
* You normally *don't* call `forward()` directly. Instead, call the model itself:

  ```python
  output = model(input)
  ```

  which goes through `nn.Module.__call__`, running any registered hooks and then `model.forward(input)`.
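
One way to see the difference between calling the model and calling `forward()` is a forward hook. This is a minimal sketch, assuming a single `nn.Linear` layer and a throwaway `calls` list introduced only for illustration:

```python
import torch
import torch.nn as nn

layer = nn.Linear(3, 1)
calls = []

# A forward hook runs after forward() whenever the module itself is called.
handle = layer.register_forward_hook(
    lambda module, inputs, output: calls.append("hook")
)

x = torch.randn(2, 3)
layer(x)          # goes through __call__, so the hook fires
layer.forward(x)  # bypasses __call__, so the hook is skipped
handle.remove()

print(calls)
```

Only the first call is recorded, which is why `model(input)` is the recommended form.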

### 3. **`parameters()`**

* Returns an iterator over all registered parameters (weights, biases), including any that are frozen with `requires_grad=False`.

  ```python
  for param in model.parameters():
      print(param.shape)
  ```
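
A common use of `parameters()` is counting how many values a model holds. The sketch below uses an `nn.Sequential` stand-in with the same layer sizes as `MyModel` above (an assumption made to keep the example self-contained) and filters on `requires_grad` to count only trainable values:

```python
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 20), nn.ReLU(), nn.Linear(20, 1))

total = sum(p.numel() for p in model.parameters())
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(total, trainable)  # (10*20 + 20) + (20*1 + 1) = 241 for both

# Freeze the first layer: it is still returned by parameters(),
# but no longer counts as trainable.
for p in model[0].parameters():
    p.requires_grad = False
trainable_after_freeze = sum(
    p.numel() for p in model.parameters() if p.requires_grad
)
print(trainable_after_freeze)  # 20*1 + 1 = 21
```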

### 4. **`named_parameters()`**

* Same as above but also gives the **names** of each parameter.

  ```python
  for name, param in model.named_parameters():
      print(name, param.shape)
  ```

### 5. **`state_dict()`**

* Returns an ordered dictionary that maps names to all model parameters and persistent buffers (for example, batchnorm running statistics).

  ```python
  torch.save(model.state_dict(), "model_weights.pth")
  ```
* You can load it later using:

  ```python
  model.load_state_dict(torch.load("model_weights.pth"))
  ```
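
The save and load steps can be checked end to end without touching the disk. This sketch assumes an in-memory `io.BytesIO` buffer as a stand-in for `model_weights.pth`, and two freshly initialised `nn.Linear` layers named `source` and `target` for illustration:

```python
import io
import torch
import torch.nn as nn

torch.manual_seed(0)
source = nn.Linear(4, 2)
target = nn.Linear(4, 2)   # same architecture, different random weights

buffer = io.BytesIO()      # stand-in for a file on disk
torch.save(source.state_dict(), buffer)
buffer.seek(0)
target.load_state_dict(torch.load(buffer))

keys = list(source.state_dict().keys())
weights_match = torch.equal(source.weight, target.weight)
print(keys, weights_match)
```

Note that `load_state_dict()` needs a model with the same architecture, because the saved dictionary contains only tensors, not the class definition.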

### 6. **`eval()` and `train()`**

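* `model.train()` → puts the model in training mode: dropout is active and batchnorm updates its running statistics.
* `model.eval()` → puts the model in evaluation mode: dropout is disabled and batchnorm uses its stored running statistics (used during testing/inference).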
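
The effect of the two modes is easiest to see on a single dropout layer. A minimal sketch, assuming `p=0.5` and an all-ones input chosen for readability:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.5)
x = torch.ones(1000)

drop.train()
train_out = drop(x)   # some entries zeroed, survivors scaled by 1/(1-p) = 2

drop.eval()
eval_out = drop(x)    # dropout is a no-op in eval mode

print(train_out.unique(), torch.equal(eval_out, x), drop.training)
```

Forgetting `model.eval()` at inference time is a frequent source of noisy predictions for models that contain dropout or batchnorm.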

---

## 🧮 Example: Custom Neural Network

Let's make a small neural network for regression:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RegressionNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(RegressionNN, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x
```

---

## 🔧 Using the Model in Training

```python
model = RegressionNN(10, 32, 1)

criterion = nn.MSELoss()          # Loss function
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)  # Optimizer

for epoch in range(100):
    inputs = torch.randn(5, 10)
    targets = torch.randn(5, 1)

    optimizer.zero_grad()         # Reset gradients
    outputs = model(inputs)       # Forward pass
    loss = criterion(outputs, targets)  # Compute loss
    loss.backward()               # Backpropagation
    optimizer.step()              # Update weights

    print(f"Epoch {epoch+1}, Loss: {loss.item():.4f}")
```

---

## 🧱 `nn.Module` Hierarchy: Nested Modules

You can have **modules inside modules**, like building blocks:

```python
class Block(nn.Module):
    def __init__(self, in_dim, out_dim):
        super(Block, self).__init__()
        self.layer = nn.Sequential(
            nn.Linear(in_dim, out_dim),
            nn.ReLU()
        )
    def forward(self, x):
        return self.layer(x)

class ComplexModel(nn.Module):
    def __init__(self):
        super(ComplexModel, self).__init__()
        self.block1 = Block(10, 20)
        self.block2 = Block(20, 30)
        self.output = nn.Linear(30, 1)
    def forward(self, x):
        x = self.block1(x)
        x = self.block2(x)
        return self.output(x)
```

PyTorch automatically registers all nested submodules and parameters.
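
To see what that registration looks like, the sketch below rebuilds the same `Block` and `ComplexModel` classes (repeated here only so the snippet runs on its own) and lists the dotted parameter names that PyTorch derives from the nesting:

```python
import torch.nn as nn

class Block(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.layer = nn.Sequential(nn.Linear(in_dim, out_dim), nn.ReLU())

    def forward(self, x):
        return self.layer(x)

class ComplexModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.block1 = Block(10, 20)
        self.block2 = Block(20, 30)
        self.output = nn.Linear(30, 1)

    def forward(self, x):
        return self.output(self.block2(self.block1(x)))

model = ComplexModel()
param_names = [name for name, _ in model.named_parameters()]
child_names = [name for name, _ in model.named_children()]
num_modules = len(list(model.modules()))  # includes the model itself

print(param_names)
print(child_names, num_modules)
```

The names follow the attribute path, for example `block1.layer.0.weight` for the first `nn.Linear` inside `block1`. The same names are used as keys in `state_dict()`.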

---

## 💡 Useful Methods Summary

| Method                 | Description                              |
| ---------------------- | ---------------------------------------- |
| `.parameters()`        | Returns all parameters (weights, biases) |
| `.named_parameters()`  | Returns (name, parameter) pairs          |
| `.children()`          | Returns direct submodules                |
| `.modules()`           | Returns all nested submodules            |
| `.state_dict()`        | Returns model's state (weights, buffers) |
| `.load_state_dict()`   | Loads model state                        |
| `.to(device)`          | Moves model to CPU or GPU                |
| `.train()` / `.eval()` | Sets training/inference mode             |
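
As a brief sketch of the `.to(device)` row, the snippet below picks a GPU when one is available and falls back to the CPU otherwise. The single `nn.Linear` model and the random inputs are placeholders for illustration:

```python
import torch
import torch.nn as nn

# Pick a GPU when one is available, otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(10, 1).to(device)     # moves parameters and buffers
inputs = torch.randn(5, 10).to(device)  # inputs must live on the same device

outputs = model(inputs)
print(outputs.device, outputs.shape)
```

For a module, `.to()` moves the parameters in place and returns the module itself. For a tensor, it returns a new tensor, so the result has to be assigned.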

---

## 🚀 Advantages of Using `nn.Module`

* ✅ Simplifies model definition
* ✅ Automatically registers layers and parameters
* ✅ Works seamlessly with autograd
* ✅ Integrates cleanly with optimizers
* ✅ Allows saving/loading of models easily
* ✅ Supports GPU/CPU switching with one line

---

## 🧠 Summary Diagram

```
        +----------------------+
        |    nn.Module         |
        +----------+-----------+
                   |
        +----------v-----------+
        | Your Custom Class     |
        | (inherits nn.Module)  |
        +----------+-----------+
                   |
          +--------v--------+
          |  __init__()     |
          |  forward()      |
          +-----------------+
                   |
          +--------v--------+
          | Model Object     |
          | .parameters()    |
          | .train()/.eval() |
          | .state_dict()    |
          +-----------------+
```

---



```python
# Basic structure: a minimal PyTorch model
import torch
import torch.nn as nn

# define class
class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()

        # define layers
        self.linear1 = nn.Linear(10, 30)
        self.relu = nn.ReLU()
        self.linear2 = nn.Linear(30, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        # define forward pass
        x = self.linear1(x)
        x = self.relu(x)
        x = self.linear2(x)
        x = self.sigmoid(x)
        return x

# instantiate the model
model = MyModel()

# forward pass example
input_data = torch.randn(5, 10)  # batch size of 5, input size of 10
output = model(input_data)
print(output)
print(output.shape)
```

```
tensor([[0.5093],
        [0.5601],
        [0.5681],
        [0.5144],
        [0.5173]], grad_fn=<SigmoidBackward0>)
torch.Size([5, 1])
```


```python
model.linear1.weight
```

```python
model.linear1.bias
```

```
Parameter containing:
tensor([-0.0415, -0.1561, -0.0629,  0.2592,  0.2734,  0.2189, -0.2004, -0.1361,
        -0.1401,  0.0034,  0.1555,  0.0037, -0.0946,  0.2217, -0.0662,  0.1300,
         0.1112,  0.0823, -0.2613, -0.2897, -0.3081,  0.2925, -0.1519,  0.0620,
        -0.3029,  0.0196, -0.2717, -0.0968,  0.2977,  0.2388],
       requires_grad=True)
```

```python
model.linear2.weight
```

```
Parameter containing:
tensor([[-0.1808, -0.1670,  0.0623,  0.1002,  0.1004, -0.0437, -0.1101,  0.1174,
          0.1531, -0.0341, -0.1300,  0.0074, -0.1106,  0.0968,  0.1296, -0.0861,
          0.0352,  0.0741, -0.0855, -0.1740,  0.0778, -0.0329, -0.0609, -0.1046,
          0.0457, -0.1098, -0.0706, -0.1547,  0.0016,  0.1140]],
       requires_grad=True)
```

```python
model.linear2.bias
```

```
Parameter containing:
tensor([0.1308], requires_grad=True)
```

```python
# notebook shell escape
!pip install torchinfo
```

```
Collecting torchinfo
  Downloading torchinfo-1.8.0-py3-none-any.whl.metadata (21 kB)
Downloading torchinfo-1.8.0-py3-none-any.whl (23 kB)
Installing collected packages: torchinfo
Successfully installed torchinfo-1.8.0
```


```python
from torchinfo import summary
summary(model, input_size=(5, 10))
```

```
Layer (type:depth-idx)                   Output Shape              Param #
MyModel                                  [5, 1]                    --
├─Linear: 1-1                            [5, 30]                   330
├─ReLU: 1-2                              [5, 30]                   --
├─Linear: 1-3                            [5, 1]                    31
├─Sigmoid: 1-4                           [5, 1]                    --
Total params: 361
Trainable params: 361
Non-trainable params: 0
Total mult-adds (Units.MEGABYTES): 0.00
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.00
Estimated Total Size (MB): 0.00
```
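
The `Param #` column can be reproduced by hand: a linear layer holds `in_features * out_features` weights plus one bias per output. The helper `linear_param_count` below is a hypothetical name introduced for this sketch, and the `nn.Sequential` model is only a cross-check with the same layer sizes:

```python
import torch.nn as nn

def linear_param_count(in_features, out_features):
    # weights (in * out) plus one bias per output unit
    return in_features * out_features + out_features

layer1 = linear_param_count(10, 30)   # 330
layer2 = linear_param_count(30, 1)    # 31

# Cross-check against PyTorch's own count for the same layer sizes.
reference = nn.Sequential(
    nn.Linear(10, 30), nn.ReLU(), nn.Linear(30, 1), nn.Sigmoid()
)
reference_total = sum(p.numel() for p in reference.parameters())

print(layer1, layer2, layer1 + layer2, reference_total)
```

Activation layers such as `ReLU` and `Sigmoid` have no parameters, which is why they show `--` in the summary.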

```python
# another example
import torch
import torch.nn as nn

class MyTestModel(nn.Module):

    def __init__(self):
        super(MyTestModel, self).__init__()
        self.linear1 = nn.Linear(6, 4)
        self.relu = nn.ReLU()
        self.linear2 = nn.Linear(4, 2)
        self.relu2 = nn.ReLU()
        self.linear3 = nn.Linear(2, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.linear1(x)
        x = self.relu(x)
        x = self.linear2(x)
        x = self.relu2(x)
        x = self.linear3(x)
        x = self.sigmoid(x)
        return x

model = MyTestModel()
input_data = torch.randn(5, 6)
output = model(input_data)
print(output)
print(output.shape)
```

```
tensor([[0.4977],
        [0.5101],
        [0.5093],
        [0.5091],
        [0.5098]], grad_fn=<SigmoidBackward0>)
torch.Size([5, 1])
```


```python
model.linear1.weight  # linear1: 6 inputs x 4 outputs = 24 weights, stored with shape [4, 6]
```

```
Parameter containing:
tensor([[ 0.3445,  0.2752,  0.3715, -0.1136,  0.3578, -0.3004],
        [-0.2442,  0.2207,  0.2197,  0.2534, -0.0851,  0.0199],
        [ 0.0486, -0.0233,  0.1396, -0.2920,  0.0227, -0.2233],
        [ 0.0283, -0.1525, -0.0417,  0.3226, -0.3698,  0.0537]],
       requires_grad=True)
```

```python
model.linear2.weight  # linear2: 4 inputs x 2 outputs = 8 weights, stored with shape [2, 4]
```

```
Parameter containing:
tensor([[ 0.3356,  0.3246,  0.4793,  0.4041],
        [-0.0530,  0.1229, -0.1731, -0.0078]], requires_grad=True)
```

```python
model.linear3.weight  # linear3: 2 inputs x 1 output = 2 weights, stored with shape [1, 2]
```

```
Parameter containing:
tensor([[-0.1029, -0.0560]], requires_grad=True)
```

```python
model.linear1.bias  # linear1: one bias per output = 4 biases
```

```
Parameter containing:
tensor([-0.0530,  0.2785, -0.0681,  0.1063], requires_grad=True)
```

```python
model.linear2.bias  # linear2: one bias per output = 2 biases
```

```
Parameter containing:
tensor([-0.3893,  0.2026], requires_grad=True)
```

```python
model.linear3.bias  # linear3: one bias per output = 1 bias
```

```
Parameter containing:
tensor([0.0512], requires_grad=True)
```

```python
from torchinfo import summary
summary(model, input_size=(5, 6))
```

```
Layer (type:depth-idx)                   Output Shape              Param #
MyTestModel                              [5, 1]                    --
├─Linear: 1-1                            [5, 4]                    28
├─ReLU: 1-2                              [5, 4]                    --
├─Linear: 1-3                            [5, 2]                    10
├─ReLU: 1-4                              [5, 2]                    --
├─Linear: 1-5                            [5, 1]                    3
├─Sigmoid: 1-6                           [5, 1]                    --
Total params: 41
Trainable params: 41
Non-trainable params: 0
Total mult-adds (Units.MEGABYTES): 0.00
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.00
Estimated Total Size (MB): 0.00
```

```python
# using sequential
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.network = nn.Sequential(
            nn.Linear(6, 4),
            nn.ReLU(),
            nn.Linear(4, 2),
            nn.ReLU(),
            nn.Linear(2, 1),
            nn.Sigmoid()
        )

    def forward(self, x):
        return self.network(x)

model = Net()
input_data = torch.randn(5, 6)
output = model(input_data)
print(output)
print(output.shape)
```

```
tensor([[0.5936],
        [0.5936],
        [0.5936],
        [0.5936],
        [0.5936]], grad_fn=<SigmoidBackward0>)
torch.Size([5, 1])
```


## 🧠 Building a Neural Network using `nn.Module`

1. Define the model using `torch.nn.Module`
2. Use built-in activation functions
3. Use built-in loss functions
4. Use built-in optimizers

```python
# Full working example (step by step)
import torch
import torch.nn as nn
import torch.optim as optim

class NeuralNetwork(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(NeuralNetwork, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()  # built-in activation
        self.fc2 = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        out = self.fc1(x)
        out = self.relu(out)
        out = self.fc2(out)
        return out


# Instantiate the model
input_size = 10
hidden_size = 32
output_size = 1
model = NeuralNetwork(input_size, hidden_size, output_size)

# Define the loss function (mean squared error)
criterion = nn.MSELoss()

# Define the optimizer (Adam)
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Dummy training data
input_data = torch.randn(100, 10)
targets = torch.randn(100, 1)

# Training loop
num_epochs = 100
for epoch in range(num_epochs):
    # forward pass
    outputs = model(input_data)
    loss = criterion(outputs, targets)

    # backward pass and optimization
    optimizer.zero_grad()  # clear old gradients
    loss.backward()        # compute gradients
    optimizer.step()       # update weights

    if (epoch+1) % 10 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')
```

```
Epoch [10/100], Loss: 1.1533
Epoch [20/100], Loss: 1.1028
Epoch [30/100], Loss: 1.0607
Epoch [40/100], Loss: 1.0228
Epoch [50/100], Loss: 0.9881
Epoch [60/100], Loss: 0.9555
Epoch [70/100], Loss: 0.9244
Epoch [80/100], Loss: 0.8944
Epoch [90/100], Loss: 0.8662
Epoch [100/100], Loss: 0.8390
```


| Step                    | Description              | Example                                 |
| ----------------------- | ------------------------ | --------------------------------------- |
| 1️⃣ Build model         | Subclass `nn.Module`     | `class Net(nn.Module): ...`             |
| 2️⃣ Activation function | Use built-in activations | `nn.ReLU()` / `torch.relu()`            |
| 3️⃣ Loss function       | Built-in from `nn`       | `nn.MSELoss()`, `nn.CrossEntropyLoss()` |
| 4️⃣ Optimizer           | From `torch.optim`       | `optim.Adam()`, `optim.SGD()`           |




## 🧠 What is `torch.optim`?

`torch.optim` is **PyTorch's optimization package**.
It provides implementations of **gradient-based optimization algorithms** like **SGD**, **Adam**, **RMSprop**, etc., that help minimize the loss function during training.

In simple terms:

> 🔹 Your model computes predictions.
> 🔹 The loss function measures error.
> 🔹 `torch.optim` adjusts model parameters to reduce that error.

---

## ⚙️ The Training Loop Concept

The training process using `torch.optim` typically follows this pattern:

```python
for epoch in range(num_epochs):
    optimizer.zero_grad()   # 1️⃣ Reset gradients
    outputs = model(inputs) # 2️⃣ Forward pass
    loss = criterion(outputs, targets)  # 3️⃣ Compute loss
    loss.backward()         # 4️⃣ Backpropagation
    optimizer.step()        # 5️⃣ Update parameters
```

Let's explain this step-by-step 👇

---

## 🧩 Step-by-Step Breakdown

### 1️⃣ `optimizer = torch.optim.OptimizerClass(parameters, lr=...)`

* You first create an optimizer and tell it **which parameters** to update (usually the model's parameters).
* Example:

```python
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
```

### 2️⃣ `optimizer.zero_grad()`

* Clears old gradients (otherwise PyTorch *accumulates* them by default).
* Always call this **before** `loss.backward()`.
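
The accumulation behaviour can be observed on a single tensor. In this sketch the tensor `w` and the toy loss `(w * 3).sum()` are illustrative assumptions, and `w.grad.zero_()` stands in for what `optimizer.zero_grad()` achieves for the optimizer's parameters:

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

(w * 3).sum().backward()
first = w.grad.clone()        # gradient of the loss: 3 for each element

(w * 3).sum().backward()      # no reset: the new gradient is added to the old one
accumulated = w.grad.clone()  # now 6 for each element

w.grad.zero_()                # reset, similar in effect to optimizer.zero_grad()
(w * 3).sum().backward()
after_reset = w.grad.clone()  # back to 3 for each element

print(first, accumulated, after_reset)
```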

### 3️⃣ `loss.backward()`

* Computes the gradient of the loss with respect to each model parameter.

### 4️⃣ `optimizer.step()`

* Uses those gradients to **update the parameters** (weights/biases).
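
For plain SGD (no momentum, no weight decay) the update performed by `optimizer.step()` is simply `p = p - lr * p.grad`. The sketch below checks this on a small `nn.Linear` model with random data. Both are assumptions chosen for illustration:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(3, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x, y = torch.randn(8, 3), torch.randn(8, 1)

optimizer.zero_grad()
loss = nn.functional.mse_loss(model(x), y)
loss.backward()

# What plain SGD should produce: p - lr * grad
expected = [p.detach() - 0.1 * p.grad for p in model.parameters()]

optimizer.step()
matches = all(
    torch.allclose(p.detach(), e) for p, e in zip(model.parameters(), expected)
)
print(matches)
```

Other optimizers replace the `lr * grad` term with their own update rule, but the calling sequence stays the same.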

---

## 🧠 Example: Simple Regression with `torch.optim`

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Model
model = nn.Linear(1, 1)   # y = wx + b

# Loss
criterion = nn.MSELoss()

# Optimizer
optimizer = optim.SGD(model.parameters(), lr=0.1)

# Dummy data
x = torch.randn(10, 1)
y = 3 * x + 2  # actual relation

# Training
for epoch in range(50):
    optimizer.zero_grad()       # reset gradients
    outputs = model(x)          # forward
    loss = criterion(outputs, y)
    loss.backward()             # backprop
    optimizer.step()            # update weights

    if (epoch+1) % 10 == 0:
        print(f'Epoch {epoch+1}, Loss: {loss.item():.4f}')
```
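
Since the data follow `y = 3x + 2` exactly, the learned weight and bias should approach 3 and 2. The variant below is a sketch with two changes that are assumptions of this example: deterministic inputs from `torch.linspace` for reproducibility, and 500 epochs so that convergence is clearly visible:

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
model = nn.Linear(1, 1)
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

# Deterministic inputs (an assumption of this sketch) so the result is reproducible.
x = torch.linspace(-1, 1, 10).unsqueeze(1)
y = 3 * x + 2

for epoch in range(500):
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()

learned_w = model.weight.item()
learned_b = model.bias.item()
print(f"w = {learned_w:.3f}, b = {learned_b:.3f}, loss = {loss.item():.6f}")
```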

---

## ⚡ Common Optimizers in `torch.optim`

| Optimizer  | Description                  | Key Parameters       | When to Use                            |
| ---------- | ---------------------------- | -------------------- | -------------------------------------- |
| `SGD`      | Stochastic Gradient Descent  | `lr`, `momentum`     | Simple and effective for small models  |
| `Adam`     | Adaptive Moment Estimation   | `lr`, `betas`, `eps` | Default choice for deep learning       |
| `RMSprop`  | Root Mean Square Propagation | `lr`, `alpha`, `eps` | Works well for RNNs                    |
| `Adagrad`  | Adaptive Gradient            | `lr`, `eps`          | For sparse features                    |
| `Adadelta` | Adaptive Delta               | `lr`, `rho`          | Variation of Adagrad                   |
| `AdamW`    | Adam with weight decay       | `lr`, `weight_decay` | Modern standard (used in Transformers) |

---

## 🔬 Example Comparison: SGD vs Adam

### Using SGD:

```python
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
```

### Using Adam:

```python
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
```

Both do the same job, *updating the weights*, but in different ways:

* **SGD**: Same learning rate for all parameters.
* **Adam**: Adaptive step size for each parameter, based on running estimates of the gradient's mean (momentum) and uncentered variance.
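
The difference shows up clearly after a single step on two parameters with very different gradient sizes. In this sketch the helper `one_step` and the toy loss are hypothetical and exist only for illustration, and `lr=0.1` is used for both optimizers to make the numbers easy to read:

```python
import torch

def one_step(optimizer_cls, **kwargs):
    p = torch.tensor([1.0, 1.0], requires_grad=True)
    optimizer = optimizer_cls([p], **kwargs)
    # Gradients are 1000 for p[0] and 0.001 for p[1]
    loss = 1000.0 * p[0] + 0.001 * p[1]
    loss.backward()
    optimizer.step()
    return p.detach()

sgd_result = one_step(torch.optim.SGD, lr=0.1)
adam_result = one_step(torch.optim.Adam, lr=0.1)

print(sgd_result)   # steps proportional to the gradient: about [-99.0, 0.9999]
print(adam_result)  # first Adam step is about lr in size for both: about [0.9, 0.9]
```

SGD moves the first parameter a million times further than the second, while Adam's normalisation gives both roughly the same step size.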

---

## 🧮 Optional: Learning Rate Scheduling

You can also adjust the learning rate dynamically using **schedulers**:

```python
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

for epoch in range(30):
    train()                # placeholder for your training code (forward, backward, optimizer.step())
    scheduler.step()       # multiplies lr by gamma (0.1) every 10 epochs
```
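
To make the schedule concrete, the sketch below records the learning rate at the start of each epoch. The starting rate of 0.1 and the bare `optimizer.step()` standing in for a real training epoch are assumptions of this example:

```python
import torch
import torch.nn as nn

model = nn.Linear(1, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

lrs = []
for epoch in range(30):
    lrs.append(scheduler.get_last_lr()[0])  # lr used during this epoch
    optimizer.step()                        # stands in for a real training epoch
    scheduler.step()

# Roughly 0.1 for epochs 0-9, 0.01 for epochs 10-19, 0.001 for epochs 20-29
print(lrs[0], lrs[10], lrs[20])
```

Calling `scheduler.step()` after `optimizer.step()`, once per epoch, is the order PyTorch expects.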

---

## 🧠 Summary

| Concept          | Description                       | Example                                                |
| ---------------- | --------------------------------- | ------------------------------------------------------ |
| Package          | Provides all optimizers           | `torch.optim`                                          |
| Create optimizer | Defines update rule               | `optimizer = optim.Adam(model.parameters(), lr=0.001)` |
| Reset gradients  | Clears previous grads             | `optimizer.zero_grad()`                                |
| Compute grads    | Backprop                          | `loss.backward()`                                      |
| Update weights   | Applies the update rule once      | `optimizer.step()`                                     |

---

✅ **In short:**

> `torch.optim` is the engine that drives learning: it takes the gradients from backpropagation and updates your model parameters according to the chosen update rule.

---
