# 📘 Lesson 6 — PyTorch nn.Module: Building Custom Neural Networks

---

### 🎯 Why this lesson matters
So far, we’ve used simple models like Linear or MLP with built-in layers.  
But to build real-world models (like CNNs, Transformers), you need to create **custom classes**.  

👉 `nn.Module` is the base class for all neural networks in PyTorch.  
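It handles:
- Parameter registration (weights, biases), so `.parameters()` can find them.
- A standard place to define the forward pass.
- Bookkeeping for sub-modules, buffers, and device moves (autograd itself does the gradient tracking).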

Understanding this = unlocking the ability to build **any architecture**.


In [1]:
# Setup
import torch
import torch.nn as nn
import torch.optim as optim
torch.manual_seed(42)


## 1) What is nn.Module?

- It’s a **container** for your model’s layers and parameters.
- Every custom model **inherits from nn.Module**.
- Key methods:
  - `__init__()`: Define layers here.
  - `forward()`: Define computation (how data flows).

👉 WHY?  
PyTorch automatically collects every registered parameter (via `.parameters()`), ready to hand to an optimizer.  
Without it, you’d have to keep track of every weight and bias tensor yourself.
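
To make the "manage it yourself" case concrete, here is a minimal sketch of a perceptron written without `nn.Module`. The names `w`, `b`, `manual_params`, and `manual_forward` are made up for this illustration; the point is only that the parameter list has to be maintained by hand.

```python
import torch

torch.manual_seed(0)

# Hand-rolled perceptron: every trainable tensor must be created with
# requires_grad=True and listed by hand for the optimizer.
w = torch.randn(2, 1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)
manual_params = [w, b]  # leave one out and it silently never trains

def manual_forward(x):
    return torch.sigmoid(x @ w + b)

x = torch.ones(4, 2)
target = torch.ones(4, 1)

optimizer = torch.optim.SGD(manual_params, lr=0.1)
w_before = w.detach().clone()

# One training step
loss = torch.nn.functional.binary_cross_entropy(manual_forward(x), target)
optimizer.zero_grad()
loss.backward()
optimizer.step()

print("w changed:", not torch.equal(w_before, w.detach()))
```

With `nn.Module`, the `manual_params` list disappears: `model.parameters()` builds it for you from the layers you assign in `__init__()`.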


In [2]:
# Simple Perceptron as nn.Module
class SimplePerceptron(nn.Module):
    def __init__(self, n_inputs):
        super().__init__()  # Call parent class
        self.linear = nn.Linear(n_inputs, 1)  # Layer in init
        self.activation = nn.Sigmoid()

    def forward(self, x):
        x = self.linear(x)
        x = self.activation(x)
        return x

# Usage
model = SimplePerceptron(2)
x = torch.zeros(4, 2)  # Dummy input
print("Prediction:", model(x))


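Prediction: a tensor of shape (4, 1) with grad_fn=<SigmoidBackward0>. The input is all zeros, so every row equals sigmoid(bias): four identical values strictly between 0 and 1 that depend on the random initialization (never exactly 0).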


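## 2) `__init__()`: Defining Layers

- Here, you create sub-modules (layers) as attributes.
- PyTorch registers them automatically, so their parameters show up in `.parameters()`.

👉 WHY `super().__init__()`?  
It sets up the internal registries that `nn.Module` uses to track parameters, buffers, and sub-modules, which is what makes `.parameters()` and `.to(device)` work. Skip it, and assigning a layer in `__init__()` raises an error.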
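
A quick way to see what the parent initializer does is to leave it out. The sketch below uses a deliberately broken, hypothetical class `ForgotSuper`; it catches the `AttributeError` PyTorch raises when a layer is assigned before the module has been initialized. The exact wording of the message may differ between versions.

```python
import torch.nn as nn

class ForgotSuper(nn.Module):  # hypothetical, intentionally broken
    def __init__(self):
        # super().__init__() is deliberately missing here
        self.linear = nn.Linear(2, 1)

try:
    ForgotSuper()
    error_message = None
except AttributeError as err:
    error_message = str(err)

print("Error:", error_message)
```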


In [3]:
# Example: MLP with init
class SimpleMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(2, 2)
        self.output = nn.Linear(2, 1)

    def forward(self, x):
        pass  # To be defined

model = SimpleMLP()
print("Parameters:", list(model.parameters()))


Parameters: [Parameter containing:
tensor([[ 0.6846, -0.7125],
        [ 1.0447,  0.3510]], requires_grad=True), Parameter containing:
tensor([0.2961, 0.7681], requires_grad=True), Parameter containing:
tensor([[-1.0735,  0.1456]], requires_grad=True), Parameter containing:
tensor([ 0.2344], requires_grad=True)]
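
The raw list above is hard to read. A small sketch with `named_parameters()` shows which layer each tensor belongs to, and also what does and does not get registered. `TinyMLP`, `scale`, and `offset` are hypothetical names used only for this illustration.

```python
import torch
import torch.nn as nn

class TinyMLP(nn.Module):  # hypothetical example class
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(2, 2)
        self.output = nn.Linear(2, 1)
        self.scale = nn.Parameter(torch.ones(1))  # registered: wrapped in nn.Parameter
        self.offset = torch.zeros(1)              # NOT registered: plain tensor attribute

model = TinyMLP()
names = [name for name, _ in model.named_parameters()]

for name, p in model.named_parameters():
    print(name, tuple(p.shape))

print("offset registered?", "offset" in names)
```

A plain tensor attribute like `offset` is invisible to the optimizer, which is an easy mistake to make when adding custom weights.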


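## 3) `forward()`: The Computation Path

- Defines how input flows through layers.
- Called automatically when you do `model(x)`.

👉 WHY not call forward directly?  
`model(x)` goes through `nn.Module.__call__`, which runs any registered hooks before and after `forward()`. Calling `model.forward(x)` skips those hooks; autograd tracks the computation either way.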
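
One observable difference between `model(x)` and `model.forward(x)` is whether forward hooks fire. The sketch below registers a hook with `register_forward_hook` on a plain `nn.Linear`; the hook function `record_output_shape` and the `calls` list are illustrative names.

```python
import torch
import torch.nn as nn

model = nn.Linear(2, 1)
calls = []

def record_output_shape(module, inputs, output):
    # Forward hooks receive the module, its inputs (a tuple), and its output
    calls.append(tuple(output.shape))

handle = model.register_forward_hook(record_output_shape)

x = torch.zeros(3, 2)
out_call = model(x)            # goes through __call__, so the hook runs
out_direct = model.forward(x)  # bypasses __call__, so the hook is skipped
handle.remove()

print("Hook calls:", calls)
print("Direct call still tracked by autograd:", out_direct.requires_grad)
```

Only the first call is recorded, while both outputs carry gradient information. Tools that rely on hooks (profilers, feature extractors) therefore need `model(x)`.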


In [4]:
class SimpleMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(2, 2)
        self.output = nn.Linear(2, 1)

    def forward(self, x):
        x = torch.relu(self.hidden(x))  # Activation
        x = self.output(x)
        return x

model = SimpleMLP()
x = torch.zeros(2, 2)
print("Output:", model(x))


Output: tensor([[-0.4796],
        [-0.4796]], grad_fn=<AddmmBackward0>)


## 4) Parameters vs Buffers

- **Parameters**: Trainable (weights, biases) — updated by optimizer.
- **Buffers**: Non-trainable (e.g., running means in BatchNorm) — registered with `register_buffer()`.

👉 WHY distinguish?  
Buffers move with `model.to(device)` and are saved in `state_dict()`, but they aren’t in `.parameters()`, so the optimizer never updates them.


In [5]:
class ModelWithBuffer(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(2, 1)
        self.register_buffer('my_buffer', torch.ones(2))  # Non-trainable

model = ModelWithBuffer()
print("Parameters count:", len(list(model.parameters())))  # Only linear's params
print("Buffer:", model.my_buffer)


Parameters count: 2
Buffer: tensor([1., 1.])
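
To see where buffers end up, the following sketch lists parameters, buffers, and saved keys side by side. `Normalizer`, `mean`, and `scratch` are hypothetical names; `persistent=False` is the `register_buffer` option that keeps a buffer out of `state_dict()`. A dtype change stands in for a device move here so the example runs without a GPU.

```python
import torch
import torch.nn as nn

class Normalizer(nn.Module):  # hypothetical example class
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(2, 1)
        self.register_buffer("mean", torch.zeros(2))                       # saved with the model
        self.register_buffer("scratch", torch.zeros(2), persistent=False)  # not saved

    def forward(self, x):
        return self.linear(x - self.mean)

model = Normalizer()
param_names = [n for n, _ in model.named_parameters()]
buffer_names = [n for n, _ in model.named_buffers()]
saved_keys = list(model.state_dict().keys())

print("Parameters:", param_names)
print("Buffers:", buffer_names)
print("state_dict keys:", saved_keys)

# .to() converts buffers together with parameters
model.to(torch.float64)
print("Buffer dtype after .to():", model.mean.dtype)
```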


## 5) Training with nn.Module

- Use model.parameters() for optimizer.
- Call model(x) for forward.

👉 WHY?  
The same loop works unchanged for any model, from this MLP to a Transformer, because the optimizer only needs `model.parameters()`.


In [6]:
# XOR data
X = torch.tensor([[0.,0.], [0.,1.], [1.,0.], [1.,1.]])
y = torch.tensor([[0.], [1.], [1.], [0.]])

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(2, 4)
        self.out = nn.Linear(4, 1)
        self.activation = nn.Sigmoid()

    def forward(self, x):
        x = self.activation(self.hidden(x))
        x = self.activation(self.out(x))
        return x

model = MLP()
optimizer = optim.SGD(model.parameters(), lr=0.1)
criterion = nn.BCELoss()

for epoch in range(1000):
    y_pred = model(X)
    loss = criterion(y_pred, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print("Final loss:", loss.item())


Final loss: 0.25
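
After training, predictions are usually made in evaluation mode and without gradient tracking. The sketch below shows that pattern on a small, untrained stand-in network (`TinyXORNet` is a hypothetical name, and its outputs are not meaningful); in practice you would reuse the trained `model` from the cell above. The 0.5 threshold is an assumption for turning probabilities into labels.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class TinyXORNet(nn.Module):  # hypothetical stand-in for the trained MLP
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(2, 4)
        self.out = nn.Linear(4, 1)

    def forward(self, x):
        x = torch.sigmoid(self.hidden(x))
        return torch.sigmoid(self.out(x))

X = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
model = TinyXORNet()  # untrained here

model.eval()           # puts layers such as Dropout/BatchNorm in inference mode
with torch.no_grad():  # no graph is built, so no memory is spent on gradients
    probs = model(X)
    labels = (probs > 0.5).float()  # assumed decision threshold

print("Probabilities:", probs.squeeze(1))
print("Predicted labels:", labels.squeeze(1))
print("Tracks gradients:", probs.requires_grad)
```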


## 6) Practice Exercises

- Build a custom CNN class with conv layers.
- Add a buffer for positional encoding (hint for Transformers).


In [7]:
# Practice: Build a simple CNN
class SimpleCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3)
        self.pool = nn.AdaptiveAvgPool2d(1)  # Average each channel down to 1x1
        self.fc = nn.Linear(16, 10)  # 16 features after pooling + flatten

    def forward(self, x):
        x = torch.relu(self.conv1(x))  # (N, 16, H-2, W-2)
        x = self.pool(x)               # (N, 16, 1, 1), works for any input size
        x = torch.flatten(x, 1)        # (N, 16)
        x = self.fc(x)
        return x

model = SimpleCNN()
print(model)


SimpleCNN(
  (conv1): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1))
  (pool): AdaptiveAvgPool2d(output_size=1)
  (fc): Linear(in_features=16, out_features=10, bias=True)
)
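
For the second exercise, one possible approach is to precompute a sinusoidal positional encoding once and store it as a buffer, so it follows the model across devices without being trained. This is a sketch, not the only solution: the class name, `d_model=8`, and `max_len=16` are arbitrary choices for illustration, and `d_model` is assumed to be even.

```python
import math
import torch
import torch.nn as nn

class SinusoidalPositionalEncoding(nn.Module):  # hypothetical example class
    def __init__(self, d_model=8, max_len=16):
        super().__init__()
        position = torch.arange(max_len).unsqueeze(1).float()  # (max_len, 1)
        # One frequency per pair of channels: (d_model / 2,)
        div_term = torch.exp(
            torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model)
        )
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(position * div_term)  # even channels
        pe[:, 1::2] = torch.cos(position * div_term)  # odd channels
        self.register_buffer("pe", pe)  # non-trainable, saved in state_dict

    def forward(self, x):
        # x: (batch, seq_len, d_model); add the first seq_len encodings
        return x + self.pe[: x.size(1)]

enc = SinusoidalPositionalEncoding()
x = torch.zeros(2, 5, 8)
out = enc(x)

print("Output shape:", tuple(out.shape))
print("Trainable parameters:", len(list(enc.parameters())))
```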


## 📚 Summary

✅ What we learned:
- `nn.Module` as the base class for custom models.
- `__init__()` for layers, `forward()` for computation.
- Parameters (trainable) vs Buffers (non-trainable).
- Training only needs `model.parameters()` for the optimizer and `model(x)` for the forward pass.

🚀 Next Lesson: **Convolutional Neural Networks (CNN)** — applying nn.Module to images.
