#  ⭐**`torch.nn`** **module in PyTorch**

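The **`torch.nn`** module is PyTorch's **neural network toolkit**. It provides the building blocks for defining neural networks: layers, activation functions, loss functions, and containers. Training itself also relies on autograd and on optimizers from `torch.optim`.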


## `torch.nn` Module

✅ Simplifies the creation of deep learning models
✅ Provides pre-built **layers**, **activation functions**, **losses**, **utilities**
✅ Clean, modular, and easy to debug


## 🔑 Key Components of `torch.nn`


### 1. 🧱 `nn.Module` – The Core Building Block

* Base class for **all custom layers and models**
* You subclass `nn.Module` to define your own model
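* You typically override:

  * `__init__(self)` – call `super().__init__()`, then define layers
  * `forward(self, x)` – define the forward pass (required; the base class raises `NotImplementedError` otherwise)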

📌 Think of it like a recipe for how data flows through your network.

```python
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 2)  # 10 input features -> 2 outputs

    def forward(self, x):
        return self.fc(x)
```
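
To make the recipe idea concrete, here is a minimal usage sketch. The batch size of 4 and the random input are assumptions chosen only for illustration; the class itself matches the example above.

```python
import torch
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 2)

    def forward(self, x):
        return self.fc(x)

model = MyModel()
batch = torch.rand(4, 10)      # hypothetical batch: 4 samples, 10 features each
output = model(batch)          # calling the module runs forward()

print(output.shape)            # torch.Size([4, 2])

# Layers assigned as attributes are registered automatically
param_names = [name for name, _ in model.named_parameters()]
print(param_names)             # ['fc.weight', 'fc.bias']
```

Calling `model(batch)` rather than `model.forward(batch)` is the usual convention, since the call also runs any registered hooks.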


### 2. 🧩 Common Layers (`torch.nn`)

* `nn.Linear(in, out)` ➡️ Fully connected layer
* `nn.Conv2d(in_channels, out_channels, kernel_size)` ➡️ Convolution layer
* `nn.LSTM(input_size, hidden_size)` ➡️ Recurrent layer
* `nn.Embedding(num_embeddings, embedding_dim)` ➡️ Used in NLP
* `nn.BatchNorm2d(num_features)` ➡️ Normalizes activations
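
A quick way to get a feel for these layers is to check the output shape each one produces. The sketch below uses arbitrary sizes (all of them assumptions picked for illustration) and passes random tensors through each layer.

```python
import torch
import torch.nn as nn

linear = nn.Linear(8, 4)
conv = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=3)
lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
embedding = nn.Embedding(num_embeddings=100, embedding_dim=8)
batch_norm = nn.BatchNorm2d(num_features=6)

linear_out = linear(torch.rand(2, 8))                 # (batch, out_features)
conv_out = conv(torch.rand(2, 3, 10, 10))             # no padding: 10 - 3 + 1 = 8
bn_out = batch_norm(conv_out)                         # shape is unchanged
lstm_out, (h_n, c_n) = lstm(torch.rand(2, 5, 8))      # (batch, seq_len, hidden_size)
emb_out = embedding(torch.tensor([[1, 2, 3], [4, 5, 6]]))  # one vector per token id

print(linear_out.shape)   # torch.Size([2, 4])
print(conv_out.shape)     # torch.Size([2, 6, 8, 8])
print(bn_out.shape)       # torch.Size([2, 6, 8, 8])
print(lstm_out.shape)     # torch.Size([2, 5, 16])
print(emb_out.shape)      # torch.Size([2, 3, 8])
```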


### 3. ⚡ Activation Functions

Used to introduce **non-linearity**, which lets the model learn complex patterns:

* `nn.ReLU()`
* `nn.Sigmoid()`
* `nn.Tanh()`
* `nn.LeakyReLU()`
* `nn.Softmax(dim=1)`

💡 Most activations are also available in `torch.nn.functional` (see below).
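
As a small illustration, the activations can be applied to a hand-picked tensor so the effect is easy to read off. The input values are an assumption chosen to include a negative number, zero, and a positive number.

```python
import torch
import torch.nn as nn

x = torch.tensor([[-2.0, 0.0, 3.0]])

relu_out = nn.ReLU()(x)                              # negatives become 0
leaky_out = nn.LeakyReLU(negative_slope=0.01)(x)     # negatives are scaled by 0.01
sigmoid_out = nn.Sigmoid()(x)                        # values squashed into (0, 1)
tanh_out = nn.Tanh()(x)                              # values squashed into (-1, 1)
softmax_out = nn.Softmax(dim=1)(x)                   # each row sums to 1

print(relu_out)                  # tensor([[0., 0., 3.]])
print(softmax_out.sum(dim=1))    # approximately tensor([1.])
```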


### 4. 🎯 Loss Functions

Loss functions measure the error of predictions:

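* `nn.CrossEntropyLoss()` ➡️ Multi-class classification (expects raw logits; internally applies `LogSoftmax + NLLLoss`)
* `nn.MSELoss()` ➡️ Regression
* `nn.NLLLoss()` ➡️ Negative Log Likelihood (expects log-probabilities)
* `nn.BCELoss()` ➡️ Binary classification (expects probabilities in [0, 1], e.g. after a sigmoid)
* `nn.HingeEmbeddingLoss()`, `nn.SmoothL1Loss()` ➡️ More specialized cases (similarity learning, outlier-robust regression)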
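
The relationship between `CrossEntropyLoss`, `LogSoftmax`, and `NLLLoss` can be checked numerically. This is a minimal sketch on random logits; the tensor sizes, seed, and labels are assumptions made for the example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(4, 3)               # raw scores: 4 samples, 3 classes
targets = torch.tensor([0, 2, 1, 2])     # class indices, not one-hot vectors

ce = nn.CrossEntropyLoss()(logits, targets)
nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), targets)
print(torch.allclose(ce, nll))           # True

# BCELoss expects probabilities, so apply a sigmoid first
probs = torch.sigmoid(torch.randn(4, 1))
labels = torch.tensor([[1.0], [0.0], [1.0], [0.0]])
bce = nn.BCELoss()(probs, labels)
print(bce.item() > 0)                    # True
```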


### 5. 📦 Container Modules

Used to **wrap multiple layers** together:

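* `nn.Sequential(layers...)` ➡️ Runs layers in order
* `nn.ModuleList([layer1, layer2])` ➡️ Stores layers in a list and registers their parameters (a plain Python list would not)
* `nn.ModuleDict({"layer1": layer1, "layer2": layer2})` ➡️ Stores layers in a dict, with the same registration behavior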
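
The three containers can be sketched together as follows. The `Tower` class and its `blocks` and `heads` attributes are hypothetical names invented for this example, not part of PyTorch.

```python
import torch
import torch.nn as nn

# Sequential: layers run in the order they are listed
stack = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
seq_out = stack(torch.rand(3, 4))

class Tower(nn.Module):  # hypothetical model for illustration
    def __init__(self):
        super().__init__()
        # ModuleList: iterate over layers yourself in forward()
        self.blocks = nn.ModuleList([nn.Linear(4, 4) for _ in range(3)])
        # ModuleDict: pick a layer by name at call time
        self.heads = nn.ModuleDict({"cls": nn.Linear(4, 2), "reg": nn.Linear(4, 1)})

    def forward(self, x, head):
        for block in self.blocks:
            x = torch.relu(block(x))
        return self.heads[head](x)

tower = Tower()
cls_out = tower(torch.rand(3, 4), "cls")
reg_out = tower(torch.rand(3, 4), "reg")

# All parameters inside the containers are registered on the model
n_params = sum(p.numel() for p in tower.parameters())
print(n_params)  # 3 * (16 + 4) + (8 + 2) + (4 + 1) = 75
```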


### 6. 🎮 `torch.nn.functional` (a.k.a. `F`)

* Contains **functions** (not classes) for activations, loss, etc.
* Use when you need **more control** or **lightweight operations** without storing parameters

✅ Examples:

```python
import torch.nn.functional as F

F.relu(x)
F.cross_entropy(output, target)
F.softmax(x, dim=1)
```

💡 Use `F.*` inside `forward()` for stateless operations (such as activations) that have no parameters to declare in `__init__`.
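
A short check, under the assumption of a random 2x5 input, shows that the module and functional forms compute the same thing. The only difference is where the parameters live.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(2, 5)

# Stateless op: the module and the function are interchangeable
same_relu = torch.equal(nn.ReLU()(x), F.relu(x))

# Stateful op: nn.Linear stores weight and bias, F.linear needs them passed in
layer = nn.Linear(5, 3)
same_linear = torch.allclose(layer(x), F.linear(x, layer.weight, layer.bias))

print(same_relu, same_linear)  # True True
```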

---

### 7. 🛡️ Regularization: Dropout & Normalization

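* `nn.Dropout(p=0.5)` ➡️ Randomly zeroes activations with probability `p` during training to reduce overfitting (disabled in `eval()` mode)
* `nn.BatchNorm1d/2d/3d` ➡️ Normalizes activations across the batch, which stabilizes training and often speeds up convergence
* `nn.LayerNorm()` ➡️ Normalizes across features of each sample; used often in NLP models like transformers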
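
Dropout behaves differently in training and evaluation mode, which is easy to overlook. The following sketch uses an all-ones input and a fixed seed (both assumptions for illustration) to make the difference visible.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.5)
x = torch.ones(1, 1000)

drop.train()
train_out = drop(x)   # roughly half the entries are zeroed; survivors are scaled by 1 / (1 - p) = 2

drop.eval()
eval_out = drop(x)    # dropout is a no-op in eval mode

zero_fraction = (train_out == 0).float().mean().item()
print(train_out.unique())          # tensor([0., 2.])
print(torch.equal(eval_out, x))    # True
```

This is why `model.eval()` is normally called before validation or inference, and `model.train()` before resuming training.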


### 8. 🔄 Weight Initialization (Optional but important)

You can customize weights manually using:

```python
def init_weights(m):
    if isinstance(m, nn.Linear):
        nn.init.xavier_uniform_(m.weight)
```

Then apply with:

```python
model.apply(init_weights)
```
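
Putting the two snippets together gives a runnable sketch. The small `nn.Sequential` model and the choice to zero the biases are assumptions added for the example; zero biases are common but not required.

```python
import torch
import torch.nn as nn

def init_weights(m):
    if isinstance(m, nn.Linear):
        nn.init.xavier_uniform_(m.weight)
        nn.init.zeros_(m.bias)   # assumption: start biases at zero

# Hypothetical model used only to demonstrate apply()
model = nn.Sequential(nn.Linear(10, 5), nn.ReLU(), nn.Linear(5, 1))
model.apply(init_weights)        # apply() visits every submodule recursively

# Xavier uniform samples from [-bound, bound] with bound = sqrt(6 / (fan_in + fan_out))
bound = (6 / (10 + 5)) ** 0.5
print(model[0].weight.abs().max().item() <= bound + 1e-6)  # True
print(model[0].bias)                                       # all zeros
```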


### 9. 🧪 Model Training Pattern (Quick Template)

```python
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F

# 1. Define Model
class MyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 128)
        self.fc2 = nn.Linear(128, 10)
    
    def forward(self, x):
        x = F.relu(self.fc1(x))
        return self.fc2(x)

# 2. Initialize
model = MyNet()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# 3. Training Loop (simplified)
for data, labels in train_loader:
    optimizer.zero_grad()
    outputs = model(data)
    loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()
```
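
The template above leaves `train_loader` undefined. One way to try the loop end to end is with synthetic data; the random tensors, batch size, and the `nn.Sequential` stand-in for `MyNet` below are all assumptions for illustration, not real training data.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic stand-in data: 64 samples, 784 features, 10 classes
data = torch.rand(64, 784)
labels = torch.randint(0, 10, (64,))
train_loader = DataLoader(TensorDataset(data, labels), batch_size=16, shuffle=True)

model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

num_batches = 0
for batch_data, batch_labels in train_loader:
    optimizer.zero_grad()                                # clear old gradients
    loss = criterion(model(batch_data), batch_labels)    # forward pass + loss
    loss.backward()                                      # compute gradients
    optimizer.step()                                     # update parameters
    num_batches += 1

print(num_batches)  # 4 batches of 16 samples
```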


### ✅ Summary Cheat Sheet

| Component             | Purpose                              |
| --------------------- | ------------------------------------ |
| `nn.Module`           | Base class for models                |
| `nn.Linear`, etc.     | Layers to build the model            |
| `nn.ReLU`, etc.       | Activation functions                 |
| `nn.CrossEntropyLoss` | Loss function for classification     |
| `nn.Sequential`       | Layer stack container                |
| `F.relu`, `F.softmax` | Functional API (no params stored)    |
| `nn.Dropout`          | Regularization to reduce overfitting |




In [1]:
# Install torchinfo (used later for model summaries)
!pip install torchinfo

Collecting torchinfo
  Downloading torchinfo-1.8.0-py3-none-any.whl.metadata (21 kB)
Downloading torchinfo-1.8.0-py3-none-any.whl (23 kB)
Installing collected packages: torchinfo
Successfully installed torchinfo-1.8.0


## **Model 1**

In [2]:
# Import necessary libraries from PyTorch
import torch
import torch.nn as nn

# Define a custom model class inheriting from nn.Module
class Model(nn.Module):
    def __init__(self, num_features):
        super().__init__()  # Essential to initialize the base nn.Module class so that all its internal machinery (like parameter registration) works correctly

        # Define a fully connected layer (Linear layer) that maps input features to a single output
        self.layer = nn.Linear(num_features, 1) # e.g. 5 inputs -> 1 output

        # Apply sigmoid activation to squash output between 0 and 1 (useful for binary classification)
        self.activation = nn.Sigmoid()

    # Define the forward pass that determines how input data flows through the model
    def forward(self, features):
        out = self.layer(features)         # Apply linear transformation
        out = self.activation(out)         # Apply sigmoid non-linearity
        return out

In [3]:
# Create dummy dataset (10 samples, each with 5 features)
features = torch.rand(10, 5)
print(features)
print(features.shape[1])

tensor([[0.2593, 0.6335, 0.0525, 0.0240, 0.7354],
        [0.5317, 0.6713, 0.6719, 0.4471, 0.4876],
        [0.4134, 0.0661, 0.7174, 0.1425, 0.7884],
        [0.6656, 0.1367, 0.9670, 0.0442, 0.3053],
        [0.0266, 0.2185, 0.6821, 0.5872, 0.4163],
        [0.5958, 0.3406, 0.5305, 0.7849, 0.3152],
        [0.0372, 0.1772, 0.1487, 0.5070, 0.3187],
        [0.5605, 0.9877, 0.9243, 0.2497, 0.7496],
        [0.1990, 0.0298, 0.6669, 0.4172, 0.3709],
        [0.1810, 0.5004, 0.7821, 0.6057, 0.7022]])
5


In [4]:
# Create Model
# model1 = Model(5) # a better approach is to use `features.shape[1]`
model1 = Model(features.shape[1])

In [5]:
# Call model for forward pass
# We pass in 10 rows, so we get 10 corresponding outputs
print(model1(features))        # Output of the model after sigmoid activation

tensor([[0.3802],
        [0.3929],
        [0.3981],
        [0.4450],
        [0.4323],
        [0.3800],
        [0.4162],
        [0.3890],
        [0.4348],
        [0.3961]], grad_fn=<SigmoidBackward0>)


In [6]:
print(model1.layer.weight)     # Print the randomly initialized (not yet trained) weights of the linear layer
print(model1.layer.bias)       # Print the randomly initialized (not yet trained) bias of the linear layer

Parameter containing:
tensor([[-0.2989, -0.0333,  0.2284, -0.2418, -0.3920]], requires_grad=True)
Parameter containing:
tensor([-0.1079], requires_grad=True)
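
What the model computes can be reproduced by hand: `nn.Linear` evaluates `x @ W.T + b`, and the sigmoid is applied on top. The sketch below rebuilds that with a fresh layer and random features (assumed stand-ins for `model1` and its input), so the printed numbers will differ from the outputs above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
features = torch.rand(10, 5)
layer = nn.Linear(5, 1)

# Output as the module computes it
module_out = torch.sigmoid(layer(features))

# Same computation written out: weight has shape (out_features, in_features) = (1, 5)
manual_out = torch.sigmoid(features @ layer.weight.T + layer.bias)

print(layer.weight.shape)                       # torch.Size([1, 5])
print(torch.allclose(module_out, manual_out))   # True
```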


![model-1](https://raw.githubusercontent.com/mohd-faizy/PyTorch-Essentials/refs/heads/main/PyTorch_x1/_img/04_1.jpeg)

**Model 1 Summary**

In [7]:
from torchinfo import summary

summary(model1, input_size=(10, 5))

Layer (type:depth-idx)                   Output Shape              Param #
Model                                    [10, 1]                   --
├─Linear: 1-1                            [10, 1]                   6
├─Sigmoid: 1-2                           [10, 1]                   --
Total params: 6
Trainable params: 6
Non-trainable params: 0
Total mult-adds (Units.MEGABYTES): 0.00
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.00
Estimated Total Size (MB): 0.00

## **Model 2**

In [8]:
# Import necessary libraries from PyTorch
import torch
import torch.nn as nn

# Define a custom model
class Model(nn.Module):
    def __init__(self, num_features):
        super().__init__()
        self.layer1 = nn.Linear(num_features, 3)
        self.relu = nn.ReLU()
        self.layer2 = nn.Linear(3, 1)
        self.sigmoid = nn.Sigmoid()

    # Define how input data flows through the layers
    def forward(self, features):
        out = self.layer1(features)
        out = self.relu(out)
        out = self.layer2(out)
        out = self.sigmoid(out)
        return out

# Create dummy input features (10 samples, each with 5 features)
features = torch.rand(10, 5)

# Instantiate the model with the number of input features
model2 = Model(features.shape[1])

In [9]:
# Perform a forward pass
print(model2(features))

tensor([[0.5436],
        [0.5593],
        [0.5552],
        [0.5428],
        [0.5418],
        [0.5520],
        [0.5438],
        [0.5540],
        [0.5513],
        [0.5418]], grad_fn=<SigmoidBackward0>)


In [10]:
# Show weights and bias of the first layer
print(model2.layer1.weight)  # shape (3, 5): (out_features, in_features)
print(model2.layer1.bias)    # shape (3,)

Parameter containing:
tensor([[-0.2688,  0.0239,  0.4432,  0.2677,  0.0229],
        [-0.2138,  0.2217,  0.2949,  0.0895, -0.1404],
        [-0.1854,  0.2483, -0.2754, -0.3980, -0.1566]], requires_grad=True)
Parameter containing:
tensor([ 0.1358,  0.3508, -0.1073], requires_grad=True)


In [11]:
# Show weights and bias of the second layer
print(model2.layer2.weight)  # shape (1, 3): (out_features, in_features)
print(model2.layer2.bias)    # shape (1,)

Parameter containing:
tensor([[ 0.1254, -0.2584,  0.3098]], requires_grad=True)
Parameter containing:
tensor([0.2839], requires_grad=True)


**Model 2 summary**

![model-2](https://raw.githubusercontent.com/mohd-faizy/PyTorch-Essentials/refs/heads/main/PyTorch_x1/_img/04_2.jpeg)

In [12]:
from torchinfo import summary

summary(model2, input_size=(10, 5))

Layer (type:depth-idx)                   Output Shape              Param #
Model                                    [10, 1]                   --
├─Linear: 1-1                            [10, 3]                   18
├─ReLU: 1-2                              [10, 3]                   --
├─Linear: 1-3                            [10, 1]                   4
├─Sigmoid: 1-4                           [10, 1]                   --
Total params: 22
Trainable params: 22
Non-trainable params: 0
Total mult-adds (Units.MEGABYTES): 0.00
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.00
Estimated Total Size (MB): 0.00
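
The parameter counts reported by `torchinfo` (6 for Model 1, 22 for Model 2) can be verified by hand. The `count_params` helper and the two `nn.Sequential` stand-ins below are assumptions written for this check; they mirror the layer sizes of the models above.

```python
import torch.nn as nn

def count_params(module):
    """Count trainable parameters (hypothetical helper for this example)."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)

# Same layer sizes as Model 1 and Model 2
model1_like = nn.Sequential(nn.Linear(5, 1), nn.Sigmoid())
model2_like = nn.Sequential(nn.Linear(5, 3), nn.ReLU(), nn.Linear(3, 1), nn.Sigmoid())

print(count_params(model1_like))  # 5*1 + 1 = 6
print(count_params(model2_like))  # (5*3 + 3) + (3*1 + 1) = 22
```

Each `nn.Linear(in, out)` contributes `in * out` weights plus `out` biases; the activations contribute none.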

### ✅ **Using Sequential Container**

>`nn.Sequential` lets you stack multiple layers together in a modular and readable way.

In [17]:
# Import necessary libraries from PyTorch
import torch
import torch.nn as nn

# Define a custom model class inheriting from nn.Module
class Model(nn.Module):
    def __init__(self, num_features):
        super().__init__()  # Initializes parent nn.Module class to enable parameter registration and other core functionality

        # Define a simple feedforward neural network using nn.Sequential
        # This block is assigned to self.network and includes:
        # - Linear layer to reduce input features to 3 hidden units
        # - ReLU activation for non-linearity
        # - Linear layer mapping 3 hidden units to 1 output
        # - Sigmoid activation to squash output between 0 and 1 (suitable for binary classification)
        self.network = nn.Sequential(
            nn.Linear(num_features, 3),  # num_features->5, 3 hidden units [not layers]
            nn.ReLU(),
            nn.Linear(3, 1),
            nn.Sigmoid()
        )

    # Define how input data flows through the layers
    def forward(self, features):
        return self.network(features)

# Create dummy input features (10 samples, each with 5 features)
features = torch.rand(10, 5)

# Instantiate the model with the number of input features
model2 = Model(features.shape[1])

# Perform a forward pass through the model
print(model2(features))  # Output of the model (after passing through all layers)

# Show weights and bias of the first linear layer inside the Sequential block
print(model2.network[0].weight)  # Weights of first Linear layer
print(model2.network[0].bias)    # Bias of first Linear layer

tensor([[0.4292],
        [0.4209],
        [0.4323],
        [0.4353],
        [0.4324],
        [0.4338],
        [0.4386],
        [0.4246],
        [0.4333],
        [0.4228]], grad_fn=<SigmoidBackward0>)
Parameter containing:
tensor([[-0.4003, -0.1846, -0.0175, -0.4442,  0.2974],
        [-0.0927,  0.1105,  0.0322,  0.1539, -0.1415],
        [ 0.0625, -0.4223, -0.2045,  0.4421, -0.1345]], requires_grad=True)
Parameter containing:
tensor([ 0.1412,  0.2851, -0.3888], requires_grad=True)
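
Layers inside an `nn.Sequential` are addressed by position, which is why the code above uses `network[0]`. A brief sketch of how indexing, slicing, and parameter names work, assuming the same four-layer stack:

```python
import torch.nn as nn

network = nn.Sequential(
    nn.Linear(5, 3),
    nn.ReLU(),
    nn.Linear(3, 1),
    nn.Sigmoid(),
)

# Parameter names are prefixed with the layer's position in the stack
names = [name for name, _ in network.named_parameters()]
print(names)   # ['0.weight', '0.bias', '2.weight', '2.bias']

# Slicing returns another Sequential, e.g. just the hidden part
first_two = network[:2]
print(type(first_two).__name__, len(first_two))  # Sequential 2
```

Positions 1 and 3 do not appear in the names because `ReLU` and `Sigmoid` have no parameters.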


## **Optimizing**  [`03_PyTorch_Training_Pipeline`](https://drive.google.com/file/d/1Thw9YVUJPKRrzWPWdwGVQJrTbfnE3xMn/view?usp=sharing) **code**

In [27]:
# === Import Required Libraries ===
import numpy as np
import pandas as pd

import torch
import torch.nn as nn

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder

# === Constants ===
DATA_URL = "https://raw.githubusercontent.com/mohd-faizy/PyTorch-Essentials/main/PyTorch_x1/_dataset/Breast-Cancer-Detection.csv"
DATA_FILE = "Breast-Cancer-Detection.csv"

TEST_SIZE = 0.2
SEED = 42
LEARNING_RATE = 0.1
EPOCHS = 100

# === Download Dataset (skipped if the file already exists) ===
# Uses only the standard library, so it works in notebooks and .py scripts alike
import os
if not os.path.exists(DATA_FILE):
    import urllib.request
    urllib.request.urlretrieve(DATA_URL, DATA_FILE)

# === Function: Data Preprocessing Pipeline ===
def load_and_preprocess_data(file_path):
    df = pd.read_csv(file_path)
    df.drop(columns=['id', 'Unnamed: 32'], inplace=True)

    X = df.iloc[:, 1:].values
    y = df.iloc[:, 0].values

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=TEST_SIZE, random_state=SEED)

    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)

    encoder = LabelEncoder()
    y_train = encoder.fit_transform(y_train)
    y_test = encoder.transform(y_test)

    X_train_tensor = torch.tensor(X_train, dtype=torch.float32)
    X_test_tensor = torch.tensor(X_test, dtype=torch.float32)
    y_train_tensor = torch.tensor(y_train, dtype=torch.float32).unsqueeze(1)
    y_test_tensor = torch.tensor(y_test, dtype=torch.float32).unsqueeze(1)

    return X_train_tensor, X_test_tensor, y_train_tensor, y_test_tensor

# === Load and Prepare Data ===
X_train_tensor, X_test_tensor, y_train_tensor, y_test_tensor = load_and_preprocess_data(DATA_FILE)

# === Neural Network Model ===
class MySimpleNN(nn.Module):
    def __init__(self, num_features):
        super().__init__()
        self.layer1 = nn.Linear(num_features, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, features):
        out = self.layer1(features)
        out = self.sigmoid(out)
        return out

    def loss_function(self, y_pred, y):
        epsilon = 1e-7
        y_pred = torch.clamp(y_pred, epsilon, 1 - epsilon)
        loss = -(y * torch.log(y_pred) + (1 - y) * torch.log(1 - y_pred)).mean()
        return loss

# === Initialize Model ===
model = MySimpleNN(X_train_tensor.shape[1])

# === Training Loop ===
for epoch in range(EPOCHS):
    y_pred = model(X_train_tensor)
    loss = model.loss_function(y_pred, y_train_tensor)

    loss.backward()

    with torch.no_grad():
        model.layer1.weight -= LEARNING_RATE * model.layer1.weight.grad
        model.layer1.bias -= LEARNING_RATE * model.layer1.bias.grad

    model.layer1.weight.grad.zero_()
    model.layer1.bias.grad.zero_()

    if (epoch + 1) % 10 == 0:
        print(f"Epoch {epoch + 1:03d} - Loss: {loss.item():.4f}")

# === Model Evaluation ===
with torch.no_grad():
    y_pred_test = model(X_test_tensor)
    y_pred_labels = (y_pred_test > 0.5).float()  # 0.5 is the decision threshold; a higher value (e.g. 0.9) predicts the positive class only when the model is more confident

    accuracy = (y_pred_labels == y_test_tensor).float().mean()
    print(f"\nTest Accuracy: {accuracy.item():.4f}")

Epoch 010 - Loss: 0.2559
Epoch 020 - Loss: 0.1920
Epoch 030 - Loss: 0.1630
Epoch 040 - Loss: 0.1459
Epoch 050 - Loss: 0.1344
Epoch 060 - Loss: 0.1261
Epoch 070 - Loss: 0.1196
Epoch 080 - Loss: 0.1145
Epoch 090 - Loss: 0.1102
Epoch 100 - Loss: 0.1067

Test Accuracy: 0.9737
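
The hand-written `loss_function` above is binary cross-entropy, the same quantity `nn.BCELoss` computes. A minimal numerical check on random predictions and labels (assumed values, unrelated to the breast cancer data) looks like this:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
y_pred = torch.sigmoid(torch.randn(8, 1))        # probabilities in (0, 1)
y_true = torch.randint(0, 2, (8, 1)).float()     # binary labels

def manual_bce(y_pred, y, epsilon=1e-7):
    # Clamp to avoid log(0), then average the per-sample cross-entropy
    y_pred = torch.clamp(y_pred, epsilon, 1 - epsilon)
    return -(y * torch.log(y_pred) + (1 - y) * torch.log(1 - y_pred)).mean()

manual = manual_bce(y_pred, y_true)
builtin = nn.BCELoss()(y_pred, y_true)

print(torch.allclose(manual, builtin, atol=1e-6))  # True
```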


In [30]:
"""
Optimised code
"""

# === Import Required Libraries ===
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder

# === Constants ===
DATA_URL = "https://raw.githubusercontent.com/mohd-faizy/PyTorch-Essentials/refs/heads/main/PyTorch_x1/_dataset/Breast-Cancer-Detection.csv"
DATA_FILE = "Breast-Cancer-Detection.csv"

TEST_SIZE = 0.2
SEED = 42
LEARNING_RATE = 0.1
EPOCHS = 100

# === Download Dataset ===
!wget -q -O {DATA_FILE} {DATA_URL}

# === Data Preprocessing Function ===
def load_and_preprocess_data(file_path):
    """
    Loads and preprocesses the breast cancer dataset.

    Steps:
    - Drops irrelevant columns
    - Encodes labels
    - Standardizes features
    - Converts data to PyTorch tensors
    """
    df = pd.read_csv(file_path)
    df.drop(columns=['id', 'Unnamed: 32'], inplace=True)

    X = df.iloc[:, 1:].values
    y = df.iloc[:, 0].values

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=TEST_SIZE, random_state=SEED)

    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)

    encoder = LabelEncoder()
    y_train = encoder.fit_transform(y_train)
    y_test = encoder.transform(y_test)

    # Convert to float32 PyTorch tensors
    X_train_tensor = torch.tensor(X_train, dtype=torch.float32)
    X_test_tensor = torch.tensor(X_test, dtype=torch.float32)
    y_train_tensor = torch.tensor(y_train, dtype=torch.float32).unsqueeze(1)
    y_test_tensor = torch.tensor(y_test, dtype=torch.float32).unsqueeze(1)

    return X_train_tensor, X_test_tensor, y_train_tensor, y_test_tensor

# === Neural Network Definition ===
class BreastCancerClassifier(nn.Module):
    """
    A simple feedforward neural network for binary classification.
    """
    def __init__(self, num_features):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(num_features, 1),
            nn.Sigmoid()
        )

    def forward(self, x):
        return self.model(x)

# === Load and Prepare Data ===
X_train_tensor, X_test_tensor, y_train_tensor, y_test_tensor = load_and_preprocess_data(DATA_FILE)

# === Model Initialization and Loss Function ===
model = BreastCancerClassifier(X_train_tensor.shape[1])
loss_function = nn.BCELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=LEARNING_RATE)  # <--- Replaces manual SGD

# === Training Loop ===
for epoch in range(EPOCHS):

    # Forward pass
    y_pred = model(X_train_tensor)

    # Compute the loss (targets already have shape [N, 1] from preprocessing)
    loss = loss_function(y_pred, y_train_tensor)

    # clear gradients
    optimizer.zero_grad()

    # Backward pass
    loss.backward()

    # Parameters update
    optimizer.step()

    # Print progress every 10 epochs, starting with the first (epoch is 0-based)
    if epoch % 10 == 0:
        print(f"Epoch {epoch + 1} | Loss: {loss.item():.4f}")

# === Evaluation ===
with torch.no_grad():
    test_outputs = model(X_test_tensor)
    predictions = (test_outputs > 0.5).float()
    accuracy = (predictions == y_test_tensor).float().mean()
    print(f"\nTest Accuracy: {accuracy.item():.4f}")

Epoch 1 | Loss: 0.6711
Epoch 11 | Loss: 0.2466
Epoch 21 | Loss: 0.1866
Epoch 31 | Loss: 0.1596
Epoch 41 | Loss: 0.1437
Epoch 51 | Loss: 0.1329
Epoch 61 | Loss: 0.1250
Epoch 71 | Loss: 0.1188
Epoch 81 | Loss: 0.1139
Epoch 91 | Loss: 0.1098

Test Accuracy: 0.9912
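
To see that `torch.optim.SGD` performs the same update as the manual loop in the earlier version, one step of each can be compared on identical copies of a model. The tiny model, random data, and seed below are assumptions for the comparison; plain SGD without momentum or weight decay is assumed as well.

```python
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.rand(6, 3)
y = torch.randint(0, 2, (6, 1)).float()
lr = 0.1

model_a = nn.Sequential(nn.Linear(3, 1), nn.Sigmoid())
model_b = copy.deepcopy(model_a)          # identical starting weights
loss_fn = nn.BCELoss()

# Manual update: param <- param - lr * grad
loss_fn(model_a(x), y).backward()
with torch.no_grad():
    for p in model_a.parameters():
        p -= lr * p.grad
        p.grad.zero_()

# Same step through the optimizer
optimizer = torch.optim.SGD(model_b.parameters(), lr=lr)
optimizer.zero_grad()
loss_fn(model_b(x), y).backward()
optimizer.step()

match = all(
    torch.allclose(pa, pb)
    for pa, pb in zip(model_a.parameters(), model_b.parameters())
)
print(match)  # True
```

The optimizer version scales to any number of layers without naming each parameter, which is the main practical gain over the manual loop.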
