![alt text](pytorch_seo.avif)

## PyTorch for Deep Learning: A Comprehensive Lecture 

### What is PyTorch?

PyTorch is an open-source machine learning library used mainly for applications such as computer vision and natural language processing. It was originally developed by Facebook's AI Research lab (FAIR, now part of Meta AI) and is known for its:

1. Pythonic Nature: It feels very natural to Python developers, integrating seamlessly with the Python ecosystem.

2. Dynamic Computation Graph: Unlike frameworks that rely on static graphs, PyTorch uses a dynamic computation graph. The graph is built on the fly as operations are performed, which makes it easier to debug and to handle variable-length inputs.

3. Ease of Use: It's often praised for its simplicity and intuitive API, making it easier to learn and experiment with.

4. Strong GPU Acceleration: It leverages the power of GPUs for efficient computation, crucial for training large deep learning models.
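
To make the GPU point concrete, here is a minimal sketch that picks a device at runtime and runs an operation on it. It assumes only a standard PyTorch install and falls back to the CPU when no CUDA device is visible; the variable names are illustrative.

```python
import torch

# Pick the GPU when one is visible to PyTorch, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Tensors are created on the CPU by default; .to(device) gives a tensor on the target device.
x = torch.ones(2, 3)
x_dev = x.to(device)

# An operation runs on whichever device holds its inputs.
y_dev = x_dev * 2
print(f"Device in use: {y_dev.device}")
```

The same `.to(device)` call works on an `nn.Module`, so a single `device` variable is usually enough to keep a script portable between CPU and GPU machines.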



![alt text](content-anchor-GPU-diagram.png)![alt text](Parallel-computing-1-1.jpg)

## Any example of Parallel Computing?

### Why Choose PyTorch?


- Flexibility and Debugging: The dynamic graph allows for easier debugging using standard Python debugging tools. You can inspect values at any point in your network.

- Research-Friendly: Its flexibility makes it a favorite among researchers for rapid prototyping and experimentation with novel architectures.

- Growing Community and Ecosystem: PyTorch has a vibrant and rapidly growing community, with extensive documentation, tutorials, and pre-trained models.

## Core Concepts of PyTorch


PyTorch operates on Tensors. Think of a Tensor as a multi-dimensional array, very similar to NumPy arrays, but with the added capability to run on GPUs and track gradients for automatic differentiation.

Scalars: 0-D Tensor (single number)

Vectors: 1-D Tensor (list of numbers)

Matrices: 2-D Tensor (table of numbers)

Higher-dimensional Tensors: For images (PyTorch's usual layout is channels x height x width), video (an additional frames dimension), etc.
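
The dimensionality of each kind of Tensor listed above can be checked with `.ndim` and `.shape`. This is a small sketch; the values and the 8-image batch are made up for illustration.

```python
import torch

scalar = torch.tensor(7.0)                        # 0-D: a single number
vector = torch.tensor([1.0, 2.0, 3.0])            # 1-D: a list of numbers
matrix = torch.tensor([[1.0, 2.0], [3.0, 4.0]])   # 2-D: a table of numbers
image_batch = torch.zeros(8, 3, 32, 32)           # 4-D: hypothetical batch of 8 RGB 32x32 images

for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("image_batch", image_batch)]:
    print(f"{name}: ndim={t.ndim}, shape={tuple(t.shape)}")
```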

### Key Tensor Operations:
You can perform various operations on Tensors, just like NumPy arrays:

Arithmetic operations (+, -, *, /)

Matrix multiplication (torch.matmul())

Reshaping (.view(), .reshape())

Indexing and Slicing

### Tensor Creation and Operations

In [None]:
import numpy as np

# Create a NumPy array
np_array = np.array([[1, 2], [3, 4]])
print("NumPy Array:\n", np_array)
print("NumPy Array Type:", type(np_array))
print("NumPy Array Shape:", np_array.shape)

print("-" * 30)

NumPy Array:
 [[1 2]
 [3 4]]
NumPy Array Type: <class 'numpy.ndarray'>
NumPy Array Shape: (2, 2)
------------------------------


## Importing PyTorch

In [None]:
import torch
torch.__version__

## Introduction to tensors 

Tensors are the fundamental building block of machine learning: they represent data in a numerical form.

For example, you could represent an image as a tensor with shape `[3, 224, 224]` which would mean `[colour_channels, height, width]`, as in the image has `3` colour channels (red, green, blue), a height of `224` pixels and a width of `224` pixels.



![alt text](00-tensor-shape-example-of-image.png)

The tensor would have three dimensions, one for `colour_channels`, `height` and `width`.
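
Many image libraries hand back arrays in height x width x channels order, so a common first step is to reorder the dimensions into the `[colour_channels, height, width]` layout described above. The sketch below uses a random tensor as a stand-in for a real image; `image_hwc` and `image_chw` are illustrative names.

```python
import torch

# A hypothetical image in height x width x channels layout.
image_hwc = torch.rand(224, 224, 3)

# permute reorders the dimensions to channels x height x width without copying data.
image_chw = image_hwc.permute(2, 0, 1)

print(image_chw.shape)  # torch.Size([3, 224, 224])
print(image_chw.ndim)   # 3
```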

### Creating tensors

In [None]:
# Create a 5x3 matrix, uninitialized
x_empty = torch.empty(5, 3)
print("Uninitialized Tensor (5x3):\n", x_empty)
print("-" * 30)


Uninitialized Tensor (5x3):
 tensor([[6.1663e-33, 1.7698e-42, 0.0000e+00],
        [0.0000e+00, 0.0000e+00, 0.0000e+00],
        [0.0000e+00, 0.0000e+00, 0.0000e+00],
        [0.0000e+00, 0.0000e+00, 0.0000e+00],
        [0.0000e+00, 0.0000e+00, 0.0000e+00]])
------------------------------


In [3]:
# Create a randomly initialized matrix
x_rand = torch.rand(5, 3)
print("Randomly Initialized Tensor (5x3):\n", x_rand)
print("-" * 30)

Randomly Initialized Tensor (5x3):
 tensor([[0.4775, 0.9785, 0.6593],
        [0.7840, 0.5105, 0.0955],
        [0.5920, 0.6240, 0.8588],
        [0.6076, 0.3772, 0.2686],
        [0.8403, 0.2257, 0.5921]])
------------------------------


In [4]:
# Create a tensor directly from data
x_data = torch.tensor([[1, 2], [3, 4]], dtype=torch.float32) # Specify dtype for consistency
print("Tensor from data:\n", x_data)
print("-" * 30)

Tensor from data:
 tensor([[1., 2.],
        [3., 4.]])
------------------------------
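
Tensors can also be built from NumPy arrays. The sketch below contrasts two routes, under the assumption that the array lives in CPU memory: `torch.from_numpy` shares memory with the array, while `torch.tensor` makes a copy.

```python
import numpy as np
import torch

np_array = np.array([[1, 2], [3, 4]])

# torch.from_numpy shares memory with the source array.
shared = torch.from_numpy(np_array)
# torch.tensor copies the data into a new tensor.
copied = torch.tensor(np_array)

# Modify the NumPy array in place.
np_array[0, 0] = 100

print("shared[0, 0]:", shared[0, 0].item())  # reflects the change: 100
print("copied[0, 0]:", copied[0, 0].item())  # still 1

# Going the other way: a CPU tensor can be viewed as a NumPy array.
back_to_numpy = copied.numpy()
print(type(back_to_numpy))
```

Sharing memory avoids a copy, but it also means in-place edits on one side show up on the other, so `torch.tensor` is the safer choice when the array will be reused.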


In [5]:
# Operations
y_rand = torch.rand(5, 3)
print("Tensor y_rand (5x3):\n", y_rand)
print("-" * 30)


Tensor y_rand (5x3):
 tensor([[0.7757, 0.8902, 0.1962],
        [0.3580, 0.3460, 0.1082],
        [0.2560, 0.4202, 0.6879],
        [0.5671, 0.1826, 0.9218],
        [0.1693, 0.4697, 0.4104]])
------------------------------


In [6]:
print("Element-wise addition (x_rand + y_rand):\n", x_rand + y_rand)
print("-" * 30)

Element-wise addition (x_rand + y_rand):
 tensor([[1.2533, 1.8687, 0.8555],
        [1.1420, 0.8566, 0.2037],
        [0.8479, 1.0443, 1.5467],
        [1.1747, 0.5598, 1.1905],
        [1.0096, 0.6954, 1.0025]])
------------------------------


In [7]:
# Matrix multiplication requires compatible dimensions
mat1 = torch.rand(2, 3)
mat2 = torch.rand(3, 2)
print("Matrix 1 (2x3):\n", mat1)
print("Matrix 2 (3x2):\n", mat2)
print("Matrix multiplication (mat1 @ mat2):\n", torch.matmul(mat1, mat2))
print("-" * 30)


Matrix 1 (2x3):
 tensor([[0.2281, 0.2147, 0.7368],
        [0.6948, 0.5536, 0.4486]])
Matrix 2 (3x2):
 tensor([[0.5649, 0.0846],
        [0.1272, 0.9118],
        [0.3729, 0.9616]])
Matrix multiplication (mat1 @ mat2):
 tensor([[0.4308, 0.9235],
        [0.6301, 0.9949]])
------------------------------


In [8]:
# Reshaping
z = torch.arange(9).reshape(3, 3) # Create a 3x3 tensor with values 0-8
print("Original Tensor z (3x3):\n", z)
print("Reshaped z to (9,):\n", z.view(9)) # or z.reshape(9)
print("-" * 30)

Original Tensor z (3x3):
 tensor([[0, 1, 2],
        [3, 4, 5],
        [6, 7, 8]])
Reshaped z to (9,):
 tensor([0, 1, 2, 3, 4, 5, 6, 7, 8])
------------------------------
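
`.view()` and `.reshape()` give the same result in the example above, but they differ on non-contiguous tensors. The sketch below shows this with a transposed tensor: `.reshape()` copies when it has to, while `.view()` raises an error.

```python
import torch

z = torch.arange(6).reshape(2, 3)
zt = z.t()  # transpose: same storage, but a non-contiguous layout

print("Contiguous:", zt.is_contiguous())  # False

# reshape succeeds, copying the data if needed.
flat = zt.reshape(6)
print("reshape result:", flat)

# view cannot reinterpret this memory layout and raises a RuntimeError.
try:
    zt.view(6)
    view_failed = False
except RuntimeError as err:
    view_failed = True
    print("view failed:", type(err).__name__)
```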


In [9]:
# Indexing and Slicing
print("First row of x_rand:", x_rand[0])
print("Element at (1, 2) of x_rand:", x_rand[1, 2])
print("First two rows, all columns of x_rand:\n", x_rand[:2, :])


First row of x_rand: tensor([0.4775, 0.9785, 0.6593])
Element at (1, 2) of x_rand: tensor(0.0955)
First two rows, all columns of x_rand:
 tensor([[0.4775, 0.9785, 0.6593],
        [0.7840, 0.5105, 0.0955]])


### Exercise 1: Tensor Manipulation

1. Create a 4x4 tensor filled with ones.

2. Multiply this tensor by 5.

3. Create another 4x4 tensor with random values.

4. Perform element-wise multiplication between the two tensors.

5. Reshape the resulting tensor into a 1D tensor.

6. Print the shape of the final tensor.
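
One possible solution sketch for Exercise 1 follows. The variable names are arbitrary, and `reshape(-1)` is just one of several ways to flatten a tensor.

```python
import torch

ones = torch.ones(4, 4)          # 1. 4x4 tensor of ones
scaled = ones * 5                # 2. multiply by 5
rand = torch.rand(4, 4)          # 3. 4x4 tensor of random values
product = scaled * rand          # 4. element-wise multiplication
flat = product.reshape(-1)       # 5. flatten to 1-D
print("Final shape:", flat.shape)  # 6. torch.Size([16])
```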

## 2. Autograd: Automatic Differentiation
This is where PyTorch truly shines for deep learning. The torch.autograd package provides automatic differentiation for all operations on Tensors. This is crucial for training neural networks, as it allows us to compute gradients for backpropagation efficiently.

``requires_grad=True:`` If you set **requires_grad=True** for a Tensor, PyTorch tracks the operations performed on it. When you finish your computation and call **.backward()** on the resulting scalar, the gradients of that scalar with respect to every such Tensor are **computed automatically**.

``grad attribute:`` The gradients are accumulated into the **.grad** attribute of the Tensor.

This automatic differentiation is the backbone of how neural networks learn by adjusting their weights based on the loss.
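
Because gradients are accumulated into `.grad` rather than overwritten, calling `.backward()` twice without resetting adds the two results together. This minimal sketch uses $y = x^2$ at $x = 2$, where $dy/dx = 4$.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)

# First backward pass: dy/dx = 2*x = 4
(x ** 2).backward()
first = x.grad.item()

# Second backward pass without resetting: the new 4 is added to the old 4
(x ** 2).backward()
second = x.grad.item()

# Reset the gradient in place, then run one more backward pass
x.grad.zero_()
(x ** 2).backward()
third = x.grad.item()

print(first, second, third)  # 4.0 8.0 4.0
```

This accumulation is the reason training loops clear gradients before each backward pass.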

### Autograd in Action



In [21]:
# Create a tensor and tell PyTorch to track its gradients
x = torch.tensor(2.0, requires_grad=True)
print(f"Initial x: {x}, requires_grad: {x.requires_grad}")

Initial x: 2.0, requires_grad: True


In [22]:
# A simple computation graph: y depends on x
y = x**2 + 3*x + 5
print(f"Computed y: {y}")


Computed y: 15.0


In [23]:
# Compute gradients
# y must be a scalar for .backward() without specifying a gradient argument
y.backward() # This computes dy/dx

In [24]:
# Access the gradient
# dy/dx = 2*x + 3
# For x = 2.0, dy/dx = 2*2.0 + 3 = 7.0
print(f"Gradient of y with respect to x (x.grad): {x.grad}")

Gradient of y with respect to x (x.grad): 7.0


In [25]:
# Another example with multiple variables
a = torch.tensor(3.0, requires_grad=True)
b = torch.tensor(4.0, requires_grad=True)
c = a * b
d = c + a**2
d.backward() # Computes gradients for a and b

In [26]:
# d = a*b + a^2
# d(d)/d(a) = b + 2*a = 4 + 2*3 = 10
# d(d)/d(b) = a = 3
print(f"\nGradient of d with respect to a (a.grad): {a.grad}")
print(f"Gradient of d with respect to b (b.grad): {b.grad}")


Gradient of d with respect to a (a.grad): 10.0
Gradient of d with respect to b (b.grad): 3.0


In [27]:
# Disabling gradient tracking for a block of operations
# Use torch.no_grad() when you don't want operations recorded in the computation graph
z = torch.tensor(5.0, requires_grad=True)
with torch.no_grad(): # Operations inside this context manager will not track gradients
    w = z * 2
print(f"\nTensor w (created with no_grad): {w}, requires_grad: {w.requires_grad}")



Tensor w (created with no_grad): 10.0, requires_grad: False


In [28]:
# If you want to detach an existing tensor
p = torch.tensor(10.0, requires_grad=True)
q = p.detach() # q will be a new tensor that does not require gradients
print(f"Tensor q (detached from p): {q}, requires_grad: {q.requires_grad}")

Tensor q (detached from p): 10.0, requires_grad: False


### Exercise 2: Gradient Calculation

1. Define two tensors, $u$ and $v$, both with requires_grad=True. Assign them initial scalar values (e.g., $u=3.0, v=2.0$).

2. Define a new tensor $f$ based on u and v using the formula: $f = u^3 v^2 + 5u + 2v$.

3. Compute the gradients of $f$ with respect to u and v.

4. Print the gradients ``u.grad`` and ``v.grad``.

5. Manually calculate the expected gradients and verify your results.

In [29]:
# Exercise 2: Your code here
# 1. Define two tensors, u and v, both with requires_grad=True.
u = torch.tensor(3.0, requires_grad=True)
v = torch.tensor(2.0, requires_grad=True)
print(f"Initial u: {u}, v: {v}")

# 2. Define a new tensor f based on u and v using the formula: f = u^3 * v^2 + 5u + 2v.
f = u**3 * v**2 + 5*u + 2*v
print(f"Computed f: {f}")

# 3. Compute the gradients of f with respect to u and v.
f.backward()

# 4. Print the gradients u.grad and v.grad.
print(f"Gradient of f with respect to u (u.grad): {u.grad}")
print(f"Gradient of f with respect to v (v.grad): {v.grad}")

# 5. Manually calculate the expected gradients and verify your results.
# f = u^3 * v^2 + 5u + 2v
# df/du = 3*u^2 * v^2 + 5
# df/dv = u^3 * 2*v + 2

# For u=3.0, v=2.0:
# df/du = 3*(3.0)^2 * (2.0)^2 + 5 = 3*9*4 + 5 = 108 + 5 = 113
# df/dv = (3.0)^3 * 2*(2.0) + 2 = 27 * 4 + 2 = 108 + 2 = 110

# The printed gradients should match these manual calculations.


Initial u: 3.0, v: 2.0
Computed f: 127.0
Gradient of f with respect to u (u.grad): 113.0
Gradient of f with respect to v (v.grad): 110.0


### 3. ``nn.Module:`` Building Neural Networks

The torch.nn module provides all the necessary components for building neural networks. The base class for all neural network modules is torch.nn.Module.

Encapsulation: A Module can contain other Modules (e.g., a Sequential module containing Linear and ReLU modules).

Parameters: Modules automatically register their learnable parameters (like weights and biases) as nn.Parameters.

forward() method: Every nn.Module subclass must override the forward() method. This method defines how the input data flows through the network to produce an output.

In [30]:
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleNet(nn.Module):
    def __init__(self):
        super(SimpleNet, self).__init__()
        # Define layers
        # Input features=10, output features=5
        self.fc1 = nn.Linear(10, 5)
        # Input features=5, output features=1
        self.fc2 = nn.Linear(5, 1)

    def forward(self, x):
        # Define forward pass
        # Apply ReLU activation after first layer
        x = F.relu(self.fc1(x))
        # Apply Sigmoid for binary classification output (output between 0 and 1)
        x = torch.sigmoid(self.fc2(x))
        return x

# Create an instance of the network
model = SimpleNet()
print("Model Architecture:\n", model)

# Print learnable parameters
print("\nModel Parameters:")
for name, param in model.named_parameters():
    if param.requires_grad:
        print(f"  {name}: {param.shape}")

# Test the forward pass with dummy data
dummy_input = torch.randn(1, 10) # Batch size of 1, 10 features
output = model(dummy_input)
print(f"\nOutput shape for dummy input: {output.shape}")
print(f"Output for dummy input:\n{output}")

# Example of using nn.Sequential for a simpler network
sequential_model = nn.Sequential(
    nn.Linear(10, 5),
    nn.ReLU(),
    nn.Linear(5, 1),
    nn.Sigmoid()
)
print("\nSequential Model Architecture:\n", sequential_model)

Model Architecture:
 SimpleNet(
  (fc1): Linear(in_features=10, out_features=5, bias=True)
  (fc2): Linear(in_features=5, out_features=1, bias=True)
)

Model Parameters:
  fc1.weight: torch.Size([5, 10])
  fc1.bias: torch.Size([5])
  fc2.weight: torch.Size([1, 5])
  fc2.bias: torch.Size([1])

Output shape for dummy input: torch.Size([1, 1])
Output for dummy input:
tensor([[0.4117]], grad_fn=<SigmoidBackward0>)

Sequential Model Architecture:
 Sequential(
  (0): Linear(in_features=10, out_features=5, bias=True)
  (1): ReLU()
  (2): Linear(in_features=5, out_features=1, bias=True)
  (3): Sigmoid()
)


### Exercise 3: Building a Multi-Layer Perceptron (MLP)

Create an nn.Module class for a simple MLP with the following structure:

Input layer with 784 features (e.g., for flattened MNIST images).

Hidden layer 1 with 128 neurons, followed by a ReLU activation.

Hidden layer 2 with 64 neurons, followed by a ReLU activation.

Output layer with 10 neurons (e.g., for 10-class classification), followed by a Softmax activation (use F.log_softmax or nn.Softmax).

Instantiate your model and print its architecture. Pass a dummy input of shape (1, 784) through it and print the output shape.

In [31]:
# Exercise 3: Your code here
class MLP(nn.Module):
    def __init__(self):
        super(MLP, self).__init__()
        # Input: 784 features
        # Hidden Layer 1: 128 neurons, ReLU
        self.fc1 = nn.Linear(784, 128)
        # Hidden Layer 2: 64 neurons, ReLU
        self.fc2 = nn.Linear(128, 64)
        # Output Layer: 10 neurons (for 10 classes)
        self.fc3 = nn.Linear(64, 10)

    def forward(self, x):
        # Flatten the input if it's not already 1D per sample (e.g., for images)
        # x = x.view(x.shape[0], -1) # Uncomment if input is not already flattened

        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        # For multi-class classification, often use log_softmax for numerical stability with NLLLoss
        x = F.log_softmax(self.fc3(x), dim=1)
        return x

# Create an instance of the MLP
mlp_model = MLP()
print("MLP Model Architecture:\n", mlp_model)

# Test with dummy input
dummy_input_mlp = torch.randn(1, 784) # Batch size of 1, 784 features
output_mlp = mlp_model(dummy_input_mlp)
print(f"\nOutput shape for dummy MLP input: {output_mlp.shape}")
print(f"Output for dummy MLP input (first 5 values):\n{output_mlp[0, :5]}")

MLP Model Architecture:
 MLP(
  (fc1): Linear(in_features=784, out_features=128, bias=True)
  (fc2): Linear(in_features=128, out_features=64, bias=True)
  (fc3): Linear(in_features=64, out_features=10, bias=True)
)

Output shape for dummy MLP input: torch.Size([1, 10])
Output for dummy MLP input (first 5 values):
tensor([-2.2302, -2.1654, -2.2692, -2.4614, -2.4436], grad_fn=<SliceBackward0>)
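
The negative values above are log-probabilities. The sketch below, using made-up logits and targets, checks two properties: exponentiating the `log_softmax` output gives probabilities that sum to 1 per row, and `log_softmax` followed by `nn.NLLLoss` gives the same loss as `nn.CrossEntropyLoss` applied to the raw logits.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)            # hypothetical raw scores: batch of 4, 10 classes
targets = torch.tensor([1, 0, 9, 3])   # hypothetical class indices

log_probs = F.log_softmax(logits, dim=1)
row_sums = log_probs.exp().sum(dim=1)
print("Row sums of probabilities:", row_sums)

# Two equivalent ways to compute the classification loss
loss_nll = nn.NLLLoss()(log_probs, targets)
loss_ce = nn.CrossEntropyLoss()(logits, targets)
print(f"NLLLoss on log-probs: {loss_nll.item():.6f}")
print(f"CrossEntropyLoss on logits: {loss_ce.item():.6f}")
```

In practice this means a model that ends in `log_softmax` should be paired with `nn.NLLLoss`, while a model that returns raw logits should be paired with `nn.CrossEntropyLoss`.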


### Building a Simple Neural Network in PyTorch (Conceptual Flow)

The typical steps involved in building and training a deep learning model with PyTorch are:

#### Step 1: Data Preparation

- Load your dataset (images, text, tabular data).

- Preprocess the data (normalization, tokenization, resizing, etc.).

- Split into training, validation, and test sets.

- Use torch.utils.data.Dataset and torch.utils.data.DataLoader for efficient data loading and batching.
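
The splitting and batching steps can be sketched with `random_split` and `DataLoader`. The data here is synthetic and the 70/15/15 split is an arbitrary choice for illustration.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader, random_split

# Synthetic stand-in for a real dataset: 100 samples, 10 features, binary labels.
features = torch.randn(100, 10)
labels = torch.randint(0, 2, (100,))
full_dataset = TensorDataset(features, labels)

# A seeded generator makes the split reproducible.
generator = torch.Generator().manual_seed(0)
train_set, val_set, test_set = random_split(full_dataset, [70, 15, 15], generator=generator)

# Shuffle only the training data; evaluation order does not matter.
train_loader = DataLoader(train_set, batch_size=16, shuffle=True)
val_loader = DataLoader(val_set, batch_size=16)
test_loader = DataLoader(test_set, batch_size=16)

print(len(train_set), len(val_set), len(test_set))
```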

#### Step 2: Define the Model

- Create a class that inherits from nn.Module.

- In the `__init__` method, define the layers of your network (e.g., nn.Linear, nn.Conv2d, nn.ReLU, nn.MaxPool2d).

- In the forward method, define the computational flow of data through these layers.

#### Step 3: Define Loss Function

- Choose an appropriate loss function (also known as criterion) to measure the difference between your model's predictions and the true labels.

- Examples: nn.CrossEntropyLoss (for multi-class classification), nn.MSELoss (for regression), nn.BCELoss (for binary classification).
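
Each of these loss functions expects inputs of a particular shape and type. The sketch below uses small hand-picked values so that two of the results can be checked by hand; all tensors are illustrative.

```python
import torch
import torch.nn as nn

# CrossEntropyLoss: raw logits of shape (N, C) and integer class indices of shape (N,)
ce = nn.CrossEntropyLoss()(torch.randn(4, 3), torch.tensor([0, 2, 1, 2]))

# MSELoss: predictions and targets of the same shape
# ((1-1)^2 + (2-4)^2) / 2 = 2.0
mse = nn.MSELoss()(torch.tensor([1.0, 2.0]), torch.tensor([1.0, 4.0]))

# BCELoss: probabilities in [0, 1] (e.g., sigmoid outputs) and float targets
# -(ln 0.5 + ln 0.5) / 2 = ln 2, roughly 0.6931
bce = nn.BCELoss()(torch.tensor([0.5, 0.5]), torch.tensor([1.0, 0.0]))

print(f"CrossEntropyLoss: {ce.item():.4f}")
print(f"MSELoss: {mse.item():.4f}")
print(f"BCELoss: {bce.item():.4f}")
```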

#### Step 4: Define Optimizer

- Choose an optimization algorithm that will update the model's parameters (weights and biases) to minimize the loss.

- Examples: torch.optim.SGD (Stochastic Gradient Descent), torch.optim.Adam, torch.optim.RMSprop.

#### Step 5: Training Loop

- Iterate over your dataset for a specified number of epochs.

- For each batch:

  - Forward Pass: Pass input data through the model to get predictions.

  - Calculate Loss: Compute the loss between predictions and true labels.

  - Zero Gradients: Clear the gradients from the previous iteration (optimizer.zero_grad()).

  - Backward Pass: Compute gradients of the loss with respect to all learnable parameters (loss.backward()).

  - Optimizer Step: Update the model's parameters using the computed gradients (optimizer.step()).

- Optionally, evaluate the model on the validation set periodically.

#### Step 6: Evaluation

After training, evaluate the model's performance on the unseen test set to get an unbiased estimate of its generalization capability.
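
The steps above can be put together in a compact sketch. Everything here is an assumption made for illustration: the data is synthetic (the label depends only on the sign of the first feature), the model is a small `nn.Sequential`, and the hyperparameters are arbitrary.

```python
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader

torch.manual_seed(0)

# Step 1: synthetic data, split into train and test portions
X = torch.randn(200, 10)
y = (X[:, 0] > 0).float().unsqueeze(1)
train_loader = DataLoader(TensorDataset(X[:160], y[:160]), batch_size=16, shuffle=True)
test_loader = DataLoader(TensorDataset(X[160:], y[160:]), batch_size=16)

# Steps 2-4: model, loss function, optimizer
model = nn.Sequential(nn.Linear(10, 5), nn.ReLU(), nn.Linear(5, 1), nn.Sigmoid())
criterion = nn.BCELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

# Step 5: training loop
for epoch in range(5):
    model.train()
    for inputs, targets in train_loader:
        optimizer.zero_grad()                     # clear old gradients
        loss = criterion(model(inputs), targets)  # forward pass and loss
        loss.backward()                           # backward pass
        optimizer.step()                          # parameter update

# Step 6: evaluation on held-out data, with gradient tracking disabled
model.eval()
correct = 0
with torch.no_grad():
    for inputs, targets in test_loader:
        preds = (model(inputs) > 0.5).float()
        correct += (preds == targets).sum().item()
accuracy = correct / len(test_loader.dataset)
print(f"Test accuracy: {accuracy:.2f}")
```

`model.train()` and `model.eval()` make no difference for this particular model, but they matter as soon as layers such as dropout or batch normalization are involved, so it is a good habit to call them.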

## Key PyTorch Components in Detail

### torch.optim: Optimizers

The torch.optim package provides various optimization algorithms. These algorithms adjust the model's parameters based on the gradients computed during the backward pass to minimize the loss function.

In [32]:
import torch
import torch.nn as nn
import torch.nn.functional as F  # needed for F.relu in forward()
import torch.optim as optim

# Assume SimpleNet is defined as before
class SimpleNet(nn.Module):
    def __init__(self):
        super(SimpleNet, self).__init__()
        self.fc1 = nn.Linear(10, 5)
        self.fc2 = nn.Linear(5, 1)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = torch.sigmoid(self.fc2(x))
        return x

model = SimpleNet()

# Example: SGD optimizer
# model.parameters() gives the optimizer access to all the learnable parameters of your nn.Module.
# lr (learning rate) controls the step size during parameter updates.
# momentum (optional) helps accelerate SGD in the relevant direction and dampens oscillations.
sgd_optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
print("SGD Optimizer created.")

# Example: Adam optimizer
# Adam is an adaptive learning rate optimization algorithm.
adam_optimizer = optim.Adam(model.parameters(), lr=0.001)
print("Adam Optimizer created.")

# You would typically choose one optimizer for your training.
# For demonstration, let's show a conceptual training step:
dummy_input = torch.randn(1, 10)
dummy_target = torch.tensor([[0.8]], dtype=torch.float32) # Example target for binary classification

# 1. Forward pass
output = model(dummy_input)

# 2. Calculate Loss (using Binary Cross Entropy Loss for sigmoid output)
criterion = nn.BCELoss()
loss = criterion(output, dummy_target)
print(f"\nInitial loss: {loss.item():.4f}")

# 3. Zero Gradients
# It's crucial to zero the gradients before backpropagation,
# otherwise, gradients from previous steps would accumulate.
sgd_optimizer.zero_grad()

# 4. Backward Pass
# This computes d(loss)/d(param) for all parameters with requires_grad=True
loss.backward()

# 5. Optimizer Step
# This updates the model's parameters using the calculated gradients
# according to the chosen optimization algorithm (e.g., SGD, Adam).
sgd_optimizer.step()

# After the step, the model's weights and biases have been updated
print("Parameters updated after one step.")


SGD Optimizer created.
Adam Optimizer created.

Initial loss: 0.6591
Parameters updated after one step.


### Exercise 4: Experiment with Optimizers

- Take the SimpleNet model from the previous section.

- Create a dummy dataset (e.g., 100 samples, 10 features, binary labels).

- Set up nn.BCELoss as your criterion.

- Train the SimpleNet for a few epochs (e.g., 5-10) using torch.optim.SGD with lr=0.01. Print the loss for each epoch.

- Reset the model (re-instantiate SimpleNet).

- Train the SimpleNet for the same number of epochs using torch.optim.Adam with lr=0.001. Print the loss for each epoch.

- Observe and comment on the difference in loss reduction between SGD and Adam (if any) over these few epochs.

In [33]:
# Exercise 4: Your code here
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import TensorDataset, DataLoader

# Define SimpleNet again for clarity in this exercise block
class SimpleNet(nn.Module):
    def __init__(self):
        super(SimpleNet, self).__init__()
        self.fc1 = nn.Linear(10, 5)
        self.fc2 = nn.Linear(5, 1)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = torch.sigmoid(self.fc2(x))
        return x

# 2. Create a dummy dataset
num_samples = 100
num_features = 10
X_dummy = torch.randn(num_samples, num_features)
# Generate binary labels (0 or 1)
y_dummy = (torch.rand(num_samples, 1) > 0.5).float()

# Create a TensorDataset and DataLoader
dummy_dataset = TensorDataset(X_dummy, y_dummy)
dummy_dataloader = DataLoader(dummy_dataset, batch_size=10, shuffle=True)

# 3. Set up nn.BCELoss as your criterion.
criterion = nn.BCELoss()
num_epochs = 10

print("--- Training with SGD ---")
# 4. Train with SGD
model_sgd = SimpleNet()
optimizer_sgd = optim.SGD(model_sgd.parameters(), lr=0.01)

for epoch in range(num_epochs):
    total_loss_sgd = 0
    for inputs, targets in dummy_dataloader:
        optimizer_sgd.zero_grad()
        outputs = model_sgd(inputs)
        loss = criterion(outputs, targets)
        loss.backward()
        optimizer_sgd.step()
        total_loss_sgd += loss.item()
    print(f"Epoch {epoch+1}/{num_epochs}, SGD Loss: {total_loss_sgd / len(dummy_dataloader):.4f}")


print("\n--- Training with Adam ---")
# 5. Reset the model (re-instantiate SimpleNet).
model_adam = SimpleNet()
# 6. Train with Adam
optimizer_adam = optim.Adam(model_adam.parameters(), lr=0.001)

for epoch in range(num_epochs):
    total_loss_adam = 0
    for inputs, targets in dummy_dataloader:
        optimizer_adam.zero_grad()
        outputs = model_adam(inputs)
        loss = criterion(outputs, targets)
        loss.backward()
        optimizer_adam.step()
        total_loss_adam += loss.item()
    print(f"Epoch {epoch+1}/{num_epochs}, Adam Loss: {total_loss_adam / len(dummy_dataloader):.4f}")

# 7. Observe and comment on the difference in loss reduction.
print("\n--- Observation ---")
print("In this simple example, Adam typically shows faster convergence and a more stable decrease in loss compared to SGD, especially in the initial epochs. SGD might oscillate more but can sometimes reach a better minimum given enough time and proper tuning.")


--- Training with SGD ---
Epoch 1/10, SGD Loss: 0.7453
Epoch 2/10, SGD Loss: 0.7392
Epoch 3/10, SGD Loss: 0.7340
Epoch 4/10, SGD Loss: 0.7291
Epoch 5/10, SGD Loss: 0.7246
Epoch 6/10, SGD Loss: 0.7202
Epoch 7/10, SGD Loss: 0.7163
Epoch 8/10, SGD Loss: 0.7127
Epoch 9/10, SGD Loss: 0.7092
Epoch 10/10, SGD Loss: 0.7061

--- Training with Adam ---
Epoch 1/10, Adam Loss: 0.7981
Epoch 2/10, Adam Loss: 0.7903
Epoch 3/10, Adam Loss: 0.7835
Epoch 4/10, Adam Loss: 0.7769
Epoch 5/10, Adam Loss: 0.7707
Epoch 6/10, Adam Loss: 0.7653
Epoch 7/10, Adam Loss: 0.7607
Epoch 8/10, Adam Loss: 0.7560
Epoch 9/10, Adam Loss: 0.7524
Epoch 10/10, Adam Loss: 0.7474

--- Observation ---
In this simple example, Adam typically shows faster convergence and a more stable decrease in loss compared to SGD, especially in the initial epochs. SGD might oscillate more but can sometimes reach a better minimum given enough time and proper tuning.
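
In the run above, the two models start from different random initializations (first-epoch losses of 0.7453 and 0.7981), so the two loss curves are not directly comparable. One way to make the comparison fairer, sketched below under the assumption of full-batch training on synthetic data, is to give both optimizers an identical copy of the same initial model.

```python
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(100, 10)
y = (torch.rand(100, 1) > 0.5).float()  # random labels, as in the exercise

base_model = nn.Sequential(nn.Linear(10, 5), nn.ReLU(), nn.Linear(5, 1), nn.Sigmoid())
criterion = nn.BCELoss()

def train(model, optimizer, steps=10):
    """Run full-batch training steps and return the loss recorded before each update."""
    losses = []
    for _ in range(steps):
        optimizer.zero_grad()
        loss = criterion(model(X), y)
        loss.backward()
        optimizer.step()
        losses.append(loss.item())
    return losses

# Both models start from exactly the same weights.
model_sgd = copy.deepcopy(base_model)
model_adam = copy.deepcopy(base_model)

losses_sgd = train(model_sgd, torch.optim.SGD(model_sgd.parameters(), lr=0.01))
losses_adam = train(model_adam, torch.optim.Adam(model_adam.parameters(), lr=0.001))

print(f"First loss  SGD: {losses_sgd[0]:.4f}  Adam: {losses_adam[0]:.4f}")
print(f"Last loss   SGD: {losses_sgd[-1]:.4f}  Adam: {losses_adam[-1]:.4f}")
```

Because the labels are random, there is little real signal to learn, so any difference between the optimizers here says more about step size than about final model quality.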


### torch.nn.functional: Functional API for Layers
While torch.nn provides module-based layers that hold internal state (such as weights), torch.nn.functional provides stateless, functional versions of many operations. These are often called directly in the forward method of an nn.Module, especially for operations without learnable parameters such as activation functions and pooling. Functional versions of parameterized operations (for example F.linear or F.conv2d) take the weights as explicit arguments.
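
A short sketch of the relationship between the two APIs: for an operation without parameters, the module and functional forms give the same result, and for a parameterized layer the functional form can be fed the module's own weight and bias. The input values are arbitrary.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.tensor([[-1.0, 0.0, 2.0]])

# No parameters: nn.ReLU() and F.relu compute the same thing.
same_relu = torch.equal(nn.ReLU()(x), F.relu(x))

# With parameters: nn.Linear owns its weight and bias,
# while F.linear expects them as explicit arguments.
layer = nn.Linear(3, 2)
same_linear = torch.allclose(layer(x), F.linear(x, layer.weight, layer.bias))

print("ReLU outputs match:", same_relu)
print("Linear outputs match:", same_linear)
```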

In [34]:
import torch
import torch.nn as nn
import torch.nn.functional as F

class FunctionalNet(nn.Module):
    def __init__(self):
        super(FunctionalNet, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5) # Learnable parameters here
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5) # Learnable parameters here
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        # Use F.max_pool2d and F.relu (functional)
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2(x), 2))
        x = x.view(-1, 320) # Flatten the tensor for the linear layer
        x = F.relu(self.fc1(x))
        x = self.fc2(x) # Output layer (often no activation here if using CrossEntropyLoss)
        return F.log_softmax(x, dim=1) # Use log_softmax for classification output

# Create an instance of the network
functional_model = FunctionalNet()
print("FunctionalNet Architecture:\n", functional_model)

# Test with dummy image data (e.g., 1 channel, 28x28 image)
dummy_image_input = torch.randn(1, 1, 28, 28) # Batch size 1, 1 channel, 28x28
output_image = functional_model(dummy_image_input)
print(f"\nOutput shape for dummy image input: {output_image.shape}")


FunctionalNet Architecture:
 FunctionalNet(
  (conv1): Conv2d(1, 10, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(10, 20, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=320, out_features=50, bias=True)
  (fc2): Linear(in_features=50, out_features=10, bias=True)
)

Output shape for dummy image input: torch.Size([1, 10])
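
The `320` in `nn.Linear(320, 50)` follows from the shapes inside the network. With no padding, a 5x5 convolution shrinks each spatial side by 4, and a 2x2 max pool halves it: 28 → 24 → 12 → 8 → 4, leaving 20 channels of 4x4, and 20 × 4 × 4 = 320. The sketch below traces these shapes with layers configured like those in `FunctionalNet`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

conv1 = nn.Conv2d(1, 10, kernel_size=5)
conv2 = nn.Conv2d(10, 20, kernel_size=5)

x = torch.randn(1, 1, 28, 28)
print("input:       ", tuple(x.shape))

x = conv1(x)
print("after conv1: ", tuple(x.shape))   # (1, 10, 24, 24)
x = F.max_pool2d(x, 2)
print("after pool1: ", tuple(x.shape))   # (1, 10, 12, 12)
x = conv2(x)
print("after conv2: ", tuple(x.shape))   # (1, 20, 8, 8)
x = F.max_pool2d(x, 2)
print("after pool2: ", tuple(x.shape))   # (1, 20, 4, 4)

flat = x.flatten(start_dim=1)
print("flattened:   ", tuple(flat.shape))  # (1, 320)
```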


#### torch.utils.data.Dataset and DataLoader: Data Handling

These two classes are essential for efficient and organized data loading, especially for large datasets.

- Dataset: An abstract class representing a dataset. You typically subclass it and implement:

  - `__len__`: Returns the total number of samples in the dataset.

  - `__getitem__`: Returns a sample from the dataset at a given index.

- DataLoader: Wraps a Dataset and provides an iterable over the dataset, supporting:

   - Batching: Grouping samples into mini-batches.

   - Shuffling: Randomizing the order of samples.

   - Multi-process data loading: num_workers for faster data loading.

### Example Code: Custom Dataset and DataLoader

In [35]:
from torch.utils.data import Dataset, DataLoader
import torch
import numpy as np

# 1. Define a Custom Dataset class
class CustomDataset(Dataset):
    def __init__(self, data, labels):
        # Convert data and labels to PyTorch tensors
        # Use float32 for data (inputs to neural networks)
        self.data = torch.tensor(data, dtype=torch.float32)
        # Use long for classification labels (integers)
        # Use float32 for regression labels
        self.labels = torch.tensor(labels, dtype=torch.long)

    def __len__(self):
        # Return the total number of samples in the dataset
        return len(self.labels)

    def __getitem__(self, idx):
        # Return a single sample (data, label) at the given index
        return self.data[idx], self.labels[idx]

# 2. Prepare Dummy Data
# 100 samples, 10 features each
dummy_data = np.random.rand(100, 10)
# 100 labels, randomly 0 or 1 (binary classification)
dummy_labels = np.random.randint(0, 2, 100)

print(f"Shape of dummy_data: {dummy_data.shape}")
print(f"Shape of dummy_labels: {dummy_labels.shape}")

# 3. Create an instance of your CustomDataset
dataset = CustomDataset(dummy_data, dummy_labels)
print(f"Dataset size: {len(dataset)} samples")
print(f"First sample from dataset: {dataset[0]}")

# 4. Create a DataLoader
# batch_size: number of samples per batch
# shuffle: True to shuffle data at each epoch (good for training)
# num_workers: how many subprocesses to use for data loading (0 means main process)
dataloader = DataLoader(dataset, batch_size=16, shuffle=True, num_workers=0)

print(f"\nDataLoader created with batch_size={dataloader.batch_size}")

# 5. Iterate through the dataloader during a conceptual training loop
print("\nIterating through DataLoader (first 2 batches):")
for batch_idx, (inputs, targets) in enumerate(dataloader):
    # inputs and targets are now tensors of batch_size
    print(f"  Batch {batch_idx + 1}:")
    print(f"    Inputs shape: {inputs.shape}")
    print(f"    Targets shape: {targets.shape}")
    if batch_idx >= 1: # Print only first 2 batches for brevity
        break

print("\nDataLoader iteration complete.")

Shape of dummy_data: (100, 10)
Shape of dummy_labels: (100,)
Dataset size: 100 samples
First sample from dataset: (tensor([0.0634, 0.2971, 0.9790, 0.5872, 0.7071, 0.3855, 0.8584, 0.3002, 0.7349,
        0.1650]), tensor(1))

DataLoader created with batch_size=16

Iterating through DataLoader (first 2 batches):
  Batch 1:
    Inputs shape: torch.Size([16, 10])
    Targets shape: torch.Size([16])
  Batch 2:
    Inputs shape: torch.Size([16, 10])
    Targets shape: torch.Size([16])

DataLoader iteration complete.
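
Only the first two batches are printed above. Since 100 is not a multiple of 16, the final batch of an epoch is smaller (100 = 6 × 16 + 4). The sketch below, on synthetic data, lists the batch sizes and shows the `drop_last=True` option, which discards the incomplete batch.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

dataset = TensorDataset(torch.randn(100, 10), torch.randint(0, 2, (100,)))

# Default behaviour: the last batch holds the 4 leftover samples.
loader = DataLoader(dataset, batch_size=16)
sizes = [inputs.shape[0] for inputs, _ in loader]
print("Batch sizes:", sizes)

# drop_last=True skips the incomplete final batch.
loader_drop = DataLoader(dataset, batch_size=16, drop_last=True)
sizes_drop = [inputs.shape[0] for inputs, _ in loader_drop]
print("Batch sizes with drop_last=True:", sizes_drop)
```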


### Exercise 5: Data Loading and Batching

- Generate a synthetic dataset: 500 samples, each with 5 features, and corresponding labels (e.g., 0, 1, or 2 for a 3-class problem).

- Create a CustomDataset class (similar to the example, but adapt for 5 features and 3 classes).

- Create a DataLoader with a batch_size of 32 and shuffle=True.

- Iterate through the DataLoader for one full epoch. For each batch, print the batch number, the shape of the inputs, and the shape of the targets.

In [36]:
# Exercise 5: Your code here
import torch
import numpy as np
from torch.utils.data import Dataset, DataLoader

# 1. Generate a synthetic dataset
num_samples_ex5 = 500
num_features_ex5 = 5
num_classes_ex5 = 3

X_ex5 = torch.randn(num_samples_ex5, num_features_ex5)
y_ex5 = torch.randint(0, num_classes_ex5, (num_samples_ex5,)) # Labels 0, 1, or 2

print(f"Synthetic data shape: {X_ex5.shape}")
print(f"Synthetic labels shape: {y_ex5.shape}")

# 2. Create a CustomDataset class
class ExerciseDataset(Dataset):
    def __init__(self, data, labels):
        self.data = data.float()
        self.labels = labels.long()

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        return self.data[idx], self.labels[idx]

# Create an instance of the dataset
exercise_dataset = ExerciseDataset(X_ex5, y_ex5)

# 3. Create a DataLoader
batch_size_ex5 = 32
exercise_dataloader = DataLoader(exercise_dataset, batch_size=batch_size_ex5, shuffle=True)

print(f"\nDataLoader created with batch_size={batch_size_ex5}")

# 4. Iterate through the DataLoader for one full epoch
print("\nIterating through DataLoader for one epoch:")
for batch_idx, (inputs, targets) in enumerate(exercise_dataloader):
    print(f"  Batch {batch_idx + 1}:")
    print(f"    Inputs shape: {inputs.shape}")
    print(f"    Targets shape: {targets.shape}")

print("\nFinished iterating through all batches in the epoch.")

Synthetic data shape: torch.Size([500, 5])
Synthetic labels shape: torch.Size([500])

DataLoader created with batch_size=32

Iterating through DataLoader for one epoch:
  Batch 1:
    Inputs shape: torch.Size([32, 5])
    Targets shape: torch.Size([32])
  Batch 2:
    Inputs shape: torch.Size([32, 5])
    Targets shape: torch.Size([32])
  Batch 3:
    Inputs shape: torch.Size([32, 5])
    Targets shape: torch.Size([32])
  Batch 4:
    Inputs shape: torch.Size([32, 5])
    Targets shape: torch.Size([32])
  Batch 5:
    Inputs shape: torch.Size([32, 5])
    Targets shape: torch.Size([32])
  Batch 6:
    Inputs shape: torch.Size([32, 5])
    Targets shape: torch.Size([32])
  Batch 7:
    Inputs shape: torch.Size([32, 5])
    Targets shape: torch.Size([32])
  Batch 8:
    Inputs shape: torch.Size([32, 5])
    Targets shape: torch.Size([32])
  Batch 9:
    Inputs shape: torch.Size([32, 5])
    Targets shape: torch.Size([32])
  Batch 10:
    Inputs shape: torch.Size([32, 5])
    Targets shape: torch.Size([32])
  ... (remaining output truncated)
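
The number of batches in the run above follows from the dataset size and `batch_size`: 500 samples at 32 per batch gives 15 full batches plus one final batch of 20. The sketch below checks that arithmetic. It uses `TensorDataset` as a stand-in for `ExerciseDataset` (an assumption made only to keep the example short), and the variable names are illustrative.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Same sizes as the exercise: 500 samples, 5 features, 3 classes.
features = torch.randn(500, 5)
labels = torch.randint(0, 3, (500,))
dataset = TensorDataset(features, labels)  # stand-in for ExerciseDataset

# Default behaviour (drop_last=False) keeps the final, smaller batch.
loader = DataLoader(dataset, batch_size=32, shuffle=True)
batch_sizes = [inputs.shape[0] for inputs, _ in loader]

print(len(loader))      # 16 batches: ceil(500 / 32)
print(batch_sizes[-1])  # 20 samples in the last batch: 500 - 15 * 32

# drop_last=True discards the incomplete batch instead.
loader_full_only = DataLoader(dataset, batch_size=32, shuffle=True, drop_last=True)
print(len(loader_full_only))  # 15 batches: floor(500 / 32)
```

Setting `drop_last=True` can be useful when a layer or loss assumes a fixed batch size, at the cost of skipping a few samples each epoch.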

#### Advantages of PyTorch
- Dynamic Computation Graph: The graph is rebuilt on every forward pass, so ordinary Python control flow (`if` statements, loops) can change the computation from one input to the next, and standard Python debuggers work on it.

- Pythonic Interface: The API feels natural to Python developers, which makes it quick to pick up.

- Imperative Programming Style: Operations execute immediately and return concrete tensors, so intermediate values can be printed or inspected at any line.

- Strong Community and Ecosystem: Excellent documentation, tutorials, and a growing number of libraries built on top of PyTorch (e.g., Hugging Face Transformers, PyTorch Lightning).

- Production Readiness: While often seen as a research framework, PyTorch is increasingly used in production environments, with tools like TorchScript for deployment.
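
To make the imperative style concrete, here is a minimal sketch (the tensor values are arbitrary, chosen only for illustration). Every line runs as soon as it is reached, so intermediate results are ordinary tensors that can be printed without any separate compile or session step.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# Each operation executes immediately and returns a concrete tensor.
y = x * 2
print(y.tolist())                 # [2.0, 4.0, 6.0]
print(y.grad_fn is not None)      # True: autograd recorded this operation

# Reduce to a scalar so backward() needs no extra argument.
z = (y + 5).sum()
print(z.item())                   # 27.0

z.backward()
print(x.grad.tolist())            # [2.0, 2.0, 2.0]
```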

In [37]:
import torch

def dynamic_computation(x):
    # This is a standard Python if-else statement.
    # The computation graph will be built differently depending on 'x'.
    if x.sum() > 10:
        # First possible computation path
        y = x * 2
        z = y + 5
        print("Path A: x.sum() > 10 was True. Computation graph includes multiplication and addition.")
    else:
        # Second possible computation path
        y = x**2
        z = y - 3
        print("Path B: x.sum() > 10 was False. Computation graph includes squaring and subtraction.")

    # z is a vector, so backward() needs an explicit upstream gradient.
    # A tensor of ones weights every element equally (same result as z.sum().backward()).
    # The resulting gradients are specific to the path taken above.
    z.backward(torch.ones_like(z))
    return z, x.grad

# --- Scenario 1: Path A is taken ---
print("--- Running Scenario 1 ---")
x1 = torch.tensor([4.0, 3.0, 5.0], requires_grad=True) # x.sum() = 12 > 10
z1, grad1 = dynamic_computation(x1)
print(f"Final output z1: {z1}")
# For Path A, z = (x * 2) + 5
# dz/dx = 2
print(f"Gradient of z1 w.r.t x1: {grad1}\n") # Expected: tensor([2., 2., 2.])


# --- Scenario 2: Path B is taken ---
print("--- Running Scenario 2 ---")
x2 = torch.tensor([1.0, 2.0, 1.0], requires_grad=True) # x.sum() = 4 < 10
z2, grad2 = dynamic_computation(x2)
print(f"Final output z2: {z2}")
# For Path B, z = (x**2) - 3
# dz/dx = 2*x
print(f"Gradient of z2 w.r.t x2: {grad2}\n") # Expected: tensor([2., 4., 2.])

--- Running Scenario 1 ---
Path A: x.sum() > 10 was True. Computation graph includes multiplication and addition.
Final output z1: tensor([13., 11., 15.], grad_fn=<AddBackward0>)
Gradient of z1 w.r.t x1: tensor([2., 2., 2.])

--- Running Scenario 2 ---
Path B: x.sum() > 10 was False. Computation graph includes squaring and subtraction.
Final output z2: tensor([-2.,  1., -2.], grad_fn=<SubBackward0>)
Gradient of z2 w.r.t x2: tensor([2., 4., 2.])
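
One caveat when a helper such as `dynamic_computation` calls `backward()` internally: PyTorch adds each new gradient to whatever is already stored in `x.grad`, so running the helper twice on the same tensor would double the reported values. The sketch below illustrates this with the Path A and Path B formulas, and shows `torch.autograd.grad` as one possible alternative that returns gradients without writing to `.grad`. Variable names here are illustrative.

```python
import torch

x = torch.tensor([4.0, 3.0, 5.0], requires_grad=True)

# First backward pass: z = 2x + 5, so dz/dx = 2.
(x * 2 + 5).sum().backward()
first = x.grad.clone()

# A second pass on the same leaf tensor adds to the stored gradient.
(x * 2 + 5).sum().backward()
second = x.grad.clone()

print(first.tolist())    # [2.0, 2.0, 2.0]
print(second.tolist())   # [4.0, 4.0, 4.0]

# Resetting the stored gradient between passes avoids the accumulation.
x.grad = None
(x * 2 + 5).sum().backward()
print(x.grad.tolist())   # [2.0, 2.0, 2.0]

# torch.autograd.grad returns the gradient and leaves x.grad untouched.
x.grad = None
(grad_only,) = torch.autograd.grad((x ** 2 - 3).sum(), x)
print(grad_only.tolist())  # [8.0, 6.0, 10.0]
print(x.grad)              # None
```

In a training loop, the same reset is usually done through the optimizer's `zero_grad()` call before each backward pass.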



In [40]:
import torch

print("--- Dynamic Computation Graph (PyTorch) ---")

def perform_dynamic_computation(input_tensor):
    # The graph is built step-by-step as these lines execute.
    # input_tensor needs requires_grad=True to track operations for gradients.

    print(f"\nInput Tensor: {input_tensor.tolist()}")

    # Example of dynamic behavior:
    # The path taken (and thus the graph structure) depends on the input_tensor's sum.
    if input_tensor.sum() > 10:
        intermediate_result = input_tensor * 2 # Operation 1: multiplication
        final_result = intermediate_result + 5 # Operation 2: addition
        print("Path A taken: (input * 2) + 5")
    else:
        intermediate_result = input_tensor ** 2 # Operation 1: squaring
        final_result = intermediate_result - 3 # Operation 2: subtraction
        print("Path B taken: (input^2) - 3")

    print(f"Intermediate result: {intermediate_result.tolist()}")
    print(f"Final result: {final_result.tolist()}")

    # The backward pass can run immediately after the forward computation.
    # final_result is non-scalar, so backward() requires a gradient argument.
    # torch.ones_like(final_result) weights each output element by 1,
    # which yields the gradient of final_result.sum() w.r.t. input_tensor.
    final_result.backward(torch.ones_like(final_result))

    # The gradients are now available in input_tensor.grad
    print(f"Gradients w.r.t input_tensor: {input_tensor.grad.tolist()}")
    print("---")

# Scenario 1: Input leads to Path A
x1 = torch.tensor([4.0, 3.0, 5.0], requires_grad=True) # Sum = 12 > 10
perform_dynamic_computation(x1)
# Expected gradients for Path A: dz/dx = 2 for each element.

# Scenario 2: Input leads to Path B
x2 = torch.tensor([1.0, 2.0, 1.0], requires_grad=True) # Sum = 4 < 10
perform_dynamic_computation(x2)
# Expected gradients for Path B: dz/dx = 2*x for each element.
# For x=[1,2,1], gradients should be [2*1, 2*2, 2*1] = [2, 4, 2]

print("\n--- End of Dynamic Graph Example ---")

--- Dynamic Computation Graph (PyTorch) ---

Input Tensor: [4.0, 3.0, 5.0]
Path A taken: (input * 2) + 5
Intermediate result: [8.0, 6.0, 10.0]
Final result: [13.0, 11.0, 15.0]
Gradients w.r.t input_tensor: [2.0, 2.0, 2.0]
---

Input Tensor: [1.0, 2.0, 1.0]
Path B taken: (input^2) - 3
Intermediate result: [1.0, 4.0, 1.0]
Final result: [-2.0, 1.0, -2.0]
Gradients w.r.t input_tensor: [2.0, 4.0, 2.0]
---

--- End of Dynamic Graph Example ---
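
The `gradient` argument passed to `backward()` in the examples above acts as a set of per-element weights on the output. A minimal sketch, reusing the Path B formula with illustrative variable names: passing a tensor of ones matches calling `.sum().backward()`, while other weights scale each element's contribution.

```python
import torch

x_a = torch.tensor([1.0, 2.0, 1.0], requires_grad=True)
x_b = x_a.detach().clone().requires_grad_(True)
x_c = x_a.detach().clone().requires_grad_(True)

# Option 1: non-scalar output with an explicit tensor of ones.
z_a = x_a ** 2 - 3
z_a.backward(torch.ones_like(z_a))

# Option 2: reduce to a scalar first, then call backward() with no argument.
z_b = x_b ** 2 - 3
z_b.sum().backward()

print(x_a.grad.tolist())  # [2.0, 4.0, 2.0]
print(x_b.grad.tolist())  # [2.0, 4.0, 2.0]

# Non-uniform weights: each gradient 2*x is multiplied by its weight.
weights = torch.tensor([1.0, 0.0, 0.5])
(x_c ** 2 - 3).backward(weights)
print(x_c.grad.tolist())  # [2.0, 0.0, 1.0]
```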
