# Module 2 - Working with Gradients in PyTorch

In the last unit, we covered some of the operations that occur during a forward pass in a neural network. A backward pass involves a different set of mathematical operations - the ones that enable the neural network to "learn".

In particular, a backward pass calculates the gradient of the loss function with respect to the weights of the network using a technique called _backpropagation_. The gradients are calculated layer by layer, starting from the output layer and moving backward towards the input layer. An optimization algorithm such as _gradient descent_ then uses these gradients to update the weights, iteratively adjusting them in the direction that reduces the loss.

Minimizing the loss function updates the neural network in a manner that iteratively reduces error on its chosen task, and is what we commonly call "learning".
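To make the update rule concrete, here is a minimal sketch of gradient descent on a single weight. The toy loss `(w - 3)^2`, the learning rate of `0.1`, and the 50 steps are assumptions chosen purely for illustration; they are not part of the module's examples.

```python
import torch

# Hypothetical one-parameter problem: minimize loss(w) = (w - 3)^2
w = torch.tensor(0.0, requires_grad=True)
learning_rate = 0.1  # assumed value, chosen for illustration

for step in range(50):
    loss = (w - 3.0) ** 2       # forward pass
    loss.backward()             # backward pass: stores d(loss)/dw in w.grad
    with torch.no_grad():       # the update itself should not be tracked
        w -= learning_rate * w.grad  # step against the gradient
    w.grad.zero_()              # reset the gradient before the next step

final_w = w.item()
print(final_w)  # should be very close to 3.0, the minimizer of the loss
```

Each step moves `w` a small distance in the direction that lowers the loss, which is the same idea an optimizer applies to every weight in a network.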

## Autograd in PyTorch

Autograd is one of the core features of PyTorch and a big factor in its popularity as a deep learning library. Autograd makes it easy to compute and store gradients - which makes PyTorch very intuitive and flexible for researchers and practitioners familiar with deep learning operations. 

In [1]:
import torch

# Create a torch tensor with `requires_grad=True`
t = torch.tensor([1.0, 5.0, 10.0], requires_grad=True)

# A miscellaneous operation simulating "forward propagation"
tensor_sum = t.sum()

# Perform back propagation and view gradients
tensor_sum.backward()
t.grad

tensor([1., 1., 1.])
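Each gradient above is 1 because the derivative of a sum with respect to each of its elements is 1. As a rough sketch of a less trivial case, assume we square each element before summing; the gradient should then be `2 * t`.

```python
import torch

t = torch.tensor([1.0, 5.0, 10.0], requires_grad=True)

# y = sum(t_i ** 2), so dy/dt_i = 2 * t_i
y = (t ** 2).sum()

# The result remembers the operation that produced it
print(y.grad_fn is not None)

y.backward()
print(t.grad)  # expected values: 2, 10, 20
```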

PyTorch records the operations performed on these tensors in a computational directed acyclic graph (DAG), and walks that graph backward to compute the gradients it stores in `.grad`. When working with tensors that require gradients, we need to be mindful of the operations we perform - as some operations aren't supported by PyTorch autograd.

In [2]:
t.numpy()

RuntimeError: Can't call numpy() on Tensor that requires grad. Use tensor.detach().numpy() instead.

As the error message suggests, we need to call the `detach` method before converting the tensor to a NumPy array. This is because autograd isn't supported by NumPy - and therefore any computations we perform on the tensor once converted to a NumPy array can't be tracked by autograd.

In [3]:
t.detach().numpy()

array([ 1.,  5., 10.], dtype=float32)
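The following sketch illustrates what `detach` changes, using the same values as the tensor above. The variable names are illustrative only.

```python
import torch

t = torch.tensor([1.0, 5.0, 10.0], requires_grad=True)
d = t.detach()

# The detached tensor no longer participates in autograd
print(d.requires_grad)  # False

# Operations on the detached tensor are not recorded in the graph...
untracked = d * 2
print(untracked.grad_fn)  # None

# ...while the same operation on the original tensor is
tracked = t * 2
print(tracked.grad_fn is not None)  # True

# detach() does not copy the data: both tensors share the same storage
shares_memory = d.data_ptr() == t.data_ptr()
print(shares_memory)  # True
```

Because the storage is shared, an in-place change to the detached tensor (or to the NumPy array created from it) is also visible in the original tensor.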

## Creating Neural Network Classes in PyTorch

So far, we've just done toy examples with small tensor operations - but we haven't used one of the most prominent patterns in PyTorch: using `nn.Module`.

The `nn.Module` class in PyTorch implements a number of useful methods and attributes for us behind the scenes, allowing us to focus on the core components used to build our neural network, and the deep learning process itself. 
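As a small illustration of what `nn.Module` provides, the sketch below defines a hypothetical one-layer class (`TinyNet` is an invented name, not part of this module) and lists the parameters that PyTorch registers automatically when a layer is assigned as an attribute.

```python
import torch
import torch.nn as nn

# Hypothetical minimal module, for illustration only
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(2, 3)  # registered as a submodule automatically

    def forward(self, x):
        return self.fc(x)

net = TinyNet()

# named_parameters() yields every trainable tensor along with its name
param_shapes = {name: tuple(p.shape) for name, p in net.named_parameters()}
print(param_shapes)

# Calling the instance runs forward() on a batch of 4 two-feature inputs
out = net(torch.zeros(4, 2))
print(out.shape)
```

The same registration is what lets `model.parameters()` hand every weight and bias to an optimizer.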

In [4]:
import torch
import torch.nn as nn

# Define a simple neural network
class SimpleNN(nn.Module):
    def __init__(self):
        super(SimpleNN, self).__init__()
        self.fc1 = nn.Linear(2, 3)  # Fully connected layer with input size 2 and output size 3
        self.fc2 = nn.Linear(3, 1)  # Fully connected layer with input size 3 and output size 1

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = self.fc2(x)
        return x

# Create an instance of the neural network
model = SimpleNN()

# Define some input data
input_data = torch.tensor([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], requires_grad=True)

# Perform a forward pass (calling the model invokes forward() for us)
output = model(input_data)

# Print the output
print("Forward Pass Output:")
print(output)

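# Perform a backward pass (compute gradients)
# `output` is not a scalar, so we pass a gradient of ones,
# which is equivalent to calling output.sum().backward()
output.backward(torch.ones_like(output))

# Print the gradients of the (summed) output with respect to the input data
print("Gradients:")
print(input_data.grad)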

Forward Pass Output:
tensor([[1.1551],
        [1.6131],
        [2.0110]], grad_fn=<AddmmBackward0>)
Gradients:
tensor([[ 0.0729,  0.2701],
        [-0.0900,  0.2890],
        [-0.0900,  0.2890]])


The code above performs 1 forward pass and 1 backward pass, which is the core of _one iteration_ of model training. In a real scenario, we'd also compute a loss and update the weights after each backward pass, repeating for many forward and backward passes to slowly update the network weights and cause it to learn on our data.

In fact, let's update the sample above to do that right now!

## Training a Model Using PyTorch

In [5]:
import torch
import torch.nn as nn
import torch.optim as optim

# Define a simple neural network
class SimpleNN(nn.Module):
    def __init__(self):
        super(SimpleNN, self).__init__()
        self.fc1 = nn.Linear(2, 3)  # Fully connected layer with input size 2 and output size 3
        self.fc2 = nn.Linear(3, 1)  # Fully connected layer with input size 3 and output size 1

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = self.fc2(x)
        return x

# Create an instance of the neural network
model = SimpleNN()

# Define some input data and corresponding target labels (ground truth)
# The inputs don't need `requires_grad`; only the model's parameters are trained
input_data = torch.tensor([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
target_labels = torch.tensor([[0.0], [1.0], [0.0]])

# Define a loss function (Mean Squared Error, MSE)
criterion = nn.MSELoss()

# Define an optimizer (Stochastic Gradient Descent, SGD) to update the model's parameters
optimizer = optim.SGD(model.parameters(), lr=0.01)

# Training loop
num_epochs = 500
for epoch in range(num_epochs):
    # Zero the gradients to prevent accumulation
    optimizer.zero_grad()
    
    # Perform a forward pass
    output = model(input_data)
    
    # Calculate the loss
    loss = criterion(output, target_labels)
    
    # Perform a backward pass (compute gradients)
    loss.backward()
    
    # Update the model's parameters using the optimizer
    optimizer.step()
    
    # Print the loss for monitoring the training progress
    if epoch % 10 == 0:
        print(f"Epoch {epoch+1}/{num_epochs}, Loss: {loss.item()}")

Epoch 1/500, Loss: 0.7083749771118164
Epoch 11/500, Loss: 0.2700563967227936
Epoch 21/500, Loss: 0.237418994307518
Epoch 31/500, Loss: 0.2305573970079422
Epoch 41/500, Loss: 0.2284841686487198
Epoch 51/500, Loss: 0.22763872146606445
Epoch 61/500, Loss: 0.2271665334701538
Epoch 71/500, Loss: 0.22682584822177887
Epoch 81/500, Loss: 0.226541206240654
Epoch 91/500, Loss: 0.22631622850894928
Epoch 101/500, Loss: 0.22612178325653076
Epoch 111/500, Loss: 0.2259395271539688
Epoch 121/500, Loss: 0.22576852142810822
Epoch 131/500, Loss: 0.22560793161392212
Epoch 141/500, Loss: 0.2254570722579956
Epoch 151/500, Loss: 0.22531533241271973
Epoch 161/500, Loss: 0.2251824289560318
Epoch 171/500, Loss: 0.22505740821361542
Epoch 181/500, Loss: 0.2249399870634079
Epoch 191/500, Loss: 0.22482885420322418
Epoch 201/500, Loss: 0.22471646964550018
Epoch 211/500, Loss: 0.22461330890655518
Epoch 221/500, Loss: 0.22452150285243988
Epoch 231/500, Loss: 0.22443516552448273
Epoch 241/500, Loss: 0.22434568405151367
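The `optimizer.zero_grad()` call at the top of the training loop matters because PyTorch adds new gradients to whatever is already stored in `.grad` rather than overwriting it. A minimal sketch of that behavior, using a made-up single-element tensor `w`:

```python
import torch

w = torch.tensor([2.0], requires_grad=True)

# First backward pass: d(3w)/dw = 3
(w * 3).sum().backward()
first = w.grad.clone()

# Second backward pass without zeroing: the new gradient is added, 3 + 3 = 6
(w * 3).sum().backward()
second = w.grad.clone()

# Zero the gradient, then repeat: back to 3
w.grad.zero_()
(w * 3).sum().backward()
after_reset = w.grad.clone()

print(first, second, after_reset)
```

Without the reset, each update in the training loop would use the sum of all gradients computed so far instead of the gradient for the current iteration.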

In [6]:
# After training, you can use the trained model for predictions
test_data = torch.tensor([[7.0, 8.0], [9.0, 10.0]])
model.eval()  # Switch to evaluation mode
with torch.no_grad():  # Disable gradient tracking during inference
    predictions = model(test_data)
    print("Predictions:", predictions)

Predictions: tensor([[0.4025],
        [0.4386]])
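To show what `torch.no_grad()` does during prediction, the sketch below runs the same input through a layer with and without it. The single `nn.Linear` layer is a hypothetical stand-in for a trained model, and its weights are random.

```python
import torch
import torch.nn as nn

layer = nn.Linear(2, 1)  # hypothetical stand-in for a trained model
x = torch.tensor([[7.0, 8.0]])

# Normal call: the output is attached to the autograd graph
tracked = layer(x)

# Inside no_grad: same values, but nothing is recorded for backpropagation
with torch.no_grad():
    untracked = layer(x)

print(tracked.requires_grad)    # True
print(untracked.requires_grad)  # False
print(torch.allclose(tracked, untracked))  # True
```

Note that `model.eval()` and `torch.no_grad()` do different jobs: `eval()` changes the behavior of layers such as dropout and batch normalization, while `no_grad()` turns off gradient tracking to save memory and computation.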
