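### <font color="yellow">Automatic Differentiation with `torch.autograd`</font>
`torch.autograd` is PyTorch's automatic differentiation engine. As operations run on tensors created with `requires_grad=True`, it records them in a computational graph and then traverses that graph backward to compute gradients. Those gradients are what allow a model to learn, because they tell the training procedure how to update each parameter. Autograd can compute the gradient of any scalar output (such as a loss) with respect to any tensor in the graph that requires gradients, which is exactly what gradient-based optimization needs.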
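
As a quick sanity check of the idea, the sketch below compares the derivative reported by autograd with a central finite-difference estimate. The function `g`, the evaluation point, and the step size `eps` are illustrative assumptions and do not come from the examples that follow.

```python
import torch

def g(t):
    # Illustrative function (an assumption for this sketch): g(t) = sin(t) * t^2
    return torch.sin(t) * t**2

# float64 keeps the finite-difference estimate numerically stable
x = torch.tensor(1.5, dtype=torch.float64, requires_grad=True)

# Derivative from autograd
g(x).backward()
autograd_value = x.grad.item()

# Central finite-difference estimate of the same derivative
eps = 1e-6
with torch.no_grad():
    numeric_value = ((g(x + eps) - g(x - eps)) / (2 * eps)).item()

print("autograd:", autograd_value)
print("numeric :", numeric_value)
```

The two numbers should agree to several decimal places; autograd obtains its value from the recorded graph rather than from a numerical approximation.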

#### <font color="yellow">Example 1</font>

In [8]:
# Import the torch library
import torch

# Create the input x as a tensor.
# Setting requires_grad=True tells autograd
# to record the operations applied to x.
x = torch.tensor(7.0, requires_grad=True)

# Define the equation
f = (x**2)+3

# Differentiate with autograd:
# backward() computes df/dx and stores it in x.grad
f.backward()

# Print the derivative value,
# i.e. df/dx = 2x = 2 * 7.0 = 14.
print(x.grad)

tensor(14.)


#### <font color="yellow">Example 2</font>

In [39]:
# import the library
import torch

# Assign the input variable
x = torch.tensor([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0],
                  [7.0, 8.0, 9.0]], requires_grad=True)

# define the function
def f(x):
    return (x**3) + 7*(x**2) + 5*x + 10

# Reduce the output to a scalar z by summing,
# because backward() without a `gradient` argument
# only works on a scalar output
z = f(x).sum()

# Compute the gradient
z.backward()

# Find the gradient value
print(x.grad)

# Check against the analytic derivative: 3x^2 + 14x + 5
print(3*x**2+14*x+5)

tensor([[ 22.,  45.,  74.],
        [109., 150., 197.],
        [250., 309., 374.]])
tensor([[ 22.,  45.,  74.],
        [109., 150., 197.],
        [250., 309., 374.]], grad_fn=<AddBackward0>)
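
Summing is not the only way to handle a non-scalar output. As a minimal sketch, assuming the same input and polynomial as in Example 2, `backward()` can be called directly on the non-scalar tensor if a `gradient` tensor of the same shape is supplied. Passing ones gives the same result as calling `sum()` first. The name `expected` is introduced here only for the comparison.

```python
import torch

x = torch.tensor([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0],
                  [7.0, 8.0, 9.0]], requires_grad=True)

# Non-scalar output: one value per element of x
y = x**3 + 7 * x**2 + 5 * x + 10

# A `gradient` of ones weights every output element equally,
# which is equivalent to y.sum().backward()
y.backward(gradient=torch.ones_like(y))

# Analytic derivative for comparison, detached so it carries no graph
expected = (3 * x**2 + 14 * x + 5).detach()
print(torch.allclose(x.grad, expected))
```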


#### <font color="yellow">Example 3: A simple linear regression model</font>

When training neural networks, the most frequently used algorithm is backpropagation. In this algorithm, each parameter (model weight) is adjusted according to the gradient of the loss function with respect to that parameter.

To compute those gradients, PyTorch has a built-in differentiation engine called `torch.autograd`. It supports automatic computation of gradients for any computational graph.

Consider the simplest one-layer neural network, with input `x`, parameters `w` and `b`, and some loss function. It can be defined in PyTorch in the following manner:

In [27]:
import torch

# Define Input variables
x = torch.tensor(5., requires_grad=True)
w = torch.tensor(2., requires_grad=True)
b = torch.tensor(3., requires_grad=True)

# Define the graph structure
# Forward pass
y = w * x + b

# Backward pass
y.backward()

# View the gradients
print("Gradient of x:", x.grad)
print("Gradient of b:", b.grad)
print("Gradient of w:", w.grad)

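# Update parameters with one gradient-descent step (learning rate 0.01)
w.data -= 0.01 * w.grad.data
b.data -= 0.01 * b.grad.data

# NOTE: `z` is not defined in this cell. The recorded output below relies on
# a `z` of shape (3,) left in the notebook session by another cell
# (for instance the one in Example 5), so run that cell first.
y1 = torch.zeros(3)  # expected output
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y1)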

print("y:",y)
print("Updated b:", b.data)
print("Updated W:", w.data)
print("Loss:",loss)

Gradient of x: tensor(2.)
Gradient of b: tensor(1.)
Gradient of w: tensor(5.)
y: tensor(13., grad_fn=<AddBackward0>)
Updated b: tensor(2.9900)
Updated W: tensor(1.9500)
Loss: tensor(3.1430, grad_fn=<BinaryCrossEntropyWithLogitsBackward0>)


In [5]:
print(f"Gradient function for z = {z.grad_fn}")
print(f"Gradient function for loss = {loss.grad_fn}")

Gradient function for z = <AddBackward0 object at 0x7f7c99ccae60>
Gradient function for loss = <BinaryCrossEntropyWithLogitsBackward0 object at 0x7f7c99cca770>
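
The `grad_fn` objects printed above are the nodes of the backward graph. As a small sketch, the snippet below rebuilds the `y = w * x + b` graph from this example and walks one level down from the root through `next_functions`. The variable names `root` and `child_names` are introduced only for this illustration.

```python
import torch

x = torch.tensor(5.0, requires_grad=True)
w = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(3.0, requires_grad=True)
y = w * x + b

# The root of the backward graph is the node of the last operation (the addition)
root = y.grad_fn
print("root:", type(root).__name__)

# Each entry in next_functions is a (node, input_index) pair
child_names = [type(node).__name__ for node, _ in root.next_functions]
print("children:", child_names)

# Leaf tensors created by the user have no grad_fn of their own
print("x.grad_fn:", x.grad_fn)
```

The addition node points to the multiplication node for `w * x` and to an accumulation node for the leaf `b`; that accumulation node is what writes into `b.grad` during `backward()`.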


#### <font color="yellow">Example 4: A Simple Neural Network with a Single Hidden Layer</font>

In [40]:
# Import the torch library
import torch

# Input and first-layer parameters
x = torch.randn(1, 10, requires_grad=True)
w1 = torch.randn(10, 5, requires_grad=True)
b1 = torch.randn(5, requires_grad=True)

# Forward pass through the hidden layer (linear map followed by ReLU)
h = x @ w1 + b1
h = torch.relu(h)

# Second-layer parameters and network output
w2 = torch.randn(5, 1, requires_grad=True)
b2 = torch.randn(1, requires_grad=True)
y = h @ w2 + b2

# Backward pass
y.backward()

# View the gradients
print("Gradient of w1:", w1.grad)
print("Gradient of b1:", b1.grad)
print("Gradient of w2:", w2.grad)
print("Gradient of b2:", b2.grad)

# Update parameters
w1.data -= 0.01 * w1.grad.data
b1.data -= 0.01 * b1.grad.data
w2.data -= 0.01 * w2.grad.data
b2.data -= 0.01 * b2.grad.data

Gradient of w1: tensor([[-0.4414, -0.0000,  0.5173, -0.1474, -0.1068],
        [ 0.9661,  0.0000, -1.1324,  0.3228,  0.2339],
        [ 0.2137,  0.0000, -0.2505,  0.0714,  0.0517],
        [ 0.3215,  0.0000, -0.3769,  0.1074,  0.0778],
        [-1.2320, -0.0000,  1.4441, -0.4116, -0.2982],
        [ 1.8425,  0.0000, -2.1597,  0.6155,  0.4460],
        [-2.0603, -0.0000,  2.4149, -0.6883, -0.4987],
        [-1.9953, -0.0000,  2.3388, -0.6666, -0.4830],
        [-0.4793, -0.0000,  0.5619, -0.1601, -0.1160],
        [-0.2604, -0.0000,  0.3053, -0.0870, -0.0630]])
Gradient of b1: tensor([ 0.8793,  0.0000, -1.0307,  0.2938,  0.2128])
Gradient of w2: tensor([[0.0359],
        [0.0000],
        [5.0725],
        [4.4821],
        [1.4200]])
Gradient of b2: tensor([1.])
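
The column of zeros in the gradient of `w1` (and the matching zero entries in the gradients of `b1` and `w2`) comes from ReLU: a hidden unit whose pre-activation is not positive outputs zero and passes no gradient back. The sketch below checks this under an assumed fixed seed. The names `pre` and `inactive` are introduced for the illustration, and `x` is created without `requires_grad` because its gradient is not needed here.

```python
import torch

torch.manual_seed(0)  # assumed seed, only to make the sketch repeatable

x = torch.randn(1, 10)
w1 = torch.randn(10, 5, requires_grad=True)
b1 = torch.randn(5, requires_grad=True)
w2 = torch.randn(5, 1, requires_grad=True)
b2 = torch.randn(1, requires_grad=True)

pre = x @ w1 + b1        # pre-activation of the hidden layer
h = torch.relu(pre)
y = h @ w2 + b2
y.backward()

# Hidden units that ReLU switched off
inactive = (pre <= 0).squeeze(0)
print("inactive units:", inactive)

# Their gradient columns and entries are exactly zero
print(w1.grad[:, inactive])
print(b1.grad[inactive])
print(w2.grad[inactive])
```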


#### <font color="yellow">Example 5</font>

Consider again the simplest one-layer neural network, with input `x`, parameters `w` and `b`, and a loss function.

To optimize the parameters of the network, we need the derivatives of the loss function with respect to each parameter. To compute them, we call `loss.backward()` and then read the values from `w.grad` and `b.grad`:

In [42]:
import torch

x = torch.ones(5)  # input tensor
y = torch.zeros(3)  # expected output
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
z = torch.matmul(x, w)+b
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)

loss.backward()
print(w.grad)
print(b.grad)

tensor([[0.1484, 0.0223, 0.0146],
        [0.1484, 0.0223, 0.0146],
        [0.1484, 0.0223, 0.0146],
        [0.1484, 0.0223, 0.0146],
        [0.1484, 0.0223, 0.0146]])
tensor([0.1484, 0.0223, 0.0146])
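
Every row of `w.grad` above equals `b.grad`, and this follows from the chain rule. With the default mean reduction over the three outputs, the derivative of the loss with respect to each logit is `(sigmoid(z) - y) / 3`. The gradient for `b` is that vector, and the gradient for `w` is its outer product with `x`, which is all ones here. The sketch below reproduces the autograd result by hand under an assumed seed; `dz`, `manual_w_grad`, and `manual_b_grad` are names introduced for the comparison.

```python
import torch

torch.manual_seed(0)  # assumed seed, only to make the sketch repeatable

x = torch.ones(5)   # input tensor
y = torch.zeros(3)  # expected output
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
z = torch.matmul(x, w) + b
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)
loss.backward()

with torch.no_grad():
    # dL/dz for BCE-with-logits averaged over z.numel() outputs
    dz = (torch.sigmoid(z) - y) / z.numel()
    manual_w_grad = torch.outer(x, dz)  # dL/dw = x (outer) dL/dz
    manual_b_grad = dz                  # dL/db = dL/dz

print(torch.allclose(w.grad, manual_w_grad, atol=1e-6))
print(torch.allclose(b.grad, manual_b_grad, atol=1e-6))
```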


#### Disabling Gradient Tracking
By default, all tensors with `requires_grad=True` track their computational history and support gradient computation. There are cases where this is unnecessary, for example when the model is already trained and we only want to apply it to some input data, i.e. we only need forward computations through the network. We can stop tracking computations by wrapping the code in a `torch.no_grad()` block:

In [43]:
z = torch.matmul(x, w)+b
print(z.requires_grad)

with torch.no_grad():
    z = torch.matmul(x, w)+b
print(z.requires_grad)

True
False


Another way to achieve the same result is to use the `detach()` method on the tensor:

In [44]:
z = torch.matmul(x, w)+b
z_det = z.detach()
print(z_det.requires_grad)

False
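
One detail worth knowing about `detach()`: the returned tensor is cut off from the graph but still shares its underlying storage with the original. The sketch below illustrates this by comparing data pointers, and shows `detach().clone()` as one way to get an independent copy. The names `same_storage` and `z_copy` are introduced for this illustration.

```python
import torch

x = torch.ones(5)
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
z = torch.matmul(x, w) + b

# Detached view: no gradient tracking, same memory as z
z_det = z.detach()
same_storage = z_det.data_ptr() == z.data_ptr()
print(z_det.requires_grad, same_storage)

# Independent copy: no gradient tracking, separate memory
z_copy = z.detach().clone()
print(z_copy.data_ptr() == z.data_ptr())
```

Because of the shared storage, an in-place change to `z_det` also changes the values seen through `z`; use the cloned copy when the values need to be modified independently.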


#### <font color="yellow">Example 6</font>

In [54]:
import torch
import torch.nn as nn

# Create tensors of shape (10, 3) and (10, 2).
x = torch.randn(10,3)
y = torch.randn(10,2)

# Build a fully connected layer
linear = nn.Linear(3,2)
print("w:", linear.weight)
print("b:", linear.bias)

# Build loss function and optimizer.
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(linear.parameters(), lr = 0.01)

# Forward Pass
pred = linear(x)

# Compute loss
loss = criterion(pred, y)
print("Loss:", loss.item())

# Backward pass
loss.backward()
print("dL/dw:", linear.weight.grad)
print("dL/db:", linear.bias.grad)

# 1-Step gradient descent
optimizer.step()

# You can also perform gradient descent at the low level.
# sub_() is an in-place subtraction: it modifies the tensor's data directly.
# linear.weight.data.sub_(0.01 * linear.weight.grad.data)
# linear.bias.data.sub_(0.01 * linear.bias.grad.data)

# Loss after 1-step gradient descent.
pred = linear(x)
loss = criterion(pred, y)
print("Loss after 1 step optimization:", loss.item())

w: Parameter containing:
tensor([[ 0.4090, -0.1328,  0.2428],
        [ 0.0499, -0.3968, -0.1845]], requires_grad=True)
b: Parameter containing:
tensor([-0.0601, -0.0259], requires_grad=True)
Loss: 0.9306974411010742
dL/dw: tensor([[-0.0604,  0.5519,  0.0706],
        [-0.1768, -0.4575, -0.0654]])
dL/db: tensor([-0.5108,  0.0952])
Loss after 1 step optimization: 0.9224470257759094
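
To make the link between `optimizer.step()` and the low-level update concrete, the sketch below computes `parameter - lr * gradient` by hand and compares it with what plain SGD (no momentum or weight decay) produces. The seed and the names `expected_w` and `expected_b` are assumptions introduced for this check.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # assumed seed, only to make the sketch repeatable

x = torch.randn(10, 3)
y = torch.randn(10, 2)
linear = nn.Linear(3, 2)
criterion = nn.MSELoss()
lr = 0.01
optimizer = torch.optim.SGD(linear.parameters(), lr=lr)

loss = criterion(linear(x), y)
loss.backward()

# Values a plain SGD step should produce, computed before stepping
expected_w = (linear.weight - lr * linear.weight.grad).detach()
expected_b = (linear.bias - lr * linear.bias.grad).detach()

optimizer.step()

print(torch.allclose(linear.weight, expected_w))
print(torch.allclose(linear.bias, expected_b))
```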


#### The same example as a training loop

In [63]:
import torch
import torch.nn as nn

# Create tensors of shape (10, 3) and (10, 2).
x = torch.randn(10,3)
y = torch.randn(10,2)

# Build a fully connected layer
linear = nn.Linear(3,2)

# Build loss function and optimizer.
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(linear.parameters(), lr = 0.01)

epochs = 10

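for epoch in range(epochs):
    pred = linear(x)              # forward pass
    loss = criterion(pred, y)     # compute loss
    print(f"Epoch [{epoch+1}/{epochs}], Loss: {loss.item()}")
    loss.backward()               # backward pass
    optimizer.step()              # update parameters
    optimizer.zero_grad()         # reset gradients for the next iteration

# Note: this is the loss computed in the last epoch, before its update step
print("Final Loss after all epochs:", loss.item())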

Epoch [1/10], Loss: 1.728887915611267
Epoch [2/10], Loss: 1.699350118637085
Epoch [3/10], Loss: 1.6708132028579712
Epoch [4/10], Loss: 1.6432403326034546
Epoch [5/10], Loss: 1.6165964603424072
Epoch [6/10], Loss: 1.5908480882644653
Epoch [7/10], Loss: 1.5659624338150024
Epoch [8/10], Loss: 1.541908621788025
Epoch [9/10], Loss: 1.518656611442566
Epoch [10/10], Loss: 1.4961774349212646
Final Loss after all epochs: 1.4961774349212646
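
The call to `optimizer.zero_grad()` in the loop matters because autograd adds each new gradient to whatever is already stored in `.grad` instead of overwriting it. The sketch below shows this with the function from Example 1; the names `first`, `second`, and `third` are introduced for the illustration.

```python
import torch

x = torch.tensor(7.0, requires_grad=True)

# First backward pass: d/dx (x^2 + 3) = 2x = 14
(x**2 + 3).backward()
first = x.grad.clone()

# Second backward pass without resetting: the new 14 is added to the old 14
(x**2 + 3).backward()
second = x.grad.clone()

# Reset the stored gradient, then run backward again
x.grad.zero_()
(x**2 + 3).backward()
third = x.grad.clone()

print(first, second, third)  # tensor(14.) tensor(28.) tensor(14.)
```

Without the reset, each epoch of the training loop would step along the sum of all gradients computed so far instead of the current one.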
