# KKUI CIT Course - Neural networks - Week_02 - Autograd PyTorch

## A Gentle Introduction to ``torch.autograd``
[A Gentle Introduction to torch.autograd (PyTorch website)](https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html)

``torch.autograd`` is PyTorch’s automatic differentiation engine that powers neural network training. In this section, you will get a conceptual understanding of how autograd helps a neural network train.

### Background

Neural networks (NNs) are a collection of nested functions that are executed on some input data. These functions are defined by parameters (consisting of weights and biases), which in PyTorch are stored in tensors.

Training a NN happens in two steps:

**Forward Propagation:** In forward prop, the NN makes its best guess about the correct output. It runs the input data through each of its functions to make this guess.

**Backward Propagation:** In backprop, the NN adjusts its parameters in proportion to the error in its guess. It does this by traversing backwards from the output, collecting the derivatives of the error with respect to the parameters of the functions (gradients), and optimizing the parameters using gradient descent. For a more detailed walkthrough of backprop, see the PyTorch tutorial linked above.
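The two steps can be sketched on a toy problem. This is only an illustration under stated assumptions: the data, the single linear function, the mean squared error and the learning rate of 0.05 are all made up for this example and do not come from the course material.

```python
import torch

torch.manual_seed(0)  # fixed seed so the run is repeatable

# Hypothetical toy data: 8 samples with 3 features; the target is the feature sum
inputs = torch.randn(8, 3)
targets = inputs.sum(dim=1, keepdim=True)

# Parameters (weights and a bias) of one linear function, stored in tensors
w = torch.zeros(3, 1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)

def forward(x):
    # Forward propagation: the model's guess for input x
    return x @ w + b

# Error of the guess (mean squared error, an assumption of this sketch)
loss_before = ((forward(inputs) - targets) ** 2).mean()

# Backward propagation: gradients of the error w.r.t. w and b
loss_before.backward()

# One gradient descent step; no_grad keeps the update out of the graph
learning_rate = 0.05
with torch.no_grad():
    w -= learning_rate * w.grad
    b -= learning_rate * b.grad

loss_after = ((forward(inputs) - targets) ** 2).mean()
print(loss_before.item(), loss_after.item())
```

With a small enough learning rate, the error after the step should be lower than before it.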

In [168]:
import torch
# The autograd package provides automatic differentiation
# for all operations on Tensors

In [169]:
# requires_grad = True -> tracks all operations on the tensor.
x = torch.randn(3, requires_grad=True)

In [170]:
# Alternative: enable gradient tracking in place on an existing tensor
# x.requires_grad_(True)

# Define a new tensor y as a result of an operation on x
y = x + 2

# Since y is created as a result of an operation, it has a grad_fn attribute
# grad_fn references a Function that has created the tensor
print(f"x -> {x}")  # created by the user -> grad_fn is None
print(f"y -> {y}")

# Perform more operations on y
z = y * y * 3
print(f"z -> {z}")

# Calculate the mean of z
z = z.mean()
print(f"z mean -> {z}")

# Perform backpropagation to compute the gradients
# When the computation finishes, call .backward() to compute the gradients automatically
# The gradient for x is accumulated into its .grad attribute
# It represents the partial derivative of z with respect to x
z.backward()
print(f"x grad -> {x.grad}")

# Optional: x.detach() returns a tensor that shares x's data but is cut off from the computational graph
# x = x.detach()

# Generally speaking, torch.autograd is an engine for computing vector-Jacobian products
# It computes partial derivatives by applying the chain rule


x -> tensor([-0.8917, -0.6851,  1.7303], requires_grad=True)
y -> tensor([1.1083, 1.3149, 3.7303], grad_fn=<AddBackward0>)
z -> tensor([ 3.6850,  5.1870, 41.7449], grad_fn=<MulBackward0>)
z mean -> 16.872297286987305
x grad -> tensor([2.2166, 2.6298, 7.4606])
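The printed gradient can be checked by hand. The scalar is the mean of 3(x_i + 2)^2 over three elements, so the derivative with respect to each x_i is (1/3) * 6 * (x_i + 2) = 2(x_i + 2), which is twice the printed `y`. The sketch below rebuilds the same function on a fresh random tensor and compares autograd's result with that formula. It also runs a second backward pass to show that `.grad` accumulates. The variable names `expected`, `matches_analytic` and `accumulated` are introduced only for this illustration.

```python
import torch

x = torch.randn(3, requires_grad=True)

# Same function as in the cell above: z = mean(3 * (x + 2)^2)
z = (3 * (x + 2) ** 2).mean()
z.backward()

# Analytic gradient: dz/dx_i = 2 * (x_i + 2)
expected = 2 * (x.detach() + 2)
matches_analytic = torch.allclose(x.grad, expected)
print(matches_analytic)

# A second backward pass adds to x.grad instead of overwriting it
z = (3 * (x + 2) ** 2).mean()
z.backward()
accumulated = torch.allclose(x.grad, 2 * expected)
print(accumulated)

# Reset the accumulated gradient in place before reusing x
x.grad.zero_()
```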


In [171]:
# Create a tensor 'x' of shape (3,) with requires_grad set to True to enable gradient tracking
x = torch.randn(3, requires_grad=True)

# Perform a series of operations on 'x' to obtain tensor 'y'
# 'y' will have the same shape as 'x' and will contain the result of the operations
y = x * 2  # Multiply 'x' by 2
for _ in range(10):
    y = y * 2  # Multiply 'y' by 2 repeatedly, resulting in a non-scalar output

# Print the tensor 'y' and its shape
print(y)
print(y.shape)

# Create a tensor 'v' with the same shape as 'y'
# 'v' is the vector in the vector-Jacobian product: backward(v) computes J^T v, where J = dy/dx
v = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float32)

# backward() without arguments only works for scalar outputs
# For a non-scalar 'y', pass 'v' as the 'gradient' argument
y.backward(v)

# Print the gradients of 'x' after backpropagation
print(x.grad)

tensor([  645.5250, -1758.2660,  1596.3779], grad_fn=<MulBackward0>)
torch.Size([3])
tensor([2.0480e+02, 2.0480e+03, 2.0480e-01])
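Here `y` equals `x` multiplied by 2 eleven times, so the Jacobian of `y` with respect to `x` is 2048 times the identity matrix, and the printed gradient is simply `v * 2048`. As a hedged cross-check, the sketch below builds the full Jacobian with `torch.autograd.functional.jacobian` and compares `J^T v` with what `backward(v)` stores in `x.grad`. The helper name `f` is made up for this example.

```python
import torch
from torch.autograd.functional import jacobian

def f(x):
    # Same chain of operations as in the cell above: multiply by 2 eleven times
    y = x * 2
    for _ in range(10):
        y = y * 2
    return y

x = torch.randn(3, requires_grad=True)
v = torch.tensor([0.1, 1.0, 0.0001])

# Vector-Jacobian product through backward()
y = f(x)
y.backward(v)

# Explicit Jacobian (3 x 3), then J^T v computed by hand
J = jacobian(f, x.detach())
vjp = J.T @ v

print(J)
print(torch.allclose(x.grad, vjp))
```

Building the full Jacobian is only practical for tiny examples like this one. `backward(v)` yields the product without ever materializing `J`.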


In [172]:
# Create tensor 'a' and check whether requires_grad is enabled (it is False by default)
a = torch.randn(2, 2)
print(a.requires_grad)

# Perform operations on tensor 'a' to create tensor 'b'
b = ((a * 3) / (a - 1))

# Print the gradient function of tensor 'b'
# 'a' does not require gradients, so the operations are not tracked and b.grad_fn is None
print(b.grad_fn)

# Enable requires_grad for tensor 'a' using requires_grad_()
a.requires_grad_(True)
print(a.requires_grad)

# Perform operations on tensor 'a' to create tensor 'b'
b = (a * a).sum()

# Print the gradient function of tensor 'b'
# Since 'b' is a result of operations on 'a', it will have a gradient function
print(b.grad_fn)

# Detach tensor 'a' to create tensor 'b' without gradient computation
a = torch.randn(2, 2, requires_grad=True)
print(a.requires_grad)
b = a.detach()
print(b.requires_grad)

# Wrap the operation in 'with torch.no_grad()' to temporarily disable gradient tracking
a = torch.randn(2, 2, requires_grad=True)
print(a.requires_grad)
with torch.no_grad():
    print((a ** 2).requires_grad)


False
None
True
<SumBackward0 object at 0x11e9aae80>
True
False
True
False
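The three ways of switching tracking off behave slightly differently, which the following sketch tries to make visible. The variable names and the choice of `data_ptr()` to show shared memory are choices made for this illustration only.

```python
import torch

a = torch.ones(2, 2, requires_grad=True)

# 1) detach(): a new tensor object on the same data, cut off from the graph
b = a.detach()
detached_tracks = b.requires_grad           # b is not tracked
original_still_tracks = a.requires_grad     # a itself is unchanged
same_memory = b.data_ptr() == a.data_ptr()  # both point at the same storage

# 2) torch.no_grad(): results computed inside the block are not tracked
with torch.no_grad():
    c = a * 2
inside_tracks = c.requires_grad

# Outside the block, tracking is active again
d = a * 2
outside_tracks = d.requires_grad

# 3) requires_grad_(False): switches tracking off in place on a leaf tensor
a.requires_grad_(False)
after_switch_off = a.requires_grad

print(detached_tracks, original_still_tracks, same_memory)
print(inside_tracks, outside_tracks, after_switch_off)
```

Because `detach()` shares memory with the original, an in-place change to the detached tensor is also visible through the original tensor.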


In [173]:
# Final example
# Create a tensor 'weights' with requires_grad enabled for optimization
weights = torch.ones(4, requires_grad=True)

# Run training for multiple epochs
for epoch in range(3):
    # Just a dummy example: compute model output (sum of weights multiplied by 3)
    model_output = (weights * 3).sum()

    # Perform backpropagation to compute gradients
    model_output.backward()

    # Print the gradients of weights
    print(weights.grad)

    # Optimize the model by adjusting weights using gradient descent
    # Update weights using gradient descent formula: new_weights = old_weights - learning_rate * gradient
    with torch.no_grad():
        weights -= 0.1 * weights.grad

    # Important step: Empty the gradients before the next optimization step to avoid accumulation
    weights.grad.zero_()

# After training, print the final weights and the model output of the last forward pass
# (model_output was computed before the final weight update)
print(weights)
print(model_output)

# Note: Optimizers in torch.optim provide step() and zero_grad(), so the update and the gradient reset need not be written by hand
# Example usage:
# optimizer = torch.optim.SGD([weights], lr=0.1)
# During training loop:
# optimizer.step()  # Update weights based on gradients
# optimizer.zero_grad()  # Clear gradients for the next iteration


tensor([3., 3., 3., 3.])
tensor([3., 3., 3., 3.])
tensor([3., 3., 3., 3.])
tensor([0.1000, 0.1000, 0.1000, 0.1000], requires_grad=True)
tensor(4.8000, grad_fn=<SumBackward0>)
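The commented-out optimizer usage can be turned into a runnable version of the same dummy loop. This is a minimal sketch, assuming plain `torch.optim.SGD` with no momentum or weight decay, so that each `step()` performs the same update as the manual `weights -= 0.1 * weights.grad`.

```python
import torch

weights = torch.ones(4, requires_grad=True)

# SGD over a list of parameters; lr matches the manual update above
optimizer = torch.optim.SGD([weights], lr=0.1)

for epoch in range(3):
    # Same dummy model output as before
    model_output = (weights * 3).sum()

    # Clear gradients left over from the previous iteration
    optimizer.zero_grad()

    # Compute fresh gradients, then update the weights
    model_output.backward()
    optimizer.step()

print(weights)
print(model_output)
```

Calling `zero_grad()` at the start of each iteration rather than at the end is a common convention. Either placement works as long as the gradients are cleared between one `backward()` call and the next.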
