# Automatic Differentiation with PyTorch

## Objective
As a continuation of the theory session, we will explore automatic differentiation using PyTorch. We will start by creating a tensor and applying a series of transformations to obtain a scalar value. Along the way, we will see how PyTorch tracks operations for gradient computation, use the `backward()` method to compute gradients, and calculate the gradients by hand to validate the results.

In [1]:
## Import relevant libraries
import torch 
import matplotlib.pyplot as plt

Before proceeding with this exercise, read about Autograd in the PyTorch documentation:

[Autograd](https://pytorch.org/tutorials/beginner/former_torchies/autograd_tutorial.html)

## Create a tensor

In [2]:
# Create a tensor 'x' with 21 (20 + 1 to include the end value) evenly spaced values from -5 to 5.
# Remember to enable its gradient tracking.

x = None

# Print the value of 'x'
x
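One possible way to fill in the two cells above is sketched below. It assumes `torch.linspace` for the evenly spaced values (21 points from -5 to 5 gives a spacing of 0.5) and passes `requires_grad=True` so that autograd records every operation applied to `x`.

```python
import torch

# 21 evenly spaced points from -5 to 5 (both ends included).
# requires_grad=True asks autograd to track every operation that uses x.
x = torch.linspace(-5, 5, steps=21, requires_grad=True)

print(x)
print(x.shape)          # torch.Size([21])
print(x.requires_grad)  # True
print(x.is_leaf)        # True: x was created directly, not by an operation
```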

In [3]:
# What is the shape of x? 


In [4]:
# Create a tensor y that is the square of x
y = None

# Print y
y

In [5]:
# Create a tensor z that adds 5 to y

z = None

# Print z
z

In [6]:
# Create a scalar tensor l that sums all values of z
l = None

# Print l
l
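A minimal sketch of the forward pass described in the cells above, assuming `x` is built as in the previous sketch. Each intermediate result is a new tensor; only the final `sum()` collapses the 21 elements into a 0-dimensional (scalar) tensor.

```python
import torch

x = torch.linspace(-5, 5, steps=21, requires_grad=True)

y = x ** 2      # element-wise square, shape (21,)
z = y + 5       # element-wise shift by 5, shape (21,)
l = z.sum()     # reduction to a 0-dim (scalar) tensor

print(y.shape, z.shape, l.shape)  # torch.Size([21]) torch.Size([21]) torch.Size([])
print(l)                          # tensor(297.5000, grad_fn=<SumBackward0>)
```

The value 297.5 follows by hand: the squares of -5, -4.5, ..., 5 sum to 192.5, and adding 5 to each of the 21 elements contributes another 105.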

In [7]:
# Plot z against x. Note that .detach() returns a tensor that is detached from the graph (it no longer requires grad)
# plt.plot(x.detach(), z.detach())
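Why the `.detach()` calls are needed can be sketched without matplotlib. Plotting libraries convert tensors to NumPy arrays, and PyTorch refuses that conversion for a tensor that requires grad. The name `z_plain` below is purely illustrative.

```python
import torch

x = torch.linspace(-5, 5, steps=21, requires_grad=True)
z = x ** 2 + 5

# Converting a tensor that requires grad to NumPy raises a RuntimeError.
try:
    z.numpy()
except RuntimeError as err:
    print("RuntimeError:", err)

# .detach() returns a new tensor that shares storage with z
# but is not connected to the computational graph.
z_plain = z.detach()
print(z_plain.requires_grad)                # False
print(z_plain.grad_fn)                      # None
print(z_plain.data_ptr() == z.data_ptr())   # True: same underlying memory
```

Because the detached tensor shares memory with the original, it is cheap to create, but an in-place change to one is visible through the other.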

The grad function of a tensor (its `grad_fn` attribute) links the tensor to the function that created it. How is this useful? Read the documentation and write your answer below.

ans: 

In [8]:
# print y's grad function


In [9]:
# print z's grad function


In [10]:
# print l's grad function


The variables $x$, $y$, $z$ and $l$ are nodes in a directed acyclic graph.

$l\longrightarrow z\longrightarrow y \longrightarrow x$. $l$ is the root node and $x$ is the leaf node.
* $l$ is connected to $z$ by $l$'s grad function: `SumBackward0`
* $z$ is connected to $y$ by $z$'s grad function: `AddBackward0`
* $y$ is connected to $x$ by $y$'s grad function: `PowBackward0`
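The chain of grad functions listed above can be walked programmatically. Each `grad_fn` exposes `next_functions`, a tuple of `(node, index)` pairs pointing to the grad functions of its inputs (`None` for inputs that need no gradient, such as the constant 5). The walk ends at an `AccumulateGrad` node, which is the node that writes the gradient into the leaf's `.grad`. The helper names `node`, `parents` and `names` are illustrative only, and the loop assumes a simple chain where each node has at most one differentiable input.

```python
import torch

x = torch.linspace(-5, 5, steps=21, requires_grad=True)
y = x ** 2
z = y + 5
l = z.sum()

names = []
node = l.grad_fn  # start at the root of the graph
while node is not None:
    names.append(type(node).__name__)
    # Keep only inputs that participate in the graph (skip None entries).
    parents = [fn for fn, _ in node.next_functions if fn is not None]
    # This graph is a simple chain, so follow the single remaining input.
    node = parents[0] if parents else None

print(" -> ".join(names))
```

On current PyTorch versions this should print `SumBackward0 -> AddBackward0 -> PowBackward0 -> AccumulateGrad`, mirroring the path $l \rightarrow z \rightarrow y \rightarrow x$.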

To learn more about the computational graphs that autograd builds, read: https://pytorch.org/tutorials/beginner/former_torchies/autograd_tutorial.html


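To view the gradient of a tensor, use `tensor.grad`. Note that by default only leaf tensors created with `requires_grad=True` (here, `x`) have their `.grad` populated by `backward()`; for intermediate tensors it stays `None`. If you wish to record the gradients of `y` and `z`, call `.retain_grad()` on them before calling `backward()`.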
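A short sketch of `retain_grad()` in action, reusing the same forward pass. `retain_grad()` has to be called before `backward()`; otherwise the intermediate gradients are freed once they have been passed down the graph.

```python
import torch

x = torch.linspace(-5, 5, steps=21, requires_grad=True)
y = x ** 2
z = y + 5
l = z.sum()

print(x.grad)  # None: backward() has not been called yet

# Ask autograd to keep the gradients of these non-leaf tensors.
y.retain_grad()
z.retain_grad()

l.backward()

print(z.grad)  # dl/dz: 21 ones, because l is a plain sum of z
print(y.grad)  # dl/dy: also 21 ones, because z = y + 5
```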

In [None]:
# what is the gradient vector of x currently?
print(None)

Before computing the gradients with PyTorch, can we derive them using calculus?

We are interested in the gradient of $l$ with respect to each element of the $\mathbf{x}$ tensor, i.e.

$$ \frac{\partial l}{\partial x_i} = ? $$ 

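Applying the chain rule (refer to the theory session videos). Because every operation here is element-wise, $z_i$ depends only on $y_i$ and $y_i$ depends only on $x_i$, so there is a single path from $l$ to each $x_i$: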

$$ \frac{\partial l}{\partial x_i} = \frac{\partial l}{\partial z_i} \cdot \frac{\partial z_i}{\partial y_i} \cdot \frac{\partial y_i}{\partial x_i}$$ 

What is the gradient of $l$ with respect to $\mathbf{z}$? Since $l$ is a scalar and $\mathbf{z}$ is a vector, let's compute the gradient with respect to a single element of $\mathbf{z}$, namely $z_i$. Because $l = \sum_j z_j$, if $z_i$ changes by a small amount, $l$ changes by exactly the same amount. Therefore,

$$ \frac{\partial l}{\partial z_i} = 1 $$  
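The claim $\partial l / \partial z_i = 1$ can be checked in isolation with `torch.autograd.grad`, which returns the gradient of an output with respect to given inputs without writing into `.grad`. This sketch treats `z` as a standalone leaf with arbitrary (random) values, since the derivative of a sum does not depend on them.

```python
import torch

z = torch.randn(4, requires_grad=True)  # any vector works here
l = z.sum()

# Returns a tuple with one gradient per input tensor.
(dl_dz,) = torch.autograd.grad(l, z)
print(dl_dz)   # tensor([1., 1., 1., 1.])
print(z.grad)  # None: autograd.grad does not populate .grad
```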


Next, let's focus on the derivative of $z_i$ with respect to the corresponding element of $\mathbf{y}$, namely $y_i$:
$$ \frac{\partial z_i}{\partial y_i} = ? $$ 

Then,
$$ \frac{\partial y_i}{\partial x_i} = ? $$

Combining everything, we get:

$$ \frac{\partial l}{\partial x_i} = ? $$ 
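Independently of both calculus and autograd, a hand-derived formula can be sanity-checked numerically with central finite differences: nudge one $x_i$ up and down by a small `h` and measure how much $l$ changes. The helper `l_of` and the step size `h = 1e-6` are illustrative choices; `float64` is assumed to keep round-off small. Compare the printed values with your formula for $\partial l / \partial x_i$.

```python
import torch

def l_of(x):
    # Same forward pass as above: l = sum(x**2 + 5)
    return (x ** 2 + 5).sum()

# float64 keeps the finite-difference round-off error small.
x = torch.linspace(-5, 5, steps=21, dtype=torch.float64)
h = 1e-6

numeric = torch.empty_like(x)
for i in range(x.numel()):
    e = torch.zeros_like(x)
    e[i] = h  # perturb only element i
    # Central difference approximates dl/dx_i.
    numeric[i] = (l_of(x + e) - l_of(x - e)) / (2 * h)

print(numeric)
```

PyTorch also ships `torch.autograd.gradcheck`, which automates this kind of comparison between numerical and autograd gradients.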

Let's now use PyTorch to backpropagate and check whether the gradients match our derivation.


In [12]:
# Compute the gradient of l with respect to x using the .backward() method



# display the gradients of x



Verify that the gradients match the partial derivative you derived.

In [13]:
x_grad_derived = None

# print(x.grad == x_grad_derived)
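One way to complete the two cells above is sketched here, assuming the derivation gives $\partial l / \partial x_i = 1 \cdot 1 \cdot 2x_i$. Note that `x.grad == x_grad_derived` returns an element-wise boolean tensor; `torch.allclose` reduces the comparison to a single value and tolerates floating-point rounding, which matters for less trivial functions.

```python
import torch

x = torch.linspace(-5, 5, steps=21, requires_grad=True)
l = (x ** 2 + 5).sum()

l.backward()  # populates x.grad with dl/dx

# Hand-derived gradient; detach so the comparison is not tracked by autograd.
x_grad_derived = 2 * x.detach()

print(x.grad == x_grad_derived)                # element-wise boolean tensor
print(torch.allclose(x.grad, x_grad_derived))  # single True/False
```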

In [14]:
# Plot x.grad against x. Does it match your expectation?

# plt.plot(x.detach(), x.grad.detach())

As this exercise shows, autograd simplifies the computation of gradients, which are essential for updating model parameters during training. By applying the chain rule automatically through the recorded computational graph, autograd computes gradients efficiently and accurately, so that optimization algorithms can adjust weights and biases.
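To connect this to training, here is a minimal gradient-descent sketch that treats `x` itself as the "parameters" and minimizes $l$. The learning rate `lr = 0.1` and the 50 iterations are arbitrary illustrative choices, and the update is written by hand rather than with `torch.optim`, which is the usual route in practice. Two details matter: the update runs under `torch.no_grad()` so it is not recorded in the graph, and `x.grad` is zeroed each step because `backward()` accumulates into `.grad` instead of overwriting it.

```python
import torch

x = torch.linspace(-5, 5, steps=21, requires_grad=True)
lr = 0.1  # illustrative learning rate

for _ in range(50):
    l = (x ** 2 + 5).sum()   # forward pass builds a fresh graph
    l.backward()             # x.grad now holds dl/dx
    with torch.no_grad():    # do not record the update itself
        x -= lr * x.grad     # gradient-descent step
    x.grad.zero_()           # backward() accumulates, so reset

print(l.item())             # approaches 105, the minimum at x = 0
print(x.abs().max().item()) # every element shrinks toward 0
```

With this learning rate each step multiplies every element by $1 - 2 \cdot 0.1 = 0.8$, so `x` decays geometrically toward the minimizer $\mathbf{x} = \mathbf{0}$, where $l = 21 \cdot 5 = 105$.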