## Automatic differentiation with torch.autograd

Link: https://pytorch.org/tutorials/beginner/basics/autogradqs_tutorial.html

In [1]:
import torch
import numpy as np
import matplotlib.pyplot as plt



### Differentiating a scalar with respect to a scalar


Let $x_4$ be a variable (Torch tensor), and let $y_4 = 4 x_4^2$.

We will use PyTorch to compute the derivative of $y_4$ with respect to $x_4$, which is equal to $8 x_4$.

In [3]:
x4 = torch.tensor(4.0)

print(x4)

y4 = 4.0*(x4**2)
print(y4)
y4.backward()  # Raises RuntimeError: x4 was created without requires_grad=True, so no graph was recorded

tensor(4.)
tensor(64.)


RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

To compute gradients of a function with respect to a tensor, we need to set ``requires_grad=True`` on that tensor.

Note that ``y5`` now carries a ``grad_fn`` attribute (here ``MulBackward0``), which stores the function used to backpropagate through the operation that created it.

Gradients are computed by calling ``.backward()`` on the output. They can then be read from the ``.grad`` attribute of the input tensor.

Calling ``.backward()`` on two separate outputs (computational graphs) that both depend on the same tensor accumulates (adds) the gradients in that tensor's ``.grad``.

In [4]:
x5 = torch.tensor(4.0,requires_grad=True)
print(x5)

y5 = 4*x5**2
print(y5)
y5.backward()
print(x5.grad)

t5 = 3*x5**3
print(t5)
t5.backward()
print(x5.grad) # 8 x + 9 x^2 = 32 + 144

tensor(4., requires_grad=True)
tensor(64., grad_fn=<MulBackward0>)
tensor(32.)
tensor(192., grad_fn=<MulBackward0>)
tensor(176.)
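Because gradients accumulate in ``.grad``, they are usually reset before an unrelated backward pass. In training loops this is the job of ``optimizer.zero_grad()``. Here is a minimal sketch that uses the same two functions as the cell above:

```python
import torch

x = torch.tensor(4.0, requires_grad=True)

y = 4 * x**2
y.backward()
print(x.grad)  # dy/dx = 8x = 32

# Reset the accumulated gradient in place before the next backward pass
# (setting x.grad = None is a common alternative).
x.grad.zero_()

t = 3 * x**3
t.backward()
print(x.grad)  # dt/dx = 9x^2 = 144, with no leftover 32 added in
```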


### Multiple independent variables

Let $y_6 = 4x_6^2 + z_6^3$.

PyTorch can also compute the partial derivatives of $y_6$ with respect to both $x_6$ and $z_6$ in a single backward pass.

Note:
1. $\partial y_6 / \partial x_6 = 8 x_6$
2. $\partial y_6 / \partial z_6 = 3 z_6^2$


In [5]:
x6 = torch.tensor(4.0,requires_grad=True)
z6 = torch.tensor(3.0,requires_grad=True)
y6 = 4*x6**2 + z6**3
print(y6)
y6.backward()
print(x6.grad)
print(z6.grad)

tensor(91., grad_fn=<AddBackward0>)
tensor(32.)
tensor(27.)
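``torch.autograd.grad`` can return several partial derivatives at once when it is given a tuple of inputs. Unlike ``.backward()``, it leaves the ``.grad`` attributes untouched. A minimal sketch with the same $y_6$:

```python
import torch

x = torch.tensor(4.0, requires_grad=True)
z = torch.tensor(3.0, requires_grad=True)
y = 4 * x**2 + z**3

# One gradient is returned per input, in the same order as the inputs tuple.
dy_dx, dy_dz = torch.autograd.grad(y, (x, z))
print(dy_dx, dy_dz)    # 8x = 32 and 3z^2 = 27
print(x.grad, z.grad)  # None None: .grad is not populated by autograd.grad
```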


### By default, ``.grad`` is populated only for leaf nodes of the computational graph that have ``requires_grad=True``. Gradients of intermediate (non-leaf) nodes are not stored unless ``.retain_grad()`` is called on them.

Let $y_7 = z_7^2$ and $z_7 = 2x_7^2$.


In [9]:
x7 = torch.tensor(3.0,requires_grad=True)
z7 = 2*x7**2  # Not a leaf node: z7 is the result of an operation on x7
z7.retain_grad()  # Keeps z7.grad after backward; comment this out and z7.grad will be None

print(x7.is_leaf)
print(z7.is_leaf)
y7 = z7**2
print(f"y_7 =  {y7}")

print('Try to print gradient before backward')
print(x7.grad)


y7.backward()  # 16 x7^3


print('Try to print gradient after backward')
print(x7.grad)
print('Try to print gradient of z7 after backward')
print(z7.grad)  # Available only because retain_grad() was called, since z7 is not a leaf node

True
False
y_7 =  324.0
Try to print gradient before backward
None
Try to print gradient after backward
tensor(432.)
Try to print gradient of z7 after backward
tensor(36.)
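An alternative to ``retain_grad()`` is to register a hook on the non-leaf tensor with ``Tensor.register_hook``. The hook receives the gradient during the backward pass, but the gradient is not stored in ``.grad``. The sketch below uses the same setup. The list ``captured`` is only an illustrative container.

```python
import torch

x = torch.tensor(3.0, requires_grad=True)
z = 2 * x**2                      # non-leaf: produced by an operation
captured = []
z.register_hook(captured.append)  # called with dy/dz during backward
y = z**2
y.backward()

print(x.grad)        # dy/dx = 16x^3 = 432
print(z.grad)        # None (PyTorch also emits a UserWarning here)
print(captured[0])   # dy/dz = 2z = 36
```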


### Computing a second-order derivative means differentiating the first derivative, so the graph that produced the first derivative must be kept. For this, we use ``torch.autograd.grad`` (https://pytorch.org/docs/stable/generated/torch.autograd.grad.html?highlight=autograd%20grad#torch.autograd.grad)

### Note that we need to set ``create_graph=True`` to evaluate higher-order derivatives.

### Also, unlike ``.backward()``, this method returns the gradients instead of storing them in the ``.grad`` attributes of the input tensors.

In [None]:
x8 = torch.tensor(4.0,requires_grad=True)
print('x = {}'.format(x8))   

y8 = 4*x8**3
print('y = {}'.format(y8))

# y8.backward()
# print(x8.grad)

first_der = torch.autograd.grad(y8,x8,create_graph=True)[0] # You need to extract first element of tuple
print(r'First derivative = {}'.format(first_der))
print(x8.grad)  # None: torch.autograd.grad does not populate .grad

second_der = torch.autograd.grad(first_der,x8,create_graph=True)[0] # You need to extract first element of tuple
print(r'Second derivative = {}'.format(second_der))
print(x8.grad)  # Still None

third_der = torch.autograd.grad(second_der,x8)[0] # Works because second_der was created with create_graph=True; without it, this call would raise a RuntimeError
print('Third derivative = {}'.format(third_der))
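The same pattern extends to any order. Below is a minimal sketch built around a hypothetical helper ``nth_derivative``, which is not part of PyTorch. It simply repeats the ``create_graph=True`` call from the cell above.

```python
import torch

def nth_derivative(y, x, n):
    """Hypothetical helper: differentiate scalar y with respect to x, n times.

    create_graph=True keeps each intermediate derivative differentiable,
    so the next torch.autograd.grad call can differentiate it again.
    """
    der = y
    for _ in range(n):
        der = torch.autograd.grad(der, x, create_graph=True)[0]
    return der

x = torch.tensor(4.0, requires_grad=True)
y = 4 * x**3
for n in range(1, 4):
    # Expected: 12x^2 = 192, 24x = 96, 24
    print(n, nth_derivative(y, x, n).item())
```

Because ``create_graph=True`` also retains the graph of ``y``, calling the helper repeatedly on the same ``y`` works.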

### Vector calculus using Torch 


Let,

$\mathbf{x} = \begin{Bmatrix} x_1 \\ x_2 \end{Bmatrix}$ and $ \mathbf{y} = f(\mathbf{x}) = \mathbf{A}\mathbf{x} + g(\mathbf{x}) \implies \begin{Bmatrix} y_1 \\ y_2 \end{Bmatrix} =  \begin{bmatrix} a_{11} & a_{12}  \\ a_{21} & a_{22} \end{bmatrix} \begin{Bmatrix} x_1 \\ x_2 \end{Bmatrix} + \begin{Bmatrix} x_1^2 \\ x_2^2 \end{Bmatrix}$

Jacobian $ \mathbf{J} = \begin{bmatrix} \frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2} \\ \frac{\partial y_2}{\partial x_1} & \frac{\partial y_2}{\partial x_2} \end{bmatrix} = \begin{bmatrix} a_{11} + 2x_1 & a_{12}  \\ a_{21} & a_{22} + 2x_2 \end{bmatrix}$


Torch can be used to compute vector-Jacobian products, i.e., evaluate $ \mathbf{J}^{\mathrm{T}} \mathbf{v} $ given some vector $\mathbf{v}$.


Let:

$\mathbf{x} = \begin{Bmatrix} x_1 \\ x_2 \end{Bmatrix} = \begin{Bmatrix} 2 \\ 3 \end{Bmatrix}$

$\mathbf{A} = \begin{bmatrix} a_{11} & a_{12}  \\ a_{21} & a_{22} \end{bmatrix} = \begin{bmatrix} 1 & 2  \\ -1 & 3 \end{bmatrix}$ 

$\mathbf{J} = \begin{bmatrix} a_{11} + 2x_1 & a_{12}  \\ a_{21} & a_{22} + 2x_2 \end{bmatrix} = \begin{bmatrix} 1 + 2\times2 & 2  \\ -1 & 3 + 2 \times 3 \end{bmatrix} = \begin{bmatrix} 5 & 2  \\ -1 & 9 \end{bmatrix}$

$\mathbf{v} = \begin{Bmatrix} 1 \\ 1 \end{Bmatrix}$ 

$ \mathbf{J}^{\mathrm{T}} \mathbf{v} = \begin{bmatrix} 5 & 2  \\ -1 & 9 \end{bmatrix}^\mathrm{T} \begin{Bmatrix} 1 \\ 1 \end{Bmatrix} = \begin{bmatrix} 5 & -1  \\ 2 & 9 \end{bmatrix} \begin{Bmatrix} 1 \\ 1 \end{Bmatrix} = \begin{Bmatrix} 4 \\ 11 \end{Bmatrix}$

In [None]:
x = torch.tensor([[2.0],[3.0]],requires_grad=True)
A = torch.tensor([[1.0,2.0],[-1.0,3.0]])
y = torch.matmul(A,x) + x**2
print('x = {}'.format(x))
print('y = {}'.format(y))
dydx = torch.autograd.grad(y,x,grad_outputs=torch.ones_like(y))[0]  # v = ones; expected J^T v = [[4.], [11.]]
print('dydx = {}'.format(dydx))
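To cross-check the vector-Jacobian product, you can form the full Jacobian with ``torch.autograd.functional.jacobian`` and multiply it by $\mathbf{v}$ explicitly. The sketch below assumes the same $\mathbf{A}$ and $\mathbf{x}$ as above, with $\mathbf{x}$ stored as a 1-D tensor rather than a column.

```python
import torch
from torch.autograd.functional import jacobian

A = torch.tensor([[1.0, 2.0], [-1.0, 3.0]])

def f(x):
    # f(x) = A x + x^2 (elementwise square), as defined above
    return A @ x + x**2

x = torch.tensor([2.0, 3.0])
J = jacobian(f, x)   # full 2x2 Jacobian, J[i, j] = dy_i / dx_j
print(J)             # expected [[5, 2], [-1, 9]]

v = torch.ones(2)
print(J.T @ v)       # J^T v = [4, 11]
```

Forming $\mathbf{J}$ explicitly is fine for a 2x2 example. For large inputs, the vector-Jacobian product from ``torch.autograd.grad`` avoids building the full matrix.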

### Taking pointwise derivatives.

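### ``torch.autograd.grad`` can create the output gradient implicitly only for scalar outputs. If the output is not a scalar, ``grad_outputs`` must be specified; otherwise the call raises ``RuntimeError: grad can be implicitly created only for scalar outputs``, as in the cell below.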



In [None]:
x9 = torch.linspace(0,2*np.pi,5,requires_grad=True)  # Points at which you want to evaluate the residual
print(x9)

y9 = torch.sin(x9)      # Stand-in for the outputs of your PINN
print(y9)

dydx = torch.autograd.grad(y9,x9)[0]  # Raises RuntimeError: y9 is not a scalar and grad_outputs is missing

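### Assume $x \in \mathbb{R}^n$ and $y \in \mathbb{R}^n$ with $y_i = f(x_i)$. Then the Jacobian is diagonal: $\frac{\partial y}{\partial x} = \operatorname{diag}\left(\frac{dy_1}{dx_1}, \ldots, \frac{dy_n}{dx_n}\right)$.

### Define the loss $\ell = \sum_{i=1}^n y_i$. Then

##### $\frac{d\ell}{dx} = \left(\frac{dy_1}{dx_1}, \ldots, \frac{dy_n}{dx_n}\right)^\top$ is what we want.

##### $\frac{d\ell}{dy} = (1, \ldots, 1)^\top$ is what we pass as ``grad_outputs``.

##### $\frac{d\ell}{dx} = \left(\frac{\partial y}{\partial x}\right)^\top \frac{d\ell}{dy}$, which is the vector-Jacobian product $\mathbf{J}^{\mathrm{T}}\mathbf{v}$ with $\mathbf{v} = (1, \ldots, 1)^\top$.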

##### Also see (https://towardsdatascience.com/pytorch-autograd-understanding-the-heart-of-pytorchs-magic-2686cd94ec95)


In [None]:
x9 = torch.linspace(0,2*np.pi,5,requires_grad=True)
y9 = torch.sin(x9)
dydx = torch.autograd.grad(y9,x9,grad_outputs=torch.ones_like(y9),create_graph=True)[0]
print(dydx)


z9 = torch.cos(x9)  # Analytical derivative of sin, for comparison with dydx
print(z9)
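``grad_outputs`` is the vector $\mathbf{v}$ in $\mathbf{J}^{\mathrm{T}}\mathbf{v}$, so it does not have to be all ones. The Jacobian of ``sin`` is diagonal here, so an arbitrary weight vector simply scales each pointwise derivative. The sketch below uses a made-up, purely illustrative weight vector ``v``.

```python
import math

import torch

x = torch.linspace(0, 2 * math.pi, 5, requires_grad=True)
y = torch.sin(x)

# Illustrative weights: entry i of the result should be v_i * cos(x_i).
v = torch.tensor([1.0, 2.0, 0.0, -1.0, 0.5])
weighted = torch.autograd.grad(y, x, grad_outputs=v)[0]
print(weighted)
print(v * torch.cos(x).detach())  # should match the line above
```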


### If a tensor requires gradients, it must be detached from the graph with ``.detach()`` before it can be converted to a NumPy array.

In [None]:
plt.figure()
#plt.plot(x9.numpy(),y9.numpy()) # This would raise a RuntimeError because x9 and y9 require grad
plt.plot(x9.detach().numpy(),y9.detach().numpy(),label='y(x)')
plt.plot(x9.detach().numpy(),dydx.detach().numpy(),label='y\'(x)')
plt.legend()
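Here is a minimal sketch of the error and the fix in isolation. ``.detach()`` returns a tensor that shares storage with the original but has no gradient history, so the resulting NumPy array also shares that memory. A tensor on a GPU would also need a ``.cpu()`` call before ``.numpy()``.

```python
import torch

x = torch.linspace(0, 1, 3, requires_grad=True)
y = x**2

raised = False
try:
    y.numpy()  # fails: y is part of an autograd graph
except RuntimeError as err:
    raised = True
    print('RuntimeError:', err)

arr = y.detach().numpy()  # detached view without gradient history
print(arr)                # values 0, 0.25, 1
```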

### Disabling gradient tracking

### By default, every tensor with ``requires_grad=True`` tracks its computational history and supports gradient computation. Sometimes we do not need this. For example, once a model is trained and we only want to apply it to input data, we only run forward computations through the network. We can stop tracking computations by wrapping the code in a ``with torch.no_grad():`` block.

In [None]:
x10 = torch.tensor(4.0,requires_grad=True)
y10 = 4*x10**2
print(y10.requires_grad)  # True

with torch.no_grad():
    z10 = 4*x10**2  # gradients will not be tracked
    print(z10.requires_grad)  # False
    print(y10.requires_grad)  # True: y10 was created outside the block

print(y10.requires_grad)
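``torch.no_grad`` is not the only way to skip graph construction. The sketch below shows three variants that each produce a tensor with ``requires_grad=False`` in this example:

- ``torch.no_grad()`` used as a decorator,
- ``.detach()`` on a single result,
- ``torch.inference_mode()``, available in recent PyTorch versions.

The function name ``predict`` is hypothetical.

```python
import torch

x = torch.tensor(4.0, requires_grad=True)

@torch.no_grad()              # no_grad also works as a function decorator
def predict(t):
    return 4 * t**2

z = predict(x)
print(z.requires_grad)        # False

w = (4 * x**2).detach()       # detach a single result from the graph
print(w.requires_grad)        # False

with torch.inference_mode():  # stricter mode intended for pure inference
    u = 4 * x**2
print(u.requires_grad)        # False
```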

### Differentiating with respect to components of an input

In [None]:
xy = torch.linspace(0,2,20,requires_grad=True).reshape(2,-1).T
#print('xy = {}'.format(xy))

yval = xy[:,0]**2 + xy[:,1]**3
#print('yval = {}'.format(yval))

# Method 1
zval = torch.sum(yval)
#print('zval = {}'.format(zval))
dydx = torch.autograd.grad(zval,xy, retain_graph=True)[0]
print(dydx)
print(dydx.shape)

# Method 2
dydx_ = torch.autograd.grad(yval, xy, grad_outputs=torch.ones_like(yval), create_graph=True)[0]

if torch.allclose(dydx, dydx_):
    print('Both methods give the same gradient')
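Both methods return a ``(10, 2)`` tensor whose columns are the partial derivatives with respect to the two input components. The sketch below splits the columns and checks them against the analytical partials $2x$ and $3y^2$. The names ``f``, ``df_dx`` and ``df_dy`` are illustrative.

```python
import torch

xy = torch.linspace(0, 2, 20, requires_grad=True).reshape(2, -1).T  # shape (10, 2)
f = xy[:, 0]**2 + xy[:, 1]**3

grads = torch.autograd.grad(f, xy, grad_outputs=torch.ones_like(f))[0]
df_dx = grads[:, 0]  # partial derivative with respect to the first column (x)
df_dy = grads[:, 1]  # partial derivative with respect to the second column (y)

# Compare with the analytical partials 2x and 3y^2.
x_col, y_col = xy[:, 0].detach(), xy[:, 1].detach()
print(torch.allclose(df_dx, 2 * x_col), torch.allclose(df_dy, 3 * y_col**2))
```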
