### _Automatic differentiation_ 
- Builds a computational graph and tracks how the values depend upon each other
- Backpropagation: computational algorithm that applies the chain rule in automatic differentiation.
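
As a rough illustration of the two points above, the sketch below builds a small graph for $y = \sin(x^2)$, looks at the `grad_fn` node that autograd records for each result, and compares the backpropagated gradient with the chain rule applied by hand. The function and the variable names are made up for illustration only.

```python
import torch

# Hypothetical example: y = sin(x^2), chosen only for illustration
x = torch.tensor(1.5, requires_grad=True)
u = x * x          # intermediate node, recorded in the graph
y = torch.sin(u)   # output node

# Each non-leaf tensor remembers the operation that produced it
print(u.grad_fn)   # backward node for the multiplication
print(y.grad_fn)   # backward node for sin

y.backward()       # backpropagation: applies the chain rule from y back to x

# Chain rule by hand: dy/dx = cos(x^2) * 2x
xd = x.detach()
manual = torch.cos(xd ** 2) * 2 * xd
print(torch.allclose(x.grad, manual))
```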

In [1]:
import torch

Differentiation of $y = 2x^Tx$ with respect to $x$.

In [38]:
# Initialization
x = torch.arange(0, 5, dtype = torch.float32) # The gradient will have the same shape as x
x.requires_grad_(True)
x.grad # Where the gradient is stored (None until backward() is called)

In [39]:
# The gradient is stored in x.grad and accumulates across backward() calls.
y = 2 * torch.dot(x,x) # y = 2 * x^T * x

# Calculating the gradient
y.backward() # This is where the gradient is calculated
x.grad # Equals 4 * x; not displayed because only the last expression of a cell is shown

# Resetting the gradient
x.grad.zero_()


tensor([0., 0., 0., 0., 0.])
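
The output above is the zeroed gradient, since `x.grad.zero_()` is the last expression in the cell. As a small check, assuming the same `x`, the sketch below confirms that the gradient of $y = 2x^Tx$ is $4x$ and shows that a second `backward()` call adds to `x.grad` unless it is reset first.

```python
import torch

x = torch.arange(0, 5, dtype=torch.float32)
x.requires_grad_(True)

y = 2 * torch.dot(x, x)
y.backward()
first = x.grad.clone()            # copy, because x.grad is modified in place later
print(first == 4 * x)             # gradient of 2 * x^T x is 4x

# Without zero_(), a second backward pass accumulates into x.grad
y = 2 * torch.dot(x, x)
y.backward()
accumulated = x.grad.clone()
print(accumulated == 8 * x)       # 4x + 4x

x.grad.zero_()                    # reset before computing an unrelated gradient
```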

In [40]:
x = torch.arange(0, 5, dtype = torch.float32)
x.requires_grad_(True)
y = x.sum()
y.backward() # y is a scalar, so x.grad[i] is dy/dx_i
x.grad

tensor([1., 1., 1., 1., 1.])

In essence, the code calculates the derivative of $y = x_0 + x_1 + x_2 + x_3 + x_4$ with respect to each $x_i$, which is simply 1 for all $i$.

In [48]:
x = torch.arange(0, 5, dtype = torch.float32)
x.requires_grad_(True)
y = x*x
y.backward(torch.ones_like(x)) # y is a vector, so pass a vector of ones; same as y.sum().backward()
x.grad


tensor([0., 2., 4., 6., 8.])
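
Because `y = x * x` is a vector, `backward()` needs a vector to contract with the Jacobian. A minimal sketch, assuming the same `x`, showing that passing `torch.ones_like(x)` gives the same result as summing `y` first, and that another weight vector (the made-up `v` below) scales the gradient element-wise:

```python
import torch

x = torch.arange(0, 5, dtype=torch.float32, requires_grad=True)

# Option 1: pass the vector explicitly
(x * x).backward(torch.ones_like(x))
g_explicit = x.grad.clone()

# Option 2: reduce to a scalar first
x.grad.zero_()
(x * x).sum().backward()
g_sum = x.grad.clone()

print(torch.equal(g_explicit, g_sum))   # both are 2x

# A hypothetical weight vector v computes v^T J, which is v * 2x here
x.grad.zero_()
v = torch.tensor([1.0, 0.0, 0.5, 0.0, 2.0])
(x * x).backward(v)
print(x.grad)
```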

In [70]:
# Detaching computations: for auxiliary intermediate variables
x.grad.zero_()
y = x * x
u = y.detach() # Detach from the computational graph and stop gradient tracking
z = u * x # treats u as a constant
z.sum().backward() 
x.grad == u # dz/dx = u because u is a constant

tensor([True, True, True, True, True])
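
For contrast, a sketch of the same computation with and without `detach()`. With `u` detached, $z = u \cdot x$ has gradient $u = x^2$; without it, $z = x^3$ and the gradient is $3x^2$.

```python
import torch

x = torch.arange(0, 5, dtype=torch.float32, requires_grad=True)

# With detach: u is treated as a constant, so dz/dx = u
u = (x * x).detach()
(u * x).sum().backward()
g_detached = x.grad.clone()

# Without detach: z = x^3, so dz/dx = 3x^2
x.grad.zero_()
(x * x * x).sum().backward()
g_full = x.grad.clone()

print(g_detached)  # x^2
print(g_full)      # 3 * x^2
```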

In [85]:
# Maze of control flow logic

def f(a) : 
    b = a * 2
    while b.norm() < 1000 :
        b = b * 2
    if b.sum() > 0 :
        c = b
    else : 
        c = 100 * b
    return c

a = torch.randn(1, requires_grad = True)
d = f(a)
d.backward()

a.grad == d/a # f is piecewise linear in a: d = k * a, so the gradient is k = d / a


tensor([True])
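
A sketch with fixed inputs instead of `torch.randn`, so that both branches of `f` are exercised. For a given input, `f` only multiplies `a` by constants (powers of 2, and possibly 100), so the check against `d / a` should hold on either branch. The input values are arbitrary, and `allclose` is used to tolerate floating-point rounding.

```python
import torch

def f(a):
    b = a * 2
    while b.norm() < 1000:
        b = b * 2
    if b.sum() > 0:
        c = b
    else:
        c = 100 * b
    return c

checks = []
for value in (0.3, -0.3):            # one positive and one negative input
    a = torch.tensor([value], requires_grad=True)
    d = f(a)
    d.backward()
    k = (d / a).detach()             # the constant factor for this input
    checks.append(torch.allclose(a.grad, k))
    print(value, a.grad, checks[-1])
```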

In [94]:
# Question 4
x = torch.arange(3, requires_grad = True, dtype = torch.float32)
y = torch.sin(x)
y.sum().backward()
x.grad == torch.cos(x) # Gradient of sin(x) is cos(x)

tensor([True, True, True])

In [102]:
# Question 5
x = torch.arange(5, requires_grad = True, dtype = torch.float32)
# x[0] = 0 makes log(x*x) and 1/x undefined, which produces the nan in the gradient
y = torch.dot(torch.log(x*x), torch.sin(x)) + 1/x
y.sum().backward()
x.grad

tensor([     nan,   7.4147,   1.4120, -10.5169, -11.0159])
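
A sketch of the same computation with the input shifted to start at 1 (an assumption, not part of the original question), which avoids evaluating `log(x*x)` and `1/x` at zero. The result is compared against the gradient worked out by hand; the name `manual` is introduced here for illustration.

```python
import torch

# Assumption: start at 1 so that log(x*x) and 1/x are defined everywhere
x = torch.arange(1, 6, dtype=torch.float32, requires_grad=True)
y = torch.dot(torch.log(x * x), torch.sin(x)) + 1 / x
y.sum().backward()

# The scalar dot product is broadcast to all n entries of y,
# so it is counted n times in y.sum()
xd = x.detach()
n = xd.numel()
manual = n * (2 * torch.sin(xd) / xd + torch.log(xd * xd) * torch.cos(xd)) - 1 / xd ** 2

print(torch.isnan(x.grad).any())
print(torch.allclose(x.grad, manual, rtol=1e-4, atol=1e-4))
```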

### 4 Rules 
- Attach gradients to the variables with respect to which we desire derivatives
- Record the computation of target values
- Execute backpropagation function
- Access the resulting gradient
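
The steps above, sketched on a made-up function $y = \sum_i x_i^3$ (chosen only for illustration):

```python
import torch

# 1. Attach gradients to the variable of interest
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# 2. Record the computation of the target value
y = (x ** 3).sum()

# 3. Execute backpropagation
y.backward()

# 4. Access the resulting gradient (3 * x^2 for this function)
print(x.grad)
```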

### Functions
1. `x.requires_grad_(True)` : Track operations on `x` in the computational graph
2. `y.backward()` : Compute $\frac{dy}{dx}$ and accumulate it in `x.grad`
3. `u = y.detach()` : Detach from the computational graph and treat `u` as a constant
4. `x.grad` : Place where the gradients are stored
5. `x.grad.zero_()` : Reset the gradient to zero

### Tasks 
- Understand in more depth how backpropagation works
  - Create some complicated equations to work through.
- Understand how the computational graph is created
- Create your own automatic gradient engine
- What is the difference between forward and backward differentiation?
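
As a possible starting point for the gradient-engine task, here is a minimal sketch of a scalar reverse-mode engine in plain Python. The `Value` class and its attributes are hypothetical names, it supports only `+` and `*`, and it is not how PyTorch is implemented internally.

```python
class Value:
    """A scalar that records how it was computed (hypothetical minimal design)."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents          # Values this one was computed from
        self._local_grads = local_grads  # d(self)/d(parent) for each parent

    def __add__(self, other):
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def backward(self):
        # Topological order: every node appears after the nodes it depends on
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for p in node._parents:
                    visit(p)
                order.append(node)

        visit(self)
        self.grad = 1.0
        # Walk from the output back to the inputs
        for node in reversed(order):
            for parent, local in zip(node._parents, node._local_grads):
                parent.grad += local * node.grad  # chain rule, accumulated


x = Value(3.0)
y = Value(4.0)
z = x * y + x * x   # dz/dx = y + 2x, dz/dy = x
z.backward()
print(x.grad, y.grad)
```

The gradients accumulate with `+=`, which mirrors why `x.grad.zero_()` is needed between backward passes in PyTorch.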
