# Autograd concepts

The heart of training a neural network is backpropagation. This is where the loss is propagated backwards through the network by calculating its gradient with respect to each of the tensors within the network. Thankfully, we do not have to compute these gradients (the partial derivatives of the loss, chained layer by layer through the network via the chain rule) by hand.
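Here is a minimal sketch of the chain rule at work. It uses the toy function f(x) = sin(x²), chosen only for illustration, and checks the gradient autograd produces against the derivative worked out by hand, cos(x²) * 2x:

```python
import math
import torch

# Toy function chosen for illustration: f(x) = sin(x^2)
x = torch.tensor(1.5, requires_grad=True)
f = torch.sin(x ** 2)
f.backward()                       # autograd applies the chain rule for us

# Chain rule by hand: df/dx = cos(x^2) * d(x^2)/dx = cos(x^2) * 2x
manual = math.cos(1.5 ** 2) * 2 * 1.5
print(x.grad.item(), manual)       # the two values agree up to float32 precision
```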

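**Autograd**:
In short, autograd performs this gradient calculation for us automatically. It does not store the full Jacobian matrix of partial derivatives. Instead, during the backward pass it computes vector-Jacobian products and accumulates the results into the `.grad` attribute of the leaf tensors. To understand how it works, we will monitor the following autograd attributes for each node: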


1.   **tensor**
2.   **requires_grad**
3.   **grad_fn**
4.   **grad**
5.   **is_leaf**   
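All five attributes can be inspected side by side. The sketch below uses a small helper, `describe` (a hypothetical name, not part of PyTorch), applied to a leaf tensor and to a tensor derived from it after one backward pass:

```python
import torch

def describe(name, t):
    """Hypothetical helper: report the five autograd attributes listed above for tensor t."""
    info = {
        "tensor": t.detach(),                                   # the values themselves
        "requires_grad": t.requires_grad,                       # is autograd tracking it?
        "grad_fn": type(t.grad_fn).__name__ if t.grad_fn is not None else None,
        # .grad is only filled in for leaf tensors; reading it on a non-leaf triggers a warning
        "grad": t.grad if t.is_leaf else None,
        "is_leaf": t.is_leaf,
    }
    print(name, info)
    return info

a = torch.tensor([1.0, 2.0], requires_grad=True)
b = (a * 3).sum()
b.backward()

info_a = describe("a", a)   # leaf: no grad_fn, grad = tensor([3., 3.])
info_b = describe("b", b)   # non-leaf: grad_fn = SumBackward0, no grad
```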



In [9]:
# STEP : 1

import torch
# The autograd package provides automatic differentiation 
# for all operations on Tensors

# requires_grad = True -> tracks all operations on the tensor. 
x = torch.randn(2, 3, requires_grad=True)
y = x + 2

print(x.dtype)
print(x) # OR
#print(x.requires_grad)
print(y) # OR
#print(y.grad_fn)

torch.float32
tensor([[-0.9890,  1.3430, -1.3031],
        [-0.9375, -0.1760,  0.2672]], requires_grad=True)
tensor([[1.0110, 3.3430, 0.6969],
        [1.0625, 1.8240, 2.2672]], grad_fn=<AddBackward0>)


Notice above that `requires_grad` is set to `True` for tensor `x`, whereas tensor `y` was created by an `add` operation and therefore has an `AddBackward0` `grad_fn`. Since `y` depends on `x`, it also inherits `requires_grad=True`.
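A quick way to see that tracking propagates only from tensors that require gradients is to repeat the same addition on a tensor created without `requires_grad` (a sketch; `w` and `u` are illustrative names):

```python
import torch

x = torch.randn(2, 3, requires_grad=True)
y = x + 2                    # derived from a tracked tensor, so the op is recorded
w = torch.randn(2, 3)        # requires_grad defaults to False
u = w + 2                    # no input requires grad, so nothing is recorded

print(x.requires_grad, x.grad_fn)   # True None  (created by the user, so no grad_fn)
print(y.requires_grad, y.grad_fn)   # True, plus the AddBackward node that created y
print(u.requires_grad, u.grad_fn)   # False None
```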

In [10]:
# STEP : 2 

# More attributes about tensors in action. 
print(x.shape)
print(y.shape)
print(x.is_leaf)
print(y.is_leaf)

torch.Size([2, 3])
torch.Size([2, 3])
True
False
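The `is_leaf` results above follow a simple rule. Tensors created directly by the user, and any tensor with `requires_grad=False`, are leaves, while results of tracked operations are not. By default only leaves get their `.grad` populated, and `retain_grad()` opts a non-leaf in. A small sketch with illustrative names:

```python
import torch

a = torch.randn(3, requires_grad=True)   # created by the user -> leaf
b = torch.randn(3)                       # requires_grad=False -> also a leaf
c = a * 2                                # result of a tracked op -> not a leaf
d = c.detach()                           # cut off from the graph -> leaf again
print(a.is_leaf, b.is_leaf, c.is_leaf, d.is_leaf)   # True True False True

c.retain_grad()          # ask autograd to also keep the gradient of the non-leaf c
c.sum().backward()
print(a.grad)            # d(sum(2a))/da = 2 for every element
print(c.grad)            # d(sum(c))/dc = 1 for every element
```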


In [11]:
# STEP : 3

# More operations on y
z = y * 5
print(z.shape)
print(z.grad_fn)
print(z.requires_grad)

print('====== Post Mean ======')
zz = z.mean()
print(zz.shape)
print(zz.grad_fn)
print(zz.requires_grad)

torch.Size([2, 3])
<MulBackward0 object at 0x7f81ea1e0320>
True
torch.Size([])
<MeanBackward0 object at 0x7f82386f8da0>
True


In [4]:
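# STEP : 4

# Let's compute the gradients with the built-in backpropagation
zz.backward()

# zz is not a leaf tensor, so its own .grad is not populated (this prints None).
print(zz.grad)

# The gradients are accumulated into the .grad attribute of the leaf tensor x.
print(x.grad)

# x.grad.zero_() clears the old gradients (optimizer.zero_grad() does the same for model
# parameters); otherwise every later backward() call would add to them.
x.grad.zero_()

None
tensor([[0.8333, 0.8333, 0.8333],
        [0.8333, 0.8333, 0.8333]])


tensor([[0., 0., 0.],
        [0., 0., 0.]])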
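Because `zz` is the mean of `5 * (x + 2)` over the six elements of `x`, the gradient that `zz.backward()` accumulates into `x.grad` can be worked out by hand with the chain rule. It is the same for every element and does not depend on the random values in `x`:

$$
zz = \frac{1}{6}\sum_{i=1}^{2}\sum_{j=1}^{3} z_{ij}, \qquad z_{ij} = 5\,y_{ij}, \qquad y_{ij} = x_{ij} + 2
$$

$$
\frac{\partial zz}{\partial x_{ij}} = \frac{\partial zz}{\partial z_{ij}}\cdot\frac{\partial z_{ij}}{\partial y_{ij}}\cdot\frac{\partial y_{ij}}{\partial x_{ij}} = \frac{1}{6}\cdot 5\cdot 1 = \frac{5}{6} \approx 0.8333
$$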

In [8]:
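# STEP : 5 (if STEP : 4 was executed, re-run STEPs 1-3 first, because backward() frees the graph)

# But the following FAILS, because z is not a scalar
z.backward()

RuntimeError: grad can be implicitly created only for scalar outputs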
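The reminder to re-run STEPs 1-3 exists because `backward()` frees the intermediate values saved in the graph once it has used them, and because gradients accumulate across calls. The sketch below shows both behaviours on a toy loss. It uses `x * x`, which saves `x` for the backward pass, and the `retain_graph=True` argument of `backward()`:

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
loss = (x * x).sum()               # x * x saves x for the backward pass

loss.backward(retain_graph=True)   # keep the saved tensors alive for another pass
first = x.grad.clone()             # d(sum(x^2))/dx = 2x
loss.backward()                    # works because the graph was retained
second = x.grad.clone()            # gradients ACCUMULATE: 2x + 2x = 4x

# A third call fails: the previous backward() (without retain_graph) freed the graph.
try:
    loss.backward()
    failed = False
except RuntimeError as err:
    failed = True
    print("RuntimeError:", str(err).splitlines()[0])
```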

In [12]:
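# STEP : 6 (re-run STEPs 1-3 first, but not STEP 4 or 5)

# z does have requires_grad set. The failure above happens because backward()
# can only create the initial gradient implicitly for a scalar output: thanks to
# mean(), zz is a scalar, whereas z has shape (2, 3).
# For a non-scalar output, pass a gradient tensor v with the same shape as z;
# backward(v) then computes the vector-Jacobian product v^T . J.
v = torch.ones(2, 3, dtype=torch.float32)   # torch.empty() would hold uninitialized values
z.backward(v)

# With v of all ones, each element of x.grad is dz/dx = 5
print(x.grad)

# This is important! Leftover gradients would be added to the next backward() call
# and corrupt the following weight update.
x.grad.zero_()

tensor([[5., 5., 5.],
        [5., 5., 5.]])

tensor([[0., 0., 0.],
        [0., 0., 0.]])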
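To check that `z.backward(v)` really computes a vector-Jacobian product, the full Jacobian can be built explicitly with `torch.autograd.functional.jacobian` and contracted with `v` by hand. This is a sketch; `f` and the random `v` are illustrative choices, and `f` reproduces the computation from STEPs 1 and 3:

```python
import torch
from torch.autograd.functional import jacobian

def f(x):
    # Same computation as STEPs 1 and 3: z = (x + 2) * 5
    return (x + 2) * 5

x = torch.randn(2, 3, requires_grad=True)
v = torch.randn(2, 3)                      # any "upstream" gradient with z's shape

z = f(x)
z.backward(v)                              # autograd's vector-Jacobian product

# Full Jacobian: shape (2, 3, 2, 3) = output shape + input shape
J = jacobian(f, x.detach())
vjp = torch.einsum("ij,ijkl->kl", v, J)    # sum over i, j of v[i, j] * dz[i, j] / dx[k, l]
print(torch.allclose(x.grad, vjp))         # True
```

Materialising `J` costs memory proportional to (output size) x (input size), which is why autograd only ever forms the product `v^T . J`.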

In [13]:
# After zero_(), the stored gradients are back to zero.
print(x.grad)

tensor([[0., 0., 0.],
        [0., 0., 0.]])


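## Three ways to control whether weights are tracked (and updated) during a training cycle
1. `weight.requires_grad_(False/True)`: switches gradient tracking off or on, in place.
2. `weight.detach()`: returns a new tensor that shares the same data but is detached from the graph.
3. Wrapping the code in a `with torch.no_grad():` block: disables tracking for every operation inside it.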
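The three options can be compared on the same operation. In this sketch (`w` and the `out*` names are illustrative), only the result computed with tracking fully enabled has `requires_grad=True`:

```python
import torch

w = torch.randn(3, requires_grad=True)

# 1. requires_grad_(False) switches tracking off in place (and back on with True)
w.requires_grad_(False)
out1 = w * 2
w.requires_grad_(True)

# 2. detach() returns a new tensor that shares w's storage but is cut off from the graph
shares_storage = w.detach().data_ptr() == w.data_ptr()
out2 = w.detach() * 2

# 3. torch.no_grad() disables tracking for every operation inside the block
with torch.no_grad():
    out3 = w * 2

out4 = w * 2   # tracking is back on outside the block
print(out1.requires_grad, out2.requires_grad, out3.requires_grad, out4.requires_grad)
# False False False True
```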


In [14]:
weights = torch.randn(3, requires_grad=True)

for epoch in range(3):
    model = (weights * 3).mean()
    model.backward()
    
    print(weights.grad)
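    # Update the weights in place without recording the update in the graph
    with torch.no_grad():
        weights -= 0.001 * weights.grad

    # Clear the accumulated gradients before the next iteration
    # (optimizer.zero_grad() does this when using torch.optim)
    weights.grad.zero_()

print(weights)
print(model)  # loss from the last forward pass, computed before the final update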


tensor([1., 1., 1.])
tensor([1., 1., 1.])
tensor([1., 1., 1.])
tensor([ 0.7771,  0.5906, -1.4751], requires_grad=True)
tensor(-0.1044, grad_fn=<MeanBackward0>)
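The manual update in the loop above is plain gradient descent. PyTorch's optimizers package the same pattern: `optimizer.step()` applies the update and `optimizer.zero_grad()` clears the gradients. The sketch below runs both versions from the same starting weights and checks that they agree. It assumes `torch.optim.SGD` with its defaults (no momentum, no weight decay), and the seed is arbitrary:

```python
import torch

torch.manual_seed(0)
init = torch.randn(3)

# Manual loop, as above
weights = init.clone().requires_grad_(True)
for epoch in range(3):
    loss = (weights * 3).mean()
    loss.backward()
    with torch.no_grad():
        weights -= 0.001 * weights.grad
    weights.grad.zero_()

# Same loop with torch.optim.SGD doing the update and the zeroing
weights_opt = init.clone().requires_grad_(True)
optimizer = torch.optim.SGD([weights_opt], lr=0.001)
for epoch in range(3):
    loss = (weights_opt * 3).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(torch.allclose(weights, weights_opt, atol=1e-6))   # True
```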
