## Evaluating gradients in PyTorch

PyTorch's autograd package performs automatic differentiation, which makes evaluating the forward and backward passes easier.

In [1]:
import torch 
import numpy as np 

In [2]:
x = torch.randn(3, requires_grad=True)
print(x)

tensor([ 0.6867, -0.3665, -1.1250], requires_grad=True)


`requires_grad` is **False** by default. Setting it to `True` tells PyTorch to track gradients for that tensor and to build a _computational graph_ from every operation that involves it.

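To stop gradient tracking for a tensor, there are three options:
```python
x.requires_grad_(False)   # 1. in place: x itself stops being tracked
x.detach()                # 2. returns a new tensor that is not tracked
with torch.no_grad():     # 3. operations in this block are not tracked
    ...  # do operations here
```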
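A minimal sketch of how the three options behave. It uses a fresh tensor `a` (a name introduced only for this illustration) so that `x` above is left untouched:

```python
import torch

a = torch.randn(3, requires_grad=True)

# Option 2: detach() returns a new tensor that shares data but is not tracked
b = a.detach()
print(b.requires_grad)  # False

# Option 3: operations inside no_grad() are not recorded in the graph
with torch.no_grad():
    c = a + 2
print(c.requires_grad)  # False

# Outside the context manager the same operation is tracked again
d = a + 2
print(d.requires_grad)  # True

# Option 1: requires_grad_(False) switches tracking off in place on a leaf tensor
a.requires_grad_(False)
print(a.requires_grad)  # False
```

Option 1 is shown last because it changes `a` itself, whereas the other two leave the original tensor's flag alone.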

In [3]:
y = x + 2 
print(y)

tensor([2.6867, 1.6335, 0.8750], grad_fn=<AddBackward0>)


`y` has an attribute called `grad_fn` that stores the gradient function, here `AddBackward0`, which is used to compute the gradient during backpropagation.
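As a small illustration of the difference between a user-created tensor and the result of a tracked operation, the sketch below inspects both. The names `p` and `q` are made up for this example:

```python
import torch

p = torch.randn(3, requires_grad=True)  # leaf tensor created by the user
q = p + 2                               # result of a tracked operation

print(p.is_leaf, p.grad_fn)   # True None
print(q.is_leaf)              # False
print(q.grad_fn.name())       # AddBackward0
```

Only tensors produced by an operation carry a `grad_fn`; leaf tensors such as `p` have `grad_fn` set to `None` and receive their gradient in `.grad` instead.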

In [4]:
z = y * y * 2
print(z, z.size())

tensor([14.4362,  5.3369,  1.5311], grad_fn=<MulBackward0>) torch.Size([3])


In [5]:
#z = z.mean()
# Vector for the vector-Jacobian product; needed because z is not a scalar
v = torch.tensor([0.1, 1.0, 0.001], dtype=torch.float64)
print(z)

tensor([14.4362,  5.3369,  1.5311], grad_fn=<MulBackward0>)


To calculate the gradient, in this case the gradient of `z` with respect to `x`:
$$\nabla Z = \frac{\partial Z}{\partial X}$$

`z.backward()` computes the gradients all the way back to `x`, the base (leaf) variable.

`x.grad` is the cache where the computed gradients are stored.

In [6]:
# dz/dx: z is not a scalar, so backward() needs a vector with the same shape as z.
# The value stored in x.grad is then the vector-Jacobian product.
z.backward(v)
print(x.grad)  # gradients accumulate here: repeated backward calls add to the stored values

tensor([1.0747e+00, 6.5342e+00, 3.4999e-03])
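To make the vector-Jacobian product concrete, the sketch below repeats the same computation on fresh tensors (the `_demo` names are introduced only for this check) and compares the autograd result with the derivative worked out by hand. Since `z = 2 * (x + 2)^2` is applied element-wise, the Jacobian is diagonal with entries `4 * (x + 2)`:

```python
import torch

x_demo = torch.randn(3, requires_grad=True)
v_demo = torch.tensor([0.1, 1.0, 0.001])

y_demo = x_demo + 2
z_demo = y_demo * y_demo * 2     # same function as z above
z_demo.backward(v_demo)          # vector-Jacobian product

# Hand-derived result: dz_i/dx_i = 4 * (x_i + 2), scaled element-wise by v
expected = 4 * (x_demo.detach() + 2) * v_demo
print(torch.allclose(x_demo.grad, expected))  # True
```

The same arithmetic reproduces the printed values above, for example `4 * 2.6867 * 0.1 ≈ 1.0747` for the first element.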


### Important: 
`.grad`, the gradient cache of a variable, accumulates the gradients from every `backward()` call. It therefore needs to be cleared after each epoch.

In [16]:
#Toy example
weights = torch.ones(3, requires_grad=True)
print(weights)

#Dummy training loop
for epoch in range(3):
    model_output = (weights * 3).sum() 
    print('Model output: {}'.format(model_output))
    model_output.backward() #dZ/dw
    print('dZ/dW :{}'.format(weights.grad))

tensor([1., 1., 1.], requires_grad=True)
Model output: 9.0
dZ/dW :tensor([3., 3., 3.])
Model output: 9.0
dZ/dW :tensor([6., 6., 6.])
Model output: 9.0
dZ/dW :tensor([9., 9., 9.])


As seen above, the gradient `dZ/dW` accumulates in the `.grad` cache from one epoch to the next. That is not what we want: the gradient should be the same in every epoch, i.e. 3. We must therefore empty the cache after each epoch.

In [18]:
#Toy example
weights = torch.ones(3, requires_grad=True)
print(weights)

#Dummy training loop
for epoch in range(3):
    model_output = (weights * 3).sum() 
    print('Model output: {}'.format(model_output))
    model_output.backward() #dZ/dw
    print('dZ/dW :{}'.format(weights.grad))
    weights.grad.zero_() #Clear the accumulated gradient before the next epoch

tensor([1., 1., 1.], requires_grad=True)
Model output: 9.0
dZ/dW :tensor([3., 3., 3.])
Model output: 9.0
dZ/dW :tensor([3., 3., 3.])
Model output: 9.0
dZ/dW :tensor([3., 3., 3.])


The same idea applies when we work with PyTorch's NN modules and use a built-in optimizer to run gradient descent on them.
```python
optimizer = torch.optim.SGD([weights], lr=0.01)  # parameters must be passed as an iterable
optimizer.step()
optimizer.zero_grad()
```
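A minimal sketch of how these three calls might fit into the dummy training loop from earlier. The optimizer expects an iterable of parameters, so the single tensor is wrapped in a list. The learning rate and the number of epochs are illustrative choices:

```python
import torch

weights = torch.ones(3, requires_grad=True)
optimizer = torch.optim.SGD([weights], lr=0.01)

for epoch in range(3):
    model_output = (weights * 3).sum()
    model_output.backward()   # populates weights.grad with [3., 3., 3.]
    optimizer.step()          # weights <- weights - lr * weights.grad
    optimizer.zero_grad()     # reset the gradients before the next epoch

print(weights)  # each entry is approximately 0.91 after three steps
```

Each step subtracts `0.01 * 3 = 0.03` from every weight, which is only true because `zero_grad()` stops the gradient from accumulating across epochs.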

# BackProp example

Implement the forward and backward passes for a simple linear regression example.
$$ y = wx $$
$$ L = (y^{known} - wx)^{2} $$
When the backward pass is called, the following gradient is computed via the chain rule:
$$ \frac{\partial L}{\partial w} = \frac{\partial L}{\partial y} \frac{\partial y}{\partial w} $$
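Expanding the two factors for this particular loss, with $L$ denoting the loss, gives the closed form that autograd should reproduce. This is a short derivation from the definitions above:

$$ \frac{\partial L}{\partial y} = -2\,(y^{known} - y), \qquad \frac{\partial y}{\partial w} = x $$
$$ \frac{\partial L}{\partial w} = -2x\,(y^{known} - wx) $$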

In [20]:
y = torch.tensor(2.0) #Known target value
x = torch.tensor(1.0) #Input
#Create the weight (a scalar here): this requires grad
w = torch.tensor(1.0, requires_grad=True)

#Forward pass to compute the loss value 
loss = (y - torch.mul(x,w)) ** 2 
print(loss)

#Backward pass 
loss.backward() #Whole gradient with chain rule computed here 
print(w.grad)

#For gradient descent, the weight update would go here

tensor(1., grad_fn=<PowBackward0>)
tensor(-2.)
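One way the weight update could look is sketched below. It repeats the forward and backward pass on fresh tensors, checks the autograd gradient against the hand-derived value, and then applies a single gradient descent step. The names `y_known`, `x_in`, `w_demo` and the learning rate `lr` are assumptions made for this illustration:

```python
import torch

y_known = torch.tensor(2.0)   # target value
x_in = torch.tensor(1.0)      # input
w_demo = torch.tensor(1.0, requires_grad=True)
lr = 0.1                      # assumed learning rate, for illustration only

loss = (y_known - x_in * w_demo) ** 2
loss.backward()

# Autograd result versus the hand-derived dL/dw = -2x(y_known - wx)
manual_grad = -2 * x_in * (y_known - w_demo.detach() * x_in)
print(w_demo.grad, manual_grad)   # tensor(-2.) tensor(-2.)

# Update the weight without recording the update in the graph
with torch.no_grad():
    w_demo -= lr * w_demo.grad
w_demo.grad.zero_()               # clear the gradient before the next pass

print(w_demo)   # tensor(1.2000, requires_grad=True)
```

The update runs inside `torch.no_grad()` so that the in-place change to `w_demo` is not itself tracked as part of the computational graph.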
