## **Gradients used to optimize the function approximation for DNNs using Backpropagation**

In [1]:
import torch
import numpy as np

**Let's define a tensor. By default, the requires_grad argument is set to False. To calculate gradients, we have to enable it (requires_grad=True).**

In [8]:
x_without_grad = torch.randn(3)
x_without_grad.requires_grad

False

In [41]:
# Create a function in order to create a computational graph 

def compGraph(x):
    y = x + 2
    z = y*y*2
    z = z.mean()
    return y, z

In [42]:
y_without_grad,z_without_grad = compGraph(x_without_grad) 
y_without_grad.requires_grad, z_without_grad.requires_grad

(False, False)

In [43]:
z_without_grad.backward()

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

**A RuntimeError is raised because we didn't enable the requires_grad flag, so no computational graph was recorded for z. Let's redo the process with the flag enabled.**

In [72]:
x_with_grad = torch.randn(3, requires_grad=True)
x_with_grad.requires_grad

True

In [73]:
y_with_grad,z_with_grad = compGraph(x_with_grad) 
y_with_grad.requires_grad, z_with_grad.requires_grad

(True, True)

In [74]:
z_with_grad.backward()

In [75]:
y_with_grad.grad  # returns None because y is not a leaf tensor

**One thing to notice here is the difference between leaf variables and intermediate variables. If you call y_with_grad.grad, the output is None. The reason is that y is an intermediate variable, while x is a leaf variable and z is the root variable. By default, backward() does not keep the gradients of intermediate variables in .grad; only leaf variables get them. So if you want to retain the gradient of an intermediate variable, you should call y.retain_grad() before calling backward().**

In [76]:
x_with_grad = torch.randn(3, requires_grad=True)
y_with_grad,z_with_grad = compGraph(x_with_grad) 
y_with_grad.retain_grad()
z_with_grad.backward()

In [77]:
y_with_grad.grad

tensor([2.2461, 4.3175, 1.7098])
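**As a sanity check, the retained gradient can be compared against the analytic derivative. Since z = mean(2 * y * y) over 3 elements, dz/dy = 4 * y / 3, and because y = x + 2, dz/dx has the same value. The sketch below redefines compGraph so it runs on its own and uses fixed input values (an assumption, chosen only to make the result reproducible).**

```python
import torch

def compGraph(x):
    y = x + 2
    z = (y * y * 2).mean()
    return y, z

# Fixed values instead of torch.randn (assumed here for reproducibility)
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y, z = compGraph(x)
y.retain_grad()   # keep the gradient of the intermediate tensor
z.backward()

# Analytic gradient: z = (1/3) * sum(2 * y_i^2)  =>  dz/dy_i = 4 * y_i / 3
expected = 4 * y.detach() / 3

print(x.is_leaf, y.is_leaf)              # True False
print(torch.allclose(y.grad, expected))  # True
print(torch.allclose(x.grad, expected))  # True, since dy/dx = 1
```

**The is_leaf attribute is a quick way to tell which tensors receive a .grad by default.**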

## **Now let's learn how to stop tracking gradients**

**There are 3 basic ways to do this**

        1) x.requires_grad_(False)
        2) x.detach()
        3) with torch.no_grad():
                {perform operations with x}

In [89]:
x1 = torch.randn(3, requires_grad=True)
x2 = torch.randn(3, requires_grad=True)
x3 = torch.randn(3, requires_grad=True)

## **Method 01**

In [85]:
print('Before Grad Disabling : ', x1)

x1.requires_grad_(False) # a trailing '_' in a method name means it modifies the tensor in place (overwrites it)

print('After Grad Disabling  : ', x1)

Before Grad Disabling :  tensor([0.2599, 1.0128, 0.2167], requires_grad=True)
After Grad Disabling  :  tensor([0.2599, 1.0128, 0.2167])
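**One caveat with Method 01: the flag can only be switched off on leaf tensors. The sketch below uses two hypothetical tensors, leaf and non_leaf (names not from the notebook), to show that the in-place call returns the same tensor object, and that calling it on a computed tensor raises a RuntimeError.**

```python
import torch

leaf = torch.randn(3, requires_grad=True)          # created directly: a leaf
non_leaf = torch.randn(3, requires_grad=True) * 2  # result of an operation: not a leaf

# The in-place method returns the very same tensor object
returned = leaf.requires_grad_(False)
print(returned is leaf, leaf.requires_grad)  # True False

# Turning the flag off on a non-leaf tensor is rejected
raised = False
try:
    non_leaf.requires_grad_(False)
except RuntimeError as err:
    raised = True
    print("RuntimeError:", err)
```

**For a computed tensor, detach() (Method 02) is the way to get a version that is cut off from the graph.**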


## **Method 02**

In [90]:
print('Before Grad Disabling : ', x2)

x2 = x2.detach()

print('After Grad Disabling  : ', x2)

Before Grad Disabling :  tensor([-1.5315,  1.4275,  0.0762], requires_grad=True)
After Grad Disabling  :  tensor([-1.5315,  1.4275,  0.0762])
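**Note that detach() does not change the original tensor; it returns a new tensor that shares the same memory. The cell above reassigns the result to x2, which hides this. A minimal sketch with hypothetical tensors a and b (names assumed for illustration) makes the sharing visible:**

```python
import torch

a = torch.ones(3, requires_grad=True)
b = a.detach()  # new tensor object, no gradient tracking

print(a.requires_grad, b.requires_grad)  # True False
print(a.data_ptr() == b.data_ptr())      # True: same underlying storage

# An in-place change through the detached tensor is visible in the original
b[0] = 5.0
print(a)  # tensor([5., 1., 1.], requires_grad=True)
```

**If an independent copy is needed, a.detach().clone() avoids the shared storage.**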


## **Method 03**

In [92]:
print('Before Grad Disabling : ', x3)

with torch.no_grad():
    y3 = x3 + 3
    print('After Grad Disabling  : ', y3)

Before Grad Disabling :  tensor([-0.0487, -0.8465, -0.3214], requires_grad=True)
After Grad Disabling  :  tensor([2.9513, 2.1535, 2.6786])
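**Unlike the first two methods, torch.no_grad() does not touch the tensor itself. It only disables tracking for operations executed inside the block. A small sketch, using a hypothetical tensor w (not from the notebook), shows the difference between results computed inside and outside the block:**

```python
import torch

w = torch.randn(3, requires_grad=True)

with torch.no_grad():
    inside = w + 3   # not recorded in the computational graph

outside = w + 3      # recorded as usual

print(w.requires_grad)        # True: the input is unchanged
print(inside.requires_grad)   # False
print(outside.requires_grad)  # True
print(inside.grad_fn is None, outside.grad_fn is not None)  # True True
```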
