##### Name: K Lalith Aditya
##### Regd No: 22231
##### AUTOMATIC DIFFERENTIATION WITH TORCH.AUTOGRAD

<h4>When training neural networks, the most frequently used algorithm is backpropagation. In this algorithm, the parameters (model weights) are adjusted according to the gradient of the loss function with respect to each parameter.

To compute those gradients, PyTorch has a built-in differentiation engine called torch.autograd. It supports automatic computation of gradients for any computational graph.</h4>
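As a minimal illustration of what torch.autograd does, the snippet below records a scalar computation and lets autograd compute its derivative. The function y = x³ + 2x is a toy chosen for this sketch and is not part of the original example. Its analytic derivative is 3x² + 2, which equals 14 at x = 2.

```python
import torch

# A scalar input that autograd should track
x = torch.tensor(2.0, requires_grad=True)

# Forward pass: y = x**3 + 2x, so dy/dx = 3x**2 + 2
y = x ** 3 + 2 * x

# Backward pass: autograd applies the chain rule through the recorded graph
y.backward()

print(x.grad)  # tensor(14.)
```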

Consider the simplest one-layer neural network, with input x, parameters w and b, and some loss function. It can be defined in PyTorch as follows:

In [2]:
import torch

x = torch.ones(5)  # input tensor
y = torch.zeros(3)  # expected output
w = torch.randn(5, 3, requires_grad=True)
# Creates a 5x3 tensor w filled with random values from a standard normal distribution. The requires_grad=True flag tells autograd to compute gradients with respect to this tensor during backpropagation.
b = torch.randn(3, requires_grad=True)
# Similarly, creates a tensor b of size 3 with random values; requires_grad=True enables gradient computation for it as well.
z = torch.matmul(x, w) + b  # multiplies the vector x by the matrix w, then adds the bias b
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)
# Compares the predicted logits in z with the expected labels y and returns the (mean) binary cross-entropy loss.

In [8]:
print(f"Gradient function for z = {z.grad_fn}")
# The grad_fn attribute of a tensor refers to the backward function of the operation that created it. For z this is the final addition (+ b), shown as AddBackward0; that node in turn links back to the matrix multiplication that preceded it.
print(f"Gradient function for loss = {loss.grad_fn}")
# Similarly, loss was produced by binary_cross_entropy_with_logits, so its grad_fn is the corresponding backward function, BinaryCrossEntropyWithLogitsBackward0.

Gradient function for z = <AddBackward0 object at 0x000001E86B0E19F0>
Gradient function for loss = <BinaryCrossEntropyWithLogitsBackward0 object at 0x000001E86B0E3A90>
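Each grad_fn node also keeps references to the nodes that produced its inputs through its next_functions attribute, so the whole backward graph can be traversed starting from loss. The sketch below uses a hypothetical helper walk(), written only for this illustration. The exact node name printed for the matrix multiplication may differ between PyTorch versions, so only the two nodes shown above and the AccumulateGrad leaves for w and b are assumed here.

```python
import torch

x = torch.ones(5)
y = torch.zeros(3)
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
z = torch.matmul(x, w) + b
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)

def walk(fn, depth=0, names=None):
    """Hypothetical helper: print backward node `fn` and, recursively, the nodes feeding it."""
    names = [] if names is None else names
    if fn is None:  # an input that does not require grad (e.g. x or y)
        return names
    names.append(type(fn).__name__)
    print("  " * depth + names[-1])
    for next_fn, _ in fn.next_functions:
        walk(next_fn, depth + 1, names)
    return names

names = walk(loss.grad_fn)
```

AccumulateGrad nodes mark the leaf tensors (here w and b) whose .grad fields are filled in by loss.backward().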


To optimize the weights of the parameters in the neural network, we need to compute the derivatives of the loss function with respect to those parameters, namely $\frac{\partial\, loss}{\partial w}$ and $\frac{\partial\, loss}{\partial b}$, under some fixed values of x and y. To compute those derivatives, we call loss.backward() and then retrieve the values from w.grad and b.grad:
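For this particular network the derivatives can also be worked out by hand. The derivation below assumes the default reduction="mean" of binary_cross_entropy_with_logits (so the loss is averaged over the 3 outputs) and writes σ for the sigmoid function:

```latex
\begin{aligned}
z_j &= \sum_{i=1}^{5} x_i w_{ij} + b_j, \qquad j = 1, 2, 3 \\
loss &= -\frac{1}{3} \sum_{j=1}^{3} \Big[ y_j \log \sigma(z_j) + (1 - y_j) \log\big(1 - \sigma(z_j)\big) \Big] \\
\frac{\partial\, loss}{\partial z_j} &= \frac{\sigma(z_j) - y_j}{3} \\
\frac{\partial\, loss}{\partial w_{ij}} &= x_i \, \frac{\sigma(z_j) - y_j}{3}, \qquad
\frac{\partial\, loss}{\partial b_j} = \frac{\sigma(z_j) - y_j}{3}
\end{aligned}
```

Because x is a vector of ones, every row of $\frac{\partial\, loss}{\partial w}$ equals $\frac{\partial\, loss}{\partial b}$, which is consistent with the identical rows in the printed w.grad.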

In [11]:
loss.backward()
print(w.grad)
print(b.grad)

tensor([[0.1833, 0.2676, 0.2318],
        [0.1833, 0.2676, 0.2318],
        [0.1833, 0.2676, 0.2318],
        [0.1833, 0.2676, 0.2318],
        [0.1833, 0.2676, 0.2318]])
tensor([0.1833, 0.2676, 0.2318])
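The printed gradients can be cross-checked against the hand-derived formulas ∂loss/∂z_j = (σ(z_j) - y_j)/3, ∂loss/∂w_ij = x_i · ∂loss/∂z_j and ∂loss/∂b_j = ∂loss/∂z_j, assuming the default mean reduction. This is a minimal sketch. The torch.manual_seed(0) call is added only to make the check reproducible, so the numbers will differ from the output above.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)  # fixed seed so the check is reproducible
x = torch.ones(5)
y = torch.zeros(3)
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)

z = torch.matmul(x, w) + b
loss = F.binary_cross_entropy_with_logits(z, y)  # reduction="mean" by default
loss.backward()

# Hand-derived gradients
dz = (torch.sigmoid(z) - y) / z.numel()  # dloss/dz_j
dw = torch.outer(x, dz)                  # dloss/dw_ij = x_i * dz_j
db = dz                                  # dloss/db_j = dz_j

print(torch.allclose(w.grad, dw), torch.allclose(b.grad, db))  # True True
```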


#### Disabling Gradient Tracking
By default, all tensors with requires_grad=True track their computational history and support gradient computation. However, there are cases when we do not need this, for example when we have already trained the model and just want to apply it to some input data, i.e. we only want to do forward computations through the network. We can stop tracking computations by surrounding our computation code with a torch.no_grad() block:

In [13]:
z = torch.matmul(x, w) + b  # computed with gradient tracking on
print(z.requires_grad)  # whether z requires grad

with torch.no_grad(): # torch.no_grad() context disables gradient tracking. 
    z = torch.matmul(x, w)+b 
print(z.requires_grad)

True
False
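Besides inference, torch.no_grad() is also needed for the parameter update itself. The sketch below performs one hand-written gradient-descent step on the same one-layer network. The learning rate lr and the helper compute_loss() are hypothetical choices for this illustration; in practice an optimizer from torch.optim would usually perform this step. Updating a leaf tensor that requires grad in place while tracking is on raises a RuntimeError, which is why the update is wrapped in the block.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.ones(5)
y = torch.zeros(3)
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
lr = 0.1  # hypothetical learning rate for this sketch

def compute_loss():
    """Hypothetical helper: forward pass of the one-layer network."""
    z = torch.matmul(x, w) + b
    return F.binary_cross_entropy_with_logits(z, y)

loss_before = compute_loss()
loss_before.backward()

# The update must not be recorded in the graph, so tracking is switched off
with torch.no_grad():
    w -= lr * w.grad
    b -= lr * b.grad

# backward() accumulates into .grad, so clear it before the next iteration
w.grad = None
b.grad = None

loss_after = compute_loss()
print(loss_before.item(), loss_after.item())
print(w.requires_grad)  # True: no_grad() does not change the leaf's own flag
```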


In [15]:
# Another approach is to use the detach() method

z = torch.matmul(x, w) + b
z_det = z.detach()  # new tensor with the same values, detached from the computational graph
print(z_det.requires_grad)

False
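One consequence of detach() is that autograd treats the detached tensor as a constant, while it still shares its underlying data with the original tensor. The toy scalar example below (not from the original text) compares d(a·a)/da = 2a with the gradient obtained when one factor is detached, which leaves only the contribution of the other factor.

```python
import torch

a = torch.tensor(3.0, requires_grad=True)

# Full graph: d(a * a)/da = 2a = 6
(a * a).backward()
g_full = a.grad.item()

a.grad = None  # reset, because backward() accumulates into .grad

# First factor detached, so it acts as the constant 3: d(3 * a)/da = 3
(a.detach() * a).backward()
g_detached = a.grad.item()

# detach() shares the same underlying data instead of copying it
shares_storage = a.detach().data_ptr() == a.data_ptr()

print(g_full, g_detached, shares_storage)  # 6.0 3.0 True
```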
