# AutoGrad in PyTorch

In this notebook we will review the automatic differentiation mechanics in PyTorch. We will see how autograd works for both scalar and non-scalar differentiation.

Video: https://youtu.be/I2MRiWPeXDY

Ref:
1) https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html
2) https://pytorch.org/docs/stable/autograd.html
3) https://pytorch.org/tutorials/beginner/basics/autogradqs_tutorial.html
4) https://pytorch.org/docs/stable/notes/autograd.html


We begin by importing the required libraries.

In [1]:
import torch, torchvision

## Scalar Case

Let me first define a function that creates a scalar (a) and computes 5*a^2 from it. We return both the output and the initial value. The reason for creating a function to do this trivial task will be clear later on.

In [46]:
def compute():
    a = torch.tensor([3.], requires_grad=True)
    O = 5*a**2
    return [a,O]

Now that we have our function, let's print the output.

In [47]:
[a,O] = compute()
print(a)
print(O)

tensor([3.], requires_grad=True)
tensor([45.], grad_fn=<MulBackward0>)


Now we will perform differentiation. This is done by invoking ".backward()" on the output scalar. Once this is done, every leaf tensor created with the "requires_grad=True" flag that contributed to the output will have its gradient stored in its ".grad" attribute.

In [48]:
O.backward()
print(a.grad)

tensor([30.])
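As a quick sanity check, the value above can be compared with the derivative worked out by hand: for O = 5*a^2 we have dO/da = 10*a, which is 30 at a = 3. The following is a minimal sketch of that comparison; the name `expected` is introduced here only for illustration.

```python
import torch

a = torch.tensor([3.], requires_grad=True)
O = 5 * a**2
O.backward()

# Hand-derived derivative: dO/da = 10 * a (detach so this is not tracked)
expected = 10 * a.detach()

print(a.grad)                            # tensor([30.])
print(torch.allclose(a.grad, expected))  # True
```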


In the following lines we will recompute the same thing without a function. Every call to the function creates a fresh 'a' and 'O' with a new compute graph, so the gradient always starts from scratch. However, if we keep the same 'a' and rerun the computation multiple times, each backward pass adds its result to the value already stored in 'a.grad', so the gradients accumulate. After every iteration the gradients need to be zeroed out to get correct values.

In [30]:
a = torch.tensor([2.], requires_grad=True)




In [35]:
O = 5*a**2
O.backward()
print(a.grad)

tensor([20.])


In [34]:
a.grad.zero_()

tensor([0.])
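The accumulation behaviour described above can be seen in one self-contained run. This is a small sketch, not part of the original cells; the names `accumulated` and `fresh` are made up for the illustration.

```python
import torch

a = torch.tensor([2.], requires_grad=True)

# Two backward passes on the same leaf tensor without zeroing in between.
for _ in range(2):
    O = 5 * a**2      # the forward pass is recomputed each time
    O.backward()      # each call adds dO/da = 20 to a.grad
accumulated = a.grad.clone()

# Zero the stored gradient in place, then run a single pass again.
a.grad.zero_()
O = 5 * a**2
O.backward()
fresh = a.grad.clone()

print(accumulated)  # tensor([40.])
print(fresh)        # tensor([20.])
```

In a training loop this zeroing is usually done once per iteration, before the backward pass.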

Now let us inspect the auto-diff graph. For our computation the graph looks like the following:

![Compute Graph](compGraph.png)

In [52]:
[a,O] = compute()
print(O.grad_fn)
print(O.grad_fn.next_functions)

<MulBackward0 object at 0x0000022027A86E80>
((<PowBackward0 object at 0x0000022027A86310>, 0), (None, 0))
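Following `next_functions` by hand gets tedious for larger graphs. Below is a minimal sketch of a recursive walk over the backward graph; `walk` is a hypothetical helper written for this notebook, not a PyTorch function, and it simply records the class name of every node it visits.

```python
import torch

def walk(node, depth=0, visited=None):
    """Collect (depth, node class name) pairs from a backward graph."""
    if visited is None:
        visited = []
    if node is None:          # constants such as the 5 have no backward node
        return visited
    visited.append((depth, type(node).__name__))
    for child, _ in node.next_functions:
        walk(child, depth + 1, visited)
    return visited

a = torch.tensor([3.], requires_grad=True)
O = 5 * a**2

nodes = walk(O.grad_fn)
for depth, name in nodes:
    print("  " * depth + name)
```

The last node reached is the one that writes the result into `a.grad`.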


We can manually call the backward functions to see the values that they return.

In [58]:
temp = torch.tensor([1.])
print(O.grad_fn(temp))
print(O.grad_fn.next_functions[0][0](temp))

(tensor([5.]), None)
tensor([6.], grad_fn=<MulBackward0>)
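The two values above are the local derivatives at each node: 5 for the multiplication and 2*a = 6 for the power. Chaining them reproduces what ".backward()" computes. The sketch below feeds the output of the first node into the second; the variable names are chosen here for illustration, and calling nodes directly like this is only an inspection trick.

```python
import torch

a = torch.tensor([3.], requires_grad=True)
O = 5 * a**2

mul_node = O.grad_fn                      # MulBackward0
pow_node = mul_node.next_functions[0][0]  # PowBackward0

seed = torch.tensor([1.])                 # dO/dO = 1
grad_wrt_pow = mul_node(seed)[0]          # dO/d(a**2) = 5
grad_wrt_a = pow_node(grad_wrt_pow)       # 5 * 2*a = 30

print(grad_wrt_a.detach())                # tensor([30.])
```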


## Non-Scalar Case

In the previous section we saw the scalar case. What about derivatives with respect to matrices, as in neural networks? To demonstrate the process we will create a small neural network.

In [61]:

x = torch.ones(3)                          # input tensor
w = torch.randn(3, 2, requires_grad=True)  # weights (trainable)
b = torch.randn(2, requires_grad=True)     # bias (trainable)
z = torch.matmul(x, w) + b                 # logits
y = torch.zeros(2)                         # expected output (targets)
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)

In [62]:
loss.backward()
print(w.grad)
print(b.grad)


tensor([[0.4273, 0.1037],
        [0.4273, 0.1037],
        [0.4273, 0.1037]])
tensor([0.4273, 0.1037])
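The pattern in the output (every row of `w.grad` equals `b.grad`) follows from the chain rule. With the default mean reduction, the gradient of the loss with respect to the logits is (sigmoid(z) - y) / N, the bias gradient is that same vector, and the weight gradient is the outer product of x with it. Since x is all ones, each row repeats the bias gradient. The sketch below checks this numerically; the seed and the name `dz` are assumptions made for the illustration.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only to make the run repeatable

x = torch.ones(3)
w = torch.randn(3, 2, requires_grad=True)
b = torch.randn(2, requires_grad=True)
y = torch.zeros(2)

z = torch.matmul(x, w) + b
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)
loss.backward()

# Hand-derived gradient of the mean-reduced loss w.r.t. the logits
dz = (torch.sigmoid(z.detach()) - y) / z.numel()

print(torch.allclose(b.grad, dz, atol=1e-6))                   # bias gradient
print(torch.allclose(w.grad, torch.outer(x, dz), atol=1e-6))   # weight gradient
```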


## No-Grad mode

Building the computational graph and keeping track of gradients costs both time and memory. For inference, the graph is not required, so we can run the computation with gradient tracking disabled. This can be done in the following ways:

In [63]:
z = torch.matmul(x, w)+b
print(z.requires_grad)

with torch.no_grad():
    z = torch.matmul(x, w)+b
print(z.requires_grad)

True
False


In [64]:
z = torch.matmul(x, w)+b
z_det = z.detach()
print(z_det.requires_grad)

False
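Two further details may be useful here. `torch.no_grad()` can also be applied as a decorator, which is convenient for an inference function, and `detach()` does not copy data: the detached tensor shares memory with the original. The sketch below illustrates both; `predict` is a hypothetical helper, not part of the original notebook.

```python
import torch

x = torch.ones(3)
w = torch.randn(3, 2, requires_grad=True)
b = torch.randn(2, requires_grad=True)

@torch.no_grad()  # same effect as wrapping the body in `with torch.no_grad():`
def predict(inp):
    # hypothetical inference helper; no graph is recorded inside
    return torch.matmul(inp, w) + b

z_pred = predict(x)
print(z_pred.requires_grad)  # False
print(z_pred.grad_fn)        # None

z = torch.matmul(x, w) + b
z_det = z.detach()
# The detached tensor points at the same underlying memory as z.
print(z_det.data_ptr() == z.data_ptr())  # True
```

Because the memory is shared, an in-place change to `z_det` also changes `z`; call `.clone()` on the detached tensor when an independent copy is needed.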
