
# AUTOGRAD: AUTOMATIC DIFFERENTIATION
- The autograd package provides automatic differentiation for all operations on Tensors. It is a define-by-run framework, which means that your backprop is defined by how your code is run, and that every single iteration can be different.
- If you set a tensor's `.requires_grad` attribute to `True`, autograd starts tracking all operations on it. When you finish your computation, call `.backward()` to have all the gradients computed automatically. The gradient for this tensor is accumulated into its `.grad` attribute.
- To stop a tensor from tracking history, call `.detach()`. It returns a new tensor that shares the same data but is detached from the computation history, so future computation on it is not tracked.
- To prevent tracking history (and save memory), you can also wrap a code block in `with torch.no_grad():`. This is particularly helpful when evaluating a model.
- Tensor and Function are interconnected and build up an acyclic graph that encodes a complete history of computation. Each tensor has a `.grad_fn` attribute that references the Function that created it (except for tensors created by the user, whose `grad_fn` is `None`).
- To compute the derivatives, call `.backward()` on a Tensor. If the Tensor is a scalar, no arguments are needed; otherwise you must pass a `gradient` argument of matching shape.
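
The points in the list above can be checked directly. The following is a minimal sketch; the names `a` and `b` are illustrative and do not appear elsewhere in this notebook.

```python
import torch

a = torch.ones(3, requires_grad=True)   # created by the user: a leaf tensor
b = a * 2                               # created by an operation on `a`

print(a.grad_fn)               # None, because no Function produced `a`
print(b.grad_fn)               # the Function node that links `b` back to `a`
print(a.is_leaf, b.is_leaf)    # leaf status of each tensor

b.sum().backward()             # d(sum(2 * a)) / da = 2 for every element
print(a.grad)
```

Only the leaf tensor `a` ends up with a populated `.grad`; `b` merely records how it was computed.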

In [None]:
import torch
x = torch.ones(2, 2, requires_grad=True)              # start tracking computation
print(x)

tensor([[1., 1.],
        [1., 1.]], requires_grad=True)


In [None]:
y = x + 2
print(y)

tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward0>)


In [None]:
z = y * y * 3
out = z.mean()
print(z, out)

tensor([[27., 27.],
        [27., 27.]], grad_fn=<MulBackward0>) tensor(27., grad_fn=<MeanBackward0>)


In [None]:
out.backward()   # dout/dx
print(x.grad)

tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])


We have that $out = \frac{1}{4}\sum_i z_i$,
$z_i = 3(x_i+2)^2$ and $z_i\bigr\rvert_{x_i=1} = 27$.
Therefore,
$\frac{\partial out}{\partial x_i} = \frac{3}{2}(x_i+2)$, hence
$\frac{\partial out}{\partial x_i}\bigr\rvert_{x_i=1} = \frac{9}{2} = 4.5$.
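
As a sanity check, the closed-form derivative above can be compared against what autograd computes. This sketch uses an input with distinct values (an assumption made here so that the check is not limited to $x_i = 1$).

```python
import torch

# Distinct input values so each element has a different gradient
x = torch.tensor([[1.0, 2.0], [3.0, 4.0]], requires_grad=True)

out = (3 * (x + 2) ** 2).mean()      # same computation as above, written in one line
out.backward()

expected = 1.5 * (x.detach() + 2)    # closed form: d(out)/dx_i = (3/2) * (x_i + 2)
print(x.grad)
print(torch.allclose(x.grad, expected))
```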

In [None]:
print(y.grad)    # None: y is a non-leaf tensor, so autograd does not keep its .grad (PyTorch may emit a UserWarning)

None
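
If the gradient of an intermediate (non-leaf) tensor is needed, one option is to call `.retain_grad()` on it before the backward pass. A minimal sketch, rebuilding the same graph as above:

```python
import torch

x = torch.ones(2, 2, requires_grad=True)
y = x + 2
y.retain_grad()                  # ask autograd to keep the gradient of this non-leaf tensor

out = (y * y * 3).mean()
out.backward()

print(y.grad)                    # d(out)/dy_i = (3/2) * y_i, and y_i = 3 here
```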


In [None]:
# Ways to disable gradient computation

print(x.requires_grad)

with torch.no_grad():
    print((x+2).requires_grad)

y = x.detach()
print(y.requires_grad)

x.requires_grad_(False)
print(x.requires_grad)

True
False
False
False
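
The difference between `.detach()` and `torch.no_grad()` can be made a little more concrete. The sketch below (variable names `d` and `y` are illustrative) shows that a detached tensor shares storage with the original, and that results computed under `no_grad` carry no graph information.

```python
import torch

x = torch.ones(3, requires_grad=True)

d = x.detach()                          # new tensor object, same underlying data
print(d.requires_grad)
print(d.data_ptr() == x.data_ptr())     # True when both share the same storage

with torch.no_grad():
    y = x * 2                           # computed without recording history
print(y.requires_grad, y.grad_fn)
```

Because the storage is shared, in-place changes to the detached tensor are also visible through `x`.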


**Reference** : https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html

More examples: https://www.youtube.com/watch?v=DbeIqrwb_dE&list=PLqnslRFeH2UrcDBWF5mfPGpqQDSta6VK4&index=4&t=0s

In [None]:
# backward() without arguments works only for scalar outputs
x = torch.randn(2, 5, requires_grad=True)
y = x + 2
z = y * y * 2
vec = torch.randn(2, 5)            # same shape as z; needed for the vector-Jacobian product
z.backward(vec)                    # z is not a scalar, so pass vec as the gradient argument
print(x.grad)

tensor([[  1.9968,   0.4964, -12.6622,   4.3206,   1.0706],
        [ -1.0943,   6.7481,  27.1264,  -2.9408,   0.1543]])
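
Since every operation here is elementwise, the vector-Jacobian product reduces to an elementwise product: each entry of `x.grad` should equal `vec * 4 * (x + 2)`. A minimal sketch that checks this, with a fixed seed added only for reproducibility:

```python
import torch

torch.manual_seed(0)                      # illustrative seed, not part of the original cell
x = torch.randn(2, 5, requires_grad=True)
vec = torch.randn(2, 5)                   # the vector in the vector-Jacobian product

z = (x + 2) ** 2 * 2                      # same as y * y * 2 with y = x + 2
z.backward(vec)

expected = 4 * (x.detach() + 2) * vec     # dz_i/dx_i = 4 * (x_i + 2), weighted by vec_i
print(torch.allclose(x.grad, expected))
```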


In [None]:
# grads accumulate across backward() calls, so they need to be reset every iteration

weights = torch.ones(3, requires_grad=True)
EPOCHS = 3

for epoch in range(EPOCHS):
    output = (weights * 3).sum()   # d(output)/d(weights) = 3 for each element
    output.backward()

    print(weights.grad)
    # weights.grad.zero_()         # uncomment to reset the gradients each iteration

tensor([3., 3., 3.])
tensor([6., 6., 6.])
tensor([9., 9., 9.])
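
For comparison, here is a sketch of the same loop with the reset enabled. The list `history` is a hypothetical helper used only to record the gradient seen in each iteration.

```python
import torch

weights = torch.ones(3, requires_grad=True)
history = []

for epoch in range(3):
    output = (weights * 3).sum()
    output.backward()
    history.append(weights.grad.clone())   # copy, since zero_() modifies in place
    weights.grad.zero_()                   # reset so the next backward() starts from 0

for g in history:
    print(g)
```

With the reset in place, each iteration reports a gradient of 3 per element instead of a running sum.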


In [None]:
# Even when using an optimizer, the grads must be reset to zero, here via optimizer.zero_grad()

optimizer = torch.optim.SGD([weights], lr=0.1)   # params must be an iterable of tensors
optimizer.step()
optimizer.zero_grad()
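
Putting the pieces together, a typical update loop might look like the sketch below. The toy loss `(weights * 3).sum()` is reused from the earlier cell, and the loop length of 3 is an arbitrary choice for illustration.

```python
import torch

weights = torch.ones(3, requires_grad=True)
optimizer = torch.optim.SGD([weights], lr=0.1)

for step in range(3):
    optimizer.zero_grad()            # clear gradients left over from the previous step
    loss = (weights * 3).sum()       # gradient is 3 for every element
    loss.backward()
    optimizer.step()                 # weights <- weights - lr * grad

print(weights)
```

Each step lowers every weight by `0.1 * 3 = 0.3`, so after three steps the weights are approximately 0.1.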