# Automatic Differentiation with ``torch.autograd``
---

[Link](https://pytorch.org/tutorials/beginner/basics/autogradqs_tutorial.html)<br>

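`torch.autograd` is PyTorch's built-in automatic differentiation engine. It computes the gradients needed for <b>back propagation</b>.<br>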

A simple one-layer neural network with input `x`, parameters `w` and `b`, and a binary cross-entropy loss:

In [1]:
import torch

x = torch.ones(5) # input tensor
y = torch.zeros(3) # expected output
w = torch.randn(5, 3, requires_grad=True) # weights: a parameter to optimize
b = torch.randn(3, requires_grad=True) # bias: a parameter to optimize
z = torch.matmul(x, w) + b # logits
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y) # scalar loss

In [2]:
print(f"Gradient function for z = {z.grad_fn}")
print(f"Gradient function for loss = {loss.grad_fn}")

Gradient function for z = <AddBackward0 object at 0x7f66a7acb730>
Gradient function for loss = <BinaryCrossEntropyWithLogitsBackward0 object at 0x7f66a7acbee0>
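
The tensors above fall into two groups. `x`, `w` and `b` are leaf tensors created directly by the user, while `z` and `loss` are produced by operations and carry a `grad_fn` pointing to the backward node of the operation that created them. A small sketch that inspects this; it re-creates the same tensors so it runs on its own:

```python
import torch

x = torch.ones(5)  # input tensor
y = torch.zeros(3)  # expected output
w = torch.randn(5, 3, requires_grad=True)  # weights, tracked by autograd
b = torch.randn(3, requires_grad=True)  # bias, tracked by autograd
z = torch.matmul(x, w) + b
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)

for name, t in [("x", x), ("w", w), ("b", b), ("z", z), ("loss", loss)]:
    # is_leaf: created by the user rather than by a tracked operation
    # grad_fn: the backward node that produced the tensor (None for leaves)
    print(f"{name}: requires_grad={t.requires_grad}, is_leaf={t.is_leaf}, grad_fn={t.grad_fn}")
```

Tensors with `requires_grad=False`, such as `x`, are always treated as leaves.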


# Computing Gradients
---

In [3]:
loss.backward()
print(w.grad)
print(b.grad)

tensor([[0.0513, 0.2702, 0.0510],
        [0.0513, 0.2702, 0.0510],
        [0.0513, 0.2702, 0.0510],
        [0.0513, 0.2702, 0.0510],
        [0.0513, 0.2702, 0.0510]])
tensor([0.0513, 0.2702, 0.0510])
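
Because `x` is all ones, every row of `w.grad` equals `b.grad`, which is the pattern in the output above. The sketch below checks `backward()` against the hand-derived gradient of the mean binary cross-entropy with logits, $\partial\,\text{loss}/\partial z_i = (\sigma(z_i) - y_i)/3$, where $\sigma$ is the sigmoid and 3 is the number of outputs. It re-creates the tensors so it runs on its own; `torch.manual_seed(0)` is an arbitrary choice for repeatability. Note that `.grad` is populated only for leaf tensors with `requires_grad=True`, such as `w` and `b`.

```python
import torch

torch.manual_seed(0)  # arbitrary seed so the run is repeatable
x = torch.ones(5)
y = torch.zeros(3)
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
z = torch.matmul(x, w) + b
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)
loss.backward()

# Hand-derived gradients (the loss uses the default reduction="mean" over 3 outputs)
with torch.no_grad():
    dloss_dz = (torch.sigmoid(z) - y) / z.numel()
    manual_b_grad = dloss_dz  # dz_i/db_i = 1
    manual_w_grad = torch.outer(x, dloss_dz)  # dz_i/dw_ki = x_k

print(torch.allclose(w.grad, manual_w_grad))  # True
print(torch.allclose(b.grad, manual_b_grad))  # True
```

By default the graph is freed after `backward()`, so calling `loss.backward()` a second time here would raise an error unless the first call passed `retain_graph=True`.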


# Disabling Gradient Tracking
---

In [4]:
z = torch.matmul(x, w) + b
print(z.requires_grad)

with torch.no_grad():
    z = torch.matmul(x, w) + b
print(z.requires_grad)

True
False
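
`torch.no_grad()` can also be used as a decorator, a common way to mark inference-only code. A minimal sketch; the function name `predict` is hypothetical:

```python
import torch

w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)


@torch.no_grad()
def predict(inp):
    """Hypothetical inference helper: the forward pass records no graph."""
    return torch.matmul(inp, w) + b


z_eval = predict(torch.ones(5))
z_train = torch.matmul(torch.ones(5), w) + b

print(z_eval.requires_grad)   # False
print(z_train.requires_grad)  # True
```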


In [5]:
# Another way: detach() returns a tensor cut off from the computational graph
z = torch.matmul(x, w) + b
z_det = z.detach()
print(z_det.requires_grad)

False
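
`detach()` returns a new tensor that shares storage with the original but has no history, so gradients do not flow through it. A small sketch using the same shapes; the scalar `s` is illustrative. Since `s` is the sum of `z * z_det` and `z_det` is treated as a constant, the gradient reaching `b` is `z_det` rather than `2 * z`:

```python
import torch

w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
z = torch.matmul(torch.ones(5), w) + b
z_det = z.detach()

# Same underlying memory, but no graph history
print(z_det.data_ptr() == z.data_ptr())  # True
print(z_det.grad_fn)  # None

# Gradient flows only through the attached z: ds/dz = z_det
s = (z * z_det).sum()
s.backward()
print(torch.allclose(b.grad, z_det))  # True, since dz_i/db_i = 1
```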


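Disabling gradient tracking is useful for marking some parameters as <b>frozen parameters</b>, e.g. when [finetuning a pretrained network](https://pytorch.org/tutorials/beginner/finetuning_torchvision_models_tutorial.html), and for speeding up computation when only the forward pass is needed.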
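
A minimal sketch of freezing, assuming a small hypothetical model built from `torch.nn.Linear` layers where the first layer stands in for a pretrained part. Parameters with `requires_grad=False` receive no gradient, so an optimizer leaves them unchanged:

```python
import torch
from torch import nn

torch.manual_seed(0)
# Hypothetical model: a "pretrained" feature layer followed by a new head
model = nn.Sequential(nn.Linear(5, 4), nn.ReLU(), nn.Linear(4, 3))

# Freeze the first layer
for p in model[0].parameters():
    p.requires_grad_(False)

out = model(torch.ones(2, 5))
loss = nn.functional.binary_cross_entropy_with_logits(out, torch.zeros(2, 3))
loss.backward()

print(model[0].weight.grad)  # None: frozen, no gradient computed
print(model[2].weight.grad is not None)  # True: the head is still trained
```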

# More on Computational Graphs
---

Autograd records tensors and the operations applied to them in a directed acyclic graph (DAG). The leaves are the input tensors and the roots are the output tensors. By tracing this graph from the roots to the leaves, autograd computes the gradients automatically using the chain rule.
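
The graph can be inspected by starting from a root's `grad_fn` and following `next_functions`. A minimal sketch; `walk_graph` is a hypothetical helper, and the exact names of the intermediate nodes may vary between PyTorch versions. Leaves that require gradients (`w` and `b` here) show up as `AccumulateGrad` nodes, which write into `.grad`:

```python
import torch

x = torch.ones(5)
y = torch.zeros(3)
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
z = torch.matmul(x, w) + b
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)


def walk_graph(fn, depth=0, seen=None, names=None):
    """Depth-first walk from a root node toward the leaves, printing each node once."""
    if seen is None:
        seen = set()
    if names is None:
        names = []
    # None marks an input that does not require grad (here x and y)
    if fn is None or fn in seen:
        return names
    seen.add(fn)
    names.append(type(fn).__name__)
    print("  " * depth + type(fn).__name__)
    # next_functions holds (node, output_index) pairs, one per input of the operation
    for next_fn, _ in fn.next_functions:
        walk_graph(next_fn, depth + 1, seen, names)
    return names


node_names = walk_graph(loss.grad_fn)
```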

# Optional Reading: Tensor Gradients and Jacobian Products
---

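When the output is a tensor rather than a scalar, PyTorch does not compute the Jacobian matrix $J$ itself. Instead, `backward(v)` computes the **Jacobian product** $v^T \cdot J$ for a given vector $v$, passed as the argument to `backward`.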
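
For a vector function $\vec{y} = f(\vec{x})$ with $\vec{x} = \langle x_1, \dots, x_n \rangle$ and $\vec{y} = \langle y_1, \dots, y_m \rangle$, the Jacobian and the product returned by `backward(v)` are:

$$
J = \begin{pmatrix}
\frac{\partial y_1}{\partial x_1} & \cdots & \frac{\partial y_1}{\partial x_n} \\
\vdots & \ddots & \vdots \\
\frac{\partial y_m}{\partial x_1} & \cdots & \frac{\partial y_m}{\partial x_n}
\end{pmatrix},
\qquad
\left(v^T \cdot J\right)_j = \sum_{i=1}^{m} v_i \frac{\partial y_i}{\partial x_j}
$$

In the cell below, each output element $\text{out}_{ji} = (\text{inp}_{ij} + 1)^2$ depends on a single input element, so with $v$ all ones:

$$
\frac{\partial}{\partial\, \text{inp}_{ij}} \sum_{k,l} \text{out}_{kl} = 2\,(\text{inp}_{ij} + 1) =
\begin{cases} 4 & i = j \\ 2 & i \neq j \end{cases}
$$

because `torch.eye(4, 5)` has ones on the diagonal and zeros elsewhere. This is the pattern in the first call's output.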

In [6]:
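inp = torch.eye(4, 5, requires_grad=True)
out = (inp+1).pow(2).t() # non-scalar (5 x 4) output, so backward() needs a vector v
# v = ones; retain_graph=True keeps the graph so backward() can be called again
out.backward(torch.ones_like(out), retain_graph=True)
print(f"First call\n{inp.grad}")
# Gradients accumulate into inp.grad, so a second call doubles the values
out.backward(torch.ones_like(out), retain_graph=True)
print(f"\nSecond Call\n{inp.grad}")
# Reset the accumulated gradient before computing it again
inp.grad.zero_()
out.backward(torch.ones_like(out), retain_graph=True)
print(f"\nCall after zeroing gradients\n{inp.grad}")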

First call
tensor([[4., 2., 2., 2., 2.],
        [2., 4., 2., 2., 2.],
        [2., 2., 4., 2., 2.],
        [2., 2., 2., 4., 2.]])

Second Call
tensor([[8., 4., 4., 4., 4.],
        [4., 8., 4., 4., 4.],
        [4., 4., 8., 4., 4.],
        [4., 4., 4., 8., 4.]])

Call after zeroing gradients
tensor([[4., 2., 2., 2., 2.],
        [2., 4., 2., 2., 2.],
        [2., 2., 4., 2., 2.],
        [2., 2., 2., 4., 2.]])
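
To see that `backward(v)` really returns $v^T \cdot J$, the sketch below also materializes the full Jacobian with `torch.autograd.functional.jacobian` and contracts it with $v$ by hand. It is meant only as a check: the explicit Jacobian has shape `out.shape + inp.shape`, which is exactly the cost that the vector-Jacobian product avoids.

```python
import torch


def f(t):
    # Same function as above: 4x5 input, 5x4 output
    return (t + 1).pow(2).t()


inp = torch.eye(4, 5, requires_grad=True)
out = f(inp)
v = torch.ones_like(out)  # the vector v in v^T . J

# Path 1: autograd's vector-Jacobian product via backward(v)
out.backward(v)
vjp_backward = inp.grad.clone()

# Path 2: the full Jacobian, shape (5, 4, 4, 5), contracted over the output dims
J = torch.autograd.functional.jacobian(f, inp.detach())
vjp_explicit = torch.tensordot(v, J, dims=2)

print(J.shape)  # torch.Size([5, 4, 4, 5])
print(torch.allclose(vjp_backward, vjp_explicit))  # True
```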


# Further Reading
---

* [Autograd Mechanics](https://pytorch.org/docs/stable/notes/autograd.html)
* [Extending PyTorch](https://pytorch.org/docs/stable/notes/extending.html)
* [Finetuning Torchvision Models](https://pytorch.org/tutorials/beginner/finetuning_torchvision_models_tutorial.html)