
When training neural networks, the most frequently used algorithm is **back propagation**. In this algorithm, parameters (model weights) are adjusted according to the **gradient** of the loss function with respect to each parameter.
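
As a minimal sketch of what "adjusted according to the gradient" means, the loop below runs plain gradient descent on a hypothetical toy problem (data `x`, `y`, learning rate `lr` and step count are all assumptions chosen for illustration). The gradient is obtained with `loss.backward()`, which the rest of this notebook explains in detail.

```python
import torch

# Hypothetical toy problem: learn a scalar weight w so that w * x is close to y.
x = torch.tensor([1.0, 2.0, 3.0])
y = 2.0 * x                                  # data generated with a true weight of 2.0
w = torch.tensor(0.0, requires_grad=True)   # parameter tracked by autograd
lr = 0.1                                     # assumed learning rate

for step in range(20):
    loss = ((w * x - y) ** 2).mean()   # mean squared error
    loss.backward()                    # back propagation: fills w.grad with d(loss)/dw
    with torch.no_grad():              # the update itself must not be recorded in the graph
        w -= lr * w.grad               # step against the gradient
    w.grad.zero_()                     # gradients accumulate, so reset them every step

print(f"learned w = {w.item():.4f}")
```

Each step moves `w` in the direction that lowers the loss, so it approaches the true value 2.0.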

To compute those gradients, PyTorch has a built-in differentiation engine called `torch.autograd`. It supports automatic computation of gradients for any computational graph.
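
A small sanity check of what "automatic computation of gradients" gives you: for the hand-picked function f(t) = t³ + 2t (an assumption for illustration), autograd's result should match the derivative worked out on paper, 3t² + 2.

```python
import torch

# f(t) = t**3 + 2*t, so the exact derivative is f'(t) = 3*t**2 + 2
t = torch.tensor([0.0, 1.0, 2.0], requires_grad=True)
f = (t ** 3 + 2 * t).sum()   # sum() yields a scalar, so backward() needs no argument
f.backward()

print(t.grad)                     # gradient computed by autograd
print(3 * t.detach() ** 2 + 2)    # hand-derived derivative, for comparison
```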

In [None]:
import torch

x = torch.ones(5) # input tensor
y = torch.zeros(3) # expected output
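w = torch.randn(5, 3, requires_grad=True)  # weights: autograd will compute their gradient
b = torch.randn(3, requires_grad=True)     # bias: autograd will compute its gradient
z = torch.matmul(x, w) + b                 # forward pass producing logits
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)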

In [None]:
print(f"Gradient function for z = {z.grad_fn}")
print(f"Gradient function for loss = {loss.grad_fn}")

Gradient function for z = <AddBackward0 object at 0x78b1888abd60>
Gradient function for loss = <BinaryCrossEntropyWithLogitsBackward0 object at 0x78b1888a9f00>
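
Each `grad_fn` is a node in the backward graph, and its `next_functions` attribute links to the nodes that produced its inputs. The sketch below rebuilds the same graph (with fresh random values) and inspects it; leaf tensors created directly by the user, such as `w`, have no `grad_fn`. The exact intermediate node names can differ between PyTorch versions, so only the leaf-related facts are shown as expected output.

```python
import torch

# Same graph structure as above; the random values differ
x = torch.ones(5)
y = torch.zeros(3)
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
z = torch.matmul(x, w) + b
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)

print(w.is_leaf, w.grad_fn)   # True None  (user-created leaf)
print(z.is_leaf)              # False      (result of an operation)

# The nodes feeding AddBackward0; the bias b appears as an AccumulateGrad node
names = [type(fn).__name__ for fn, _ in z.grad_fn.next_functions]
print(names)
```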


# Computing Gradients

In [None]:
loss.backward()
print(w.grad)
print(b.grad)

tensor([[0.2345, 0.2499, 0.1201],
        [0.2345, 0.2499, 0.1201],
        [0.2345, 0.2499, 0.1201],
        [0.2345, 0.2499, 0.1201],
        [0.2345, 0.2499, 0.1201]])
tensor([0.2345, 0.2499, 0.1201])
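
These numbers can be checked by hand. With the default mean reduction over the 3 outputs, the derivative of binary cross-entropy with logits is (sigmoid(z) − y) / 3, which is also `b.grad`; since z = x·w + b, `w.grad` is the outer product of `x` with that vector. A minimal sketch of the check follows (the seed is an arbitrary assumption; the identity holds for any values):

```python
import torch

torch.manual_seed(0)  # arbitrary seed; the check below holds for any random values
x = torch.ones(5)
y = torch.zeros(3)
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
z = torch.matmul(x, w) + b
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)
loss.backward()

# Hand-derived gradients (mean reduction over 3 outputs):
#   dloss/dz = (sigmoid(z) - y) / 3
#   dloss/db = dloss/dz,   dloss/dw = outer(x, dloss/dz)
dz = (torch.sigmoid(z.detach()) - y) / 3
print(torch.allclose(b.grad, dz))                  # True
print(torch.allclose(w.grad, torch.outer(x, dz)))  # True
```

Because `x` is all ones, every row of `w.grad` equals `b.grad`, which is exactly the pattern in the output above.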


# Disabling Gradient Tracking

In [None]:
z = torch.matmul(x, w)+b
print(z.requires_grad)

with torch.no_grad():
  z = torch.matmul(x, w)+b
print(z.requires_grad)

True
False
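
`torch.no_grad()` can also be used as a function decorator, a common pattern for inference code. The sketch below uses a hypothetical `predict` helper; since no graph is recorded inside it, trying to back-propagate through its result raises a `RuntimeError`.

```python
import torch

w = torch.randn(5, 3, requires_grad=True)
x = torch.ones(5)

@torch.no_grad()          # hypothetical helper: no graph is recorded inside it
def predict(inp):
    return torch.matmul(inp, w)

out = predict(x)
print(out.requires_grad, out.grad_fn)   # False None

# No graph was recorded, so there is nothing to back-propagate through
failed = False
try:
    out.sum().backward()
except RuntimeError as err:
    failed = True
    print("backward failed:", err)
```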


In [None]:
z = torch.matmul(x, w)+b
z_det = z.detach()
print(z_det.requires_grad)

False
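
Note that `detach()` does not copy data: the detached tensor shares storage with the original and only drops its history. When an independent copy is needed, chaining `clone()` is the usual choice. A minimal sketch comparing the two:

```python
import torch

w = torch.randn(5, 3, requires_grad=True)
z = torch.matmul(torch.ones(5), w)

z_det = z.detach()            # same data, cut off from the graph
z_copy = z.detach().clone()   # independent copy of the data, also untracked

print(z.requires_grad, z_det.requires_grad)   # True False
print(z_det.data_ptr() == z.data_ptr())       # True: storage is shared
print(z_copy.data_ptr() == z.data_ptr())      # False: clone allocates new memory
```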


# More on Computational Graphs

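`autograd` keeps a record of data (tensors) and all executed operations (along with the resulting new tensors) in a directed acyclic graph (DAG) of `Function` objects. In this DAG, leaves are the input tensors and roots are the output tensors. By tracing the graph from the roots to the leaves, autograd computes the gradients automatically using the chain rule.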
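
The DAG can be walked from the root (`loss.grad_fn`) down to the leaves by following `next_functions`. The recursive `walk` helper below is a hypothetical sketch for inspection only; it prints a node twice if two paths reach it, which does not happen in this small graph. Intermediate node names may vary across PyTorch versions, while the leaves `w` and `b` appear as `AccumulateGrad` nodes.

```python
import torch

names = []

def walk(fn, depth=0):
    """Print the backward graph from the root node `fn` down to the leaves."""
    if fn is None:             # inputs that do not require grad have no node
        return
    name = type(fn).__name__
    names.append(name)
    print("  " * depth + name)
    for next_fn, _ in fn.next_functions:
        walk(next_fn, depth + 1)

x = torch.ones(5)
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
loss = torch.nn.functional.binary_cross_entropy_with_logits(
    torch.matmul(x, w) + b, torch.zeros(3))

walk(loss.grad_fn)   # root first, leaves (AccumulateGrad) last on each branch
```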

# Optional Reading: Tensor Gradients and Jacobian Products

In [None]:
# For a non-scalar output, backward(v) computes the vector-Jacobian product v^T . J
inp = torch.eye(4, 5, requires_grad=True)
out = (inp+1).pow(2).t()
# retain_graph=True keeps the graph alive so backward() can be called again
out.backward(torch.ones_like(out), retain_graph=True)
print(f"First call\n{inp.grad}")
out.backward(torch.ones_like(out), retain_graph=True)  # gradients accumulate into inp.grad
print(f"\nSecond call\n{inp.grad}")
inp.grad.zero_()  # reset the accumulated gradients
out.backward(torch.ones_like(out), retain_graph=True)
print(f"\nCall after zeroing gradients\n{inp.grad}")

First call
tensor([[4., 2., 2., 2., 2.],
        [2., 4., 2., 2., 2.],
        [2., 2., 4., 2., 2.],
        [2., 2., 2., 4., 2.]])

Second call
tensor([[8., 4., 4., 4., 4.],
        [4., 8., 4., 4., 4.],
        [4., 4., 8., 4., 4.],
        [4., 4., 4., 8., 4.]])

Call after zeroing gradients
tensor([[4., 2., 2., 2., 2.],
        [2., 4., 2., 2., 2.],
        [2., 2., 4., 2., 2.],
        [2., 2., 2., 4., 2.]])
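
To see that `backward(v)` really returns v^T · J, the sketch below builds the full Jacobian with `torch.autograd.functional.jacobian` and contracts it against `v` by hand. The full Jacobian here has shape `out.shape + inp.shape = (5, 4, 4, 5)`; `backward(v)` gets the same result without ever materializing it.

```python
import torch

def f(t):
    return (t + 1).pow(2).t()    # same function as in the cell above

inp = torch.eye(4, 5, requires_grad=True)
v = torch.ones(5, 4)             # same shape as the output, like torch.ones_like(out)

# Full Jacobian: shape = out.shape + inp.shape = (5, 4, 4, 5)
J = torch.autograd.functional.jacobian(f, inp)

# Vector-Jacobian product v^T . J: contract v against the output dimensions of J
vjp_manual = torch.einsum("ij,ijkl->kl", v, J)

f(inp).backward(v)               # what autograd computes without forming J
print(torch.allclose(inp.grad, vjp_manual))   # True
```

Since each output element depends only on one input element, the result is simply 2·(inp + 1): 4 on the diagonal and 2 elsewhere, matching the first call above.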
