# Automatic differentiation

In [2]:
import torch

x = torch.ones(5)  # input tensor
y = torch.zeros(3)  # expected output
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
z = torch.matmul(x, w) + b
z

tensor([0.6588, 0.1461, 3.2694], grad_fn=<AddBackward0>)

In [4]:
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)
loss

tensor(1.7172, grad_fn=<BinaryCrossEntropyWithLogitsBackward0>)

In [5]:
print(f"Gradient function for z = {z.grad_fn}")
print(f"Gradient function for loss = {loss.grad_fn}")

Gradient function for z = <AddBackward0 object at 0x00000248167FAE50>
Gradient function for loss = <BinaryCrossEntropyWithLogitsBackward0 object at 0x000002481682E280>


### Computing gradients
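Tensors created with `requires_grad=True` (here `w` and `b`) are tracked so that autograd can compute the gradients used by the backpropagation algorithm. As operations run, PyTorch records them in a computational graph: a directed acyclic graph (DAG) whose leaves are the input tensors and whose roots are the outputs. Each differentiable operation applied to a tracked tensor stores a reference to its backward function in the result (the `grad_fn` printed above). Calling `backward()` on the loss traverses the graph from the root back to the leaves, applies the chain rule at each node, and stores the resulting gradients in the `.grad` attribute of each leaf (`w.grad` and `b.grad`).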
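To make the graph concrete, the sketch below walks it depth-first, starting at `z.grad_fn` and following each node's `next_functions`. The helper `walk` is hypothetical (not a PyTorch API). The intermediate node names printed between `AddBackward0` and the leaves depend on how your PyTorch version implements `torch.matmul`, so treat them as illustrative; each leaf that requires gradients shows up as an `AccumulateGrad` node, while `x`, which does not require gradients, has no node at all.

```python
import torch

x = torch.ones(5)
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
z = torch.matmul(x, w) + b

def walk(fn, depth=0, names=None):
    """Hypothetical helper: print the backward graph reachable from `fn`."""
    if names is None:
        names = []
    if fn is None:  # inputs that do not require grad (like x) have no node
        return names
    names.append(type(fn).__name__)
    print("  " * depth + type(fn).__name__)
    for next_fn, _ in fn.next_functions:  # (node, output index) pairs
        walk(next_fn, depth + 1, names)
    return names

names = walk(z.grad_fn)
print(w.is_leaf, w.grad_fn)  # leaves created by the user have no grad_fn
```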

In [6]:
loss.backward()
print(w.grad)
print(b.grad)

tensor([[0.2197, 0.1788, 0.3211],
        [0.2197, 0.1788, 0.3211],
        [0.2197, 0.1788, 0.3211],
        [0.2197, 0.1788, 0.3211],
        [0.2197, 0.1788, 0.3211]])
tensor([0.2197, 0.1788, 0.3211])
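These values can be checked by hand. With the default `reduction="mean"`, the derivative of the binary cross-entropy-with-logits loss with respect to each logit is $(\sigma(z_j) - y_j)/n$, where $n = 3$ is the number of elements of `z`. Since `z = x @ w + b`, the gradient of `w` is the outer product of `x` with that vector, and because `x` is all ones every row of `w.grad` is identical and equal to `b.grad`. For instance, $\sigma(0.6588)/3 \approx 0.2197$, the first column above. The sketch below repeats the computation and compares it with autograd; the fixed seed is an assumption added only for reproducibility, so the numbers differ from the output above.

```python
import torch

torch.manual_seed(0)  # assumption: fixed seed only so the run is reproducible

x = torch.ones(5)
y = torch.zeros(3)
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)

z = torch.matmul(x, w) + b
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)
loss.backward()

with torch.no_grad():
    # dL/dz_j = (sigmoid(z_j) - y_j) / n for the mean-reduced loss
    dL_dz = (torch.sigmoid(z) - y) / z.numel()
    # z = x @ w + b, so dL/dw[i, j] = x[i] * dL/dz[j] and dL/db[j] = dL/dz[j]
    manual_w_grad = torch.outer(x, dL_dz)
    manual_b_grad = dL_dz

print(torch.allclose(w.grad, manual_w_grad))  # True if autograd agrees
print(torch.allclose(b.grad, manual_b_grad))
```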


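Gradient tracking can be disabled, which is useful in two situations. First, once a model has been trained and is used only for forward computation (inference), there is no need to update its parameters, and computations on tensors that do not track gradients are more efficient. Second, when fine-tuning a pretrained model on a new dataset, some of its parameters are often "frozen" so that they are not updated. A computation can be excluded from tracking by wrapping it in `torch.no_grad()`, or a single tensor can be cut off from the graph with `detach()`, which returns a tensor sharing the same data but with `requires_grad=False`.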
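A minimal sketch of both situations, assuming the same shapes as above: `torch.no_grad()` for an inference-only forward pass, and `requires_grad_(False)` to freeze a parameter before fine-tuning. Which parameter is frozen (`w` here) is an arbitrary choice for illustration.

```python
import torch

x = torch.ones(5)
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)

# Inference: operations inside no_grad() are not recorded in the graph
with torch.no_grad():
    z_infer = torch.matmul(x, w) + b
print(z_infer.requires_grad)  # False

# Fine-tuning: freeze w so only b receives gradients
w.requires_grad_(False)
z = torch.matmul(x, w) + b
z.sum().backward()
print(w.grad)  # None: w was frozen before the graph was built
print(b.grad)  # d(sum z)/db is 1 for every element
```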

In [8]:
z = torch.matmul(x, w) + b
z_det = z.detach()
print(z_det.requires_grad)

False


### Computing the Jacobian
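When the output is a tensor rather than a scalar (here `out` is a 5×4 matrix), its derivative with respect to the input is a Jacobian matrix $J$. Instead of building the full Jacobian, `backward()` computes the product $v^T \cdot J$ for a vector $v$ passed as its `gradient` argument, which must have the same shape as the output. Passing `retain_graph=True` keeps the graph alive so that `backward()` can be called more than once. Gradients accumulate in `.grad` across calls: the second call below adds to the result of the first, doubling the values, so they must be reset with `inp.grad.zero_()` before being computed afresh.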
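In symbols, for a function $\vec{y} = f(\vec{x})$ with $\vec{x} \in \mathbb{R}^n$ and $\vec{y} \in \mathbb{R}^m$, the Jacobian and the product that `backward(v)` computes are:

$$
J = \begin{pmatrix}
\frac{\partial y_1}{\partial x_1} & \cdots & \frac{\partial y_1}{\partial x_n} \\
\vdots & \ddots & \vdots \\
\frac{\partial y_m}{\partial x_1} & \cdots & \frac{\partial y_m}{\partial x_n}
\end{pmatrix},
\qquad
\left(v^{T} J\right)_j = \sum_{i=1}^{m} v_i \, \frac{\partial y_i}{\partial x_j}.
$$

For the example below, $\text{out}_{ij} = (\text{inp}_{ji} + 1)^2$, so $\partial\,\text{out}_{ij} / \partial\,\text{inp}_{kl} = 2(\text{inp}_{kl} + 1)$ when $(k, l) = (j, i)$ and $0$ otherwise. With $v$ all ones:

$$
\text{inp.grad}_{kl} = \sum_{i,j} \frac{\partial\,\text{out}_{ij}}{\partial\,\text{inp}_{kl}} = 2(\text{inp}_{kl} + 1),
$$

which is $4$ on the diagonal of `torch.eye(4, 5)` (where $\text{inp}_{kl} = 1$) and $2$ elsewhere.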

In [9]:
inp = torch.eye(4, 5, requires_grad=True)
out = (inp+1).pow(2).t()
out.backward(torch.ones_like(out), retain_graph=True)
print(f"First call\n{inp.grad}")
out.backward(torch.ones_like(out), retain_graph=True)
print(f"\nSecond call\n{inp.grad}")
inp.grad.zero_()
out.backward(torch.ones_like(out), retain_graph=True)
print(f"\nCall after zeroing gradients\n{inp.grad}")

First call
tensor([[4., 2., 2., 2., 2.],
        [2., 4., 2., 2., 2.],
        [2., 2., 4., 2., 2.],
        [2., 2., 2., 4., 2.]])

Second call
tensor([[8., 4., 4., 4., 4.],
        [4., 8., 4., 4., 4.],
        [4., 4., 8., 4., 4.],
        [4., 4., 4., 8., 4.]])

Call after zeroing gradients
tensor([[4., 2., 2., 2., 2.],
        [2., 4., 2., 2., 2.],
        [2., 2., 4., 2., 2.],
        [2., 2., 2., 4., 2.]])
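As a cross-check, the full Jacobian can be built explicitly with `torch.autograd.functional.jacobian` and contracted with $v$ to reproduce what `backward()` returns. This is a sketch for small tensors only: the full Jacobian here has shape `out.shape + inp.shape = (5, 4, 4, 5)`, which is exactly the cost that the vector-Jacobian product avoids. The function name `f` is introduced for illustration.

```python
import torch
from torch.autograd.functional import jacobian

def f(t):
    # Same computation as above: elementwise (t + 1)^2, then transpose
    return (t + 1).pow(2).t()

inp = torch.eye(4, 5, requires_grad=True)
out = f(inp)
v = torch.ones_like(out)
out.backward(v)  # computes v^T . J into inp.grad

J = jacobian(f, inp.detach())  # full Jacobian, shape (5, 4, 4, 5)
vjp = torch.einsum("ij,ijkl->kl", v, J)  # contract v with J over output dims
print(J.shape)
print(torch.allclose(inp.grad, vjp))  # True: backward() matches v^T . J
```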
