# Automatic Differentiation
:label:`sec_autograd`

## Section Summary
This section covers automatic differentiation, the mechanism modern deep learning frameworks use to compute derivatives. Calculating derivatives by hand is tedious and error-prone, especially for complex models. Autograd packages instead build a computational graph that tracks how each value depends on the others, then apply the chain rule backward through that graph (backpropagation) to obtain gradients. The examples use PyTorch to compute the gradient of a scalar function with respect to a vector, call backward on non-scalar outputs, detach part of a computation, and differentiate through Python control flow.
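
To make the idea of a computational graph concrete, here is a minimal pure-Python sketch of reverse-mode differentiation. The `Value` class is a hypothetical helper written for this illustration only; it is not part of PyTorch and supports just scalar addition and multiplication. Each result remembers its inputs and the local derivative with respect to each input, and `backward` walks the graph in reverse while applying the chain rule.

```python
class Value:
    """A scalar that records how it was computed (illustrative helper, not a PyTorch class)."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self.parents = parents          # the values this one was computed from
        self.local_grads = local_grads  # d(self)/d(parent), one entry per parent

    def __add__(self, other):
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def backward(self):
        # Post-order traversal: every node is appended after the values it depends on.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        # Reversed post-order processes each node before the values it depends on.
        for node in reversed(order):
            for parent, local in zip(node.parents, node.local_grads):
                parent.grad += local * node.grad  # chain rule, summed over all uses


# y = 2 * (x0 * x0 + x1 * x1), a two-element analogue of 2 * dot(x, x)
x0, x1, two = Value(1.0), Value(3.0), Value(2.0)
y = two * (x0 * x0 + x1 * x1)
y.backward()
print(y.data, x0.grad, x1.grad)  # 20.0 4.0 12.0, i.e. the gradient is 4 * x
```

Note that the sketch adds into `grad` rather than overwriting it, because a value used in several places receives one contribution per use.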





In [24]:
import torch

## A Simple Function


In [25]:
x = torch.arange(4.0)
x

tensor([0., 1., 2., 3.])

In [26]:
# Can also create x = torch.arange(4.0, requires_grad=True)
x.requires_grad_(True)
x.grad  # The gradient is None by default

In [27]:
y = 2 * torch.dot(x, x)  # y = 2 * x^T x, so the gradient should be 4 * x
y

tensor(28., grad_fn=<MulBackward0>)

In [28]:
y.backward()
x.grad

tensor([ 0.,  4.,  8., 12.])

In [29]:
x.grad == 4 * x

tensor([True, True, True, True])

In [30]:
x.grad.zero_()  # PyTorch accumulates gradients, so reset before the next backward pass
y = x.sum()
print(y)
y.backward()
x.grad

tensor(6., grad_fn=<SumBackward0>)


tensor([1., 1., 1., 1.])
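
The call to `x.grad.zero_()` matters because PyTorch adds each new gradient to whatever is already stored in `x.grad`. The following small sketch, which uses only the calls already shown in this section plus `clone()`, runs the same backward pass with and without a reset to make the accumulation visible. The variable names `first`, `second`, and `third` are chosen for illustration.

```python
import torch

x = torch.arange(4.0, requires_grad=True)

x.sum().backward()
first = x.grad.clone()   # gradient of sum(x): all ones

x.sum().backward()       # no zero_() in between
second = x.grad.clone()  # the new gradient was added to the old one: all twos

x.grad.zero_()           # reset the buffer
x.sum().backward()
third = x.grad.clone()   # all ones again

print(first, second, third)
```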

## Backward for Non-Scalar Variables

In [34]:
x.grad.zero_()
y = x * x  # y is a vector, so backward() needs a `gradient` argument
y.backward(gradient=torch.ones(len(y)))  # Same result as y.sum().backward(), which is faster
print(x.grad)

x.grad.zero_()
y = x * x
y.sum().backward()
print(x.grad)

tensor([0., 2., 4., 6.])
tensor([0., 2., 4., 6.])
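
The `gradient` argument does not have to be a vector of ones. As a sketch of what it does in general: for a vector `v`, `y.backward(gradient=v)` computes the product of `v` with the Jacobian of `y` with respect to `x`, which is the same as the gradient of `(v * y).sum()`. The weights in `v` below are arbitrary values chosen for illustration.

```python
import torch

x = torch.arange(4.0, requires_grad=True)
y = x * x

# Arbitrary weights, one per element of y (illustrative values).
v = torch.tensor([1.0, 0.5, 0.0, 2.0])
y.backward(gradient=v)

# For y_i = x_i ** 2 the Jacobian is diag(2 * x), so the result is 2 * x * v.
expected = 2 * x.detach() * v
print(x.grad, expected)
```

With `v` set to all ones, this reduces to the `y.sum().backward()` case above.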


## Detaching Computation


In [35]:
x.grad.zero_()
y = x * x
u = y.detach()  # u has the same values as y but is treated as a constant
z = u * x

z.sum().backward()
x.grad == u

tensor([True, True, True, True])

In [36]:
x.grad.zero_()
y.sum().backward()  # y itself still tracks its dependence on x
x.grad == 2 * x

tensor([True, True, True, True])
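
For contrast, the sketch below repeats the computation of `z` twice, once through the detached `u` and once through `y` directly. Without the detach, `z` is `x ** 3` and the gradient is `3 * x ** 2`; with it, the gradient stops at `u` and is only `x ** 2`. The names `grad_detached` and `grad_full` are chosen for illustration.

```python
import torch

x = torch.arange(4.0, requires_grad=True)

# With detach: u is a constant, so dz/dx = u = x ** 2.
y = x * x
u = y.detach()
(u * x).sum().backward()
grad_detached = x.grad.clone()

# Without detach: z = x ** 3, so dz/dx = 3 * x ** 2.
x.grad.zero_()
y = x * x
(y * x).sum().backward()
grad_full = x.grad.clone()

print(u.requires_grad)  # False: u is cut off from the graph
print(grad_detached, grad_full)
```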

## Gradients and Python Control Flow


In [37]:
def f(a):
    b = a * 2
    # The number of loop iterations depends on the value of a
    while b.norm() < 1000:
        b = b * 2
    # So does the branch taken here
    if b.sum() > 0:
        c = b
    else:
        c = 100 * b
    return c

In [46]:
a = torch.randn(size=(), requires_grad=True)
print(a)
d = f(a)
print(d)
d.backward()
a.grad

tensor(0.0139, requires_grad=True)
tensor(1823.8965, grad_fn=<MulBackward0>)


tensor(131072.)

In [13]:
a.grad == d / a

tensor(True)
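
The check `a.grad == d / a` holds because, for any given input, `f` only multiplies `a` by a constant `k`, so `f(a) = k * a` and the gradient is `k`. Which `k` applies depends on the path the control flow takes. The sketch below repeats `f` with a few fixed inputs, chosen for illustration in place of the random draw, to show the gradient changing with the path.

```python
import torch


def f(a):
    b = a * 2
    while b.norm() < 1000:
        b = b * 2
    if b.sum() > 0:
        c = b
    else:
        c = 100 * b
    return c


results = {}
for value in (1.0, 3.0, -1.0):  # fixed inputs for reproducibility
    a = torch.tensor(value, requires_grad=True)
    d = f(a)
    d.backward()
    results[value] = (d.item(), a.grad.item())
    print(value, d.item(), a.grad.item())
```

Here `a = 1.0` is doubled ten times in total (gradient 1024), `a = 3.0` only nine times (gradient 512), and `a = -1.0` takes the `else` branch, which multiplies by a further 100 (gradient 102400).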