**2.5.1 A Simple Function**

In [1]:
import torch

x = torch.arange(4.0)
x

tensor([0., 1., 2., 3.])

In [2]:
x.requires_grad_(True)  # Ask PyTorch to record operations on x
x.grad                  # The default value is None, so nothing is displayed

In [3]:
y = 2 * torch.dot(x, x)
y

tensor(28., grad_fn=<MulBackward0>)

In [4]:
# Take the gradient of y with respect to x
y.backward()
x.grad

tensor([ 0.,  4.,  8., 12.])

In [5]:
x.grad == 4*x

tensor([True, True, True, True])
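As a sanity check that does not rely on PyTorch, the gradient 4x can be approximated with central finite differences. The sketch below is plain Python; the helper names `f` and `numerical_grad` and the step size `h` are illustrative choices, not part of any library.

```python
# Plain-Python sketch: check numerically that the gradient of
# y = 2 * <x, x> is 4x, using central finite differences.

def f(x):
    """y = 2 * dot(x, x) for a list of floats."""
    return 2 * sum(xi * xi for xi in x)

def numerical_grad(func, x, h=1e-5):
    """Approximate each partial derivative by (f(x + h*e_i) - f(x - h*e_i)) / (2h)."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((func(x_plus) - func(x_minus)) / (2 * h))
    return grad

x = [0.0, 1.0, 2.0, 3.0]
approx = numerical_grad(f, x)
print(approx)  # approximately [0.0, 4.0, 8.0, 12.0], i.e. 4 * x
```

For a quadratic like this one, the central difference is exact up to floating-point rounding, so the agreement with `x.grad` is very close.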

x.grad.zero_()  # Reset the gradient; PyTorch accumulates gradients by default
y = x.sum()
y.backward()
x.grad

tensor([1., 1., 1., 1.])
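PyTorch adds new gradients to whatever is already stored in `.grad` instead of overwriting it, which is why the cell above resets `x.grad` with `zero_()` first. The toy sketch below, assuming a hypothetical `Value` class that is not part of PyTorch, mimics that accumulation for scalar additions and multiplications in plain Python.

```python
class Value:
    """Hypothetical minimal scalar node for reverse-mode autodiff (not a PyTorch API)."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents  # pairs of (parent_node, local_derivative)

    def __add__(self, other):
        # d(a + b)/da = 1 and d(a + b)/db = 1
        return Value(self.data + other.data, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        # d(a * b)/da = b and d(a * b)/db = a
        return Value(self.data * other.data, ((self, other.data), (other, self.data)))

    def backward(self):
        # Topologically order the graph so each node comes after its parents.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        adjoint = {id(self): 1.0}  # d(self)/d(self) = 1; fresh for every call
        for node in reversed(order):
            g = adjoint.get(id(node), 0.0)
            node.grad += g  # accumulate into .grad instead of overwriting it
            for parent, local in node._parents:
                adjoint[id(parent)] = adjoint.get(id(parent), 0.0) + local * g


xs = [Value(float(i)) for i in range(4)]
y = xs[0]
for v in xs[1:]:
    y = y + v                       # y = x0 + x1 + x2 + x3

y.backward()
first_pass = [v.grad for v in xs]   # [1.0, 1.0, 1.0, 1.0]

y.backward()                        # a second call without resetting
second_pass = [v.grad for v in xs]  # [2.0, 2.0, 2.0, 2.0]: gradients piled up

for v in xs:
    v.grad = 0.0                    # the analogue of x.grad.zero_()
```

Calling `backward()` twice without resetting doubles the stored gradients, which is exactly the pitfall the `zero_()` call guards against.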

**2.5.2 Backward for Non-Scalar Variables**

In [6]:
x.grad.zero_()
y = x * x
y.backward(gradient=torch.ones(len(y)))
x.grad

tensor([0., 2., 4., 6.])
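When `y` is a vector, passing `gradient=v` asks PyTorch for the vector-Jacobian product v^T J, where J[i][j] = dy_i/dx_j; with v set to all ones, this equals the gradient of `y.sum()`. A plain-Python sketch for y = x * x, whose Jacobian is diagonal with entries 2x, reproduces the output above; the helper names `jacobian_elementwise_square` and `vjp` are hypothetical.

```python
# Plain-Python sketch: backward(gradient=v) on a vector y computes v^T J,
# where J[i][j] = dy_i / dx_j. For y = x * x (elementwise), J is diagonal with 2 * x.

def jacobian_elementwise_square(x):
    n = len(x)
    return [[2 * x[i] if i == j else 0.0 for j in range(n)] for i in range(n)]

def vjp(v, J):
    """Return the row vector v^T J as a list."""
    n_out, n_in = len(J), len(J[0])
    return [sum(v[i] * J[i][j] for i in range(n_out)) for j in range(n_in)]

x = [0.0, 1.0, 2.0, 3.0]
J = jacobian_elementwise_square(x)
v = [1.0] * len(x)   # plays the role of torch.ones(len(y))
result = vjp(v, J)
print(result)        # [0.0, 2.0, 4.0, 6.0]
```

PyTorch never builds J explicitly; it propagates v backward through the graph, which is why only the product, not the full Jacobian, is returned.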

**2.5.3 Detaching Computation**

In [7]:
x.grad.zero_()
y = x * x
u = y.detach()
z = u * x
z.sum().backward()
x.grad == u

tensor([True, True, True, True])

In [8]:
x.grad.zero_()
y.sum().backward()
x.grad == 2 * x

tensor([True, True, True, True])
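Detaching works because `u` enters the graph as a constant: the gradient of z = u * x with respect to x is u, whereas without `detach()` the function is x * x * x with derivative 3x². A minimal plain-Python comparison with finite differences, at an arbitrarily chosen point x0 = 2.0, makes the difference concrete; `numerical_derivative` is a hypothetical helper.

```python
# Plain-Python sketch: why detaching changes the gradient.
# With u = x*x frozen as a constant, z = u * x has dz/dx = u.
# Without detaching, z = x*x*x and dz/dx = 3 * x**2.

def numerical_derivative(func, x, h=1e-5):
    """Central finite-difference estimate of func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)

x0 = 2.0
u_const = x0 * x0  # value frozen at x0, like y.detach()

detached = numerical_derivative(lambda x: u_const * x, x0)  # about u = 4
attached = numerical_derivative(lambda x: (x * x) * x, x0)  # about 3 * x0**2 = 12
print(detached, attached)
```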

**2.5.4 Gradients and Python Control Flow**

In [9]:
def f(a):
    b = a * 2
    while b.norm() < 1000:
        b = b * 2
    if b.sum() > 0:
        c = b
    else:
        c = 100 * b
    return c

In [10]:
a = torch.randn(size=(), requires_grad=True)
d = f(a)
d.backward()

In [11]:
a.grad == d/a

tensor(True)
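The check `a.grad == d/a` holds because, for a fixed scalar input, `f` is linear in `a`: it returns k * a, where the positive constant k depends only on how many doublings run and which branch is taken. A plain-Python translation of the scalar case (with `abs(b)` standing in for `b.norm()` and `b` for `b.sum()`) confirms this numerically. The test points 0.3 and -0.3 and the step `h` are assumptions, and `a = 0` is excluded because the loop would never terminate.

```python
# Plain-Python sketch of the scalar case: f(a) = k * a, where k > 0 depends on a
# only through the number of doublings and the branch taken.

def f(a):
    b = a * 2
    while abs(b) < 1000:  # for a scalar, b.norm() is |b|
        b = b * 2
    if b > 0:             # for a scalar, b.sum() is b itself
        c = b
    else:
        c = 100 * b
    return c

def numerical_derivative(func, x, h=1e-6):
    """Central finite-difference estimate of func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)

results = []
for a in (0.3, -0.3):
    d = f(a)
    results.append((numerical_derivative(f, a), d / a))
    print(a, results[-1])  # the two numbers in each pair agree
```

The finite difference is only meaningful when a - h and a + h trigger the same number of doublings, which is the case at both test points.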

Dynamic control flow is very common in deep learning. For instance, when processing text, the computational graph depends on the length of the input. In these cases, automatic differentiation becomes vital for statistical modeling, since it is impossible to compute the gradient a priori.