In [1]:
import torch

In [4]:
x = torch.rand(10)
print(x)

tensor([0.4139, 0.8000, 0.7144, 0.3358, 0.7892, 0.6328, 0.1489, 0.4234, 0.3254,
        0.2530])


In [21]:
x_with_grad = torch.rand(10, requires_grad=True)
print(x_with_grad)

tensor([0.6628, 0.4319, 0.2934, 0.6871, 0.8195, 0.5396, 0.0580, 0.9034, 0.8306,
        0.5603], requires_grad=True)


Let's compare the operations that we can do on both the regular tensor and the tensor with `requires_grad=True`.

In [7]:
print((x**2).mean())

tensor(0.2829)


In [8]:
print((x_with_grad**2).mean())

tensor(0.3970, grad_fn=<MeanBackward0>)


We see `grad_fn=<MeanBackward0>` in the output. If any tensor in a computation has `requires_grad=True`, the output of that computation is a tensor with a `grad_fn` attached to it. The `grad_fn` records the operation that produced the tensor, which is what lets PyTorch backpropagate through it later.
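
As a small illustration of that rule, the sketch below mixes an untracked tensor with a tracked one. The names `plain`, `tracked`, `no_graph` and `with_graph` are made up for this example and are not part of the notebook above.

```python
import torch

plain = torch.rand(3)                        # requires_grad defaults to False
tracked = torch.rand(3, requires_grad=True)  # autograd tracks this tensor

# Only untracked inputs: no graph is recorded, so there is no grad_fn.
no_graph = (plain * 2).sum()

# At least one tracked input: the output joins the autograd graph.
with_graph = (plain * tracked).sum()

print(no_graph.requires_grad, no_graph.grad_fn)
print(with_graph.requires_grad, with_graph.grad_fn.name())
```

A single tracked input is enough: `plain` itself still has `requires_grad=False`, but any result that depends on `tracked` carries a `grad_fn`.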

In [22]:
res = (x_with_grad**2).mean()

In [23]:
print(res.grad_fn)
print(res.grad_fn.name())

<MeanBackward0 object at 0x132db49d0>
MeanBackward0


When we run backpropagation by calling `backward()` on the result, we populate the `.grad` attribute of our initial tensor.

In [24]:
print("X prior to backpropagation")
print(x_with_grad)
res.backward()
print(res)
print("X after backpropagation: the `grad` is now populated")
# print(x_with_grad)
print(x_with_grad.grad)

X prior to backpropagation
tensor([0.6628, 0.4319, 0.2934, 0.6871, 0.8195, 0.5396, 0.0580, 0.9034, 0.8306,
        0.5603], requires_grad=True)
tensor(0.3970, grad_fn=<MeanBackward0>)
X after backpropagation: the `grad` is now populated
tensor([0.1326, 0.0864, 0.0587, 0.1374, 0.1639, 0.1079, 0.0116, 0.1807, 0.1661,
        0.1121])


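We can verify that the gradient calculation is correct. The result is the mean of ten squared values, $\frac{1}{10}\sum_{i} x_i^2$, so the mean contributes a factor of $\frac{1}{10}$ and the square contributes $2x_i$. The gradient with respect to each element is therefore $\frac{2x_i}{10}$.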

In [28]:
print(x_with_grad.grad)
print(x_with_grad / 10 * 2)
print(
    torch.isclose(x_with_grad.grad, x_with_grad / 10 * 2, atol=1e-8)
)

tensor([0.1326, 0.0864, 0.0587, 0.1374, 0.1639, 0.1079, 0.0116, 0.1807, 0.1661,
        0.1121])
tensor([0.1326, 0.0864, 0.0587, 0.1374, 0.1639, 0.1079, 0.0116, 0.1807, 0.1661,
        0.1121], grad_fn=<MulBackward0>)
tensor([True, True, True, True, True, True, True, True, True, True])
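
Another way to check a gradient, without deriving it by hand, is to compare autograd's analytical result with a finite-difference estimate. The sketch below uses `torch.autograd.gradcheck` for this. The function name `mean_of_squares` and the tensor `t` are made up for the example, and the input is double precision because finite differences are too noisy in `float32`.

```python
import torch
from torch.autograd import gradcheck


def mean_of_squares(t):
    # Same computation as in the cells above: square, then average.
    return (t**2).mean()


# gradcheck expects double-precision inputs that require gradients.
t = torch.rand(10, dtype=torch.double, requires_grad=True)

# Returns True if the analytical and numerical gradients agree within
# tolerance; raises an error otherwise.
ok = gradcheck(mean_of_squares, (t,), eps=1e-6, atol=1e-4)
print(ok)
```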


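Whenever we call `backward()`, PyTorch walks the computational graph, stores the gradients in `.grad` (the call itself returns nothing), and then frees the intermediate buffers the graph was holding. Because that memory is released, calling `backward()` a second time on the same graph raises a `RuntimeError` unless the first call was made with `retain_graph=True`.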
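
The sketch below reproduces this behaviour on a fresh tensor. The names `w`, `out`, `first` and `accumulated` are made up for the example. It also shows that gradients from repeated backward passes are summed into `.grad` rather than overwritten.

```python
import torch

w = torch.rand(10, requires_grad=True)  # stand-in for x_with_grad

out = (w**2).mean()
out.backward()  # first call succeeds and frees the saved buffers

try:
    out.backward()  # second call on the same, already-freed graph
    second_call_failed = False
except RuntimeError as err:
    second_call_failed = True
    print("Second backward() failed:", str(err).splitlines()[0])

# Start over, this time asking autograd to keep the graph alive.
w.grad = None
out = (w**2).mean()
out.backward(retain_graph=True)
first = w.grad.clone()

out.backward()  # allowed now; the new gradient is added to w.grad
accumulated = w.grad.clone()

print(torch.allclose(accumulated, 2 * first))
```

Keeping the graph costs memory, so `retain_graph=True` is usually reserved for cases that really need more than one backward pass over the same graph.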