In [None]:
import torch


## The Very Basics

Before we go through the full example, let's check the basics of the `requires_grad` parameter and the `.grad` attribute in PyTorch.

Let's start with a simple `y = xz` function, where we could use the product rule to compute the gradients:

$$
\begin{aligned}
\frac{dy}{dx} &= z \\
\frac{dy}{dz} &= x
\end{aligned}
$$

In [None]:
x = torch.tensor(2.0, requires_grad=True)
z = torch.tensor(3.0, requires_grad=True)

y = x * z
y.backward()

print(x.grad)  # dy/dx = z = 3.0
print(z.grad)  # dy/dz = x = 2.0

tensor(3.)
tensor(2.)
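
To make the role of `requires_grad` concrete, here is a minimal sketch that reuses the product example above but leaves `z` at the default `requires_grad=False`. It assumes nothing beyond the tensors already introduced: autograd should then fill in `.grad` only for `x`.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
z = torch.tensor(3.0)  # requires_grad defaults to False

y = x * z
y.backward()

print(x.grad)  # dy/dx = z, tracked because x requires a gradient
print(z.grad)  # None, because autograd did not track z
```

In other words, `.grad` is populated only for leaf tensors that were created with `requires_grad=True`.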


Just for the sake of completeness, here is another example with `y = x + z`, where the gradients are simply:

$$
\begin{aligned}
\frac{dy}{dx} &= 1 \\
\frac{dy}{dz} &= 1
\end{aligned}
$$

In [None]:
# A similar simple case would be y = x + z
x = torch.tensor(2.0, requires_grad=True)
z = torch.tensor(3.0, requires_grad=True)

y = x + z
y.backward()

print(x.grad)  # dy/dx = 1.0
print(z.grad)  # dy/dz = 1.0

tensor(1.)
tensor(1.)
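
The product and sum cases also compose. As a small illustration (the function `y = x * z + x` is chosen here only for demonstration), the gradients should be `dy/dx = z + 1` and `dy/dz = x`:

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
z = torch.tensor(3.0, requires_grad=True)

# Combine the product and the sum in one expression
y = x * z + x
y.backward()

print(x.grad)  # dy/dx = z + 1 = 4.0
print(z.grad)  # dy/dz = x = 2.0
```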


As a more complex example, let's check the derivative of the sigmoid function. The closed form used below can be verified on Wikipedia. The sigmoid function is defined as:

$$
a(x) = \frac{1}{1 + e^{-x}}
$$

And the derivative is:
$$
a'(x) = a(x)(1 - a(x))
$$
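
For readers who would rather not take this identity on trust, it follows from the definition of $a(x)$ by the chain rule, using the fact that $1 - a(x) = \frac{e^{-x}}{1 + e^{-x}}$:

$$
a'(x) = \frac{e^{-x}}{(1 + e^{-x})^2} = \frac{1}{1 + e^{-x}} \cdot \frac{e^{-x}}{1 + e^{-x}} = a(x)\,(1 - a(x))
$$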

Thus, if we feed in the value `x = 0.5`, the gradient that PyTorch computes should equal `a(0.5) * (1 - a(0.5))`. Let's verify this with PyTorch:

In [None]:
x = torch.tensor(0.5, requires_grad=True)

# Sigmoid only
a = torch.sigmoid(x)
output = a.detach().clone()

# Print the PyTorch computed gradient
a.backward()
print("Automatic gradient:", x.grad)

# Compare to manually computed gradient
manual_grad = output * (1 - output)
equals = torch.isclose(x.grad, manual_grad).item() # type: ignore
print("Do they equal?:", equals) 

Automatic gradient: tensor(0.2350)
Do they equal?: True
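
As a further cross-check, the sigmoid can be written out directly from its definition instead of calling `torch.sigmoid`. This is only a sketch (the name `expected` is introduced here for illustration), but autograd should arrive at the same gradient either way:

```python
import torch

x = torch.tensor(0.5, requires_grad=True)

# Sigmoid written out from its definition: 1 / (1 + e^(-x))
a = 1 / (1 + torch.exp(-x))
a.backward()

# Closed-form derivative a(x) * (1 - a(x)), computed without gradient tracking
s = torch.sigmoid(x.detach())
expected = s * (1 - s)

print("Automatic gradient:", x.grad)
print("Do they equal?:", torch.isclose(x.grad, expected).item())
```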
