# 2.5.1 A Simple Function

## Function Definition
Let's assume that we are interested in differentiating the function $y = 2x^\top x$ with respect to the column vector $x$. To start, we assign $x$ an initial value.

In [1]:
import torch

In [2]:
x = torch.arange(4.0)
x

tensor([0., 1., 2., 3.])

Before we calculate the gradient of $y$ with respect to $x$, we need a place to store it. In general, we avoid allocating new memory every time we take a derivative because deep learning requires successively computing derivatives with respect to the same parameters a great many times, and we might risk running out of memory. Note that the gradient of a scalar-valued function with respect to a vector $x$ is vector-valued with the same shape as $x$.

In [3]:
x.requires_grad_(True)  # ask autograd to track operations on x
x.grad  # the gradient is None until backward() has been called
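
As a small aside, the same setup can be written in one step. The sketch below assumes only that the tensor factory accepts the `requires_grad` keyword, which `torch.arange` does. It also checks that no gradient has been stored yet.

```python
import torch

# Create x and enable gradient tracking in a single call.
x = torch.arange(4.0, requires_grad=True)

print(x.requires_grad)  # True
print(x.grad is None)   # True: nothing has been backpropagated yet
```

Both forms leave `x` in the same state, so the choice between them is a matter of taste.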

We now calculate our function of $x$ and assign the result to $y$.

In [4]:
y = 2 * torch.dot(x, x)
y

tensor(28., grad_fn=<MulBackward0>)

We can now take the gradient of $y$ with respect to $x$ by calling the `backward` method of `y`. Afterwards, the gradient is available through the `grad` attribute of `x`.

In [None]:
y.backward()
x.grad

tensor([ 0.,  4.,  8., 12.])

We already know that the gradient of the function $y = 2x^\top x$ with respect to $x$ should be $4x$. We can now verify that the automatic gradient computation and the expected result are identical.

In [7]:
print(x.grad == 4 * x)

tensor([True, True, True, True])
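
For reference, the expected gradient follows from writing the function out componentwise. This short derivation uses only the definition of $y$ given above:

$$
y = 2x^\top x = 2\sum_{i} x_i^2
\quad\Longrightarrow\quad
\frac{\partial y}{\partial x_i} = 4x_i
\quad\Longrightarrow\quad
\nabla_x y = 4x.
$$

For $x = [0, 1, 2, 3]^\top$ this gives $[0, 4, 8, 12]^\top$, which matches the value stored in `x.grad`.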


Now let's calculate another function of $x$ and take its gradient. Note that PyTorch does not automatically reset the gradient buffer when we record a new gradient. Instead, the new gradient is added to the already-stored gradient. This behavior comes in handy when we want to optimize the sum of multiple objective functions. To reset the gradient buffer, we can call `x.grad.zero_()` as follows:

In [8]:
x.grad.zero_()  # reset the gradient to zero
y = x.sum()
y.backward()  # compute the gradient of y with respect to x
print(x.grad)

tensor([1., 1., 1., 1.])
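
To make the accumulation behavior concrete, here is a minimal sketch that calls `backward` twice without resetting the buffer and then once more after zeroing it. The names `first`, `second`, and `third` are illustrative and not part of the original example.

```python
import torch

x = torch.arange(4.0, requires_grad=True)

# First backward pass: the gradient of x.sum() is a vector of ones.
x.sum().backward()
first = x.grad.clone()

# Second backward pass without zeroing: the new gradient is added
# to the one already stored in x.grad.
x.sum().backward()
second = x.grad.clone()

# Reset the buffer in place, then backpropagate once more.
x.grad.zero_()
x.sum().backward()
third = x.grad.clone()

print(first)   # tensor([1., 1., 1., 1.])
print(second)  # tensor([2., 2., 2., 2.])
print(third)   # tensor([1., 1., 1., 1.])
```

The copies are taken with `clone()` because `x.grad` is updated in place, so a plain reference would change along with the buffer.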
