
## **Automatic Computation of Gradients**

Let's take a look at how PyTorch can compute gradients automatically.

In [None]:
import torch

In [None]:
x = torch.tensor(2., requires_grad=True)
y = torch.tensor(3., requires_grad=True)

f = 3*x**3 - y**2
print(f)

# This computes the gradients of f with respect to x and y
f.backward()

# The derivative of f with respect to x is 9*x^2. Let's see if this is calculated correctly at x = 2.
# We expect it to be 9*2^2 = 36
print(x.grad)

# The derivative of f with respect to y is -2*y. Let's see if this is calculated correctly at y = 3.
# We expect it to be -2*3 = -6
print(y.grad)



tensor(15., grad_fn=<SubBackward0>)
tensor(36.)
tensor(-6.)
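
As a cross-check, the same derivatives can be obtained with `torch.autograd.grad`, which returns the gradients as a tuple instead of storing them in `.grad`. This is only a minimal sketch; the names `df_dx` and `df_dy` are chosen here for illustration.

```python
import torch

x = torch.tensor(2., requires_grad=True)
y = torch.tensor(3., requires_grad=True)
f = 3*x**3 - y**2

# Returns one gradient per input; x.grad and y.grad are left untouched
df_dx, df_dy = torch.autograd.grad(f, (x, y))

print(df_dx)  # expected analytically: 9*2^2 = 36
print(df_dy)  # expected analytically: -2*3 = -6
```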


In [None]:
# Notice how our tensors are now vectors


x = torch.tensor([2., 3.], requires_grad=True)
y = torch.tensor([6., 4.], requires_grad=True)



F = 3*x**3 - y**2
print(F)



tensor([-12.,  65.], grad_fn=<SubBackward0>)


#### **A small reference to Jacobians**

Now our function F is also a vector, not a scalar as before. Specifically
we have

$$  F =
 \begin{pmatrix}
 3 x_1^3 - y_1^2 \\
 3 x_2^3 - y_2^2
\end{pmatrix}
$$

Here $x_i$ is the entry $i$ of the vector $x$, and similarly for $y_i$.


Formally, the generalization of the 'gradient' of F with respect to the vector $x$ is called a Jacobian and looks like this:
$$  J =
 \begin{pmatrix}
 \frac{\partial{F_1}}{\partial x_1} & \frac{\partial{F_1}}{\partial x_2} \\
 \frac{\partial{F_2}}{\partial x_1} & \frac{\partial{F_2}}{\partial x_2}
\end{pmatrix}
$$

Here $F_i$ is the entry $i$ of the vector $F$. Since each $F_i$ depends only on $x_i$, the off-diagonal entries are zero, and if we do the calculations we get:

$$
 \begin{pmatrix}
 9x_1^2 & 0 \\
 0 & 9x_2^2
\end{pmatrix}
$$

More generally, the gradient of a function with $m$ entries with respect to a vector of $n$ entries is a Jacobian matrix of size $m \times n$. In general, all entries can be non-zero (more on this in the assignment).
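
For small examples like this one, the full Jacobian can be materialized with `torch.autograd.functional.jacobian`. The sketch below assumes we treat $y$ as a constant and view $F$ as a function of $x$ only; the helper name `F_of_x` is made up for this illustration.

```python
import torch
from torch.autograd.functional import jacobian

x = torch.tensor([2., 3.])
y = torch.tensor([6., 4.])

# Treat y as a constant and view F as a function of x only
def F_of_x(x_in):
    return 3*x_in**3 - y**2

# Shape (2, 2): rows index the entries F_i, columns index the entries x_j
J = jacobian(F_of_x, x)
print(J)  # diagonal entries 9*2^2 = 36 and 9*3^2 = 81, zeros elsewhere
```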




In [None]:
F.backward()

RuntimeError: ignored

Here 'backward' does not produce the entire Jacobian as one might expect; called without arguments on a non-scalar tensor, it raises an error. The reason is practical: the Jacobian can be a very large matrix, and the solvers do not need all of its entries, so they are not computed.

We only need **linear combinations** of these gradients (i.e. their weighted sums). So, here 'backward' requires another vector tensor argument $v$ with the same shape as $F$ (named `T` in the code that follows), and what is computed internally is the product $J^\top v$, where $J^\top$ denotes the transpose of $J$. In fact, these vector-Jacobian products are exactly what the chain rule requires for such 'vector' functions.
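
To make the weighted sum concrete, here is a small sketch with a function whose Jacobian is not diagonal. The function `G`, the weight vector `v`, and the hand-written matrix `J` are assumptions introduced only for this illustration; they are not part of the example above.

```python
import torch

x = torch.tensor([2., 3.], requires_grad=True)

# G(x) = (x1*x2, x1 + x2), so J = [[x2, x1], [1, 1]] = [[3, 2], [1, 1]] at x = (2, 3)
G = torch.stack([x[0]*x[1], x[0] + x[1]])

# Weight vector passed to backward
v = torch.tensor([1., 2.])
G.backward(gradient=v)

# Hand-computed Jacobian, used only to compare against autograd
J = torch.tensor([[3., 2.], [1., 1.]])
print(x.grad)   # gradient accumulated by autograd
print(J.T @ v)  # the same weighted sum computed by hand
```

Both lines print the same vector, because each entry of `x.grad` is the sum of the corresponding Jacobian column weighted by `v`.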



In [None]:
T = torch.tensor([1., 1.])
F.backward(gradient=T)



In [None]:
print(x.grad)
print(y.grad)


tensor([36., 81.])
tensor([-12.,  -8.])
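
With a weight vector of all ones, the result coincides with the gradient of the scalar obtained by summing the entries of F. The following sketch rebuilds the tensors from scratch (so that no previously accumulated gradients interfere) and uses a scalar named `s`, which is an illustrative name.

```python
import torch

x = torch.tensor([2., 3.], requires_grad=True)
y = torch.tensor([6., 4.], requires_grad=True)

# Reduce F to a scalar first, so backward() needs no extra argument
s = (3*x**3 - y**2).sum()
s.backward()

print(x.grad)  # same values as with gradient=torch.tensor([1., 1.])
print(y.grad)
```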


In [None]:
# Now let's try to access the gradient of F itself.
# F is not a leaf tensor (it is computed from x and y), so its .grad is not stored.
print(F.grad)

None
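
If the gradient with respect to an intermediate tensor such as F is actually needed, one option is to call `retain_grad()` on it before the backward pass. A minimal sketch, reusing the same values as above:

```python
import torch

x = torch.tensor([2., 3.], requires_grad=True)
y = torch.tensor([6., 4.], requires_grad=True)

F = 3*x**3 - y**2
F.retain_grad()  # ask autograd to keep the gradient of this non-leaf tensor

F.backward(gradient=torch.tensor([1., 1.]))

print(F.is_leaf)  # False: F was produced by operations on x and y
print(F.grad)     # equals the weight vector passed to backward
```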




### Automatic gradients of more complicated functions

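The automatic gradient computation supports many elementary functions and all their compositions. But **importantly**, these functions need to be computed
using their PyTorch versions (e.g. `torch.cos` rather than `math.cos`), so that each operation is recorded for differentiation. For example: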

In [None]:
x = torch.tensor(2., requires_grad=True)
y = torch.tensor(3., requires_grad=True)

f = 3*torch.cos(x) - torch.sin(y)**2

# This computes the gradient
f.backward()

print(x.grad)
print(y.grad)

tensor(-2.7279)
tensor(0.2794)
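
These values can be checked against the analytic derivatives $-3\sin(x)$ and $-2\sin(y)\cos(y)$. The sketch below also shows, as an illustration, what happens when a non-PyTorch function is applied to the extracted Python number: the result is a plain float with no link to the original tensor. The names `expected_dx`, `expected_dy`, and `g` are chosen for this example.

```python
import math
import torch

x = torch.tensor(2., requires_grad=True)
y = torch.tensor(3., requires_grad=True)

f = 3*torch.cos(x) - torch.sin(y)**2
f.backward()

# Analytic derivatives: df/dx = -3 sin(x), df/dy = -2 sin(y) cos(y)
expected_dx = -3*math.sin(2.0)
expected_dy = -2*math.sin(3.0)*math.cos(3.0)
print(x.grad.item(), expected_dx)
print(y.grad.item(), expected_dy)

# math.cos works on the extracted Python float, outside the autograd graph
g = 3*math.cos(x.item())
print(type(g))  # a plain float, so there is nothing to call backward() on
```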


### Side note: How to be frugal with derivatives

We can exclude some tensors from derivative computation by setting
`requires_grad=False`. As we will see, this is useful once we are done
training a model and just want to evaluate it on new points.

In [None]:
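x = torch.tensor(2., requires_grad=True)
y = torch.tensor(3., requires_grad=False)  # y is excluded from gradient computation

f = 3*torch.cos(x) - torch.sin(y)**2

# This computes the gradient with respect to x only
f.backward()

print(x.grad)
print(y.grad)  # None, since no gradient is tracked for y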

tensor(-2.7279)
None
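
For the evaluation scenario mentioned above, a common alternative to changing `requires_grad` on each tensor is to switch off gradient tracking for a whole block of code with `torch.no_grad()`, or to cut a single tensor out of the graph with `detach()`. A minimal sketch, with `f_eval` and `x_const` as illustrative names:

```python
import torch

x = torch.tensor(2., requires_grad=True)
y = torch.tensor(3., requires_grad=True)

# Inside no_grad, operations are not recorded in the autograd graph
with torch.no_grad():
    f_eval = 3*torch.cos(x) - torch.sin(y)**2

print(f_eval.requires_grad)  # False

# detach() gives a tensor that shares data with x but is cut from the graph
x_const = x.detach()
print(x_const.requires_grad)  # False
```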
