# 0x05 Gradient descent and backpropagation

In this tutorial we will cover how to interact with the backpropagation
engine of PyTorch.

PyTorch is the go-to library for deep learning in Python, especially if you are building a custom model of your own.
You will very likely be using PyTorch when you do your research.

> 💡 **NOTE**: 
>
> We assume you have already learnt the fundamentals of derivatives and gradients.
>
> If you need a quick recap, check out [this explanation](https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html#optional-reading-vector-calculus-using-autograd) by the PyTorch team.
>
> Another suggested deep walkthrough is this [3blue1brown video](https://www.youtube.com/watch?v=tIeHLnjs5U8) on the topic.

## 1. Backpropagation

One of the biggest differences between PyTorch tensors and NumPy arrays is that tensors can track gradients.

The two main ways to interact with gradient tracking are the `requires_grad` flag and the `.grad` attribute.
Let us see them in a simple example.

Consider this formula:
$$
y = w_1x^2 + w_2x + b
$$

where $x$ is the input.

> 🤔 **THINKING**
>
> - What is $\frac{\partial y}{\partial w_1}$, $\frac{\partial y}{\partial w_2}$, $\frac{\partial y}{\partial b}$? Compute by hand.

To check your computation, let us implement this formula in PyTorch.

In [None]:
import torch

x = torch.tensor([2.0])

# Identify the parameters that you need to compute gradients for
# and set requires_grad=True
w1 = torch.tensor([1.0], requires_grad=True)
w2 = torch.tensor([3.0], requires_grad=True)
b = torch.tensor([4.0], requires_grad=True)
# You will see why we used an intermediate variable z here.
z = w1 * torch.pow(x, 2) + w2 * x
y = z + b
y

tensor([14.], grad_fn=<AddBackward0>)

In [3]:
# Run backpropagation to compute the gradients of y
y.backward()

In [4]:
# Gradients w.r.t. w1, w2, and b are stored in the .grad attribute of each tensor
print(w1.grad)  # dy/dw1
print(w2.grad)  # dy/dw2
print(b.grad)   # dy/db

tensor([4.])
tensor([2.])
tensor([1.])


Is your computation correct?
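
For reference, here is a short worked derivation for the formula above, evaluated at the input $x = 2$ used in the code. Each parameter appears only once in $y$, so each partial derivative is just the factor multiplying that parameter:

$$
\frac{\partial y}{\partial w_1} = x^2 = 4, \qquad
\frac{\partial y}{\partial w_2} = x = 2, \qquad
\frac{\partial y}{\partial b} = 1
$$

These values match the three tensors printed above.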

Now, you may want to also get $\frac{dy}{dz}$ from PyTorch.

In [6]:
print(z.grad)  # dy/dz

None



Oh no 😱, we cannot do it!

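This is because PyTorch does not retain gradients on **non-leaf** tensors by default. Here `z` is the result of an operation rather than a tensor we created directly, so its gradient is used during the backward pass but not stored in `.grad`. This makes sense because model parameters, which are what we usually want gradients for, are leaf tensors. If you really **DO** want the gradients of non-leaf tensors for specific use cases, call [`.retain_grad()`](https://pytorch.org/docs/stable/generated/torch.Tensor.retain_grad.html) on the tensor before calling `.backward()`.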
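
As a minimal sketch of how `.retain_grad()` could be used, the example below rebuilds the same expression from scratch (so it does not depend on the earlier cells) and asks PyTorch to keep the gradient of the intermediate tensor `z`. The call has to come before `y.backward()`.

```python
import torch

x = torch.tensor([2.0])

# Leaf tensors: created directly by us, with gradient tracking enabled
w1 = torch.tensor([1.0], requires_grad=True)
w2 = torch.tensor([3.0], requires_grad=True)
b = torch.tensor([4.0], requires_grad=True)

# z is a non-leaf tensor because it is the result of operations on other tensors
z = w1 * torch.pow(x, 2) + w2 * x
z.retain_grad()  # ask autograd to keep dy/dz; must be called before backward()

y = z + b
y.backward()

print(w1.is_leaf, z.is_leaf)  # True False
print(z.grad)                 # dy/dz, tensor([1.])
```

Since $y = z + b$, the derivative $\frac{dy}{dz}$ is 1, which is what `z.grad` now holds.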

> 📚 **EXERCISE**
>
> - Define an expression on your own and get the gradients using backpropagation.
> - Currently, our `y` is a scalar. Although the loss is usually a scalar in deep learning, what if `y` is a vector? How do we compute the gradients in this case?

In [None]:
# === Your code here ===

## 2. Toggling gradient tracking

## 3. Gradient descent and optimizing process