## Introduction to Computational Graphs with PyTorch

by [Shreyas Kale](https://www.linkedin.com/in/shreyaskale11/)


In this notebook we provide a short introduction and overview of computational graphs using PyTorch.

Inspired by Olah's article ["Calculus on Computational Graphs: Backpropagation"](https://colah.github.io/posts/2015-08-Backprop/).

### Why Computational Graphs?

In [9]:
import torch

Define the inputs like this:

In [10]:
a = torch.tensor([3.], requires_grad=True)
b = torch.tensor([1.], requires_grad=True)

In [11]:
c = a + b
d = b + 1
e = c * d

# by default autograd only keeps .grad for leaf tensors;
# retain_grad() asks it to keep the gradient of these non-leaf tensors too
c.retain_grad()
d.retain_grad()
e.retain_grad()
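
The `retain_grad()` calls matter because autograd distinguishes between leaf tensors (created directly, like `a` and `b`) and non-leaf tensors (results of operations, like `c`). A minimal sketch of that distinction, reusing the same values as above and the `is_leaf` tensor attribute:

```python
import torch

a = torch.tensor([3.], requires_grad=True)
b = torch.tensor([1.], requires_grad=True)
c = a + b  # produced by an operation, so it is not a leaf

# Leaf tensors are created by the user; non-leaf tensors come out of operations.
print(a.is_leaf, b.is_leaf, c.is_leaf)
```

Without `retain_grad()`, only the leaf tensors would have their `.grad` populated after a backward pass.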

In [12]:
print(e)

tensor([8.], grad_fn=<MulBackward0>)


### Derivatives on Computational Graphs

Using the concept of computational graphs, we are now interested in evaluating the **partial derivative** on each edge of the graph. These edge derivatives are the pieces we combine to obtain the gradients of the graph. Remember that gradients are what we use to train a neural network, and those calculations can be taken care of by the automatic differentiation engine.

The intuition is this: if $a$ directly affects $c$, we want to know how strongly it affects it. In other words, if we change $a$ a little, how much does $c$ change? This is referred to as the partial derivative of $c$ with respect to $a$.
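
To make "change $a$ a little" concrete, here is a rough numerical sketch in plain Python (no PyTorch involved). The helper `c_of` and the step size `h` are illustrative assumptions, not part of the notebook; the values of `a` and `b` match the inputs defined earlier.

```python
def c_of(a, b):
    # c = a + b, the first node of the graph
    return a + b

a, b = 3.0, 1.0
h = 1e-6  # small nudge applied to a (assumed step size)

# Forward difference: how much c moves per unit change in a
dc_da = (c_of(a + h, b) - c_of(a, b)) / h
print(round(dc_da, 6))
```

The estimate comes out at approximately $1$, which matches the analytic result $\frac{\partial c}{\partial a} = 1$ for $c = a + b$.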

You can work this out by hand, but the easy way to do it with PyTorch is to call `.backward()` on $e$ and let the engine figure out the values. Calling `.backward()` signals the autograd engine to calculate the gradients and store them in the respective tensors’ `.grad` attribute.

Let's do that now:

In [13]:
e.backward()

Now, let’s say we are interested in the derivative of $e$ with respect to $a$. How do we obtain it? In other words, we are looking for $\frac{\partial e}{\partial a}$.

Using PyTorch, we can do this by reading `a.grad`:

In [14]:
print(a.grad)

tensor([2.])


It is important to understand the intuition behind this. Olah puts it best:

>Let’s consider how $e$ is affected by $a$. If we change $a$ at a speed of 1, $c$ also changes at a speed of $1$. In turn, $c$ changing at a speed of $1$ causes $e$ to change at a speed of $2$. So $e$ changes at a rate of $1*2$ with respect to $a$.
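
Written out as a chain-rule product, with the values $a = 3$ and $b = 1$ defined above (so $c = a + b = 4$ and $d = b + 1 = 2$), the same reasoning reads:

$$
\frac{\partial e}{\partial a} = \frac{\partial e}{\partial c} \cdot \frac{\partial c}{\partial a} = d \cdot 1 = 2
$$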


To check that this holds, let's look at another example. How about calculating the derivative of $e$ with respect to $b$, i.e., $\frac{\partial e}{\partial b}$?

We can get that through `b.grad`:

In [15]:
print(b.grad)

tensor([6.])
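
The value $6$ arises because $b$ reaches $e$ along two paths, one through $c$ and one through $d$, and the contributions of the two paths add up. A worked version, using $c = a + b = 4$ and $d = b + 1 = 2$:

$$
\frac{\partial e}{\partial b} = \frac{\partial e}{\partial c} \cdot \frac{\partial c}{\partial b} + \frac{\partial e}{\partial d} \cdot \frac{\partial d}{\partial b} = d \cdot 1 + c \cdot 1 = 2 + 4 = 6
$$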


Here are all the gradients collected, including non-leaf nodes:

In [16]:
print(a.grad, b.grad, c.grad, d.grad, e.grad)

tensor([2.]) tensor([6.]) tensor([2.]) tensor([4.]) tensor([1.])


You can use the computational graph above to verify that everything is correct. This is the power of computational graphs and how they are used by automatic differentiation engines. It's also a very useful concept to understand when developing neural network architectures and checking their correctness.
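
One way to do that verification is to redo the backward pass by hand. The sketch below uses plain Python floats rather than tensors, and the `grad_*` names are illustrative only. It walks the graph from $e$ back to the inputs, multiplying along each edge and summing where paths meet.

```python
# Forward pass with plain Python floats
a, b = 3.0, 1.0
c = a + b      # 4.0
d = b + 1      # 2.0
e = c * d      # 8.0

# Backward pass: start at the output and apply the chain rule edge by edge
grad_e = 1.0                          # de/de
grad_c = grad_e * d                   # de/dc = d
grad_d = grad_e * c                   # de/dd = c
grad_a = grad_c * 1.0                 # single path: a -> c
grad_b = grad_c * 1.0 + grad_d * 1.0  # two paths: b -> c and b -> d

print(grad_a, grad_b, grad_c, grad_d, grad_e)
```

The five numbers agree with the values PyTorch stored in `a.grad`, `b.grad`, `c.grad`, `d.grad`, and `e.grad`.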

### Next Steps

In this notebook, I've provided a simple and intuitive explanation of the concept of computational graphs using PyTorch. I highly recommend going through [Olah's article](https://colah.github.io/posts/2015-08-Backprop/) for more on the topic.

In the next tutorial, I will be applying the concept of computational graphs to more advanced operations you typically see in a neural network. In fact, if you are interested in this, and you are feeling comfortable with the topic now, you can check out these two PyTorch tutorials:

- [A gentle introduction to `torch.autograd`](https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html)
- [Automatic differentiation with `torch.autograd`](https://pytorch.org/tutorials/beginner/basics/autogradqs_tutorial.html)

And here are some other useful references used to put this article together:

- [Hacker's guide to Neural Networks](http://karpathy.github.io/neuralnets/)
- [Backpropagation calculus](https://www.youtube.com/watch?v=tIeHLnjs5U8&ab_channel=3Blue1Brown)

