# Automatic Differentiation

A brief informal introduction to automatic differentiation.

A neural network is a function built by composing small functions. Its building blocks are linear projections and nonlinear activation functions.

How do we compute the derivative of a composite function? Using the chain rule. What PyTorch, TensorFlow, or JAX do is compute these derivatives for any function we specify. So our task is:

1. Specify the function.
2. Tell PyTorch which derivatives we want to obtain.

### Basic function

Let's start with the function:

$$
y = (x^2 - 4)^3 + 2
$$

The derivative is

$$
\frac{\partial y}{\partial x} = 3 \cdot (x^2 - 4)^2 \cdot 2 x
$$

something we usually derive implicitly using our knowledge of basic derivatives, such as the rule that the derivative of $f(x)^{n}$ is $n \cdot f(x)^{n-1} \cdot f'(x)$.
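As a quick sanity check, the analytic derivative can be compared against a numerical approximation. The sketch below is only illustrative: the helper names (`y`, `dy_dx`, `central_difference`), the step size `h`, and the evaluation point `x0 = 3.0` are assumptions chosen for this example, and the central difference is a generic numerical technique, not what an autodiff library does internally.

```python
def y(x):
    """The example function y = (x^2 - 4)^3 + 2."""
    return (x**2 - 4) ** 3 + 2


def dy_dx(x):
    """The hand-derived derivative 3 * (x^2 - 4)^2 * 2x."""
    return 3 * (x**2 - 4) ** 2 * 2 * x


def central_difference(f, x, h=1e-5):
    """Approximate f'(x) with a symmetric finite difference (h is an arbitrary small step)."""
    return (f(x + h) - f(x - h)) / (2 * h)


x0 = 3.0  # arbitrary evaluation point
analytic = dy_dx(x0)
numeric = central_difference(y, x0)
print(analytic, numeric)  # the two values should agree to several decimal places
```

If the two numbers disagree noticeably, the hand-derived formula is the first thing to re-check.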

However, when things get complicated we need a systematic way to obtain complex derivatives, and this is where the chain rule comes into play. For functions $f:\mathbb{R} \rightarrow \mathbb{R}$, we can use the one-dimensional chain rule.

The idea of the chain rule is to express a function as a composition of elementary functions whose derivatives are well known. In our example, we can break the function into a small composition of functions. The function

$$
y = (x^2 - 4)^3 + 2
$$

could be written as:

$$
\begin{split}
t = x^2 \\
u = t - 4\\
z = u^3 \\
y = z + 2 
\end{split}
$$

Substituting back shows that the two forms are equivalent. First, substitute $z = u^3$ into $y = z + 2$, yielding $y = u^3 + 2$. Next, substitute $u = t - 4$ into $y = u^3 + 2$, yielding $y = (t - 4)^3 + 2$. Finally, substitute $t = x^2$ into $y = (t - 4)^3 + 2$, yielding $y = (x^2 - 4)^3 + 2$, which is our original function.
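The same equivalence can be checked in code. This is a minimal sketch in plain Python. The function names `y_composed` and `y_direct` and the sample inputs are made up for illustration. The intermediate variables mirror $t$, $u$, $z$ and $y$ from the decomposition above.

```python
def y_composed(x):
    """Evaluate y one elementary step at a time."""
    t = x**2      # t = x^2
    u = t - 4     # u = t - 4
    z = u**3      # z = u^3
    return z + 2  # y = z + 2


def y_direct(x):
    """Evaluate y = (x^2 - 4)^3 + 2 in a single expression."""
    return (x**2 - 4) ** 3 + 2


# Arbitrary sample points: both versions should give the same value.
samples = [-2.0, 0.0, 1.5, 3.0]
for x in samples:
    print(x, y_composed(x), y_direct(x))
```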

When a function is broken into small functions like this, the full derivative can be obtained by multiplying the derivatives of the individual functions. In other words:

$$
\frac{\partial y}{\partial x} = \frac{\partial y}{\partial z}\frac{\partial z}{\partial u}\frac{\partial u}{\partial t}\frac{\partial t}{\partial x}
$$

The derivatives of these small functions are usually easier to compute and well known. More importantly, this gives a principled way of obtaining complex derivatives in a structured and well-ordered manner. In fact, when you applied the rule that the derivative of $f(x)^{n}$ is $n \cdot f(x)^{n-1} \cdot f'(x)$ to obtain $\frac{\partial y}{\partial x}$ directly, you were implicitly applying the chain rule without noticing it.

Let's do it:

$$
\begin{split}
&\frac{\partial y}{\partial z} = 1\\
&\frac{\partial z}{\partial u} = 3u^2\\
&\frac{\partial u}{\partial t} = 1\\
&\frac{\partial t}{\partial x} = 2x\\
\end{split}
$$

So, applying the chain rule, we have:

$$
\frac{\partial y}{\partial x} = \frac{\partial y}{\partial z}\frac{\partial z}{\partial u}\frac{\partial u}{\partial t}\frac{\partial t}{\partial x} = 1 \cdot 3u^2 \cdot 1 \cdot 2x = 3u^2\cdot 2x 
$$
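When we only need the derivative at a concrete value of $x$, this product can be evaluated numerically: compute the intermediate values first, then multiply the local derivatives. The following is a rough sketch under that assumption. The function name `dy_dx_chain` and the test point `3.0` are illustrative choices, not part of any library.

```python
def dy_dx_chain(x):
    """Evaluate dy/dx at x by multiplying the local derivatives of each step."""
    # Forward pass: compute the intermediate values the local derivatives need.
    t = x**2
    u = t - 4

    # Local derivatives of each elementary function.
    dy_dz = 1.0        # y = z + 2
    dz_du = 3 * u**2   # z = u^3
    du_dt = 1.0        # u = t - 4
    dt_dx = 2 * x      # t = x^2

    # Chain rule: multiply the local derivatives together.
    return dy_dz * dz_du * du_dt * dt_dx


print(dy_dx_chain(3.0))  # 450.0, the same as 3 * (3^2 - 4)^2 * 2 * 3
```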

Note that the last step is to express the derivative as a function of $x$. In other words, we need to replace $u$ with its expression in terms of $x$. Using a similar procedure as above, the steps are:

1. Substitute $t = x^2$ into $u = t - 4$ to yield $u = x^2 - 4$, which completes the substitution.
2. Place this result back into the derivative, replacing $u$ with this expression that depends on $x$, to obtain:

$$
\frac{\partial y}{\partial x} = \frac{\partial y}{\partial z}\frac{\partial z}{\partial u}\frac{\partial u}{\partial t}\frac{\partial t}{\partial x} = 1 \cdot 3u^2 \cdot 1 \cdot 2x = 3u^2\cdot 2x = 3(x^2 - 4)^2 \cdot 2x
$$

which happily matches the derivative we obtained originally. For multivariate functions $f:\mathbb{R}^n \rightarrow \mathbb{R}^m$, things change a bit, but that is a story for another chapter.
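The bookkeeping in this derivation is mechanical, which is what makes it possible to automate. As a toy illustration only, the sketch below defines a hypothetical `Dual` class that carries a value together with its derivative with respect to $x$, and applies the relevant derivative rule at every elementary operation. This is a simplified forward-mode scheme written for this example. It supports only the operations our function needs, and it is not how PyTorch is implemented (PyTorch's autograd uses reverse-mode differentiation).

```python
class Dual:
    """Toy number carrying a value and its derivative with respect to x."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv  # constants have derivative 0

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Sum rule: (f + g)' = f' + g'
        return Dual(self.value + other.value, self.deriv + other.deriv)

    def __sub__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Difference rule: (f - g)' = f' - g'
        return Dual(self.value - other.value, self.deriv - other.deriv)

    def __pow__(self, n):
        # Power rule with the chain rule: (f^n)' = n * f^(n-1) * f'
        return Dual(self.value**n, n * self.value ** (n - 1) * self.deriv)


def y(x):
    """The example function, written exactly as in the text."""
    return (x**2 - 4) ** 3 + 2


x = Dual(3.0, 1.0)  # seed: dx/dx = 1
out = y(x)
print(out.value, out.deriv)  # 127.0 450.0
```

Note that `y` is written once, as an ordinary expression. The derivative comes out as a side effect of evaluating it, which mirrors the two-step workflow described at the start: specify the function, then ask for the derivative.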