<a href="https://colab.research.google.com/github/udlbook/udlbook/blob/main/CM20315_Gradients_I.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# CM20315 Gradients I

We're going to investigate how to take the derivatives of functions where one operation is composed with another, which is composed with a third and so on.    For example, consider the function:

\begin{equation}
     y = \beta_4+\omega_4\cdot \log\biggl[\beta_3+\omega_3\cdot\cos\Bigl[\beta_2+\omega_2\cdot\exp\bigl[\beta_1+\omega_1\cdot\sin[\beta_0+\omega_0x]\bigr]\Bigr]\biggr],
\end{equation}

which is a composition of the functions $\log[\bullet], \cos[\bullet],\exp[\bullet],\sin[\bullet]$.   I chose these just because you probably already know the derivatives of these functions:

\begin{align*}
\frac{\partial \log[z]}{\partial z} &= \frac{1}{z} \\
\frac{\partial \cos[z]}{\partial z} &= -\sin[z] \\
\frac{\partial \exp[z]}{\partial z} &= \exp[z] \\
\frac{\partial \sin[z]}{\partial z} &= \cos[z].
\end{align*}
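
If you'd like to convince yourself of these four derivatives, here is a minimal sketch that compares each one with a numerical estimate.  The helper name `central_difference`, the test point `z = 0.7` and the step size are assumptions made for this illustration only.

```python
import numpy as np

def central_difference(func, z, delta=1e-5):
    # Symmetric numerical estimate of the derivative of func at the point z
    return (func(z + delta) - func(z - delta)) / (2 * delta)

z = 0.7  # arbitrary test point; it must be positive so that log[z] is defined

# Each entry pairs a function with the derivative claimed in the table above
pairs = {
    'log': (np.log, lambda z: 1.0 / z),
    'cos': (np.cos, lambda z: -np.sin(z)),
    'exp': (np.exp, lambda z: np.exp(z)),
    'sin': (np.sin, lambda z: np.cos(z)),
}

max_error = 0.0
for name, (func, deriv) in pairs.items():
    analytic = deriv(z)
    numeric = central_difference(func, z)
    max_error = max(max_error, abs(analytic - numeric))
    print('%s: analytic = %3.3f, numeric = %3.3f' % (name, analytic, numeric))
```

The two columns should agree to the printed precision.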

Suppose that we know the current values of $\beta_{0},\beta_{1},\beta_{2},\beta_{3},\beta_{4},\omega_{0},\omega_{1},\omega_{2},\omega_{3},\omega_{4}$, and $x$. We could obviously calculate $y$.   But we also want to know how $y$ changes when we make a small change to $\beta_{0},\beta_{1},\beta_{2},\beta_{3},\beta_{4},\omega_{0},\omega_{1},\omega_{2},\omega_{3}$, or $\omega_{4}$.  In other words, we want to compute the ten derivatives:

\begin{align*}
\frac{\partial y}{\partial \beta_{0}}, \quad \frac{\partial y}{\partial \beta_{1}}, \quad \frac{\partial y}{\partial \beta_{2}}, \quad \frac{\partial y}{\partial \beta_{3}}, \quad \frac{\partial y}{\partial \beta_{4}}, \quad \frac{\partial y}{\partial \omega_{0}}, \quad \frac{\partial y}{\partial \omega_{1}}, \quad \frac{\partial y}{\partial \omega_{2}}, \quad \frac{\partial y}{\partial \omega_{3}}, \quad \text{and} \quad \frac{\partial y}{\partial \omega_{4}}.
\end{align*}

In [None]:
# import library
import numpy as np

Let's first define the original function for $y$:

In [None]:
def fn(x, beta0, beta1, beta2, beta3, beta4, omega0, omega1, omega2, omega3, omega4):
  return beta4 + omega4 * np.log(beta3+omega3 * np.cos(beta2 + omega2 * np.exp(beta1 + omega1 * np.sin(beta0 + omega0 * x))))

Now we'll choose some values for the betas and the omegas and x and compute the output of the function:

In [None]:
beta0 = 1.0; beta1 = 2.0; beta2 = -3.0; beta3 = 0.4; beta4 = -0.3
omega0 = 0.1; omega1 = -0.4; omega2 = 2.0; omega3 = 3.0; omega4 = -0.5
x = 2.3
y_func = fn(x,beta0,beta1,beta2,beta3,beta4,omega0,omega1,omega2,omega3,omega4)
print('y=%3.3f'%y_func)

# Computing derivatives by hand

We could compute expressions for the derivatives by hand and write code to compute them directly.  Some of them are easy. For example:

\begin{equation}
\frac{\partial y}{\partial \beta_{4}}  = 1,
\end{equation}

but some have very complex expressions, even for this relatively simple original equation. For example:

\begin{align*}
\frac{\partial y}{\partial \omega_{0}}  &=& 
-\frac{\omega_{1}\omega_{2}\omega_{3}\omega_{4} x \cos[\beta_{0}\!+\!\omega_{0}x]\cdot\exp\bigl[\omega_{1}\sin[\beta_{0}\!+\!\omega_{0}x]\!+\!\beta_{1}\bigr]\cdot\sin\Bigl[\omega_{2}\exp\bigl[\omega_{1}\sin[\beta_{0}\!+\!\omega_{0}x]\!+\!\beta_{1}\bigr]\!+\!\beta_{2}\Bigr]}
{\omega_{3}\cos[\omega_{2}\exp[\omega_{1}\sin[\beta_{0}\!+\!\omega_{0}x]\!+\!\beta_{1}]\!+\!\beta_{2}]\!+\!\beta_{3}}.
\end{align*}

In [None]:
dydbeta4_func = 1
dydomega0_func = -omega1*omega2*omega3*omega4*x * np.cos(beta0+omega0*x) * \
              np.exp(omega1 * np.sin(beta0+omega0*x)+beta1) * \
              np.sin(omega2 * np.exp(omega1 * np.sin(beta0+omega0 *x)+beta1)+beta2)/ \
              (omega3 * np.cos(omega2 * np.exp(omega1 * np.sin(beta0+omega0*x)+beta1)+beta2)+beta3)

Let's make sure these are correct using finite differences:

In [None]:
# Forward difference: nudge one parameter by a small step and measure the change in y
delta = 0.0001
y_base = fn(x,beta0,beta1,beta2,beta3,beta4,omega0,omega1,omega2,omega3,omega4)
dydbeta4_fd = (fn(x,beta0,beta1,beta2,beta3,beta4+delta,omega0,omega1,omega2,omega3,omega4)-y_base)/delta
dydomega0_fd = (fn(x,beta0,beta1,beta2,beta3,beta4,omega0+delta,omega1,omega2,omega3,omega4)-y_base)/delta

print('dydbeta4: Function value = %3.3f, Finite difference value = %3.3f'%(dydbeta4_func,dydbeta4_fd))
print('dydomega0: Function value = %3.3f, Finite difference value = %3.3f'%(dydomega0_func,dydomega0_fd))
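
The check above perturbs a parameter in one direction only (a forward difference), whose error shrinks in proportion to the step size.  A central difference perturbs in both directions, and for smooth functions its error shrinks with the square of the step.  The sketch below is one possible way to write this for any named parameter.  The `params` dictionary and the helper `finite_difference` are assumptions of this illustration and are not used elsewhere in the notebook.

```python
import numpy as np

def fn(x, beta0, beta1, beta2, beta3, beta4, omega0, omega1, omega2, omega3, omega4):
    # Same function as defined earlier in the notebook
    return beta4 + omega4 * np.log(beta3 + omega3 * np.cos(beta2 + omega2 * np.exp(beta1 + omega1 * np.sin(beta0 + omega0 * x))))

# Same parameter values as above, gathered in a dictionary (hypothetical container)
params = dict(beta0=1.0, beta1=2.0, beta2=-3.0, beta3=0.4, beta4=-0.3,
              omega0=0.1, omega1=-0.4, omega2=2.0, omega3=3.0, omega4=-0.5)
x = 2.3

def finite_difference(name, delta=1e-4):
    # Central difference: perturb the named parameter up and down by delta
    plus = dict(params)
    plus[name] += delta
    minus = dict(params)
    minus[name] -= delta
    return (fn(x, **plus) - fn(x, **minus)) / (2 * delta)

dydbeta4_central = finite_difference('beta4')
dydomega0_central = finite_difference('omega0')
print('dydbeta4: Central difference value = %3.3f' % dydbeta4_central)
print('dydomega0: Central difference value = %3.3f' % dydomega0_central)
```

Passing any of the ten parameter names to `finite_difference` gives a reference value to compare your later answers against.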

The code to calculate $\partial y/ \partial \omega_0$ is a bit of a nightmare.  It's easy to make mistakes, and you can see that some parts of it are repeated (for example, the $\sin[\bullet]$ term), which suggests some kind of redundancy in the calculations.  The goal of this practical is to compute the derivatives in a much simpler way.  There will be three steps:

**Step 1:** Write the original equation as a series of intermediate calculations.  We change

\begin{equation}
     y = \beta_4+\omega_4\cdot \log\biggl[\beta_3+\omega_3\cdot\cos\Bigl[\beta_2+\omega_2\cdot\exp\bigl[\beta_1+\omega_1\cdot\sin[\beta_0+\omega_0x]\bigr]\Bigr]\biggr]
\end{equation}

to 

\begin{align}
f_{0} &=& \beta_{0} + \omega_{0} x\nonumber\\
h_{1} &=& \sin[f_{0}]\nonumber\\
f_{1} &=& \beta_{1} + \omega_{1}h_{1}\nonumber\\
h_{2} &=& \exp[f_{1}]\nonumber\\
f_{2} &=& \beta_{2} + \omega_{2} h_{2}\nonumber\\
h_{3} &=& \cos[f_{2}]\nonumber\\
f_{3} &=& \beta_{3} + \omega_{3}h_{3}\nonumber\\
h_{4} &=& \log[f_{3}]\nonumber\\
y &=& \beta_{4} + \omega_{4} h_{4}
\end{align}

and compute and store all of these intermediate values.  We'll need them to compute the derivatives.  This is called the **forward pass**.
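
To show the pattern without giving away the exercise, here is a sketch of a forward pass for a much shorter chain, $y = \beta_1+\omega_1\cdot\sin[\beta_0+\omega_0 x]$.  The function name `forward_short`, the returned dictionary and the numerical values are all assumptions chosen for this illustration.

```python
import numpy as np

def forward_short(x, beta0, omega0, beta1, omega1):
    # Forward pass for the shorter chain y = beta1 + omega1 * sin[beta0 + omega0 * x]
    f0 = beta0 + omega0 * x   # linear step
    h1 = np.sin(f0)           # nonlinearity
    y = beta1 + omega1 * h1   # final linear step
    # Keep every intermediate value; a backward pass would reuse them
    return {'f0': f0, 'h1': h1, 'y': y}

# Illustrative values only (not the ones used elsewhere in this notebook)
cache = forward_short(x=0.5, beta0=0.2, omega0=1.5, beta1=-1.0, omega1=2.0)

# The stored result should match evaluating the nested expression in one go
y_direct = -1.0 + 2.0 * np.sin(0.2 + 1.5 * 0.5)
print('f0 = %3.3f, h1 = %3.3f' % (cache['f0'], cache['h1']))
print('y (step by step) = %3.3f, y (direct) = %3.3f' % (cache['y'], y_direct))
```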

In [None]:
# TODO compute all the f_k and h_k terms 
# Replace the code below

f0 = 1
h1 = 1
f1 = 1
h2 = 1
f2 = 1
h3 = 1
f3 = 1
h4 = 1
y = 1

In [None]:
# Let's check we got that right:
print("f0: true value = %3.3f, your value = %3.3f"%(1.230, f0))
print("h1: true value = %3.3f, your value = %3.3f"%(0.942, h1))
print("f1: true value = %3.3f, your value = %3.3f"%(1.623, f1))
print("h2: true value = %3.3f, your value = %3.3f"%(5.068, h2))
print("f2: true value = %3.3f, your value = %3.3f"%(7.137, f2))
print("h3: true value = %3.3f, your value = %3.3f"%(0.657, h3))
print("f3: true value = %3.3f, your value = %3.3f"%(2.372, f3))
print("h4: true value = %3.3f, your value = %3.3f"%(0.864, h4))
print("y_func = %3.3f, y = %3.3f"%(y_func, y))


**Step 2:** Compute the derivatives of $y$ with respect to the intermediate quantities that we just calculated, but in reverse order:

\begin{align}
\frac{\partial y}{\partial h_4}, \quad \frac{\partial y}{\partial f_3}, \quad \frac{\partial y}{\partial h_3}, \quad \frac{\partial y}{\partial f_2}, \quad
\frac{\partial y}{\partial h_2}, \quad \frac{\partial y}{\partial f_1}, \quad \frac{\partial y}{\partial h_1}, \quad\text{and}\quad \frac{\partial y}{\partial f_0}.
\end{align}

The first of these derivatives is straightforward:

\begin{equation}
\frac{\partial y}{\partial h_{4}} = \frac{\partial }{\partial h_{4}} (\beta_{4} + \omega_{4} h_{4}) = \omega_{4}.
\end{equation}

The second of these derivatives can be calculated using the chain rule:

\begin{equation}
\frac{\partial y}{\partial f_{3}} = \frac{\partial y}{\partial h_{4}} \frac{\partial h_{4}}{\partial f_{3}}.
\end{equation}

The left-hand side asks how $y$ changes when $f_{3}$ changes.  The right-hand side says we can decompose this into (i) how $y$ changes when $h_{4}$ changes and (ii) how $h_{4}$ changes when $f_{3}$ changes.  So you get a chain of events happening:  $f_{3}$ changes $h_{4}$, which changes $y$, and the derivatives represent the effects of this chain.  Notice that we computed the first of these derivatives already, and the other one is the derivative of $\log[f_{3}]$, which is simply $1/f_{3}$.  We calculated $f_{3}$ in Step 1.
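
As a quick sanity check of this single link in the chain, the sketch below compares the chain-rule product with a direct numerical estimate.  It assumes the rounded value $f_3 = 2.372$ quoted in the Step 1 check; the variable names ending in `_quoted`, `_chain` and `_fd` are introduced here only for illustration.

```python
import numpy as np

beta4 = -0.3; omega4 = -0.5   # values used in this notebook
f3_quoted = 2.372             # rounded value of f3 quoted in the Step 1 check

# Chain rule: dy/df3 = dy/dh4 * dh4/df3
dydh4_chain = omega4               # from y = beta4 + omega4 * h4
dh4df3 = 1.0 / f3_quoted           # derivative of log[f3]
dydf3_chain = dydh4_chain * dh4df3

# Direct estimate: nudge f3 up and down and recompute y = beta4 + omega4 * log[f3]
delta = 1e-6
y_plus = beta4 + omega4 * np.log(f3_quoted + delta)
y_minus = beta4 + omega4 * np.log(f3_quoted - delta)
dydf3_fd = (y_plus - y_minus) / (2 * delta)

print('dydf3: chain rule = %3.3f, finite difference = %3.3f' % (dydf3_chain, dydf3_fd))
```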

We can continue in this way, computing the derivatives of the output with respect to these intermediate quantities:

\begin{align}
\frac{\partial y}{\partial h_{3}} &=& \frac{\partial y}{\partial h_{4}} \frac{\partial h_{4}}{\partial f_{3}}\frac{\partial f_{3}}{\partial h_{3}}=\frac{\partial y}{\partial f_{3}} \frac{\partial f_{3}}{\partial h_{3}}\nonumber \\
\frac{\partial y}{\partial f_{2}} &=& \frac{\partial y}{\partial h_{4}} \frac{\partial h_{4}}{\partial f_{3}}\frac{\partial f_{3}}{\partial h_{3}}\frac{\partial h_{3}}{\partial f_{2}} = \frac{\partial y}{\partial h_{3}}\frac{\partial h_{3}}{\partial f_{2}}\nonumber \\
\frac{\partial y}{\partial h_{2}} &=& \frac{\partial y}{\partial h_{4}} \frac{\partial h_{4}}{\partial f_{3}}\frac{\partial f_{3}}{\partial h_{3}}\frac{\partial h_{3}}{\partial f_{2}}\frac{\partial f_{2}}{\partial h_{2}}=\frac{\partial y}{\partial f_{2}}\frac{\partial f_{2}}{\partial h_{2}}\nonumber \\
\frac{\partial y}{\partial f_{1}} &=& \frac{\partial y}{\partial h_{4}} \frac{\partial h_{4}}{\partial f_{3}}\frac{\partial f_{3}}{\partial h_{3}}\frac{\partial h_{3}}{\partial f_{2}}\frac{\partial f_{2}}{\partial h_{2}}\frac{\partial h_{2}}{\partial f_{1}}=\frac{\partial y}{\partial h_{2}}\frac{\partial h_{2}}{\partial f_{1}}\nonumber \\
\frac{\partial y}{\partial h_{1}} &=& \frac{\partial y}{\partial h_{4}} \frac{\partial h_{4}}{\partial f_{3}}\frac{\partial f_{3}}{\partial h_{3}}\frac{\partial h_{3}}{\partial f_{2}}\frac{\partial f_{2}}{\partial h_{2}}\frac{\partial h_{2}}{\partial f_{1}}\frac{\partial f_{1}}{\partial h_{1}}=\frac{\partial y}{\partial f_{1}}\frac{\partial f_{1}}{\partial h_{1}}\nonumber \\
\frac{\partial y}{\partial f_{0}} &=& \frac{\partial y}{\partial h_{4}} \frac{\partial h_{4}}{\partial f_{3}}\frac{\partial f_{3}}{\partial h_{3}}\frac{\partial h_{3}}{\partial f_{2}}\frac{\partial f_{2}}{\partial h_{2}}\frac{\partial h_{2}}{\partial f_{1}}\frac{\partial f_{1}}{\partial h_{1}}\frac{\partial h_{1}}{\partial f_{0}}=\frac{\partial y}{\partial h_{1}}\frac{\partial h_{1}}{\partial f_{0}}.
\end{align}

In each case, we have already computed all of the terms except the last one in the previous step, and the last term is simple to evaluate.  This is called the **backward pass**.
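
Here is a sketch of the same idea on a deliberately short chain, $y = \beta_1+\omega_1\cdot\sin[\beta_0+\omega_0 x]$, so that it does not spoil the exercise.  The function names `backward_short` and `y_short` and the numerical values are assumptions made for this illustration.

```python
import numpy as np

def y_short(x, beta0, omega0, beta1, omega1):
    # The short chain written as a single expression
    return beta1 + omega1 * np.sin(beta0 + omega0 * x)

def backward_short(x, beta0, omega0, beta1, omega1):
    # Forward pass (stored, because the backward pass needs f0)
    f0 = beta0 + omega0 * x
    h1 = np.sin(f0)
    y = beta1 + omega1 * h1
    # Backward pass, working from the output towards the input
    dydh1 = omega1               # from y = beta1 + omega1 * h1
    dydf0 = dydh1 * np.cos(f0)   # previous result times dh1/df0 = cos[f0]
    return {'y': y, 'dydh1': dydh1, 'dydf0': dydf0}

# Illustrative values only (not the ones used elsewhere in this notebook)
vals = dict(x=0.5, beta0=0.2, omega0=1.5, beta1=-1.0, omega1=2.0)
grads = backward_short(**vals)

# Numerical check of dy/df0: since f0 = beta0 + omega0 * x,
# nudging beta0 by delta nudges f0 by exactly delta
delta = 1e-6
plus = dict(vals)
plus['beta0'] += delta
minus = dict(vals)
minus['beta0'] -= delta
dydf0_fd = (y_short(**plus) - y_short(**minus)) / (2 * delta)

print('dydh1 = %3.3f' % grads['dydh1'])
print('dydf0: backward pass = %3.3f, finite difference = %3.3f' % (grads['dydf0'], dydf0_fd))
```

Each line of the backward pass multiplies the previous result by one new local derivative, which is the same pattern as in the list of equations above.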

In [None]:
# TODO -- Compute the derivatives of the output with respect
# to the intermediate computations h_k and f_k (i.e., run the backward pass)
# I've done the first two for you.
dydh4 = omega4
dydf3 = dydh4 * (1/f3)
# Replace the code below
dydh3 = 1
dydf2 = 1
dydh2 = 1
dydf1 = 1
dydh1 = 1
dydf0 = 1 

In [None]:
# Let's check we got that right
print("dydh3: true value = %3.3f, your value = %3.3f"%(-0.632, dydh3))
print("dydf2: true value = %3.3f, your value = %3.3f"%(0.476, dydf2))
print("dydh2: true value = %3.3f, your value = %3.3f"%(0.953, dydh2))
print("dydf1: true value = %3.3f, your value = %3.3f"%(4.830, dydf1))
print("dydh1: true value = %3.3f, your value = %3.3f"%(-1.932, dydh1))
print("dydf0: true value = %3.3f, your value = %3.3f"%(-0.646, dydf0))

**Step 3:**  Now we will find how $y$ changes when we change the $\beta$ and $\omega$ terms. The first two are easy:

\begin{align}
\frac{\partial y}{\partial \beta_{4}} &=& \frac{\partial }{\partial \beta_{4}}(\beta_{4} + \omega_{4} h_{4}) = 1\nonumber \\
\frac{\partial y}{\partial \omega_{4}} &=& \frac{\partial }{\partial \omega_{4}}(\beta_{4} + \omega_{4} h_{4}) = h_{4}.
\end{align}

The remaining terms are calculated using the chain rule again:

\begin{align}
\frac{\partial y}{\partial \beta_{3}} &=& \frac{\partial y}{\partial f_{3}}\frac{\partial f_{3}}{\partial \beta_{3}}\nonumber \\
\frac{\partial y}{\partial \omega_{3}} &=& \frac{\partial y}{\partial f_{3}}\frac{\partial f_{3}}{\partial \omega_{3}}
\end{align}

where we already computed the first term of each right-hand side in Step 2, and the second terms are also easy to compute.  By the same logic, the other terms are:

\begin{align}
\frac{\partial y}{\partial \beta_{k}} &=& \frac{\partial y}{\partial f_{k}}\frac{\partial f_{k}}{\partial \beta_{k}}\nonumber \\
\frac{\partial y}{\partial \omega_{k}} &=& \frac{\partial y}{\partial f_{k}}\frac{\partial f_{k}}{\partial \omega_{k}}
\end{align}

for $k=2,1,0$.
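
To spell out why the second terms are easy: each $f_{k}$ is linear in its own parameters, $f_{k} = \beta_{k} + \omega_{k} h_{k}$, where we write $h_{0}=x$ as a shorthand (this shorthand is an assumption of this note and is not used elsewhere in the notebook).  It follows that

\begin{equation}
\frac{\partial f_{k}}{\partial \beta_{k}} = 1 \quad\text{and}\quad \frac{\partial f_{k}}{\partial \omega_{k}} = h_{k}.
\end{equation}

As a worked example using the rounded values quoted in the check cells of this notebook ($\partial y/\partial \beta_{3} = \partial y/\partial f_{3} \approx -0.211$ and $h_{3}\approx 0.657$):

\begin{equation}
\frac{\partial y}{\partial \omega_{3}} = \frac{\partial y}{\partial f_{3}}\, h_{3} \approx -0.211 \times 0.657 \approx -0.139.
\end{equation}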

In [None]:
# TODO -- Calculate the final derivatives with respect to the beta and omega terms

dydbeta4 = 1
dydomega4 = 1
dydbeta3 = 1
dydomega3 = 1
dydbeta2 = 1
dydomega2 = 1
dydbeta1 = 1
dydomega1 = 1
dydbeta0 = 1
dydomega0 = 1


In [None]:
# Let's check we got them right
print('dydbeta4: Your value = %3.3f, Function value = %3.3f, Finite difference value = %3.3f'%(dydbeta4, dydbeta4_func,dydbeta4_fd))
print('dydomega4: Your value = %3.3f, True value = %3.3f'%(dydomega4, 0.864))
print('dydbeta3: Your value = %3.3f, True value = %3.3f'%(dydbeta3, -0.211))
print('dydomega3: Your value = %3.3f, True value = %3.3f'%(dydomega3, -0.139))
print('dydbeta2: Your value = %3.3f, True value = %3.3f'%(dydbeta2, 0.476))
print('dydomega2: Your value = %3.3f, True value = %3.3f'%(dydomega2, 2.415))
print('dydbeta1: Your value = %3.3f, True value = %3.3f'%(dydbeta1, 4.830))
print('dydomega1: Your value = %3.3f, True value = %3.3f'%(dydomega1, 4.552))
print('dydbeta0: Your value = %3.3f, True value = %3.3f'%(dydbeta0, -0.646))
print('dydomega0: Your value = %3.3f, Function value = %3.3f, Finite difference value = %3.3f'%(dydomega0, dydomega0_func,dydomega0_fd))

Using this method, we can compute the derivatives quite easily without needing to derive very complicated expressions.  This is exactly how the derivatives with respect to the parameters are computed in the backpropagation algorithm.  In fact, this basically *is* the backpropagation algorithm.
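
Because every layer has the same form, the three steps can be folded into two loops.  The sketch below is one possible way to do that, not a definitive implementation.  The function name `backprop_chain`, the list-based interface and the toy chain $y = \beta_2+\omega_2\cdot\sin\bigl[\beta_1+\omega_1\cdot\tanh[\beta_0+\omega_0 x]\bigr]$ are all assumptions made for this illustration; the output $y$ is treated as the last $f_k$.

```python
import numpy as np

def backprop_chain(x, betas, omegas, activations, activation_derivs):
    # Forward pass: f_k = beta_k + omega_k * h_k and h_{k+1} = a_k[f_k], with h_0 = x.
    # There is one fewer activation than beta/omega pairs; the last f is the output y.
    h = [x]
    f = []
    for k in range(len(betas)):
        f.append(betas[k] + omegas[k] * h[k])
        if k < len(activations):
            h.append(activations[k](f[k]))
    y = f[-1]

    # Backward pass: carry dy/df_k from the output towards the input
    K = len(betas) - 1
    dydf = 1.0                       # dy/df_K = 1 because y = f_K
    dydbeta = [0.0] * (K + 1)
    dydomega = [0.0] * (K + 1)
    for k in range(K, -1, -1):
        dydbeta[k] = dydf            # df_k/dbeta_k = 1
        dydomega[k] = dydf * h[k]    # df_k/domega_k = h_k
        if k > 0:
            dydh = dydf * omegas[k]                           # df_k/dh_k = omega_k
            dydf = dydh * activation_derivs[k - 1](f[k - 1])  # dh_k/df_{k-1}
    return y, dydbeta, dydomega

# Toy chain with illustrative values (not the ones used elsewhere in this notebook)
x_toy = 0.8
betas_toy = [0.3, -0.2, 0.5]
omegas_toy = [1.2, 0.7, -1.5]
acts_toy = [np.tanh, np.sin]
derivs_toy = [lambda z: 1.0 - np.tanh(z) ** 2, np.cos]

y_toy, dydbeta_toy, dydomega_toy = backprop_chain(x_toy, betas_toy, omegas_toy, acts_toy, derivs_toy)

# Finite-difference check on omega_0, the parameter furthest from the output
delta = 1e-6
omegas_plus = list(omegas_toy)
omegas_plus[0] += delta
omegas_minus = list(omegas_toy)
omegas_minus[0] -= delta
y_plus = backprop_chain(x_toy, betas_toy, omegas_plus, acts_toy, derivs_toy)[0]
y_minus = backprop_chain(x_toy, betas_toy, omegas_minus, acts_toy, derivs_toy)[0]
dydomega0_toy_fd = (y_plus - y_minus) / (2 * delta)

print('dydomega0 (toy): loop value = %3.3f, finite difference value = %3.3f' % (dydomega_toy[0], dydomega0_toy_fd))
```

One forward pass and one backward pass yield all of the parameter derivatives at once, whereas finite differences need extra function evaluations for every parameter.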