# Backpropagation in an RNN

To get a better understanding of how backpropagation works in an RNN, I will walk through the derivative of the loss function with respect to one parameter, the input-to-hidden weight matrix $W_{xh}$. Below is an illustration of a simple RNN unrolled over four time steps.

![RNN](pt5_rnn.png)

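Let's start by writing out the equations for $z_t$, $h_t$, $o_t$, $\hat{y}_t$, and $L_t$. Here $x_t$ is the input at step $t$, $h_t$ is the hidden state, $o_t$ holds the output logits, and $\hat{y}_t$ is the predicted distribution. Assume the loss function is cross-entropy and the hidden activation is the hyperbolic tangent.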

$$
z_t = W_{xh}x_t + W_{hh}h_{t-1} + b_h
$$

$$
h_t = \tanh(z_t)
$$

$$
o_t = W_{hy}h_t + b_y
$$

$$
\hat{y}_t = \text{softmax}(o_t)
$$

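$$
L_t(y_t,\hat{y}_t) = -\sum_{i} y_{t,i}\log(\hat{y}_{t,i})
$$

Here $y_t$ is the target distribution at step $t$ (typically one-hot) and $i$ indexes the output classes.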
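
The forward equations above can be sketched in NumPy as follows. This is a minimal illustration, not the author's code: the layer sizes, the random initialization, the zero initial state $h_0$, the one-hot targets, and the helper names `softmax` and `rnn_forward` are all assumptions. The last lines also add up the per-step losses, which gives the total loss over the sequence.

```python
import numpy as np

def softmax(o):
    # Shift by the max for numerical stability; the result is unchanged
    e = np.exp(o - np.max(o))
    return e / e.sum()

def rnn_forward(xs, ys, W_xh, W_hh, W_hy, b_h, b_y, h0):
    """Run the RNN over len(xs) steps; return per-step losses, hidden states, predictions."""
    h_prev = h0
    losses, hs, yhats = [], [], []
    for x_t, y_t in zip(xs, ys):
        z_t = W_xh @ x_t + W_hh @ h_prev + b_h   # pre-activation z_t
        h_t = np.tanh(z_t)                       # hidden state h_t
        o_t = W_hy @ h_t + b_y                   # output logits o_t
        yhat_t = softmax(o_t)                    # predicted distribution
        L_t = -np.sum(y_t * np.log(yhat_t))      # cross-entropy loss L_t
        losses.append(L_t)
        hs.append(h_t)
        yhats.append(yhat_t)
        h_prev = h_t                             # feed h_t into the next step
    return losses, hs, yhats

# Hypothetical sizes and random weights, unrolled over four steps as in the figure
rng = np.random.default_rng(0)
n_in, n_h, n_out, T = 3, 5, 4, 4
W_xh = rng.normal(scale=0.5, size=(n_h, n_in))
W_hh = rng.normal(scale=0.5, size=(n_h, n_h))
W_hy = rng.normal(scale=0.5, size=(n_out, n_h))
b_h, b_y, h0 = np.zeros(n_h), np.zeros(n_out), np.zeros(n_h)
xs = [rng.normal(size=n_in) for _ in range(T)]
ys = [np.eye(n_out)[rng.integers(n_out)] for _ in range(T)]  # one-hot targets

losses, hs, yhats = rnn_forward(xs, ys, W_xh, W_hh, W_hy, b_h, b_y, h0)
L = sum(losses)  # total loss over all time steps
print([round(float(l), 4) for l in losses], float(L))
```

Note that the same weight matrices are reused at every step; that sharing is exactly why the derivative with respect to $W_{xh}$ below collects a contribution from every time step.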

That's a lot of terms. Let's simplify this by expressing $h_t$ in terms of $x_t$ and $h_{t-1}$.

$$
h_t = \tanh(W_{xh}x_t + W_{hh}h_{t-1} + b_h)
$$

Let's also simplify by expressing $\hat{y}_t$ in terms of $h_t$.

$$
\hat{y}_t = \text{softmax}(W_{hy}h_t + b_y)
$$

The total loss is the sum of the per-step losses over all $T$ time steps; with four steps, that is $L_1$ through $L_4$.

$$
L = \sum_{t=1}^T L_t
$$

$$
L = L_1 + L_2 + L_3 + L_4
$$

Okay, now we're ready to write out the derivative of the loss function with respect to $W_{xh}$. It will be a sum of the derivatives for each time step.

$$
\frac{\partial L}{\partial W_{xh}} = \frac{\partial L_1}{\partial W_{xh}} + \frac{\partial L_2}{\partial W_{xh}} + \frac{\partial L_3}{\partial W_{xh}} + \frac{\partial L_4}{\partial W_{xh}}
$$

Let's start with $L_1$ and work our way to $L_4$. 

$$
\frac{\partial L_1}{\partial W_{xh}} = \frac{\partial L_1}{\partial \hat{y}_1} \frac{\partial \hat{y}_1}{\partial h_1} \frac{\partial h_1}{\partial W_{xh}}
$$
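
To make these factors concrete, here is what they work out to under the equations above, as a short derivation with column vectors ($\odot$ is elementwise multiplication). It assumes $y_1$ is one-hot, so its entries sum to 1 and the softmax and cross-entropy derivatives combine, and it treats the initial state $h_0$ as a constant. The first two factors together give the gradient with respect to $h_1$, routed through $o_1$:

$$
\frac{\partial L_1}{\partial o_1} = \hat{y}_1 - y_1, \qquad \nabla_{h_1} L_1 = W_{hy}^\top(\hat{y}_1 - y_1)
$$

Since $h_1 = \tanh(z_1)$ and $z_1 = W_{xh}x_1 + W_{hh}h_0 + b_h$, the last factor contributes the tanh derivative $1 - h_1 \odot h_1$ and the input $x_1$, giving an outer product with the same shape as $W_{xh}$:

$$
\frac{\partial L_1}{\partial W_{xh}} = \left[(1 - h_1 \odot h_1) \odot W_{hy}^\top(\hat{y}_1 - y_1)\right] x_1^\top
$$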

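Now let's write out the derivative of $L_2$ with respect to $W_{xh}$. Because $h_2$ depends on $W_{xh}$ both directly and through $h_1$, there are two paths: one ending at $h_2$ and one ending at $h_1$. Here $\frac{\partial h_k}{\partial W_{xh}}$ means the immediate derivative, holding $h_{k-1}$ fixed.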

$$
\frac{\partial L_2}{\partial W_{xh}} = \frac{\partial L_2}{\partial \hat{y}_2} \frac{\partial \hat{y}_2}{\partial h_2} \frac{\partial h_2}{\partial W_{xh}} + \frac{\partial L_2}{\partial \hat{y}_2} \frac{\partial \hat{y}_2}{\partial h_2} \frac{\partial h_2}{\partial h_1} \frac{\partial h_1}{\partial W_{xh}}
$$
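
The factor that is new at this step is $\frac{\partial h_2}{\partial h_1}$. A short derivation: differentiate $h_2 = \tanh(W_{xh}x_2 + W_{hh}h_1 + b_h)$ with respect to $h_1$, using $\tanh'(z) = 1 - \tanh^2(z)$ and writing $\odot$ for elementwise multiplication:

$$
\frac{\partial h_2}{\partial h_1} = \operatorname{diag}(1 - h_2 \odot h_2)\, W_{hh}
$$

Every step has the same form, $\frac{\partial h_t}{\partial h_{t-1}} = \operatorname{diag}(1 - h_t \odot h_t)\, W_{hh}$, so the longer chains for $L_3$ and $L_4$ multiply in one such matrix for each step they reach back.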

Let's keep this going for $L_3$, which will have terms for $h_3$, $h_2$, and $h_1$.

$$
\frac{\partial L_3}{\partial W_{xh}} = \frac{\partial L_3}{\partial \hat{y}_3} \frac{\partial \hat{y}_3}{\partial h_3} \frac{\partial h_3}{\partial W_{xh}} + \frac{\partial L_3}{\partial \hat{y}_3} \frac{\partial \hat{y}_3}{\partial h_3} \frac{\partial h_3}{\partial h_2} \frac{\partial h_2}{\partial W_{xh}} + \frac{\partial L_3}{\partial \hat{y}_3} \frac{\partial \hat{y}_3}{\partial h_3} \frac{\partial h_3}{\partial h_2} \frac{\partial h_2}{\partial h_1} \frac{\partial h_1}{\partial W_{xh}}
$$

Finally, let's write out the derivative of $L_4$ with respect to $W_{xh}$. This will have terms for $h_4$, $h_3$, $h_2$, and $h_1$.

$$
\frac{\partial L_4}{\partial W_{xh}} = \frac{\partial L_4}{\partial \hat{y}_4} \frac{\partial \hat{y}_4}{\partial h_4} \frac{\partial h_4}{\partial W_{xh}} + \frac{\partial L_4}{\partial \hat{y}_4} \frac{\partial \hat{y}_4}{\partial h_4} \frac{\partial h_4}{\partial h_3} \frac{\partial h_3}{\partial W_{xh}} + \frac{\partial L_4}{\partial \hat{y}_4} \frac{\partial \hat{y}_4}{\partial h_4} \frac{\partial h_4}{\partial h_3} \frac{\partial h_3}{\partial h_2} \frac{\partial h_2}{\partial W_{xh}} + \frac{\partial L_4}{\partial \hat{y}_4} \frac{\partial \hat{y}_4}{\partial h_4} \frac{\partial h_4}{\partial h_3} \frac{\partial h_3}{\partial h_2} \frac{\partial h_2}{\partial h_1} \frac{\partial h_1}{\partial W_{xh}}
$$

Collecting these terms into a double summation gives the following. The inner index $k$ runs only up to $t$, because $L_t$ depends on $h_1$ through $h_t$ but not on any later hidden state.

$$
\frac{\partial L}{\partial W_{xh}} = \sum_{t=1}^4 \sum_{k=1}^t\frac{\partial L_t}{\partial \hat{y}_t} \frac{\partial \hat{y}_t}{\partial h_t} \frac{\partial h_t}{\partial h_k} \frac{\partial h_k}{\partial W_{xh}}
$$

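Expressed generically for a series of $T$ time steps, we get the following.

$$
\frac{\partial L}{\partial W_{xh}} = \sum_{t=1}^T \sum_{k=1}^t\frac{\partial L_t}{\partial \hat{y}_t} \frac{\partial \hat{y}_t}{\partial h_t} \frac{\partial h_t}{\partial h_k} \frac{\partial h_k}{\partial W_{xh}}
$$

Each $\frac{\partial h_t}{\partial h_k}$ is itself a chain of one-step Jacobians, and it is the identity when $k = t$:

$$
\frac{\partial h_t}{\partial h_k} = \frac{\partial h_t}{\partial h_{t-1}} \frac{\partial h_{t-1}}{\partial h_{t-2}} \cdots \frac{\partial h_{k+1}}{\partial h_k}
$$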
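
As a sanity check on the formula, the sketch below evaluates the double sum in NumPy for $T = 4$ and compares it with a central finite-difference estimate of $\partial L/\partial W_{xh}$. It assumes one-hot targets, a zero initial state $h_0$, and small random weights; the function names `forward`, `grad_W_xh_double_sum`, and `numerical_grad_W_xh` are hypothetical, not from any library. Python indices start at 0, so `hs[0]` corresponds to $h_1$ in the text. Rather than rebuilding each product $\partial h_t/\partial h_k$ from scratch, the inner loop walks $k$ downward from $t$ and multiplies in one Jacobian per step.

```python
import numpy as np

def softmax(o):
    e = np.exp(o - np.max(o))  # shift by the max for numerical stability
    return e / e.sum()

def forward(xs, ys, W_xh, W_hh, W_hy, b_h, b_y, h0):
    """Return the total loss L, the hidden states, and the predictions."""
    h, hs, yhats, L = h0, [], [], 0.0
    for x, y in zip(xs, ys):
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
        yhat = softmax(W_hy @ h + b_y)
        L += -np.sum(y * np.log(yhat))
        hs.append(h)
        yhats.append(yhat)
    return L, hs, yhats

def grad_W_xh_double_sum(xs, ys, W_xh, W_hh, W_hy, b_h, b_y, h0):
    """dL/dW_xh as the sum over t and over k <= t of the chain-rule terms."""
    _, hs, yhats = forward(xs, ys, W_xh, W_hh, W_hy, b_h, b_y, h0)
    grad = np.zeros_like(W_xh)
    for t in range(len(xs)):
        # dL_t/dh_t for softmax + cross-entropy with a one-hot target
        g = W_hy.T @ (yhats[t] - ys[t])
        for k in range(t, -1, -1):
            # Immediate dh_k/dW_xh term: ((1 - h_k^2) * dL_t/dh_k) x_k^T
            grad += np.outer((1 - hs[k] ** 2) * g, xs[k])
            if k > 0:
                # Step back one Jacobian: dh_k/dh_{k-1} = diag(1 - h_k^2) W_hh
                g = W_hh.T @ ((1 - hs[k] ** 2) * g)
    return grad

def numerical_grad_W_xh(xs, ys, W_xh, W_hh, W_hy, b_h, b_y, h0, eps=1e-6):
    """Central finite differences on each entry of W_xh."""
    grad = np.zeros_like(W_xh)
    for idx in np.ndindex(*W_xh.shape):
        W_plus, W_minus = W_xh.copy(), W_xh.copy()
        W_plus[idx] += eps
        W_minus[idx] -= eps
        L_plus, _, _ = forward(xs, ys, W_plus, W_hh, W_hy, b_h, b_y, h0)
        L_minus, _, _ = forward(xs, ys, W_minus, W_hh, W_hy, b_h, b_y, h0)
        grad[idx] = (L_plus - L_minus) / (2 * eps)
    return grad

# Hypothetical sizes and random weights, four time steps
rng = np.random.default_rng(0)
n_in, n_h, n_out, T = 3, 5, 4, 4
W_xh = rng.normal(scale=0.5, size=(n_h, n_in))
W_hh = rng.normal(scale=0.5, size=(n_h, n_h))
W_hy = rng.normal(scale=0.5, size=(n_out, n_h))
b_h, b_y, h0 = np.zeros(n_h), np.zeros(n_out), np.zeros(n_h)
xs = [rng.normal(size=n_in) for _ in range(T)]
ys = [np.eye(n_out)[rng.integers(n_out)] for _ in range(T)]  # one-hot targets

analytic = grad_W_xh_double_sum(xs, ys, W_xh, W_hh, W_hy, b_h, b_y, h0)
numeric = numerical_grad_W_xh(xs, ys, W_xh, W_hh, W_hy, b_h, b_y, h0)
print(np.max(np.abs(analytic - numeric)))  # max absolute difference; should be close to zero
```

Evaluated term by term like this, the work grows quadratically in $T$. Standard backpropagation through time reaches the same gradient in a single backward pass by accumulating $\partial L/\partial h_t$ from all later losses before stepping to $h_{t-1}$.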




## Export to HTML

```python
# export to HTML for webpage
import os
os.system('jupyter nbconvert --to html pt5_derivatives.ipynb --HTMLExporter.theme=dark')
```

```
[NbConvertApp] Converting notebook pt5_derivatives.ipynb to html
[NbConvertApp] Writing 273444 bytes to pt5_derivatives.html
```