# <center>Understanding Recurrent Neural Networks in TensorFlow</center>

Do you feel that you have a good theoretical knowledge of deep learning models, yet when you look at the API docs everything seems completely different (has your brain underfit)? If so, this notebook is just for you! It maps the concepts and equations that you already know to the corresponding APIs in TensorFlow. Let's get started!

**Note**: It is assumed that the reader has a reasonably good theoretical knowledge of these topics.

## Table of Contents
1.  [Visualizing the RNN](#visualize)
2.  [Clarifying the Dimensions](#dims)
3.  [The SimpleRNNCell API](#simplernncell)
4.  [The SimpleRNN API](#simplernn)
5.  [Where are the weights?](#weights)
6.  [The StackedRNNCells API](#stacked)
7.  [The RNN API](#rnn)

## Visualizing the RNN <a id="visualize"></a>
<img align="left" src="./images/RNN_rolled.png" width="500px" style="margin-right: 50px;"/>

### Notations
-   $ X_{t} $: Input at time $ t $
-   $ h_{t} $: State at time $ t $
-   $ o_{t} $: Output at time $ t $
-   $ U $: Input weights
-   $ W $: State weights 
-   $ b $: Bias (not shown)

-   `feature`: Number of input features
-   `timestamp` or `T`: Number of time steps in an input sample (called `timesteps` in the TensorFlow docs)
-   `batch`: Number of input samples fed to the model at once
<br clear="left">

**Note**
-   The last 3 notations follow the TensorFlow API documentation
-   The output of an RNN cell is the same as the newly calculated state. Hence the output block will **not be shown** in the upcoming diagrams.

### Equations

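$ h_t = f(Wh_{t-1} + UX_t + b), $ for $ 1 \le t \le T $

Here $ f $ is the activation function (`tanh` by default in Keras). The `SimpleRNNCell` and `SimpleRNN` examples below use a linear activation, so for them $ h_t = Wh_{t-1} + UX_t + b $.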

### The unrolled version
<img src="./images/RNN_unrolled.png" style="margin-right: 50px;"/>
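
The unrolled recurrence can be traced by hand for a single sample. The following is a minimal NumPy sketch, assuming a linear activation, column-vector states and randomly chosen weights. The name `unrolled_rnn`, the sizes and the seed are illustrative and not part of TensorFlow.

```python
import numpy as np

def unrolled_rnn(X, h_0, U, W, b):
    """Apply h_t = W h_{t-1} + U X_t + b for t = 1..T and return every h_t."""
    h_prev = h_0
    states = []
    for X_t in X:                          # X has shape (T, feature)
        h_t = W @ h_prev + U @ X_t + b     # shape (units,)
        states.append(h_t)
        h_prev = h_t
    return np.stack(states)                # shape (T, units)

rng = np.random.default_rng(0)
T, feature, units = 5, 1, 2                # illustrative sizes
X = rng.random((T, feature))
U = rng.normal(size=(units, feature))      # input weights
W = rng.normal(size=(units, units))        # state weights
b = np.zeros(units)
h_0 = np.zeros(units)

all_states = unrolled_rnn(X, h_0, U, W, b)
print(all_states.shape)  # (5, 2)
```

Each row of `all_states` is one block of the unrolled diagram: row `t` depends on row `t - 1` and on the input at step `t`.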

## Clarifying the dimensions <a id="dims"></a>
Without loss of generality, let the $ i^{th} $ input sample be denoted as $ X^{(i)} $.

$ X^{(i)} = \begin{bmatrix} X^{(i)}_1 \\ X^{(i)}_2 \\ \vdots \\ X^{(i)}_T \end{bmatrix} $

Each $ X^{(i)}_t $ is of size `feature`, which means that each sample $ X^{(i)} $ has dimensions `T x feature` (that is, `timestamp x feature`). A batch of samples therefore has shape `(batch, timestamp, feature)`.

The size of $ h_t $ is specified by the user with the `units` parameter.

Let's create some sample input to play with the APIs.

In [1]:
import numpy as np
import tensorflow as tf

batch = 4 
timestamp = 5 
feature = 1

inputs_to_play = np.random.random(
    size=(batch, timestamp, feature)
)
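
To make the shape convention concrete, this short sketch slices the array along the batch axis and along the time axis. It assumes the same `batch`, `timestamp` and `feature` values as above; `sample_0` and `step_0` are illustrative names.

```python
import numpy as np

batch, timestamp, feature = 4, 5, 1
inputs_to_play = np.random.random(size=(batch, timestamp, feature))

sample_0 = inputs_to_play[0]        # one sample: all of its time steps
step_0 = inputs_to_play[:, 0, :]    # the first time step of every sample

print(sample_0.shape)  # (5, 1) -> (timestamp, feature)
print(step_0.shape)    # (4, 1) -> (batch, feature)
```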

## The `SimpleRNNCell` API <a id="simplernncell"></a>
This is an RNN *cell* which, when *called*, takes 2 arguments, $ X_{t} $ and $ h_{t-1} $, and produces the output and the next state $ h_{t} $.
As previously stated, the output is the same as the new hidden state.

Because this is a *cell*, it performs the computation for a single time step only. Thus, the input to this API has shape `[batch, feature]`.

In [2]:
output_units = 2

# create the RNN cell
rnn_cell = tf.keras.layers.SimpleRNNCell(
    units=output_units,  # this refers to the number of output units in the RNN
    activation=None,
)

In [3]:
# define the inputs
X_1 = inputs_to_play[:, 0, :]  # consider the samples at timestamp 0
h_0 = [
    tf.constant(np.ones(shape=(batch, output_units), dtype=np.float32))
]  # states are passed as a list; the previous state of every sample is a vector of ones

# "call" the RNN cell on these inputs
output, h_1 = rnn_cell(
    inputs=X_1,
    states=h_0,
)

print(output)
# `h_1[0]` holds the same values

tf.Tensor(
[[0.47837692 0.9902846 ]
 [0.6551285  1.0892183 ]
 [0.7130833  1.1216575 ]
 [0.6351037  1.0780097 ]], shape=(4, 2), dtype=float32)


Below is a figure that illustrates `SimpleRNNCell`.
In the example above, each $ h_t^{(i)} $ has size 2 because the number of output units is 2. Thus, the final output has shape `(4, 2)`, where 4 is the number of samples (`n` in the diagram).

<img src="./images/simpleRNNCell.gif" width="500px" />

## The `SimpleRNN` API <a id="simplernn"></a>
The `SimpleRNNCell` API computes the state / output for a single time step only.
But what if we want the final result, i.e. the one obtained after performing the required computation across all time steps of the input?

Behold `SimpleRNN`, which internally uses a `SimpleRNNCell` to keep computing the next state until the last time step is reached.

In [113]:
simple_rnn = tf.keras.layers.SimpleRNN(
    units=output_units,
    activation='linear'
)

output = simple_rnn(inputs_to_play) 

print(output)

tf.Tensor(
[[-2.3202045  1.1348801]
 [-2.1530511  0.9799179]
 [-3.1703815  1.5421298]
 [-3.1401954  1.3645253]], shape=(4, 2), dtype=float32)


Here is a simple illustration

<img src="./images/simpleRNN.gif" width="600px" />
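
If the state at every time step is needed rather than only the last one, `SimpleRNN` accepts `return_sequences=True` and `return_state=True`. The sketch below is only an illustration: the variable names and the fresh random input are assumptions, and `seq_rnn` is a new layer, so its weights differ from those of `simple_rnn` above.

```python
import numpy as np
import tensorflow as tf

batch, timestamp, feature, output_units = 4, 5, 1, 2
x = np.random.random(size=(batch, timestamp, feature)).astype(np.float32)

seq_rnn = tf.keras.layers.SimpleRNN(
    units=output_units,
    activation='linear',
    return_sequences=True,  # return h_t for every time step
    return_state=True,      # also return the final state separately
)

all_states, last_state = seq_rnn(x)

print(all_states.shape)  # (4, 5, 2)
print(last_state.shape)  # (4, 2)
```

The last slice along the time axis, `all_states[:, -1, :]`, holds the same values as `last_state`.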

## Where are the weights? <a id="weights"></a>
Use the `get_weights()` method to get the weights of the RNN.

In [114]:
weights = simple_rnn.get_weights()
for w in weights:
    print(w.shape)

(1, 2)
(2, 2)
(2,)


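Comparing the shapes of the 3 weight arrays with the notation above, we can identify them as follows: `U` has shape `(feature, units)`, `W` has shape `(units, units)` and `b` has shape `(units,)`.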

In [115]:
U = weights[0]
W = weights[1]
b = weights[2]

print("U = \n{}".format(U))
print("W = \n{}".format(W))
print("b = \n{}".format(b))

U = 
[[-1.1416765  0.6327354]]
W = 
[[ 0.99908745  0.04271416]
 [-0.04271416  0.99908733]]
b = 
[0. 0.]


If you use these weights and follow through with the equations, you will end up with the answer that was previously calculated in our `SimpleRNN` example (Yes, I've tried it and it works :) )
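
That check can be sketched in NumPy. Keras stores the weights in a row-vector layout, so each step is computed on a batch of rows as `h_t = X_t @ U + h_{t-1} @ W + b`, starting from an all-zero state. The function name `replay_simple_rnn` and the toy input are illustrative assumptions; the weight values are the ones printed above.

```python
import numpy as np

def replay_simple_rnn(inputs, U, W, b):
    """Replay a linear SimpleRNN: h_t = X_t @ U + h_{t-1} @ W + b, with h_0 = 0."""
    batch, timestamp, _ = inputs.shape
    h = np.zeros((batch, W.shape[0]), dtype=inputs.dtype)
    for t in range(timestamp):
        h = inputs[:, t, :] @ U + h @ W + b
    return h  # final state, shape (batch, units)

# weights printed above
U = np.array([[-1.1416765, 0.6327354]], dtype=np.float32)
W = np.array([[0.99908745, 0.04271416],
              [-0.04271416, 0.99908733]], dtype=np.float32)
b = np.zeros(2, dtype=np.float32)

# toy input (an assumption): one sample, two time steps, one feature
toy = np.array([[[1.0], [0.0]]], dtype=np.float32)
result = replay_simple_rnn(toy, U, W, b)
print(result.shape)  # (1, 2)
```

With the notebook's own `inputs_to_play` and the arrays from `simple_rnn.get_weights()`, the result of `replay_simple_rnn` should agree with `simple_rnn(inputs_to_play)` up to floating-point tolerance.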

## The `StackedRNNCells` API <a id="stacked"></a>
What if you wanted the output of one cell to be given as input to another cell? 

<img src="./images/stackedRNNCells.png" width="600px" />

Yes, that's possible with the `StackedRNNCells` API.
This API wraps the given RNN cells so that they behave as a single RNN cell.
Note that this is again just a *cell*, so the computation is performed for a single time step only.

In [4]:
stacked_rnn = tf.keras.layers.StackedRNNCells(
    [
        tf.keras.layers.SimpleRNNCell(units=1),
        tf.keras.layers.SimpleRNNCell(units=2),
    ]
)

Note that here we have to define 2 initial states, one for each cell.
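
The cells that build these states and call the stacked cell are not shown here. One possible version is sketched below, assuming all-zero initial states and a freshly created stacked cell. The names `stacked_cell`, `h0_first` and `h0_second` are illustrative, and because the weights are random the values will differ from the output that follows.

```python
import numpy as np
import tensorflow as tf

batch, timestamp, feature = 4, 5, 1
inputs_to_play = np.random.random(
    size=(batch, timestamp, feature)
).astype(np.float32)

stacked_cell = tf.keras.layers.StackedRNNCells(
    [
        tf.keras.layers.SimpleRNNCell(units=1),
        tf.keras.layers.SimpleRNNCell(units=2),
    ]
)

X_1 = tf.constant(inputs_to_play[:, 0, :])  # first time step, shape (batch, feature)

# one initial state per cell, each of shape (batch, units of that cell)
h0_first = tf.zeros((batch, 1))
h0_second = tf.zeros((batch, 2))

output, h_1 = stacked_cell(X_1, [h0_first, h0_second])

print(output.shape)  # (4, 2): output of the last cell
print(len(h_1))      # 2: one new state per cell
```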

In [7]:
# new states of both cells are returned
h_1

(<tf.Tensor: shape=(4, 1), dtype=float32, numpy=
 array([[0.71947163],
        [0.3826521 ],
        [0.2337083 ],
        [0.43024176]], dtype=float32)>,
 <tf.Tensor: shape=(4, 2), dtype=float32, numpy=
 array([[-0.6549138 ,  0.69779426],
        [-0.39431095,  0.42925397],
        [-0.24925856,  0.27320802],
        [-0.4371829 ,  0.4746568 ]], dtype=float32)>)

In [148]:
# you can again play with the weights if you want!
stacked_rnn_weights = stacked_rnn.get_weights()

U1 = stacked_rnn_weights[0]
W1 = stacked_rnn_weights[1]
b1 = stacked_rnn_weights[2]

U2 = stacked_rnn_weights[3]
W2 = stacked_rnn_weights[4]
b2 = stacked_rnn_weights[5]
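
To see how these six arrays fit together, here is a NumPy sketch of one time step through two stacked cells. It assumes the default `tanh` activation of `SimpleRNNCell` and uses randomly generated stand-in weights with the same shapes as above; `stacked_step` is an illustrative name.

```python
import numpy as np

def stacked_step(x_t, h1_prev, h2_prev, U1, W1, b1, U2, W2, b2):
    """One time step through two stacked cells; the first cell's output feeds the second."""
    h1 = np.tanh(x_t @ U1 + h1_prev @ W1 + b1)   # shape (batch, 1)
    h2 = np.tanh(h1 @ U2 + h2_prev @ W2 + b2)    # shape (batch, 2)
    return h2, (h1, h2)                          # output, new states of both cells

rng = np.random.default_rng(0)
batch, feature = 4, 1
U1, W1, b1 = rng.normal(size=(feature, 1)), rng.normal(size=(1, 1)), np.zeros(1)
U2, W2, b2 = rng.normal(size=(1, 2)), rng.normal(size=(2, 2)), np.zeros(2)

x_t = rng.random((batch, feature))
output, (h1, h2) = stacked_step(
    x_t, np.zeros((batch, 1)), np.zeros((batch, 2)), U1, W1, b1, U2, W2, b2
)
print(h1.shape, h2.shape)  # (4, 1) (4, 2)
```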

## The `RNN` API <a id="rnn"></a>
The mother of all RNN APIs! This API wraps any *cell* in an RNN layer so that the cell is applied across all time steps of the input.
Just as `SimpleRNN` uses `SimpleRNNCell` for its internal computation, the `RNN` API takes *any* kind of cell (`SimpleRNNCell`, `StackedRNNCells`, ...) and uses that cell to compute across all time steps of the input.

<img src="./images/RNN.gif" width="600px">

In [16]:
rnn = tf.keras.layers.RNN(
    stacked_rnn
)

output = rnn(
    inputs_to_play,
)

output

<tf.Tensor: shape=(4, 2), dtype=float32, numpy=
array([[ 0.08811959, -0.6183534 ],
       [-0.29053357, -0.10664143],
       [-0.3485677 , -0.37520707],
       [-0.56441534,  0.13226347]], dtype=float32)>
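
Conceptually, the `RNN` wrapper is a loop over the time axis that calls the cell once per step and carries the state forward. The sketch below mimics that idea in plain NumPy. `run_rnn` and `linear_cell` are hypothetical helpers, not TensorFlow functions, and the real layer handles many more cases (masking, multiple states and so on).

```python
import numpy as np

def linear_cell(x_t, h_prev, U, W, b):
    """A stand-in cell: returns (output, new_state) for one time step."""
    h_t = x_t @ U + h_prev @ W + b
    return h_t, h_t

def run_rnn(cell, inputs, initial_state, return_sequences=False):
    """Call `cell` on every time step of `inputs`, shape (batch, timestamp, feature)."""
    state = initial_state
    outputs = []
    for t in range(inputs.shape[1]):
        output, state = cell(inputs[:, t, :], state)
        outputs.append(output)
    if return_sequences:
        return np.stack(outputs, axis=1)   # (batch, timestamp, units)
    return outputs[-1]                     # (batch, units)

rng = np.random.default_rng(0)
batch, timestamp, feature, units = 4, 5, 1, 2
U = rng.normal(size=(feature, units))
W = rng.normal(size=(units, units))
b = np.zeros(units)
inputs = rng.random((batch, timestamp, feature))

cell = lambda x_t, h_prev: linear_cell(x_t, h_prev, U, W, b)
last = run_rnn(cell, inputs, np.zeros((batch, units)))
print(last.shape)  # (4, 2)
```

Any function with the same `(input, state) -> (output, new_state)` contract can be passed as `cell`, which mirrors how the `RNN` layer accepts any kind of cell.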