# Vanilla RNNs, GRUs and the `scan` function

In [1]:
import numpy as np 
from numpy import random
from time import perf_counter

In [2]:
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

## Forward method for Vanilla RNNs and GRUs
- Embedding size (`emb`): 128
- Hidden state size (`h_dim`): 16, so each hidden state is a column vector of shape (16, 1)

In [4]:
random.seed(10)
emb = 128
T = 256
h_dim = 16
h_0 = np.zeros((h_dim, 1))

# Random initialization of weights and biases
w1 = random.standard_normal((h_dim, emb+h_dim))
w2 = random.standard_normal((h_dim, emb+h_dim))
w3 = random.standard_normal((h_dim, emb+h_dim))

b1 = random.standard_normal((h_dim, 1))
b2 = random.standard_normal((h_dim, 1))
b3 = random.standard_normal((h_dim, 1))

X = random.standard_normal((T, emb, 1))
weights = [w1, w2, w3, b1, b2, b3]

## Vanilla RNN
Structure of Vanilla RNN
<img src="RNN.PNG" width="400"/>
\begin{equation}
h^{<t>}=g(W_{h}[h^{<t-1>},x^{<t>}] + b_h)
\label{eq: htRNN}
\end{equation}
    
\begin{equation}
\hat{y}^{<t>}=g(W_{yh}h^{<t>} + b_y)
\label{eq: ytRNN}
\end{equation}

where $[h^{<t-1>},x^{<t>}]$ denotes the concatenation of $h^{<t-1>}$ and $x^{<t>}$, and $g$ is an activation function (the sigmoid in the implementation that follows). The next cell implements the forward method for a vanilla RNN.

In [6]:
def forward_V_RNN(inputs, weights):
    x, h_t = inputs
    
    # Weights
    wh, _, _, bh, _, _ = weights
    
    # New hidden state
    h_t = np.dot(wh, np.concatenate([h_t, x])) + bh
    h_t = sigmoid(h_t)
    
    return h_t, h_t

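The output $\hat{y}^{<t>}$ is omitted for simplicity, so the function returns the new hidden state twice: once as the step output and once as the state passed to the next time step.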
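To make the concatenation concrete, the following sketch checks the shapes involved and shows that one product over the stacked vector $[h^{<t-1>},x^{<t>}]$ equals the sum of two products over the matching column blocks of the weight matrix. The seed, the random generator, and the names `w_hh` and `w_hx` are illustrative assumptions, not part of the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)  # illustrative seed, not the notebook's
emb, h_dim = 128, 16

x = rng.standard_normal((emb, 1))
h_prev = rng.standard_normal((h_dim, 1))
w = rng.standard_normal((h_dim, emb + h_dim))
b = rng.standard_normal((h_dim, 1))

# Stacking along axis 0 gives a single column of length h_dim + emb.
hx = np.concatenate([h_prev, x])

# One matrix product over the stacked vector ...
z_concat = w @ hx + b

# ... equals two products over the column blocks of w
# (w_hh and w_hx are hypothetical names for those blocks).
w_hh, w_hx = w[:, :h_dim], w[:, h_dim:]
z_split = w_hh @ h_prev + w_hx @ x + b

print(hx.shape, z_concat.shape)
print(np.allclose(z_concat, z_split))
```

This is why each weight matrix in the setup cell has shape `(h_dim, emb + h_dim)`: the first `h_dim` columns act on the previous hidden state and the remaining `emb` columns act on the input.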

## Forward method for GRUs
<img src="GRU.PNG" width="400"/>


- GRUs add two gates to the vanilla RNN cell:
    - Relevance gate $\Gamma_r$
    - Update gate $\Gamma_u$
- These gates control how the hidden state $h^{<t>}$ is updated at every time step
- With these gates, GRUs can keep relevant information in the hidden state even for long sequences
- The equations for the forward method of a GRU are:

$$\Gamma_r = \sigma\left(W_r[h^{<t-1>}, x^{<t>}] + b_r\right)$$
$$\Gamma_u = \sigma\left(W_u[h^{<t-1>}, x^{<t>}] + b_u\right)$$
$$c^{<t>} = \tanh\left(W_h[\Gamma_r * h^{<t-1>}, x^{<t>}] + b_h\right)$$
$$h^{<t>} = \Gamma_u * c^{<t>} + \left(1 - \Gamma_u \right) * h^{<t-1>} $$

In [9]:
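def forward_GRU(inputs, weights):
    x, h_t = inputs
    wu, wr, wc, bu, br, bc = weights

    # Relevance gate
    z = np.dot(wr, np.concatenate([h_t, x])) + br
    r = sigmoid(z)

    # Update gate
    z = np.dot(wu, np.concatenate([h_t, x])) + bu
    u = sigmoid(z)

    # Candidate hidden state, computed from the gated previous state
    z = np.dot(wc, np.concatenate([r * h_t, x])) + bc
    c = np.tanh(z)

    # New hidden state: blend of the candidate and the previous state
    h_t = u * c + (1 - u) * h_t
    return h_t, h_t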

In [10]:
forward_GRU([X[1], h_0], weights)[0]

array([[ 9.77779014e-01],
       [-9.97986240e-01],
       [-5.19958083e-01],
       [-9.99999886e-01],
       [-9.99707004e-01],
       [-3.02197037e-04],
       [-9.58733503e-01],
       [ 2.10804828e-02],
       [ 9.77365398e-05],
       [ 9.99833090e-01],
       [ 1.63200940e-08],
       [ 8.51874303e-01],
       [ 5.21399924e-02],
       [ 2.15495959e-02],
       [ 9.99878828e-01],
       [ 9.77165472e-01]])
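The role of the update gate in the last GRU equation can be seen with a small hand-made example. The sketch below isolates that equation in a hypothetical helper, `gru_blend`, and uses made-up values for the gate, the candidate, and the previous state.

```python
import numpy as np

def gru_blend(u, c, h_prev):
    """Last GRU equation: mix candidate c and previous state h_prev with gate u."""
    return u * c + (1 - u) * h_prev

# Made-up values for illustration only.
h_prev = np.array([[0.5], [-0.2], [0.9]])
c = np.array([[-1.0], [1.0], [0.0]])

# Gate fully closed (u = 0): the previous state is copied unchanged.
h_keep = gru_blend(np.zeros_like(c), c, h_prev)

# Gate fully open (u = 1): the state is replaced by the candidate.
h_replace = gru_blend(np.ones_like(c), c, h_prev)

# The gate acts per unit, so each entry can behave differently.
u = np.array([[0.0], [1.0], [0.5]])
h_mixed = gru_blend(u, c, h_prev)

print(h_keep.ravel(), h_replace.ravel(), h_mixed.ravel())
```

In the cell above, the previous state is `h_0`, which is all zeros, so the result reduces to $\Gamma_u * c^{<t>}$ and every entry lies strictly between -1 and 1.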

## The `scan` function

In [11]:
# Forward propagation of RNNs
def scan(fn, elems, weights, h_0=None): 
    h_t = h_0
    ys = []
    for x in elems:
        y, h_t = fn([x, h_t], weights)
        ys.append(y)
        
    return ys, h_t
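To see how `scan` threads the state through a sequence, it can help to run it with a trivial step function before using a recurrent cell. The sketch below repeats the `scan` loop so that it runs on its own; `running_sum` is a hypothetical toy step function in which the "weights" argument is a single scalar.

```python
def scan(fn, elems, weights, h_0=None):
    # Same loop as the scan function above, repeated so this block is self-contained.
    h_t = h_0
    ys = []
    for x in elems:
        y, h_t = fn([x, h_t], weights)
        ys.append(y)
    return ys, h_t

def running_sum(inputs, weights):
    # Toy step: the new state is the previous state plus the scaled input.
    x, h_prev = inputs
    h_t = h_prev + weights * x
    return h_t, h_t

ys, h_T = scan(running_sum, [1, 2, 3, 4], weights=2, h_0=0)
print(ys)   # per-step outputs
print(h_T)  # final state
```

Each step receives the state produced by the previous step, so `ys` holds the intermediate states and `h_T` equals the last of them. The recurrent cells above follow the same contract: take `[x, h_t]` and `weights`, return an output and the next state.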

## RNNs vs GRUs
You have now seen how forward propagation is computed for vanilla RNNs and GRUs. As a quick recap, you need a forward method for the recurrent cell and a function like `scan` that applies it to every element of a sequence while carrying the hidden state from one step to the next. GRUs perform more computations per step than vanilla RNNs, and you can check that they have three times as many parameters (three weight matrices and three bias vectors instead of one of each). In the next two cells, we time forward propagation over a sequence of 256 time steps (`T`) for a vanilla RNN and a GRU with the same hidden state size (`h_dim` = 16).

In [13]:
tic = perf_counter()
ys, h_T = scan(forward_V_RNN, X, weights, h_0)
toc = perf_counter()
RNN_time = (toc - tic) * 1000
print(f'It took {RNN_time:.2f}ms to run the forward method for the vanilla RNN')


It took 4.54ms to run the forward method for the vanilla RNN


In [15]:
tic = perf_counter()
ys, h_T = scan(forward_GRU, X, weights, h_0)
toc = perf_counter()
GRU_time = (toc - tic) * 1000
print(f'It took {GRU_time:.2f}ms to run the forward method for the GRU')


It took 9.00ms to run the forward method for the GRU
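The parameter counts behind this comparison can be checked with a few lines of arithmetic. The sketch below counts only the recurrent-cell weights and biases used in this notebook (the output layer is omitted, as above); `block_params` is a hypothetical helper name.

```python
emb, h_dim = 128, 16

def block_params(emb, h_dim):
    # One weight matrix of shape (h_dim, emb + h_dim) plus one bias of shape (h_dim, 1).
    return h_dim * (emb + h_dim) + h_dim

rnn_params = block_params(emb, h_dim)      # vanilla RNN uses one block (wh, bh)
gru_params = 3 * block_params(emb, h_dim)  # GRU uses three blocks (update, relevance, candidate)

print(rnn_params, gru_params, gru_params / rnn_params)
```

The two timings above come from a single run each, so they are only a rough indication; the parameter ratio, by contrast, follows directly from the shapes.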
