# Recurrent Neural Network

In this notebook, we implement the forward pass of a simple recurrent neural network (RNN).


<img src="https://kermorvant.github.io/ml/images/RNN.png" style="width:650px" >


We will implement the forward pass in two steps.

**Steps**:
1. Implement the calculations needed for one time-step of the RNN.
2. Implement a loop over $T_x$ time-steps in order to process all the inputs, one at a time. 


## A simple RNN cell

A recurrent neural network can be seen as the repetition of a single cell. You will first implement the computations for a single time-step of that cell, which the following figure describes.

<img src="https://kermorvant.github.io/ml/images/rnn_step_forward.png" style="width:700px;height:300px;">

The RNN cell takes as input $x^{\langle t \rangle}$ (the current input) and $a^{\langle t - 1\rangle}$ (the previous hidden state, which carries information from the past). It outputs $a^{\langle t \rangle}$, which is passed to the next RNN cell and is also used to compute the prediction $\hat{y}^{\langle t \rangle}$.

**Exercise**: Implement the RNN-cell described.

**Instructions**:
1. Compute the hidden state with tanh activation: $a^{\langle t \rangle} = \tanh(W_{aa} a^{\langle t-1 \rangle} + W_{ax} x^{\langle t \rangle} + b_a)$.
2. Using the new hidden state $a^{\langle t \rangle}$, compute the prediction $\hat{y}^{\langle t \rangle} = \mathrm{softmax}(W_{ya} a^{\langle t \rangle} + b_y)$.
3. Store $(a^{\langle t \rangle}, a^{\langle t-1 \rangle}, x^{\langle t \rangle}, parameters)$ in the cache.
4. Return $a^{\langle t \rangle}$, $\hat{y}^{\langle t \rangle}$ and the cache.

We will vectorize over $m$ examples. Thus, $x^{\langle t \rangle}$ will have dimension $(n_x,m)$, and $a^{\langle t \rangle}$ will have dimension $(n_a,m)$. 
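To make the vectorized shapes concrete, here is a minimal sketch of the hidden-state update on a batch. The sizes and the random values are made up for illustration; only the shape conventions come from the text above. It also checks that column $j$ of the batched result equals what you would get by processing example $j$ alone.

```python
import numpy as np

# Illustrative sizes only (assumptions, not prescribed by the exercise).
n_x, n_a, m = 3, 5, 10

rng = np.random.default_rng(0)
xt = rng.standard_normal((n_x, m))       # one column per example
a_prev = rng.standard_normal((n_a, m))
Wax = rng.standard_normal((n_a, n_x))
Waa = rng.standard_normal((n_a, n_a))
ba = rng.standard_normal((n_a, 1))

# (n_a, n_a) @ (n_a, m) -> (n_a, m) and (n_a, n_x) @ (n_x, m) -> (n_a, m).
# ba has shape (n_a, 1), so NumPy broadcasts it across the m columns.
a_next = np.tanh(Waa @ a_prev + Wax @ xt + ba)
print(a_next.shape)  # (5, 10)

# Column j of the batched result matches example j processed on its own.
j = 2
a_single = np.tanh(Waa @ a_prev[:, [j]] + Wax @ xt[:, [j]] + ba)
print(np.allclose(a_next[:, [j]], a_single))  # True
```

Keeping the biases as column vectors of shape $(n_a, 1)$ rather than flat arrays is what lets the same expression work for any batch size $m$.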


In [16]:
import numpy as np
from numpy.testing import assert_allclose, assert_equal

# Utility functions used below

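def softmax(x):
    # Softmax over axis 0 (one distribution per column). Subtracting each
    # column's maximum keeps np.exp from overflowing without changing the result.
    e_x = np.exp(x - np.max(x, axis=0, keepdims=True))
    return e_x / e_x.sum(axis=0, keepdims=True)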


def sigmoid(x):
    return 1 / (1 + np.exp(-x))
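
Because the predictions are stored with one example per column, the softmax has to normalize along axis 0. The sketch below is only a sanity check of that convention: it defines its own column-wise helper (`softmax_cols`, a name made up for this example) and applies it to made-up scores, including one column with very large values.

```python
import numpy as np

def softmax_cols(z):
    # Hypothetical helper for this sketch: softmax over axis 0,
    # giving one probability distribution per column.
    shifted = z - z.max(axis=0, keepdims=True)  # per-column shift avoids overflow
    e = np.exp(shifted)
    return e / e.sum(axis=0, keepdims=True)

# Made-up scores: 2 classes (rows) for 3 examples (columns).
z = np.array([[1.0, 2.0, 1000.0],
              [3.0, 2.0, 1001.0]])
p = softmax_cols(z)

print(p.shape)                           # (2, 3)
print(np.allclose(p.sum(axis=0), 1.0))   # True: every column sums to 1
print(np.allclose(p[:, 1], [0.5, 0.5]))  # True: equal scores give equal probabilities
```

Shifting by the maximum of each column, rather than by a single global maximum, keeps every column well defined even when the columns live on very different scales.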


In [1]:
def rnn_cell_forward(xt, a_prev, parameters):
    """
    Implements a single forward step of the RNN-cell

    
    params: xt -- your input data at timestep "t", numpy array of shape (n_x, m).
    params: a_prev -- Hidden state at timestep "t-1", numpy array of shape (n_a, m)
    params: parameters -- python dictionary containing:
                        Wax -- Weight matrix multiplying the input, numpy array of shape (n_a, n_x)
                        Waa -- Weight matrix multiplying the hidden state, numpy array of shape (n_a, n_a)
                        Wya -- Weight matrix relating the hidden-state to the output, numpy array of shape (n_y, n_a)
                        ba --  Bias, numpy array of shape (n_a, 1)
                        by -- Bias relating the hidden-state to the output, numpy array of shape (n_y, 1)
    
    returns: a_next -- next hidden state, of shape (n_a, m)
    returns: yt_pred -- prediction at timestep "t", numpy array of shape (n_y, m)
    returns: cache -- tuple of values needed for the backward pass, contains (a_next, a_prev, xt, parameters)
    """
    
    # Retrieve parameters from "parameters"
    Wax = parameters["Wax"]
    Waa = parameters["Waa"]
    Wya = parameters["Wya"]
    ba = parameters["ba"]
    by = parameters["by"]
    
    
    # compute next activation state using the formula given above
    a_next = np.tanh(Waa.dot(a_prev) + Wax.dot(xt) + ba)
    # compute output of the current cell using the formula given above
    yt_pred = softmax(Wya.dot(a_next) + by)
   
    
    # store values you need for backward propagation in cache
    cache = (a_next, a_prev, xt, parameters)
    
    return a_next, yt_pred, cache


In [2]:
np.random.seed(1)
xt = np.random.randn(3,10)
a_prev = np.random.randn(5,10)
Waa = np.random.randn(5,5)
Wax = np.random.randn(5,3)
Wya = np.random.randn(2,5)
ba = np.random.randn(5,1)
by = np.random.randn(2,1)
parameters = {"Waa": Waa, "Wax": Wax, "Wya": Wya, "ba": ba, "by": by}

a_next, yt_pred, cache = rnn_cell_forward(xt, a_prev, parameters)
assert_allclose( a_next[4],[ 0.59584544 , 0.18141802,  0.61311866,  0.99808218 , 0.85016201,  0.99980978,-0.18887155,  0.99815551 , 0.6531151  , 0.82872037])
assert_equal(a_next.shape , (5, 10))          
assert_allclose(yt_pred[1],[0.9888161 , 0.01682021, 0.21140899 ,0.36817467 ,0.98988387 ,0.88945212,0.36920224, 0.9966312 , 0.9982559 , 0.17746526])
assert_equal(yt_pred.shape,(2, 10))
print ("all is correct !")


## RNN forward pass 

You can see an RNN as the repetition of the cell you have just built. If the input sequence spans 10 time steps, the cell is applied 10 times, each time with the same parameters. This is called "unrolling" the RNN. Each cell takes as input the hidden state from the previous cell ($a^{\langle t-1 \rangle}$) and the current time-step's input data ($x^{\langle t \rangle}$). It outputs a hidden state ($a^{\langle t \rangle}$) and a prediction ($\hat{y}^{\langle t \rangle}$) for this time-step.


<img src="https://kermorvant.github.io/ml/images/RNN.png" style="width:800px;height:300px;">

The input sequence $x = (x^{\langle 1 \rangle}, x^{\langle 2 \rangle}, \dots, x^{\langle T_x \rangle})$ spans $T_x$ time steps. The network outputs one prediction per time step: $\hat{y} = (\hat{y}^{\langle 1 \rangle}, \hat{y}^{\langle 2 \rangle}, \dots, \hat{y}^{\langle T_x \rangle})$.
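
In code, the whole sequence is stored in a single array with time on the last axis. The following sketch, with made-up sizes and placeholder values, shows how to pull out the input for one time-step. Note that the code index is zero-based, so $x^{\langle 1 \rangle}$ corresponds to `x[:, :, 0]`.

```python
import numpy as np

# Illustrative sizes only: n_x features, m examples, T_x time-steps.
n_x, m, T_x = 3, 10, 4
x = np.arange(n_x * m * T_x, dtype=float).reshape(n_x, m, T_x)

for t in range(T_x):
    xt = x[:, :, t]            # input at time-step t for all m examples
    assert xt.shape == (n_x, m)

# The full sequence for a single example is a slice along the second axis.
one_example = x[:, 0, :]
print(one_example.shape)       # (3, 4): n_x features at each of the T_x time-steps
```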



**Exercise**: Code the forward propagation of the RNN.

**Instructions**:
1. Create an array of zeros ($a$) of shape $(n_a, m, T_x)$ that will store all the hidden states computed by the RNN, and a similar array of shape $(n_y, m, T_x)$ for the predictions.
2. Initialize the "next" hidden state as $a_0$ (initial hidden state).
3. Loop over the time steps with index $t$:
    - Update the "next" hidden state and the cache by running `rnn_cell_forward`.
    - Store the "next" hidden state in $a$ (at position $t$ along the time axis).
    - Store the prediction in $\hat{y}$ (at position $t$ along the time axis).
    - Add the cache to the list of caches.
4. Return $a$, $\hat{y}$ and the caches.

In [19]:
def rnn_forward(x, a0, parameters):
    """
    Implements the forward propagation of the recurrent neural network.

    params: x -- Input data for every time-step, of shape (n_x, m, T_x).
    params: a0 -- Initial hidden state, of shape (n_a, m)
    params: parameters -- python dictionary containing:
                        Waa -- Weight matrix multiplying the hidden state, numpy array of shape (n_a, n_a)
                        Wax -- Weight matrix multiplying the input, numpy array of shape (n_a, n_x)
                        Wya -- Weight matrix relating the hidden-state to the output, numpy array of shape (n_y, n_a)
                        ba --  Bias numpy array of shape (n_a, 1)
                        by -- Bias relating the hidden-state to the output, numpy array of shape (n_y, 1)


    returns: a -- Hidden states for every time-step, numpy array of shape (n_a, m, T_x)
    returns: y_pred -- Predictions for every time-step, numpy array of shape (n_y, m, T_x)
    returns: caches -- tuple of values needed for the backward pass, contains (list of caches, x)
    """
    
    # Initialize "caches" which will contain the list of all caches
    caches = []
    
    # Retrieve dimensions from shapes of x and parameters["Wya"]
    n_x, m, T_x = x.shape
    n_y, n_a = parameters["Wya"].shape
    
    # initialize "a" and "y_pred" with zeros
    a = np.zeros((n_a, m, T_x))
    y_pred = np.zeros((n_y, m, T_x))
    
    # Initialize a_next 
    a_next = a0
      
    # loop over all time-steps
    for t in range(T_x):
        # Update next hidden state, compute the prediction, get the cache 
        a_next, yt_pred, cache = rnn_cell_forward(x[:, :, t], a_next, parameters)
        # Save the value of the new "next" hidden state in a 
        a[:,:,t] = a_next
        # Save the value of the prediction in y 
        y_pred[:,:,t] = yt_pred
        # Append "cache" to "caches" 
        caches.append(cache)
        
    
    # store values needed for backward propagation in cache
    caches = (caches, x)
    
    return a, y_pred, caches

In [20]:
np.random.seed(1)
x = np.random.randn(3,10,4)
a0 = np.random.randn(5,10)
Waa = np.random.randn(5,5)
Wax = np.random.randn(5,3)
Wya = np.random.randn(2,5)
ba = np.random.randn(5,1)
by = np.random.randn(2,1)
parameters = {"Waa": Waa, "Wax": Wax, "Wya": Wya, "ba": ba, "by": by}
a, y_pred, caches = rnn_forward(x, a0, parameters)

assert_allclose(a[4][1] ,  [-0.99999375 , 0.77911235, -0.99861469 ,-0.99833267])
assert_equal(a.shape , (5, 10, 4))
assert_allclose(y_pred[1][3] , [0.79560373, 0.86224861, 0.11118257 ,0.81515947])
assert_equal(y_pred.shape , (2, 10, 4))
assert_allclose(caches[1][1][3] , [-1.1425182 , -0.34934272, -0.20889423 , 0.58662319])
assert_equal(len(caches) ,  2)
print ("all is correct !")

all is correct !
