[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Spinkk/TeachingTensorflow/blob/main/RNNs/RNNs%20as%20temporal%20weight%20sharing.ipynb)

# Basic RNNs as fully connected network architectures with temporally shared weights

In this notebook, we show how a basic RNN that uses a simple recurrent cell (no LSTM or GRU) transforms input by applying the same matrix multiplications at each time step, using the latest output as one of the inputs.

Remember that tf.keras.layers.Dense layers perform a matrix multiplication on their input, so we use these layers to define our RNN cell.
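
As a quick reminder of what this means in practice, the following minimal sketch compares a Dense layer without bias and activation to an explicit matrix multiplication with the layer's kernel. The sizes and variable names are assumptions chosen for illustration only.

```python
import numpy as np
import tensorflow as tf

# Hypothetical sizes, chosen only for illustration.
batch_size, n_features, n_units = 2, 16, 3

x = tf.random.uniform((batch_size, n_features))

# A Dense layer without bias and activation stores a single weight matrix ("kernel").
dense = tf.keras.layers.Dense(n_units, activation=None, use_bias=False)
y_layer = dense(x)  # the first call builds the kernel of shape (n_features, n_units)

# The same result, written as an explicit matrix multiplication.
y_manual = tf.matmul(x, dense.kernel)

print(np.allclose(y_layer.numpy(), y_manual.numpy(), atol=1e-5))
```

Note that the layer multiplies the input from the left (input times kernel), so each example is treated as a row vector.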

A basic RNN takes its current hidden state (a vector of dimension n_outputs for each example in the batch) and linearly transforms it with a weight matrix, which again yields a vector of n_outputs dimensions. In addition (quite literally) to that, the RNN cell takes the input at the current time-step t (the feature vector of the example at time-step t) and linearly transforms it with a second weight matrix, also yielding a vector of n_outputs dimensions. Adding the results of the two matrix multiplications together with a bias and applying a tanh activation to the sum, we obtain the new hidden state of the RNN cell. This state is then used at the next time-step t+1, together with the input feature vector at t+1.

More formally written, the computation of a simple RNN cell at one time-step can be described as follows:

### $$h^{<t+1>} = \tanh( W_{hh} h^{<t>} + W_{xh} x^{<t>} + b )$$

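Here $x^{<t>}$ is the feature input at time-step t, $h^{<t>}$ is the hidden state at time-step t, and $W_{xh}$ is a matrix of shape x (the dimensionality of the input feature vector at a single time-step) by h (the dimensionality of the hidden state). Accordingly, $W_{hh}$ has shape h by h and the bias $b$ has h entries.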
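
To make the shapes concrete, here is a minimal NumPy sketch of a single update step. The sizes, the random weights and all variable names are assumptions for illustration. The sketch uses the row-vector convention (one example per row), so the weight matrices multiply from the right, which matches the x by h shape given above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 2 examples, 16 input features, hidden state of size 3.
batch_size, x_dim, h_dim = 2, 16, 3

W_xh = rng.normal(scale=0.1, size=(x_dim, h_dim))  # input-to-hidden weights
W_hh = rng.normal(scale=0.1, size=(h_dim, h_dim))  # hidden-to-hidden weights
b = np.zeros(h_dim)                                # bias

x_t = rng.uniform(size=(batch_size, x_dim))  # input features at time-step t
h_t = np.zeros((batch_size, h_dim))          # current hidden state

# One application of the update rule: the new state has the same shape as the old one.
h_next = np.tanh(h_t @ W_hh + x_t @ W_xh + b)

print(h_next.shape)  # (2, 3)
```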


Below we implement this type of RNN on some input (not inside a tf.keras.Model or tf.keras.layers.Layer but as separate layers, since we don't do any model fitting here). First we show how to use the built-in TensorFlow/Keras RNN cell and RNN wrapper layers to achieve this.

In [1]:
import tensorflow as tf

In [3]:
with tf.device('/device:cpu:0'):
    
    # A batch of 2 examples, each with 24 time steps of 16 features. The time dimension comes second, after the batch dimension.
    input_shape = (2, 24, 16)

    # The length of the resulting vector (similar to the units argument in Dense layers)
    n_outputs = 3
    
    input_sequence = tf.random.uniform(shape = input_shape)
    
    simple_RNN_cell = tf.keras.layers.SimpleRNNCell(n_outputs)
    
    # return_sequences=False means we only output the final hidden_state.
    RNN = tf.keras.layers.RNN(simple_RNN_cell, return_sequences=False)
    
    output = RNN(input_sequence)
    
    print(f"The output of the RNN, which is the last hidden state is \n{output}\n\n")
    

The output of the RNN, which is the last hidden state is 
[[0.8912661  0.67020696 0.77448636]
 [0.6446345  0.4354966  0.7005535 ]]




# tf.keras.layers.RNN and tf.keras.layers.SimpleRNNCell from scratch

Now we want to look at how we can implement what happens inside the two pre-defined layers that we have used above.

To do so, we create two dense layers carrying the matrices $W_{hh}$ and $W_{xh}$ that are involved in the simple RNN cell. For the bias $b$ we create a separate tf.Variable. Notice that we disable the use of a bias in the dense layers and do not use an activation either, because we want a simple matrix multiplication.

With the weight matrices and the bias defined, we can start to iterate over the same input sequence data that we have used before. The initial hidden state of the RNN cell is set to be a vector of zeros before looping over the input sequence.

At each time-step we use the previous hidden-state of the cell and update it, following the equation shown in the beginning of the notebook. The output of this custom RNN is then just the final hidden-state vector.

Notice how we use the same two weight matrices and the same bias for different temporal parts of the input data. This is why RNNs can be thought of as another variant of weight sharing.
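
The weight sharing can be illustrated with a small NumPy sketch, independent of the TensorFlow code in this notebook. The function name rnn_forward, the sizes and the random weights are assumptions for illustration. One fixed set of weights processes sequences of different lengths, and the number of parameters does not depend on the number of time-steps.

```python
import numpy as np

def rnn_forward(inputs, W_xh, W_hh, b):
    """Run a simple tanh RNN over inputs of shape (batch, time, features)."""
    batch_size, time_steps, _ = inputs.shape
    h = np.zeros((batch_size, W_hh.shape[0]))  # initial hidden state of zeros
    for t in range(time_steps):
        # The same W_xh, W_hh and b are applied at every time-step.
        h = np.tanh(inputs[:, t, :] @ W_xh + h @ W_hh + b)
    return h

rng = np.random.default_rng(0)
x_dim, h_dim = 16, 3
W_xh = rng.normal(scale=0.1, size=(x_dim, h_dim))
W_hh = rng.normal(scale=0.1, size=(h_dim, h_dim))
b = np.zeros(h_dim)

# One set of weights handles sequences of different lengths.
short_seq = rng.uniform(size=(2, 5, x_dim))
long_seq = rng.uniform(size=(2, 24, x_dim))
h_short = rnn_forward(short_seq, W_xh, W_hh, b)
h_long = rnn_forward(long_seq, W_xh, W_hh, b)

# The parameter count depends only on the feature and hidden sizes.
n_params = W_xh.size + W_hh.size + b.size
print(h_short.shape, h_long.shape, n_params)  # (2, 3) (2, 3) 60
```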

In [4]:
with tf.device('/device:cpu:0'):
    ### We've seen the output of the randomly initialized simple RNN.
    ### Now we want to show the computations that were involved by doing the same with dense layers 
    
    ### Remember that dense layers without activation are just matrix multiplication of a weight matrix with input
    
    # We create the Dense layers (that will store the matrices)
    dense_layer_hstate = tf.keras.layers.Dense(n_outputs, activation=None, use_bias=False)
    dense_layer_input = tf.keras.layers.Dense(n_outputs, activation=None, use_bias=False)
    bias = tf.Variable([0. for _ in range(n_outputs)])
    
    # create the initial hidden state (one zero vector per example in the batch)
    state = tf.zeros((input_sequence.shape[0], n_outputs), tf.float32)
    
    # iterate over the time-steps
    for t in tf.range(input_shape[1]):
        # on this iteration we use the input at timestep t
        input_t =input_sequence[:,t,:]
        
        # we multiply the input at t and the previous state with their respective weight matrices,
        # then sum the two results and add the bias.
        x_sum = dense_layer_input(input_t) + dense_layer_hstate(state) + bias
        
        # finally we use tanh as an activation function to update the RNN cell state
        state = tf.nn.tanh(x_sum)

    print(f"The output of the custom RNN that we built with dense layers is: \n{state}\n\n")

The output of the custom RNN that we built with dense layers is: 
[[-0.70980406  0.8229091   0.9880202 ]
 [ 0.01065047  0.912312    0.9980492 ]]




# Copying the weights of the pre-defined RNN to our custom RNN to verify the implementation

To make sure that we do the same computations as in the pre-defined tf.keras.layers.SimpleRNNCell and the wrapper layer (which implements the for loop) tf.keras.layers.RNN, we take the weights that we have used before and assign them to the two dense layers and the bias. We observe that we now get the exact same output - our implementation is correct.

This type of testing can be helpful to verify your custom implementations of the computations involved in the layers.
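
Before copying weights, it can help to check which variable is which. The following minimal sketch creates a fresh cell with the same sizes as in this notebook (the names cell and rnn are placeholders for this example, not the objects defined above) and reads the weights by attribute name rather than by position.

```python
import tensorflow as tf

# Hypothetical fresh layers with 16 input features and 3 hidden units.
cell = tf.keras.layers.SimpleRNNCell(3)
rnn = tf.keras.layers.RNN(cell)
_ = rnn(tf.random.uniform((2, 24, 16)))  # the first call creates the weights

print(cell.kernel.shape)            # input-to-hidden matrix, (16, 3)
print(cell.recurrent_kernel.shape)  # hidden-to-hidden matrix, (3, 3)
print(cell.bias.shape)              # bias vector, (3,)
```

Comparing these shapes to the shapes of the target Dense kernels is a cheap sanity check before calling assign.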

In [5]:
with tf.device('/device:cpu:0'):
    # Now we want to copy the weights of this RNN to a fully connected model to obtain the same output
    
    RNN_cell_input_weights = RNN.trainable_variables[0]

    RNN_cell_hstate_weights = RNN.trainable_variables[1]
    
    RNN_cell_biases = RNN.trainable_variables[2]
    
    dense_layer_hstate.weights[0].assign(tf.reshape(RNN_cell_hstate_weights, dense_layer_hstate.weights[0].shape))
    dense_layer_input.weights[0].assign(tf.reshape(RNN_cell_input_weights, dense_layer_input.weights[0].shape))
    bias.assign(RNN_cell_biases)
    
    # again we run our custom RNN, this time with the same weights as the tf.keras.layers.RNN version
    
    # same code as above
    state = tf.zeros((input_sequence.shape[0], n_outputs), tf.float32)

    for t in tf.range(input_shape[1]):
        input_t =input_sequence[:,t,:]
        
        x_sum = dense_layer_input(input_t) + dense_layer_hstate(state) + bias

        state = tf.nn.tanh(x_sum)
    
    print(f"With the same weights as the pre-defined RNN, our custom RNN's output is\n {state}\n")
    
    print(f"The outputs of the pre-defined RNN and our custom RNN are the same: {tf.reduce_all(state==output)}")

With the same weights as the pre-defined RNN, our custom RNN's output is
 [[0.8912661  0.67020696 0.77448636]
 [0.6446345  0.4354966  0.7005535 ]]

The outputs of the pre-defined RNN and our custom RNN are the same: True
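
The exact equality check works here because both versions perform the same floating-point operations. When two implementations order their operations differently, the results can differ in the last digits, so a comparison with a tolerance is usually the more robust choice. The NumPy sketch below illustrates this with an assumed alternative formulation: instead of two separate matrix multiplications, it applies one fully connected layer to the concatenation of input and hidden state. All names and sizes are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, x_dim, h_dim = 2, 16, 3

W_xh = rng.normal(scale=0.1, size=(x_dim, h_dim)).astype(np.float32)
W_hh = rng.normal(scale=0.1, size=(h_dim, h_dim)).astype(np.float32)
b = np.zeros(h_dim, dtype=np.float32)
x_t = rng.uniform(size=(batch_size, x_dim)).astype(np.float32)
h_t = rng.uniform(-1.0, 1.0, size=(batch_size, h_dim)).astype(np.float32)

# Version 1: two separate matrix multiplications, as in the notebook.
h_separate = np.tanh(x_t @ W_xh + h_t @ W_hh + b)

# Version 2: one fully connected layer applied to the concatenation [x_t, h_t].
W_stacked = np.concatenate([W_xh, W_hh], axis=0)  # shape (x_dim + h_dim, h_dim)
xh = np.concatenate([x_t, h_t], axis=1)           # shape (batch, x_dim + h_dim)
h_stacked = np.tanh(xh @ W_stacked + b)

# Compare with a tolerance instead of exact equality.
print(np.allclose(h_separate, h_stacked, atol=1e-5))
```

The second version also shows why a simple RNN cell can be read as a single fully connected layer whose input is the current feature vector joined with the previous hidden state.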


# A note on using RNNs to return a sequence of hidden-states

Previously, we set the return_sequences argument to False and ended up with a single vector per example as the output of the RNN (the final hidden state). To obtain a sequence from the RNN, we instead return the hidden states from all time-steps. This is useful when we want to map one sequence to another, as in language translation. To implement this with the pre-defined layers, we use the same cell layer as before but re-instantiate the RNN wrapper layer with the new argument.

In [6]:
with tf.device('/device:cpu:0'):
    # simple_RNN_cell and input_sequence remain unchanged
    
    # return_sequences=True means we output the hidden_states of all time-steps.
    RNN = tf.keras.layers.RNN(simple_RNN_cell, return_sequences=True)
    
    RNN_outputs = RNN(input_sequence)
    
    print(f"The output of the RNN, which contains the hidden states of all time-steps, is \n {RNN_outputs}\n\n")

The output of the RNN, which contains the hidden states of all time-steps, is 
 [[[ 0.3396979   0.6822263   0.5196427 ]
  [ 0.17712753 -0.30245814  0.40522677]
  [ 0.87390345  0.85211354  0.7118771 ]
  [ 0.65884954 -0.38833636  0.7659966 ]
  [ 0.96329224  0.7859971   0.6243708 ]
  [ 0.515611   -0.76265275  0.6495711 ]
  [ 0.95947945  0.8897012   0.7111566 ]
  [ 0.5496499  -0.50042963  0.3595592 ]
  [ 0.914206    0.78939044  0.8333135 ]
  [ 0.81257606 -0.28883561  0.63706404]
  [ 0.77599657  0.2869477   0.36363408]
  [ 0.8101806   0.6166689   0.5593483 ]
  [ 0.43481672 -0.36114278  0.23310821]
  [ 0.555743    0.41584212  0.6592722 ]
  [ 0.74648666  0.7087062   0.76838   ]
  [ 0.39154133 -0.39427656  0.06196904]
  [ 0.90339917  0.8030102   0.85790867]
  [ 0.31084213 -0.60803187  0.3543192 ]
  [ 0.73040915  0.7899899   0.7256308 ]
  [ 0.6208359  -0.30283618  0.02103382]
  [ 0.15764168  0.57924265  0.94759715]
  [ 0.5922894   0.5579112   0.44225782]
  [ 0.7010376  -0.04795119  0.12987894]
  [ 0.8912661   0.670
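
To make the difference between the two settings concrete, the following sketch compares the output shapes and checks that the last entry of the returned sequence equals the final hidden state. It uses a fresh cell and fresh random input (the names below are placeholders, not the objects defined above).

```python
import numpy as np
import tensorflow as tf

x = tf.random.uniform((2, 24, 16))  # (batch, time-steps, features)

# Both wrappers share one cell and therefore the same weights.
cell = tf.keras.layers.SimpleRNNCell(3)
rnn_last = tf.keras.layers.RNN(cell, return_sequences=False)
rnn_all = tf.keras.layers.RNN(cell, return_sequences=True)

last = rnn_last(x)       # final hidden state only
all_states = rnn_all(x)  # hidden states of all time-steps

print(last.shape)        # (2, 3)
print(all_states.shape)  # (2, 24, 3)
print(np.allclose(all_states[:, -1, :].numpy(), last.numpy(), atol=1e-5))
```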

# Return sequences from scratch

To implement this in a way that also works in TensorFlow models that use the @tf.function decorator, we can't just create a Python list of the hidden states and append to it: in graph mode, appending to a Python list inside a loop over tf.range does not behave as it does in eager execution. Instead, we make use of a tf.TensorArray object (this also covers the case in which we do not know in advance how many time-steps we will process and thus how many hidden states we want to output).

A tf.TensorArray is not itself a tensor, so we call its .stack() method to obtain the hidden states it stores as a single tensor. Since we write one entry per time-step, the stacked result has shape (time-steps, batch, h_dim), so we transpose the first two axes to get back to (batch, time-steps, h_dim).
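
The write, stack and transpose pattern can be sketched in isolation on a simpler computation. The function running_sum below is a hypothetical example, not part of this notebook's RNN: it accumulates a running sum over the time dimension inside a @tf.function and collects one result per time-step in a tf.TensorArray.

```python
import tensorflow as tf

@tf.function
def running_sum(x):
    """Cumulative sum over the time axis of x with shape (batch, time, features)."""
    time_steps = tf.shape(x)[1]
    total = tf.zeros_like(x[:, 0, :])
    results = tf.TensorArray(dtype=x.dtype, size=time_steps)
    for t in tf.range(time_steps):
        total = total + x[:, t, :]
        # write returns a new TensorArray object, so the result must be re-assigned
        results = results.write(t, total)
    stacked = results.stack()                # shape (time, batch, features)
    return tf.transpose(stacked, [1, 0, 2])  # shape (batch, time, features)

out = running_sum(tf.ones((2, 4, 3)))
print(out.shape)               # (2, 4, 3)
print(out[0, :, 0].numpy())    # [1. 2. 3. 4.]
```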

In [7]:
with tf.device('/device:cpu:0'):

    state = tf.zeros((input_sequence.shape[0],n_outputs), tf.float32)

    # initialize the hidden_states TensorArray with one entry of shape (batch, h_dim) per time-step
    hidden_states = tf.TensorArray(dtype=tf.float32, size = input_sequence.shape[1])

    for t in tf.range(input_shape[1]):
        input_t =input_sequence[:,t,:]

        x_sum = dense_layer_input(input_t) + dense_layer_hstate(state) + bias

        state = tf.nn.tanh(x_sum)

        # write the states to the TensorArray
        hidden_states = hidden_states.write(t, state)

    # transpose the sequence of hidden_states from TensorArray accordingly (batch and time dimensions switched)
    custom_RNN_outputs = tf.transpose(hidden_states.stack(), [1,0,2])

    print(f"The output of the custom RNN that we built with dense layers is: \n{custom_RNN_outputs}\n\n")

The output of the custom RNN that we built with dense layers is: 
[[[ 0.3396979   0.6822263   0.5196427 ]
  [ 0.17712753 -0.30245814  0.40522677]
  [ 0.87390345  0.85211354  0.7118771 ]
  [ 0.65884954 -0.38833636  0.7659966 ]
  [ 0.96329224  0.7859971   0.6243708 ]
  [ 0.515611   -0.76265275  0.6495711 ]
  [ 0.95947945  0.8897012   0.7111566 ]
  [ 0.5496499  -0.50042963  0.3595592 ]
  [ 0.914206    0.78939044  0.8333135 ]
  [ 0.81257606 -0.28883561  0.63706404]
  [ 0.77599657  0.2869477   0.36363408]
  [ 0.8101806   0.6166689   0.5593483 ]
  [ 0.43481672 -0.36114278  0.23310821]
  [ 0.555743    0.41584212  0.6592722 ]
  [ 0.74648666  0.7087062   0.76838   ]
  [ 0.39154133 -0.39427656  0.06196904]
  [ 0.90339917  0.8030102   0.85790867]
  [ 0.31084213 -0.60803187  0.3543192 ]
  [ 0.73040915  0.7899899   0.7256308 ]
  [ 0.6208359  -0.30283618  0.02103382]
  [ 0.15764168  0.57924265  0.94759715]
  [ 0.5922894   0.5579112   0.44225782]
  [ 0.7010376  -0.04795119  0.12987894]
  [ 0.8912661 

Now let us verify that the @tf.function decorator works with this.

In [9]:

with tf.device('/device:cpu:0'):
    @tf.function
    def tf_func():

        state = tf.zeros((input_sequence.shape[0],n_outputs), tf.float32)


        # initialize the hidden_states TensorArray with one entry of shape (batch, h_dim) per time-step
        hidden_states = tf.TensorArray(dtype=tf.float32, size=input_sequence.shape[1])

        for t in tf.range(input_shape[1]):
            input_t =input_sequence[:,t,:]

            x_sum = dense_layer_input(input_t) + dense_layer_hstate(state) + bias

            state = tf.nn.tanh(x_sum)

            # write the states to the TensorArray
            hidden_states = hidden_states.write(t, state)

        # transpose the sequence of hidden_states from TensorArray accordingly (batch and time dimensions switched)
        custom_RNN_outputs = tf.transpose(hidden_states.stack(), [1,0,2])
        
        return custom_RNN_outputs
    custom_RNN_outputs = tf_func()
    print(f"The output of the custom RNN that we built with dense layers is: \n{custom_RNN_outputs}\n\n")

The output of the custom RNN that we built with dense layers is: 
[[[ 0.3396979   0.6822263   0.5196427 ]
  [ 0.17712753 -0.30245814  0.40522677]
  [ 0.87390345  0.85211354  0.7118771 ]
  [ 0.65884954 -0.38833636  0.7659966 ]
  [ 0.96329224  0.7859971   0.6243708 ]
  [ 0.515611   -0.76265275  0.6495711 ]
  [ 0.95947945  0.8897012   0.7111566 ]
  [ 0.5496499  -0.50042963  0.3595592 ]
  [ 0.914206    0.78939044  0.8333135 ]
  [ 0.81257606 -0.28883561  0.63706404]
  [ 0.77599657  0.2869477   0.36363408]
  [ 0.8101806   0.6166689   0.5593483 ]
  [ 0.43481672 -0.36114278  0.23310821]
  [ 0.555743    0.41584212  0.6592722 ]
  [ 0.74648666  0.7087062   0.76838   ]
  [ 0.39154133 -0.39427656  0.06196904]
  [ 0.90339917  0.8030102   0.85790867]
  [ 0.31084213 -0.60803187  0.3543192 ]
  [ 0.73040915  0.7899899   0.7256308 ]
  [ 0.6208359  -0.30283618  0.02103382]
  [ 0.15764168  0.57924265  0.94759715]
  [ 0.5922894   0.5579112   0.44225782]
  [ 0.7010376  -0.04795119  0.12987894]
  [ 0.8912661 



Finally, we again verify that the outputs with return_sequences=True match between our custom implementation of the computations involved and the pre-defined RNN-wrapper and cell. Since we did not re-initialize the weights, the outputs should still be the same.

In [10]:
print(tf.reduce_all(custom_RNN_outputs == RNN_outputs).numpy())

True
