<a href="https://colab.research.google.com/github/ketanhdoshi/ml/blob/master/examples/RNN_Examples.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Basic RNN example built from scratch using TensorFlow, based on [this article](https://medium.com/@erikhallstrm/hello-world-rnn-83cd7105b767)

A simple Echo-RNN that remembers the input data and then echoes it after a few time-steps.
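
To make the echo task concrete, here is a minimal NumPy sketch of how a label series relates to its input. The eight-element `x` is made up for illustration; the shift mirrors the `np.roll` call used in the data generator later in this notebook.

```python
import numpy as np

echo_step = 3  # Number of steps by which the labels lag the input

# A short, hand-picked input series (illustrative values only)
x = np.array([1, 0, 1, 1, 0, 0, 1, 0])

# Shift right by 'echo_step'. np.roll wraps the tail around to the front,
# so the first 'echo_step' entries are overwritten with 0
y = np.roll(x, echo_step)
y[0:echo_step] = 0

print("x:", x)
print("y:", y)
```

From position `echo_step` onwards, each label equals the input seen `echo_step` steps earlier, which is what the network has to learn to reproduce.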

The figure below shows the input data matrix; the current batch, `batchX_placeholder`, is in the dashed rectangle. This “batch window” slides `truncated_backprop_length` steps to the right at each run, hence the arrow. In the figure, `batch_size = 3`, `truncated_backprop_length = 3`, and `total_series_length = 36`. These numbers are for visualization only; the values in the code are different.

The dark squares show '1' values and the light squares show '0' values.

![alt text](https://miro.medium.com/max/700/1*n45uYnAfTDrBvG87J-poCA.jpeg)
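
The sliding batch window can be sketched in NumPy using the figure's dimensions. This is an illustration only: the series holds the positions 0..35 instead of random bits so that the layout is easy to read, and the `windows` list is a hypothetical helper that does not appear in the notebook's code.

```python
import numpy as np

batch_size = 3
truncated_backprop_length = 3
total_series_length = 36

# Positions 0..35 stand in for the random 0/1 values
series = np.arange(total_series_length)

# One long series becomes 'batch_size' rows of consecutive sub-series
data = series.reshape((batch_size, -1))  # shape (3, 12)

# Slide the window 'truncated_backprop_length' columns to the right each run
num_windows = data.shape[1] // truncated_backprop_length
windows = [
    data[:, i * truncated_backprop_length:(i + 1) * truncated_backprop_length]
    for i in range(num_windows)
]

for i, window in enumerate(windows):
    print("window", i)
    print(window)
```

Each window holds the same time offsets from three different parts of the series, one per row.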

In [None]:
from __future__ import print_function, division
import numpy as np
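# This notebook uses the TensorFlow 1.x graph API (tf.placeholder, tf.Session).
# Under TensorFlow 2.x, 'import tensorflow.compat.v1 as tf' followed by
# 'tf.disable_v2_behavior()' is one way to run it.
import tensorflow as tf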
import matplotlib.pyplot as plt

num_epochs = 100
total_series_length = 50000
truncated_backprop_length = 15 # Number of timesteps per sample
state_size = 4
num_classes = 2 # Number of output classes
echo_step = 3  # Number of steps by which to shift the input data for echoing
batch_size = 5 # Number of training samples per batch
num_batches = total_series_length//batch_size//truncated_backprop_length

def generateData():
    # Populate 1D array randomly with values either '0' or '1', each with a
    # probability of 50%
    x = np.array(np.random.choice(2, total_series_length, p=[0.5, 0.5]))
    
    # Shift values of 'x' rightwards by 'echo_step' steps, and set the
    # newly shifted-in values to 0. This is the label (ie. expected output)
    y = np.roll(x, echo_step)
    y[0:echo_step] = 0

    # Reshape 'x' and 'y' to 2D with 'batch_size' rows
    x = x.reshape((batch_size, -1))  # The first index changing slowest, subseries as rows
    y = y.reshape((batch_size, -1))

    return (x, y)

# Initialise the X and Y as placeholders for feeding the input and label data
batchX_placeholder = tf.placeholder(tf.float32, [batch_size, truncated_backprop_length])
batchY_placeholder = tf.placeholder(tf.int32, [batch_size, truncated_backprop_length])

# Initialise the activation state
init_state = tf.placeholder(tf.float32, [batch_size, state_size])

# Initialise the weights and biases as Variables to be learned
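# This is W_ax stacked on top of W_aa, plus b_a. The 'state_size + 1' rows are
# one row for the single input feature followed by 'state_size' rows for the
# state, matching the [input, state] concatenation order in the forward pass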
W = tf.Variable(np.random.rand(state_size+1, state_size), dtype=tf.float32)
b = tf.Variable(np.zeros((1,state_size)), dtype=tf.float32)

# This is W_ay and b_y
W2 = tf.Variable(np.random.rand(state_size, num_classes),dtype=tf.float32)
b2 = tf.Variable(np.zeros((1,num_classes)), dtype=tf.float32)

**Unstack** the columns (axis = 1) of the batch into a Python list, one entry per time-step. The RNN trains on different parts of the time-series simultaneously: steps 4 to 6, 16 to 18 and 28 to 30 in the batch shown in the figure.

![alt text](https://miro.medium.com/max/700/1*f2iL4zOkBUBGOpVE7kyajg.png)
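
As a rough NumPy analogue of unstacking along axis 1, the sketch below splits a small batch into a list of per-time-step columns. The batch values are the 0-indexed positions corresponding to steps 4 to 6, 16 to 18 and 28 to 30, and are used here purely for illustration.

```python
import numpy as np

# One batch: 3 rows (samples), 3 columns (time-steps)
batch = np.array([[3, 4, 5],
                  [15, 16, 17],
                  [27, 28, 29]])

# One list entry per time-step; each entry has one value per sample
columns = [batch[:, t] for t in range(batch.shape[1])]

for t, column in enumerate(columns):
    print("time-step", t, "->", column)
```

Each list entry has shape `(batch_size,)`, so the loop over time-steps processes all samples in the batch at once.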

In [None]:
# Unstack columns - split the batch data into adjacent time-steps
# Each of the two variables below is a list that represents a time-series with multiple entries at each step.
inputs_series = tf.unstack(batchX_placeholder, axis=1)
labels_series = tf.unstack(batchY_placeholder, axis=1)

**Forward pass calculation**

Calculate the sum of the two affine transforms `current_input * Wa + current_state * Wb` shown in the figure below. Concatenating the two input tensors (and stacking the two weight matrices) lets you compute this with a single matrix multiplication. The bias `b` is broadcast across all samples in the batch.

![alt text](https://miro.medium.com/max/700/1*fdwNNJ5UOE3Sx0R_Cyfmyg.png)
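
The equivalence between the two formulations can be checked numerically. The sketch below uses NumPy rather than TensorFlow, and the names `W_input` and `W_state` are hypothetical stand-ins for `Wa` and `Wb` in the figure.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, state_size = 5, 4

current_input = rng.random((batch_size, 1))
current_state = rng.random((batch_size, state_size))

W_input = rng.random((1, state_size))           # hypothetical name for Wa
W_state = rng.random((state_size, state_size))  # hypothetical name for Wb
b = np.zeros((1, state_size))

# Two separate matrix multiplications
two_matmuls = current_input @ W_input + current_state @ W_state + b

# One matrix multiplication on the concatenated tensors. The weight rows
# must be stacked in the same order as the columns are concatenated
W = np.vstack([W_input, W_state])  # shape (state_size + 1, state_size)
concatenated = np.concatenate([current_input, current_state], axis=1)
one_matmul = concatenated @ W + b  # 'b' broadcasts over the batch rows

next_state = np.tanh(one_matmul)
print(np.allclose(two_matmuls, one_matmul))
```

The ordering matters: swapping the concatenation order without reordering the weight rows would pair the input with the wrong weights.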

In [None]:
# Forward pass to do the actual RNN computation
current_state = init_state
states_series = []
for current_input in inputs_series:
    current_input = tf.reshape(current_input, [batch_size, 1])
    input_and_state_concatenated = tf.concat([current_input, current_state], 1)  # Increasing number of columns

    next_state = tf.tanh(tf.matmul(input_and_state_concatenated, W) + b)  # Broadcasted addition
    states_series.append(next_state)
    current_state = next_state

In [None]:
# Calculate the loss for the batch. A fully connected layer maps each state to
# per-class logits, and softmax turns those logits into class probabilities
logits_series = [tf.matmul(state, W2) + b2 for state in states_series] #Broadcasted addition
predictions_series = [tf.nn.softmax(logits) for logits in logits_series]

# logits is of shape [batch_size, num_classes] and labels of shape [batch_size]
losses = [tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=labels) for logits, labels in zip(logits_series,labels_series)]
total_loss = tf.reduce_mean(losses)

train_step = tf.train.AdagradOptimizer(0.3).minimize(total_loss)
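
The per-step loss in the cell above takes logits of shape `[batch_size, num_classes]` and integer labels of shape `[batch_size]`. As a minimal NumPy sketch of that computation (not TensorFlow's implementation), the hypothetical helper `sparse_softmax_cross_entropy` below returns the negative log-probability of the labelled class for each sample; the logits and labels are made-up values.

```python
import numpy as np

def sparse_softmax_cross_entropy(logits, labels):
    """Negative log-softmax of the labelled class, one value per sample."""
    # Subtract the row maximum for numerical stability
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log-probability at each sample's integer label
    return -log_probs[np.arange(len(labels)), labels]

logits = np.array([[2.0, 0.0],
                   [0.0, 0.0],
                   [-1.0, 3.0]])
labels = np.array([0, 1, 1])

losses = sparse_softmax_cross_entropy(logits, labels)
print(losses)
print("mean loss:", losses.mean())
```

Equal logits (the second row) give a loss of `log(2)`, the value expected from an untrained two-class predictor.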

**Visualize** the training - plot the loss over time, show training input, training output and the current predictions by the network on different sample series in a training batch.
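
The plotting code collapses each two-class softmax output into a single 0/1 prediction by thresholding the probability of class 0. The sketch below, using made-up probabilities, shows that this matches taking the arg-max over the two classes.

```python
import numpy as np

# Illustrative softmax outputs: one row per time-step, columns = [P(0), P(1)]
one_hot_output_series = np.array([[0.9, 0.1],
                                  [0.2, 0.8],
                                  [0.4, 0.6],
                                  [0.7, 0.3]])

# Thresholding as done in the plot function: predict 1 when P(0) < 0.5
thresholded = np.array([(1 if out[0] < 0.5 else 0) for out in one_hot_output_series])

# Equivalent for two classes: pick the class with the larger probability
by_argmax = np.argmax(one_hot_output_series, axis=1)

print(thresholded)
print(by_argmax)
```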

In [None]:
def plot(loss_list, predictions_series, batchX, batchY):
    plt.subplot(2, 3, 1)
    plt.cla()
    plt.plot(loss_list)

    for batch_series_idx in range(5):
        one_hot_output_series = np.array(predictions_series)[:, batch_series_idx, :]
        single_output_series = np.array([(1 if out[0] < 0.5 else 0) for out in one_hot_output_series])

        plt.subplot(2, 3, batch_series_idx + 2)
        plt.cla()
        plt.axis([0, truncated_backprop_length, 0, 2])
        left_offset = range(truncated_backprop_length)
        plt.bar(left_offset, batchX[batch_series_idx, :], width=1, color="blue")
        plt.bar(left_offset, batchY[batch_series_idx, :] * 0.5, width=1, color="red")
        plt.bar(left_offset, single_output_series * 0.3, width=1, color="green")

    plt.draw()
    plt.pause(0.0001)

**Run the training** by executing the TensorFlow graph in a session

In [None]:
with tf.Session() as sess:
    # tf.initialize_all_variables() is deprecated in favour of this call
    sess.run(tf.global_variables_initializer())
    plt.ion()
    plt.figure()
    plt.show()
    loss_list = []

    for epoch_idx in range(num_epochs):
        # New data is generated on each epoch (not the usual way to do it, but
        # it works in this case since everything is predictable)
        x,y = generateData()
        _current_state = np.zeros((batch_size, state_size))

        print("New data, epoch", epoch_idx)

        for batch_idx in range(num_batches):
            start_idx = batch_idx * truncated_backprop_length
            end_idx = start_idx + truncated_backprop_length

            batchX = x[:,start_idx:end_idx]
            batchY = y[:,start_idx:end_idx]

            _total_loss, _train_step, _current_state, _predictions_series = sess.run(
                [total_loss, train_step, current_state, predictions_series],
                feed_dict={
                    batchX_placeholder:batchX,
                    batchY_placeholder:batchY,
                    init_state:_current_state
                })

            loss_list.append(_total_loss)

            if batch_idx%100 == 0:
                print("Step",batch_idx, "Loss", _total_loss)
                plot(loss_list, _predictions_series, batchX, batchY)

plt.ioff()
plt.show()

### Modify the basic RNN example using the terminology and logic in Andrew Ng's [assignment](https://github.com/Kulbear/deep-learning-coursera/blob/master/Sequence%20Models/Building%20a%20Recurrent%20Neural%20Network%20-%20Step%20by%20Step%20-%20v2.ipynb)

This RNN is similar to the example above. It has been rewritten to be easier to understand and to use the terminology from Andrew Ng's RNN assignment. The generated data is also reorganised in a more intuitive way: each row holds `T_x` consecutive time-steps, and each batch takes `m` consecutive rows.
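
The reorganised layout can be sketched with small, made-up dimensions. Positions 0..29 stand in for the random bits so that the row and batch boundaries are visible; the reshape and row slicing follow the same pattern as the code in the next cell.

```python
import numpy as np

# Small illustrative dimensions (the notebook uses larger values)
n_x, T_x, m, num_batches = 1, 5, 2, 3
seq_length = num_batches * m * T_x  # 30 values in total

series = np.arange(seq_length)

# Inputs: one row per sample, 'n_x' features, 'T_x' time-steps
x = series.reshape((-1, n_x, T_x))  # shape (6, 1, 5)
# Labels use the same row layout without the feature axis
y = series.reshape((-1, T_x))       # shape (6, 5)

# A batch is 'm' consecutive rows
batch_idx = 1
batchX = x[batch_idx * m:(batch_idx + 1) * m, :, :]
batchY = y[batch_idx * m:(batch_idx + 1) * m, :]

print(batchX.shape, batchY.shape)
print(batchX[:, 0, :])
```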

In [None]:
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

#-------------------------
# Input/Output Data parameters
#-------------------------
m = 30 # Number of training samples per batch
num_batches = 120
num_epochs = 80 # Number of training epochs
echo_steps = 3 # Number of steps by which to shift the input data for echoing

#-------------------------
# RNN cell parameters
#-------------------------
n_x = 1 # Number of input features
n_a = 4 # Number of activation state features
n_y = 2 # Number of output features
T_x = 15 # Number of time steps

#------------------------------------------------------------------
# Generate input and output data in the right format for the problem
# The input data consists of a single long sequence of 0s and 1s. The
# expected output data echoes the same sequence but shifted by 'echo_steps'
#
# Total number of rows = 'm' * num_batches
# Each row has values in 'T_x' time-steps
# Each value in the input data has 'n_x' features
#------------------------------------------------------------------
def generate_data (n_x, T_x, m, num_batches, echo_steps):
  seq_length = num_batches * m * T_x
  
  # Populate 1D array randomly with values either '0' or '1', each with a
  # probability of 50%
  x = np.array(np.random.choice(2, seq_length, p=[0.5, 0.5]))
    
  # Shift values of 'x' rightwards by 'echo_steps' steps, and set the
  # newly shifted-in values to 0. This is the label (ie. expected output)
  y = np.roll(x, echo_steps)
  y[0:echo_steps] = 0

  # Reshape 'x' to 3D with 'n_x' columns and 'T_x' depth. Number of rows will be
  # 'm' * num_batches
  x = x.reshape((-1, n_x, T_x))  # The first index changing slowest, subseries as rows
  
  # Reshape 'y' to 2D
  y = y.reshape((-1, T_x))
  
  return (x, y)

#------------------------------------------------------------------
# RNN class
#------------------------------------------------------------------
class RNN(object):
    def __init__(self, m, n_a, n_x, n_y, T_x):
      self.m = m
      self.n_a = n_a
      self.n_x = n_x
      self.n_y = n_y
      self.T_x = T_x

      # Initialise the weights and biases as Variables to be learned
      # Note the shapes of each of these
      self.W_aa = tf.Variable(np.random.rand(n_a, n_a), dtype=tf.float32)
      self.b_a = tf.Variable(np.zeros((1, n_a)), dtype=tf.float32)
  
      self.W_ax = tf.Variable(np.random.rand(n_x, n_a), dtype=tf.float32)
  
      self.W_ay = tf.Variable(np.random.rand(n_a, n_y),dtype=tf.float32)
      self.b_y = tf.Variable(np.zeros((1, n_y)), dtype=tf.float32)
  
      # Initialise the X and Y as placeholders for feeding the input and label data
      self.batchX = tf.placeholder(tf.float32, [m, n_x, T_x])
      self.batchY = tf.placeholder(tf.int32, [m, T_x])

      # Initialise the activation state
      self.a_init = tf.placeholder(tf.float32, [m, n_a])

    #------------------------------------------------------------------
    # A single RNN cell
    #------------------------------------------------------------------
    def _cell (self, a_prev, X_t):
      
      a_next = tf.tanh(tf.matmul(X_t, self.W_ax) + tf.matmul(a_prev, self.W_aa) + self.b_a)
      logit = tf.matmul(a_next, self.W_ay) + self.b_y
      y_pred = tf.nn.softmax(logit)
      
      return (a_next, logit, y_pred)
      
    #------------------------------------------------------------------
    # A complete forward-pass by unrolling the RNN cell by the required
    # number of time-steps
    #------------------------------------------------------------------
    def _forward (self):
      # Unstack columns - split the batch data into adjacent time-steps
      # Each of the two variables below is a list that represents a time-series with multiple entries at each step.
      X_list = tf.unstack(self.batchX, axis=2) # Axis 2 is the time-step (ie. T_x) axis
      Y_list = tf.unstack(self.batchY, axis=1) # Axis 1 is the time-step (ie. T_x) axis
      
      # Initialise the activation state
      a_prev = self.a_init
      logit_list = []

      # Loop through each time-step
      for X_t in X_list:
        a_next, logit, y_pred = self._cell (a_prev, X_t)
        a_prev = a_next
        
        # List of logits from each step
        logit_list.append (logit)
          
      # logits is of shape [batch_size, num_classes] and labels of shape [batch_size]
      losses = [tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=labels) for logits, labels in zip(logit_list, Y_list)]
      self.total_loss = tf.reduce_mean(losses)
      self.train_step = tf.train.AdagradOptimizer(0.3).minimize(self.total_loss)

    #------------------------------------
    # Build the RNN
    #------------------------------------
    def build (self):
      self._forward ()
    
    #------------------------------------
    # Train the RNN
    #------------------------------------
    def train(self, x, y, num_epochs):
      with tf.Session() as sess:
        tf.global_variables_initializer().run()
        print('Initialized')
        loss_list = []
        
        # Loop through each epoch
        for epoch_idx in range(num_epochs):
          a_0 = np.zeros((self.m, self.n_a))

          print()

          # Derive the batch count from the data instead of a global variable
          for batch_idx in range(x.shape[0] // self.m):

            # Select 'm' rows from the data at a time
            start_idx = batch_idx * self.m
            end_idx = start_idx + self.m

            batchX = x[start_idx:end_idx, :, :]
            batchY = y[start_idx:end_idx, :]
            #print ('Epoch ', epoch_idx, ', Batch', batch_idx)

            _total_loss, _train_step = sess.run(
                [self.total_loss, self.train_step],
                feed_dict={
                    self.batchX:batchX,
                    self.batchY:batchY,
                    self.a_init:a_0
                })
            
            loss_list.append(_total_loss)

            if batch_idx % 25 == 0:
                print('Epoch ', epoch_idx, 'Batch', batch_idx, "Loss", _total_loss)
                
model = RNN (m, n_a, n_x, n_y, T_x)
model.build()

# NB: Alternatively, we can generate the data before building the RNN, then
# pass 'x' and 'y' to the RNN constructor. The constructor can check the shapes
# of 'x' and 'y' and create the RNN accordingly
x,y = generate_data(n_x, T_x, m, num_batches, echo_steps)
model.train(x, y, num_epochs)
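
To check the shapes flowing through `_cell` and `_forward`, the unrolled forward pass can be traced in plain NumPy. This is a sketch under small, made-up dimensions with random weights, not a replacement for the TensorFlow graph above; the `softmax` helper is defined here for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n_x, n_a, n_y, T_x = 3, 1, 4, 2, 5  # illustrative sizes

# Weights and biases with the same shapes as in the RNN class
W_ax = rng.random((n_x, n_a))
W_aa = rng.random((n_a, n_a))
b_a = np.zeros((1, n_a))
W_ay = rng.random((n_a, n_y))
b_y = np.zeros((1, n_y))

# Random 0/1 input batch of shape (m, n_x, T_x)
batchX = rng.integers(0, 2, size=(m, n_x, T_x)).astype(np.float32)

def softmax(z):
    """Row-wise softmax with the usual max-subtraction for stability."""
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

a_prev = np.zeros((m, n_a))  # Initial activation state
y_preds = []
for t in range(T_x):
    X_t = batchX[:, :, t]                               # (m, n_x)
    a_next = np.tanh(X_t @ W_ax + a_prev @ W_aa + b_a)  # (m, n_a)
    y_preds.append(softmax(a_next @ W_ay + b_y))        # (m, n_y)
    a_prev = a_next

y_preds = np.stack(y_preds, axis=0)  # (T_x, m, n_y)
print(y_preds.shape)
```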

### Modify Example to use static_rnn()

### Upgrade the above example to use dynamic_rnn(). Use it for a time-series prediction - Geron pg 643

### Optionally experiment with Variable Input Length - Geron pg 635

### Optionally use a LSTM

### Use Keras with deep multi-layered LSTM with dropout for a real-world application