# Recurrent Neural Networks 

Recurrent neural networks (RNNs) are deep learning models with a feedback mechanism: the output of a layer at one time step is fed back into the same layer together with the next input.
 
Networks of this type help maintain context across sequential data such as stock prices, music, or weather. At each step, the model takes an input and the current state of the network, and produces an output and a new state, which is fed back into the network at the next step.
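The recurrence described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the model used later in this notebook: the function name rnn_step, the weight names W_x, W_h and b, the sizes, and the random weights are all assumptions made for the example, and the state is simply reused as the output.

```python
import numpy as np

def rnn_step(x, h_prev, W_x, W_h, b):
    """One step of a vanilla RNN: mix the current input with the previous state."""
    return np.tanh(x @ W_x + h_prev @ W_h + b)

rng = np.random.default_rng(0)            # fixed seed, illustrative weights only
input_size, hidden_size = 3, 4            # hypothetical sizes
W_x = rng.normal(scale=0.5, size=(input_size, hidden_size))
W_h = rng.normal(scale=0.5, size=(hidden_size, hidden_size))
b = np.zeros(hidden_size)

sequence = rng.normal(size=(5, input_size))   # 5 time steps of made-up data
h = np.zeros(hidden_size)                     # initial state
states = []
for x_t in sequence:
    h = rnn_step(x_t, h, W_x, W_h, b)         # the new state is fed back in
    states.append(h)
states = np.stack(states)
print(states.shape)                           # one state vector per time step
```

The same weights are applied at every time step; only the state h changes as the sequence is consumed.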

<img src="rnn.jpeg" height="450" width="450">

Recurrent networks are very sensitive to changes in their parameters. Because a recurrent network unrolled over many time steps is effectively very deep, training can run into the exploding gradient or vanishing gradient problem. For intuition, say we have a neural network of 500 layers and each layer multiplies the value passing through it by 1.01. Then (1.01)^500 = 144.77: we sent 1.01 in at one end and got about 144.77 out at the other. That is the exploding gradient problem. If each layer multiplies by 0.99 instead, then (0.99)^500 = 0.00657 and the value has almost completely diminished. That is the vanishing gradient problem.
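The arithmetic in this intuition is easy to check. The snippet below is only a sketch of the compounding effect, with a hypothetical helper named compound; real gradients are products of Jacobian matrices rather than a single fixed scalar.

```python
# Illustrative arithmetic only: a fixed per-layer factor applied across many layers.
layers = 500

def compound(factor, depth):
    """Multiply a starting value of 1.0 by `factor` once per layer."""
    value = 1.0
    for _ in range(depth):
        value *= factor
    return value

grow = compound(1.01, layers)     # factor slightly above 1: the value blows up
shrink = compound(0.99, layers)   # factor slightly below 1: the value fades away
print(f"{grow:.2f}  {shrink:.5f}")
```

A factor only 1% away from 1 is enough to change the result by several orders of magnitude once it is applied 500 times.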
    
To address these problems, a method was proposed for remembering important data and forgetting useless data, called Long Short-Term Memory, or LSTM.

# Long Short-Term Memory

LSTMs are more complex models that remember important data for a long time and forget useless data, which preserves context better. An LSTM cell works through several logistic gates that are responsible for maintaining the data: one for inputting data, one for outputting data, and one for keeping or forgetting data, depending on what the network needs.
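To make the role of the three gates concrete, here is a minimal NumPy sketch of one LSTM step using the standard gate equations. It is an illustration under assumptions, not TensorFlow's implementation: the names lstm_step, W and b, the gate ordering inside W, and the random weights are all chosen for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step. W maps [x, h_prev] to the four gate pre-activations."""
    z = np.concatenate([x, h_prev], axis=-1) @ W + b
    i, f, o, g = np.split(z, 4, axis=-1)           # assumed ordering, illustrative
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)   # input, forget, output gates
    g = np.tanh(g)                                 # candidate values to store
    c = f * c_prev + i * g      # forget part of the old state, add new information
    h = o * np.tanh(c)          # expose a gated view of the state as the output
    return h, c

rng = np.random.default_rng(0)                     # fixed seed, made-up weights
input_size, units = 6, 4
W = rng.normal(scale=0.1, size=(input_size + units, 4 * units))
b = np.zeros(4 * units)

x = np.array([[1, 2, 3, 1, 2, 3]], dtype=np.float64)   # batch of 1, 6 features
h0 = np.zeros((1, units))
c0 = np.zeros((1, units))
h1, c1 = lstm_step(x, h0, c0, W, b)
print(h1.shape, c1.shape)
```

Because each gate is a sigmoid with values between 0 and 1, it acts as a soft switch: the forget gate scales the old cell state, the input gate scales what is written, and the output gate scales what is read out.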


<img src = "lstm.png" height="400" width="600">

## Simple LSTM using TensorFlow

In [24]:
import tensorflow as tf
import numpy as np

In [25]:
tf.reset_default_graph()
sess = tf.Session()

Now we will create a network with only one LSTM cell. We have to pass two elements to the LSTM: the previous output and the previous state (h and c). Therefore we initialize a state vector, state. Here, state is a tuple with 2 elements, each of size [1 x 4]: one passes the previous output to the next time step and the other passes the previous state to the next time step.

In [26]:
LSTM_CELL_SIZE = 4 # output dimension

lstm_cell = tf.nn.rnn_cell.LSTMCell(LSTM_CELL_SIZE, state_is_tuple = True, name="basic_lstm_cell")
state = (tf.zeros([1,LSTM_CELL_SIZE]),) * 2
state

(<tf.Tensor 'zeros:0' shape=(1, 4) dtype=float32>,
 <tf.Tensor 'zeros:0' shape=(1, 4) dtype=float32>)

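Now for the input:

Here we take a batch size of 1 and an input with 6 features. Calling the cell directly processes a single time step, so the 6 values are read as one input vector rather than as 6 separate steps.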

In [27]:
input_data = tf.constant([[1,2,3,1,2,3]],dtype=tf.float32)
print (sess.run(input_data))

[[1. 2. 3. 1. 2. 3.]]


Pass the input to the LSTM 

In [28]:
with tf.variable_scope("LSTM_sample1"):
    output, state_new = lstm_cell(input_data, state)
    
sess.run(tf.global_variables_initializer())
print (sess.run(state_new))

LSTMStateTuple(c=array([[-0.4775696 ,  0.6253809 ,  0.28827164,  0.87296957]],
      dtype=float32), h=array([[-0.04009858,  0.28063062,  0.07246678,  0.43371633]],
      dtype=float32))


In [29]:
print(sess.run(output))

[[-0.04009858  0.28063062  0.07246678  0.43371633]]
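Note that the printed output is identical to the h component of state_new: for a single LSTM cell, the output of a step is its new hidden state. The step above maps 6 input features to 4 outputs, and the sketch below estimates the size of the weights involved. It assumes a plain LSTM cell with no peephole connections or projection layer, where the input and previous output are concatenated and mapped to four gate blocks; lstm_param_shapes is a hypothetical helper, not a TensorFlow function.

```python
def lstm_param_shapes(input_size, units):
    """Kernel and bias shapes of a basic LSTM cell (assumes no peepholes or projection)."""
    kernel = (input_size + units, 4 * units)   # [x, h] mapped to 4 gate blocks
    bias = (4 * units,)
    return kernel, bias

kernel, bias = lstm_param_shapes(input_size=6, units=4)
total = kernel[0] * kernel[1] + bias[0]
print(kernel, bias, total)
```

Under these assumptions the cell in this section would hold a (10, 16) kernel and a (16,) bias, 176 trainable values in total.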


## Stacked LSTM

Let's build a two-layer LSTM.

In [30]:
sess = tf.Session()

In [31]:
input_dimension = 6

Stacked LSTM cell:

In [32]:
cells = []

First layer

In [33]:
LSTM_CELL_SIZE_1 = 4 # 4 hidden nodes
cell_1 = tf.nn.rnn_cell.LSTMCell(LSTM_CELL_SIZE_1)
cells.append(cell_1)

Second layer

In [34]:
LSTM_CELL_SIZE_2 = 5 # 5 hidden nodes
cell_2 = tf.nn.rnn_cell.LSTMCell(LSTM_CELL_SIZE_2)
cells.append(cell_2)

Multi-layer LSTM: MultiRNNCell takes a list of single-layer LSTM cells and combines them into one stacked LSTM model.

In [35]:
stacked_lstm = tf.contrib.rnn.MultiRNNCell(cells)

Creating the RNN from stacked_lstm

In [36]:
#Batch size x time steps x features
data = tf.placeholder(tf.float32, [None, None, input_dimension])
output, state = tf.nn.dynamic_rnn(stacked_lstm, data, dtype = tf.float32)

Say the input sequence length is 3 and the dimensionality of the inputs is 6. The input should be a tensor of shape [batch_size, max_time, dimension]; here it is (2, 3, 6).

Input: [batch_size x time_steps x features]

In [38]:
input_data = [[[1,2,3,4,5,6], [1,2,3,5,6,4],[1,2,1,2,5,6]], [[1,2,3,4,1,2], [1,1,2,2,3,4],[4,5,6,1,2,3]]]

Sending the input to the network

In [40]:
sess.run(tf.global_variables_initializer())
sess.run(output, feed_dict={data: input_data})

array([[[-0.033536  ,  0.01661026, -0.05789775, -0.04315641,
          0.0637572 ],
        [-0.05706692,  0.09019572, -0.10040656, -0.04486463,
          0.16141193],
        [-0.09696683,  0.1301327 , -0.14138485, -0.07254476,
          0.19495438]],

       [[-0.05439714, -0.02239576, -0.04589338, -0.06350569,
          0.02007721],
        [-0.10373282, -0.03095715, -0.09698373, -0.10114256,
          0.05825395],
        [-0.155613  , -0.0603398 , -0.13643041, -0.14622346,
          0.04277685]]], dtype=float32)

The output has shape (2, 3, 5): 2 sequences in the batch, 3 time steps per sequence, and 5 output features per step, which is the size of the second LSTM layer.
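The way the shapes change from layer to layer can be traced with a small NumPy sketch of a two-layer stack. This is an illustration under assumptions, not what dynamic_rnn does internally: lstm_layer is a hypothetical helper with randomly initialized weights, so the values will not match the TensorFlow output above; only the shapes are comparable.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_layer(inputs, units, rng):
    """Run one LSTM layer over inputs of shape (batch, time, features)."""
    batch, time_steps, features = inputs.shape
    W = rng.normal(scale=0.1, size=(features + units, 4 * units))  # made-up weights
    b = np.zeros(4 * units)
    h = np.zeros((batch, units))
    c = np.zeros((batch, units))
    outputs = []
    for t in range(time_steps):
        z = np.concatenate([inputs[:, t, :], h], axis=-1) @ W + b
        i, f, o, g = np.split(z, 4, axis=-1)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        outputs.append(h)
    return np.stack(outputs, axis=1)            # (batch, time, units)

rng = np.random.default_rng(0)
input_data = np.array(
    [[[1, 2, 3, 4, 5, 6], [1, 2, 3, 5, 6, 4], [1, 2, 1, 2, 5, 6]],
     [[1, 2, 3, 4, 1, 2], [1, 1, 2, 2, 3, 4], [4, 5, 6, 1, 2, 3]]],
    dtype=np.float64)

layer_1 = lstm_layer(input_data, 4, rng)   # first layer: 6 features -> 4
layer_2 = lstm_layer(layer_1, 5, rng)      # second layer: 4 features -> 5
print(input_data.shape, layer_1.shape, layer_2.shape)
```

The batch and time dimensions pass through unchanged; only the last dimension changes, from 6 to 4 after the first layer and from 4 to 5 after the second.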