# RNN (Recurrent Neural Networks)
### RNNs are a type of neural network designed for sequential data, especially data in which the order of the points plays an important role.
### A prominent example is time series, where the value at a future time step has to be predicted from the previous data. Another is automated text generation, where the model learns from the sequence of characters or words. RNNs can also be applied to images, since spatial data can be read as a sequence (for example row by row), but CNNs are usually preferred because they capture spatial structure more directly.
### The input to an RNN at each time step contains two things:
#### 1) The current input values (vector)
#### 2) The state vector, an encoded memory of what was learned from the previous time steps (the initial state is set to zeros)
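The network combines these two inputs by concatenating them column-wise and multiplying the result by a single weight matrix (step 6 below). The NumPy sketch that follows illustrates this, using illustrative sizes and hypothetical names (`x_t`, `h_prev`, `W`, `b`). It shows that the concatenation is equivalent to giving the input and the previous state their own weight matrices and adding the results:

```python
import numpy as np

rng = np.random.RandomState(0)
batch_size, n_features, state_size = 5, 1, 4  # illustrative sizes (assumption)

x_t = rng.standard_normal((batch_size, n_features))             # current input
h_prev = rng.standard_normal((batch_size, state_size))          # previous state (zeros at t = 0)
W = rng.standard_normal((n_features + state_size, state_size))  # one weight matrix for both
b = np.zeros((1, state_size))

# Concatenate input and state by columns, then a single matrix multiply
h_concat = np.tanh(np.concatenate([x_t, h_prev], axis=1) @ W + b)

# Same computation with W split into an input part and a recurrent part
W_x, W_h = W[:n_features], W[n_features:]
h_split = np.tanh(x_t @ W_x + h_prev @ W_h + b)

print(h_concat.shape)                  # (5, 4)
print(np.allclose(h_concat, h_split))  # True
```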

# Steps:
### 1) Gather data
### 2) Shape the input data as batch_size * features * time steps
### 3) Arrange it by time step, i.e. one batch_size * features matrix per time step
### 4) Create a state matrix of dimensions batch_size * state_size
### 5) Set the initial state's values to 0
### 6) For each time step, concatenate the current input and the current state by columns, mixing the current input with the past learnings
### 7) Repeat step 6 in a loop: multiply by the first-layer weights, add the bias, and squash the result with a nonlinearity (tanh here) to obtain the next state
### 8) Inside the loop of step 7, save each new state in a list and replace the current state with it
### 9) Once all time steps have been processed, the list holds one hidden state per time step
### 10) Pass each saved state through the output layer and apply softmax to each one separately to obtain prediction probabilities (for a single binary output unit, a sigmoid plays the same role)
### 11) Calculate a separate loss for each time step
### 12) Backpropagate the average of these losses and update the weights with an optimizer such as Adagrad or gradient descent
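Steps 3 to 11 can be sketched in NumPy as a forward pass with one loss per time step. This is a minimal sketch under stated assumptions: two output classes with a softmax (bit 0 and bit 1), small random weights, and the input delayed by 3 steps as the target. `rnn_forward_loss` is a hypothetical helper name. Backpropagation (step 12) is left out because TensorFlow computes the gradients in the cells below.

```python
import numpy as np

def softmax(z):
    # subtract the row max for numerical stability
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def rnn_forward_loss(x, y, W, b, W2, b2, state_size):
    """x, y: (batch_size, time_steps); y holds integer class labels."""
    batch_size, time_steps = x.shape
    state = np.zeros((batch_size, state_size))         # steps 4-5: zero initial state
    states = []
    for t in range(time_steps):                         # step 3: one column per time step
        inp = x[:, t:t + 1]                             # (batch_size, 1)
        concat = np.concatenate([inp, state], axis=1)   # step 6
        state = np.tanh(concat @ W + b)                 # step 7
        states.append(state)                            # step 8
    losses = []
    for t, h in enumerate(states):                      # steps 10-11
        probs = softmax(h @ W2 + b2)
        # cross entropy of the true class at this time step
        losses.append(-np.log(probs[np.arange(batch_size), y[:, t]]))
    return np.mean(losses), states                      # average loss, used in step 12

rng = np.random.RandomState(1)
batch_size, time_steps, state_size, num_classes = 5, 15, 4, 2
x = rng.randint(0, 2, size=(batch_size, time_steps))
y = np.roll(x, 3, axis=1)                               # target: input echoed 3 steps later
W = rng.standard_normal((state_size + 1, state_size)) * 0.1
b = np.zeros((1, state_size))
W2 = rng.standard_normal((state_size, num_classes)) * 0.1
b2 = np.zeros((1, num_classes))

loss, states = rnn_forward_loss(x, y, W, b, W2, b2, state_size)
print(len(states), states[0].shape)  # 15 (5, 4)
print(loss)  # close to log(2) because small weights give near-uniform predictions
```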

In [2]:
#Step 0 : Load dependencies
from __future__ import print_function, division  # __future__ imports must come before any other import
from IPython.display import Image
import numpy as np
import tensorflow as tf  # this notebook uses the TensorFlow 1.x graph API (tf.placeholder, tf.Session)
import matplotlib.pyplot as plt

# Basic flowcharts showing RNN structure

In [3]:
Image(url= "https://cdn-images-1.medium.com/max/1600/1*UkI9za9zTR-HL8uM15Wmzw.png")

In [4]:
Image(url="http://d3kbpzbmcynnmx.cloudfront.net/wp-content/uploads/2015/09/rnn.jpg")

In [5]:
Image(url="http://karpathy.github.io/assets/rnn/charseq.jpeg")   #very intuitive with example

In [6]:
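#Step - 1 : hyperparameters
num_epochs = 100  # number of epochs; a fresh random dataset is generated for each epoch
total_series_length = 50000  # length of the binary input series
truncated_backprop_length = 15  # time steps unrolled (and backpropagated through) per training batch
state_size = 4  # number of neurons in the hidden (state) layer
num_classes = 1  # number of output units: a single unit for the binary (0/1) target
echo_step = 3  # the target is the input delayed by this many steps
batch_size = 5  # number of parallel sub-sequences per batch
num_batches = total_series_length//batch_size//truncated_backprop_length  # training windows per epoch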
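With `num_classes = 1` the output layer produces a single logit per example. A softmax over a single value is always exactly 1, so the softmax cross entropy is 0 whatever the weights are, and nothing can be learned. A single output unit is normally paired with a sigmoid instead, which equals the class-1 probability of a two-class softmax whose logits are `[0, z]`. The NumPy check below illustrates both points; the logit values are arbitrary.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([[-2.0], [0.5], [3.0]])  # one logit per example (num_classes = 1)

# Softmax over a single class: every probability is 1, so -log(p) is 0
p_single = softmax(z)
print(p_single.ravel())           # [1. 1. 1.]
print(-np.log(p_single).ravel())  # zero loss for every example

# sigmoid(z) equals the class-1 probability of a 2-class softmax over [0, z]
two_class = softmax(np.concatenate([np.zeros_like(z), z], axis=1))
print(np.allclose(sigmoid(z), two_class[:, 1:]))  # True
```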

In [7]:
#Step - 2 : Collect data(here:generate data)
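def generateData():
    # total_series_length random bits, each 0 or 1 with probability 0.5
    x = np.random.choice(2, total_series_length, p=[0.5, 0.5])
    y = np.roll(x, echo_step)  # shift echo_step steps to the right: y[t] = x[t - echo_step]
    y[0:echo_step] = 0         # these values wrapped around from the end and have no real echo
    # split the series into batch_size rows of equal length
    x = x.reshape([batch_size, -1])
    y = y.reshape([batch_size, -1])

    return (x, y)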

data = generateData()
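To see what the target looks like, here is `np.roll` on a short hand-made series with the same `echo_step`. The label at time t is the input at time t - 3, so the network must remember the last three inputs. Because `np.roll` is circular, the first `echo_step` labels wrap around from the end of the series and carry no learnable signal. After reshaping, each row is a contiguous slice of the series, so a row's first labels echo the end of the previous row. The 8-element series and the 2-row split are illustrative assumptions.

```python
import numpy as np

echo_step = 3
x = np.array([1, 0, 0, 1, 1, 0, 1, 0])
y = np.roll(x, echo_step)  # y[t] = x[t - echo_step]; the first echo_step values wrap around

print(y)  # [0 1 0 1 0 0 1 1]
print(np.array_equal(y[echo_step:], x[:-echo_step]))  # True: the echo relation

# Reshaping into batch rows keeps each row a contiguous slice of the series,
# so the first echo_step labels of row 1 echo the last inputs of row 0
X = x.reshape([2, -1])
Y = y.reshape([2, -1])
print(np.array_equal(Y[1, :echo_step], X[0, -echo_step:]))  # True
```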

In [8]:
data[0].shape #x shape

(5, 10000)

In [9]:
data[1].shape #y shape

(5, 10000)
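The shapes follow from the hyperparameters: 50,000 samples split into 5 rows gives 10,000 time steps per row, and each training batch takes a window of 15 of them. The arithmetic below (same values as the hyperparameter cell) shows that 666 windows fit per epoch and that the last 10 time steps of every row are never fed to the network.

```python
total_series_length = 50000
batch_size = 5
truncated_backprop_length = 15

cols_per_row = total_series_length // batch_size                      # 10000, matches data[0].shape[1]
num_batches = cols_per_row // truncated_backprop_length               # windows per epoch
unused_cols = cols_per_row - num_batches * truncated_backprop_length  # trailing steps never used

print(cols_per_row, num_batches, unused_cols)  # 10000 666 10
```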

In [10]:
#Step- 3 : Build the Model
batchX_placeholder = tf.placeholder(tf.float32,[batch_size,truncated_backprop_length])
batchY_placeholder = tf.placeholder(tf.int32,[batch_size,truncated_backprop_length])
#The RNN state is also supplied through a placeholder. It is fed with the
#final state of the previous run, so the memory carries over between windows;
#this state placeholder is the key
init_state = tf.placeholder(tf.float32,[batch_size,state_size])

In [11]:
#declare weights and biases
#here: 3-layer network (input, one recurrent hidden layer, output)
#w maps the concatenated [input, state] (1 + state_size columns) to the new state
w = tf.Variable(np.random.rand(state_size+1,state_size),dtype=tf.float32)
b = tf.Variable(np.zeros((1,state_size)),dtype=tf.float32)

#w2 maps the hidden state to the output logits
w2 = tf.Variable(np.random.rand(state_size,num_classes),dtype=tf.float32)
b2 = tf.Variable(np.zeros((1,num_classes)),dtype=tf.float32)

# Let's start building the RNN model

In [12]:
# split each batch into its time steps: unstacking along axis 1 returns a list of
# truncated_backprop_length tensors, one column (shape [batch_size]) per time step
input_series = tf.unstack(batchX_placeholder,axis = 1)
label_series = tf.unstack(batchY_placeholder,axis = 1)
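In NumPy terms, `tf.unstack(..., axis=1)` turns a `[batch_size, truncated_backprop_length]` array into a list of its columns. The sketch below uses a hypothetical `batchX` filled with consecutive integers so the columns are easy to recognise; `list(batchX.T)` would give the same result.

```python
import numpy as np

batch_size, truncated_backprop_length = 5, 15
batchX = np.arange(batch_size * truncated_backprop_length).reshape(batch_size, truncated_backprop_length)

# NumPy analogue of tf.unstack(batchX, axis=1): one array per column (time step)
input_series = [batchX[:, t] for t in range(batchX.shape[1])]

print(len(input_series), input_series[0].shape)  # 15 (5,)
print(input_series[0].tolist())                  # [0, 15, 30, 45, 60]: first time step of every row
```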

In [13]:
#forward pass
#start from the state fed through the placeholder
current_state = init_state
#list of hidden states, one per time step
states_series = []


#for each time step:
#combine the input with the current state, compute the new state,
#and store it
for current_input in input_series:
    current_input = tf.reshape(current_input,[batch_size,1])  #column vector: [batch_size, 1]
    input_and_state_concatenated = tf.concat(axis=1,values=[current_input,current_state]) 
    #concatenating by columns gives shape [batch_size, 1 + state_size]
    
    #multiply by the weights, add the bias, and squash with tanh
    #(tanh keeps the state in (-1, 1); it is not a probability)
    next_state = tf.nn.tanh(tf.matmul(input_and_state_concatenated,w)+b)
    states_series.append(next_state)
    current_state = next_state

In [14]:
Image(url= "https://cdn-images-1.medium.com/max/1600/1*fdwNNJ5UOE3Sx0R_Cyfmyg.png")

You may wonder what the variable name truncated_backprop_length is supposed to mean. When an RNN is trained, it is treated as a deep neural network that reuses the same weights in every layer (one layer per time step). Unrolling these layers all the way back to the beginning of time would be too computationally expensive, so they are truncated at a limited number of time steps. In the schematic above, the error is backpropagated three steps within our batch.
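Truncation only limits how far the *gradient* flows back; the forward computation is unchanged as long as each window starts from the final state of the previous one, which is what feeding `init_state` achieves. The NumPy sketch below (hypothetical `run_rnn` helper, random weights, a 45-step sequence split into windows of 15) checks that running window by window with the carried state produces the same hidden states as one pass over the whole sequence.

```python
import numpy as np

def run_rnn(x, state, W, b):
    """Unroll a tanh RNN over the columns of x, starting from `state`; return all states."""
    states = []
    for t in range(x.shape[1]):
        concat = np.concatenate([x[:, t:t + 1], state], axis=1)
        state = np.tanh(concat @ W + b)
        states.append(state)
    return states

rng = np.random.RandomState(2)
batch_size, state_size, length, window = 5, 4, 45, 15
x = rng.randint(0, 2, size=(batch_size, length)).astype(float)
W = rng.standard_normal((state_size + 1, state_size))
b = np.zeros((1, state_size))

# One pass over the whole sequence
full = run_rnn(x, np.zeros((batch_size, state_size)), W, b)

# Windows of `window` steps, feeding each window's final state into the next
state = np.zeros((batch_size, state_size))
chunked = []
for start in range(0, length, window):
    window_states = run_rnn(x[:, start:start + window], state, W, b)
    chunked.extend(window_states)
    state = window_states[-1]  # plays the role of _current_state fed to init_state

print(np.allclose(np.stack(full), np.stack(chunked)))  # True
```

In the TensorFlow graph the carried state arrives through a placeholder, so no gradient flows past the start of a window; that is the truncation.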

In [15]:
#calculate loss and minimize it

#second part of the forward pass:
#logits are the unnormalized scores produced by the output layer
logit_series = [tf.matmul(x,w2)+b2 for x in states_series]

#with num_classes = 1 there is a single logit per example. A softmax over one
#class always outputs 1, so its cross entropy is always 0 and nothing is learned.
#For a single output unit, use a sigmoid (equivalent to a 2-class softmax).
predictions_series = [tf.sigmoid(logits) for logits in logit_series]

#labels as float column vectors, matching the logits' shape [batch_size, 1]
label_series = [tf.cast(tf.reshape(label,[batch_size,1]),tf.float32) for label in label_series]

#binary cross entropy for every example and time step,
#same shape as the labels
losses = [tf.nn.sigmoid_cross_entropy_with_logits(logits=logits,labels=labels) for logits,labels in zip(logit_series,label_series)]

#average over all examples and time steps: one scalar
total_loss = tf.reduce_mean(losses)

#minimize with Adagrad (learning rate 0.3) rather than plain SGD.
#SGD uses one global learning rate, so a feature that shows up rarely gets
#tiny updates compared with frequent ones, even when it is informative.
#Adagrad keeps a per-weight learning rate: it divides each step by the square
#root of that weight's accumulated squared gradients, so weights that receive
#large or frequent gradients get a smaller effective learning rate, while
#weights with small or infrequent updates get a larger one.
#paper: http://seed.ucsd.edu/mediawiki/images/6/6a/Adagrad.pdf
train_step = tf.train.AdagradOptimizer(0.3).minimize(total_loss)
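The Adagrad comment above can be made concrete with a NumPy sketch of the per-weight update rule. This is an illustration, not TensorFlow's implementation: `adagrad_step` is a hypothetical helper, the accumulator starts at 0.1 (the default `initial_accumulator_value` of TF1's `AdagradOptimizer`), and the gradients are made up to mimic one frequent and one rare feature.

```python
import numpy as np

def adagrad_step(w, grad, accum, lr=0.3):
    """One Adagrad update: each weight's step is scaled by its own gradient history."""
    accum = accum + grad ** 2             # running sum of squared gradients, per weight
    w = w - lr * grad / np.sqrt(accum)    # large past gradients -> smaller effective step
    return w, accum

w = np.zeros(2)
accum = np.full(2, 0.1)  # starting accumulator value (assumed TF1 default)

# Weight 0 gets a gradient every step; weight 1 only on the last step (a "sparse" feature)
for step in range(10):
    grad = np.array([1.0, 1.0 if step == 9 else 0.0])
    w, accum = adagrad_step(w, grad, accum)

effective_lr = 0.3 / np.sqrt(accum)
print(effective_lr)  # the rarely updated weight keeps a much larger effective learning rate
```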

In [16]:
%matplotlib inline
#Step 4 : Train the network
with tf.Session() as sess:
    #initialize all variables before training
    sess.run(tf.global_variables_initializer())
    #collect the loss of every batch to plot it afterwards
    loss_list = []

    for epoch_idx in range(num_epochs):
        #generate new data every epoch; each epoch runs over num_batches windows
        x,y = generateData()
        #start every epoch from an all-zero hidden state
        _current_state = np.zeros((batch_size, state_size))

        print("New data, epoch", epoch_idx)
        #each batch
        for batch_idx in range(num_batches):
            #start and end of this window of time steps.
            #Since the weights recur at every time step, the network is not
            #unrolled to the beginning of time (too computationally expensive);
            #it is truncated at a limited number of time steps
            start_idx = batch_idx * truncated_backprop_length
            end_idx = start_idx + truncated_backprop_length

            batchX = x[:,start_idx:end_idx]
            batchY = y[:,start_idx:end_idx]
            
            #run the graph on this window, feeding the data and the state
            #carried over from the previous window
            _total_loss, _train_step, _current_state, _predictions_series = sess.run(
                [total_loss, train_step, current_state, predictions_series],
                feed_dict={
                    batchX_placeholder:batchX,
                    batchY_placeholder:batchY,
                    init_state:_current_state
                })

            loss_list.append(_total_loss)

            if batch_idx%100 == 0:
                print("Step",batch_idx, "Loss", _total_loss)

    #plot the loss of every batch
    plt.plot(loss_list)
    plt.xlabel("batch")
    plt.ylabel("loss")
    plt.show()
