## Basic RNNs in TensorFlow
- 1 Layer of 5 recurrent neurons
- Outputs of a layer of recurrent neurons for all instances in a mini-batch:
$$ \textbf{Y}_{(t)} = \phi \big( \textbf{X}_{(t)} \cdot \textbf{W}_x + \textbf{Y}_{(t-1)} \cdot \textbf{W}_y + b \big) $$
$$ = \phi \big( \big[ \textbf{X}_{(t)} \quad \textbf{Y}_{(t-1)} \big] \cdot \textbf{W} + b \big)$$

where $ \textbf{W} = {\textbf{W}_x\brack \textbf{W}_y}  $

#### Notations in vectorized form:
- $m$: number of instances in the mini-batch
- $n_{neurons}$: number of neurons
- $n_{inputs}$: number of input features
- $dim(x)$: a function that returns the shape (dimensions) of element $x$
- $\textbf{Y}_{(t)}$ is the layer output at time $t$ for each instance of the mini-batch:
$$ dim(\textbf{Y}_{(t)}) = m\times n_{neurons}$$ 
- $\textbf{X}_{(t)}$ is a matrix containing the inputs for all instances:
$$ dim(\textbf{X}_{(t)}) = m \times n_{inputs}$$
- $\textbf{W}_{x}$ is a matrix containing the connection weights for the inputs of the **current** time step:
$$ dim(\textbf{W}_{x}) = n_{inputs} \times n_{neurons} $$
- $\textbf{W}_{y}$ is a matrix containing the connection weights for the outputs of the **previous** time step:
$$ dim(\textbf{W}_{y}) = n_{neurons} \times n_{neurons} $$
- Weight matrices $\textbf{W}_x$ and $\textbf{W}_y$ are often concatenated into a single matrix $\textbf{W}$ of shape $(n_{inputs} + n_{neurons}) \times n_{neurons}$
- Bias term $b$ is a row vector of shape $1 \times n_{neurons}$
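The equivalence between the two forms of the equation can be checked numerically. The following is a minimal NumPy sketch, not part of the TensorFlow code in this notebook; the sizes and the random values are arbitrary assumptions chosen only to match the shapes listed above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not values required by the text)
m, n_inputs, n_neurons = 4, 3, 5

X_t = rng.normal(size=(m, n_inputs))           # inputs at time t
Y_prev = rng.normal(size=(m, n_neurons))       # layer outputs at time t-1
W_x = rng.normal(size=(n_inputs, n_neurons))   # weights for current inputs
W_y = rng.normal(size=(n_neurons, n_neurons))  # weights for previous outputs
b = np.zeros((1, n_neurons))                   # bias, broadcast over the batch

# Form 1: separate weight matrices
Y_separate = np.tanh(X_t @ W_x + Y_prev @ W_y + b)

# Form 2: inputs concatenated side by side, weights stacked vertically
XY = np.concatenate([X_t, Y_prev], axis=1)  # m x (n_inputs + n_neurons)
W = np.concatenate([W_x, W_y], axis=0)      # (n_inputs + n_neurons) x n_neurons
Y_concat = np.tanh(XY @ W + b)

print(Y_separate.shape)
print(np.allclose(Y_separate, Y_concat))
```

Note that $\textbf{Y}_{(t-1)}$ is used without a transpose: its shape $m \times n_{neurons}$ already lines up with $\textbf{W}_y$.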

#### TODO: Put image of network we are building here

In [89]:
import tensorflow as tf
from tensorflow.contrib.layers import fully_connected
import os
import numpy as np

tf.set_random_seed(1) # seed to obtain similar outputs
os.environ['CUDA_VISIBLE_DEVICES'] = '' # avoids using GPU for this session

### Single Recurrent Neuron
![alt txt](https://docs.google.com/drawings/d/e/2PACX-1vQXBLYvvI1dqAHdLA0hQdsP1PojmCfuSCMK2DXEL0uTvRUqvD1eYK8fsECcNCoekxCbgWJ-k7QF_1s4/pub?w=703&h=508)

In [133]:
# Implementation of RNN with Single Neuron
N_INPUTS = 4
N_NEURONS = 1

class SingleRNN(object):
    def __init__(self, n_inputs, n_neurons):
        self.X0 = tf.placeholder(tf.float32, [None, n_inputs])
        self.X1 = tf.placeholder(tf.float32, [None, n_inputs])

        self.Wx = tf.Variable(tf.random_normal(shape=[n_inputs, n_neurons], dtype=tf.float32))
        self.Wy = tf.Variable(tf.random_normal(shape=[n_neurons, n_neurons], dtype=tf.float32))
        b = tf.Variable(tf.zeros([1, n_neurons], dtype=tf.float32))

        self.Y0 = tf.tanh(tf.matmul(self.X0, self.Wx) + b)
        self.Y1 = tf.tanh(tf.matmul(self.Y0, self.Wy) + tf.matmul(self.X1, self.Wx) + b)

In [135]:
# Now we feed input at both time steps
# Generate mini-batch with 4 instances (i.e., each instance has an input sequence of exactly two inputs)
#                 instance1 instance2 instance3 instance4                   
X0_batch = np.array([[0,1,2,0], [3,4,5,0], [6,7,8,0], [9,0,1,0]]) # t = 0
X1_batch = np.array([[9,8,7,0], [0,0,0,0], [6,5,4,0], [3,2,1,0]]) # t = 1

model = SingleRNN(N_INPUTS, N_NEURONS)
with tf.Session() as sess:
    # initialize and run all variables so that we can use their values directly
    init = tf.global_variables_initializer()
    sess.run(init)
    Y0_val, Y1_val, Wx, Wy = sess.run([model.Y0, model.Y1, model.Wx, model.Wy], feed_dict={model.X0: X0_batch, model.X1: X1_batch})

In [136]:
print(Y0_val)

[[-0.55767643]
 [ 0.7142067 ]
 [ 0.98433769]
 [ 0.99762088]]


In [137]:
print(Wx)

[[ 0.45043385]
 [ 0.74536854]
 [-0.68741149]
 [ 0.21806668]]
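As a sanity check, the output at $t = 0$ can be recomputed by hand from the printed `Wx`. This is a small NumPy sketch that assumes the bias is still all zeros, which holds here because the variables are only initialized and never trained.

```python
import numpy as np

# Values copied from the printed Wx above
Wx = np.array([[0.45043385], [0.74536854], [-0.68741149], [0.21806668]])

# Same mini-batch that was fed at t = 0
X0_batch = np.array([[0, 1, 2, 0], [3, 4, 5, 0], [6, 7, 8, 0], [9, 0, 1, 0]])

# Bias is initialized with zeros and no training step has been run
b = np.zeros((1, 1))

# Y0 = tanh(X0 . Wx + b); there is no previous output at the first time step
Y0 = np.tanh(X0_batch @ Wx + b)
print(Y0)
```

The result should agree with the printed `Y0_val` up to float32 rounding.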


---

### Basic Layer of Recurrent Neurons
![alt txt](https://docs.google.com/drawings/d/e/2PACX-1vQov6BGg1fXOb7Bg5zenPh7R5j6VsZJh_D6JevQ_sm_fCxmXORxad3qLIFGG1FojzJig0qdcAQoGYoN/pub?w=643&h=404)

In [87]:
# RNN unrolled through two time steps
N_INPUTS = 3 # number of features in input
N_NEURONS = 5

class BasicRNN(object):
    def __init__(self, n_inputs, n_neurons):
        self.X0 = tf.placeholder(tf.float32, [None, n_inputs])
        self.X1 = tf.placeholder(tf.float32, [None, n_inputs])

        Wx = tf.Variable(tf.random_normal(shape=[n_inputs, n_neurons], dtype=tf.float32))
        Wy = tf.Variable(tf.random_normal(shape=[n_neurons, n_neurons], dtype=tf.float32))
        b = tf.Variable(tf.zeros([1, n_neurons], dtype=tf.float32))

        self.Y0 = tf.tanh(tf.matmul(self.X0, Wx) + b)
        self.Y1 = tf.tanh(tf.matmul(self.Y0, Wy) + tf.matmul(self.X1, Wx) + b)

In [88]:
# Now we feed input at both time steps
# Generate mini-batch with 4 instances (i.e., each instance has an input sequence of exactly two inputs)
#                 instance1 instance2 instance3 instance4                   
X0_batch = np.array([[0,1,2], [3,4,5], [6,7,8], [9,0,1]]) # t = 0
X1_batch = np.array([[9,8,7], [0,0,0], [6,5,4], [3,2,1]]) # t = 1

model = BasicRNN(N_INPUTS, N_NEURONS)
with tf.Session() as sess:
    # initialize and run all variables so that we can use their values directly
    init = tf.global_variables_initializer()
    sess.run(init)
    Y0_val, Y1_val = sess.run([model.Y0, model.Y1], feed_dict={model.X0: X0_batch, model.X1: X1_batch})

In [85]:
print(Y0_val) # output at t = 0 with 4 X 5 dimensions (m X n_neurons)

[[-0.82335067  0.9679544  -0.87781847  0.15444483  0.47437516]
 [-0.99492419  0.99998653 -0.99989074 -0.62147528  0.99839431]
 [-0.99986637  1.         -0.99999982 -0.92323399  0.9999963 ]
 [-0.89664513 -0.75321907  0.9998976  -0.99999988  0.99999988]]


In [86]:
print(Y1_val) # output at t = 1

[[-0.95814323  1.         -0.99999988 -0.60029292  0.9999997 ]
 [ 0.86120242  0.41333306 -0.38317758  0.97402877 -0.15842104]
 [-0.91327202  0.99999303 -0.99999648  0.35196573  0.99996436]
 [-0.99632466  0.89853793 -0.99984962 -0.99980336  0.9999882 ]]


---

### RNNs using Static Unrolling Through Time

![alt txt](https://docs.google.com/drawings/d/e/2PACX-1vQSmyivQcisygW5O7DrkeOd-FF3nUTH0CV_vwSg3hzFdvwaKhY18JYM-e0wWL7Bif6F66LdzkaTTXUK/pub?w=929&h=343)

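Writing one placeholder and one output tensor per time step, as above, quickly becomes unmanageable for long sequences. `static_rnn` instead takes a list of per-step input tensors and chains the cell across them, so the model needs only a single input placeholder of shape `[batch, n_steps, n_inputs]`.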

In [52]:
class StaticRNN(object):
    def __init__(self, n_steps, n_inputs, n_neurons):
        self.X = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
        X_seqs = tf.unstack(tf.transpose(self.X, perm =[1,0,2]))
        
        # cell "factory": static_rnn uses it to build one copy of the cell per time step (unrolling)
        basic_cell = tf.contrib.rnn.BasicRNNCell(num_units=n_neurons)
        # chains the cell over the list of per-step inputs; returns the per-step outputs and the final state
        output_seqs, states = tf.contrib.rnn.static_rnn(basic_cell, X_seqs, dtype=tf.float32)
        
        self.outputs = tf.transpose(tf.stack(output_seqs), perm=[1,0,2])

In [53]:
# must create 3D tensor as this is what Tensorflow RNN Cell requires
X_batch = np.array([
     # t = 0    t = 1
    [[0,1,2], [9,8,7]], # instance 0
    [[3,4,5], [0,0,0]], # instance 1
    [[6,7,8], [6,5,4]], # instance 2 
    [[9,0,1], [3,2,1]], # instance 3
])

In [54]:
X_batch.shape

(4, 2, 3)
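The transpose and unstack steps inside `StaticRNN` only rearrange this tensor into one array per time step. The NumPy sketch below mimics them on the same batch; it is an illustration of the reshaping, not the TensorFlow ops themselves.

```python
import numpy as np

X_batch = np.array([
    [[0, 1, 2], [9, 8, 7]],  # instance 0
    [[3, 4, 5], [0, 0, 0]],  # instance 1
    [[6, 7, 8], [6, 5, 4]],  # instance 2
    [[9, 0, 1], [3, 2, 1]],  # instance 3
])  # shape (batch, n_steps, n_inputs) = (4, 2, 3)

# Counterpart of tf.transpose(X, perm=[1, 0, 2]) followed by tf.unstack:
# a list with one (batch, n_inputs) array per time step
X_seqs = list(np.transpose(X_batch, (1, 0, 2)))
print(len(X_seqs), X_seqs[0].shape)
print(X_seqs[0])  # all instances at t = 0

# Counterpart of tf.stack followed by the second transpose:
# back to (batch, n_steps, n_inputs)
restored = np.transpose(np.stack(X_seqs), (1, 0, 2))
print(np.array_equal(restored, X_batch))
```

`X_seqs[0]` and `X_seqs[1]` are the same arrays that were fed by hand as `X0_batch` and `X1_batch` in the previous section.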

In [57]:
tf.reset_default_graph() # reset the graph every time we run the code to avoid TF errors
N_STEPS = 2
model = StaticRNN(N_STEPS, N_INPUTS, N_NEURONS)

with tf.Session() as sess:
    
    # initialize and run all variables so that we can use their values directly
    init = tf.global_variables_initializer()
    sess.run(init)
    output_vals = model.outputs.eval(feed_dict={model.X: X_batch})       

In [58]:
print(output_vals)

[[[-0.35060367  0.85217297  0.6135031   0.46224031 -0.80418861]
  [-0.93800652  0.99982429  0.99781042 -0.99988985  0.99659526]]

 [[-0.6539489   0.99790823  0.96494722 -0.80104691 -0.61586177]
  [-0.90182787 -0.03686395  0.22155505 -0.3443155   0.51058269]]

 [[-0.83310556  0.99997252  0.99734652 -0.99106479 -0.31516016]
  [-0.96041662  0.99637145  0.9749549  -0.99955434  0.98075169]]

 [[-0.96983677 -0.8127926   0.91314077 -0.99944443  0.99990416]
  [-0.34946752  0.84921968  0.38088277 -0.9891333   0.75855315]]]


The static version still has a drawback: it builds one copy of the cell per time step, so the graph grows with the number of time steps and becomes unwieldy for long sequences.

---

### Dynamic Unrolling Through Time
`tf.nn.dynamic_rnn` runs a while loop over a single cell for the required number of time steps instead of building one copy per step. This is a cleaner and more computationally efficient way of building RNNs in TensorFlow, and there is no need to unstack, stack, or transpose.

Here we also handle variable-length input sequences.
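To make the idea concrete, here is a rough NumPy sketch of a loop over one tanh cell with per-instance sequence lengths. The function name `rnn_loop` and the random weights are assumptions for illustration; it imitates the documented masking behaviour (zero outputs past the end of a sequence, last valid state carried forward) and is not how TensorFlow implements it internally.

```python
import numpy as np

def rnn_loop(X, seq_length, W_x, W_y, b):
    """Run one tanh cell over X of shape (batch, n_steps, n_inputs)."""
    batch, n_steps, _ = X.shape
    n_neurons = W_y.shape[0]
    state = np.zeros((batch, n_neurons))
    outputs = np.zeros((batch, n_steps, n_neurons))
    for t in range(n_steps):
        candidate = np.tanh(X[:, t] @ W_x + state @ W_y + b)
        # True for instances whose sequence has not ended yet
        active = (t < seq_length)[:, None]
        # past the end of a sequence the output is a zero vector
        outputs[:, t] = np.where(active, candidate, 0.0)
        # and the state stays at its last valid value
        state = np.where(active, candidate, state)
    return outputs, state

rng = np.random.default_rng(0)
n_inputs, n_neurons = 3, 5
W_x = rng.normal(size=(n_inputs, n_neurons))
W_y = rng.normal(size=(n_neurons, n_neurons))
b = np.zeros((1, n_neurons))

X_batch = np.array([
    [[0, 1, 2], [9, 8, 7]],
    [[3, 4, 5], [0, 0, 0]],  # padded: only the first step is valid
    [[6, 7, 8], [6, 5, 4]],
    [[9, 0, 1], [3, 2, 1]],
], dtype=float)
seq_length_batch = np.array([2, 1, 2, 2])

outputs, final_state = rnn_loop(X_batch, seq_length_batch, W_x, W_y, b)
print(outputs.shape, final_state.shape)
print(outputs[1, 1])  # zero vector for the padded step
```

The same weights are reused at every iteration, so the cost of building the model does not depend on the number of time steps.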

In [68]:
class DynamicRNN(object):
    def __init__(self, n_steps, n_inputs, n_neurons):
        # denotes the size of the sequence (helpful for supporting varying sizes of input sequences, e.g., sentences)
        self.seq_length = tf.placeholder(tf.int32, [None]) 
        self.X = tf.placeholder(tf.float32, [None, n_steps, n_inputs]) # 3D tensor
        
        basic_cell = tf.contrib.rnn.BasicRNNCell(num_units=n_neurons)
        self.outputs, self.states = tf.nn.dynamic_rnn(basic_cell, self.X, dtype=tf.float32, sequence_length=self.seq_length)

In [69]:
# must create 3D tensor as this is what Tensorflow RNN Cell requires
X_batch = np.array([
     # t = 0    t = 1
    [[0,1,2], [9,8,7]], # instance 0
    [[3,4,5], [0,0,0]], # instance 1 (padded with a zero vector)
    [[6,7,8], [6,5,4]], # instance 2 
    [[9,0,1], [3,2,1]], # instance 3
])

seq_length_batch = np.array([2,1,2,2])

In [73]:
tf.reset_default_graph() # reset the graph every time we run the code to avoid TF errors

N_STEPS = 2
model = DynamicRNN(N_STEPS, N_INPUTS, N_NEURONS)

with tf.Session() as sess:
    
    # initialize and run all variables so that we can use their values directly
    init = tf.global_variables_initializer()
    sess.run(init)
    outputs_val, states_val = sess.run([model.outputs, model.states],
                                       feed_dict={model.X: X_batch, model.seq_length: seq_length_batch})     

In [74]:
print(outputs_val) # pairs t = 0 ,  t = 1 and outputs them as multidimensional array

[[[-0.37425023  0.74163836 -0.83193213 -0.77348614  0.73784202]
  [ 0.03848177 -0.35700297 -0.77217108 -0.61098522  0.99779701]]

 [[-0.36137825  0.78198385 -0.97801197 -0.93518633  0.98954976]
  [ 0.          0.          0.          0.          0.        ]]

 [[-0.34836701  0.81669199 -0.99730986 -0.98258781  0.99963427]
  [-0.10163038 -0.43126816  0.2514632  -0.0913566   0.92772639]]

 [[ 0.99900454 -0.99995887  0.85338706  0.9904601  -0.89884502]
  [ 0.68933213 -0.73745525 -0.86696005 -0.27243653  0.93952006]]]


In [75]:
print(states_val) # final state of each instance: t = 1 in general, but t = 0 for instance 1 (sequence length 1)

[[ 0.03848177 -0.35700297 -0.77217108 -0.61098522  0.99779701]
 [-0.36137825  0.78198385 -0.97801197 -0.93518633  0.98954976]
 [-0.10163038 -0.43126816  0.2514632  -0.0913566   0.92772639]
 [ 0.68933213 -0.73745525 -0.86696005 -0.27243653  0.93952006]]
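The relationship between the two printed arrays can be verified directly: each row of `states_val` is the output at that instance's last valid time step. The sketch below uses the values copied from the outputs above.

```python
import numpy as np

# Values copied from the printed outputs_val above
outputs_val = np.array([
    [[-0.37425023, 0.74163836, -0.83193213, -0.77348614, 0.73784202],
     [0.03848177, -0.35700297, -0.77217108, -0.61098522, 0.99779701]],
    [[-0.36137825, 0.78198385, -0.97801197, -0.93518633, 0.98954976],
     [0.0, 0.0, 0.0, 0.0, 0.0]],
    [[-0.34836701, 0.81669199, -0.99730986, -0.98258781, 0.99963427],
     [-0.10163038, -0.43126816, 0.2514632, -0.0913566, 0.92772639]],
    [[0.99900454, -0.99995887, 0.85338706, 0.9904601, -0.89884502],
     [0.68933213, -0.73745525, -0.86696005, -0.27243653, 0.93952006]],
])
seq_length_batch = np.array([2, 1, 2, 2])

# For each instance, pick the output at its last valid time step
last_step = seq_length_batch - 1
final_states = outputs_val[np.arange(len(seq_length_batch)), last_step]
print(final_states)
```

For instance 1 this selects the output at $t = 0$, not the zero vector written for the padded step.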


---

### Training RNN Classifier on MNIST

![alt txt](https://docs.google.com/drawings/d/e/2PACX-1vSiMstqkE9hTYmhPD3KMeFRNNKYA2NnrCayahBOEL1TalRqaWF7rH8a7O-nP9c-mKOdZRsWtmAGZfNN/pub?w=969&h=368)

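Image classification is usually handled more effectively by CNNs, but an RNN still performs well on this task. Each 28 x 28 image is treated as a sequence of 28 rows (time steps) with 28 pixels (inputs) per row, and the final state of the RNN is fed to a fully connected output layer.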

In [122]:
class ImageRNN(object):
    def __init__(self, n_steps, n_inputs, n_neurons, n_outputs):
        self.X = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
        self.y = tf.placeholder(tf.int32, [None])
        
        basic_cell = tf.contrib.rnn.BasicRNNCell(num_units=n_neurons)
        outputs, states = tf.nn.dynamic_rnn(basic_cell, self.X, dtype=tf.float32)
        
        # computes loss
        logits = fully_connected(states, n_outputs, activation_fn=None) # unnormalized class scores (logits), not probabilities
        self.loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(labels=self.y, logits=logits)) 
        
        # evaluation (accuracy): True where the highest-scoring class matches the integer label
        # (with one-hot labels: tf.equal(tf.argmax(logits, 1), tf.argmax(self.y, 1)))
        correct = tf.nn.in_top_k(logits, self.y, 1)
        self.accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))        
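The loss and accuracy used in `ImageRNN` can be written out in a few lines of NumPy. This is a sketch on made-up logits and labels (both are assumptions for illustration); the helper name `sparse_softmax_cross_entropy` is hypothetical, and `argmax` is used as a stand-in for the top-1 check, which may differ from `in_top_k` when scores are tied.

```python
import numpy as np

def sparse_softmax_cross_entropy(logits, labels):
    """Per-example cross-entropy from raw logits and integer class labels."""
    # subtract the row maximum for numerical stability
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # negative log-probability of the correct class for each example
    return -log_probs[np.arange(len(labels)), labels]

# Made-up scores for 3 examples and 3 classes
logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 3.0],
                   [1.0, 1.5, 0.0]])
labels = np.array([0, 2, 0])  # integer labels, as in the placeholder self.y

per_example = sparse_softmax_cross_entropy(logits, labels)
loss = per_example.mean()                            # counterpart of tf.reduce_mean
accuracy = (logits.argmax(axis=1) == labels).mean()  # fraction of top-1 hits
print(loss, accuracy)
```

Because the labels are integers rather than one-hot vectors, the "sparse" variant of the loss simply indexes the log-probability of the true class.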

In [123]:
# parameters 
N_STEPS = 28
N_INPUTS = 28
N_NEURONS = 150
N_OUTPUTS = 10
N_EPHOCS = 10
BATCH_SIZE = 150
DISPLAY_STEP = 1
LEARNING_RATE = 0.001

In [124]:
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("MNIST_data/")
X_test = mnist.test.images.reshape((-1, N_STEPS, N_INPUTS))
y_test = mnist.test.labels

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
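The reshape applied to `X_test` turns each flattened 784-pixel image into 28 time steps of 28 inputs. A short NumPy sketch on a synthetic array (an assumption standing in for real MNIST data) shows what ends up in each time step:

```python
import numpy as np

N_STEPS, N_INPUTS = 28, 28

# Synthetic stand-in for a batch of 5 flattened images (784 values each)
flat_images = np.arange(5 * 784, dtype=np.float32).reshape(5, 784)

# Same reshape as used for X_test and for every training batch
sequences = flat_images.reshape((-1, N_STEPS, N_INPUTS))
print(sequences.shape)

# Time step t of an image is its t-th row of 28 pixels
t = 3
row_from_flat = flat_images[0, t * N_INPUTS:(t + 1) * N_INPUTS]
print(np.array_equal(sequences[0, t], row_from_flat))
```

The RNN therefore reads each image from the top row to the bottom row, one row per time step.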


In [125]:
tf.reset_default_graph()

# build model
model = ImageRNN(N_STEPS, N_INPUTS, N_NEURONS, N_OUTPUTS)

# training procedures (backpropagation)
optimizer = tf.train.AdamOptimizer(learning_rate=LEARNING_RATE)
training_op = optimizer.minimize(model.loss)

with tf.Session() as sess:
    # initialize and run all variables
    init = tf.global_variables_initializer()
    sess.run(init)
    
    for epoch in range(N_EPHOCS):
        avg_cost = 0. # average loss
        total_batch = mnist.train.num_examples // BATCH_SIZE
        for iteration in range(total_batch): # iterations per epoch depend on the batch size (a larger batch means fewer iterations)
            X_batch, y_batch = mnist.train.next_batch(BATCH_SIZE)
            X_batch = X_batch.reshape((-1, N_STEPS, N_INPUTS))
            _, cost = sess.run([training_op, model.loss], feed_dict={model.X: X_batch, model.y: y_batch})
            avg_cost += cost / total_batch
        
        # note: training accuracy is measured on the last mini-batch of the epoch only
        acc_train = model.accuracy.eval(feed_dict={model.X: X_batch, model.y: y_batch})
        acc_test = model.accuracy.eval(feed_dict={model.X: X_test, model.y: y_test})
        
        # Display cost, accuracy (test, train) based on display step
        if (epoch+1) % DISPLAY_STEP == 0:
            print("Epoch:", '%04d' % (epoch+1), "cost=", "{:.9f}".format(avg_cost))
            print("Train Accuracy: ", acc_train, " ", "Test Accuracy: ", acc_test)
            
    print("Training Finished!!!")

Epoch: 0001 cost= 0.512163897
Train Accuracy:  0.946667   Test Accuracy:  0.9265
Epoch: 0002 cost= 0.206084174
Train Accuracy:  0.96   Test Accuracy:  0.9477
Epoch: 0003 cost= 0.159465483
Train Accuracy:  0.986667   Test Accuracy:  0.959
Epoch: 0004 cost= 0.133277364
Train Accuracy:  0.973333   Test Accuracy:  0.9529
Epoch: 0005 cost= 0.122922636
Train Accuracy:  0.973333   Test Accuracy:  0.9576
Epoch: 0006 cost= 0.106411702
Train Accuracy:  0.966667   Test Accuracy:  0.9624
Epoch: 0007 cost= 0.100989035
Train Accuracy:  0.966667   Test Accuracy:  0.9702
Epoch: 0008 cost= 0.089291672
Train Accuracy:  0.993333   Test Accuracy:  0.9682
Epoch: 0009 cost= 0.083972478
Train Accuracy:  0.986667   Test Accuracy:  0.9721
Epoch: 0010 cost= 0.083262836
Train Accuracy:  0.986667   Test Accuracy:  0.9749
Training Finished!!!


---

### References:
- [All sorts of Text Classification Deep Learning models](https://github.com/brightmart/text_classification)
- [NTHU Machine Learning](https://nthu-datalab.github.io/ml/labs/13_Sentiment_Analysis_and_Neural_Machine_Translation/13_Sentiment_Analysis_and_Neural_Machine_Translation.html)
- [Introduction to RNN](http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/)