Recurrent Neural Networks
=====

### RNN  

<img src ="imgs/rnn.png" width="20%">

A recurrent neural network (RNN) is a class of artificial neural network where connections between units form a directed cycle. This creates an internal state of the network which allows it to exhibit dynamic temporal behavior.
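
To make the "internal state" concrete, here is a minimal sketch of the recurrence in plain Python, assuming the common update rule `h_t = tanh(W x_t + U h_{t-1} + b)`. The helper names (`rnn_step`, `rnn_forward`) and the toy weights are hypothetical and are not part of Keras.

```python
import math

def rnn_step(x, h_prev, W, U, b):
    """One step of a simple RNN: h_t = tanh(W x_t + U h_{t-1} + b)."""
    h = []
    for i in range(len(b)):
        total = b[i]
        total += sum(W[i][j] * x[j] for j in range(len(x)))
        total += sum(U[i][k] * h_prev[k] for k in range(len(h_prev)))
        h.append(math.tanh(total))
    return h

def rnn_forward(xs, W, U, b):
    """Apply the same step to every element of the sequence, starting from a zero state."""
    h = [0.0] * len(b)
    states = []
    for x in xs:
        h = rnn_step(x, h, W, U, b)
        states.append(h)
    return states

# Toy weights (hypothetical values): 2 inputs, 2 hidden units.
W = [[0.5, -0.3], [0.1, 0.8]]
U = [[0.2, 0.0], [0.0, 0.2]]
b = [0.0, 0.0]
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

states = rnn_forward(xs, W, U, b)
print(states[-1])  # the last hidden state summarizes the whole sequence
```

The same weights are reused at every time step; only the hidden state changes as the sequence is consumed.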

In [1]:
import keras
output_dim = 100

Using TensorFlow backend.


In [3]:
keras.layers.recurrent.SimpleRNN(output_dim,
                                  activation="tanh",
                                  kernel_initializer="glorot_uniform",
                                  recurrent_initializer="orthogonal",
                                  kernel_regularizer=None,
                                  bias_regularizer=None,
                                  recurrent_regularizer=None,
                                  dropout=0.0,
                                  recurrent_dropout=0.0)

<keras.layers.recurrent.SimpleRNN at 0x12703aa58>

#### Backpropagation Through Time

Unlike feed-forward neural networks, an RNN can encode information from earlier in the input, which makes it well suited to sequential data. Backpropagation through time (BPTT) extends the ordinary backpropagation (BP) algorithm to the recurrent architecture: the network is unrolled over the time steps, and the gradients of the shared weights are summed across those steps.
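
As a rough illustration, the sketch below runs BPTT on a one-unit RNN with scalar weights and compares the result with a finite-difference estimate. The loss (squared error on the last hidden state), the function names, and all numeric values are assumptions chosen for this example.

```python
import math

def forward(xs, w, u):
    """Return hidden states h_0..h_T of a scalar RNN with h_0 = 0."""
    hs = [0.0]
    for x in xs:
        hs.append(math.tanh(w * x + u * hs[-1]))
    return hs

def loss(xs, y, w, u):
    """Squared error on the last hidden state only."""
    return 0.5 * (forward(xs, w, u)[-1] - y) ** 2

def bptt(xs, y, w, u):
    """Gradients of the loss with respect to w and u via backpropagation through time."""
    hs = forward(xs, w, u)
    dw, du = 0.0, 0.0
    dh = hs[-1] - y                      # dL/dh_T
    for t in range(len(xs), 0, -1):      # walk backwards: t = T, ..., 1
        da = dh * (1.0 - hs[t] ** 2)     # back through tanh
        dw += da * xs[t - 1]             # w is shared, so its gradient accumulates
        du += da * hs[t - 1]             # u is shared, so its gradient accumulates
        dh = da * u                      # pass the gradient on to h_{t-1}
    return dw, du

xs, y, w, u = [0.5, -1.0, 0.25], 0.3, 0.7, -0.4
dw, du = bptt(xs, y, w, u)

# Central finite differences as an independent check.
eps = 1e-6
dw_num = (loss(xs, y, w + eps, u) - loss(xs, y, w - eps, u)) / (2 * eps)
du_num = (loss(xs, y, w, u + eps) - loss(xs, y, w, u - eps)) / (2 * eps)
print(abs(dw - dw_num), abs(du - du_num))
```

The repeated multiplication by `u` and by the tanh derivative in the backward loop is also why gradients tend to shrink or grow over long sequences.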

<img src ="imgs/rnn2.png" width="45%">

In [4]:
%matplotlib inline

In [8]:
from __future__ import print_function

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import theano
import theano.tensor as T
import tensorflow as tf

import keras
from keras.callbacks import EarlyStopping, ModelCheckpoint
from keras.datasets import imdb, mnist
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, MaxPooling2D
from keras.layers.core import RepeatVector
from keras.layers.embeddings import Embedding
from keras.layers.recurrent import LSTM, GRU, SimpleRNN
from keras.models import Sequential
from keras.preprocessing import image, sequence
from keras.utils import np_utils

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

#### IMDB sentiment classification task

This is a dataset for binary sentiment classification that contains substantially more data than earlier benchmark datasets.

It provides 25,000 highly polar movie reviews from IMDB for training and 25,000 for testing.

Additional unlabeled data is available as well. Both the raw text and a preprocessed bag-of-words format are provided.

http://ai.stanford.edu/~amaas/data/sentiment/

#### Data Preparation - IMDB

In [None]:
max_features = 20000
maxlen = 100  # pad or truncate each review to this many words (drawn from the max_features most common words)
batch_size = 32

print("Loading data...")
# Keep only the max_features most frequent words
(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=max_features)

print(len(X_train), 'train sequences')
print(len(X_test), 'test sequences')

print('Example:')
print(X_train[:1])

# Pad reviews shorter than maxlen words with zeros and truncate longer ones
print("Pad sequences (samples x time)")
X_train = sequence.pad_sequences(X_train, maxlen=maxlen)
X_test = sequence.pad_sequences(X_test, maxlen=maxlen)
print('X_train shape:', X_train.shape)
print('X_test shape:', X_test.shape)

max_epoch = 50
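
To see what the padding step does to a single review, here is a small stand-in written in plain Python. It assumes the defaults of `pad_sequences` (`padding='pre'`, `truncating='pre'`, `value=0`); the helper `pad_pre` is hypothetical and only mimics that behaviour for one sequence.

```python
def pad_pre(seq, maxlen, value=0):
    """Keep the last `maxlen` items, then left-pad with `value` up to `maxlen`."""
    seq = list(seq)[-maxlen:]
    return [value] * (maxlen - len(seq)) + seq

short_review = [11, 12, 13]
long_review = [1, 2, 3, 4, 5, 6, 7]

print(pad_pre(short_review, 5))  # zeros are added at the front
print(pad_pre(long_review, 5))   # words are dropped from the front
```

With these defaults a long review keeps its final `maxlen` words, not its first ones.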

#### Model building 

![Sentiment analysis architecture](https://cdn-images-1.medium.com/max/489/1*27JmK8VBdphpSCWNb4MhNA.png!)

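Input: a sequence of one-hot vectors, one per word. Each vector is fed to the embedding layer, which transforms it into a 128-dimensional vector, and the embedded sequence is fed to the recurrent layer.

Take the last hidden state of the recurrent layer and pass it to the dense output layer with a sigmoid activation.

Get the output, compute the loss, and do backpropagation.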
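
The embedding step can be pictured as a table lookup. The sketch below, with a hypothetical toy embedding matrix `E`, suggests that multiplying a one-hot vector by the matrix gives the same result as picking the matching row, which is why the data can stay as integer word indices.

```python
def one_hot(index, vocab_size):
    """Vector of zeros with a single 1.0 at position `index`."""
    vec = [0.0] * vocab_size
    vec[index] = 1.0
    return vec

def vec_times_matrix(vec, matrix):
    """Row vector times matrix, written out with plain loops."""
    cols = len(matrix[0])
    return [sum(vec[i] * matrix[i][j] for i in range(len(vec))) for j in range(cols)]

# Hypothetical toy embedding matrix: vocabulary of 4 words, 3 dimensions each.
E = [[0.1, 0.2, 0.3],
     [0.4, 0.5, 0.6],
     [0.7, 0.8, 0.9],
     [1.0, 1.1, 1.2]]

review = [2, 0, 3]  # word indices, in the same form as the rows of X_train

via_one_hot = [vec_times_matrix(one_hot(i, len(E)), E) for i in review]
via_lookup = [E[i] for i in review]
print(via_one_hot == via_lookup)
```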

In [7]:
print('Build model...')
model = Sequential()
model.add(Embedding(max_features, 128, input_length=maxlen))
model.add(SimpleRNN(128))  
model.add(Dropout(0.5))
model.add(Dense(1))
model.add(Activation('sigmoid'))

# try using different optimizers and different optimizer configs
model.compile(loss='binary_crossentropy', optimizer='adam',metrics=["accuracy"])

print("Train...")
model.fit(X_train, y_train, batch_size=batch_size, epochs=1, validation_data=(X_test, y_test))
# evaluate() returns the loss first, followed by the metrics passed to compile()
loss, acc = model.evaluate(X_test, y_test, batch_size=batch_size)
print()
print('Test accuracy:', acc)
print('Test loss:', loss)

Build model...
Train...
Train on 25000 samples, validate on 25000 samples
Epoch 1/1
Test accuracy: 0.65844
Test loss: 0.602803128452
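
To clarify what the two reported numbers measure, here is a hand-written sketch of binary cross-entropy and accuracy on a sigmoid output. The logits and labels are made-up values, and the clipping constant `eps` is an assumption for numerical safety, not necessarily what Keras uses internally.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean negative log-likelihood of the true labels."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    hits = sum(1 for y, p in zip(y_true, y_prob) if (p > threshold) == (y == 1))
    return hits / len(y_true)

logits = [2.0, -1.0, 0.5, -3.0]   # hypothetical pre-sigmoid outputs
labels = [1, 0, 0, 0]
probs = [sigmoid(z) for z in logits]

print('loss:', binary_crossentropy(labels, probs))
print('accuracy:', accuracy(labels, probs))
```

Accuracy always lies between 0 and 1, whereas the cross-entropy loss has no upper bound and can grow well above 1 when the model is confidently wrong.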


### LSTM  

An LSTM network is an artificial neural network that contains LSTM blocks instead of, or in addition to, regular network units. An LSTM block may be described as a "smart" network unit that can remember a value for an arbitrary length of time.

Unlike traditional RNNs, a long short-term memory (LSTM) network is well suited to learning from experience to classify, process, and predict time series when there are very long time lags of unknown size between important events.
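
The "remembering" can be sketched with a single-unit LSTM in plain Python. This assumes the standard gate equations with a plain logistic sigmoid (the Keras layer shown in this notebook uses `hard_sigmoid` for the gates), and the weights are hand-picked, hypothetical values that hold the forget gate open and the input gate shut.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One step of a single-unit LSTM; p maps each gate to its (w, u, b) scalars."""
    i = sigmoid(p['i'][0] * x + p['i'][1] * h_prev + p['i'][2])    # input gate
    f = sigmoid(p['f'][0] * x + p['f'][1] * h_prev + p['f'][2])    # forget gate
    o = sigmoid(p['o'][0] * x + p['o'][1] * h_prev + p['o'][2])    # output gate
    g = math.tanh(p['g'][0] * x + p['g'][1] * h_prev + p['g'][2])  # candidate value
    c = f * c_prev + i * g      # keep part of the old cell state, add part of the new
    h = o * math.tanh(c)        # expose part of the cell state as the output
    return h, c

# Hypothetical weights: forget gate ~1, input gate ~0, output gate ~1.
p = {'i': (0.0, 0.0, -20.0),
     'f': (0.0, 0.0, 20.0),
     'o': (0.0, 0.0, 20.0),
     'g': (1.0, 0.0, 0.0)}

h, c = 0.0, 0.8                 # start with a value stored in the cell
for step in range(100):
    h, c = lstm_step(x=(-1.0) ** step, h_prev=h, c_prev=c, p=p)
print(c)                        # stays close to the stored value
```

In a trained network the gates depend on the input and the previous state, so the cell can decide when to store, keep, or overwrite a value.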

<img src ="imgs/gru.png" width="60%">

In [8]:
keras.layers.recurrent.LSTM(output_dim,
                            activation='tanh',
                            recurrent_activation='hard_sigmoid',
                            kernel_initializer='glorot_uniform',
                            recurrent_initializer='orthogonal',
                            unit_forget_bias=True,
                            kernel_regularizer=None,
                            recurrent_regularizer=None,
                            bias_regularizer=None,
                            dropout=0.0,
                            recurrent_dropout=0.0)

<keras.layers.recurrent.LSTM at 0x7f3225d3c4e0>

### GRU  

Gated recurrent units (GRUs) are a gating mechanism in recurrent neural networks.

They are similar to LSTMs but have fewer parameters, as they lack an output gate.
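
The difference in size can be estimated with a short calculation. The sketch assumes each gate or candidate block has an input kernel, a recurrent kernel, and one bias vector; the helper `recurrent_params` is hypothetical, and some implementations add extra bias terms, so the figures reported by `model.summary()` may differ slightly.

```python
def recurrent_params(input_dim, units, n_blocks):
    """Parameters of a recurrent layer made of `n_blocks` gate/candidate blocks."""
    per_block = units * input_dim + units * units + units
    return n_blocks * per_block

input_dim, units = 128, 128   # embedding size and layer width used in this notebook

simple_rnn = recurrent_params(input_dim, units, 1)  # one block, no gates
gru = recurrent_params(input_dim, units, 3)         # update gate, reset gate, candidate
lstm = recurrent_params(input_dim, units, 4)        # input, forget, output gates, candidate

print(simple_rnn, gru, lstm)
```

Under these assumptions a GRU layer has three quarters of the parameters of an LSTM layer of the same width.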

In [9]:
keras.layers.recurrent.GRU(output_dim,
                           activation='tanh',
                           recurrent_activation='hard_sigmoid',
                           kernel_initializer='glorot_uniform',
                           recurrent_initializer='orthogonal',
                           kernel_regularizer=None,
                           recurrent_regularizer=None,
                           bias_regularizer=None,
                           dropout=0.0,
                           recurrent_dropout=0.0)

<keras.layers.recurrent.GRU at 0x7f3225d3c390>

### Your Turn! - Hands-on RNN

In [10]:
print('Build model...')
model = Sequential()
model.add(Embedding(max_features, 128, input_length=maxlen))

# Play with these! Try to get better results!
model.add(SimpleRNN(128))  
#model.add(GRU(128))  
#model.add(LSTM(128))  

model.add(Dropout(0.5))
model.add(Dense(1))
model.add(Activation('sigmoid'))

# try using different optimizers and different optimizer configs
model.compile(loss='binary_crossentropy', optimizer='adam' ,metrics=["accuracy"])

print("Train...")
model.fit(X_train, y_train, batch_size=batch_size, 
          epochs=max_epoch, validation_data=(X_test, y_test))
# evaluate() returns the loss first, followed by the metrics passed to compile()
loss, acc = model.evaluate(X_test, y_test, batch_size=batch_size)
print()
print('Test accuracy:', acc)
print('Test loss:', loss)

Build model...
Train...
Train on 25000 samples, validate on 25000 samples
Epoch 1/50
...
Epoch 50/50

Test accuracy: 0.72404
Test loss: 0.772219604321


In [11]:
print('Build model...')
model = Sequential()
model.add(Embedding(max_features, 128, input_length=maxlen))

# Play with these! Try to get better results!
# model.add(SimpleRNN(128))  
model.add(GRU(128))  
# model.add(LSTM(128))  

model.add(Dropout(0.5))
model.add(Dense(1))
model.add(Activation('sigmoid'))

# try using different optimizers and different optimizer configs
model.compile(loss='binary_crossentropy', optimizer='adam' ,metrics=["accuracy"])

print("Train...")
model.fit(X_train, y_train, batch_size=batch_size, 
          epochs=max_epoch, validation_data=(X_test, y_test))
# evaluate() returns the loss first, followed by the metrics passed to compile()
loss, acc = model.evaluate(X_test, y_test, batch_size=batch_size)
print()
print('Test accuracy:', acc)
print('Test loss:', loss)

Build model...
Train...
Train on 25000 samples, validate on 25000 samples
Epoch 1/50
...
Epoch 50/50

Test accuracy: 0.81452
Test loss: 2.243032588


In [12]:
print('Build model...')
model = Sequential()
model.add(Embedding(max_features, 128, input_length=maxlen))

# Play with these! Try to get better results!
# model.add(SimpleRNN(128))  
# model.add(GRU(128))  
model.add(LSTM(128))  

model.add(Dropout(0.5))
model.add(Dense(1))
model.add(Activation('sigmoid'))

# try using different optimizers and different optimizer configs
model.compile(loss='binary_crossentropy', optimizer='adam' ,metrics=["accuracy"])

print("Train...")
model.fit(X_train, y_train, batch_size=batch_size, 
          epochs=max_epoch, validation_data=(X_test, y_test))
# evaluate() returns the loss first, followed by the metrics passed to compile()
loss, acc = model.evaluate(X_test, y_test, batch_size=batch_size)
print()
print('Test accuracy:', acc)
print('Test loss:', loss)

Build model...
Train...
Train on 25000 samples, validate on 25000 samples
Epoch 1/50
...
Epoch 50/50

Test accuracy: 0.81636
Test loss: 2.07217236311
