# Recurrent Neural Networks Demo

Authors: Runshan Fu (Fall 2017 95-865 TA), George H. Chen

In this demo, we will implement RNN models for sentiment analysis on IMDB reviews. We will start from the original review texts and predict the sentiment (positive or negative) of each review. This demo is adapted from the book *Deep Learning with Python* by Francois Chollet and also uses code from user `mdaoust` in [this Stack Overflow post](https://stackoverflow.com/questions/42821330/restore-original-text-from-keras-s-imdb-dataset).

In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

## Load the dataset
We load the data directly from Keras as lists of integers. We restrict the movie reviews to the top 2000 most common words and make every review exactly 200 words long, truncating or padding as needed. Truncating works by removing the *start* of the review and keeping only the last 200 words; padding works by adding a special padding token to the *start* of the review.
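
To make the truncation and padding behavior described above concrete, here is a minimal NumPy sketch. The helper `pad_or_truncate_pre` and the toy token lists are hypothetical names and values introduced only for illustration; the sketch mimics the behavior described in the text and is not the Keras implementation itself.

```python
import numpy as np

def pad_or_truncate_pre(seq, maxlen, pad_value=0):
    """Hypothetical helper: keep the last `maxlen` tokens, pad at the start."""
    seq = list(seq)[-maxlen:]                  # truncation drops the *start*
    padding = [pad_value] * (maxlen - len(seq))
    return np.array(padding + seq)             # padding is added at the *start*

short_review = [1, 14, 22, 16]                 # made-up token indices
long_review = [1, 14, 22, 16, 43, 530, 973]

print(pad_or_truncate_pre(short_review, 5).tolist())  # [0, 1, 14, 22, 16]
print(pad_or_truncate_pre(long_review, 5).tolist())   # [22, 16, 43, 530, 973]
```

With a target length of 5, the short review gains one padding token at the front, while the long review loses its first two tokens.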

In [2]:
from keras.datasets import imdb
from keras.preprocessing import sequence

# load the dataset and only keep the top words (most frequently occurring)
vocab_size = 2000
INDEX_FROM = 2  # for dealing with some special characters; this is a technical thing related to the IMDB dataset
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=vocab_size, index_from=INDEX_FROM)

word_to_idx = imdb.get_word_index()
word_to_idx = {word:(word_idx + INDEX_FROM) for word, word_idx in word_to_idx.items()}
word_to_idx['<PAD>'] = 0
word_to_idx['<START>'] = 1
word_to_idx['<UNK>'] = 2

idx_to_word = {word_idx:word for word, word_idx in word_to_idx.items()}

# turn the lists of integers into a 2D integer tensor of shape `(samples, maxlen)`
x_train = sequence.pad_sequences(x_train, maxlen=200)
x_test = sequence.pad_sequences(x_test, maxlen=200)


In [None]:
print(x_train.shape)

In [None]:
print(x_train[0])

Here is an example of a review with positive sentiment (this particular review has been truncated, so we only see the last 200 words of the review):

In [None]:
print(' '.join(idx_to_word[idx] for idx in x_train[0]))

In [None]:
y_train[0]

Here is an example of a review with negative sentiment (this particular review has been padded at the beginning; note that when we can see the start of a review, the first token of the actual review is a special token `<START>`):

In [None]:
print(' '.join(idx_to_word[idx] for idx in x_train[-3]))

In [None]:
y_train[-3]
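
The decoding used above depends on shifting every raw word index by `INDEX_FROM` so that indices 0, 1, and 2 are free for the special tokens. The sketch below replays that logic on a tiny vocabulary. The dictionary `toy_raw_word_index` and the encoded review are made up for illustration and are not taken from the IMDB dataset.

```python
INDEX_FROM = 2

# Made-up raw word index: rank 1 is the most frequent word
toy_raw_word_index = {'the': 1, 'and': 2, 'movie': 3}

# Shift every word so that indices 0, 1, 2 are reserved for special tokens
toy_word_to_idx = {word: idx + INDEX_FROM for word, idx in toy_raw_word_index.items()}
toy_word_to_idx['<PAD>'] = 0
toy_word_to_idx['<START>'] = 1
toy_word_to_idx['<UNK>'] = 2

toy_idx_to_word = {idx: word for word, idx in toy_word_to_idx.items()}

# A padded toy review: two padding tokens, the start token, then the words
toy_review = [0, 0, 1, 3, 5, 2, 4]
decoded = ' '.join(toy_idx_to_word[idx] for idx in toy_review)
print(decoded)  # <PAD> <PAD> <START> the movie <UNK> and
```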

## Use pre-trained word embeddings
We use GloVe embeddings instead of learning our own task-specific word embeddings. First download the pre-computed embeddings from https://nlp.stanford.edu/projects/glove/ (specifically the one trained on 6 billion tokens from 2014 English Wikipedia and Gigaword 5, `glove.6B.zip`). Unzip it so that `glove.6B.100d.txt` is located in the directory `./glove/`.

We first create a dictionary that maps each English word to its corresponding 100-dimensional GloVe embedding.

In [None]:
word_to_embedding = {}

# we will use the 100-dimensional embedding vectors
with open("./glove/glove.6B.100d.txt", encoding="utf-8") as f:
    # each row represents a word vector
    for line in f:
        values = line.split()
        # the first part is word
        word = values[0]
        # the rest of the values form the embedding vector
        embedding = np.asarray(values[1:], dtype='float32')
        word_to_embedding[word] = embedding

print('Found %s word vectors.' % len(word_to_embedding))
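
For readers who have not opened the GloVe file, the sketch below shows the line format that the loop above assumes: a word followed by its space-separated vector components. The two lines in `toy_glove_file` are made-up 4-dimensional examples, not real GloVe values, and all `toy_` names are introduced only for illustration.

```python
import io
import numpy as np

# Made-up stand-in for the GloVe text file: "<word> <v1> <v2> ... <vd>" per line
toy_glove_file = io.StringIO(
    "the 0.1 -0.2 0.3 0.4\n"
    "movie 0.5 0.6 -0.7 0.8\n"
)

toy_word_to_embedding = {}
for line in toy_glove_file:
    values = line.split()
    word = values[0]                                    # first field is the word
    vector = np.asarray(values[1:], dtype='float32')    # the rest is the vector
    toy_word_to_embedding[word] = vector

print(sorted(toy_word_to_embedding))          # ['movie', 'the']
print(toy_word_to_embedding['the'].shape)     # (4,)
```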

Next, we create an embedding matrix, where the i-th row holds the GloVe embedding for the word with index i. The exceptions are i=0 (the special padding token `<PAD>`), i=1 (the special `<START>` token), and i=2 (the special `<UNK>` token); for these special cases, and for any word that has no GloVe vector, the embedding vector is left as all zeros.

In [None]:
embedding_dim = 100

embedding_matrix = np.zeros((vocab_size, embedding_dim))
for idx in range(vocab_size):
    word = idx_to_word[idx]
    if word in word_to_embedding:
        embedding_matrix[idx] = word_to_embedding[word]
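
As a rough illustration of how such a matrix is used, the sketch below builds a tiny embedding matrix the same way and then looks up a toy review by row indexing, which is the operation an embedding layer performs. The vocabulary, the 3-dimensional vectors, and every `toy_` name are made up for this example.

```python
import numpy as np

toy_vocab_size, toy_dim = 5, 3
toy_idx_to_word = {0: '<PAD>', 1: '<START>', 2: '<UNK>', 3: 'the', 4: 'zzyzx'}
toy_word_to_embedding = {'the': np.array([0.1, 0.2, 0.3], dtype='float32')}

# Rows stay all-zero for special tokens and for words without a vector
toy_matrix = np.zeros((toy_vocab_size, toy_dim))
for idx in range(toy_vocab_size):
    word = toy_idx_to_word[idx]
    if word in toy_word_to_embedding:
        toy_matrix[idx] = toy_word_to_embedding[word]

num_zero_rows = int(np.sum(~toy_matrix.any(axis=1)))
print(num_zero_rows)          # 4 (three special tokens plus 'zzyzx')

# An embedding lookup is row indexing: one vector per token in the review
toy_review = np.array([0, 0, 1, 3, 4])
embedded = toy_matrix[toy_review]
print(embedded.shape)         # (5, 3)
```

Counting the all-zero rows in the real `embedding_matrix` the same way is a quick check of how many of the top words are missing from GloVe.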

## Feedforward network with embeddings

This first neural net is *not* a recurrent neural net. It does not do anything special to account for time series structure. This net is meant to be a baseline that we compare a recurrent neural net against. To make the comparison somewhat fair, in both cases the last two layers have the same output dimensions (64 and 1). In this baseline, the second-to-last layer is a Dense layer with 64 neurons and `relu` activation, and the last layer is a Dense layer with 1 neuron and `sigmoid` activation.

In [None]:
from keras.models import Sequential
from keras.layers import Embedding, Flatten, Dense
# initialize the model
feedforward_model = Sequential()
feedforward_model.add(Embedding(vocab_size, embedding_dim, input_length=200))
feedforward_model.add(Flatten())
feedforward_model.add(Dense(64, activation='relu'))
feedforward_model.add(Dense(1, activation='sigmoid'))
feedforward_model.summary()

In [None]:
# load the GloVe embeddings in the model
feedforward_model.layers[0].set_weights([embedding_matrix])
# set the embedding layer to be not trainable, so the weights do not change during the training
feedforward_model.layers[0].trainable = False

feedforward_model.summary()  # the summary changes after we turn off training for the 0-th layer (note the last line "Non-trainable params")
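
The parameter counts reported by the summary can be worked out by hand. The arithmetic below is a sketch that assumes the layer sizes used in this demo (an embedding of 2000 x 100, reviews of length 200, then Dense layers with 64 and 1 neurons, each with a bias term); the variable names are introduced only for illustration.

```python
vocab_size, embedding_dim, maxlen = 2000, 100, 200

embedding_params = vocab_size * embedding_dim      # one 100-d vector per word
flattened_dim = maxlen * embedding_dim             # Flatten: 200 words x 100 dims
dense_64_params = flattened_dim * 64 + 64          # weights plus biases
dense_1_params = 64 * 1 + 1                        # weights plus bias

# After freezing the embedding layer, only the Dense layers are trainable
trainable = dense_64_params + dense_1_params
non_trainable = embedding_params
print(trainable, non_trainable)   # 1280129 200000
```

Most of the trainable parameters sit in the first Dense layer, because flattening turns each review into a 20,000-dimensional input.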

In [None]:
# compile and train the model
feedforward_model.compile(optimizer='adam',
                          loss='binary_crossentropy',
                          metrics=['acc'])

history = feedforward_model.fit(x_train, y_train,
                                validation_split=0.2,
                                epochs=10,
                                batch_size=32)

In [None]:
# plot the accuracy rates for each epoch on training and validation data
acc = history.history['acc']
val_acc = history.history['val_acc']
epochs = range(1, len(acc) + 1)
plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.legend()
plt.show()

## LSTM

Now we use an LSTM recurrent neural net. If you want to use a different kind of RNN such as `SimpleRNN` or `GRU`, simply replace `LSTM` with `SimpleRNN` or `GRU` (both in importing the layer and in adding the layer to the model).
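
To give a sense of what an LSTM layer with 64 units computes, here is a simplified NumPy sketch of the standard LSTM recurrence. It reads a sequence of 200 embedding vectors one step at a time and returns the final 64-dimensional hidden state. The weights are random and untrained, the gate ordering and all names are assumptions made for illustration, and the Keras implementation differs in details such as weight initialization.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_last_hidden_state(inputs, W, U, b):
    """inputs: (timesteps, input_dim), W: (input_dim, 4*units),
    U: (units, 4*units), b: (4*units,). Returns the final hidden state."""
    units = U.shape[0]
    h = np.zeros(units)   # hidden state
    c = np.zeros(units)   # cell state
    for x_t in inputs:
        z = x_t @ W + h @ U + b
        i = sigmoid(z[:units])                 # input gate
        f = sigmoid(z[units:2 * units])        # forget gate
        g = np.tanh(z[2 * units:3 * units])    # candidate cell update
        o = sigmoid(z[3 * units:])             # output gate
        c = f * c + i * g
        h = o * np.tanh(c)
    return h

rng = np.random.default_rng(0)
timesteps, input_dim, units = 200, 100, 64
W = rng.normal(scale=0.1, size=(input_dim, 4 * units))
U = rng.normal(scale=0.1, size=(units, 4 * units))
b = np.zeros(4 * units)

toy_review = rng.normal(size=(timesteps, input_dim))  # stand-in for embedded words
h_final = lstm_last_hidden_state(toy_review, W, U, b)
num_params = W.size + U.size + b.size

print(h_final.shape)   # (64,)
print(num_params)      # 42240
```

Unlike the feedforward baseline, the same weights are reused at every time step, so the number of parameters does not depend on the review length.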

In [None]:
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense
rnn_model = Sequential()
rnn_model.add(Embedding(vocab_size, embedding_dim, input_length=200))
rnn_model.add(LSTM(64))
rnn_model.add(Dense(1, activation='sigmoid'))

# load the GloVe embeddings in the model
rnn_model.layers[0].set_weights([embedding_matrix])
rnn_model.layers[0].trainable = False

rnn_model.summary()

In [None]:
# compile and train the model
rnn_model.compile(optimizer='adam',
                  loss='binary_crossentropy',
                  metrics=['acc'])
history = rnn_model.fit(x_train, y_train,
                        validation_split=0.2,
                        epochs=10,
                        batch_size=32)

In [None]:
# plot the accuracy rates for each epoch on training and validation data
acc = history.history['acc']
val_acc = history.history['val_acc']
epochs = range(1, len(acc) + 1)
plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.legend()
plt.show()

## Finally evaluate on test data

We now compare the test set raw classification accuracies of the feedforward neural net vs the LSTM model. Keep in mind that we have set both of these models up so that right before the final logistic regression classification layer, we are representing each review as a feature vector of length 64. The LSTM model learns a much better 64-dimensional feature space to use (as evidenced by its dramatically higher prediction accuracy on the test set).

In [None]:
test_loss, test_acc = feedforward_model.evaluate(x_test, y_test)
print('Test accuracy:', test_acc)

In [None]:
test_loss, test_acc = rnn_model.evaluate(x_test, y_test)
print('Test accuracy:', test_acc)

To get predictions, we use the `predict_classes` method. There is also a `predict` method that outputs whatever the neural net's final output is, which in this case is the probability of positive sentiment per test example, since the final layer is a Dense layer with 1 neuron and sigmoid activation. Note that `predict_classes` is available in the older Keras version used here but has been removed from recent TensorFlow releases.

In [None]:
predicted_labels = rnn_model.predict_classes(x_test)

In [None]:
predicted_labels

In [None]:
np.mean(predicted_labels.flatten() == y_test)

In case you're wondering how the predicted labels are computed, we can simply look at the raw neural net outputs (which are probabilities) and threshold at probability 0.5 (i.e., declare every test example with probability at least 0.5 of having positive sentiment to be in the positive sentiment class and declare all other test examples to have negative sentiment).

In [None]:
test_set_predicted_probs = rnn_model.predict(x_test)

In [None]:
test_set_predicted_probs

The code below computes the test set accuracy of `rnn_model`. Note that `test_set_predicted_probs >= .5` converts the test set predicted probabilities into actual classifications (`True`, i.e., 1, if the predicted probability is at least .5, and `False`, i.e., 0, otherwise). Flattening is needed since `test_set_predicted_probs >= .5` is actually a 2D array with one column, whereas `y_test` is a 1D array.

In [None]:
np.mean((test_set_predicted_probs >= .5).flatten() == y_test)
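
To see why the flattening matters, here is a small NumPy sketch with made-up probabilities and labels (the `toy_` names are not part of the demo). Comparing an array of shape `(n, 1)` against one of shape `(n,)` broadcasts to an `(n, n)` grid, so the mean no longer measures accuracy.

```python
import numpy as np

toy_probs = np.array([[0.9], [0.2], [0.5], [0.4]])   # shape (4, 1), like predict()
toy_labels = np.array([1, 0, 0, 0])                   # shape (4,)

# Correct: flatten first so both sides have shape (4,)
flat_matches = (toy_probs >= .5).flatten() == toy_labels
print(flat_matches.shape, np.mean(flat_matches))      # (4,) 0.75

# Pitfall: without flattening, broadcasting compares every prediction
# against every label, giving a (4, 4) array and a misleading mean
grid_matches = (toy_probs >= .5) == toy_labels
print(grid_matches.shape, np.mean(grid_matches))      # (4, 4) 0.5
```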