# Deep Learning - Recurrent Neural Networks

In [1]:
! pip install tensorflow==2.0



In [0]:
import tensorflow as tf
import tensorflow.keras as keras
from tensorflow.keras import Sequential
from tensorflow.keras.preprocessing import sequence
from tensorflow.keras.layers import SimpleRNN, GRU, LSTM, Embedding, Dense
from tensorflow.keras.datasets import imdb

We will use the IMDB dataset described in the Keras documentation [here](https://www.tensorflow.org/api_docs/python/tf/keras/datasets/imdb/load_data). This is a supervised learning task on text: we predict the sentiment of each IMDB review.

Take a look at the imports above. For the RNN-based imports, see the [RNN guide](https://www.tensorflow.org/guide/keras/rnn). For preprocessing with `sequence`, we'll use [pad_sequences](https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/sequence/pad_sequences). For Embedding, see the [Embedding documentation](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Embedding).

In [0]:
maxlen = 100  # Pad or truncate every review to exactly this many words
n = 20000  # Keep only the n most frequent words
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=n)

In [4]:
x_train.shape

(25000,)

In [5]:
x_test.shape

(25000,)

In [6]:
for i in range(10):
    print(f"Element {i} has a length of {len(x_train[i])}")

Element 0 has a length of 218
Element 1 has a length of 189
Element 2 has a length of 141
Element 3 has a length of 550
Element 4 has a length of 147
Element 5 has a length of 43
Element 6 has a length of 123
Element 7 has a length of 562
Element 8 has a length of 233
Element 9 has a length of 130


In [7]:
x_train[0][:10]

[1, 14, 22, 16, 43, 530, 973, 1622, 1385, 65]
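
The integers above are not raw word ranks. With its default arguments, `imdb.load_data` reserves id 1 for a start-of-sequence marker and id 2 for out-of-vocabulary words, and shifts every word rank up by 3; id 0 is left free for padding. The sketch below shows how such ids could be decoded. The tiny `toy_word_index` is a hypothetical stand-in for the real table returned by `imdb.get_word_index()`, so the words are illustrative only.

```python
# Defaults of imdb.load_data: reserved ids and the offset applied to word ranks.
START_ID, OOV_ID, INDEX_FROM = 1, 2, 3

# Hypothetical word -> frequency-rank table standing in for imdb.get_word_index().
toy_word_index = {"the": 1, "and": 2, "film": 3, "great": 4}


def decode(ids, word_index):
    """Map encoded ids back to tokens, undoing the INDEX_FROM offset."""
    rank_to_word = {rank: word for word, rank in word_index.items()}
    tokens = []
    for i in ids:
        if i == 0:
            tokens.append("<pad>")
        elif i == START_ID:
            tokens.append("<start>")
        elif i == OOV_ID:
            tokens.append("<oov>")
        else:
            # Ranks missing from the table are reported as out-of-vocabulary.
            tokens.append(rank_to_word.get(i - INDEX_FROM, "<oov>"))
    return tokens


print(decode([1, 4, 7, 2, 6], toy_word_index))
# ['<start>', 'the', 'great', '<oov>', 'film']
```

This is why `x_train[0]` begins with 1: every review starts with the start marker.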

In [0]:
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)

In [9]:
x_train.shape, x_test.shape

((25000, 100), (25000, 100))
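
By default, `pad_sequences` pads and truncates at the front of each sequence (`padding='pre'`, `truncating='pre'`, `value=0`), so a long review keeps its last `maxlen` words and a short one is left-padded with zeros. The following is a minimal pure-Python sketch of that default behaviour for a single sequence. The helper name `pad_pre` is made up for illustration and is not part of Keras.

```python
def pad_pre(seq, maxlen, value=0):
    """Mimic the pad_sequences defaults for one sequence (assumes maxlen >= 1)."""
    kept = seq[-maxlen:]  # truncating='pre': drop items from the front
    return [value] * (maxlen - len(kept)) + kept  # padding='pre': fill the front


print(pad_pre([5, 6, 7], maxlen=5))              # [0, 0, 5, 6, 7]
print(pad_pre([1, 2, 3, 4, 5, 6, 7], maxlen=5))  # [3, 4, 5, 6, 7]
```

With `maxlen = 100`, a 218-word review such as element 0 therefore contributes only its final 100 words.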

Instead of representing each word as a one-hot vector (a vector of length $n$ with a single 1 at the $i$th index and 0's everywhere else), the data is provided as the index $i$ of each word. Each data sample is therefore a sequence of integers, each giving the index of a word in our vocabulary. This saves storage compared with the one-hot representation discussed in the lecture. We will use the [Embedding layer](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Embedding) to map these indices to dense vectors for our neural network.
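
To see why integer indices lose nothing relative to one-hot vectors, note that multiplying a one-hot vector by a weight matrix simply selects one row of that matrix. An embedding layer can be thought of as performing that row lookup directly. The sketch below checks the equivalence in plain Python; the 4-word, 3-dimensional `embedding_table` and the helper names are hypothetical values chosen for illustration.

```python
# Hypothetical embedding table: vocabulary of 4 words, 3 dimensions each.
embedding_table = [
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
    [0.7, 0.8, 0.9],
    [1.0, 1.1, 1.2],
]


def one_hot(index, size):
    """Vector of length `size` with a single 1.0 at position `index`."""
    return [1.0 if j == index else 0.0 for j in range(size)]


def one_hot_times_table(index, table):
    """Vector-matrix product of the one-hot vector with the table."""
    vec = one_hot(index, len(table))
    dims = len(table[0])
    return [sum(vec[r] * table[r][d] for r in range(len(table))) for d in range(dims)]


def lookup(index, table):
    """What an embedding lookup does: pick the row for this word index."""
    return table[index]


print(one_hot_times_table(2, embedding_table) == lookup(2, embedding_table))  # True
```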

In [10]:
print("All values of the targets are integers with the following max and min values")
print(f"{y_train.max()}, {y_train.min()}")

All values of the targets are integers with the following max and min values
1, 0


We will build three networks, using basic RNNs, GRUs and LSTMs. We will then compare their performance in predicting the classes of reviews appropriately.
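
As a rough guide to how the three layer types differ in size, the sketch below counts trainable parameters using the standard formulations: a simple RNN has one input-plus-recurrent transform, an LSTM has four, and a GRU has three. The GRU formula assumes `reset_after=True` (the TensorFlow 2 default), which adds a second bias vector per gate. The function names are illustrative, and the sizes match the 128-dimensional embedding and 128 units used later in this notebook.

```python
def simple_rnn_params(input_dim, units):
    """Input weights + recurrent weights + bias for one transform."""
    return units * (input_dim + units + 1)


def lstm_params(input_dim, units):
    """Four transforms: input, forget and output gates plus the cell candidate."""
    return 4 * units * (input_dim + units + 1)


def gru_params(input_dim, units, reset_after=True):
    """Three transforms; reset_after=True keeps separate input and recurrent biases."""
    biases = 2 * units if reset_after else units
    return 3 * (units * (input_dim + units) + biases)


print(simple_rnn_params(128, 128))  # 32896
print(gru_params(128, 128))         # 99072
print(lstm_params(128, 128))        # 131584
```

The larger layers do more work per time step, which is consistent with the GRU and LSTM taking longer to train than the simple RNN.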

In [0]:
# Define simple_layers which will go into a Sequential model saved as my_simple
# Here we are creating a simple RNN using one SimpleRNN layer with a dropout and recurrent_dropout
# You will need to use an Embedding layer before that to convert the data appropriately
# Determine an embedding size and use that for your SimpleRNN layer's output dimensions as well
# Finally, create an output layer suited to our binary classification task

# YOUR CODE HERE
simple_layers = [
    Embedding(input_dim=n, output_dim=128),
    SimpleRNN(128, dropout=0.1, recurrent_dropout=0.1),
    Dense(1, activation='sigmoid'),
]
my_simple = Sequential(simple_layers)

In [0]:
assert len(simple_layers) == 3
assert isinstance(simple_layers[0], Embedding)
assert isinstance(simple_layers[1], SimpleRNN)
assert isinstance(simple_layers[2], Dense)
assert simple_layers[0].output_dim == simple_layers[1].units
assert simple_layers[1].dropout > 0
assert simple_layers[1].recurrent_dropout > 0
assert my_simple

In [16]:
%%time
my_simple.compile(loss='binary_crossentropy',optimizer='adam',metrics=['accuracy'])
my_simple.fit(x_train, y_train, batch_size=32, epochs=1)

Train on 25000 samples
CPU times: user 1min 55s, sys: 5.59 s, total: 2min 1s
Wall time: 1min 9s
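
The `Dense(1, activation='sigmoid')` output and the `binary_crossentropy` loss go together: the sigmoid turns the layer's raw score into a probability of the positive class, and the loss penalises that probability according to the 0/1 target. Below is a minimal sketch of both for a single example. The clipping constant `eps` is an assumption made here to avoid `log(0)`, not a claim about the library's exact internals.

```python
import math


def sigmoid(z):
    """Squash a raw score into a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def binary_crossentropy(y_true, p, eps=1e-7):
    """Loss for one example with target y_true in {0, 1} and predicted probability p."""
    p = min(max(p, eps), 1.0 - eps)  # assumed clipping to keep log() finite
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))


print(sigmoid(0.0))                                              # 0.5
print(binary_crossentropy(1, 0.9) < binary_crossentropy(1, 0.1))  # True
```

A confident correct prediction gives a small loss, while a confident wrong one gives a large loss.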


In [0]:
# Define gru_layers which will go into a Sequential model saved as my_gru
# Here we are creating an RNN using GRUs, add 1 GRU layer with a dropout and recurrent_dropout
# You will need to use an Embedding layer before that to convert the data appropriately
# Determine an embedding size and use that for your GRU layer's output dimensions as well
# Finally, create an output layer suited to our binary classification task

# YOUR CODE HERE
gru_layers = [
    Embedding(input_dim=n, output_dim=128),
    GRU(128, dropout=0.1, recurrent_dropout=0.1),
    Dense(1, activation='sigmoid'),
]
my_gru = Sequential(gru_layers)

In [0]:
assert len(gru_layers) == 3
assert isinstance(gru_layers[0], Embedding)
assert isinstance(gru_layers[1], GRU)
assert isinstance(gru_layers[2], Dense)
assert gru_layers[0].output_dim == gru_layers[1].units
assert gru_layers[1].dropout > 0
assert gru_layers[1].recurrent_dropout > 0
assert my_gru

In [22]:
%%time
my_gru.compile(loss='binary_crossentropy',optimizer='adam',metrics=['accuracy'])
my_gru.fit(x_train, y_train, batch_size=32, epochs=1)

Train on 25000 samples
CPU times: user 4min 42s, sys: 15.4 s, total: 4min 57s
Wall time: 2min 41s


In [0]:
# Define lstm_layers which will go into a Sequential model saved as my_lstm
# Here we are creating an RNN using LSTMs, add 1 LSTM layer with a dropout and recurrent_dropout
# You will need to use an Embedding layer before that to convert the data appropriately
# Determine an embedding size and use that for your LSTM layer's output dimensions as well
# Finally, create an output layer suited to our binary classification task

# YOUR CODE HERE
lstm_layers = [
    Embedding(input_dim=n, output_dim=128),
    LSTM(128, dropout=0.1, recurrent_dropout=0.1),
    Dense(1, activation='sigmoid'),
]
my_lstm = Sequential(lstm_layers)

In [0]:
assert len(lstm_layers) == 3
assert isinstance(lstm_layers[0], Embedding)
assert isinstance(lstm_layers[1], LSTM)
assert isinstance(lstm_layers[2], Dense)
assert lstm_layers[0].output_dim == lstm_layers[1].units
assert lstm_layers[1].dropout > 0
assert lstm_layers[1].recurrent_dropout > 0
assert my_lstm

In [25]:
%%time
my_lstm.compile(loss='binary_crossentropy',optimizer='adam',metrics=['accuracy'])
my_lstm.fit(x_train, y_train, batch_size=32, epochs=1)

Train on 25000 samples
CPU times: user 5min 5s, sys: 19.1 s, total: 5min 24s
Wall time: 3min


In [27]:
# Evaluate your models on the test set and save the loss and accuracies to the appropriate variables:
# model_name_loss, model_name_acc

# YOUR CODE HERE
my_simple_loss, my_simple_acc = my_simple.evaluate(x_test, y_test)
my_gru_loss, my_gru_acc = my_gru.evaluate(x_test, y_test)
my_lstm_loss, my_lstm_acc = my_lstm.evaluate(x_test, y_test)




In [28]:
print(f"Your simple model achieved an accuracy of {my_simple_acc:.2}.")
print(f"Your GRU model achieved an accuracy of {my_gru_acc:.2}.")
print(f"Your LSTM model achieved an accuracy of {my_lstm_acc:.2}.")

Your simple model achieved an accuracy of 0.6.
Your GRU model achieved an accuracy of 0.7.
Your LSTM model achieved an accuracy of 0.85.
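
The accuracies above print with differing numbers of decimals because the format spec `:.2` (with no type letter) means two significant digits, not two decimal places. The short sketch below contrasts it with `:.2f` and a percentage format. The helper `describe` and the sample values are made up for illustration.

```python
def describe(acc):
    """Format one accuracy three ways: 2 significant digits, 2 decimals, percent."""
    return f"{acc:.2} | {acc:.2f} | {acc:.1%}"


print(describe(0.6))     # 0.6 | 0.60 | 60.0%
print(describe(0.8512))  # 0.85 | 0.85 | 85.1%
```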


Note that each model here has only 1 recurrent layer and is trained for only 1 epoch. Stacking multiple layers or training for more epochs would likely give better results, but the models would take much longer to train.

In [0]:
assert my_simple_acc > 0.4
assert my_gru_acc > 0.6
assert my_lstm_acc > 0.7

## Feedback

In [0]:
def feedback():
    """Provide feedback on the contents of this exercise
    
    Returns:
        string
    """
    # YOUR CODE HERE
    return "Some important guidance on what we should be thinking about when creating these models was missing; however, once we got started, things moved more smoothly."