### In this chapter, we will use plain RNNs and variants of RNNs on a sentiment classification task: processing the input sequence and predicting whether the sentiment is positive or negative.

We'll use the IMDb reviews dataset for this task. The dataset contains 50,000 movie reviews, along with their sentiment – 25,000 highly polar movie reviews for training and 25,000 for testing.

#### Loading Data

In [1]:
from tensorflow.keras.datasets import imdb

With the module imported, importing the dataset (tokenized and separated into train and test sets) is as easy as running imdb.load_data. The only parameter we need to provide is the vocabulary size we wish to use.

Here, we will specify a vocabulary size of 8,000 for our models.

In [2]:
vocab_size = 8000
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=vocab_size)

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb.npz


Let's inspect the x_train variable to see what we are working with.

In [3]:
print(type(x_train))
print(type(x_train[5]))
print(x_train[5])

<class 'numpy.ndarray'>
<class 'list'>
[1, 778, 128, 74, 12, 630, 163, 15, 4, 1766, 7982, 1051, 2, 32, 85, 156, 45, 40, 148, 139, 121, 664, 665, 10, 10, 1361, 173, 4, 749, 2, 16, 3804, 8, 4, 226, 65, 12, 43, 127, 24, 2, 10, 10]


The next step is to define an upper limit on the length of the sequences that we'll work with and limit all sequences to the defined maximum length.

In [4]:
maxlen = 200

The next step is to get all our sequences to the same length using the pad_sequences utility from Keras.

#### Staging and Preprocessing Our Data

The pad_sequences utility from the sequence module in Keras pads or truncates every sequence to a specified length.

In [5]:
from tensorflow.keras import preprocessing

In [6]:
x_train = preprocessing.sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = preprocessing.sequence.pad_sequences(x_test, maxlen=maxlen)

In [7]:
print(x_train[5])

[   0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    1  778  128   74   12  630  163   15    4 1766 7982
 1051    2   32   85  156   45   40  148  139  121  664  665   10   10
 1361  173    4  749    2   16 3804    8    4  226   65   12   43  127
   24    2   10   10]

We can see that there are plenty of 0s at the beginning of the result. This is the padding added by the pad_sequences utility, because the input sequence (43 terms) was shorter than the maximum length of 200.
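To make the padding behavior concrete, here is a minimal pure-Python sketch of what pad_sequences does with its default settings (padding and truncation both applied at the front, pad value 0). The helper name `pad_pre` is hypothetical and exists only for illustration; it is not part of Keras.

```python
def pad_pre(sequences, maxlen, value=0):
    """Left-pad (or left-truncate) each sequence to exactly `maxlen` items.

    Hypothetical helper mirroring the default pad_sequences behavior
    (padding='pre', truncating='pre'); not a Keras function.
    """
    padded = []
    for seq in sequences:
        seq = list(seq)
        if len(seq) > maxlen:
            seq = seq[len(seq) - maxlen:]       # keep only the last `maxlen` items
        pad = [value] * (maxlen - len(seq))     # pad values go in front
        padded.append(pad + seq)
    return padded


short_review = [1, 778, 128]
long_review = [1, 14, 22, 16, 43, 530, 973]

result = pad_pre([short_review, long_review], maxlen=5)
print(result)  # [[0, 0, 1, 778, 128], [22, 16, 43, 530, 973]]
```

Note that a review longer than the limit loses its beginning, not its end, under these defaults.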

#### The Embedding Layer
In a model that consumes sequences of term indices, the embedding layer comes first: it maps each index to a dense word vector. You can follow it up with any architecture of your choice (RNNs, in our case).
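Conceptually, an embedding layer is a lookup into a trainable matrix with one row per vocabulary term. The NumPy sketch below illustrates only the lookup step; the matrix is filled with random numbers as a stand-in for learned weights, and the toy batch is made up for illustration.

```python
import numpy as np

vocab_size = 8000   # same vocabulary size as used above
output_dim = 32     # size of each word vector

# Stand-in for the trainable embedding matrix; random values here,
# whereas Keras would learn them during training.
rng = np.random.default_rng(0)
embedding_matrix = rng.normal(size=(vocab_size, output_dim))

# A toy batch: 2 sequences of 4 term indices each (0 is padding).
batch = np.array([[0, 0, 32, 318],
                  [1, 778, 128, 74]])

# The lookup is plain integer indexing: one row per term index.
vectors = embedding_matrix[batch]
print(vectors.shape)  # (2, 4, 32)
```

The output shape (batch, timesteps, output_dim) is what the recurrent layer receives as input.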

#### Building the Plain RNN Model
#### Exercise 6.01: Building and Training an RNN Model for Sentiment Classification
In this exercise, we will build and train an RNN model for sentiment classification. We will start with the embedding layer and some spatial dropout, and complete the model definition by adding the RNN layer, dropout, and a dense prediction layer. Then, we'll check the accuracy of the predictions on the test data to assess how well the model generalizes.

In [8]:
# set seeds for numpy and tensorflow for reproducible results
import numpy as np
import tensorflow as tf
np.random.seed(42)
tf.random.set_seed(42)

Import all the necessary packages and layers and initialize a sequential model named model_rnn using the following commands:

In [9]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Flatten, Dense, Embedding, SpatialDropout1D, Dropout
model_rnn = Sequential()


In [10]:
# embedding layer
model_rnn.add(Embedding(vocab_size, output_dim=32))
model_rnn.add(SpatialDropout1D(0.4)) # minimize overfitting

In [11]:
# SimpleRNN layer
model_rnn.add(SimpleRNN(32))
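To make the recurrent layer less of a black box, here is a rough NumPy sketch of the recurrence a plain RNN layer applies at each timestep, assuming the default tanh activation and a zero initial state. The helper `simple_rnn_last_state` and the random weights are hypothetical stand-ins for illustration only, not the Keras implementation.

```python
import numpy as np


def simple_rnn_last_state(x, W, U, b):
    """Run h_t = tanh(x_t @ W + h_{t-1} @ U + b) and return the final h.

    x: (batch, timesteps, input_dim); W: (input_dim, units);
    U: (units, units); b: (units,). Hypothetical helper, not a Keras API.
    """
    batch, timesteps, _ = x.shape
    h = np.zeros((batch, U.shape[0]))   # initial hidden state is all zeros
    for t in range(timesteps):
        h = np.tanh(x[:, t, :] @ W + h @ U + b)
    return h


rng = np.random.default_rng(0)
input_dim, units = 32, 32
x = rng.normal(size=(4, 10, input_dim))             # 4 sequences, 10 steps each
W = rng.normal(scale=0.1, size=(input_dim, units))  # input weights
U = rng.normal(scale=0.1, size=(units, units))      # recurrent weights
b = np.zeros(units)                                 # bias

h_last = simple_rnn_last_state(x, W, U, b)
print(h_last.shape)  # (4, 32)
```

Only the final hidden state is returned, which matches the (None, 32) output shape of a recurrent layer that does not return full sequences.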

In [12]:
# Dropout layer
model_rnn.add(Dropout(0.4))

In [13]:
# Prediction layer (Dense)
model_rnn.add(Dense(1, activation='sigmoid')) # sigmoid because we have binary classification.

In [16]:
# compile the model and view summary
model_rnn.compile(
    loss='binary_crossentropy',
    optimizer='rmsprop',
    metrics=['accuracy']
)

model_rnn.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding (Embedding)       (None, None, 32)          256000    
                                                                 
 spatial_dropout1d (SpatialD  (None, None, 32)         0         
 ropout1D)                                                       
                                                                 
 simple_rnn (SimpleRNN)      (None, 32)                2080      
                                                                 
 dropout (Dropout)           (None, 32)                0         
                                                                 
 dense (Dense)               (None, 1)                 33        
                                                                 
Total params: 258,113
Trainable params: 258,113
Non-trainable params: 0
__________________________________________________

We can see that there are 258,113 parameters, most of which are present in the embedding layer. The reason for this is that the word embeddings are being learned during the training – so we're learning the embedding matrix, which is of dimensionality vocab_size(8000) × output_dim(32).
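The parameter counts in the summary can be reproduced with a little arithmetic. The sketch below assumes the usual layout of a plain RNN layer (input weights, recurrent weights, and one bias vector) and a dense layer with a bias term.

```python
vocab_size, output_dim, units = 8000, 32, 32

# Embedding: one output_dim-sized vector per vocabulary term
embedding_params = vocab_size * output_dim

# SimpleRNN: input weights + recurrent weights + bias
rnn_params = output_dim * units + units * units + units

# Dense(1): one weight per incoming unit + one bias
dense_params = units * 1 + 1

total_params = embedding_params + rnn_params + dense_params
print(embedding_params, rnn_params, dense_params, total_params)
# 256000 2080 33 258113
```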

In [17]:
# fit model on train data
history_rnn = model_rnn.fit(
    x_train, y_train,
    batch_size=128,
    validation_split=0.2, # gives us a sense of the model performance on unseen data.
    epochs=10
)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


From the training output, we can see that the validation accuracy goes up to about 87%.

In [22]:
# prediction on test data
from sklearn.metrics import accuracy_score

# threshold the sigmoid probabilities at 0.5 to get binary class labels
y_test_pred = (model_rnn.predict(x_test) > 0.5).astype('int32')
print(accuracy_score(y_test, y_test_pred))

0.83172


Note: the Sequential.predict_classes() method is deprecated. Use one of the following replacements instead, depending on the last layer of the model.

Multi-class classification - Softmax last layer
- np.argmax(model.predict(x), axis=-1)

Binary classification - Sigmoid last layer
- (model.predict(x) > 0.5).astype("int32")
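The two recipes can be tried on small hand-made arrays without a trained model. The probabilities below are invented purely for illustration; the point is the shape of the output and what each operation returns.

```python
import numpy as np

# Made-up sigmoid outputs for 4 reviews, shape (4, 1)
sigmoid_probs = np.array([[0.91], [0.12], [0.50], [0.73]])
binary_labels = (sigmoid_probs > 0.5).astype("int32")
print(binary_labels.ravel())   # [1 0 0 1]

# Made-up softmax outputs for 3 samples over 3 classes
softmax_probs = np.array([[0.1, 0.7, 0.2],
                          [0.8, 0.1, 0.1],
                          [0.2, 0.3, 0.5]])
multi_labels = np.argmax(softmax_probs, axis=-1)
print(multi_labels)            # [1 0 2]

# argmax over a single-column sigmoid output is always 0,
# so it is not a substitute for thresholding
print(np.argmax(sigmoid_probs, axis=1))  # [0 0 0 0]
```

A probability of exactly 0.5 falls on the negative side with the strict `>` comparison.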

We can see that the model does a decent job. We used a simple architecture with 32 neurons and used a vocabulary size of just 8000. Tweaking these and other hyperparameters may get you better results and you are encouraged to do so.

In this exercise, we have seen how to build an RNN-based model for text. We saw how an embedding layer can be used to derive word vectors for the task at hand. These word vectors are the representations for each incoming term, which are passed to the RNN layer. We have seen that even a simple architecture can give us good results. Now, let's discuss how this model can be used to make predictions on new, unseen reviews.

#### Making Predictions on Unseen Data
Our model (model_rnn) was trained on IMDb reviews that were tokenized, had their case lowered, had punctuation removed, had a defined vocabulary size, and were converted into a sequence of indices. Our function/pipeline for preparing data for the RNN model needs to perform the same steps.

In [23]:
# variable containing raw review text
inp_review = 'An excellent movie!'

The sentiment in the text is positive. If the model is working well enough, it should predict the sentiment as positive.

In [24]:
# text_to_word_sequence
from tensorflow.keras.preprocessing.text import text_to_word_sequence

The text_to_word_sequence utility tokenizes the text into its constituent terms, normalizes its case, and removes punctuation.

Check if it works as expected.

In [25]:
text_to_word_sequence(inp_review)

['an', 'excellent', 'movie']
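For intuition, the same three steps (lowercasing, punctuation removal, splitting on whitespace) can be approximated in a few lines of plain Python. The function `simple_word_sequence` is a hypothetical stand-in, and the filter string is assumed from the documented Keras default; treat this as a sketch, not the library's implementation.

```python
# Default filter characters of text_to_word_sequence, as assumed from the
# Keras documentation; note that the apostrophe is not among them.
FILTERS = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n'


def simple_word_sequence(text, filters=FILTERS):
    """Hypothetical stand-in: lowercase, turn filter chars into spaces, split."""
    text = text.lower()
    table = str.maketrans({ch: " " for ch in filters})
    return text.translate(table).split()


tokens = simple_word_sequence("An excellent movie!")
print(tokens)  # ['an', 'excellent', 'movie']

# Contractions stay in one piece because the apostrophe is not filtered
tokens_2 = simple_word_sequence("Don't watch this movie - poor acting.")
print(tokens_2)  # ["don't", 'watch', 'this', 'movie', 'poor', 'acting']
```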

In [28]:
# load vocabulary into dictionary named word_map
word_map = imdb.get_word_index()

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb_word_index.json


In [29]:
# limit the mapping to 8000 terms by sorting word_map variable on index and picking the first 8000 terms to match what is used on the train data.
vocab_map = dict(sorted(word_map.items(), key=lambda x: x[1])[:vocab_size])

The vocab map will be a dictionary containing the term-to-index mapping for the 8000 terms in the vocabulary. Using this mapping, we'll convert the tokenized sentence into a sequence of term indices by performing a lookup for each term and returning the corresponding index.

In [30]:
# preprocess function that accepts raw text, applies the text_to_word_sequence utility to it, performs a lookup from vocab_map, and returns the corresponding sequence of integers
def preprocess(review):
    inp_tokens = text_to_word_sequence(review)
    seq = []
    for token in inp_tokens:
        seq.append(vocab_map.get(token))
        
    return seq

In [31]:
preprocess(inp_review)

[32, 318, 17]

This is the sequence of term indices corresponding to the raw text. Note that the data is now in the same format as the IMDb data we loaded. This sequence of indices can be fed to the RNN model (using the predict method and a 0.5 threshold) to classify the sentiment, as shown in the following code. If the model is working well enough, it should predict the sentiment as positive.
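One caveat is worth checking when reusing this pipeline. By default, imdb.load_data reserves index 1 for a start marker and index 2 for out-of-vocabulary terms, and shifts every word index by 3 (its start_char, oov_char, and index_from arguments), while get_word_index returns the unshifted ranks. The sketch below shows an encoder that follows the shifted layout. The dictionary is a tiny toy stand-in for the real word index and `encode` is a hypothetical helper, so treat this as an illustration of the idea and not as the exercise's method.

```python
# Toy stand-in for imdb.get_word_index(); the real dictionary is much larger.
toy_word_index = {"the": 1, "and": 2, "a": 3,
                  "movie": 17, "an": 32, "excellent": 318}

INDEX_FROM = 3   # default offset applied by imdb.load_data
START_CHAR = 1   # default marker placed at the start of each review
OOV_CHAR = 2     # default index for out-of-vocabulary terms


def encode(tokens, word_index, vocab_size):
    """Hypothetical encoder that mimics the load_data index layout."""
    seq = [START_CHAR]
    for token in tokens:
        rank = word_index.get(token)
        idx = rank + INDEX_FROM if rank is not None else OOV_CHAR
        # indices at or beyond vocab_size are treated as out-of-vocabulary
        seq.append(idx if idx < vocab_size else OOV_CHAR)
    return seq


encoded = encode(["an", "excellent", "movie", "zzzz"],
                 toy_word_index, vocab_size=8000)
print(encoded)  # [1, 35, 321, 20, 2]
```

Mapping unknown tokens to a fixed index also avoids placing None values in the sequence when a term is missing from the vocabulary.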

In [32]:
(model_rnn.predict([preprocess(inp_review)]) > 0.5).astype('int32')

array([[1]], dtype=int32)

The output prediction is 1 (positive), just as we expected.

Let's apply the function to another raw text review and supply it to the model for prediction. Let's update the inp_review variable so that it contains the text "Don't watch this movie - poor acting, poor script, bad direction." The sentiment in the review is negative, so we expect the model to classify it as such.

In [33]:
inp_review = "Don't watch this movie - poor acting, poor script, bad direction."

In [34]:
(model_rnn.predict([preprocess(inp_review)]) > 0.5).astype('int32')

array([[0]], dtype=int32)

The predicted sentiment is negative, just as we expected.

#### LSTMs, GRUs, and Other Variants

#### Exercise 6.02: LSTM-Based Sentiment Classification Model
In this exercise, we will build a simple LSTM-based model to predict sentiment on our data. We will continue with the same setup we used previously (that is, the number of cells, embedding dimensions, dropout, and so on).

In [35]:
# import LSTM layer
from tensorflow.keras.layers import LSTM

Instantiate the sequential model, add the embedding layer with the appropriate dimensions, and add a 40% spatial dropout.

In [36]:
model_lstm = Sequential()
model_lstm.add(Embedding(vocab_size, output_dim=32))
model_lstm.add(SpatialDropout1D(0.4))

In [37]:
# LSTM layer
model_lstm.add(LSTM(32))

In [38]:
# dropout and dense layers
model_lstm.add(Dropout(0.4))
model_lstm.add(Dense(1, activation='sigmoid'))
model_lstm.compile(
    loss='binary_crossentropy',
    optimizer='rmsprop',
    metrics=['accuracy']
)
model_lstm.summary()

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_1 (Embedding)     (None, None, 32)          256000    
                                                                 
 spatial_dropout1d_1 (Spatia  (None, None, 32)         0         
 lDropout1D)                                                     
                                                                 
 lstm (LSTM)                 (None, 32)                8320      
                                                                 
 dropout_1 (Dropout)         (None, 32)                0         
                                                                 
 dense_1 (Dense)             (None, 1)                 33        
                                                                 
Total params: 264,353
Trainable params: 264,353
Non-trainable params: 0
________________________________________________

We can see from the model summary that the number of parameters in the LSTM layer is 8320. A quick check can confirm that this is exactly four times the number of parameters in the plain RNN layer we saw in Exercise 6.01, Building and Training an RNN Model for Sentiment Classification, which is in line with our expectations.
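The quick check mentioned above can be written out as follows. The sketch assumes the standard LSTM layout of four weight blocks (input, forget, and output gates plus the candidate cell state), each the size of one plain RNN layer.

```python
input_dim, units = 32, 32

# One plain-RNN-sized block: input weights + recurrent weights + bias
per_block = input_dim * units + units * units + units

simple_rnn_params = per_block
lstm_params = 4 * per_block   # three gates + candidate cell state
print(simple_rnn_params, lstm_params)  # 2080 8320
```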

In [39]:
# fit on training data for 5 epochs and batch size of 128
history_lstm = model_lstm.fit(
    x_train,
    y_train,
    batch_size=128,
    validation_split=0.2,
    epochs=5
)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [40]:
# test data performance
y_test_pred_lstm = (model_lstm.predict(x_test) > 0.5).astype("int32")
print(accuracy_score(y_test, y_test_pred_lstm))

0.87288


The accuracy we got (87.3%) is a significant improvement over the accuracy we got using plain RNNs (83.2%). It looks like the extra parameters and the extra predictive power from the cell state came in handy for our task.

#### Exercise 6.03: GRU-Based Sentiment Classification Model
In this exercise, we will build a simple GRU-based model to predict sentiments in our data. We will continue with the same setup that we used previously (that is, the number of cells, embedding dimensions, dropout, and so on). Using GRUs instead of LSTMs in the model is as simple as replacing "LSTM" with "GRU" when adding the layer.

In [41]:
# import GRU layer
from tensorflow.keras.layers import GRU

In [47]:
model_gru = Sequential()
model_gru.add(Embedding(vocab_size, output_dim=32))
model_gru.add(SpatialDropout1D(0.4))

In [48]:
# GRU layer
model_gru.add(GRU(32, reset_after=False))

In [49]:
# dropout and dense layers
model_gru.add(Dropout(0.4))
model_gru.add(Dense(1, activation='sigmoid'))
model_gru.compile(
    loss='binary_crossentropy',
    optimizer='rmsprop',
    metrics=['accuracy']
)
model_gru.summary()

Model: "sequential_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_3 (Embedding)     (None, None, 32)          256000    
                                                                 
 spatial_dropout1d_3 (Spatia  (None, None, 32)         0         
 lDropout1D)                                                     
                                                                 
 gru_1 (GRU)                 (None, 32)                6240      
                                                                 
 dropout_3 (Dropout)         (None, 32)                0         
                                                                 
 dense_3 (Dense)             (None, 1)                 33        
                                                                 
Total params: 262,273
Trainable params: 262,273
Non-trainable params: 0
________________________________________________

We can see from the model summary that the number of parameters in the GRU layer is 6240. A quick check can confirm that this is exactly three times the number of parameters in the plain RNN layer (2080) we saw in Exercise 6.01, Building and Training an RNN Model for Sentiment Classification, which is in line with our expectations for a GRU created with reset_after=False.
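As a rough check on the GRU summary above (6,240 parameters in the gru_1 layer), the count can be reproduced by hand. The sketch assumes the standard GRU layout of three weight blocks (update gate, reset gate, candidate state). It also shows the count you would get with reset_after=True, where the layer keeps separate input and recurrent bias vectors.

```python
input_dim, units = 32, 32

# One plain-RNN-sized block: input weights + recurrent weights + bias
per_block = input_dim * units + units * units + units   # 2080

# reset_after=False, as used in this exercise: three such blocks
gru_params = 3 * per_block

# reset_after=True: two bias vectors per block instead of one
gru_params_reset_after = 3 * (input_dim * units + units * units + 2 * units)

print(gru_params, gru_params_reset_after)  # 6240 6336
```

This is why the exercise passes reset_after=False: it keeps the parameter count at a clean three times that of the plain RNN layer.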

In [50]:
# fit on training data for 5 epochs and batch size of 128
history_gru = model_gru.fit(
    x_train,
    y_train,
    batch_size=128,
    validation_split=0.2,
    epochs=5
)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [51]:
# predictions on test data
y_test_pred_gru = (model_gru.predict(x_test) > 0.5).astype("int32")
print(accuracy_score(y_test, y_test_pred_gru))

0.87068


We can see that our accuracy (87.1%) is close to the 87.3% we got from the LSTM model. GRUs are simplifications of LSTMs that aim to provide similar accuracy with fewer parameters.