# Section 3-3 - Recurrent Neural Network

*WARNING: Cells 7 and 17 may require considerable training time.*

We now consider text data, in the form of Rotten Tomatoes movie reviews. Each review is a sentence of up to 48 words, labelled with a sentiment ranging from 0 (very bad) to 4 (very good). As with our approach to CNNs, we want to go further than simply treating the data as a 'flat' vector.

In [1]:
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from time import time

np.random.seed(1337)

df = pd.read_csv('data/rottentomatoes.csv')

In [2]:
df.head()

| Unnamed: 0 | PhraseId | SentenceId | Phrase | Sentiment |
|---|---|---|---|---|
| 0 | 1 | 1 | A series of escapades demonstrating the adage ... | 1 |
| 1 | 2 | 1 | A series of escapades demonstrating the adage ... | 2 |
| 2 | 3 | 1 | A series | 2 |
| 3 | 4 | 1 | A | 2 |
| 4 | 5 | 1 | series | 2 |


In [3]:
df['Phrase'].values[0]

'A series of escapades demonstrating the adage that what is good for the goose is also good for the gander , some of which occasionally amuses but none of which amounts to much of a story .'

In [4]:
count = CountVectorizer(analyzer='word')

df_train = df.iloc[:124800, :]

X_train = count.fit_transform(df_train['Phrase'])
y_train = df_train['Sentiment'].values
y_train_onehot = pd.get_dummies(df_train['Sentiment']).values

In [5]:
df_test = df.iloc[124800:, :]

X_test = count.transform(df_test['Phrase'])
y_test = df_test['Sentiment'].values

In [6]:
for i in range(10):
    print(i+250, count.get_feature_names_out()[i+250])

250 ad
251 adage
252 adam
253 adamant
254 adams
255 adaptation
256 adaptations
257 adapted
258 adapts
259 add


## Benchmark

To calculate our benchmark accuracy score, we take a 'bag-of-words' approach: each column of the feature matrix holds the count of one vocabulary word, and we train a Random Forest on this word-count matrix.
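
To see what a bag-of-words representation keeps and what it discards, here is a minimal sketch using only the standard library. It is not the notebook's `CountVectorizer`; the helper `bag_of_words` and the two example sentences are made up for illustration, and tokenization is a plain whitespace split.

```python
from collections import Counter

def bag_of_words(sentence):
    """Count lowercase whitespace-separated tokens, ignoring their order."""
    return Counter(sentence.lower().split())

a = "the plot is good , not bad"
b = "the plot is bad , not good"

# Both sentences contain exactly the same words, so their bags are identical
# even though the sentiments are opposite.
print(bag_of_words(a) == bag_of_words(b))  # True
```

Any model trained purely on word counts receives the same input for both sentences, which is the limitation the sequence model later in this section tries to address.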

In [7]:
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(random_state=0, verbose=3)
model = model.fit(X_train, y_train)

y_prediction = model.predict(X_test)
print("accuracy", np.sum(y_prediction == y_test) / float(len(y_test)))

[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


building tree 1 of 100


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:   15.5s remaining:    0.0s


building tree 2 of 100


[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:   29.5s remaining:    0.0s


building tree 3 of 100
building tree 4 of 100
building tree 5 of 100
building tree 6 of 100
building tree 7 of 100
building tree 8 of 100
building tree 9 of 100
building tree 10 of 100
building tree 11 of 100
building tree 12 of 100
building tree 13 of 100
building tree 14 of 100
building tree 15 of 100
building tree 16 of 100
building tree 17 of 100
building tree 18 of 100
building tree 19 of 100
building tree 20 of 100
building tree 21 of 100
building tree 22 of 100
building tree 23 of 100
building tree 24 of 100
building tree 25 of 100
building tree 26 of 100
building tree 27 of 100
building tree 28 of 100
building tree 29 of 100
building tree 30 of 100
building tree 31 of 100
building tree 32 of 100
building tree 33 of 100
building tree 34 of 100
building tree 35 of 100
building tree 36 of 100
building tree 37 of 100
building tree 38 of 100
building tree 39 of 100
building tree 40 of 100
building tree 41 of 100
building tree 42 of 100
building tree 43 of 100
building tree 44 of 100

[Parallel(n_jobs=1)]: Done 100 out of 100 | elapsed: 25.2min finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.2s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:    0.4s remaining:    0.0s


accuracy 0.5303582853486885


[Parallel(n_jobs=1)]: Done 100 out of 100 | elapsed:   16.8s finished


## Pre-processing

As a pre-processing step, we convert each sentence into word tokens. Each token is then mapped to a (numerical) word index. The final step involves 'padding' each list of indices with zeros so that every row has the same length.
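
The three steps can be sketched on a toy vocabulary with the standard library alone. The helpers `build_index` and `pad_left` are hypothetical names, not part of the notebook; `pad_left` is meant to mimic the default behaviour of Keras `pad_sequences`, which pads and truncates at the front of the sequence.

```python
from collections import defaultdict

def build_index(vocabulary):
    """Map each word to a positive index; unknown words fall back to 0."""
    word_to_index = defaultdict(int)
    for i, word in enumerate(sorted(vocabulary)):
        word_to_index[word] = i + 1  # 0 stays reserved for padding / unknown words
    return word_to_index

def pad_left(indices, maxlen):
    """Left-pad with zeros; keep only the last `maxlen` entries if too long.

    Assumes maxlen is a positive integer.
    """
    trimmed = indices[-maxlen:]
    return [0] * (maxlen - len(trimmed)) + trimmed

toy_index = build_index({"a", "good", "story"})
tokens = ["a", "good", "story", "indeed"]  # "indeed" is not in the vocabulary
indices = [toy_index[t] for t in tokens]

print(indices)               # [1, 2, 3, 0]
print(pad_left(indices, 6))  # [0, 0, 1, 2, 3, 0]
print(pad_left(indices, 3))  # [2, 3, 0]
```

Note that with this scheme an out-of-vocabulary word and a padding position share the index 0, so the model cannot tell them apart.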

In [8]:
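from collections import defaultdict

# Words that are not in the vocabulary default to index 0,
# which is also the value used for padding later on.
word_to_index = defaultdict(int)

# Known words get indices starting at 1.
for i, item in enumerate(count.get_feature_names_out()):
    word_to_index[item] = i+1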

In [9]:
sequencer = count.build_analyzer()

In [10]:
def sentence_to_indices(sentence):
    return [word_to_index[word] for word in sequencer(sentence)]

In [11]:
X_train_seq = list(map(sentence_to_indices, df_train['Phrase']))
X_test_seq = list(map(sentence_to_indices, df_test['Phrase']))

In [12]:
from tensorflow.keras.preprocessing import sequence

X_train_pad = sequence.pad_sequences(X_train_seq, maxlen=48)
X_test_pad = sequence.pad_sequences(X_test_seq, maxlen=48)


In [13]:
df_train['Phrase'].values[0]

'A series of escapades demonstrating the adage that what is good for the goose is also good for the gander , some of which occasionally amuses but none of which amounts to much of a story .'

In [14]:
sequencer(df_train['Phrase'].values[0])

['series',
 'of',
 'escapades',
 'demonstrating',
 'the',
 'adage',
 'that',
 'what',
 'is',
 'good',
 'for',
 'the',
 'goose',
 'is',
 'also',
 'good',
 'for',
 'the',
 'gander',
 'some',
 'of',
 'which',
 'occasionally',
 'amuses',
 'but',
 'none',
 'of',
 'which',
 'amounts',
 'to',
 'much',
 'of',
 'story']

In [15]:
X_train_seq[0]

[10531,
 8224,
 4076,
 3100,
 12023,
 252,
 12021,
 13226,
 6445,
 5188,
 4750,
 12023,
 5204,
 6445,
 462,
 5188,
 4750,
 12023,
 4991,
 11053,
 8224,
 13242,
 8201,
 529,
 1682,
 8094,
 8224,
 13242,
 520,
 12182,
 7845,
 8224,
 11444]

In [16]:
X_train_pad[0]

array([    0,     0,     0,     0,     0,     0,     0,     0,     0,
           0,     0,     0,     0,     0,     0, 10531,  8224,  4076,
        3100, 12023,   252, 12021, 13226,  6445,  5188,  4750, 12023,
        5204,  6445,   462,  5188,  4750, 12023,  4991, 11053,  8224,
       13242,  8201,   529,  1682,  8094,  8224, 13242,   520, 12182,
        7845,  8224, 11444], dtype=int32)

## Long Short-Term Memory

To preserve the sequential nature of the sentence, we train a Recurrent Neural Network (RNN) by feeding it the word indices one at a time. However, plain RNNs find it hard to keep track of long-term dependencies, for example between "series" and "story" in the first sentence. As in other deep networks, the gradient contributions propagated back through many steps tend to shrink towards zero; this effect is referred to as the 'vanishing gradient' problem.
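
A rough numerical illustration of the vanishing gradient, assuming a deliberately simplified scalar recurrence `h_t = tanh(w * h_{t-1})` with no inputs. The function name, the weight `w=0.9` and the starting state are arbitrary choices for this sketch, not values from the notebook. By the chain rule, the gradient of the final state with respect to the initial state is a product of one factor per time step.

```python
import math

def gradient_through_time(w, h0, steps):
    """Return d h_T / d h_0 for the scalar recurrence h_t = tanh(w * h_{t-1})."""
    h, grad = h0, 1.0
    for _ in range(steps):
        h = math.tanh(w * h)
        # Chain rule: d tanh(w * h_prev) / d h_prev = w * (1 - tanh(w * h_prev)^2)
        grad *= w * (1.0 - h * h)
    return grad

# 32 steps is the distance between "series" and "story" in the tokenized first sentence.
for steps in (1, 8, 32):
    print(steps, gradient_through_time(w=0.9, h0=0.5, steps=steps))
```

Each factor here is smaller than 0.9 in magnitude, so after 32 steps the product is at most `0.9 ** 32`, which is already a small fraction of the one-step gradient.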

Long Short-Term Memory networks (LSTMs) were introduced to get around this problem with a gating mechanism. These gates control how much the current state is affected by previous states and by new input. How much each gate lets through is determined by parameters that are themselves trained. Chris Olah has an excellent blog post that explains how LSTMs work:

http://colah.github.io/posts/2015-08-Understanding-LSTMs/
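
As a complement to the blog post, the gating mechanism can be sketched as a single step of a scalar LSTM cell using the standard formulation (forget, input and output gates plus a candidate value). The function `lstm_step` and the parameter layout are assumptions made for this sketch; Keras uses weight matrices rather than scalars, and the gate parameters below are hand-picked rather than trained.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One step of a scalar LSTM cell; `p` maps gate name -> (w_x, w_h, bias)."""
    def pre(name):
        w_x, w_h, b = p[name]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("forget"))       # how much of the old cell state to keep
    i = sigmoid(pre("input"))        # how much of the new candidate to write
    o = sigmoid(pre("output"))       # how much of the cell state to expose
    g = math.tanh(pre("candidate"))  # candidate value computed from the input
    c = f * c_prev + i * g           # new cell state
    h = o * math.tanh(c)             # new hidden state
    return h, c

# Hand-picked (untrained) parameters: forget gate close to 1, input gate close to 0.
params = {
    "forget": (0.0, 0.0, 6.0),
    "input": (0.0, 0.0, -6.0),
    "output": (0.0, 0.0, 0.0),
    "candidate": (1.0, 0.0, 0.0),
}

h, c = 0.0, 1.0
for x in [0.3, -0.8, 0.5] * 11:  # 33 arbitrary inputs
    h, c = lstm_step(x, h, c, params)
print(c)
```

With the forget gate held near 1, most of the initial cell state survives all 33 steps because the state is carried forward additively. In a trained network the gate values depend on the input and the previous hidden state.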

In [17]:
# https://github.com/fchollet/keras/blob/master/examples/imdb_lstm.py

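from keras.models import Sequential
from keras.layers import Dense, Embedding, LSTM
from keras.losses import binary_crossentropy
from keras.optimizers import Adam

start = time()

model = Sequential()
# Map each word index (0 is padding / unknown) to a 128-dimensional vector.
model.add(Embedding(len(word_to_index)+1, 128))
# Read the embedded sequence one step at a time with 128 LSTM units.
model.add(LSTM(128, dropout=0.2, recurrent_dropout=0.2))
# One output per sentiment class (0 to 4).
model.add(Dense(5, activation='sigmoid'))

# Note: sigmoid outputs with binary cross-entropy treat the five classes as
# independent yes/no targets. Softmax with categorical cross-entropy is the more
# usual choice for single-label classification; the results below use this setup.
model.compile(loss=binary_crossentropy, optimizer=Adam(), metrics=['accuracy'])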

model.fit(X_train_pad, y_train_onehot, epochs=2)

print('\ntime taken %s seconds' % str(time() - start))

Epoch 1/2
Epoch 2/2

time taken 362.1953592300415 seconds


In [18]:
y_prediction = np.argmax(model.predict(X_test_pad), axis=-1)
print("\naccuracy", np.sum(y_prediction == y_test) / float(len(y_test)))


accuracy 0.6054702495201536
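
The prediction step above picks, for each phrase, the class with the highest model output and compares it with the true label. A small standard-library sketch of the same calculation follows; the scores and labels are made up for illustration, and the helpers `argmax` and `accuracy` are hypothetical stand-ins for the NumPy calls in the cell.

```python
def argmax(row):
    """Index of the largest value in a list."""
    return max(range(len(row)), key=row.__getitem__)

def accuracy(predicted, actual):
    """Fraction of positions where the prediction equals the true label."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

# Made-up model outputs for three phrases (one score per sentiment class 0 to 4).
scores = [
    [0.05, 0.10, 0.70, 0.10, 0.05],
    [0.60, 0.20, 0.10, 0.05, 0.05],
    [0.05, 0.05, 0.20, 0.30, 0.40],
]
labels = [2, 1, 4]  # made-up true sentiments

predicted = [argmax(row) for row in scores]
print(predicted)                    # [2, 0, 4]
print(accuracy(predicted, labels))  # 2 of 3 correct
```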


Intuitively, preserving the sentence structure should improve performance (for example, in distinguishing between "good" and "not good"). It is rewarding to see that this is indeed the case: the LSTM reaches an accuracy of about 0.61, compared with about 0.53 for the bag-of-words benchmark. LSTMs are one of the more complicated neural network architectures, but their highly impressive recent applications make them a very worthwhile topic of study.