# Recurrent Neural Networks

Recurrent neural networks (RNNs) are networks that can have connections going backward in the network, from a later layer to an earlier one, unlike the feedforward nets we have seen so far.

Here, we introduce the idea of timesteps: we input one datapoint to the net at each timestep. As a datapoint passes through an RNN, it proceeds to the output layer, but the information sent through the backward connections is fed back into the net while the next datapoint passes through the RNN.
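
A minimal sketch of this timestep loop, assuming a single recurrent layer with a tanh activation. The names `rnn_forward`, `W_x`, `W_h`, and `b`, and all of the sizes, are illustrative choices rather than anything defined in these notes. The hidden-to-hidden weights `W_h` play the role of the backward connections: they feed the previous timestep's activations back into the layer alongside the next datapoint.

```python
import numpy as np

def rnn_forward(inputs, W_x, W_h, b):
    """Run a simple recurrent layer over a sequence, one timestep at a time."""
    hidden = np.zeros(W_h.shape[0])  # nothing has been seen before the first timestep
    states = []
    for x in inputs:
        # the new state mixes the current datapoint with the previous state
        hidden = np.tanh(W_x @ x + W_h @ hidden + b)
        states.append(hidden)
    return np.array(states)

rng = np.random.default_rng(0)
W_x = rng.normal(size=(4, 3))  # input-to-hidden weights (hypothetical sizes)
W_h = rng.normal(size=(4, 4))  # hidden-to-hidden weights: the backward connections
b = np.zeros(4)

sequence = rng.normal(size=(5, 3))  # 5 timesteps, 3 features per datapoint
states = rnn_forward(sequence, W_x, W_h, b)
print(states.shape)  # (5, 4): one hidden state per timestep
```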

## Why Are Backward Connections Useful?

The way RNNs operate means the current datapoint will receive information derived from the previous datapoint, since the activations sent along the backward connections persist and feed information into the net as a new datapoint goes through it.

The prior information continues to influence how the net operates: at the third timestep it is still circulating through the loops in the network architecture, likewise at the fourth step, and so on. Therefore, a fundamental difference between RNNs and the nets we have seen in the past is this persistent "memory" of previous datapoints, which allows us to use the context of previous datapoints to make inferences about each datapoint.
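
To make the "memory" concrete, here is a small sketch using a scalar RNN, which is an assumption made purely for illustration (the function `run_scalar_rnn` and the weights are hypothetical). Two sequences differ only in their first datapoint. With a nonzero recurrent weight the states stay different at every later timestep; with the recurrent weight set to zero, the difference disappears as soon as the inputs agree.

```python
import math

def run_scalar_rnn(xs, w_x, w_h):
    """Return the state after each timestep of h_t = tanh(w_x * x_t + w_h * h_{t-1})."""
    h = 0.0
    states = []
    for x in xs:
        h = math.tanh(w_x * x + w_h * h)
        states.append(h)
    return states

# two sequences that differ only in their first datapoint
seq_a = [1.0, 0.2, -0.3, 0.5]
seq_b = [-1.0, 0.2, -0.3, 0.5]

with_memory = [abs(a - b) for a, b in zip(run_scalar_rnn(seq_a, 1.0, 0.5),
                                          run_scalar_rnn(seq_b, 1.0, 0.5))]
without_memory = [abs(a - b) for a, b in zip(run_scalar_rnn(seq_a, 1.0, 0.0),
                                             run_scalar_rnn(seq_b, 1.0, 0.0))]

print(with_memory)     # every entry is nonzero: the first datapoint still matters later
print(without_memory)  # only the first entry is nonzero
```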

### Examples

If we have a paragraph of text, we could feed it into the net word-by-word and try to classify the whole sequence. The recurrent structure gives us the power of using the *context* built by the other words in the paragraph. Any task involving language, such as recognizing spoken words, handwritten text, or sign language, can benefit from this.

In addition, in handwriting recognition where we have a whole word of text, knowing some letters in the word can certainly help you figure out an unknown letter in the center. For example, if we have a K, it is very unlikely the next letter will be a Q in English. For another example, here's a word I wrote:

![img](feature.png)

Obviously, being an intelligent human, you know the third letter is an 'a' and not a 'u' because "feuture" isn't a word and your professor has passable spelling skills. However, looking at that letter in isolation makes its identity totally ambiguous. Letter/digit recognition is the kind of task we have assigned to neural nets before, but RNNs have the distinctive ability to read letters one by one and use the past letters to influence later classifications.

What these examples have in common is that they involve making inferences about *sequences* of inputs (letters, words, signs), rather than just individual inputs.

Note that we are talking only of new datapoints being influenced by prior datapoints, but we will also learn about bidirectional RNNs, which look both forward and backward in time by distances the net will learn.

Of course, CNNs also use local structure within datapoints to make inferences about the whole datapoint, but we have to specify the filter sizes and the number of filters to use. With RNNs, the net will automatically learn to use as much of the past information as it needs for a given point.

## Training RNNs

The most common approach to training RNNs is essentially the same as for any other neural net we have seen: stochastic gradient descent and backpropagation. SGD will operate just as it does for the other nets. Recall that we used backpropagation to compute the exact gradients needed by SGD through systematic use of the chain rule, propagating backward from the loss function to the weights and biases in the network.

RNNs present a challenge to this idea--what does it mean to propagate backwards in a net that has loops?! There would be infinite paths the method could take since there are loops it could traverse arbitrarily many times.
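
One standard answer, usually called backpropagation through time, is to unroll the loop over a finite sequence: each timestep becomes one copy of the layer with shared weights, and the chain rule is applied backward through the copies. The sketch below is an illustration under simplifying assumptions (a scalar RNN, a tanh activation, and a squared-error loss on the final state only); the function names are hypothetical. It checks the result against a finite-difference estimate.

```python
import math

def forward(w_x, w_h, xs):
    """Unroll h_t = tanh(w_x * x_t + w_h * h_{t-1}) over the sequence; hs[0] is h_0 = 0."""
    hs = [0.0]
    for x in xs:
        hs.append(math.tanh(w_x * x + w_h * hs[-1]))
    return hs

def loss(w_x, w_h, xs, target):
    """Squared error between the final state and the target."""
    return 0.5 * (forward(w_x, w_h, xs)[-1] - target) ** 2

def bptt(w_x, w_h, xs, target):
    """Gradients of the loss with respect to the shared weights, via the unrolled net."""
    hs = forward(w_x, w_h, xs)
    grad_w_x = 0.0
    grad_w_h = 0.0
    d_h = hs[-1] - target                 # dL/dh_T
    for t in range(len(xs), 0, -1):       # walk backward through the unrolled copies
        d_pre = d_h * (1.0 - hs[t] ** 2)  # back through tanh at timestep t
        grad_w_x += d_pre * xs[t - 1]     # the same w_x is used at every timestep
        grad_w_h += d_pre * hs[t - 1]     # the same w_h is used at every timestep
        d_h = d_pre * w_h                 # pass the gradient on to h_{t-1}
    return grad_w_x, grad_w_h

xs, target = [0.5, -0.1, 0.8], 0.3
analytic = bptt(0.7, -0.4, xs, target)

# compare against a central finite-difference estimate of the same gradients
eps = 1e-6
numeric_w_x = (loss(0.7 + eps, -0.4, xs, target) - loss(0.7 - eps, -0.4, xs, target)) / (2 * eps)
numeric_w_h = (loss(0.7, -0.4 + eps, xs, target) - loss(0.7, -0.4 - eps, xs, target)) / (2 * eps)
print(analytic, (numeric_w_x, numeric_w_h))
```

Because the sequence has a finite length, the unrolled net has no loops, so there is exactly one backward pass rather than infinitely many paths.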

## Sentiment Analysis

One area of application of RNNs is in sentiment analysis--attempts to identify the feelings associated with written text. Examples:

* Is a review of a product or service or song or movie positive, neutral, or negative?
* Was someone happy or sad in describing their day?
* Are song lyrics sad or happy or excited?

### Movie Reviews

Let's see what we can do with movie reviews for an experiment in sentiment analysis. The data comes from the Internet Movie Database (IMDb), provided by

* Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts. (2011). [Learning Word Vectors for Sentiment Analysis](https://ai.stanford.edu/~amaas/papers/wvSent_acl2011.pdf). *The 49th Annual Meeting of the Association for Computational Linguistics (ACL 2011)*.

The dataset contains 50,000 reviews (half in a training set, half in a testing set). The two sets contain reviews of disjoint sets of movies, with at most 30 reviews per movie. Each review on IMDb includes a rating from 1 to 10, but the data includes only negative reviews ($\leq 4$ rating) and positive reviews ($\geq 7$ rating), and these two classes serve as the labels for the datapoints.

We will use the dataset to try to classify the reviews as positive or negative.

In [1]:
from tensorflow.keras import Input
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Embedding
from tensorflow.keras.layers import LSTM
from tensorflow.keras.layers import SimpleRNN
from tensorflow.keras.layers.experimental.preprocessing import TextVectorization
from tensorflow.keras.models import Sequential
from tensorflow.keras.preprocessing import text_dataset_from_directory
from tensorflow.strings import regex_replace

import os
import shutil
import zipfile

In [2]:
# unzip the imdb dataset
with zipfile.ZipFile('../datasets/imdb.zip', 'r') as zip_ref:
    zip_ref.extractall('../datasets/imdb/')

In [3]:
# clean the data by removing linebreaks
def prepareData(directory):
    # read the directory of datapoints and labels into a Dataset object
    data = text_dataset_from_directory(directory)
    
    # replace HTML linebreaks from the text with spaces
    return data.map(lambda text, label: (regex_replace(text, '<br />', ' '), label))

# read the directory into memory and clean the text
trainData = prepareData('../datasets/imdb/train')
testData = prepareData('../datasets/imdb/test')

Found 25000 files belonging to 2 classes.
Found 25000 files belonging to 2 classes.


In [4]:
# print the first review and label from one (shuffled) batch
for text_batch, label_batch in trainData.take(1):
    print(text_batch.numpy()[0])
    print(label_batch.numpy()[0]) # 0 = negative, 1 = positive

b'I have watched some pretty poor films in the past, but what the hell were they thinking of when they made this movie. Had the production crew turned into zombies when they came up with the idea of making it, because you sure have to be brain dead to find any enjoyment in it.  I am a fan of most genres and enjoy "shoot \'em up" games, but merging the daft scenes from the game just made this ridiculous and unwatchable.  As most have already said, there was hardly any script and the acting was weak. I won\'t waste my time describing it.  Anyone who rates this film above 4 has to be part of the production company or Sega, or else they have a very warped concept of entertainment.  I must say, I was more annoyed with the video shop, who gave this a thumbs up, which led me to rent it. Thank god I had a second film to watch to restore some of my faith in movies.  Comic book guy would be right if he said "Worst movie ever"!'
0


#### Dense Net Experiments

In [16]:
# create a TextVectorization layer to turn input string into a sequence of integers,
# each representing one token
maxTokens = 1000
vectorizeLayer = TextVectorization(max_tokens = maxTokens,
                                   output_mode = 'int',
                                   output_sequence_length = 100)

# adapt() fits the TextVectorization layer to our text dataset. This is when the
# max_tokens most common words (i.e. the vocabulary) are selected.
trainText = trainData.map(lambda text, label: text)

vectorizeLayer.adapt(trainText)
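
To make the vectorization step less of a black box, here is a rough pure-Python sketch of the idea: keep the most frequent words, map each word to an integer, and pad or truncate to a fixed length. It is a simplification and not the Keras implementation. The helper names are hypothetical, and the sketch only lowercases and splits on whitespace, whereas the Keras layer by default also strips punctuation. Like the Keras layer, it reserves index 0 for padding and index 1 for out-of-vocabulary words.

```python
from collections import Counter

def build_vocabulary(texts, max_tokens):
    """Keep the most frequent words; index 0 is padding and 1 is out-of-vocabulary."""
    counts = Counter(word for text in texts for word in text.lower().split())
    most_common = [word for word, _ in counts.most_common(max_tokens - 2)]
    return {word: index + 2 for index, word in enumerate(most_common)}

def vectorize(text, vocabulary, sequence_length):
    """Map words to integers, then truncate or zero-pad to a fixed length."""
    ids = [vocabulary.get(word, 1) for word in text.lower().split()]
    ids = ids[:sequence_length]
    return ids + [0] * (sequence_length - len(ids))

reviews = ["the movie was great", "the movie was bad", "the acting was great"]
vocabulary = build_vocabulary(reviews, max_tokens=6)

# "awful" was never seen, so it maps to 1; the last two slots are padding
encoded = vectorize("the movie was awful", vocabulary, sequence_length=6)
print(vocabulary)
print(encoded)
```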

In [17]:
model = Sequential()

model.add(Input(shape=(1,), dtype = 'string'))

# add the text vectorization layer to turn each review into a sequence of integers
model.add(vectorizeLayer)

# add an embedding layer to turn integers into fixed-length vectors
model.add(Embedding(maxTokens + 1, 128))

# add a dense layer (applied to each token's embedding independently)
model.add(Dense(64, activation = 'relu'))

# add a dense layer
model.add(Dense(64, activation = 'relu'))

# add a sigmoid output unit for binary classification
model.add(Dense(1, activation = 'sigmoid'))

In [18]:
model.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
model.summary()

Model: "sequential_5"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
text_vectorization_2 (TextVe (None, 100)               0         
_________________________________________________________________
embedding_5 (Embedding)      (None, 100, 128)          128128    
_________________________________________________________________
dense_10 (Dense)             (None, 100, 64)           8256      
_________________________________________________________________
dense_11 (Dense)             (None, 100, 64)           4160      
_________________________________________________________________
dense_12 (Dense)             (None, 100, 1)            65        
Total params: 140,609
Trainable params: 140,609
Non-trainable params: 0
_________________________________________________________________
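
The output shapes in this summary are worth a second look. A Keras `Dense` layer applied to a 3D input acts on the last axis only, so the same weights are applied to every token position separately and no information is shared across positions. The NumPy sketch below imitates that behavior with placeholder values (the batch size of 32 and the zero-filled arrays are illustrative assumptions), reproducing the `(None, 100, 64)` shape and the 8,256 parameters of the first dense layer.

```python
import numpy as np

batch = np.zeros((32, 100, 128))  # 32 reviews, 100 tokens each, 128-dim embeddings
weights = np.zeros((128, 64))     # one weight matrix shared by every token position
bias = np.zeros(64)

# matrix multiplication broadcasts over the first two axes, then ReLU is applied
hidden = np.maximum(batch @ weights + bias, 0.0)

print(hidden.shape)               # (32, 100, 64)
print(weights.size + bias.size)   # 8256
```

This is why the final layer reports `(None, 100, 1)`: the dense net produces one output per token rather than one per review, which is the kind of limitation the recurrent layers in the next experiments avoid.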


In [19]:
model.fit(trainData, epochs = 10)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x23d314d41c8>

#### Recurrent Layer

In [13]:
model = Sequential()

model.add(Input(shape=(1,), dtype = 'string'))

# add the text vectorization layer to turn each review into a sequence of integers
model.add(vectorizeLayer)

# add an embedding layer to turn integers into fixed-length vectors
model.add(Embedding(maxTokens + 1, 128))

# add a fully-connected recurrent layer
model.add(SimpleRNN(64))

# add a dense layer
model.add(Dense(64, activation = 'relu'))

# add a sigmoid output unit for binary classification
model.add(Dense(1, activation = 'sigmoid'))

In [14]:
model.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
model.summary()

Model: "sequential_4"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
text_vectorization_1 (TextVe (None, 100)               0         
_________________________________________________________________
embedding_4 (Embedding)      (None, 100, 128)          128128    
_________________________________________________________________
simple_rnn_1 (SimpleRNN)     (None, 64)                12352     
_________________________________________________________________
dense_8 (Dense)              (None, 64)                4160      
_________________________________________________________________
dense_9 (Dense)              (None, 1)                 65        
Total params: 144,705
Trainable params: 144,705
Non-trainable params: 0
_________________________________________________________________
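
The parameter counts in this summary can be reproduced by hand. A simple recurrent layer with $u$ units and $d$ inputs per timestep has $u \cdot d$ input weights, $u \cdot u$ recurrent weights, and $u$ biases, so $u(d + u + 1)$ parameters in total. The short check below uses hypothetical helper functions written for this calculation.

```python
def embedding_params(vocab_size, dim):
    return vocab_size * dim

def simple_rnn_params(input_dim, units):
    # input weights + recurrent (hidden-to-hidden) weights + biases
    return units * (input_dim + units + 1)

def dense_params(input_dim, units):
    return input_dim * units + units

counts = [
    embedding_params(1001, 128),  # maxTokens + 1 rows, 128 columns
    simple_rnn_params(128, 64),
    dense_params(64, 64),
    dense_params(64, 1),
]
print(counts)       # [128128, 12352, 4160, 65]
print(sum(counts))  # 144705
```

Note that the recurrent layer's parameter count does not depend on the sequence length of 100, since the same weights are reused at every timestep.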


In [15]:
model.fit(trainData, epochs = 10)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x23d2be48c08>

#### Experiment With LSTM

In [9]:
model = Sequential()

model.add(Input(shape=(1,), dtype = 'string'))

# add the text vectorization layer to turn each review into a sequence of integers
model.add(vectorizeLayer)

# add an embedding layer to turn integers into fixed-length vectors
model.add(Embedding(maxTokens + 1, 128))

# add an LSTM recurrent layer
model.add(LSTM(64))

# add a dense layer
model.add(Dense(64, activation = 'relu'))

# add a sigmoid output unit for binary classification
model.add(Dense(1, activation = 'sigmoid'))

In [6]:
model.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
text_vectorization (TextVect (None, 100)               0         
_________________________________________________________________
embedding (Embedding)        (None, 100, 128)          128128    
_________________________________________________________________
lstm (LSTM)                  (None, 64)                49408     
_________________________________________________________________
dense (Dense)                (None, 64)                4160      
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 65        
Total params: 181,761
Trainable params: 181,761
Non-trainable params: 0
_________________________________________________________________
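
The LSTM layer's 49,408 parameters follow from its structure: it holds four sets of weights (the input, forget, and output gates plus the candidate cell update), each with the same shape as a simple recurrent layer of 64 units. A quick check, using a hypothetical helper function:

```python
def lstm_params(input_dim, units):
    weight_sets = 4  # input gate, forget gate, output gate, candidate cell update
    # each set has input weights, recurrent weights, and biases
    return weight_sets * units * (input_dim + units + 1)

lstm_count = lstm_params(128, 64)
total = 1001 * 128 + lstm_count + (64 * 64 + 64) + (64 + 1)

print(lstm_count)  # 49408
print(total)       # 181761
```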


In [7]:
model.fit(trainData, epochs = 10)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x23d2b9bd088>

In [20]:
# delete the unzipped imdb dataset (this is just so I can upload to GitHub efficiently)
if os.path.isfile('../datasets/imdb/README'):
    shutil.rmtree('../datasets/imdb')