# RNN, LSTM, and GRU

## Introduction
This notebook walks through some of the implementation details of the Recurrent Neural Network (RNN) and two of its variants, the LSTM and the GRU. For the full implementation, see the [source code](https://github.com/hongjos/rnn).

RNNs were introduced in the 1980s to model sequential data such as text, music, and time series. However, early RNN models struggled to learn long-term dependencies, which limited their effectiveness. Variants such as the LSTM and GRU were introduced later to address this.

To test and compare implementations of the RNN, LSTM and GRU, we will use the [IMDB Movie Reviews dataset](https://keras.io/api/datasets/imdb/) for sentiment classification.

This notebook gives a plain implementation of the models in order to understand them better. It is not meant for real-world use; for that, use the implementations in [PyTorch](https://pytorch.org/docs/stable/generated/torch.nn.RNN.html) or [TensorFlow](https://www.tensorflow.org/guide/keras/rnn).


## Prelims
The implementation for the RNN will primarily use `numpy`. The other libraries used in this notebook are only for loading the data and visualizing the results.

In [3]:
from rnn import RNN
from lstm import LSTM
from gru import GRU

import numpy as np
import math, time
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

from keras.datasets import imdb # used for evaluation

### One-Hot Encoding
To represent words numerically, we can encode each word as a vector with a 1 at the index that identifies the word and 0s everywhere else. For example, suppose you were a baby who only knew the words 'goo' and 'ga'. We can use 0 to represent 'goo' and 1 to represent 'ga'. Then the phrase 'goo goo ga ga' becomes the following sequence of indices:

In [14]:
vocab = {'goo' : 0, 'ga' : 1}
phrase = ['goo', 'goo', 'ga', 'ga']

[vocab[x] for x in phrase]

[0, 0, 1, 1]

The one-hot encoding would look something like:

In [13]:
seq = np.zeros((len(phrase), 2))

for i, word in enumerate(phrase):
    seq[i][vocab[word]] = 1

seq

array([[1., 0.],
       [1., 0.],
       [0., 1.],
       [0., 1.]])
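
The same encoding can also be built without an explicit loop. The block below is a minimal sketch, assuming the same `vocab` and `phrase` as above; the name `seq_eye` is only used here for illustration. It relies on the fact that row `i` of an identity matrix is exactly the one-hot vector for index `i`.

```python
import numpy as np

vocab = {'goo': 0, 'ga': 1}
phrase = ['goo', 'goo', 'ga', 'ga']

# Look up the integer index of every word in the phrase.
indices = [vocab[word] for word in phrase]

# Row i of the identity matrix is the one-hot vector for index i,
# so indexing with a list of indices selects one row per word.
seq_eye = np.eye(len(vocab))[indices]

print(seq_eye)
```

The loop version is kept in the rest of the notebook because it mirrors the definition more directly.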

To check that the implementations work as intended, we will fit each model to a very small sentiment classification dataset of three movie reviews.

| Review | Class |
| -------- | ---- |
| this movie is good | Positive |
| this movie is bad | Negative |
| this movie is not good | Negative |

Our test set will contain a single review:

| Review | Class |
| -------- | ---- |
| this movie is not bad | Positive |

Class 0 represents positive sentiment and class 1 represents negative sentiment.

In [18]:
vocab = {"this" : 0, "movie" : 1, "is" : 2, "good": 3, "bad" : 4, "not" : 5}
vocab_size = 6

xtrain_string = ["this movie is good", "this movie is bad", "this movie is not good"]
ytrain_val = [0, 1, 1]
xtest_string = ["this movie is not bad"]
ytest_val = [0]

xtrain, ytrain, xtest, ytest = [], [], [], []

def one_hot_x(lst, size=6):
    """
    One-hot encoding for a list of string sequences.
    """
    ret = []

    # go thru list and make string sequence into vectors
    for s in lst:
        seq = np.zeros((len(s.split()), size, 1))

        # one-hot encode the sequence
        for i,val in enumerate(s.split()):
            seq[i][vocab[val]] = 1
        ret.append(seq) # add to list
        
    return ret

def one_hot_y(lst, num_class=2):
    """
    One-hot encoding for the classes.
    """
    ret = []
    # go thru list and make classes into vectors.
    for x in lst:
        vec = np.zeros((num_class, 1))
        vec[x] = 1 # one-hot encode the class
        ret.append(vec) # add to list
    return ret


xtrain, xtest = one_hot_x(xtrain_string), one_hot_x(xtest_string)
ytrain, ytest = one_hot_y(ytrain_val), one_hot_y(ytest_val)

array([[[1.],
        [0.],
        [0.],
        [0.],
        [0.],
        [0.]],

       [[0.],
        [1.],
        [0.],
        [0.],
        [0.],
        [0.]],

       [[0.],
        [0.],
        [1.],
        [0.],
        [0.],
        [0.]],

       [[0.],
        [0.],
        [0.],
        [1.],
        [0.],
        [0.]]])

## RNN

<div>
<img src="https://stanford.edu/~shervine/teaching/cs-230/illustrations/architecture-rnn-ltr.png?9ea4417fc145b9346a3e288801dbdfdc" width="800"/>
</div>
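
As a rough sketch of what the figure describes, the block below runs the forward pass of a vanilla RNN in a many-to-one setup. The hidden state is updated once per token as `h_t = tanh(Wxh x_t + Whh h_{t-1} + bh)`, and the final hidden state is passed through a softmax to obtain class probabilities. This is the textbook formulation and not necessarily what `rnn.py` does; the names `rnn_forward`, `softmax`, `Wxh`, `Whh`, `Why`, `bh`, and `by`, as well as the random initialization, are assumptions made for illustration.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a column vector."""
    e = np.exp(z - np.max(z))
    return e / np.sum(e)

def rnn_forward(xs, Wxh, Whh, Why, bh, by):
    """Run a vanilla RNN over one sequence and classify from the last state.

    xs has shape (seq_len, input_dim, 1), i.e. one column vector per token.
    """
    h = np.zeros((Whh.shape[0], 1))  # initial hidden state
    for x in xs:
        # Mix the current input with the previous hidden state.
        h = np.tanh(Wxh @ x + Whh @ h + bh)
    # Many-to-one: only the final hidden state is used for the prediction.
    return softmax(Why @ h + by)

# Hypothetical sizes and small random weights, for illustration only.
rng = np.random.default_rng(0)
input_dim, hidden_dim, output_dim = 6, 5, 2
Wxh = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
Whh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
Why = rng.normal(scale=0.1, size=(output_dim, hidden_dim))
bh = np.zeros((hidden_dim, 1))
by = np.zeros((output_dim, 1))

# One-hot encode a four-token sequence with word indices 0, 1, 2, 3.
xs = np.zeros((4, input_dim, 1))
for t, idx in enumerate([0, 1, 2, 3]):
    xs[t][idx] = 1

probs = rnn_forward(xs, Wxh, Whh, Why, bh, by)
print(probs)  # column vector of two class probabilities
```

Because the same weights are reused at every time step, the function handles sequences of any length, which is why the inputs in this notebook are kept as a list of arrays rather than one padded tensor.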

## Testing

In [5]:
test = {'a' : 1, 'b' : 2}
for t in test:
    name = 'd' + t
    print(name)

da
db


In [10]:
from rnn import RNN
from lstm import LSTM
from gru import GRU
import numpy as np

vocab_size = 5

# training sequence 1: word indices [0, 2, 4], class 0
seq1 = np.zeros((3,vocab_size,1))
s1 = [0,2,4]

for i,val in enumerate(s1):
    seq1[i][val] = 1

y1 = np.zeros((2,1))
y1[0] = 1


# training sequence 2: word indices [1, 3, 2, 0], class 1
seq2 = np.zeros((4,vocab_size,1))
s2 = [1,3,2,0]

for i,val in enumerate(s2):
    seq2[i][val] = 1

y2 = np.zeros((2,1))
y2[1] = 1

# test sequence: word indices [0, 3, 1], class 0
seq3 = np.zeros((3,vocab_size,1))
s3 = [0,3,1]

for i,val in enumerate(s3):
    seq3[i][val] = 1

y3 = np.zeros((2,1))
y3[0] = 1

Xtrain = [seq1, seq2]
Ytrain = [y1, y2]

Xtest = [seq3]
Ytest = [y3]

rnntest = RNN(input_dim=vocab_size, output_dim=2, hidden_dim=5, learning_rate=.05)
lstmtest = LSTM(input_dim=vocab_size, output_dim=2, hidden_dim=5, learning_rate=.3)
grutest = GRU(input_dim=vocab_size, output_dim=2, hidden_dim=5, learning_rate=.3)

epochs = 30

print("rnn")
rnntest.fit(Xtrain, Ytrain, num_epochs=epochs)
rnntest.evaluate(Xtest, Ytest)

print("lstm")
lstmtest.fit(Xtrain, Ytrain, num_epochs=epochs)
lstmtest.evaluate(Xtest, Ytest)

print("gru")
grutest.fit(Xtrain, Ytrain, num_epochs=epochs)
grutest.evaluate(Xtest, Ytest)

rnn
Epoch 0, training loss: 1.8445101635539731
Epoch 5, training loss: 1.3735530882162283
Epoch 10, training loss: 0.580944937868262
Epoch 15, training loss: 0.22389386677896467
Epoch 20, training loss: 0.11914532106805094
Epoch 25, training loss: 0.07427093843491467
lstm
Epoch 0, training loss: 1.9206830636450158
Epoch 5, training loss: 1.8123636807488614
Epoch 10, training loss: 0.12019302831117626
Epoch 15, training loss: 0.03901948314936534
Epoch 20, training loss: 0.02313178273642524
Epoch 25, training loss: 0.016333873033484207
gru
Epoch 0, training loss: 3.3360171694868272
Epoch 5, training loss: 1.093418181528314
Epoch 10, training loss: 0.07985457717012517
Epoch 15, training loss: 0.03903190652096257
Epoch 20, training loss: 0.02518610745585075
Epoch 25, training loss: 0.0183439745802757


0.0

## Evaluation on IMDB Dataset

Words are indexed by overall frequency in the dataset; for example, `3` encodes the third most frequent word.
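
One detail that is easy to miss: the frequency ranks come from `imdb.get_word_index()`, but with its default arguments `imdb.load_data` shifts them by `index_from=3` and reserves the low indices (0 for padding, 1 for the start-of-sequence marker, 2 for out-of-vocabulary words). That is why `decode` below looks up `i-3`. The block below is a small sketch of that offset on a made-up vocabulary; `toy_word_index`, `RESERVED`, and `decode_tokens` are hypothetical names and not part of Keras.

```python
INDEX_FROM = 3  # default offset applied by imdb.load_data

# Indices reserved by the default load_data arguments.
RESERVED = {0: "<pad>", 1: "<start>", 2: "<oov>"}

# Hypothetical word -> frequency-rank mapping, shaped like get_word_index().
toy_word_index = {"the": 1, "movie": 2, "good": 3}
toy_index_to_word = {rank: word for word, rank in toy_word_index.items()}

def decode_tokens(encoded):
    """Map an encoded review back to tokens, undoing the index offset."""
    words = []
    for i in encoded:
        if i in RESERVED:
            words.append(RESERVED[i])
        else:
            # Shift back by INDEX_FROM to recover the frequency rank.
            words.append(toy_index_to_word.get(i - INDEX_FROM, "<unk>"))
    return words

decoded = decode_tokens([1, 4, 5, 2, 6])
print(decoded)  # ['<start>', 'the', 'movie', '<oov>', 'good']
```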


In [46]:
from keras.datasets import imdb
import numpy as np
vocab_size = 100

(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=vocab_size)

word_index = imdb.get_word_index()
index_to_word = dict([(value,key) for (key,value) in word_index.items()])

(X_train_small, y_train_small) = (X_train[0:100], y_train[0:100])
(X_test_small, y_test_small) = (X_test[0:100], y_test[0:100])

In [47]:
def decode(input):
    sentence = []
    for i in input:
        sentence.append(index_to_word.get(i-3, '!'))
    print(" ".join(sentence))
    return sentence

def one_hot(value, vec_size):
    """
    Given a scalar returns the one-hot encoding vector.
    """
    # initialize input matrix
    x = np.zeros((vec_size, 1))
    x[value] = 1
        
    return x

def subtract_one(X):
    """
    Subtracts one from each value.
    """
    for i in range(X.size):
        X[i] = [x-1 for x in X[i]]
    
    return X

def one_hot_x(X):
    oh = [None] * X.size
    for i, input in enumerate(X):
        xx = np.zeros((len(input), vocab_size, 1))
        for j, val in enumerate(input):
            xx[j][val] = 1
        oh[i] = xx
    
    return oh

def one_hot_y(Y):
    oh = [None] * Y.size
    for i, y in enumerate(Y):
        oh[i] = one_hot(y, 2)
    
    return oh

In [48]:
# subtract ones
X_train_small = subtract_one(X_train_small)
X_test_small = subtract_one(X_test_small)  

In [49]:
tx = one_hot_x(X_train_small)
ty = one_hot_y(y_train_small)
testingx = one_hot_x(X_test_small)
testingy = one_hot_y(y_test_small)

In [51]:
from rnn import RNN
RNNModel = RNN(input_dim=vocab_size, hidden_dim=10, output_dim=2, learning_rate=1e-4)
RNNModel.fit(tx, ty, 10)

Epoch 1, training loss: 38.02021500786034
Epoch 2, training loss: 37.88570052221522
Epoch 3, training loss: 37.76739698482241
Epoch 4, training loss: 37.66479923610425
Epoch 5, training loss: 37.57031353036847
Epoch 6, training loss: 37.48792986875993
Epoch 7, training loss: 37.406753961560895
Epoch 8, training loss: 37.32507970195589
Epoch 9, training loss: 37.24167319569322
Epoch 10, training loss: 37.158977623183155


[38.02021500786034,
 37.88570052221522,
 37.76739698482241,
 37.66479923610425,
 37.57031353036847,
 37.48792986875993,
 37.406753961560895,
 37.32507970195589,
 37.24167319569322,
 37.158977623183155]

In [45]:
RNNModel.evaluate(testingx, testingy)
# RNNModel.evaluate(tx, ty)

0.43