

# Machine Learning Foundation

## Course 5, Part g: RNN DEMO


## Using RNNs to classify sentiment on IMDB data
For this exercise, we will train a "vanilla" RNN to predict the sentiment of IMDB reviews.  Our data consists of 25,000 training sequences and 25,000 test sequences.  The outcome is binary (positive/negative), and both outcomes are equally represented in both the training and the test set.

Keras provides a convenient interface to load the data with the words already encoded as integers (each word is replaced by its rank in a frequency-ordered vocabulary, so the most common words get the smallest integers).  This will save us a lot of the drudgery that is usually involved when working with raw text.
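To make the frequency-based encoding concrete, here is a minimal sketch on a toy corpus. It is not the Keras implementation: the helper names `build_vocab` and `encode` and the toy reviews are made up for illustration. The reserved indices (0 for padding, 1 for start of sequence, 2 for out-of-vocabulary, real words offset by 3) follow the defaults of `imdb.load_data`.

```python
from collections import Counter

# Reserved indices, following the defaults of imdb.load_data:
# 0 = padding, 1 = start of sequence, 2 = out-of-vocabulary (OOV).
START, OOV = 1, 2
INDEX_FROM = 3  # word ranks are shifted by this offset

def build_vocab(texts):
    """Rank words by frequency: the most common word gets rank 1."""
    counts = Counter(word for text in texts for word in text.lower().split())
    ranked = sorted(counts, key=lambda w: (-counts[w], w))  # ties broken alphabetically
    return {word: rank for rank, word in enumerate(ranked, start=1)}

def encode(text, vocab, num_words):
    """Encode a text; any index >= num_words is replaced by the OOV index."""
    encoded = [START]
    for word in text.lower().split():
        rank = vocab.get(word)
        index = OOV if rank is None else rank + INDEX_FROM
        encoded.append(index if index < num_words else OOV)
    return encoded

# Hypothetical toy corpus, only for illustration.
toy_reviews = ["a great great movie", "a dull movie", "a great cast"]
vocab = build_vocab(toy_reviews)
encoded_review = encode("a great dull movie", vocab, num_words=7)

print(vocab)           # {'a': 1, 'great': 2, 'movie': 3, 'cast': 4, 'dull': 5}
print(encoded_review)  # [1, 4, 5, 2, 6]
```

The rare word "dull" falls outside the `num_words` limit and is mapped to the OOV index, which is what happens to every word beyond the first `max_features` indices in this notebook.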

We will walk through the preparation of the data and the building of an RNN model.  Then it will be your turn to build your own models (and prepare the data how you see fit).


In [1]:
%%capture
!pip install --upgrade tensorflow

In [2]:
from tensorflow import keras
from tensorflow.keras.utils import pad_sequences
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.layers import Dense, Embedding, SimpleRNN
from tensorflow.keras.datasets import imdb

# Initializes weights randomly using a normal (Gaussian) distribution
from tensorflow.keras.initializers import RandomNormal
from tensorflow.keras.initializers import Identity # Identity matrix

In [3]:
max_features = 2000
maxlen = 30 # length of every sequence: longer reviews are truncated, shorter ones are padded
batch_size = 32
# Truncate means cut something short

In [4]:
# Load data
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
print(x_train.shape, y_train.shape)
print(x_test.shape, y_test.shape)

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb.npz
17464789/17464789 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step
(25000,) (25000,)
(25000,) (25000,)


In [5]:
# Pad or truncate every sequence to exactly 'maxlen' tokens
x_train = pad_sequences(x_train, maxlen=maxlen)
x_test  = pad_sequences(x_test, maxlen=maxlen)
print("x_train shape:", x_train.shape)
print("x_test shape:", x_test.shape)

x_train shape: (25000, 30)
x_test shape: (25000, 30)
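With its default settings (`padding='pre'`, `truncating='pre'`, `value=0`), `pad_sequences` keeps the last `maxlen` tokens of a long sequence and left-pads a short one with zeros. The sketch below mimics that behaviour for a single sequence in plain Python; `pad_or_truncate` is a hypothetical helper written for illustration, not part of Keras, and the token values are made up.

```python
def pad_or_truncate(sequence, maxlen, value=0):
    """Mimic the pad_sequences defaults for one sequence (assumes maxlen > 0)."""
    kept = sequence[-maxlen:]                     # keep the LAST maxlen tokens
    return [value] * (maxlen - len(kept)) + kept  # left-pad short sequences

short_review = [1, 14, 22]
long_review = [1, 14, 22, 16, 43, 530, 973]

padded_short = pad_or_truncate(short_review, maxlen=5)
padded_long = pad_or_truncate(long_review, maxlen=5)

print(padded_short)  # [0, 0, 1, 14, 22]
print(padded_long)   # [22, 16, 43, 530, 973]
```

Note that with these defaults the model sees the end of each review, not the beginning.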


In [6]:
# Example sequence looks like
x_train[123, :]

array([ 219,  141,   35,  221,  956,   54,   13,   16,   11,    2,   61,
        322,  423,   12,   38,   76,   59, 1803,   72,    8,    2,   23,
          5,  967,   12,   38,   85,   62,  358,   99], dtype=int32)

## Keras layers for (Vanilla) RNNs

In this exercise, we will not use pre-trained word vectors.  Rather, we will learn an embedding as part of the neural network.  This is represented by the Embedding layer below.

### Embedding Layer
`keras.layers.Embedding(input_dim, output_dim, embeddings_initializer='uniform', embeddings_regularizer=None, activity_regularizer=None, embeddings_constraint=None, mask_zero=False, input_length=None)`

- This layer maps each integer into a distinct (dense) word vector of length `output_dim`.
- You can think of this as learning a word vector embedding "on the fly" rather than using an existing mapping (like GloVe).
- The `input_dim` should be the size of the vocabulary.
- The `input_length` specifies the length of the sequences that the network expects.
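Conceptually, an embedding layer is a lookup table: a matrix with one row per word index, where each integer in a sequence selects its row. The following plain-Python sketch illustrates the idea with the vocabulary size of 2000 and the vector length of 50 used in this notebook. The random values stand in for weights that Keras would learn during training; this is not how Keras implements the layer internally.

```python
import random

vocab_size = 2000     # plays the role of input_dim (max_features in this notebook)
embedding_dim = 50    # plays the role of output_dim

rng = random.Random(0)
# One row per word index; in Keras these rows are trainable weights.
embedding_matrix = [[rng.uniform(-0.05, 0.05) for _ in range(embedding_dim)]
                    for _ in range(vocab_size)]

def embed(sequence):
    """Replace every integer in the sequence by its row of the matrix."""
    return [embedding_matrix[index] for index in sequence]

sequence = [219, 141, 35, 221]   # first four tokens of the example sequence above
vectors = embed(sequence)
n_embedding_params = vocab_size * embedding_dim

print(len(vectors), len(vectors[0]))  # 4 50
print(n_embedding_params)             # 100000
```

A sequence of 30 integers therefore becomes a 30 x 50 array of floats, which is what the recurrent layer receives.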

### SimpleRNN Layer
`keras.layers.SimpleRNN(units, activation='tanh', use_bias=True, kernel_initializer='glorot_uniform', recurrent_initializer='orthogonal', bias_initializer='zeros', kernel_regularizer=None, recurrent_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, recurrent_constraint=None, bias_constraint=None, dropout=0.0, recurrent_dropout=0.0)`

- This is the basic RNN, where the output is also fed back as the "hidden state" to the next iteration.
- The parameter `units` gives the dimensionality of the output (and therefore the hidden state).  Note that typically there will be another layer after the RNN mapping the (RNN) output to the network output.  So we should think of this value as the desired dimensionality of the hidden state and not necessarily the desired output of the network.
- Recall that there are two sets of weights: the "kernel" weights, which are applied to the current input, and the "recurrent" weights, which are applied to the previous hidden state.  These can be configured separately in terms of their initialization, regularization, etc.
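The recurrence described above can be sketched in a few lines of plain Python. This is an illustration under simplifying assumptions (a single sequence, ReLU activation, toy weights chosen by hand), not the Keras implementation; Keras stores the same weights in a different layout and processes whole batches at once.

```python
def relu(values):
    return [max(0.0, v) for v in values]

def matvec(matrix, vector):
    """Multiply a matrix, stored as a list of rows, by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def simple_rnn_forward(inputs, kernel, recurrent, bias):
    """h_t = relu(kernel @ x_t + recurrent @ h_{t-1} + bias), with h_0 = 0."""
    hidden = [0.0] * len(bias)
    for x_t in inputs:
        from_input = matvec(kernel, x_t)        # "kernel" weights act on the input
        from_state = matvec(recurrent, hidden)  # "recurrent" weights act on the state
        hidden = relu([a + b + c for a, b, c in zip(from_input, from_state, bias)])
    return hidden  # the last hidden state is passed on to the next layer

# Toy sizes: 3-dimensional inputs, 2 hidden units.
kernel = [[0.5, 0.0, -0.5],
          [0.0, 1.0, 0.0]]
recurrent = [[1.0, 0.0],   # identity matrix: the previous state is carried over
             [0.0, 1.0]]
bias = [0.0, 0.0]
inputs = [[1.0, 2.0, 0.0], [0.0, -1.0, 1.0]]

final_state = simple_rnn_forward(inputs, kernel, recurrent, bias)
print(final_state)  # [0.0, 1.0]
```

The length of `final_state` equals the number of hidden units, which is the role played by `units` in `SimpleRNN`.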






In [7]:
rnn_hidden_dim = 5
word_embedding_dim = 50
model_rnn = Sequential()
model_rnn.add(Embedding(input_dim = max_features,
                        output_dim = word_embedding_dim))
model_rnn.add(SimpleRNN(units = rnn_hidden_dim, # number of hidden units
                        # kernel: linear transformation of the inputs (small random values)
                        kernel_initializer = RandomNormal(stddev=0.001),
                        # recurrent: linear transformation of the previous hidden state
                        recurrent_initializer = Identity(gain=1.0),
                        activation = 'relu'))
# Note: no input_shape is passed to SimpleRNN; it infers its input shape
# from the output of the Embedding layer.
# Note: ReLU is unbounded, so the hidden state of a ReLU RNN can explode.
# A small kernel and an identity recurrent matrix keep it stable at the start.
model_rnn.add(Dense(1, activation='sigmoid'))
model_rnn.summary()
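As a cross-check on the model summary, the number of trainable parameters can be worked out by hand from the layer sizes. The arithmetic below is a sketch using the values set in this notebook (`max_features = 2000`, `word_embedding_dim = 50`, `rnn_hidden_dim = 5`) and the usual weight shapes of the three layers.

```python
max_features = 2000       # vocabulary size
word_embedding_dim = 50   # length of each word vector
rnn_hidden_dim = 5        # number of hidden units

# Embedding: one vector per word in the vocabulary
embedding_params = max_features * word_embedding_dim

# SimpleRNN: kernel (input -> hidden) + recurrent (hidden -> hidden) + bias
rnn_params = (word_embedding_dim * rnn_hidden_dim
              + rnn_hidden_dim * rnn_hidden_dim
              + rnn_hidden_dim)

# Dense(1): one weight per hidden unit plus one bias
dense_params = rnn_hidden_dim * 1 + 1

total_params = embedding_params + rnn_params + dense_params
print(embedding_params, rnn_params, dense_params, total_params)
# 100000 280 6 100286
```

Almost all parameters sit in the embedding; the recurrent layer itself is tiny.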



In [8]:
from tensorflow.keras.optimizers import RMSprop

In [9]:
rmsprop = RMSprop(learning_rate=0.0001)

model_rnn.compile(loss = 'binary_crossentropy',
                  optimizer = rmsprop,
                  metrics = ['accuracy'])
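The `binary_crossentropy` loss averages `-[y*log(p) + (1-y)*log(1-p)]` over the examples, where `p` is the predicted probability of the positive class. The sketch below computes it in plain Python on made-up labels and predictions; the Keras version additionally clips the probabilities to avoid `log(0)`, which this sketch omits.

```python
import math

def binary_crossentropy(y_true, y_prob):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over the examples."""
    losses = [-(y * math.log(p) + (1 - y) * math.log(1 - p))
              for y, p in zip(y_true, y_prob)]
    return sum(losses) / len(losses)

labels = [1, 0, 1, 0]  # hypothetical labels
uninformed = binary_crossentropy(labels, [0.5, 0.5, 0.5, 0.5])
confident = binary_crossentropy(labels, [0.9, 0.1, 0.8, 0.2])

print(round(uninformed, 4))  # 0.6931
print(round(confident, 4))   # 0.1643
```

A model that outputs 0.5 for every review has a loss of ln 2, about 0.693, so a training loss near 0.69 in the first epoch means the model has barely started to learn.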

In [10]:
model_rnn.fit(x_train, y_train,
              batch_size = batch_size,
              epochs = 10,
              validation_data = (x_test, y_test))

Epoch 1/10
782/782 ━━━━━━━━━━━━━━━━━━━━ 11s 11ms/step - accuracy: 0.5110 - loss: 0.6920 - val_accuracy: 0.6164 - val_loss: 0.6754
Epoch 2/10
782/782 ━━━━━━━━━━━━━━━━━━━━ 9s 12ms/step - accuracy: 0.6281 - loss: 0.6615 - val_accuracy: 0.6699 - val_loss: 0.6310
Epoch 3/10
782/782 ━━━━━━━━━━━━━━━━━━━━ 10s 12ms/step - accuracy: 0.6982 - loss: 0.6127 - val_accuracy: 0.7055 - val_loss: 0.5904
Epoch 4/10
782/782 ━━━━━━━━━━━━━━━━━━━━ 10s 12ms/step - accuracy: 0.7235 - loss: 0.5724 - val_accuracy: 0.7202 - val_loss: 0.5530
Epoch 5/10
782/782 ━━━━━━━━━━━━━━━━━━━━ 10s 12ms/step - accuracy: 0.7437 - loss: 0.5334 - val_accuracy: 0.7318 - val_loss: 0.5275
Epoch 6/10
782/782 ━━━━━━━━━━━━━━━━━━━━ 9s 11ms/step - accuracy: 0.7570 - loss: 0.5049 - val_accuracy: 0.7432 - val_loss: 0.5109
Epoch 7/10
...

<keras.src.callbacks.history.History at 0x7ad762694fb0>
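The `782/782` shown in the progress log is the number of batches per epoch. It follows directly from the 25,000 training sequences and the batch size of 32, as this short calculation shows.

```python
import math

n_train = 25000   # number of training sequences
batch_size = 32

steps_per_epoch = math.ceil(n_train / batch_size)
last_batch_size = n_train - (steps_per_epoch - 1) * batch_size

print(steps_per_epoch)  # 782
print(last_batch_size)  # 8
```

The final batch of each epoch is smaller than the others because 25,000 is not a multiple of 32.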

In [11]:
score, acc = model_rnn.evaluate(x_test, y_test, batch_size=batch_size)
print('Test score:', score)
print('Test accuracy:', acc)

782/782 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.7677 - loss: 0.4719
Test score: 0.4711362421512604
Test accuracy: 0.7692800164222717
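The reported accuracy comes from thresholding the sigmoid output at 0.5 and comparing the result with the true label. A minimal sketch of that computation, on made-up labels and probabilities, could look like this (the helper name is hypothetical):

```python
def accuracy_from_probabilities(y_true, y_prob, threshold=0.5):
    """Fraction of examples whose thresholded prediction matches the label."""
    predictions = [1 if p > threshold else 0 for p in y_prob]
    correct = sum(int(pred == label) for pred, label in zip(predictions, y_true))
    return correct / len(y_true)

labels = [1, 0, 1, 1, 0]                       # hypothetical true labels
probabilities = [0.91, 0.35, 0.48, 0.77, 0.62] # hypothetical sigmoid outputs

acc_toy = accuracy_from_probabilities(labels, probabilities)
print(acc_toy)  # 0.6
```

Because the two classes are equally represented in the test set, always predicting the same class would score 50%, which is the baseline to compare the test accuracy against.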


## Exercise

In this exercise, we will illustrate:
- Preparing the data to use sequences of length 80 rather than length 30.  Does it improve the performance?
- Trying different values of `max_features`.  Does this improve the performance?
- Trying smaller and larger sizes of the RNN hidden dimension.  How does it affect the model performance?  How does it affect the run time?
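Before running the experiments, it can help to see how the size of the recurrent layer grows with the hidden dimension. The sketch below assumes the embedding length of 50 used in this notebook; the candidate hidden sizes are arbitrary examples, not values prescribed by the exercise.

```python
word_embedding_dim = 50  # input size of the recurrent layer

def simple_rnn_param_count(input_dim, units):
    """Kernel (input_dim x units) + recurrent (units x units) + bias (units)."""
    return input_dim * units + units * units + units

candidate_hidden_dims = [5, 20, 50]  # arbitrary example values
param_counts = {units: simple_rnn_param_count(word_embedding_dim, units)
                for units in candidate_hidden_dims}

for units, count in param_counts.items():
    print(f"hidden dim {units:>2}: {count} RNN parameters")
# hidden dim  5: 280 RNN parameters
# hidden dim 20: 1420 RNN parameters
# hidden dim 50: 5050 RNN parameters
```

The recurrent term grows quadratically with the hidden dimension, and the layer must still process the sequence one time step after another, so both larger hidden sizes and longer sequences tend to increase the run time.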


In [22]:
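def create_rnn_model(max_features, maxlen, learning_rate):
    """Load the data, then build, train and evaluate a SimpleRNN model.

    Relies on the globals word_embedding_dim, rnn_hidden_dim and batch_size.
    """
    # Load the data and pad/truncate every sequence to maxlen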
    (x_train, y_train), (x_test, y_test) = imdb.load_data(num_words = max_features)
    x_train = pad_sequences(x_train, maxlen = maxlen)
    x_test = pad_sequences(x_test, maxlen = maxlen)

    # Construct model architecture
    model = Sequential()
    model.add(Embedding(input_dim = max_features,
                            output_dim = word_embedding_dim))
    model.add(SimpleRNN(units = rnn_hidden_dim, # number of hidden units
                            # kernel: linear transformation of the inputs
                            kernel_initializer = RandomNormal(stddev = 0.001),
                            # recurrent: linear transformation of the previous hidden state
                            recurrent_initializer = Identity(gain = 1.0),
                            activation = 'relu'))
    # Note: no input_shape is passed to SimpleRNN; it infers its input shape
    # from the output of the Embedding layer.
    # Note: ReLU is unbounded, so the hidden state of a ReLU RNN can explode.
    # A small kernel and an identity recurrent matrix keep it stable at the start.
    model.add(Dense(1, activation='sigmoid'))
    print("Model architecture:")
    model.summary()  # summary() prints the table itself and returns None

    rmsprop = RMSprop(learning_rate=learning_rate)
    model.compile(loss = 'binary_crossentropy',
                    optimizer = rmsprop,
                    metrics = ['accuracy'])

    model.fit(x_train, y_train,
                batch_size = batch_size,
                epochs = 10,
                validation_data = (x_test, y_test))


    score, acc = model.evaluate(x_test, y_test, batch_size=batch_size)
    print('Test score:', score)
    print('Test accuracy:', acc)

In [12]:
max_features = 20000
maxlen = 80

(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words = max_features)
x_train = pad_sequences(x_train, maxlen = maxlen)
x_test = pad_sequences(x_test, maxlen = maxlen)

In [13]:
rnn_hidden_dim = 5
word_embedding_dim = 50
model_rnn = Sequential()
model_rnn.add(Embedding(input_dim = max_features,
                        output_dim = word_embedding_dim))
model_rnn.add(SimpleRNN(units = rnn_hidden_dim, # number of hidden units
                        # kernel: linear transformation of the inputs
                        kernel_initializer = RandomNormal(stddev = 0.001),
                        # recurrent: linear transformation of the previous hidden state
                        recurrent_initializer = Identity(gain = 1.0),
                        activation = 'relu'))
# Note: no input_shape is passed to SimpleRNN; it infers its input shape
# from the output of the Embedding layer.
# Note: ReLU is unbounded, so the hidden state of a ReLU RNN can explode.
# A small kernel and an identity recurrent matrix keep it stable at the start.
model_rnn.add(Dense(1, activation='sigmoid'))
model_rnn.summary()

In [14]:
rmsprop = RMSprop(learning_rate=0.0001)

model_rnn.compile(loss = 'binary_crossentropy',
                  optimizer = rmsprop,
                  metrics = ['accuracy'])

model_rnn.fit(x_train, y_train,
              batch_size = batch_size,
              epochs = 10,
              validation_data = (x_test, y_test))

Epoch 1/10
782/782 ━━━━━━━━━━━━━━━━━━━━ 25s 29ms/step - accuracy: 0.5244 - loss: 0.6843 - val_accuracy: 0.7288 - val_loss: 0.6171
Epoch 2/10
782/782 ━━━━━━━━━━━━━━━━━━━━ 23s 29ms/step - accuracy: 0.7304 - loss: 0.5983 - val_accuracy: 0.7918 - val_loss: 0.5862
Epoch 3/10
782/782 ━━━━━━━━━━━━━━━━━━━━ 22s 28ms/step - accuracy: 0.7923 - loss: 0.5534 - val_accuracy: 0.7846 - val_loss: 0.5559
Epoch 4/10
782/782 ━━━━━━━━━━━━━━━━━━━━ 23s 30ms/step - accuracy: 0.8146 - loss: 0.5237 - val_accuracy: 0.8034 - val_loss: 0.5417
Epoch 5/10
782/782 ━━━━━━━━━━━━━━━━━━━━ 41s 30ms/step - accuracy: 0.8295 - loss: 0.5028 - val_accuracy: 0.7303 - val_loss: 0.5811
Epoch 6/10
782/782 ━━━━━━━━━━━━━━━━━━━━ 41s 30ms/step - accuracy: 0.8370 - loss: 0.4828 - val_accuracy: 0.8096 - val_loss: 0.5129
Epoch 7/10
...

<keras.src.callbacks.history.History at 0x7ad76b6592e0>

In [15]:
score, acc = model_rnn.evaluate(x_test, y_test, batch_size=batch_size)
print('Test score:', score)
print('Test accuracy:', acc)

782/782 ━━━━━━━━━━━━━━━━━━━━ 6s 7ms/step - accuracy: 0.8151 - loss: 0.4637
Test score: 0.47113022208213806
Test accuracy: 0.8133599758148193
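To compare the two runs so far, the test accuracies printed above can be placed side by side. The snippet only reuses the numbers reported in this notebook; the 50% baseline follows from the balanced classes.

```python
# Test accuracies reported by the two runs above
runs = {
    "maxlen=30, max_features=2000": 0.7692800164222717,
    "maxlen=80, max_features=20000": 0.8133599758148193,
}
baseline = 0.5  # balanced classes: always guessing one class scores 50%

for name, test_acc in runs.items():
    print(f"{name}: {test_acc:.4f} ({(test_acc - baseline) * 100:.1f} points above baseline)")

accuracies = list(runs.values())
gain = accuracies[1] - accuracies[0]
print(f"gain: {gain * 100:.2f} percentage points")  # gain: 4.41 percentage points
```

Both `maxlen` and `max_features` changed between the two runs, so this gain cannot be attributed to either setting alone; varying one setting at a time would separate the two effects.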


In [21]:
max_features = 5000
maxlen = 80
learning_rate = 0.0001

create_rnn_model(max_features=max_features, maxlen=maxlen, learning_rate=learning_rate)

Model architecture:





Epoch 1/10
782/782 ━━━━━━━━━━━━━━━━━━━━ 21s 24ms/step - accuracy: 0.5566 - loss: 0.6756 - val_accuracy: 0.7270 - val_loss: 0.5477
Epoch 2/10
782/782 ━━━━━━━━━━━━━━━━━━━━ 19s 25ms/step - accuracy: 0.7457 - loss: 0.5158 - val_accuracy: 0.7684 - val_loss: 0.4889
Epoch 3/10
782/782 ━━━━━━━━━━━━━━━━━━━━ 21s 25ms/step - accuracy: 0.8037 - loss: 0.4382 - val_accuracy: 0.7990 - val_loss: 0.4352
Epoch 4/10
782/782 ━━━━━━━━━━━━━━━━━━━━ 19s 24ms/step - accuracy: 0.8277 - loss: 0.3913 - val_accuracy: 0.8025 - val_loss: 0.4195
Epoch 5/10
782/782 ━━━━━━━━━━━━━━━━━━━━ 20s 26ms/step - accuracy: 0.8459 - loss: 0.3579 - val_accuracy: 0.8225 - val_loss: 0.3963
Epoch 6/10
782/782 ━━━━━━━━━━━━━━━━━━━━ 19s 24ms/step - accuracy: 0.8524 - loss: 0.3438 - val_accuracy: 0.8196 - val_loss: 0.3934
Epoch 7/10
...

---
### Machine Learning Foundation (C) 2020 IBM Corporation
