<center>
    <img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/assets/logos/SN_web_lightmode.png" width="300" alt="cognitiveclass.ai logo">
</center>


# **Recurrent Neural Networks**


Estimated time needed: **45** minutes

A recurrent neural network (RNN) is a type of artificial neural network that takes sequential or time-series data as input. It is typically used for ordinal or temporal problems such as language translation, speech recognition, and time-series forecasting.

In this lab, we will understand the fundamental building blocks of an RNN. We will train a simple binary text classifier on top of an existing pre-trained module that embeds sentences.


## __Table of Contents__

<ol>
    <li><a href="#Objectives">Objectives</a></li>
    <li>
        <a href="#Setup">Setup</a>
        <ol>
            <li><a href="#Installing-Required-Libraries">Installing Required Libraries</a></li>
            <li><a href="#Importing-Required-Libraries">Importing Required Libraries</a></li>
            <li><a href="#Defining-Helper-Functions">Defining Helper Functions</a></li>
        </ol>
    </li>
    <li>
        <a href="#RNN-Fundamentals">RNN Fundamentals</a>
        <ol>
            <li><a href="#Vanilla-Recurrent-Neural-Network"> Vanilla Recurrent Neural Network</a></li>
            <li><a href="#Unrolling-in-time-of-a-RNN">Unrolling in time of a RNN</a></li>
            <li><a href="#Training-an-RNN">Training an RNN</a></li>
        </ol>
    </li>
    <li><a href="#Types-of-RNNs">Types of RNNs</a></li>
    <li><a href="#Pre-trained-RNNs">Pre-trained RNNs</a></li>
</ol>


## Objectives

After completing this lab you will be able to:

 - Describe the fundamental building blocks of RNNs.
 - Implement pre-trained RNNs to solve time-series prediction, forecasting, and text classification tasks.


----


## Setup


For this lab, we will be using the following libraries:

*   [`pandas`](https://pandas.pydata.org/?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMML0187ENSkillsNetwork31430127-2021-01-01) for managing the data.
*   [`numpy`](https://numpy.org/?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMML0187ENSkillsNetwork31430127-2021-01-01) for mathematical operations.
*   [`sklearn`](https://scikit-learn.org/stable/?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMML0187ENSkillsNetwork31430127-2021-01-01) for machine learning and machine-learning-pipeline related functions.
*   [`seaborn`](https://seaborn.pydata.org/?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMML0187ENSkillsNetwork31430127-2021-01-01) for visualizing the data.
*   [`matplotlib`](https://matplotlib.org/?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMML0187ENSkillsNetwork31430127-2021-01-01) for additional plotting tools.


### Installing Required Libraries

The following required libraries are pre-installed in the Skills Network Labs environment. However, if you run these notebook commands in a different Jupyter environment (like Watson Studio or Anaconda), you will need to install these libraries by removing the `#` sign before `!mamba` in the code cell below.


In [1]:
# All Libraries required for this lab are listed below. The libraries pre-installed on Skills Network Labs are commented.
# !mamba install -qy pandas numpy seaborn matplotlib scikit-learn
# Note: If your environment doesn't support "!mamba install", use "!pip install"

The following required libraries are __not__ pre-installed in the Skills Network Labs environment. __You will need to run the following cell__ to install them:


In [2]:
%%capture

!pip install tensorflow_hub
!pip install tensorflow --upgrade
!mamba install -qy tqdm
!pip install skillsnetwork

### Importing Required Libraries


```python
import numpy as np
import pandas as pd
import math
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

import tensorflow as tf
print(tf.__version__)
import skillsnetwork
import tensorflow_hub as hub
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.datasets import reuters
from tensorflow.keras.layers import (SimpleRNN, Dense, Embedding, Masking, LSTM, GRU,
                                     Conv1D, Dropout, TextVectorization)
from tensorflow.keras.losses import mean_squared_error
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.preprocessing import sequence
from tensorflow.keras.preprocessing.text import Tokenizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_recall_fscore_support


# You can also use this section to suppress warnings generated by your code:
def warn(*args, **kwargs):
    pass
import warnings
warnings.warn = warn
warnings.filterwarnings('ignore')

sns.set_context('notebook')
sns.set_style('white')
np.random.seed(2024)
```

In [3]:
import tensorflow as tf
print(tf.__version__)

2.20.0


In [4]:
# General modules
import math
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import skillsnetwork

# Tensorflow module
import tensorflow_hub as hub
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.datasets import reuters
from tensorflow.keras.models import Sequential
from tensorflow.keras.preprocessing import sequence
from tensorflow.keras.losses import MeanSquaredError
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.layers import Masking, LSTM, GRU, Conv1D
from tensorflow.keras.layers import Dense, Dropout, Embedding, SimpleRNN, TextVectorization

# Scikit-Learn modules
from sklearn.model_selection import train_test_split as TTS
from sklearn.metrics import accuracy_score, precision_recall_fscore_support as PRFS

%matplotlib inline

# Suppress warnings generated by the code in this notebook
def warn(*args, **kwargs):
    pass

import warnings
warnings.warn = warn
warnings.filterwarnings('ignore')

sns.set_context('notebook')
sns.set_style('white')
np.random.seed(2024)

### Defining Helper Functions


In [5]:
# function to compute the accuracy, precision, recall and F1 score of a model's predictions.
def calculate_results(y_true, y_pred):
    model_accuracy = accuracy_score(y_true, y_pred)
    model_precision, model_recall, model_f1, _ = PRFS(y_true, y_pred, average="weighted")
    model_results = {"accuracy" : model_accuracy,
                     "precision": model_precision,
                     "recall"   : model_recall,
                     "f1"       : model_f1}

    return model_results
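
To make the `average="weighted"` option less opaque, here is a minimal NumPy sketch of what that kind of averaging computes: each class gets its own precision, recall, and F1, and the per-class values are averaged with weights proportional to the number of true instances of each class. The function name `weighted_prf` and the toy labels are illustrative assumptions, not part of the lab.

```python
import numpy as np

def weighted_prf(y_true, y_pred):
    """Support-weighted precision, recall and F1 over the classes in y_true."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    precision, recall, f1, support = [], [], [], []
    for c in np.unique(y_true):
        tp = np.sum((y_pred == c) & (y_true == c))  # true positives for class c
        predicted = np.sum(y_pred == c)             # how often c was predicted
        actual = np.sum(y_true == c)                # how often c really occurs
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precision.append(p)
        recall.append(r)
        f1.append(f)
        support.append(actual)
    weights = np.array(support) / np.sum(support)
    return (float(np.dot(weights, precision)),
            float(np.dot(weights, recall)),
            float(np.dot(weights, f1)))

# Hypothetical labels, only to exercise the function
toy_true = [1, 0, 1, 1, 0, 0]
toy_pred = [1, 0, 0, 1, 0, 1]
print(weighted_prf(toy_true, toy_pred))
```

In this toy case both classes have the same support and the same per-class scores, so all three weighted metrics come out to 2/3.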

## RNN Fundamentals

RNNs fall in the category of neural networks that maintain some kind of **state**. They can process sequential data of arbitrary length. By doing so, they overcome certain limitations faced by classical neural networks. Classical NNs only accept fixed-length vectors as input and output fixed-length vectors. RNNs operate over sequences of vectors. Classical NNs aren't built to consider the sequential nature of some data. RNNs work with sequential data forms like language, video frames, time series, and so on.

The RNN layer uses a for-loop to iterate over the time-steps of a sequence, and maintains an internal state that encodes information about all time-steps that have been observed so far. The Keras RNN API has built-in layers such as `keras.layers.SimpleRNN`, `keras.layers.LSTM`, and `keras.layers.GRU` that make it easy to quickly build RNN models.


### Vanilla Recurrent Neural Network

RNNs use these two simple formulas:

$$ \mathbf s_t = \tanh(U \mathbf x_t + W \mathbf s_{t-1}) $$

$$ \mathbf y_t = V \mathbf s_t $$

The following plot shows the hyperbolic tangent function, `tanh`:

<img src="https://github.com/DataScienceUB/DeepLearningMaster2019/blob/master/images/TanhReal.gif?raw=1" alt="" style="width: 300px;">

#### Terminology:
* $\mathbf s_t$: current network state, also called the hidden state
* $\mathbf s_{t-1}$: previous hidden state
* $\mathbf x_t$: current input
* $U, V, W$: matrices that are the parameters of the RNN
* $\mathbf y_t$: output at time $t$

These equations say that the current network state, or hidden state, is a function of the previous hidden state and the current input. The output at time $t$ is a linear function of the current hidden state.
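
As a minimal sketch of the two formulas, the following NumPy snippet computes a single recurrent step. The dimensions, the random initialization, and the function name `rnn_step` are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim, output_dim = 3, 4, 2  # illustrative sizes

# Parameters of the RNN (randomly initialized for this sketch)
U = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
W = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
V = rng.normal(scale=0.1, size=(output_dim, hidden_dim))

def rnn_step(x_t, s_prev):
    """One time step: s_t = tanh(U x_t + W s_{t-1}), y_t = V s_t."""
    s_t = np.tanh(U @ x_t + W @ s_prev)
    y_t = V @ s_t
    return s_t, y_t

x_t = rng.normal(size=input_dim)   # current input
s_prev = np.zeros(hidden_dim)      # previous hidden state (all zeros here)
s_t, y_t = rnn_step(x_t, s_prev)
print(s_t.shape, y_t.shape)  # (4,) (2,)
```

Because of the `tanh`, every component of the hidden state stays strictly between -1 and 1, whatever the input.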

### Unrolling in time of a RNN

Given an input sequence, we apply the RNN formulas recurrently until all input elements have been processed. The $U,V,W$ parameters are shared across all recurrent steps. This implies that at each time step, the output is a function of the current input and of all inputs from previous time steps. The network therefore has a form of memory, encoding information about the time-steps it has seen so far.

Some important observations:
- Initial values for $U,V,W$, as well as for the initial hidden state $\mathbf s_0$, must be provided when training an RNN.
- The hidden state acts as the memory of the network. It captures information about the previous steps and embeds a representation of the sequence seen so far.
- We can look at the network's output at every stage or just the final stage.
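
The unrolling described above can be sketched as a plain loop that reuses the same three matrices at every step. This is an illustrative NumPy version under assumed sizes and random weights; the name `unroll_rnn` is hypothetical.

```python
import numpy as np

def unroll_rnn(X, U, W, V, s0):
    """Apply the same U, W, V at every time step of X (shape: T x input_dim)."""
    s = s0
    states, outputs = [], []
    for x_t in X:
        s = np.tanh(U @ x_t + W @ s)  # hidden state carries the memory forward
        states.append(s)
        outputs.append(V @ s)         # output at this time step
    return np.stack(states), np.stack(outputs)

rng = np.random.default_rng(1)
T, input_dim, hidden_dim, output_dim = 5, 3, 4, 2  # illustrative sizes
U = rng.normal(scale=0.5, size=(hidden_dim, input_dim))
W = rng.normal(scale=0.5, size=(hidden_dim, hidden_dim))
V = rng.normal(scale=0.5, size=(output_dim, hidden_dim))
X = rng.normal(size=(T, input_dim))

states, outputs = unroll_rnn(X, U, W, V, np.zeros(hidden_dim))
print(states.shape, outputs.shape)  # (5, 4) (5, 2)

# Perturb only the first input and compare the final hidden states
X_changed = X.copy()
X_changed[0] += 1.0
states_changed, _ = unroll_rnn(X_changed, U, W, V, np.zeros(hidden_dim))
print(np.abs(states_changed[-1] - states[-1]).max())
```

The last line prints how much the final hidden state moves when only the first input changes, which is one way to see the "memory" at work. The reverse does not hold: changing a later input cannot affect earlier states.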

### Training an RNN

When unrolled, an RNN has one layer per time step, and its weights are shared across time. It is trained using backpropagation through time (BPTT), which involves the following steps:
- The training set is made of several input sequences $\{\mathbf{X}_i \}$ and their corresponding outcomes. Each element of a sequence, $\mathbf{x}_j \in \mathbf{X}_i$, is an $n$-dimensional vector.
- We use a loss function to measure how well the network's output fits the expected outcome (the ground truth).
- After the forward pass, the gradients of the loss function are propagated backwards through the unrolled network.
- We apply an optimization method such as stochastic gradient descent or Adam to update the parameters and minimize the loss function.
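
The steps above can be sketched in NumPy for the vanilla RNN from the previous section. This is a minimal illustration, not the procedure Keras uses internally: it assumes a squared-error loss summed over all time steps, plain gradient descent, a single toy sequence, and hypothetical names (`forward`, `bptt`, `D` for the targets).

```python
import numpy as np

def forward(X, D, U, W, V):
    """Forward pass; returns the summed squared-error loss and all hidden states."""
    s = np.zeros(W.shape[0])
    states = [s]                      # states[0] is the initial hidden state
    loss = 0.0
    for x_t, d_t in zip(X, D):
        s = np.tanh(U @ x_t + W @ s)
        states.append(s)
        loss += 0.5 * np.sum((V @ s - d_t) ** 2)
    return loss, states

def bptt(X, D, U, W, V):
    """Backpropagation through time for the loss defined in forward()."""
    loss, states = forward(X, D, U, W, V)
    dU, dW, dV = np.zeros_like(U), np.zeros_like(W), np.zeros_like(V)
    ds_next = np.zeros(W.shape[0])    # gradient arriving from step t+1
    for t in reversed(range(len(X))):
        s_t, s_prev = states[t + 1], states[t]
        err = V @ s_t - D[t]          # dLoss/dy_t
        dV += np.outer(err, s_t)
        ds = V.T @ err + ds_next      # total gradient w.r.t. s_t
        da = ds * (1.0 - s_t ** 2)    # back through tanh
        dU += np.outer(da, X[t])
        dW += np.outer(da, s_prev)
        ds_next = W.T @ da            # pass gradient to step t-1
    return loss, dU, dW, dV

rng = np.random.default_rng(2)
T, n_in, n_hid, n_out = 6, 3, 4, 2    # illustrative sizes
U = rng.normal(scale=0.3, size=(n_hid, n_in))
W = rng.normal(scale=0.3, size=(n_hid, n_hid))
V = rng.normal(scale=0.3, size=(n_out, n_hid))
X = rng.normal(size=(T, n_in))        # one toy input sequence
D = rng.normal(size=(T, n_out))       # toy targets, one per time step

learning_rate = 0.02
losses = []
for _ in range(50):
    loss, dU, dW, dV = bptt(X, D, U, W, V)
    losses.append(loss)
    U -= learning_rate * dU           # shared weights get the summed gradients
    W -= learning_rate * dW
    V -= learning_rate * dV
print(losses[0], losses[-1])
```

Note that the gradients for `U`, `W`, and `V` are accumulated over every time step before a single update is applied, which is what weight sharing across time amounts to in practice.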


## Types of RNNs

Predicting the output, $y_t$, at each time step is not always necessary. Different RNN architectures can be used to solve different kinds of problems.


|Type|Input|Output|Example problem|
|-|-|-|-|
|*many-to-many*|An input sequence|An output sequence|Part-of-speech (POS) tagging|
|*many-to-one*|An input sequence|The output at the last time step only|Text classification: is a tweet positive or negative?|
|*one-to-many*|A single input value|An output sequence|Given an input image, predict sequence data|
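
The three configurations in the table differ mainly in which inputs are fed in and which outputs are kept. The NumPy sketch below illustrates this with assumed sizes and random weights. For the one-to-many case it makes one simplifying assumption: the single input is fed at the first step and zero vectors afterwards. In Keras, the many-to-many versus many-to-one choice roughly corresponds to the `return_sequences` argument of the recurrent layers.

```python
import numpy as np

rng = np.random.default_rng(3)
n_in, n_hid, n_out = 3, 4, 2  # illustrative sizes
U = rng.normal(scale=0.5, size=(n_hid, n_in))
W = rng.normal(scale=0.5, size=(n_hid, n_hid))
V = rng.normal(scale=0.5, size=(n_out, n_hid))

def step(x_t, s_prev):
    """One recurrent step of the vanilla RNN."""
    return np.tanh(U @ x_t + W @ s_prev)

def many_to_many(X):
    """Keep the output of every time step."""
    s, outputs = np.zeros(n_hid), []
    for x_t in X:
        s = step(x_t, s)
        outputs.append(V @ s)
    return np.stack(outputs)

def many_to_one(X):
    """Keep only the output of the last time step."""
    return many_to_many(X)[-1]

def one_to_many(x, n_steps):
    """Feed a single input at step 0, then zero inputs (a simplifying assumption)."""
    s, outputs = np.zeros(n_hid), []
    for t in range(n_steps):
        x_t = x if t == 0 else np.zeros(n_in)
        s = step(x_t, s)
        outputs.append(V @ s)
    return np.stack(outputs)

X = rng.normal(size=(5, n_in))
print(many_to_many(X).shape)       # (5, 2)
print(many_to_one(X).shape)        # (2,)
print(one_to_many(X[0], 4).shape)  # (4, 2)
```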


## Pre-trained RNNs


In this section, we will experiment with existing pre-trained models. We will use the NLP disaster dataset. The dataset contains a `test.csv` and a `train.csv`, each of which has the following information:

* The text of a tweet
* A keyword from that tweet (although this may be blank!)
* The location the tweet was sent from (may also be blank)

Our task is to predict whether a given tweet is about a real disaster or not. If so, predict a 1. If not, predict a 0.


Let us start by downloading and unzipping the dataset.


In [6]:
await skillsnetwork.prepare("https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-ML311-Coursera/labs/Module4/L1/nlp_disaster.zip",overwrite=True)



Saved to '.'


Now we will read in the train dataset. Here we use `frac=1` so all rows in the training dataset are returned in a random order. We also set a random state to ensure reproducibility of results.


In [7]:
train_df = pd.read_csv("train.csv")
# shuffle the dataset
train_df_shuffled = train_df.sample(frac=1, random_state=42)
train_df_shuffled.head()

| |id|keyword|location|text|target|
|-|-|-|-|-|-|
|2644|3796|destruction| |So you have a new weapon that can cause un-ima...|1|
|2227|3185|deluge| |The f$&amp;@ing things I do for #GISHWHES Just...|0|
|5448|7769|police|UK|DT @georgegalloway: RT @Galloway4Mayor: ÛÏThe...|1|
|132|191|aftershock| |Aftershock back to school kick off was great. ...|0|
|6845|9810|trauma|Montgomery County, MD|in response to trauma Children of Addicts deve...|0|


We will use 90% of the entire labelled dataset for training, and 10% of it for testing purposes.


In [8]:
X_train, X_test, y_train, y_test = TTS(train_df_shuffled['text'].values,
                                       train_df_shuffled['target'],
                                       test_size = 0.1,
                                       random_state = 42)
X_train.shape, X_test.shape, y_train.shape, y_test.shape

((6851,), (762,), (6851,), (762,))

In [9]:
X_train[0:5]

array(['@mogacola @zamtriossu i screamed after hitting tweet',
       'Imagine getting flattened by Kurt Zouma',
       '@Gurmeetramrahim #MSGDoing111WelfareWorks Green S welfare force ke appx 65000 members har time disaster victim ki help ke liye tyar hai....',
       "@shakjn @C7 @Magnums im shaking in fear he's gonna hack the planet",
       'Somehow find you and I collide http://t.co/Ee8RpOahPk'],
      dtype=object)

`TextVectorization` is a preprocessing layer which maps text features to integer sequences. We also specify `lower_and_strip_punctuation` as the standardization method to apply to the input text. The text will be lowercased and all punctuation removed. Next we split on the whitespace, and pass `None` to `ngrams` so no ngrams are created.



In [10]:
text_vectorizer = TextVectorization(# Don't limit vocabulary size
                                    max_tokens = None,
                                    # Remove punctuation and make letters lowercase
                                    standardize = "lower_and_strip_punctuation",
                                    # Split text based on whitespace
                                    split = "whitespace",
                                    # Don't group anything, every token alone
                                    ngrams = None,
                                    # Convert each word to an integer index
                                    output_mode = "int",
                                    # Length of each sentence == length of largest sentence
                                    output_sequence_length = None)
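
To see roughly what this kind of layer does once it has learned a vocabulary (in Keras, by calling `adapt` on the training text), here is a plain-Python approximation. It is only a sketch: the helper names are hypothetical, punctuation handling uses Python's `string.punctuation`, and tie-breaking between equally frequent words may differ from the real layer. It does follow the Keras convention of reserving index 0 for padding and index 1 for out-of-vocabulary tokens.

```python
import re
import string
from collections import Counter

def standardize(text):
    """Lowercase the text and strip ASCII punctuation."""
    return re.sub(f"[{re.escape(string.punctuation)}]", "", text.lower())

def build_vocab(texts):
    """Map each token to an integer, most frequent tokens first."""
    counts = Counter(tok for t in texts for tok in standardize(t).split())
    # Index 0 is reserved for padding, index 1 for out-of-vocabulary tokens
    vocab = ["", "[UNK]"] + [tok for tok, _ in counts.most_common()]
    return {tok: i for i, tok in enumerate(vocab)}

def vectorize(text, vocab):
    """Turn a sentence into a list of integer token indices."""
    return [vocab.get(tok, 1) for tok in standardize(text).split()]

# Hypothetical mini-corpus, only for illustration
corpus = ["Fire in the hills!", "The fire is out."]
vocab = build_vocab(corpus)
print(vectorize("The FIRE spreads, fast", vocab))  # [3, 2, 1, 1]
```

The words "spreads" and "fast" never appear in the mini-corpus, so both map to the out-of-vocabulary index 1.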

In [11]:
# define hyperparameters
# number of words in the vocabulary
max_vocab_length = 10000
# tweet average length
max_length = 15

Below we define an `Embedding` layer with a vocabulary of 10,000, a vector space of 128 dimensions in which words will be embedded, and input documents that have 15 words each.


In [12]:
# Embedding layer: maps each integer token index to a dense vector
embedding = Embedding(input_dim = max_vocab_length, # vocabulary size
                      output_dim = 128,             # vector length for each word
                      input_length = max_length)    # number of tokens in each padded sequence
# Example: the index of "hello" → [0.21, -0.33, ..., 0.88] (128 numbers)
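
Conceptually, an embedding layer is a trainable lookup table with one row per vocabulary entry. The NumPy sketch below mimics the lookup with the same sizes as the layer above. The random matrix and the token indices are made up for illustration; a real layer learns its rows during training.

```python
import numpy as np

rng = np.random.default_rng(4)
vocab_size, embed_dim, seq_len = 10000, 128, 15

# One 128-dimensional row per vocabulary index (random stand-in for learned weights)
embedding_matrix = rng.uniform(-0.05, 0.05, size=(vocab_size, embed_dim))

# A batch with one padded sequence of 15 token indices (0 is the padding index)
token_ids = np.array([[12, 7, 431] + [0] * (seq_len - 3)])

# The "layer" is just row selection: each index is replaced by its vector
embedded = embedding_matrix[token_ids]
print(embedded.shape)  # (1, 15, 128)
```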

The `hub.KerasLayer` wraps a SavedModel (or a legacy TF1 Hub format) as a Keras Layer. The `universal-sentence-encoder` is an encoder for text longer than a single word, such as sentences, phrases, or short paragraphs, and is trained on a variety of data. It can be used for text classification, semantic similarity, clustering, and other natural language tasks.

> We can train a simple binary text classifier on top of any TF-Hub module that can embed sentences. The Universal Sentence Encoder was partially trained with custom text classification tasks in mind. These kinds of classifiers can be trained to perform a wide variety of classification tasks often with a very small amount of labeled examples.

More on this can be found in the TensorFlow Hub [documentation](https://tfhub.dev/google/universal-sentence-encoder/4).


In [15]:
encoder_layer = hub.KerasLayer(# Loads the Universal Sentence Encoder (USE) from TensorFlow Hub
                                "https://tfhub.dev/google/universal-sentence-encoder/4",
                               # A scalar string (just one string at a time)
                               input_shape = [],
                               # Data type of the input
                               dtype = tf.string,
                               # Model weights will not be updated during training
                               trainable = False,
                               # Keras internal name (any name)
                               name = "pretrained")

The `encoder_layer` takes variable-length English text as input and outputs a 512-dimensional vector.


We will add a `Dense` layer with a single unit on top of the TF-Hub module to create a simple binary text classifier. Next, we will compile the model and fit it for 20 epochs.


In [31]:
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, Input, Lambda  # Input is needed for the input layer below

In [32]:
input_layer = Input(shape=(), dtype=tf.string, name='text_input')
USE = Lambda(lambda x: encoder_layer(x),
                    output_shape = (512,),  # The output of the USE is 512-dimensional vector
                    name = 'encoder_layer')(input_layer)

output_layer = Dense(1, activation='sigmoid', name='output_layer')(USE)

model = Model(inputs=input_layer, outputs=output_layer, name='model_pretrained')
model.summary()
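
The trainable part of this model is just the final `Dense` layer. As a rough NumPy sketch of what that layer computes, assume three random 512-dimensional vectors stand in for the encoder's sentence embeddings; the weights here are random placeholders, not trained values.

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(5)
sentence_vectors = rng.normal(size=(3, 512))     # stand-in for 3 encoded tweets
weights = rng.normal(scale=0.05, size=(512, 1))  # one weight per embedding dimension
bias = np.zeros(1)

# Dense(1, activation='sigmoid'): a weighted sum followed by a sigmoid
probabilities = sigmoid(sentence_vectors @ weights + bias)
n_params = weights.size + bias.size
print(probabilities.shape, n_params)  # (3, 1) 513
```

With a frozen encoder, these 513 values (512 weights plus one bias) are the only parameters the optimizer updates.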

In [34]:
model.compile(loss="binary_crossentropy",
              optimizer="adam",
              metrics=["accuracy"])
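
For intuition about the `binary_crossentropy` loss, here is a small NumPy sketch of the quantity it measures: the average negative log-likelihood of the true labels under the predicted probabilities. The clipping constant `eps` is an assumption added to avoid `log(0)`; it is not necessarily the exact value Keras uses.

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all examples."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    return float(-np.mean(y_true * np.log(y_prob)
                          + (1 - y_true) * np.log(1 - y_prob)))

# A model that always outputs 0.5 scores ln(2), whatever the labels are
print(binary_crossentropy([1, 0], [0.5, 0.5]))  # ≈ 0.6931

# Confident, correct predictions give a much smaller loss
print(binary_crossentropy([1, 0], [0.9, 0.1]))  # ≈ 0.1054
```

The value ln 2 ≈ 0.693 is a useful reference point: a loss near it means the classifier is doing little better than outputting 0.5 for every tweet.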

In [35]:
model.fit(x = X_train,
          y = y_train,
          epochs = 20,
          validation_data = (X_test, y_test))

Epoch 1/20
215/215 ━━━━━━━━━━━━━━━━━━━━ 9s 23ms/step - accuracy: 0.6849 - loss: 0.6683 - val_accuracy: 0.7717 - val_loss: 0.6132
Epoch 2/20
215/215 ━━━━━━━━━━━━━━━━━━━━ 8s 14ms/step - accuracy: 0.7827 - loss: 0.5957 - val_accuracy: 0.7795 - val_loss: 0.5633
Epoch 3/20
215/215 ━━━━━━━━━━━━━━━━━━━━ 3s 14ms/step - accuracy: 0.7922 - loss: 0.5480 - val_accuracy: 0.7861 - val_loss: 0.5312
Epoch 4/20
215/215 ━━━━━━━━━━━━━━━━━━━━ 4s 18ms/step - accuracy: 0.7994 - loss: 0.5185 - val_accuracy: 0.7874 - val_loss: 0.5104
Epoch 5/20
215/215 ━━━━━━━━━━━━━━━━━━━━ 4s 12ms/step - accuracy: 0.7990 - loss: 0.4979 - val_accuracy: 0.7900 - val_loss: 0.4956
Epoch 6/20
215/215 ━━━━━━━━━━━━━━━━━━━━ 3s 12ms/step - accuracy: 0.7997 - loss: 0.4812 - val_accuracy: 0.7913 - val_loss: 0.4852
Epoch 7/20
215/215

<keras.src.callbacks.history.History at 0x7c31678b9220>

In [46]:
ypred = tf.squeeze(tf.round(model.predict(X_test)))
ypred
# tf.round()   --> round to 0 or 1
# tf.squeeze() --> remove extra dimensions (762,1) to (762,)

24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step


<tf.Tensor: shape=(762,), dtype=float32, numpy=
array([0., 1., 1., 0., 1., 1., 1., 1., 1., 0., 1., 0., 0., 1., 0., 0., 0.,
       1., 0., 0., 1., 0., 0., 0., 1., 1., 0., 0., 0., 0., 1., 1., 0., 0.,
       1., 0., 1., 0., 0., 1., 0., 0., 1., 1., 0., 1., 0., 1., 1., 1., 0.,
       1., 0., 0., 0., 0., 0., 1., 0., 0., 0., 1., 1., 0., 1., 1., 1., 0.,
       0., 1., 0., 0., 1., 0., 0., 1., 0., 0., 0., 1., 1., 1., 1., 0., 1.,
       1., 0., 0., 1., 1., 1., 0., 0., 0., 1., 1., 1., 1., 1., 1., 1., 0.,
       0., 1., 1., 0., 1., 1., 0., 1., 0., 0., 0., 0., 0., 0., 1., 1., 1.,
       1., 1., 0., 1., 1., 1., 0., 0., 0., 0., 0., 0., 1., 1., 1., 1., 0.,
       1., 0., 0., 0., 1., 0., 1., 1., 1., 0., 0., 0., 0., 1., 1., 0., 1.,
       0., 0., 1., 1., 0., 1., 0., 1., 0., 0., 1., 0., 0., 1., 0., 1., 0.,
       1., 1., 1., 0., 1., 0., 1., 0., 1., 1., 0., 1., 1., 1., 1., 0., 0.,
       1., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 0.,
       0., 1., 0., 1., 1., 1., 0., 1., 0., 0., 0., 0

The model predicts the tweet class with fairly high accuracy: in the training log above, the validation accuracy reaches about 0.79 by the sixth epoch.
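
The post-processing applied to the predictions (round, then squeeze) and the resulting accuracy can be reproduced on a tiny example with NumPy. The probabilities and labels below are made up for illustration and are not taken from the model.

```python
import numpy as np

# Hypothetical sigmoid outputs of shape (4, 1), as model.predict would return
probs = np.array([[0.91], [0.08], [0.55], [0.49]])
labels = np.array([1, 0, 0, 0])

# Round each probability to 0 or 1, then drop the extra dimension: (4, 1) -> (4,)
preds = np.squeeze(np.round(probs))

# Accuracy is the fraction of predictions that match the labels
accuracy = float(np.mean(preds == labels))
print(preds, accuracy)  # [1. 0. 1. 0.] 0.75
```

With the real predictions, the `calculate_results` helper defined earlier could be applied in the same spirit, for example as `calculate_results(y_test, ypred.numpy())`, to obtain precision, recall, and F1 alongside accuracy.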


## Authors


[Kopal Garg](https://www.linkedin.com/in/gargkopal/)


Kopal is a Master's student in Computer Science at the University of Toronto.


## Change Log


|Date (YYYY-MM-DD)|Version|Changed By|Change Description|
|-|-|-|-|
|2022-07-18|0.1|Kopal|Create Lab|
|2022-08-30|0.1|Steve Hord|QA pass edits|


Copyright © 2022 IBM Corporation. All rights reserved.
