
# Recurrent Neural Networks (RNN) with Expanded Details

This notebook provides an overview of Recurrent Neural Networks (RNNs): their architecture, how they work, an implementation on the IMDB movie review dataset, and hyperparameter tuning.



## Background

Recurrent Neural Networks (RNNs) are a type of neural network architecture designed to recognize patterns in sequences of data, such as time series or natural language. Unlike traditional feedforward neural networks, RNNs have connections that form directed cycles, enabling them to maintain a 'memory' of previous inputs.

### Key Features of RNNs
- **Memory**: RNNs retain information from previous inputs, which is crucial for tasks where context is important.
- **Weight Sharing**: The same weights are applied at every time step, so the number of parameters does not grow with the sequence length.
- **Applications**: RNNs are widely used in tasks such as language modeling, machine translation, speech recognition, and time series forecasting.

### Types of RNNs
- **Simple RNN**: The basic form, with a single recurrent transformation and no gating.
- **LSTM (Long Short-Term Memory)**: A more complex variant designed to handle long-term dependencies.
- **GRU (Gated Recurrent Unit)**: A variant similar to LSTM but with a simplified architecture.
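
One way to make "more complex" and "simplified" concrete is to count trainable parameters. The sketch below uses the textbook formulation, in which each gate (or the single transformation of a simple RNN) has an input-to-hidden matrix, a hidden-to-hidden matrix, and a bias vector. The function name `recurrent_params` and the layer sizes are assumptions made for illustration.

```python
def recurrent_params(input_dim, units, gates=1):
    """Trainable parameters of one recurrent layer in the textbook formulation.

    Each gate (or the single tanh transformation of a simple RNN) owns an
    input-to-hidden matrix, a hidden-to-hidden matrix, and a bias vector.
    """
    per_gate = units * input_dim + units * units + units
    return gates * per_gate

# Hypothetical sizes chosen only for illustration.
input_dim, units = 128, 128

counts = {
    "SimpleRNN": recurrent_params(input_dim, units, gates=1),
    "GRU": recurrent_params(input_dim, units, gates=3),   # update, reset, candidate
    "LSTM": recurrent_params(input_dim, units, gates=4),  # forget, input, output, candidate
}
for name, n in counts.items():
    print(f"{name:9s} {n:>7,d} parameters")
```

Framework implementations can differ slightly from these counts. For example, the Keras `GRU` layer with its default `reset_after=True` keeps a second bias vector per gate, so the number it reports is somewhat higher.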



## Mathematical Foundation

### The RNN Cell

An RNN cell takes an input \( x_t \) at time step \( t \) and updates its hidden state \( h_t \) based on the previous hidden state \( h_{t-1} \):

\[
h_t = \tanh(W_{xh}x_t + W_{hh}h_{t-1} + b_h)
\]

Where:
- \( W_{xh} \) is the input-to-hidden weight matrix and \( W_{hh} \) is the hidden-to-hidden (recurrent) weight matrix.
- \( b_h \) is the hidden bias vector.
- \( \tanh \) is the activation function, applied element-wise.

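The output \( y_t \) is typically computed as a linear readout of the hidden state, where \( W_{hy} \) is the hidden-to-output weight matrix and \( b_y \) is the output bias (a task-specific activation such as sigmoid or softmax may follow):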

\[
y_t = W_{hy}h_t + b_y
\]
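
The two equations above can be sketched in a few lines of plain Python. This is a minimal illustration, not how a framework implements the cell: the helper names (`matvec`, `rnn_step`, `readout`, `rand_matrix`), the layer sizes, and the random weights are all assumptions. Note that the loop reuses the same weight matrices at every time step.

```python
import math
import random

def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h)."""
    a = matvec(W_xh, x_t)
    r = matvec(W_hh, h_prev)
    return [math.tanh(ai + ri + bi) for ai, ri, bi in zip(a, r, b_h)]

def readout(h_t, W_hy, b_y):
    """y_t = W_hy h_t + b_y."""
    return [s + b for s, b in zip(matvec(W_hy, h_t), b_y)]

def rand_matrix(rows, cols, rng):
    """Small random weights, for illustration only."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
input_dim, hidden_dim, output_dim = 3, 4, 2   # illustrative sizes
W_xh = rand_matrix(hidden_dim, input_dim, rng)
W_hh = rand_matrix(hidden_dim, hidden_dim, rng)
W_hy = rand_matrix(output_dim, hidden_dim, rng)
b_h = [0.0] * hidden_dim
b_y = [0.0] * output_dim

# A made-up sequence of 5 time steps.
xs = [[rng.uniform(-1, 1) for _ in range(input_dim)] for _ in range(5)]

h = [0.0] * hidden_dim          # initial hidden state h_0
outputs = []
for x_t in xs:                  # the same weights are reused at every step
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)
    outputs.append(readout(h, W_hy, b_y))

print(len(outputs), len(outputs[0]))  # number of time steps, outputs per step
```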

### LSTM and GRU

LSTM and GRU are RNN variants that add gating mechanisms to better capture long-term dependencies and to mitigate the vanishing gradient problem.
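
To see why a plain RNN struggles with long-term dependencies, consider a scalar version of the cell, \( h_t = \tanh(w h_{t-1} + x_t) \). By the chain rule, the sensitivity of \( h_t \) to the initial state is a product of one factor per time step, and each factor is at most \( |w| \) in magnitude. The sketch below tracks that product; the function name, the weight value, and the constant input are assumptions chosen for illustration.

```python
import math

def gradient_through_time(w, xs, h0=0.0):
    """Return |d h_t / d h_0| for t = 1..T in a scalar RNN h_t = tanh(w*h_{t-1} + x_t)."""
    h, grad, magnitudes = h0, 1.0, []
    for x_t in xs:
        h = math.tanh(w * h + x_t)
        grad *= w * (1.0 - h * h)   # chain rule: tanh'(a) = 1 - tanh(a)^2
        magnitudes.append(abs(grad))
    return magnitudes

xs = [0.5] * 50                     # constant input over 50 time steps (illustrative)
mags = gradient_through_time(w=0.9, xs=xs)
for t in (1, 10, 25, 50):
    print(f"t={t:2d}  |dh_t/dh_0| = {mags[t - 1]:.3e}")
```

With these values the magnitude shrinks at every step, so an error signal from late in the sequence barely reaches the early inputs. Gated cells add a path along which this product need not shrink as quickly.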

#### LSTM

An LSTM cell contains three gates:
- **Forget Gate**: Controls what information to discard from the cell state.
- **Input Gate**: Controls what information to add to the cell state.
- **Output Gate**: Controls what information to output.
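
For reference, a commonly used formulation of these gates is given below, written with the same weight-naming pattern as the RNN cell above. Here \( \sigma \) is the logistic sigmoid, \( \odot \) is element-wise multiplication, \( c_t \) is the cell state, and \( \tilde{c}_t \) is the candidate cell state. The subscripted names are notational assumptions, and some LSTM variants differ in detail.

\[
\begin{aligned}
f_t &= \sigma(W_{xf}x_t + W_{hf}h_{t-1} + b_f) \\
i_t &= \sigma(W_{xi}x_t + W_{hi}h_{t-1} + b_i) \\
o_t &= \sigma(W_{xo}x_t + W_{ho}h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_{xc}x_t + W_{hc}h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
\]

The cell state update is additive: when \( f_t \) is close to 1, information in \( c_{t-1} \) passes through largely unchanged, which is what helps gradients survive over many time steps.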

#### GRU

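A GRU cell simplifies the LSTM by combining the forget and input gates into a single update gate and by merging the cell state and hidden state into one state vector. It also has a reset gate, which controls how much of the previous state is used when computing the new candidate state.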
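
One common way to write the GRU update is shown below, using the same notation as the RNN cell: \( z_t \) is the update gate, \( r_t \) is a reset gate that scales the previous state inside the candidate \( \tilde{h}_t \), \( \sigma \) is the logistic sigmoid, and \( \odot \) is element-wise multiplication. The subscripted weight names are notational assumptions.

\[
\begin{aligned}
z_t &= \sigma(W_{xz}x_t + W_{hz}h_{t-1} + b_z) \\
r_t &= \sigma(W_{xr}x_t + W_{hr}h_{t-1} + b_r) \\
\tilde{h}_t &= \tanh(W_{xh}x_t + W_{hh}(r_t \odot h_{t-1}) + b_h) \\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t
\end{aligned}
\]

Conventions vary: some references and libraries swap the roles of \( z_t \) and \( 1 - z_t \) in the last line, which changes the interpretation of the gate but not the expressive power of the cell.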



## Implementation in Python

We'll implement SimpleRNN, LSTM, and GRU models using TensorFlow and Keras on the IMDB movie review dataset, a binary sentiment classification task.


In [1]:

import numpy as np
import tensorflow as tf
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing import sequence
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, SimpleRNN, LSTM, GRU, Dense

# Load the IMDB dataset
max_features = 10000  # Vocabulary size: keep only the most frequent words
maxlen = 500  # Pad or truncate every review to exactly this many tokens
batch_size = 32

(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)

# Pad sequences to ensure uniform input length
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)

# Define a function to create models
def create_model(cell_type='SimpleRNN'):
    model = Sequential()
    model.add(Embedding(max_features, 128))
    if cell_type == 'SimpleRNN':
        model.add(SimpleRNN(128))
    elif cell_type == 'LSTM':
        model.add(LSTM(128))
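    elif cell_type == 'GRU':
        model.add(GRU(128))
    else:
        # Fail loudly instead of silently building a model with no recurrent layer
        raise ValueError(f"Unknown cell_type: {cell_type!r}")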
    model.add(Dense(1, activation='sigmoid'))
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    return model

# Train and evaluate SimpleRNN
simple_rnn_model = create_model('SimpleRNN')
simple_rnn_model.fit(x_train, y_train, epochs=5, batch_size=batch_size, validation_split=0.2)
print("SimpleRNN Evaluation:")
simple_rnn_model.evaluate(x_test, y_test)

# Train and evaluate LSTM
lstm_model = create_model('LSTM')
lstm_model.fit(x_train, y_train, epochs=5, batch_size=batch_size, validation_split=0.2)
print("LSTM Evaluation:")
lstm_model.evaluate(x_test, y_test)

# Train and evaluate GRU
gru_model = create_model('GRU')
gru_model.fit(x_train, y_train, epochs=5, batch_size=batch_size, validation_split=0.2)
print("GRU Evaluation:")
gru_model.evaluate(x_test, y_test)


Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb.npz
17464789/17464789 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step
Epoch 1/5
625/625 ━━━━━━━━━━━━━━━━━━━━ 44s 65ms/step - accuracy: 0.5541 - loss: 0.6824 - val_accuracy: 0.7208 - val_loss: 0.5476
Epoch 2/5
625/625 ━━━━━━━━━━━━━━━━━━━━ 34s 55ms/step - accuracy: 0.7223 - loss: 0.5457 - val_accuracy: 0.6504 - val_loss: 0.6046
Epoch 3/5
625/625 ━━━━━━━━━━━━━━━━━━━━ 41s 54ms/step - accuracy: 0.7982 - loss: 0.4497 - val_accuracy: 0.6810 - val_loss: 0.6020
Epoch 4/5
625/625 ━━━━━━━━━━━━━━━━━━━━ 40s 53ms/step - accuracy: 0.7905 - loss: 0.4519 - val_accuracy: 0.6196 - val_loss: 0.6525
Epoch 5/5
625/625 ━━━━━━━━━━━━━━━━━━━━ 41s 54ms/step - accuracy: 0.6975 - loss: 0.5644 - val_accuracy: 0.6592 - val_loss: 0.6383
SimpleRNN

[0.4817153811454773, 0.8562800288200378]
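
The padding step in the cell above relies on the defaults of `pad_sequences`, which pad at the front with zeros and, for sequences longer than `maxlen`, drop tokens from the front so that the end of each review is kept. The following plain-Python sketch mimics that default behaviour for a single sequence; the function name `pad_or_truncate` and the token ids are made up for illustration, and `maxlen` is assumed to be positive.

```python
def pad_or_truncate(seq, maxlen, value=0):
    """Left-pad with `value`, or keep only the last `maxlen` items."""
    if len(seq) >= maxlen:
        return seq[-maxlen:]                      # drop tokens from the front
    return [value] * (maxlen - len(seq)) + seq    # pad at the front

reviews = [[11, 12, 13], [21, 22, 23, 24, 25, 26, 27]]  # made-up token ids
padded = [pad_or_truncate(r, maxlen=5) for r in reviews]
print(padded)  # [[0, 0, 11, 12, 13], [23, 24, 25, 26, 27]]
```

Padding at the front places the real tokens next to the final time step, whose hidden state is what the `Dense` layer reads.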


## Hyperparameter Tuning

We'll use Keras Tuner to search for good values of three hyperparameters: the number of units in the recurrent layer, the type of recurrent cell, and the learning rate.
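
Before running a search it helps to know how large the search space is. The cell that follows tunes the number of units from 32 to 512 in steps of 32, three cell types, and three learning rates. The sketch below simply enumerates those combinations in plain Python; it does not use Keras Tuner, and the variable names are assumptions.

```python
from itertools import product

units = list(range(32, 512 + 1, 32))          # mirrors hp.Int('units', 32, 512, step=32)
rnn_types = ["SimpleRNN", "LSTM", "GRU"]      # mirrors hp.Choice('rnn_type', ...)
learning_rates = [1e-2, 1e-3, 1e-4]           # mirrors hp.Choice('learning_rate', ...)

grid = list(product(units, rnn_types, learning_rates))
print(f"{len(units)} unit settings x {len(rnn_types)} cell types x "
      f"{len(learning_rates)} learning rates = {len(grid)} configurations")
```

Training every configuration for the full number of epochs would be expensive, which is the motivation for a method such as Hyperband: it samples configurations, trains them for a few epochs, and gives more epochs only to the most promising ones.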


In [None]:
!pip install keras_tuner
import keras_tuner as kt

def model_builder(hp):
    model = Sequential()
    model.add(Embedding(max_features, 128))

    # Tune the number of units in the RNN layers
    hp_units = hp.Int('units', min_value=32, max_value=512, step=32)

    # Choose between SimpleRNN, LSTM, and GRU
    hp_rnn_type = hp.Choice('rnn_type', values=['SimpleRNN', 'LSTM', 'GRU'])

    if hp_rnn_type == 'SimpleRNN':
        model.add(SimpleRNN(hp_units))
    elif hp_rnn_type == 'LSTM':
        model.add(LSTM(hp_units))
    elif hp_rnn_type == 'GRU':
        model.add(GRU(hp_units))

    model.add(Dense(1, activation='sigmoid'))

    # Tune the learning rate for the optimizer
    hp_learning_rate = hp.Choice('learning_rate', values=[1e-2, 1e-3, 1e-4])

    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=hp_learning_rate),
                  loss='binary_crossentropy',
                  metrics=['accuracy'])

    return model

tuner = kt.Hyperband(model_builder,
                     objective='val_accuracy',
                     max_epochs=10,
                     factor=3,
                     directory='my_dir',
                     project_name='intro_to_kt')

stop_early = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=5)

tuner.search(x_train, y_train, epochs=10, validation_split=0.2, callbacks=[stop_early])

# Get the optimal hyperparameters
best_hps = tuner.get_best_hyperparameters(num_trials=1)[0]

print(f"The optimal number of units in the RNN layers is {best_hps.get('units')}.")
print(f"The optimal type of RNN is {best_hps.get('rnn_type')}.")
print(f"The optimal learning rate is {best_hps.get('learning_rate')}.")

# Build the model with the optimal hyperparameters and train it
model = tuner.hypermodel.build(best_hps)
model.fit(x_train, y_train, epochs=10, validation_split=0.2)
model.evaluate(x_test, y_test)


Collecting keras_tuner
  Downloading keras_tuner-1.4.7-py3-none-any.whl.metadata (5.4 kB)
Collecting kt-legacy (from keras_tuner)
  Downloading kt_legacy-1.0.5-py3-none-any.whl.metadata (221 bytes)
Downloading keras_tuner-1.4.7-py3-none-any.whl (129 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 129.1/129.1 kB 4.4 MB/s eta 0:00:00
Downloading kt_legacy-1.0.5-py3-none-any.whl (9.6 kB)
Installing collected packages: kt-legacy, keras_tuner
Successfully installed keras_tuner-1.4.7 kt-legacy-1.0.5

Search: Running Trial #1

Value             |Best Value So Far |Hyperparameter
448               |448               |units
SimpleRNN         |SimpleRNN         |rnn_type
0.001             |0.001             |learning_rate
2                 |2                 |tuner/epochs
0                 |0                 |tuner/initial_epoch
2                 |2                 |tuner/bracket
0                 |0                 |tuner/round

Epoch 1/2
625/62

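The tuning cell passes an `EarlyStopping` callback with `monitor='val_loss'` and `patience=5`. The sketch below is a simplified model of that patience rule (it ignores options such as `min_delta` and `restore_best_weights`), applied to the validation losses from the SimpleRNN training log earlier in this notebook. The function name `early_stopping_epoch` is an assumption.

```python
def early_stopping_epoch(val_losses, patience):
    """Return the 1-based epoch at which training would stop, or None.

    Training stops once `patience` consecutive epochs fail to improve on
    the best validation loss seen so far.
    """
    best, wait = float("inf"), 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Validation losses copied from the SimpleRNN training log above.
val_losses = [0.5476, 0.6046, 0.6020, 0.6525, 0.6383]
print(early_stopping_epoch(val_losses, patience=5))  # None: never 5 epochs without improvement
print(early_stopping_epoch(val_losses, patience=2))  # 3: epochs 2 and 3 both fail to beat epoch 1
```

Under this rule a patience of 5 cannot trigger in a trial that runs for only 2 epochs, such as the first Hyperband trial in the log, so a smaller patience may be worth considering if early stopping is meant to save time during the search.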

## Conclusion

In this notebook, we've explored Recurrent Neural Networks (RNNs), including their basic architecture, variants like LSTM and GRU, implementation on text data, and hyperparameter tuning. RNNs are a versatile tool for handling sequential data and are widely used in various applications like natural language processing and time series forecasting.
