
# Gated Recurrent Units (GRUs) with Expanded Details

This notebook introduces Gated Recurrent Units (GRUs): their architecture, the equations behind them, a Keras implementation on the IMDB movie review dataset, and hyperparameter tuning with Keras Tuner.



## Background

Gated Recurrent Units (GRUs) are a type of recurrent neural network (RNN) architecture designed to mitigate the vanishing gradient problem that affects standard RNNs. GRUs are a simpler alternative to Long Short-Term Memory (LSTM) networks: they use fewer weight matrices per cell, so they have fewer parameters and are often faster to train.

### Key Features of GRUs
- **Gating Mechanisms**: GRUs use update and reset gates to control the flow of information.
- **Simplified Architecture**: Compared to LSTMs, GRUs have fewer parameters for the same hidden size, which usually makes them cheaper to train and run.
- **Applications**: GRUs are used in tasks like language modeling, speech recognition, and time series prediction.
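
To make the parameter saving concrete, the sketch below counts the weights in one recurrent layer. It assumes the single-bias formulation used in the equations later in this notebook: a GRU has three weight blocks (reset, update, candidate) and an LSTM has four. The helper `recurrent_params` is a hypothetical function written for illustration, and the sizes match the model built later (128-dimensional embeddings, 128 hidden units).

```python
def recurrent_params(input_dim: int, hidden_dim: int, num_blocks: int) -> int:
    """Count weights for `num_blocks` blocks, each with W_x, W_h and one bias."""
    per_block = input_dim * hidden_dim + hidden_dim * hidden_dim + hidden_dim
    return num_blocks * per_block

# Assumed sizes: 128-dimensional inputs and 128 hidden units
input_dim, hidden_dim = 128, 128

gru_params = recurrent_params(input_dim, hidden_dim, num_blocks=3)   # reset, update, candidate
lstm_params = recurrent_params(input_dim, hidden_dim, num_blocks=4)  # input, forget, output, candidate

print(f"GRU:  {gru_params:,}")
print(f"LSTM: {lstm_params:,}")
print(f"GRU / LSTM ratio: {gru_params / lstm_params:.2f}")
```

Under these assumptions the GRU needs three quarters of the LSTM's recurrent parameters. Framework implementations may use a slightly different bias layout, so the counts reported by `model.summary()` can differ a little from this estimate.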

### GRU Cell
A GRU cell has two gates:
- **Update Gate**: Controls how much of the previous hidden state is carried forward and how much is replaced by new information.
- **Reset Gate**: Controls how much of the previous hidden state is used when computing the new candidate state.

Unlike an LSTM, which maintains a separate hidden state and cell state, a GRU keeps a single hidden state vector, which simplifies the model and reduces computation.



## Mathematical Foundation

### The GRU Cell

A GRU cell updates its hidden state \( h_t \) based on the previous hidden state \( h_{t-1} \) and the current input \( x_t \):

1. **Reset Gate** \( r_t \):

\[
r_t = \sigma(W_{xr}x_t + W_{hr}h_{t-1} + b_r)
\]

2. **Update Gate** \( z_t \):

\[
z_t = \sigma(W_{xz}x_t + W_{hz}h_{t-1} + b_z)
\]

3. **Candidate Hidden State** \( \tilde{h}_t \):

\[
\tilde{h}_t = \tanh(W_{xh}x_t + r_t \ast (W_{hh}h_{t-1}) + b_h)
\]

4. **Final Hidden State** \( h_t \):

\[
h_t = (1 - z_t) \ast h_{t-1} + z_t \ast \tilde{h}_t
\]

Where:
- \( \sigma \) is the sigmoid activation function.
- \( \ast \) denotes element-wise multiplication.
- \( W_{x\cdot} \) and \( W_{h\cdot} \) are the input and recurrent weight matrices, and \( b \) denotes the bias vectors.
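
The four equations above can be sketched directly in NumPy. This is a minimal illustration, not how Keras implements its GRU layer: the function `gru_step`, the parameter dictionary and its key names, the random initialization, and the toy dimensions are all assumptions made for this example.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, p):
    """One GRU step following the four equations above.

    `p` is a dict of weights; its key names are illustrative, not a library API.
    """
    r_t = sigmoid(p["W_xr"] @ x_t + p["W_hr"] @ h_prev + p["b_r"])  # reset gate
    z_t = sigmoid(p["W_xz"] @ x_t + p["W_hz"] @ h_prev + p["b_z"])  # update gate
    # Candidate state: the reset gate scales the recurrent contribution
    h_tilde = np.tanh(p["W_xh"] @ x_t + r_t * (p["W_hh"] @ h_prev) + p["b_h"])
    # Final state: interpolate between the old state and the candidate
    return (1.0 - z_t) * h_prev + z_t * h_tilde

rng = np.random.default_rng(0)
input_dim, hidden_dim = 4, 3  # toy sizes chosen for illustration

params = {}
for gate in ("r", "z", "h"):
    params[f"W_x{gate}"] = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
    params[f"W_h{gate}"] = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
    params[f"b_{gate}"] = np.zeros(hidden_dim)

# Run the cell over a random sequence of 5 time steps
h = np.zeros(hidden_dim)
inputs = rng.normal(size=(5, input_dim))
for x_t in inputs:
    h = gru_step(x_t, h, params)

print(h.shape)
```

The last line of `gru_step` shows the role of the update gate: where \( z_t \) is close to 0 the previous state is copied through almost unchanged, and where it is close to 1 the state is replaced by the candidate.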



## Implementation in Python

We'll train a GRU-based sentiment classifier with TensorFlow and Keras on the IMDB movie review dataset.


In [None]:

import numpy as np
import tensorflow as tf
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing import sequence
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, GRU, Dense

# Dataset and training settings
max_features = 10000  # Vocabulary size: keep only the 10,000 most frequent words
maxlen = 500  # Pad or truncate every review to this many tokens
batch_size = 32

# Load the IMDB dataset as lists of integer word indices
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)

# Pad sequences to ensure uniform input length
# (by default, padding and truncation both happen at the start of each sequence)
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)

# Define the GRU model
model = Sequential()
model.add(Embedding(max_features, 128))
model.add(GRU(128))
model.add(Dense(1, activation='sigmoid'))

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Train the GRU model
model.fit(x_train, y_train, epochs=5, batch_size=batch_size, validation_split=0.2)

# Evaluate the model on the held-out test set
test_loss, test_acc = model.evaluate(x_test, y_test)
print(f"GRU test loss: {test_loss:.4f}, test accuracy: {test_acc:.4f}")
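
The `pad_sequences` calls in the cell above rely on the Keras defaults, which pad with zeros and truncate at the start of each sequence. The pure-Python sketch below mimics that behavior on two toy reviews. `pad_pre` is a hypothetical helper written for illustration and is not part of Keras.

```python
def pad_pre(seq, maxlen, value=0):
    """Left-pad or left-truncate a list of token ids to exactly `maxlen` items.

    Assumes maxlen >= 1.
    """
    seq = list(seq)[-maxlen:]                    # keep only the last `maxlen` tokens
    return [value] * (maxlen - len(seq)) + seq   # fill the front with `value`

short_review = [11, 22, 33]
long_review = [1, 2, 3, 4, 5, 6, 7]

print(pad_pre(short_review, maxlen=5))  # [0, 0, 11, 22, 33]
print(pad_pre(long_review, maxlen=5))   # [3, 4, 5, 6, 7]
```

With `maxlen = 500`, a review longer than 500 tokens therefore keeps its last 500 tokens, and a shorter review is padded with zeros at the front.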



## Hyperparameter Tuning

We'll use Keras Tuner to search for good values of two hyperparameters: the number of units in the GRU layer and the learning rate of the optimizer.


In [None]:
!pip install keras_tuner
import keras_tuner as kt

def model_builder(hp):
    model = Sequential()
    model.add(Embedding(max_features, 128))
    
    # Tune the number of units in the GRU layer
    hp_units = hp.Int('units', min_value=32, max_value=512, step=32)
    
    model.add(GRU(hp_units))
    model.add(Dense(1, activation='sigmoid'))
    
    # Tune the learning rate for the optimizer
    hp_learning_rate = hp.Choice('learning_rate', values=[1e-2, 1e-3, 1e-4])
    
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=hp_learning_rate),
                  loss='binary_crossentropy',
                  metrics=['accuracy'])
    
    return model

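# Hyperband trains many configurations for a few epochs and keeps only the most promising ones
tuner = kt.Hyperband(model_builder,
                     objective='val_accuracy',
                     max_epochs=10,
                     factor=3,
                     directory='my_dir',
                     project_name='gru_tuning')

# Stop a trial early if validation loss has not improved for 5 consecutive epochs
stop_early = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=5)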

tuner.search(x_train, y_train, epochs=10, validation_split=0.2, callbacks=[stop_early])

# Get the optimal hyperparameters
best_hps = tuner.get_best_hyperparameters(num_trials=1)[0]

print(f"The optimal number of units in the GRU layer is {best_hps.get('units')}.")
print(f"The optimal learning rate is {best_hps.get('learning_rate')}.")

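# Rebuild the model with the best hyperparameters, retrain it, and evaluate on the test set
model = tuner.hypermodel.build(best_hps)
model.fit(x_train, y_train, epochs=10, batch_size=batch_size, validation_split=0.2)
test_loss, test_acc = model.evaluate(x_test, y_test)
print(f"Tuned GRU test loss: {test_loss:.4f}, test accuracy: {test_acc:.4f}")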
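
To get a sense of the search space defined in `model_builder`, the sketch below enumerates every combination of the two hyperparameters. It assumes the bounds of `hp.Int` are inclusive, and the variable names are chosen for illustration only.

```python
# Values covered by hp.Int('units', min_value=32, max_value=512, step=32)
unit_choices = list(range(32, 512 + 1, 32))
# Values covered by hp.Choice('learning_rate', ...)
learning_rates = [1e-2, 1e-3, 1e-4]

# Every (units, learning_rate) pair an exhaustive grid search would train
grid = [(units, lr) for units in unit_choices for lr in learning_rates]

print(len(unit_choices), len(learning_rates), len(grid))  # 16 3 48
```

Training all 48 configurations for the full 10 epochs would be expensive, which is the motivation for a budgeted strategy such as Hyperband that spends only a few epochs on most candidates.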



## Conclusion

In this notebook, we've explored Gated Recurrent Units (GRUs): their architecture and equations, a Keras implementation on IMDB review text, and hyperparameter tuning with Keras Tuner. GRUs are a practical choice for sequential data, particularly when a smaller, faster-to-train model than an LSTM is needed.
