<a href="https://colab.research.google.com/github/gnoejh/ict1022/blob/main/Architectures/gru.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Gated Recurrent Unit (GRU)

## Overview

The Gated Recurrent Unit (GRU) is a gated recurrent neural network cell, introduced in 2014 by Cho et al. as a simpler alternative to the LSTM. GRUs mitigate the vanishing gradient problem of standard recurrent neural networks while using fewer parameters, and typically less computation, than LSTMs.

## Historical Context

- **Introduced**: 2014 by Kyunghyun Cho and colleagues in their paper "Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation"
- **Motivation**: To create a more computationally efficient alternative to LSTMs without sacrificing too much performance
- **Impact**: Quickly became popular for sequence modeling tasks including machine translation and speech processing

## Architecture

GRU uses two gates to control the flow of information:

1. **Reset Gate**: Determines how much of the previous hidden state is used when computing the candidate state
2. **Update Gate**: Controls how much of the candidate state replaces the previous hidden state (a GRU has no separate cell state)

Unlike the LSTM, which has three gates (input, output, and forget) and keeps a cell state separate from the hidden state, the GRU merges these mechanisms into two gates and a single hidden state.
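
As a minimal scalar sketch of what the two gates do, the snippet below treats the state as a single number. The helper names `candidate_input` and `blend` are hypothetical, chosen only for illustration, and the update gate follows the convention used in this document, where a gate value of 1 selects the candidate.

```python
def candidate_input(h_prev, r):
    # Reset gate r in [0, 1] scales how much of the previous state
    # is visible when the candidate state is computed
    return r * h_prev


def blend(h_prev, h_cand, z):
    # Update gate z in [0, 1] interpolates between the old state and the candidate
    return (1.0 - z) * h_prev + z * h_cand


h_prev, h_cand = 2.0, -1.0
print(blend(h_prev, h_cand, 0.0))    # 2.0  (gate closed: keep the old state)
print(blend(h_prev, h_cand, 1.0))    # -1.0 (gate open: take the candidate)
print(blend(h_prev, h_cand, 0.25))   # 1.25 (partial update)
print(candidate_input(h_prev, 0.0))  # 0.0  (reset: previous state hidden from the candidate)
```

In a real GRU both gates are vectors computed from the input and the previous state, so each hidden unit gets its own gate value at every time step.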

![GRU Architecture](https://www.researchgate.net/profile/Amir-Gandomi/publication/335969095/figure/fig3/AS:804158287929344@1568800566656/The-architecture-of-GRU-cell-At-time-step-t-the-GRU-cell-takes-xt-and-h-t-1-as-input.png)

*Note: The above image is a reference - if used in actual implementation, ensure proper attribution.*

## Mathematical Formulation

For a given time step $t$, a GRU computes the following:

**Update Gate**:
$$z_t = \sigma(W_z \cdot [h_{t-1}, x_t] + b_z)$$

**Reset Gate**:
$$r_t = \sigma(W_r \cdot [h_{t-1}, x_t] + b_r)$$

**Candidate Hidden State**:
$$\tilde{h}_t = \tanh(W \cdot [r_t * h_{t-1}, x_t] + b)$$

**Final Hidden State**:
$$h_t = (1 - z_t) * h_{t-1} + z_t * \tilde{h}_t$$

Where:
- $x_t$ is the input at time step $t$
- $h_{t-1}$ is the previous hidden state
- $z_t$ is the update gate vector
- $r_t$ is the reset gate vector
- $\tilde{h}_t$ is the candidate hidden state
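- $\sigma$ is the sigmoid activation function and $\tanh$ is the hyperbolic tangent
- $W_z$, $W_r$, $W$ are learned weight matrices and $b_z$, $b_r$, $b$ are learned bias vectors
- $[\cdot, \cdot]$ denotes vector concatenation
- $*$ denotes element-wise multiplication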
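
The four equations can be sketched as a single time step in NumPy. This is an illustrative sketch, not a library implementation: the function name `gru_step`, the parameter tuple layout, the small random initialization, and the dimensions are all assumptions made for the example.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def gru_step(x_t, h_prev, params):
    """One GRU time step for a single example, following the equations above."""
    W_z, b_z, W_r, b_r, W, b = params
    concat = np.concatenate([h_prev, x_t])               # [h_{t-1}, x_t]
    z_t = sigmoid(W_z @ concat + b_z)                    # update gate
    r_t = sigmoid(W_r @ concat + b_r)                    # reset gate
    concat_reset = np.concatenate([r_t * h_prev, x_t])   # [r_t * h_{t-1}, x_t]
    h_tilde = np.tanh(W @ concat_reset + b)              # candidate hidden state
    return (1.0 - z_t) * h_prev + z_t * h_tilde          # final hidden state


# Illustrative sizes and initialization (assumptions for this sketch)
rng = np.random.default_rng(0)
input_dim, hidden_dim = 3, 4


def random_block():
    # One weight matrix over [h_{t-1}, x_t] and one bias vector
    weight = 0.1 * rng.normal(size=(hidden_dim, hidden_dim + input_dim))
    return weight, np.zeros(hidden_dim)


W_z, b_z = random_block()
W_r, b_r = random_block()
W, b = random_block()
params = (W_z, b_z, W_r, b_r, W, b)

# Run a length-5 sequence through the cell, starting from a zero state
h = np.zeros(hidden_dim)
for x_t in rng.normal(size=(5, input_dim)):
    h = gru_step(x_t, h, params)

print(h.shape)  # (4,)
```

Because each step is a convex combination of the previous state and a `tanh` output, every component of `h` stays within [-1, 1] when the initial state does.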

## Comparison with LSTM

| Feature | GRU | LSTM |
|---------|-----|------|
| Number of gates | 2 (reset and update) | 3 (input, output, forget) |
| State representation | Single hidden state | Separate cell state and hidden state |
| Parameter count | Fewer (three weight blocks per layer) | More (four weight blocks per layer) |
| Computational efficiency | Generally faster | More computationally intensive |
| Performance on long sequences | Good | Slightly better in some cases |
| Memory requirements | Lower | Higher |

In practice, GRUs often achieve comparable performance to LSTMs with lower computational cost, making them preferable for applications with limited computational resources or when training on large datasets.
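
To put a number on the parameter difference, here is a small counting sketch. It assumes the single-bias formulation from the equations in this document, with one weight matrix over $[h_{t-1}, x_t]$ and one bias vector per block. The function names are hypothetical. Library implementations may count slightly differently; PyTorch, for instance, keeps two bias vectors per block.

```python
def block_params(input_dim, hidden_dim):
    # One weight matrix acting on [h_{t-1}, x_t] plus one bias vector
    return hidden_dim * (hidden_dim + input_dim) + hidden_dim


def gru_param_count(input_dim, hidden_dim):
    # Update gate, reset gate, candidate state
    return 3 * block_params(input_dim, hidden_dim)


def lstm_param_count(input_dim, hidden_dim):
    # Input gate, forget gate, output gate, candidate cell state
    return 4 * block_params(input_dim, hidden_dim)


gru = gru_param_count(128, 256)
lstm = lstm_param_count(128, 256)
print(gru, lstm, gru / lstm)  # 295680 394240 0.75
```

Under these assumptions a single GRU layer has exactly three quarters of the parameters of an LSTM layer with the same input and hidden sizes.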

## Implementation Example

### PyTorch Implementation

In [None]:
import torch
import torch.nn as nn

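class GRUModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim, n_layers, drop_prob=0.2):
        super().__init__()

        # Stored because init_hidden needs them to size the initial state
        self.hidden_dim = hidden_dim
        self.n_layers = n_layers

        # nn.GRU applies dropout only between stacked layers,
        # so it is disabled when there is a single layer
        self.gru = nn.GRU(input_dim, hidden_dim, n_layers, batch_first=True,
                          dropout=drop_prob if n_layers > 1 else 0.0)
        self.fc = nn.Linear(hidden_dim, output_dim)
        self.relu = nn.ReLU()

    def forward(self, x, h):
        # x: (batch, seq_len, input_dim); h: (n_layers, batch, hidden_dim)
        out, h = self.gru(x, h)
        # Top-layer output at the last time step, i.e. its final hidden state
        out = self.fc(self.relu(out[:, -1]))

        return out, h

    def init_hidden(self, batch_size):
        # Zero initial state on the same device and dtype as the model parameters
        param = next(self.parameters())
        return torch.zeros(self.n_layers, batch_size, self.hidden_dim,
                           device=param.device, dtype=param.dtype)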
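
The tensor shapes involved are easy to mix up, so the following sketch calls `nn.GRU` directly and prints them. The sizes are arbitrary values chosen for illustration. It also checks that, for a unidirectional GRU with `batch_first=True`, the output at the last time step matches the final hidden state of the top layer, which is why indexing `out[:, -1]` is a common way to summarize the sequence.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Arbitrary illustrative sizes
batch_size, seq_len, input_dim, hidden_dim, n_layers = 2, 5, 3, 4, 2

gru = nn.GRU(input_dim, hidden_dim, n_layers, batch_first=True)
x = torch.randn(batch_size, seq_len, input_dim)
h0 = torch.zeros(n_layers, batch_size, hidden_dim)

out, h_n = gru(x, h0)

print(out.shape)  # torch.Size([2, 5, 4]): top-layer output at every time step
print(h_n.shape)  # torch.Size([2, 2, 4]): final hidden state of every layer

# Last time step of the top layer equals that layer's final hidden state
same = torch.allclose(out[:, -1], h_n[-1])
print(same)  # True
```

Note that `out` is batch-first while `h_n` always has the layer dimension first, regardless of `batch_first`.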

### TensorFlow/Keras Implementation

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import GRU, Dense, Dropout

def create_gru_model(input_dim, hidden_dim, output_dim, n_layers, drop_prob=0.2):
    model = Sequential()
    
    # Add GRU layers
    for i in range(n_layers):
        if i == 0:
            model.add(GRU(hidden_dim, return_sequences=(i < n_layers-1), input_shape=(None, input_dim)))
        else:
            model.add(GRU(hidden_dim, return_sequences=(i < n_layers-1)))
        
        model.add(Dropout(drop_prob))
    
    # Output layer: softmax yields class probabilities (assumes a classification task)
    model.add(Dense(output_dim, activation='softmax'))
    
    return model

## Applications

GRUs have been successfully applied in various domains:

1. **Natural Language Processing**
   - Machine translation
   - Text generation
   - Speech recognition
   - Sentiment analysis

2. **Time Series Analysis**
   - Stock price prediction
   - Weather forecasting
   - Anomaly detection

3. **Bioinformatics**
   - Protein structure prediction
   - Gene expression analysis

4. **Robotics**
   - Movement prediction
   - Reinforcement learning agents

## Advantages and Limitations

### Advantages
- More computationally efficient than LSTMs
- Fewer parameters to train
- Effectively captures medium-range dependencies
- Fewer parameters can make them less prone to overfitting on small datasets

### Limitations
- May not capture long-term dependencies as effectively as LSTMs in some cases
- Still vulnerable to vanishing gradients over very long sequences
- Limited context integration compared to attention-based models like Transformers
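
To make the vanishing-gradient limitation concrete, here is a rough sketch based on the final hidden state equation in this document. It keeps only the direct path from $h_{t-1}$ to $h_t$ and ignores the dependence of $z_t$ and $\tilde{h}_t$ on $h_{t-1}$, which is a simplifying assumption rather than a full analysis:

$$\frac{\partial h_t}{\partial h_{t-1}} \approx \operatorname{diag}(1 - z_t) \quad\Rightarrow\quad \frac{\partial h_T}{\partial h_0} \approx \prod_{t=1}^{T} \operatorname{diag}(1 - z_t)$$

If, purely for illustration, every component of $1 - z_t$ were 0.9, then after $T = 100$ steps this factor would be $0.9^{100} \approx 2.7 \times 10^{-5}$. The network can learn to keep $z_t$ close to 0 for units that need long memory, which is how gating mitigates the problem without removing it.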

## References

- Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., & Bengio, Y. (2014). [Learning phrase representations using RNN encoder-decoder for statistical machine translation](https://arxiv.org/abs/1406.1078). EMNLP.
- Chung, J., Gulcehre, C., Cho, K., & Bengio, Y. (2014). [Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling](https://arxiv.org/abs/1412.3555). NIPS Deep Learning Workshop.
- Jozefowicz, R., Zaremba, W., & Sutskever, I. (2015). [An Empirical Exploration of Recurrent Network Architectures](http://proceedings.mlr.press/v37/jozefowicz15.pdf). ICML.