# Recurrent Neural Networks (RNNs)

Recurrent Neural Networks (RNNs) are a class of artificial neural networks designed to recognize patterns in sequential data, such as text, genomes, handwriting, or numerical time series. Unlike feedforward neural networks, which process each input independently, RNNs maintain an internal state (memory) that summarizes the inputs seen so far.

## Basic Architecture

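The defining feature of an RNN is a feedback loop: the hidden state computed at one time step is fed back into the network at the next step, so information persists across the sequence. This acts as a form of memory and lets the same network process sequences of arbitrary length.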

### Key Components

- **Input Layer**: Receives data at each time step
- **Hidden State**: Maintains memory between time steps
- **Output Layer**: Produces a prediction at each time step, or only at the final step for sequence-level tasks such as classification
- **Recurrent Connection**: Passes the hidden state from one time step to the next

### Mathematical Representation

At each time step $t$, an RNN processes an input $x_t$ and updates its hidden state $h_t$:

$h_t = \tanh(W_{xh}x_t + W_{hh}h_{t-1} + b_h)$

The output $y_t$ is then computed as:

$y_t = W_{hy}h_t + b_y$

Where:
- $W_{xh}$: Input-to-hidden weights
- $W_{hh}$: Hidden-to-hidden (recurrent) weights
- $W_{hy}$: Hidden-to-output weights
- $b_h$ and $b_y$: Bias terms
- $h_0$: Initial hidden state, usually a vector of zeros
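The two update equations can be traced step by step with a minimal pure-Python sketch that uses plain lists as vectors and matrices, so no framework is needed. The helper names (`matvec`, `vec_add`, `rnn_forward`, `random_matrix`) and the small random weights are illustrative assumptions, not part of any library.

```python
import math
import random

def matvec(W, v):
    """Multiply a matrix W (list of rows) by a vector v."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in W]

def vec_add(*vectors):
    """Element-wise sum of equal-length vectors."""
    return [sum(components) for components in zip(*vectors)]

def rnn_forward(xs, W_xh, W_hh, W_hy, b_h, b_y, h0):
    """Apply h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h) and y_t = W_hy h_t + b_y to each x_t."""
    h = h0
    hs, ys = [], []
    for x in xs:
        pre_activation = vec_add(matvec(W_xh, x), matvec(W_hh, h), b_h)
        h = [math.tanh(a) for a in pre_activation]  # new hidden state h_t
        hs.append(h)
        ys.append(vec_add(matvec(W_hy, h), b_y))    # output y_t
    return hs, ys

def random_matrix(rows, cols, scale=0.5):
    """Small random weights, for illustration only (hypothetical initialization)."""
    return [[random.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

random.seed(0)
input_size, hidden_size, output_size, seq_len = 3, 4, 2, 5
W_xh = random_matrix(hidden_size, input_size)
W_hh = random_matrix(hidden_size, hidden_size)
W_hy = random_matrix(output_size, hidden_size)
b_h, b_y = [0.0] * hidden_size, [0.0] * output_size
h0 = [0.0] * hidden_size  # zero initial hidden state
xs = [[random.uniform(-1, 1) for _ in range(input_size)] for _ in range(seq_len)]

hs, ys = rnn_forward(xs, W_xh, W_hh, W_hy, b_h, b_y, h0)
print(len(hs), len(hs[0]), len(ys[0]))  # prints: 5 4 2
```

The same three weight matrices are reused at every step of the loop, which is what "parameter sharing across time" means in practice.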

## RNN Variants

### Simple/Vanilla RNN
The basic form described above. It is prone to vanishing and exploding gradients, which makes dependencies spanning many time steps hard to learn.

### Bidirectional RNN
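Runs one RNN over the sequence from start to end and a second RNN from end to start, then combines (typically concatenates) their hidden states at each time step, so every output has context from both past and future inputs. Because the backward pass needs the whole sequence, bidirectional RNNs suit offline tasks rather than real-time streaming.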
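A minimal sketch of the bidirectional idea, assuming a scalar hidden state for brevity and two independent sets of weights; the function names and weight values are hypothetical. Frameworks provide this directly, for example the Keras `Bidirectional` wrapper (which concatenates by default) or `bidirectional=True` on PyTorch's `nn.RNN`.

```python
import math

def run_rnn(xs, w_x, w_h, b, h0=0.0):
    """Scalar-state RNN h_t = tanh(w_x * x_t + w_h * h_{t-1} + b); returns every h_t."""
    h, hs = h0, []
    for x in xs:
        h = math.tanh(w_x * x + w_h * h + b)
        hs.append(h)
    return hs

def bidirectional(xs, fwd_params, bwd_params):
    """Pair a forward pass and a backward pass (separate weights) at each time step."""
    forward_states = run_rnn(xs, *fwd_params)
    # Run the second RNN over the reversed sequence, then reverse its outputs
    # so that backward_states[t] lines up with xs[t].
    backward_states = run_rnn(xs[::-1], *bwd_params)[::-1]
    # "Concatenate" the two states: position t now summarizes xs[:t+1] and xs[t:].
    return [(f, bk) for f, bk in zip(forward_states, backward_states)]

xs = [0.5, -1.0, 2.0, 0.0]
states = bidirectional(xs, fwd_params=(0.8, 0.5, 0.0), bwd_params=(1.2, -0.3, 0.1))
for t, (f, bk) in enumerate(states):
    print(t, round(f, 3), round(bk, 3))
```

Note that the backward state at position 0 depends on every input, which is why the full sequence must be available before any output can be produced.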

### Deep (Stacked) RNN
Multiple recurrent layers stacked on top of each other: the sequence of hidden states produced by one layer is the input sequence of the next, allowing higher layers to learn more complex, abstract patterns.
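Stacking can be sketched as repeatedly feeding one layer's full output sequence into the next. This assumes scalar hidden states and hypothetical weights; it is the same reason lower layers in Keras set `return_sequences=True`, and PyTorch's `nn.RNN` exposes it through `num_layers`.

```python
import math

def rnn_layer(xs, w_x, w_h, b):
    """One scalar-state RNN layer; returns the full sequence of hidden states."""
    h, hs = 0.0, []
    for x in xs:
        h = math.tanh(w_x * x + w_h * h + b)
        hs.append(h)
    return hs

def stacked_rnn(xs, layer_params):
    """Feed the hidden-state sequence of each layer into the next layer."""
    seq = xs
    for params in layer_params:
        seq = rnn_layer(seq, *params)  # one output per time step, same length as input
    return seq

xs = [1.0, 0.5, -0.5, 2.0]
top = stacked_rnn(xs, [(0.9, 0.4, 0.0), (1.1, 0.2, 0.0), (0.7, 0.6, 0.1)])
print(len(top))  # prints: 4
```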

### Gated RNNs
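- **LSTM (Long Short-Term Memory)**: Adds a separate cell state and three gates (input, forget, and output) that control what is written to, kept in, and read from the cell, which mitigates vanishing gradients
- **GRU (Gated Recurrent Unit)**: A simplified alternative to the LSTM that merges the cell and hidden states and uses two gates (update and reset), giving fewer parameters for the same hidden size

Gated variants mitigate the main limitation of vanilla RNNs and are far more commonly used in practice.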
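The "fewer parameters" claim can be checked with back-of-the-envelope arithmetic. Assume each block (a gate or the candidate state) has its own input weights, recurrent weights, and a single bias vector: a vanilla RNN has one such block, a GRU three (update gate, reset gate, candidate state), and an LSTM four (input, forget, and output gates plus the candidate cell). The helper name is hypothetical.

```python
def recurrent_params(input_size, hidden_size, num_blocks):
    """Per block: W_x (hidden x input) + W_h (hidden x hidden) + one bias (hidden)."""
    return num_blocks * (hidden_size * input_size + hidden_size * hidden_size + hidden_size)

input_size, hidden_size = 10, 64
for name, blocks in [("vanilla RNN", 1), ("GRU", 3), ("LSTM", 4)]:
    print(name, recurrent_params(input_size, hidden_size, blocks))
# prints:
# vanilla RNN 4800
# GRU 14400
# LSTM 19200
```

Frameworks that keep two bias vectors per block (PyTorch, or Keras `GRU` with its default `reset_after=True`) report slightly larger counts, but the 1 : 3 : 4 ratio is the same.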

## Training RNNs

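RNNs are typically trained using Backpropagation Through Time (BPTT): the recurrent network is unrolled across the time steps into a feedforward computation graph, standard backpropagation is applied to that graph, and because the same weights are reused at every step, their gradients are summed over all time steps.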
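A minimal BPTT sketch for a scalar RNN without bias or output layer, assuming a squared-error loss on the final state only; these simplifications and all names are illustrative. It walks the unrolled graph backwards, sums the gradient contributions of the shared weights, and checks the result against finite differences.

```python
import math

def forward(w, u, xs, h0=0.0):
    """Scalar RNN h_t = tanh(w * h_{t-1} + u * x_t); returns [h_0, h_1, ..., h_T]."""
    hs = [h0]
    for x in xs:
        hs.append(math.tanh(w * hs[-1] + u * x))
    return hs

def loss_and_grads(w, u, xs, target):
    """Loss 0.5 * (h_T - target)^2 and its gradients w.r.t. the shared weights, via BPTT."""
    hs = forward(w, u, xs)
    loss = 0.5 * (hs[-1] - target) ** 2
    dh = hs[-1] - target  # dL/dh_T
    dw = du = 0.0
    # Walk the unrolled graph from the last step back to the first. The same w and u
    # are used at every step, so their per-step gradient contributions are summed.
    for t in range(len(xs), 0, -1):
        da = dh * (1.0 - hs[t] ** 2)  # back through tanh: tanh'(a) = 1 - tanh(a)^2
        dw += da * hs[t - 1]
        du += da * xs[t - 1]
        dh = da * w                   # gradient w.r.t. the previous state h_{t-1}
    return loss, dw, du

xs, target = [0.5, -1.0, 0.3, 0.8], 0.2
w, u = 0.7, 0.4
loss, dw, du = loss_and_grads(w, u, xs, target)

# Check the BPTT gradients against central finite differences.
eps = 1e-6
num_dw = (loss_and_grads(w + eps, u, xs, target)[0]
          - loss_and_grads(w - eps, u, xs, target)[0]) / (2 * eps)
num_du = (loss_and_grads(w, u + eps, xs, target)[0]
          - loss_and_grads(w, u - eps, xs, target)[0]) / (2 * eps)
print(abs(dw - num_dw) < 1e-6, abs(du - num_du) < 1e-6)  # prints: True True
```

The line `dh = da * w` is where the repeated multiplication behind vanishing and exploding gradients comes from: every step back in time multiplies the gradient by another factor.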

### Challenges in Training

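1. **Vanishing Gradients**: The gradient flowing back from step $t$ to an earlier step is a product of one Jacobian $\partial h_k / \partial h_{k-1}$ per intervening step. When these factors are mostly smaller than 1 in magnitude, the product shrinks exponentially, preventing learning of long-term dependencies.
2. **Exploding Gradients**: Conversely, when the factors are mostly larger than 1, gradients grow exponentially, causing unstable training.
3. **Long Training Times**: Each step depends on the previous hidden state, so computation cannot be parallelized across time steps, which makes training on long sequences slow.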
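The exponential behaviour can be seen in a scalar RNN with zero input and bias, a deliberately simplified assumption; in the vector case the recurrent matrix $W_{hh}$ and the tanh derivatives play the role of `w`. The function name is hypothetical.

```python
import math

def state_gradient(w, T, h0=0.0):
    """d h_T / d h_0 for the scalar RNN h_t = tanh(w * h_{t-1}) with zero input and bias."""
    h, grad = h0, 1.0
    for _ in range(T):
        h = math.tanh(w * h)
        grad *= w * (1.0 - h ** 2)  # chain rule: multiply by d h_t / d h_{t-1}
    return grad

for w in (0.5, 1.0, 1.5):
    print(w, state_gradient(w, T=50))
```

Starting from `h0 = 0`, the state stays at 0 where tanh has slope 1, so the result is exactly `w ** 50`: roughly 8.9e-16 for `w = 0.5` (vanishing), 1.0 for `w = 1.0`, and about 6.4e8 for `w = 1.5` (exploding).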

### Techniques to Address Challenges

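- **Gradient Clipping**: Limits exploding gradients by rescaling the gradient whenever its norm exceeds a threshold.
- **Truncated BPTT**: Backpropagates through only a fixed number of recent time steps, reducing memory and compute at the cost of not propagating gradients over longer spans.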
- **Gated Architectures**: LSTMs and GRUs help mitigate vanishing gradients.
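Gradient clipping by global norm can be sketched in a few lines, assuming the gradients have been flattened into one list of floats; the helper name is hypothetical. In practice PyTorch provides `torch.nn.utils.clip_grad_norm_`, and Keras optimizers accept a `global_clipnorm` argument.

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Scale all gradient entries by one common factor so their joint L2 norm is <= max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm:
        return list(grads), total_norm  # small gradients pass through unchanged
    scale = max_norm / total_norm       # same factor for every entry keeps the direction
    return [g * scale for g in grads], total_norm

clipped, norm = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
print(norm, clipped)  # norm is 5.0; clipped is approximately [0.6, 0.8]
```

Scaling every entry by the same factor preserves the gradient's direction and only shortens the step, which is why clipping controls exploding gradients without changing where the update points.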

## Applications

RNNs have been applied successfully to many sequence-based tasks, including:

1. **Natural Language Processing**:
   - Language modeling
   - Text generation
   - Machine translation
   - Sentiment analysis

2. **Speech Recognition**:
   - Converting speech to text
   - Speaker identification

3. **Time Series Analysis**:
   - Stock price prediction
   - Weather forecasting
   - Sensor data analysis

4. **Music Generation**:
   - Composing melodies and harmonies
   - Predicting musical sequences

5. **Anomaly Detection**:
   - Identifying unusual patterns in sequential data

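```python
# Simple RNN implementation in TensorFlow/Keras
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, SimpleRNN, Dense

# Define a basic two-layer (stacked) RNN classifier
def create_rnn_model(input_shape, output_size):
    """input_shape is (sequence_length, features); sequence_length may be None."""
    model = Sequential([
        Input(shape=input_shape),
        # return_sequences=True passes the full hidden-state sequence to the next RNN layer
        SimpleRNN(64, return_sequences=True),
        # The last RNN layer returns only its final hidden state
        SimpleRNN(32),
        Dense(output_size, activation='softmax')
    ])

    # categorical_crossentropy expects one-hot labels;
    # use sparse_categorical_crossentropy for integer class labels
    model.compile(optimizer='adam',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model

# Example usage
# model = create_rnn_model((sequence_length, features), num_classes)
```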

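```python
# RNN implementation in PyTorch
import torch
import torch.nn as nn

# Named SimpleRNNModel so it does not shadow the Keras SimpleRNN layer imported above
class SimpleRNNModel(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.hidden_size = hidden_size
        # batch_first=True means inputs are (batch, seq_len, features)
        self.rnn = nn.RNN(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        # x shape: (batch_size, sequence_length, input_size)
        batch_size = x.size(0)

        # Initial hidden state: (num_layers * num_directions, batch_size, hidden_size)
        h0 = torch.zeros(1, batch_size, self.hidden_size, device=x.device, dtype=x.dtype)

        # out holds the hidden state at every time step: (batch_size, sequence_length, hidden_size)
        out, _ = self.rnn(x, h0)

        # Classify from the last time step; returns raw logits,
        # so pair it with nn.CrossEntropyLoss (which applies log-softmax internally)
        out = self.fc(out[:, -1, :])
        return out

# Example usage
model = SimpleRNNModel(input_size=10, hidden_size=64, output_size=5)
x = torch.randn(8, 20, 10)  # 8 sequences, 20 time steps, 10 features each
logits = model(x)           # shape: (8, 5)
```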

## Advantages and Limitations

### Advantages
- Natural fit for sequential data processing
- Ability to handle variable-length sequences
- Weights are shared across time steps, so the number of parameters does not grow with sequence length and patterns learned at one position carry over to others
- Can be extended to bidirectional processing to capture context from both directions

### Limitations
- Difficulty learning long-term dependencies (especially vanilla RNNs)
- Computationally expensive for very long sequences, since cost grows with length and BPTT must store every hidden state
- Sequential nature limits parallelization across time steps during training
- Outperformed by Transformer-based architectures on many sequence tasks

## Historical Context

- **1980s**: Early forms of RNNs proposed by John Hopfield and others
- **1990**: Backpropagation Through Time (BPTT) formalized by Paul Werbos
- **1997**: LSTM architecture introduced by Hochreiter and Schmidhuber to address vanishing gradients
- **2000s**: RNNs gained prominence in speech recognition and NLP
- **2014**: GRU architecture proposed by Cho et al. as a simplified alternative to LSTMs
- **2015-2017**: Peak of RNN usage in sequence modeling tasks
- **2017-Present**: Transformers have replaced RNNs in many applications, though RNNs remain in use for some tasks, for example streaming or resource-constrained settings where their fixed-size state is convenient

## References

- Elman, J. L. (1990). [Finding structure in time](https://onlinelibrary.wiley.com/doi/abs/10.1207/s15516709cog1402_1). Cognitive Science, 14(2), 179–211.
- Hochreiter, S., & Schmidhuber, J. (1997). [Long short-term memory](https://www.bioinf.jku.at/publications/older/2604.pdf). Neural Computation, 9(8), 1735–1780.
- Cho, K., et al. (2014). [Learning phrase representations using RNN encoder-decoder for statistical machine translation](https://arxiv.org/abs/1406.1078). EMNLP.
- Graves, A. (2012). [Supervised Sequence Labelling with Recurrent Neural Networks](https://www.cs.toronto.edu/~graves/preprint.pdf). Studies in Computational Intelligence.
- Goodfellow, I., Bengio, Y., & Courville, A. (2016). [Deep Learning](https://www.deeplearningbook.org/). MIT Press. Chapter 10.
- Karpathy, A. (2015). [The Unreasonable Effectiveness of Recurrent Neural Networks](http://karpathy.github.io/2015/05/21/rnn-effectiveness/). Blog post.