# Recurrent Neural Networks (RNNs)

## Problem Type
**Recurrent Neural Networks (RNNs)** are primarily used for:
- **Sequential Data Processing** (e.g., time series, text)
- Mostly **supervised** learning (language modeling is usually framed as self-supervised next-word prediction)
- **Applications**: Language Modeling, Speech Recognition, Machine Translation, Time Series Prediction, and more.

### How RNNs Work
- **Sequential processing:**
  - RNNs process input sequences one element at a time, maintaining a hidden state that carries information about the previous elements.
- **Hidden state:**
  - The hidden state is updated at each time step from the current input and the previous hidden state, which lets the model capture temporal dependencies. For a vanilla RNN: $h_t = \tanh(W_{xh} x_t + W_{hh} h_{t-1} + b_h)$.
- **Shared parameters:**
  - The same weights are reused at every time step, so the number of parameters does not depend on sequence length and a pattern learned at one position can be recognized at any other position.
- **Backpropagation Through Time (BPTT):**
  - RNNs are trained using BPTT, which extends traditional backpropagation to sequences by unrolling the network in time and computing gradients through the entire sequence.
- **Vanishing/exploding gradients:**
  - BPTT multiplies one Jacobian factor per time step, so on long sequences gradients can shrink toward zero or grow without bound, making long-term dependencies difficult to learn.
- **Variants:**
  - LSTM (Long Short-Term Memory) and GRU (Gated Recurrent Unit) are popular variants that mitigate the vanishing gradient problem by using gates to control the flow of information.
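The recurrence described above can be sketched in a few lines of NumPy. This is a minimal illustration, not Keras's implementation: the function name `rnn_forward`, the random weights and the sizes are made up for the example. It uses the row-vector convention (`x_t @ W_xh`), the same layout Keras uses for `SimpleRNN` kernels, and the same three weight arrays are reused at every time step.

```python
import numpy as np


def rnn_forward(x_seq, W_xh, W_hh, b_h, h0=None):
    """Run a vanilla (Elman) RNN over one sequence.

    x_seq: array of shape (T, input_dim), one row per time step.
    Returns every hidden state, shape (T, hidden_dim).
    """
    hidden_dim = W_hh.shape[0]
    h = np.zeros(hidden_dim) if h0 is None else h0  # initial hidden state
    states = []
    for x_t in x_seq:
        # The same W_xh, W_hh and b_h are used at every step (shared parameters).
        h = np.tanh(x_t @ W_xh + h @ W_hh + b_h)
        states.append(h)
    return np.stack(states)


# Hypothetical sizes and random weights, for illustration only
rng = np.random.default_rng(0)
T, input_dim, hidden_dim = 5, 3, 4
W_xh = rng.normal(scale=0.5, size=(input_dim, hidden_dim))
W_hh = rng.normal(scale=0.5, size=(hidden_dim, hidden_dim))
b_h = np.zeros(hidden_dim)

x_seq = rng.normal(size=(T, input_dim))
H = rnn_forward(x_seq, W_xh, W_hh, b_h)
print(H.shape)  # (5, 4): one hidden state per time step
```

A `SimpleRNN` layer that returns only its final output corresponds to `H[-1]`; with `return_sequences=True` it corresponds to all of `H`.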

### Key Tuning Parameters
- **`hidden_size`:**
  - **Description:** Number of units in the RNN’s hidden layer.
  - **Impact:** Larger hidden sizes allow the model to capture more complex patterns but increase computational cost and risk of overfitting.
  - **Default:** None in Keras (the `units` argument of `SimpleRNN`, `LSTM`, and `GRU` is required); values from `128` to `512` are common.
- **`num_layers`:**
  - **Description:** Number of stacked RNN layers.
  - **Impact:** More layers can learn more abstract, hierarchical representations of the sequence, but they add computation and usually need more regularization to prevent overfitting.
  - **Default:** `1` (can be increased for deeper models).
- **`learning_rate`:**
  - **Description:** Step size for updating weights during training.
  - **Impact:** Higher values speed up training but may cause instability; lower values provide more stable convergence but slow down training.
  - **Default:** `0.001` for the Keras `Adam` and `RMSprop` optimizers (other optimizers use different defaults).
- **`dropout_rate`:**
  - **Description:** Fraction of units to drop during training to prevent overfitting.
  - **Impact:** Regularizes the network; too high a rate can cause underfitting.
  - **Default:** `0.0` (no dropout) but often set to `0.2-0.5` in practice.
- **`sequence_length`:**
  - **Description:** Length of input sequences processed by the RNN.
  - **Impact:** Longer sequences can capture more context but increase the risk of vanishing gradients and computational cost.
  - **Default:** Varies depending on the problem.
- **`batch_size`:**
  - **Description:** Number of sequences processed in parallel during training.
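  - **Impact:** Larger batches give less noisy gradient estimates and smoother training but require more memory; very large batches may generalize slightly worse.
  - **Default:** `32` in Keras `Model.fit`; `64` is also common.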
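To make the cost of `hidden_size` and `num_layers` concrete, the helper functions below (hypothetical names) count the trainable weights of stacked recurrent layers. The formulas follow the Keras weight layout of a kernel, a recurrent kernel and a bias per layer, with an LSTM holding four such blocks; the input size of 32 per time step is an assumption chosen to match the embedding size used later in this notebook. Because the recurrent kernel is `hidden_size` by `hidden_size`, the count grows roughly quadratically with `hidden_size`.

```python
def simple_rnn_params(input_dim, hidden_size):
    """Trainable weights of one SimpleRNN layer (Keras layout)."""
    kernel = input_dim * hidden_size  # input -> hidden
    recurrent = hidden_size * hidden_size  # hidden -> hidden
    bias = hidden_size
    return kernel + recurrent + bias


def lstm_params(input_dim, hidden_size):
    """An LSTM has four gate/candidate blocks, each shaped like one SimpleRNN layer."""
    return 4 * simple_rnn_params(input_dim, hidden_size)


def stacked_params(layer_params, input_dim, hidden_size, num_layers):
    """Total weights of num_layers stacked recurrent layers of equal width."""
    total = layer_params(input_dim, hidden_size)  # first layer reads the input features
    for _ in range(num_layers - 1):
        # Later layers read the previous layer's hidden states
        total += layer_params(hidden_size, hidden_size)
    return total


embedding_dim = 32  # assumed input size per time step
for units in (64, 128, 256, 512):
    rnn = stacked_params(simple_rnn_params, embedding_dim, units, num_layers=2)
    lstm = stacked_params(lstm_params, embedding_dim, units, num_layers=2)
    print(f"hidden_size={units:4d}  SimpleRNN: {rnn:>9,}  LSTM: {lstm:>9,}")
```

Doubling `hidden_size` roughly quadruples the recurrent weights, and an LSTM costs about four times a `SimpleRNN` of the same width. The per-layer totals can be checked against `model.summary()`.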

### Pros vs Cons

| Pros                                                                      | Cons                                                                        |
|---------------------------------------------------------------------------|-----------------------------------------------------------------------------|
| Captures temporal dependencies in sequential data                         | Suffers from vanishing/exploding gradients, especially on long sequences    |
| Effective for time series, text, speech, and similar data                 | Computationally expensive for long sequences or deep (stacked) networks     |
| Handles variable-length sequences                                         | Hard to train and tune because errors must propagate back through many time steps |
| LSTM and GRU variants mitigate gradient problems                          | Sequential processing limits parallelization compared to CNNs               |
| Parameter count is independent of sequence length (weights shared across time steps) | Prone to overfitting without careful regularization              |
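The vanishing/exploding gradient problem in the table comes from the chain rule in BPTT multiplying one factor per time step. The sketch below uses a deliberately simplified scalar, linear recurrence $h_t = w h_{t-1} + x_t$ (an assumption for illustration; in a real RNN each factor is a Jacobian matrix involving $W_{hh}$ and the derivative of $\tanh$). `gradient_through_time` and `clip_by_global_norm` are hypothetical helpers; the second shows gradient clipping, the usual remedy for exploding gradients, which Keras optimizers expose through their `clipnorm` and `global_clipnorm` arguments.

```python
import numpy as np


def gradient_through_time(w, steps):
    """d h_T / d h_0 for the scalar linear recurrence h_t = w * h_{t-1} + x_t.

    Each unrolled step contributes one factor of w, so the result equals w ** steps.
    """
    grad = 1.0
    for _ in range(steps):
        grad *= w  # chain rule: one Jacobian factor per time step
    return grad


for w in (0.9, 1.0, 1.1):
    print(w, [gradient_through_time(w, t) for t in (10, 50, 100)])


def clip_by_global_norm(grads, max_norm):
    """Rescale a list of gradient arrays so their combined L2 norm is at most max_norm."""
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total_norm <= max_norm:
        return grads  # already small enough; leave untouched
    scale = max_norm / total_norm
    return [g * scale for g in grads]  # same direction, shorter length


print(clip_by_global_norm([np.array([3.0, 4.0])], max_norm=1.0))
```

With `w = 0.9` the factor after 100 steps is below `1e-4` (vanishing), while `w = 1.1` exceeds `1e4` (exploding). Clipping bounds the second case but does nothing for the first, which is why gated variants such as LSTM and GRU are still needed.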

### Evaluation Metrics
- **Accuracy (Classification):**
  - **Description:** Ratio of correct predictions to total predictions.
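  - **Good Value:** Higher is better; on balanced binary tasks such as the IMDB sentiment data used below, values above 0.85 indicate strong performance. On imbalanced data, check precision and recall as well.
  - **Bad Value:** Near the majority-class rate (0.5 for a balanced binary task), which a model that always predicts the same class can reach.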
- **Precision (Classification):**
  - **Description:** Proportion of true positives among all positive predictions.
  - **Good Value:** Higher values mean fewer false positives; this matters most when false positives are costly or the positive class is rare.
  - **Bad Value:** Low values suggest many false positives.
- **Recall (Classification):**
  - **Description:** Proportion of actual positives correctly identified.
  - **Good Value:** Higher values mean fewer false negatives; this matters most when missing a positive case is costly.
  - **Bad Value:** Low values suggest many false negatives.
- **F1 Score (Classification):**
  - **Description:** Harmonic mean of Precision and Recall, $F_1 = \frac{2PR}{P + R}$.
  - **Good Value:** Higher values indicate a good balance between Precision and Recall.
  - **Bad Value:** Low values suggest a poor balance between Precision and Recall.
- **Perplexity (Language Modeling):**
  - **Description:** Exponential of the average per-token cross-entropy (negative log-likelihood); it measures how uncertain the model is when predicting the next word in a sequence.
  - **Good Value:** Lower is better. Values are only comparable across models evaluated on the same dataset and vocabulary; a model that guesses uniformly over a vocabulary of size $V$ has perplexity $V$.
  - **Bad Value:** High perplexity indicates poor predictive capability.
- **Mean Squared Error (MSE) (Regression):**
  - **Description:** Measures the average squared difference between predicted and actual values.
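  - **Good Value:** Lower is better; `0` means perfect predictions. Because MSE is in squared units of the target, judge it against a simple baseline such as always predicting the mean.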
  - **Bad Value:** Higher values suggest the model’s predictions deviate significantly from the actual values.
- **AUC-ROC (Classification):**
  - **Description:** Measures the model's ability to distinguish between classes across all thresholds.
  - **Good Value:** Values closer to 1 indicate strong separability between classes.
  - **Bad Value:** Values near 0.5 suggest random guessing.
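Perplexity and AUC-ROC are the two less obvious metrics above, and both can be computed from first principles in plain Python. In this sketch (hypothetical helper names), `perplexity` takes the probabilities a model assigned to each observed token and returns the exponential of the mean negative log-likelihood, using the natural log. `roc_auc` uses the pairwise-ranking definition of AUC: the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counting half. It is quadratic in the number of examples, so for real work `sklearn.metrics.roc_auc_score` is the practical choice.

```python
import math


def perplexity(token_probs):
    """exp of the mean negative log-likelihood of the observed tokens."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)


def roc_auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(perplexity([0.25, 0.25, 0.25, 0.25]))  # uniform over 4 tokens, about 4
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```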



In [None]:
import os

# Hide TensorFlow's C++ INFO and WARNING logs; must be set before TensorFlow is imported
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"

import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
import tensorflow as tf
from sklearn.metrics import classification_report, confusion_matrix

# Import everything from tensorflow.keras to avoid mixing standalone keras and tf.keras objects
from tensorflow.keras.datasets import imdb
from tensorflow.keras.layers import Dense, Dropout, Embedding, SimpleRNN
from tensorflow.keras.models import Sequential
from tensorflow.keras.preprocessing import sequence

In [None]:
# Hyperparameters
max_features = 10000  # vocabulary size: keep only the 10,000 most frequent words
sequence_length = 500  # pad or truncate every review to this many tokens
hidden_size = 128
num_layers = 2
learning_rate = 0.001
dropout_rate = 0.5
batch_size = 32

# Load the IMDB dataset (each review is a list of word indices)
(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=max_features)

# By default pad_sequences pads and truncates at the start of each review
X_train = sequence.pad_sequences(X_train, maxlen=sequence_length)
X_test = sequence.pad_sequences(X_test, maxlen=sequence_length)
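`pad_sequences` is called above with its default settings, which pad and truncate at the start of each sequence (`padding="pre"`, `truncating="pre"`) using the value 0. The NumPy function below, `pad_pre`, is a hypothetical re-implementation of just those defaults to show what the padded arrays look like; it is not the Keras source. In the Keras IMDB encoding, index 0 does not stand for a real word, so zeros act as unambiguous padding.

```python
import numpy as np


def pad_pre(seqs, maxlen, value=0):
    """Mimic pad_sequences' defaults: pad and truncate at the start ("pre")."""
    out = np.full((len(seqs), maxlen), value, dtype="int32")
    for i, s in enumerate(seqs):
        trunc = s[-maxlen:]  # keep only the last maxlen tokens
        if len(trunc):
            out[i, -len(trunc):] = trunc  # right-align; padding fills the front
    return out


print(pad_pre([[1, 2, 3], [4]], maxlen=2))
```

Because padding goes at the front, the final `SimpleRNN` layer, which returns only its last hidden state, finishes on the review's real tokens rather than on a run of padding zeros.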

In [None]:
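# Build the model: embedding -> stacked SimpleRNN layers -> sigmoid output
model = Sequential()
model.add(Embedding(max_features, 32))  # 32-dimensional word embeddings
for _ in range(num_layers - 1):
    # Intermediate layers return the full sequence so the next RNN layer gets one vector per time step
    model.add(SimpleRNN(hidden_size, return_sequences=True))
    model.add(Dropout(dropout_rate))
# The last RNN layer returns only its final hidden state
model.add(SimpleRNN(hidden_size))
model.add(Dropout(dropout_rate))
model.add(Dense(1, activation="sigmoid"))  # probability that the review is positive

# Compile the model
optimizer = tf.keras.optimizers.RMSprop(learning_rate=learning_rate)
model.compile(
    optimizer=optimizer,
    loss="binary_crossentropy",
    metrics=["accuracy"],
)

# Train the model (one epoch keeps the demo fast; more epochs would likely improve results)
history = model.fit(X_train, y_train, epochs=1, batch_size=batch_size, validation_split=0.2)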
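The `binary_crossentropy` loss passed to `model.compile` averages $-[y \log p + (1 - y)\log(1 - p)]$ over the examples, where $p$ is the sigmoid output and $y \in \{0, 1\}$ the label. Below is a minimal NumPy version; the helper name is hypothetical, and the clipping constant `eps=1e-7` is an assumption meant to avoid `log(0)`, in the spirit of the small epsilon Keras uses.

```python
import numpy as np


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped away from 0 and 1."""
    y_true = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    per_example = -(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))
    return float(np.mean(per_example))


print(binary_crossentropy([1, 0], [0.9, 0.1]))  # equals -log(0.9), about 0.105
```

Confident correct predictions cost little, while a confident wrong prediction is penalized heavily, which is what pushes the sigmoid output toward the true label during training.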


In [None]:
score = model.evaluate(X_test, y_test, verbose=2)
print("Test loss:", score[0])
print("Test accuracy:", score[1])

In [None]:
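from sklearn.metrics import roc_auc_score

# Predicted probability of the positive class for each review
probabilities = model.predict(X_test).ravel()
# Threshold at 0.5 to convert probabilities into 0/1 labels
predictions = (probabilities > 0.5).astype(int)

# Print the classification report
print("Classification Report:")
print(classification_report(y_test, predictions))

# AUC-ROC is computed from the probabilities, not the thresholded labels
print("ROC AUC:", roc_auc_score(y_test, probabilities))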

In [None]:
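cm = confusion_matrix(y_test, predictions)  # rows: true labels, columns: predicted labels

sns.heatmap(
    cm,
    annot=True,
    fmt="d",
    cmap="Blues",
    xticklabels=["negative", "positive"],
    yticklabels=["negative", "positive"],
)
plt.xlabel("Predicted")
plt.ylabel("True")
plt.title("Confusion matrix (IMDB test set)")
plt.show()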
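`confusion_matrix` puts true labels on the rows and predicted labels on the columns, so for labels 0 and 1, `cm.ravel()` yields `tn, fp, fn, tp`. The hypothetical helper below turns those counts into the classification metrics defined earlier; the example counts are invented for illustration and are not results from this notebook.

```python
import numpy as np


def binary_metrics_from_cm(cm):
    """Accuracy, precision, recall and F1 for the positive class (label 1).

    Expects a 2x2 matrix in sklearn's layout: rows = true, columns = predicted.
    """
    tn, fp, fn, tp = (int(v) for v in np.asarray(cm).ravel())
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


example_cm = [[40, 10], [5, 45]]  # hypothetical counts, not results from this notebook
print(binary_metrics_from_cm(example_cm))
```

Applied to the notebook's `cm`, the precision, recall and F1 values should match the positive-class row of the classification report.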