# **Ultimate Guide to RNNs for Text Generation**

## **1. Introduction**

### **Objective**
To build a Recurrent Neural Network (RNN) using TensorFlow/Keras for text generation, demonstrating how RNNs handle sequential data to predict and generate text sequences.

### **Key Learnings**
- How RNNs process sequential data (text).
- Building and training an RNN model with LSTM layers.
- Techniques for text generation using trained models.

---

## **2. Metadata and Dataset Overview**

### **Dataset Used**
- **Dataset Name**: Shakespeare's Works (Text Corpus)
- **Source**: [TensorFlow Datasets](https://www.tensorflow.org/datasets/catalog/shakespeare)
- **Description**: A text corpus of Shakespeare's plays and poems.

### **Acknowledgement**
The dataset is publicly available via TensorFlow Datasets and is widely used for educational purposes in NLP.

---

## **3. Loading and Exploring the Dataset**


### **Code: Load the Dataset**

In [None]:
import tensorflow as tf
from tensorflow.keras.layers import Embedding, LSTM, Dense
from tensorflow.keras.models import Sequential
import numpy as np
import os


In [None]:
# Load the Shakespeare dataset
path_to_file = tf.keras.utils.get_file(
    'shakespeare.txt',
    'https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt'
)

# Read the text (the context manager closes the file handle afterwards)
with open(path_to_file, 'rb') as f:
    text = f.read().decode(encoding='utf-8')
print(f'Length of text: {len(text)} characters')


Length of text: 1115394 characters


### **Explanation**
- The dataset is downloaded from a URL using TensorFlow utilities.
- The text is loaded as a single string, and its length is printed.
- **Output**: The text contains ~1.1 million characters.


In [None]:
print(text[:500])  # Print the first 500 characters

First Citizen:
Before we proceed any further, hear me speak.

All:
Speak, speak.

First Citizen:
You are all resolved rather to die than to famish?

All:
Resolved. resolved.

First Citizen:
First, you know Caius Marcius is chief enemy to the people.

All:
We know't, we know't.

First Citizen:
Let us kill him, and we'll have corn at our own price.
Is't a verdict?

All:
No more talking on't; let it be done: away, away!

Second Citizen:
One word, good citizens.

First Citizen:
We are accounted poor


## **4. Data Preprocessing**

In [None]:

# Extract unique characters and create mappings
vocab = sorted(set(text))
char2idx = {char: idx for idx, char in enumerate(vocab)}
idx2char = {idx: char for idx, char in enumerate(vocab)}

print(f'Unique characters: {len(vocab)}')


Unique characters: 65


### **Interpretation**
- The vocabulary consists of **65 unique characters** (letters, punctuation, and symbols).
- `char2idx` and `idx2char` are dictionaries for converting characters to integers and vice versa.
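
A round trip through the two mappings can be sketched on a toy string. The `sample` text and the `encode`/`decode` helper names are illustrative assumptions, not part of the notebook; the mappings themselves are built the same way as for the full corpus.

```python
# Toy corpus standing in for the Shakespeare text (illustrative assumption)
sample = "hello world"

# Build the mappings the same way as for the full corpus
vocab = sorted(set(sample))
char2idx = {char: idx for idx, char in enumerate(vocab)}
idx2char = {idx: char for idx, char in enumerate(vocab)}

def encode(s):
    """Map each character to its integer index."""
    return [char2idx[c] for c in s]

def decode(ids):
    """Map integer indices back to characters."""
    return ''.join(idx2char[i] for i in ids)

ids = encode("hello")
print(ids)          # [3, 2, 4, 4, 5]
print(decode(ids))  # hello
```

Because `vocab` is sorted, the same text always produces the same indices, which keeps encoding and decoding consistent between training and generation.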



### **Code: Convert Text to Numerical Indices**

In [None]:
# Convert the entire text to numerical indices
text_as_int = np.array([char2idx[char] for char in text])
print(f'Text sample (numerical): {text_as_int[:10]}')


Text sample (numerical): [18 47 56 57 58  1 15 47 58 47]


### **Explanation**
- Each character in the text is replaced with its corresponding integer index.


### **Code: Create Training Sequences**

In [None]:
# Define sequence length and batch size
seq_length = 100
examples_per_epoch = len(text) // (seq_length + 1)

# Create training sequences
char_dataset = tf.data.Dataset.from_tensor_slices(text_as_int)
sequences = char_dataset.batch(seq_length + 1, drop_remainder=True)

# Split sequences into input and target
def split_input_target(chunk):
    input_text = chunk[:-1]
    target_text = chunk[1:]
    return input_text, target_text

dataset = sequences.map(split_input_target)


### **Interpretation**
- **Sequence Length**: 100 characters per sequence.
- Each 101-character chunk is split into:
  - **Input**: The first 100 characters.
  - **Target**: The last 100 characters (the input shifted forward by one character).
- At every position, the model learns to predict the next character given the characters before it.
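
The input/target shift is easier to see on a short string. The sketch below uses plain Python lists instead of `tf.data`, and a toy `seq_length` of 4 rather than 100; both are assumptions made to keep the example small.

```python
def split_input_target(chunk):
    """Return (input, target), where target is input shifted by one character."""
    return chunk[:-1], chunk[1:]

seq_length = 4                      # toy value; the notebook uses 100
text = "First Citizen"

# Cut the text into non-overlapping chunks of seq_length + 1 characters,
# dropping the incomplete tail (mirrors drop_remainder=True)
chunk_size = seq_length + 1
chunks = [text[i:i + chunk_size]
          for i in range(0, len(text) - chunk_size + 1, chunk_size)]

for chunk in chunks:
    inp, tgt = split_input_target(chunk)
    print(repr(inp), '->', repr(tgt))
# 'Firs' -> 'irst'
# ' Cit' -> 'Citi'
```

Each target character is the character that follows the input character at the same position, so one chunk yields `seq_length` next-character training examples.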


## **5. Building the RNN Model**

### **Key Concepts**
1. **Recurrent Neural Networks (RNNs)**:
   - Designed for sequential data.
   - Maintain a hidden state that captures information from previous steps.
2. **Long Short-Term Memory (LSTM)**:
   - A type of RNN that mitigates the vanishing gradient problem.
   - Uses gates to control information flow.




### **Mathematical Intuition**

#### **1. Forget Gate**
- **Equation**:  
  `f_t = sigmoid(W_f * [h_prev, x_t] + b_f)`  
- **Explanation**:  
  - **`f_t`**: "Forget gate" output (values between 0 and 1).  
  - **`W_f`**: Weights for the forget gate.  
  - **`h_prev`**: Hidden state from the previous timestep.  
  - **`x_t`**: Current input.  
  - **`b_f`**: Bias term for the forget gate.  

---

#### **2. Input Gate**
- **Equation**:  
  `i_t = sigmoid(W_i * [h_prev, x_t] + b_i)`  
  `C_tilde_t = tanh(W_C * [h_prev, x_t] + b_C)`  
- **Explanation**:  
  - **`i_t`**: "Input gate" output (values between 0 and 1).  
  - **`C_tilde_t`**: Candidate cell state (values between -1 and 1).  

---

#### **3. Update Cell State**
- **Equation**:  
  `C_t = f_t * C_prev + i_t * C_tilde_t`  
- **Explanation**:  
  - **`C_t`**: New cell state (long-term memory).  
  - **`C_prev`**: Cell state from the previous timestep.  
  - **`*`**: Element-wise multiplication.  

---

#### **4. Output Gate**
- **Equation**:  
  `o_t = sigmoid(W_o * [h_prev, x_t] + b_o)`  
  `h_t = o_t * tanh(C_t)`  
- **Explanation**:  
  - **`o_t`**: "Output gate" output (values between 0 and 1).  
  - **`h_t`**: New hidden state (short-term memory).  

---

### **What Do These Equations Do?**
1. **Forget Gate**: Decides what to remove from long-term memory (`C_prev`).  
   - Example: If `f_t` is 0.2 for a piece of information, it retains 20% of it.  
2. **Input Gate**: Decides what new information to add to long-term memory.  
   - `i_t` acts as a filter, and `C_tilde_t` holds candidate values.  
3. **Update Cell State**: Combines old memory (after forgetting) and new memory.  
4. **Output Gate**: Decides what part of the updated memory to output as `h_t`.  

---

### **Summary**
- LSTMs use **three gates** (forget, input, output) to manage information flow.  
- The **cell state** (`C_t`) acts as long-term memory, while the **hidden state** (`h_t`) is short-term memory.  
- **Sigmoid** (0-1) and **tanh** (-1 to 1) functions control how much information is added/removed.  
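
The gate equations can be traced in a few lines of NumPy. This is a minimal sketch for intuition only: the `lstm_step` function, the dictionary layout of `W` and `b`, and the tiny random sizes are assumptions of this example, and Keras stores and computes its LSTM weights differently.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W, b):
    """One LSTM timestep following the gate equations.

    W and b are dicts keyed by 'f', 'i', 'C', 'o' (a layout chosen for
    this sketch). Each W[k] has shape (hidden, hidden + input).
    """
    concat = np.concatenate([h_prev, x_t])          # [h_prev, x_t]
    f_t = sigmoid(W['f'] @ concat + b['f'])         # forget gate
    i_t = sigmoid(W['i'] @ concat + b['i'])         # input gate
    C_tilde = np.tanh(W['C'] @ concat + b['C'])     # candidate cell state
    C_t = f_t * C_prev + i_t * C_tilde              # element-wise update
    o_t = sigmoid(W['o'] @ concat + b['o'])         # output gate
    h_t = o_t * np.tanh(C_t)                        # new hidden state
    return h_t, C_t

# Tiny random example: 3 input features, 2 hidden units
rng = np.random.default_rng(0)
n_in, n_hidden = 3, 2
W = {k: rng.normal(size=(n_hidden, n_hidden + n_in)) for k in 'fiCo'}
b = {k: np.zeros(n_hidden) for k in 'fiCo'}

h, C = np.zeros(n_hidden), np.zeros(n_hidden)
for x_t in rng.normal(size=(4, n_in)):              # run 4 timesteps
    h, C = lstm_step(x_t, h, C, W, b)
print(h.shape, C.shape)
```

The same `h` and `C` are passed from one call to the next, which is how information from earlier timesteps reaches later ones.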




### **Code: Define the Model**

In [None]:
# Hyperparameters
vocab_size = len(vocab)
embedding_dim = 256
rnn_units = 1024
batch_size = 64

# Regularized model
model = Sequential([
    Embedding(vocab_size, embedding_dim),
    LSTM(rnn_units, return_sequences=True, dropout=0.2),  # Add dropout
    Dense(vocab_size)
])

# Compile the model

model.compile(
    optimizer='adam',
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)
model.summary()

## **6. Training the Model**

### **Code: Prepare Batches**


In [None]:
# Batch and shuffle the dataset
dataset = dataset.shuffle(10000).batch(batch_size, drop_remainder=True)

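### **Explanation**
- **Batching**: The sequences are shuffled with a buffer of 10,000 and grouped into batches of 64; an incomplete final batch is dropped.
- **Embedding Layer**: Converts character indices into dense 256-dimensional vectors.
- **LSTM Layer**: 1024 units with `return_sequences=True`, so it emits an output at every timestep, and `dropout=0.2` on its inputs. The layer is not stateful, so its hidden and cell states are reset at the start of every batch.
- **Dense Layer**: Outputs logits for each character in the vocabulary.
- **Model Summary**: Shows the architecture and parameter count (~5.3 million: 16,640 in the embedding, 5,246,976 in the LSTM, and 66,625 in the dense layer).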
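
The parameter count can be checked by hand. The sketch below assumes the standard LSTM parameterization (four weight sets, each with input weights, recurrent weights, and one bias vector) and the hyperparameters defined for the model; under those assumptions the total comes to about 5.3 million.

```python
vocab_size, embedding_dim, rnn_units = 65, 256, 1024

# One embedding vector per vocabulary entry
embedding_params = vocab_size * embedding_dim

# Four weight sets (forget, input, candidate, output), each with
# input weights, recurrent weights, and a bias vector
lstm_params = 4 * (rnn_units * (embedding_dim + rnn_units) + rnn_units)

# Weight matrix plus bias for the output logits
dense_params = rnn_units * vocab_size + vocab_size

total = embedding_params + lstm_params + dense_params
print(embedding_params, lstm_params, dense_params, total)
# 16640 5246976 66625 5330241
```

Almost all of the parameters sit in the LSTM layer, so `rnn_units` is the hyperparameter that most affects model size.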


### **Code: Train the Model**

In [None]:
# Train the model for 30 epochs
history = model.fit(
    dataset,
    epochs=30
)

Epoch 1/30
[1m172/172[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m21s[0m 86ms/step - loss: 3.1745
Epoch 2/30
[1m172/172[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m20s[0m 78ms/step - loss: 2.0972
Epoch 3/30
[1m172/172[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m15s[0m 75ms/step - loss: 1.8141
Epoch 4/30
[1m172/172[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m21s[0m 74ms/step - loss: 1.6460
Epoch 5/30
[1m172/172[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m16s[0m 77ms/step - loss: 1.5410
Epoch 6/30
[1m172/172[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m15s[0m 79ms/step - loss: 1.4712
Epoch 7/30
[1m172/172[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m15s[0m 77ms/step - loss: 1.4195
Epoch 8/30
[1m172/172[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m20s[0m 75ms/step - loss: 1.3765
Epoch 9/30
[1m172/172[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m15s[0m 76ms/step - loss: 1.3458
Epoch 10/30
[1m172/172[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m15

## **7. Text Generation**

### **Code: Generate Text Function**

In [None]:
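def generate_text(model, start_string, num_generate=1000, temperature=0.7):
    # Encode the seed string as a batch containing one sequence of indices
    input_eval = tf.expand_dims([char2idx[char] for char in start_string], 0)

    text_generated = []

    for _ in range(num_generate):
        # The LSTM is not stateful, so pass the running context (capped at
        # seq_length characters) on every step instead of only the last character
        predictions = model(input_eval[:, -seq_length:], training=False)
        # Keep the logits for the last timestep; lower temperature means less randomness
        predictions = predictions[:, -1, :] / temperature
        predicted_id = int(tf.random.categorical(predictions, num_samples=1)[0, 0])
        # Append the sampled character to the context for the next step
        input_eval = tf.concat([input_eval, [[predicted_id]]], axis=-1)
        text_generated.append(idx2char[predicted_id])

    return start_string + ''.join(text_generated)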

### **Explanation**
- **Temperature**: Controls the randomness of predictions. Lower values make the output more deterministic.
- The model predicts the next character iteratively, using its previous output as input.
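
The effect of temperature can be seen by applying a softmax to scaled logits, which is the distribution that sampling from logits draws from. The `softmax_with_temperature` helper and the three made-up logits are assumptions of this sketch, not part of the notebook.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)                              # subtract max for stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]                         # made-up scores for 3 characters
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At a temperature below 1 the highest-scoring character takes a larger share of the probability mass, and at a temperature above 1 the distribution flattens, so sampled text becomes more varied and less predictable.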


### **Code: Generate Sample Text**


In [None]:
# Generate text starting with "ROMEO"
print(generate_text(model, start_string="ROMEO"))


ROMEO:


This, dweande is angandes y tht t thou, ma be thand thator me thes peer wer t yorsome, f redor:
Princou omy, t ma thas fof,
Sears oound the I myour yourariny bend, thisthe spodeeamy tur ath t ticat ananouspamat pr ghe t an y ll tout ape, an h in f isend ther g ant me youror ar the g t ghoouth winomournouthyond hars hathasit by he te
ARALI'O:
I oind heree f ofustous t wee thareseser nde tute he t sesowind anoter he matlfors thapenorst t st were br fare bl s thest tr hes I lfoure catit y.
TE matht INCHAn yout thes ben her m tucethinoous cand masiouger st theno thouthe; y, hatr amyotheweat by se! bor t thand t br s wet he acat iosiome thesed trakeatha athare mo hanound thout ang heshakee th! thar s!
My to ber tos t mas,
AUnondecoof s s t, INAS:
HEO:
Torerely the thicuickin t s y willous our g ge
Than f ake l ved t han yon t, h thous we iscke irdatiend wilindw t t, wat bre g'swnd pou ang'
INE:
Tis ayoreat co hat fousthal ingerO:
OUSPHe t bl d le, lar he ind mome d t t, s nther l t

### **Interpretation**
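- The generated text reproduces the surface layout of the corpus: speaker-style labels ending in a colon, line breaks, and frequent fragments such as `thou` and `the`.
- Most words in this sample are not valid English, so the output imitates Shakespearean formatting rather than coherent Shakespearean language.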


## **8. Advantages and Disadvantages**

### **Advantages**
- **Sequential Data Handling**: RNNs excel at processing sequences (text, time series).
- **Context Retention**: LSTMs capture long-term dependencies in text.
- **Creative Output**: Can generate novel text that mimics the training data.

### **Disadvantages**
- **Computational Cost**: Training LSTMs is resource-intensive.
- **Vanishing Gradients**: Basic RNNs struggle with long sequences (mitigated by LSTMs).
- **Overfitting**: Can memorize training data if not regularized.

---

## **9. Conclusion**

### **Key Learnings**
- RNNs process sequential data by maintaining a hidden state.
- LSTMs address the vanishing gradient problem with gated mechanisms.
- Text generation involves iterative prediction of the next character.

### **Next Steps**
- Experiment with **GRUs** (Gated Recurrent Units) for efficiency.
- Use **Transformer models** (e.g., GPT-2) for more coherent long-form text.
- Fine-tune hyperparameters (sequence length, temperature) for better results.

---

## **10. References**
- TensorFlow RNN Guide: [Text Generation with RNNs](https://www.tensorflow.org/text/tutorials/text_generation)
- Hochreiter & Schmidhuber (1997): [LSTM Paper](https://www.bioinf.jku.at/publications/older/2604.pdf)
- Shakespeare Dataset: [TensorFlow Datasets](https://www.tensorflow.org/datasets/catalog/shakespeare)
