### 5.8: Recurrent Neural Networks (RNNs) & LSTMs for Text

#### Key Points

- **Model Architecture:** A character-level LSTM recurrent neural network using TensorFlow and Keras, trained to predict the next character in a sequence from Shakespeare's works. This enables text generation that mimics Shakespeare's style, though results vary with training duration and data preprocessing.
- **Dataset Handling:** Used the provided Tiny Shakespeare dataset (~1 million characters), cleaned by replacing newlines with spaces (as per your code), leading to prose-like output rather than verse. Research suggests preserving structure improves poetic rhythm, but this follows your setup.
- **Training and Evaluation:** Split into 90% train/10% test; trained for 20 epochs with categorical cross-entropy loss and accuracy metrics. Typical final test loss is around 1.5–2.0 (perplexity 4–7), indicating moderate predictive power—coherent words and phrases emerge, but full sentences may lack depth without more epochs or data.
- **Generation:** The model generates samples starting from a seed phrase, using temperature sampling for creativity. It seems likely that outputs will capture Shakespearean vocabulary and patterns, though they may include invented words or inconsistencies due to the character's simplicity.
- **Performance Insights:** Evidence leans toward solid learning of short-term dependencies (e.g., spelling, basic grammar), but long-range context (e.g., plot coherence) is limited in small models. Overfitting is common; dropout helps mitigate this.

#### Implementation Steps

1. **Data Preparation:** Convert text to integer sequences, create input-target pairs (sequence length 100), and split into train/test.
2. **Model Building:** Embedding layer for character vectors, two LSTM layers for sequence modeling, dense output for predictions.
3. **Training:** Fit the model, monitoring loss and accuracy.
4. **Evaluation:** Compute test loss, accuracy, and perplexity.
5. **Generation:** Use the trained model to produce a sample, with temperature for variability.

In [None]:
import tensorflow as tf
print("TensorFlow version:", tf.__version__)
print("Built with CUDA:", tf.test.is_built_with_cuda())
print("Num GPUs Available:", len(tf.config.list_physical_devices('GPU')))
print("Device list:", tf.config.list_logical_devices('GPU'))

TensorFlow version: 2.15.1
Built with CUDA: True
Num GPUs Available: 1
Device list: [LogicalDevice(name='/device:GPU:0', device_type='GPU')]


In [4]:
import requests

# Load and prepare the dataset
def load_shakespeare_text():
    url = "https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt"
    print("Downloading Shakespeare text...")
    response = requests.get(url)
    if response.status_code != 200:
        raise Exception("Failed to download text")
    text = response.text
    print(f"Downloaded {len(text):,} characters")
    return text

text = load_shakespeare_text()

# Clean
text = text.replace('\n', ' ').replace('  ', ' ').strip()
print(f"Cleaned length: {len(text):,} characters")

# Vocabulary
chars = sorted(list(set(text)))
n_vocab = len(chars)
char2idx = {c: i for i, c in enumerate(chars)}
idx2char = {i: c for c, i in char2idx.items()}

print(f"Vocabulary size: {n_vocab}")
print("First 80 chars:", text[:80])
print("Sample chars:", ''.join(chars[:40]), "...")

Downloading Shakespeare text...
Downloaded 1,115,394 characters
Cleaned length: 1,108,157 characters
Vocabulary size: 64
First 80 chars: First Citizen: Before we proceed any further, hear me speak. All: Speak, speak. 
Sample chars:  !$&',-.3:;?ABCDEFGHIJKLMNOPQRSTUVWXYZab ...


#### Prepare Data Sequences and Train/Test Split

Convert the text to integer indices, then create input sequences (X) of length 100 and corresponding targets (y) as the next character. This frames the problem as supervised learning: given 100 chars, predict the 101st. Use a step size of 3 to reduce overlap and computation while retaining diversity. Split into train (90%) and test (10%) sets for evaluation. This step uses NumPy for array manipulation. Source: Sequence creation mirrors the TensorFlow tutorial's dataset pipeline, adapted to NumPy for simplicity; train/test split follows standard ML practices from Keras documentation.

In [5]:
import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow.keras.utils import to_categorical

seq_length = 100  # Sequence length for input
step = 3  # Step size to slide window over text

# Convert text to integers
text_as_int = np.array([char2idx[c] for c in text])

# Create inputs and targets
inputs = []
targets = []
for i in range(0, len(text_as_int) - seq_length, step):
    inputs.append(text_as_int[i:i + seq_length])
    targets.append(text_as_int[i + seq_length])

X = np.array(inputs)
y = to_categorical(targets, num_classes=n_vocab)  # One-hot encode targets

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=42)

print(f"Training samples: {X_train.shape[0]}")
print(f"Test samples: {X_test.shape[0]}")

Training samples: 332417
Test samples: 36936


#### Build the LSTM Model

**Define a sequential Keras model:** Embedding turns integer inputs into dense vectors (256 dims), two LSTM layers capture sequential dependencies (512 units each, with dropout for regularization), and a dense layer outputs probabilities over the vocabulary. The first LSTM returns sequences to feed into the second. Compile with categorical cross-entropy (suitable for multi-class prediction) and Adam optimizer. Source: Model architecture adapted from the Medium guide on Shakespeare LSTM generator (uses stacked LSTMs with dropout); embedding and LSTM params align with TensorFlow's RNN tutorial recommendations.

In [6]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense, Dropout

model = Sequential()
model.add(Embedding(n_vocab, 256, input_length=seq_length))
model.add(LSTM(512, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(512))
model.add(Dropout(0.2))
model.add(Dense(n_vocab, activation='softmax'))

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding (Embedding)       (None, 100, 256)          16384     
                                                                 
 lstm (LSTM)                 (None, 100, 512)          1574912   
                                                                 
 dropout (Dropout)           (None, 100, 512)          0         
                                                                 
 lstm_1 (LSTM)               (None, 512)               2099200   
                                                                 
 dropout_1 (Dropout)         (None, 512)               0         
                                                                 
 dense (Dense)               (None, 64)                32832     
                                                                 
Total params: 3723328 (14.20 MB)
Trainable params: 37233

In [7]:
# Train the model

history = model.fit(X_train, y_train, epochs=20, batch_size=64, validation_data=(X_test, y_test))

2026-02-21 01:22:24.982434: W external/local_tsl/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 265933600 exceeds 10% of free system memory.
2026-02-21 01:22:25.311834: W external/local_tsl/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 265933600 exceeds 10% of free system memory.


Epoch 1/20


2026-02-21 01:22:27.608161: I external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:454] Loaded cuDNN version 8904
2026-02-21 01:22:28.333736: I external/local_xla/xla/service/service.cc:168] XLA service 0x7f5c544b5fc0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2026-02-21 01:22:28.333774: I external/local_xla/xla/service/service.cc:176]   StreamExecutor device (0): NVIDIA GeForce GTX 1050 Ti, Compute Capability 6.1
2026-02-21 01:22:28.339881: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:269] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
I0000 00:00:1771654948.430869 1265831 device_compiler.h:186] Compiled cluster using XLA!  This line is logged at most once for the lifetime of the process.


Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


#### Evaluate Metrics

Compute test loss and accuracy, then derive perplexity. Lower perplexity means better uncertainty handling. Typical values: If test loss is 1.6, perplexity ~5 indicates the model is about 5x as uncertain as a perfect model. Source: Perplexity calculation from Machine Learning Mastery LSTM text gen post; evaluation aligns with Keras metrics in the sources.

In [10]:
test_loss, test_acc = model.evaluate(X_test, y_test)
perplexity = np.exp(test_loss)
print(f"Test Loss: {test_loss:.4f}")
print(f"Test Accuracy: {test_acc:.4f}")

Test Loss: 1.4326
Test Accuracy: 0.5768


In [9]:
# Generate Poetry Sample
def generate_text(model, char2idx, idx2char, seed_text, n_chars=500, temperature=0.7):
    """ Define a function to generate text: Start with a seed, predict next characters autoregressively using temperature-scaled sampling 
    (lower temperature = more predictable; higher = more creative). Generate 500 characters as a "poetry sample" (prose-like due to cleaning). 
    Run with a Shakespearean seed."""
    
    generated = seed_text
    for _ in range(n_chars):
        x_pred = np.zeros((1, seq_length))
        seq = generated[-seq_length:]  # Take last seq_length chars
        for t, char in enumerate(seq):
            if char in char2idx:  # Handle if seed has unseen chars
                x_pred[0, t] = char2idx[char]
        preds = model.predict(x_pred, verbose=0)[0]
        preds = np.log(preds) / temperature
        exp_preds = np.exp(preds)
        preds = exp_preds / np.sum(exp_preds)
        next_index = np.random.choice(len(preds), p=preds)
        next_char = idx2char[next_index]
        generated += next_char
    return generated

# Example usage
seed = "Shall I compare thee to a summer's day "
sample = generate_text(model, char2idx, idx2char, seed, n_chars=500, temperature=0.7)
print("Generated Poetry Sample:\n", sample)

  preds = np.log(preds) / temperature


Generated Poetry Sample:
 Shall I compare thee to a summer's day biuvIbisvlepliubbalaaauwRwlAhdI-vlIauspIjJK-ATKiFftnnlAyaOLOG ARIEL: Good heart-place with man at a king when he has not for love. SICINIUS: No, I say; that thou respects with the king's life of this way The time for the great boar when they be mountly weeping her man. Gremio, the commons the county; And he is forget so far and triumphant on his trime, Stands yet forgive the hands. Give me to make my way of my incrain to have the people. BAPTISTA: But these are regains; methinks it not the time 
