# ✍️ Chapter 16: NLP with RNNs & Attention — Practical Guide

This notebook provides hands-on, executable code snippets covering:
- Character-level RNNs for text generation
- Sentiment analysis with LSTM
- Encoder-Decoder models for translation
- Attention mechanisms and Transformers

Feel free to run and modify the code to deepen your understanding!

## I. Generating Shakespearean Text with a Character RNN

We'll train a character-level RNN to generate text in the style of Shakespeare.

### A. Create the Training Dataset

In [2]:
import tensorflow as tf
import requests

# Step 1: Download the text from TensorFlow's public URL
url = 'https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt'
response = requests.get(url)

# Step 2: Save it locally as 'shakespeare.txt'
with open('shakespeare.txt', 'w', encoding='utf-8') as f:
    f.write(response.text)

print("✅ Shakespeare text downloaded and saved as 'shakespeare.txt'")

# Step 3: Load the text
with open('shakespeare.txt', 'r', encoding='utf-8') as f:
    text = f.read()

# Step 4: Create vocabulary
vocab = sorted(set(text))
char2idx = {c: i for i, c in enumerate(vocab)}
idx2char = {i: c for i, c in enumerate(vocab)}

# Step 5: Convert entire text to integers
text_as_int = tf.constant([char2idx[c] for c in text])


✅ Shakespeare text downloaded and saved as 'shakespeare.txt'



### B. Split into Sequences & Create Batches

In [3]:
seq_length = 100
examples_per_epoch = len(text) // (seq_length + 1)  # each example consumes seq_length + 1 characters

# Create dataset of characters
char_ds = tf.data.Dataset.from_tensor_slices(text_as_int)

# Batch characters into chunks of seq_length + 1 (one extra character so input and target can be offset by one)
sequences = char_ds.batch(seq_length + 1, drop_remainder=True)

def split_input_target(chunk):
    input_text = chunk[:-1]
    target_text = chunk[1:]
    return input_text, target_text

# Map to get input-target pairs
dataset = sequences.map(split_input_target)

# Shuffle and batch the dataset
BATCH_SIZE = 64
BUFFER_SIZE = 10000
dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)
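
To make the input/target shift concrete, here is a minimal plain-Python sketch on a toy string. The names `toy_text` and `toy_seq_length` are illustrative assumptions and are not part of the pipeline above; the chunking mimics `batch(seq_length + 1, drop_remainder=True)` followed by `split_input_target`.

```python
# Toy illustration: each chunk of toy_seq_length + 1 characters is split into
# an input sequence and a target sequence shifted by one character.
toy_text = "To be, or not to be"
toy_seq_length = 5  # illustrative value; the pipeline above uses 100

# Cut the text into non-overlapping chunks, dropping the incomplete final
# chunk (the equivalent of drop_remainder=True).
chunk_size = toy_seq_length + 1
chunks = [toy_text[i:i + chunk_size]
          for i in range(0, len(toy_text) - chunk_size + 1, chunk_size)]

# Input is everything but the last character, target everything but the first.
pairs = [(chunk[:-1], chunk[1:]) for chunk in chunks]
for input_text, target_text in pairs:
    print(repr(input_text), "->", repr(target_text))
```

At every position the target is simply the character that follows the input character, which is what the model learns to predict.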

### C. Build & Train the Char-RNN Model

In [None]:
from tensorflow.keras import layers, models

vocab_size = len(vocab)
embedding_dim = 256
rnn_units = 1024
BATCH_SIZE = 64  # Must match the batch size used when building the dataset

model = models.Sequential([
    layers.Input(batch_shape=(BATCH_SIZE, None)),  # Stateful layers need a fixed batch size
    layers.Embedding(vocab_size, embedding_dim),
    layers.LSTM(rnn_units, return_sequences=True, stateful=True),
    layers.Dense(vocab_size)
])

model.compile(optimizer='adam', loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))

# Optional: print model summary
model.summary()

# Train the model
EPOCHS = 10
history = model.fit(dataset, epochs=EPOCHS)

Epoch 1/10
 16/172 ━━━━━━━━━━━━━━━━━━━━ 8:39 3s/step - loss: 4.1211

### D. Generate Fake Shakespearean Text

In [None]:
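def generate_text(model, start_string, num_generate=1000, temperature=1.0):
    # The trained model is stateful with a fixed batch size of BATCH_SIZE, so it
    # cannot be called on a single sequence. Build a batch-size-1 copy with the
    # same architecture and copy the trained weights into it.
    gen_model = models.Sequential([
        layers.Input(batch_shape=(1, None)),
        layers.Embedding(vocab_size, embedding_dim),
        layers.LSTM(rnn_units, return_sequences=True, stateful=True),
        layers.Dense(vocab_size)
    ])
    gen_model.set_weights(model.get_weights())

    # Convert start string to indices
    input_eval = tf.expand_dims([char2idx[s] for s in start_string], 0)
    text_generated = []

    # gen_model is freshly built, so its LSTM state starts from zeros
    for _ in range(num_generate):
        predictions = gen_model(input_eval)
        predictions = tf.squeeze(predictions, 0)
        # Sample from the distribution; a lower temperature gives more conservative text
        predicted_id = tf.random.categorical(predictions / temperature, num_samples=1)[-1, 0].numpy()
        # Pass the predicted id as the next input
        input_eval = tf.expand_dims([predicted_id], 0)
        text_generated.append(idx2char[predicted_id])

    return start_string + ''.join(text_generated)

print(generate_text(model, start_string="ROMEO: "))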

### E. Notes on Stateful RNNs

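- Setting `stateful=True` in the LSTM layer makes it carry its final state from one batch into the next instead of starting from zeros, which is why the model needs a fixed batch size.
- The carried-over state is only meaningful when sequence *i* of each batch continues sequence *i* of the previous batch. The pipeline in section B shuffles the sequences, so that continuity is lost there; either feed contiguous, unshuffled batches or drop `stateful=True`.
- Reset the states at the start of each epoch and before generating text, because the text starts over at those points.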
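
One way to arrange batches so that state can be carried over is to split the encoded text into `batch_size` parallel streams and cut each stream into consecutive windows. The NumPy sketch below illustrates that layout; `stateful_batches` is a hypothetical helper, not something defined earlier in this notebook, and it is only one possible arrangement.

```python
import numpy as np

def stateful_batches(encoded, batch_size, seq_length):
    """Yield (inputs, targets) batches where row i of each batch continues
    row i of the previous batch, as a stateful RNN expects."""
    encoded = np.asarray(encoded)
    # Length of each of the batch_size parallel streams (one character is
    # held back so that targets can be shifted by one).
    stream_len = (len(encoded) - 1) // batch_size
    steps = stream_len // seq_length
    inputs = encoded[:batch_size * stream_len].reshape(batch_size, stream_len)
    targets = encoded[1:batch_size * stream_len + 1].reshape(batch_size, stream_len)
    for k in range(steps):
        window = slice(k * seq_length, (k + 1) * seq_length)
        yield inputs[:, window], targets[:, window]

toy = np.arange(50)  # stand-in for text_as_int
batches = list(stateful_batches(toy, batch_size=2, seq_length=4))
for x, y in batches[:2]:
    print(x.tolist(), "->", y.tolist())
```

These batches must be fed in order and without shuffling, otherwise the state carried between them no longer matches the text.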


## II. Sentiment Analysis

Classify IMDB movie reviews as positive or negative with an LSTM over padded sequences of word indices.

In [None]:
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing import sequence

max_features = 10000  # Vocabulary size
maxlen = 500  # Sequence length

# Load dataset
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)

# Pad sequences to maxlen
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)

### A. Build & Train the Model

In [None]:
model = models.Sequential([
    layers.Embedding(max_features, 128),  # input_length is deprecated in recent Keras versions and not needed
    layers.LSTM(64),
    layers.Dense(1, activation='sigmoid')
])

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Train for a few epochs
model.fit(x_train, y_train, epochs=3, batch_size=64, validation_split=0.2)

### B. Evaluate the Model

In [None]:
loss, accuracy = model.evaluate(x_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")

### C. Masking in Embedding Layer

- Use `mask_zero=True` so that index 0 is treated as padding: the Embedding layer produces a mask, and downstream layers such as LSTM skip the padded time steps.

```python
layers.Embedding(max_features, 128, mask_zero=True)
```
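
Conceptually, the mask is a boolean array that is `True` wherever the input index is not 0. The NumPy sketch below shows that mask and how it keeps padded steps out of a per-sequence average; the arrays are toy values chosen for illustration, not outputs of the model above.

```python
import numpy as np

# Two toy sequences padded at the front with 0, as pad_sequences does by default.
padded = np.array([[0, 0, 5, 8, 2],
                   [0, 7, 3, 9, 4]])
mask = padded != 0  # True where there is a real token

# Hypothetical per-time-step values (for example, one feature of an RNN output).
values = np.array([[9.0, 9.0, 1.0, 2.0, 3.0],
                   [9.0, 4.0, 4.0, 4.0, 4.0]])

# Masked mean: padded steps contribute nothing to the result.
masked_mean = (values * mask).sum(axis=1) / mask.sum(axis=1)
print(mask.sum(axis=1), masked_mean)
```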

### D. Using Pretrained Word Embeddings (e.g., GloVe)

In [None]:
import numpy as np

# Load GloVe embeddings
glove_path = 'glove.6B.100d.txt'  # Ensure this file is available
glove_embeddings = {}
with open(glove_path, 'r', encoding='utf-8') as f:
    for line in f:
        parts = line.split()
        word = parts[0]
        vector = np.array(parts[1:], dtype='float32')
        glove_embeddings[word] = vector

# To use these embeddings, build an embedding matrix aligned with your tokenizer.
# For simplicity, code to create this matrix is omitted here.
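
The omitted step can be sketched as follows. This is a minimal illustration, assuming a `word_index` dictionary that maps each word to the integer index your model uses; `build_embedding_matrix`, `toy_word_index`, and `toy_embeddings` are hypothetical names, with the toy dictionary standing in for `glove_embeddings`.

```python
import numpy as np

def build_embedding_matrix(word_index, embeddings, num_words, dim):
    """Row i holds the pretrained vector of the word with index i. Words
    without a pretrained vector (and the padding index 0) keep a zero row."""
    matrix = np.zeros((num_words, dim), dtype="float32")
    for word, idx in word_index.items():
        if idx < num_words and word in embeddings:
            matrix[idx] = embeddings[word]
    return matrix

# Toy stand-ins for a tokenizer's word index and the GloVe dictionary.
toy_word_index = {"good": 1, "bad": 2, "zzz": 3}
toy_embeddings = {"good": np.array([0.1, 0.2], dtype="float32"),
                  "bad": np.array([-0.1, -0.2], dtype="float32")}
embedding_matrix = build_embedding_matrix(toy_word_index, toy_embeddings,
                                          num_words=4, dim=2)
print(embedding_matrix)
```

The matrix can then be passed to an `Embedding` layer, for example with `embeddings_initializer=tf.keras.initializers.Constant(embedding_matrix)` and `trainable=False`. Note that the indices returned by `imdb.load_data` are offset relative to `imdb.get_word_index()` (by 3 with the default `index_from`), so the word index has to be shifted accordingly before building the matrix.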

## III. Encoder–Decoder for Neural Machine Translation

Translate English to French using a Seq2Seq model.

In [None]:
from tensorflow.keras.layers import Input, LSTM, Dense, Embedding
from tensorflow.keras.models import Model

# Placeholder variables for vocab sizes
num_eng_tokens = 10000  # Adjust as per your data
num_french_tokens = 10000  # Adjust as per your data

# Encoder
encoder_inputs = Input(shape=(None,), name='encoder_input')
encoder_embedding = Embedding(num_eng_tokens, 256, name='encoder_embedding')(encoder_inputs)
encoder_lstm = LSTM(256, return_state=True, name='encoder_lstm')
encoder_outputs, state_h, state_c = encoder_lstm(encoder_embedding)
encoder_states = [state_h, state_c]

# Decoder
decoder_inputs = Input(shape=(None,), name='decoder_input')
decoder_embedding = Embedding(num_french_tokens, 256, name='decoder_embedding')(decoder_inputs)
decoder_lstm = LSTM(256, return_sequences=True, return_state=True, name='decoder_lstm')
decoder_outputs, _, _ = decoder_lstm(decoder_embedding, initial_state=encoder_states)
decoder_dense = Dense(num_french_tokens, activation='softmax', name='decoder_dense')
decoder_outputs = decoder_dense(decoder_outputs)

# Define the model
seq2seq_model = Model([encoder_inputs, decoder_inputs], decoder_outputs)

seq2seq_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')

# To train, prepare your data accordingly.
# seq2seq_model.fit([encoder_input_data, decoder_input_data], decoder_target_data, ...)
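
Preparing the data mostly means building the decoder input and target arrays for teacher forcing: the decoder receives the target sentence shifted right by one position and must predict the unshifted sentence. The sketch below assumes already-tokenized target sentences and hypothetical special-token ids (`SOS`, `EOS`, `PAD`); `make_decoder_arrays` is an illustrative helper, not part of Keras.

```python
import numpy as np

SOS, EOS, PAD = 1, 2, 0  # hypothetical special-token ids

def make_decoder_arrays(target_ids, max_len):
    """Build teacher-forcing inputs and targets from tokenized target sentences.

    Assumes every sentence has at most max_len - 1 tokens."""
    decoder_input = np.full((len(target_ids), max_len), PAD, dtype="int32")
    decoder_target = np.full((len(target_ids), max_len), PAD, dtype="int32")
    for row, ids in enumerate(target_ids):
        inp = [SOS] + list(ids)  # decoder sees  <sos> w1 w2 ...
        tgt = list(ids) + [EOS]  # and predicts  w1 w2 ... <eos>
        decoder_input[row, :len(inp)] = inp
        decoder_target[row, :len(tgt)] = tgt
    return decoder_input, decoder_target

toy_french = [[5, 6, 7], [8, 9]]  # hypothetical token ids
decoder_input_data, decoder_target_data = make_decoder_arrays(toy_french, max_len=5)
print(decoder_input_data)
print(decoder_target_data)
```

With `sparse_categorical_crossentropy`, the targets stay as integer ids of shape `(batch, max_len)`, so no one-hot encoding is needed.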

### A. Bidirectional RNNs for Contextual Encoding

In [None]:
from tensorflow.keras.layers import Bidirectional

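# Runs one LSTM forward and one backward over the sequence. By default the two
# outputs are concatenated, so the result has 2 * 256 = 512 features.
bidirectional_layer = Bidirectional(LSTM(256))
# Use in your model as needed, e.g. as the encoder's recurrent layer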

### B. Beam Search for Improved Decoding

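- Beam search keeps the *k* most probable partial sequences at each decoding step instead of committing to the single best token as greedy decoding does. This often yields better output sequences at the cost of more computation.
- TensorFlow Addons provided a beam search decoder (the project is no longer actively developed); a custom implementation is also straightforward.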
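
A custom implementation can be quite small. The sketch below is framework-independent and assumes a hypothetical `step_fn` that returns next-token probabilities for a partial sequence; in a real translation model that function would wrap a decoder call. The toy table is constructed so that greedy decoding picks a worse sequence than beam search.

```python
import math

def beam_search(step_fn, start_token, end_token, beam_width=3, max_len=10):
    """Return (sequence, log_probability) of the best sequence found.

    step_fn(sequence) must return a dict {token: probability} for the next token.
    """
    beams = [([start_token], 0.0)]  # (sequence, summed log-probability)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == end_token:  # finished sequences are carried over as-is
                candidates.append((seq, score))
                continue
            for token, prob in step_fn(seq).items():
                if prob > 0:
                    candidates.append((seq + [token], score + math.log(prob)))
        # Keep only the beam_width highest-scoring candidates.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
        if all(seq[-1] == end_token for seq, _ in beams):
            break
    return beams[0]

def toy_step(seq):
    """Hypothetical next-token distribution for a tiny vocabulary."""
    table = {
        ("<s>",): {"a": 0.6, "b": 0.4},
        ("<s>", "a"): {"x": 0.5, "y": 0.5},
        ("<s>", "b"): {"z": 0.9, "w": 0.1},
    }
    return table.get(tuple(seq), {"</s>": 1.0})

greedy_seq, greedy_score = beam_search(toy_step, "<s>", "</s>", beam_width=1)
beam_seq, beam_score = beam_search(toy_step, "<s>", "</s>", beam_width=2)
print(greedy_seq, math.exp(greedy_score))
print(beam_seq, math.exp(beam_score))
```

Summing log-probabilities avoids numerical underflow on long sequences. Practical implementations usually also normalize the score by length so that short sequences are not favored.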


## IV. Attention Mechanisms

### A. Additive and Multiplicative Attention (Bahdanau / Luong)

In [None]:
# Example: computing attention weights (stub)

def bahdanau_attention(hidden_states, encoder_outputs):
    # hidden_states: current decoder hidden state, shape (batch, units)
    # encoder_outputs: all encoder outputs, shape (batch, src_len, units)
    # Expected result: a context vector and the attention weights over src_len
    # Implementation details omitted for brevity
    pass
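
For reference, additive (Bahdanau-style) attention for a single decoder step can be sketched in NumPy as below. The projection matrices `W1`, `W2` and the vector `v` are learned parameters in a real model; here they are random placeholders, and the function name and shapes are assumptions made for this illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def additive_attention(decoder_state, encoder_outputs, W1, W2, v):
    """Bahdanau-style attention for one decoder step.

    decoder_state:   (batch, dec_units)
    encoder_outputs: (batch, src_len, enc_units)
    W1: (enc_units, attn_units), W2: (dec_units, attn_units), v: (attn_units,)
    """
    # score[b, t] = v . tanh(W1 h_t + W2 s)
    projected = encoder_outputs @ W1 + (decoder_state @ W2)[:, None, :]
    scores = np.tanh(projected) @ v     # (batch, src_len)
    weights = softmax(scores, axis=-1)  # each row sums to 1
    # Context vector: attention-weighted sum of the encoder outputs.
    context = (weights[:, :, None] * encoder_outputs).sum(axis=1)
    return context, weights

rng = np.random.default_rng(0)
enc = rng.normal(size=(2, 6, 8))  # toy encoder outputs
dec = rng.normal(size=(2, 8))     # toy decoder state
W1 = rng.normal(size=(8, 16))
W2 = rng.normal(size=(8, 16))
v = rng.normal(size=16)
context, weights = additive_attention(dec, enc, W1, W2, v)
print(context.shape, weights.shape)
```

Luong-style (multiplicative) attention follows the same pattern but replaces the `tanh` scoring network with a dot product between the decoder state and each encoder output.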

### B. Transformer: Attention Is All You Need

In [None]:
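import tensorflow as tf
from tensorflow.keras.layers import MultiHeadAttention

attention_layer = MultiHeadAttention(num_heads=8, key_dim=64)
# Example usage with random dummy tensors (sizes chosen only for illustration):
query = tf.random.normal((2, 5, 64))  # shape: (batch_size, seq_len_q, depth)
key = tf.random.normal((2, 7, 64))    # shape: (batch_size, seq_len_k, depth)
value = tf.random.normal((2, 7, 64))  # shape: (batch_size, seq_len_k, depth); same length as key
output = attention_layer(query=query, key=key, value=value)  # shape: (batch_size, seq_len_q, depth)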
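
Each attention head is built around scaled dot-product attention. The NumPy sketch below shows that core operation for a single head, without the learned projections or masking that the Keras layer adds; the function name and toy shapes are illustrative.

```python
import numpy as np

def scaled_dot_product_attention(query, key, value):
    """Compute softmax(Q K^T / sqrt(d_k)) V for batched inputs."""
    d_k = query.shape[-1]
    scores = query @ np.swapaxes(key, -1, -2) / np.sqrt(d_k)  # (batch, len_q, len_k)
    scores = scores - scores.max(axis=-1, keepdims=True)      # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)            # softmax over keys
    return weights @ value, weights

rng = np.random.default_rng(0)
q = rng.normal(size=(2, 5, 64))
k = rng.normal(size=(2, 7, 64))
val = rng.normal(size=(2, 7, 64))
attn_output, attn_weights = scaled_dot_product_attention(q, k, val)
print(attn_output.shape, attn_weights.shape)
```

Dividing by the square root of the key dimension keeps the dot products from growing with the depth, which would otherwise push the softmax into regions with very small gradients.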

## V. Recent Innovations in NLP

- Large Transformer-based language models such as GPT and BERT.
- Self-supervised pretraining tasks: masked language modeling and next-sentence prediction (both used by BERT), and next-token prediction (used by GPT).
- Fine-tuning the pretrained model on a downstream task, which typically outperforms training from scratch.
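
As a rough illustration of masked language modeling, the sketch below hides a random subset of token ids and keeps the originals as labels for the hidden positions only. It is a simplification: `mask_tokens`, `mask_id`, and the ignore label `-100` are assumptions made for this example, and BERT's actual recipe additionally replaces some selected tokens with random tokens or leaves them unchanged.

```python
import numpy as np

IGNORE = -100  # arbitrary label meaning "do not compute a loss here"

def mask_tokens(token_ids, mask_id, mask_prob=0.15, rng=None):
    """Replace a random subset of tokens with mask_id and return (inputs, labels)."""
    rng = np.random.default_rng() if rng is None else rng
    token_ids = np.asarray(token_ids)
    is_masked = rng.random(token_ids.shape) < mask_prob
    inputs = np.where(is_masked, mask_id, token_ids)  # what the model sees
    labels = np.where(is_masked, token_ids, IGNORE)   # what it must recover
    return inputs, labels

toy_ids = np.arange(10, 30)  # hypothetical token ids
inputs, labels = mask_tokens(toy_ids, mask_id=0, mask_prob=0.5,
                             rng=np.random.default_rng(0))
print(inputs)
print(labels)
```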


## Summary

- Character RNNs can generate stylistic text.
- Masking padded positions and using pretrained word embeddings can improve sequence classifiers.
- Attention lets a decoder focus on the most relevant encoder outputs at each step, which improves translation, especially for long sentences.
- Transformers have revolutionized NLP.
- Pretrained models like GPT and BERT are now standard.


## Exercises

1. Generate text from the trained character RNN at several sampling temperatures and compare how conservative or random the output becomes.
2. Compare performance of LSTM vs GRU on sentiment analysis.
3. Implement Bahdanau attention in a custom Seq2Seq model.
4. Build a small Transformer for English-French translation.
5. Fine-tune a pretrained Hugging Face model for sentiment classification.
