<style>
    /* Main container style */
    .note-box {
        background-color: #1e1e2e;       /* Dark Blue-Grey Background */
        color: #cdd6f4;                  /* Soft White Text */
        border-left: 6px solid #89b4fa;  /* Blue Accent Border */
        border-radius: 8px;
        padding: 20px;
        margin: 20px 0;
        font-family: system-ui, -apple-system, sans-serif;
        line-height: 1.6;
        box-shadow: 0 4px 6px rgba(0, 0, 0, 0.2);
        box-sizing: border-box;
        max-width: 100%;
        overflow-wrap: break-word;
    }
    
    /* Header style */
    .note-box h2 {
        color: #89b4fa;                  /* Blue Header */
        margin-top: 0;
        margin-bottom: 15px;
        font-size: 1.6rem;
        font-weight: 600;
        border-bottom: 1px solid #45475a;
        padding-bottom: 10px;
    }

    /* Important keywords */
    .note-box strong {
        color: #f9e2af;                  /* Soft Gold/Yellow */
        font-weight: 600;
    }

    /* Inline code snippets */
    .note-box .code-inline {
        background-color: #313244;
        color: #f38ba8;                  /* Soft Red/Pink */
        padding: 2px 6px;
        border-radius: 4px;
        font-family: 'Menlo', 'Consolas', monospace;
        font-size: 0.9em;
        border: 1px solid #45475a;
        white-space: pre-wrap;
    }

    /* Lists */
    .note-box ul {
        padding-left: 20px;
        margin: 10px 0;
    }
    .note-box li {
        margin-bottom: 8px;
    }
</style>
<div class="note-box">
    <h2>Chapter 4.4: The Creative Machine (Text Generation)</h2>
    <p>
        <strong>Objective</strong>: In Chapter 4.3, we classified existing text (Many-to-One). Now, we will make the model <strong>generate</strong> new text (Many-to-Many). We will build a Character-Level RNN that learns to write like Shakespeare.
    </p>
    <p><strong>The Math of Generation (Autoregression)</strong>:</p>
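    <p>We model the probability of the next character <i>x<sub>t+1</sub></i> given the current character <i>x<sub>t</sub></i> and the previous hidden state <i>h<sub>t-1</sub></i> (memory of the past). The RNN first folds both into a new hidden state <i>h<sub>t</sub></i>, which a linear layer and Softmax turn into a distribution over the vocabulary:</p>

$$ h_t = \text{RNN}(x_t, h_{t-1}), \qquad P(x_{t+1} \mid x_t, h_{t-1}) = \text{Softmax}(W \cdot h_t + b) $$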
<p><strong>Key Concepts</strong>:</p>
    <ul>
        <li><strong>Character-Level Modeling</strong>: Instead of a vocabulary of 50,000 words, we use ~65 characters (upper- and lowercase letters, punctuation, and whitespace). The small output layer keeps the model cheap to train, which suits a teaching example.</li>
        <li><strong>Teacher Forcing</strong>: During training, the input at each step is the <em>correct</em> character from the text, regardless of what the model predicted at the previous step.</li>
        <li><strong>Temperature Sampling</strong>: Dividing the logits by a temperature before Softmax to control how random ("creative") the sampled text is.</li>
    </ul>
</div>
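To make the output formula concrete, here is a minimal sketch in plain Python (no PyTorch) that turns a hidden state into a next-character distribution. The three-character vocabulary and the values of `W`, `b`, and `h_t` are made-up numbers chosen for illustration, not learned parameters, and `next_char_distribution` is a hypothetical helper name.

```python
import math

def softmax(scores):
    """Convert raw scores (logits) into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def next_char_distribution(W, b, h_t):
    """Compute Softmax(W . h_t + b) for a single time step."""
    logits = [sum(w * h for w, h in zip(row, h_t)) + bias
              for row, bias in zip(W, b)]
    return softmax(logits)

# Hypothetical toy setup: vocabulary of 3 characters, hidden size 2
vocab = ["a", "b", "c"]
W = [[1.0, 0.0],
     [0.0, 1.0],
     [-1.0, -1.0]]
b = [0.0, 0.5, 0.0]
h_t = [0.2, 0.9]

probs = next_char_distribution(W, b, h_t)
for ch, p in zip(vocab, probs):
    print(f"P(next = {ch!r}) = {p:.3f}")
```

Each row of `W` scores one character, so the output layer always has as many rows as the vocabulary has entries.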

In [1]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
import numpy as np

# Check for acceleration
device = torch.device("mps" if torch.backends.mps.is_available() else "cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

Using device: mps


<div class="note-box">
    <h2>Step 1: The Dataset (Tiny Shakespeare)</h2>
    <p>To train a language model, we take a sequence of text as the input and the same sequence shifted one character ahead as the target.</p>
    <ul>
        <li><strong>Input</strong>: "Hell" (characters 0, 1, 2, 3 of "Hello")</li>
        <li><strong>Target</strong>: "ello" (characters 1, 2, 3, 4 of "Hello")</li>
    </ul>
    <p>This means for the input 'H', the correct label is 'e'. For input 'e', the label is 'l', and so on.</p>
</div>
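The offset-by-one idea can be sketched in plain Python before any tensors are involved. `make_pairs` is a hypothetical helper written for this illustration; it slides a window over a string and returns each window together with its shifted copy.

```python
def make_pairs(text, seq_length):
    """Slide a window over `text` and return (input, target) string pairs.

    Each target is its input shifted one character ahead.
    """
    pairs = []
    for start in range(len(text) - seq_length):
        chunk = text[start : start + seq_length + 1]
        pairs.append((chunk[:-1], chunk[1:]))
    return pairs

pairs = make_pairs("Hello", seq_length=4)
print(pairs)  # [('Hell', 'ello')]

# Per-step view: at every position the label is the following character
for inp, tgt in pairs:
    for x, y in zip(inp, tgt):
        print(f"input {x!r} -> target {y!r}")
```

A text of length `n` yields `n - seq_length` windows, because every window needs one extra character to serve as the final target.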

In [2]:
# A tiny snippet of Shakespeare for training
# (In a real scenario, this would be a much larger text file)
text_data = """
From fairest creatures we desire increase,
That thereby beauty's rose might never die,
But as the riper should by time decease,
His tender heir might bear his memory:
But thou, contracted to thine own bright eyes,
Feed'st thy light's flame with self-substantial fuel,
Making a famine where abundance lies,
Thyself thy foe, to thy sweet self too cruel.
"""

# 1. Create Character Vocabulary
chars = sorted(set(text_data))
vocab_size = len(chars)

# 2. Mappings (Character to Integer and vice versa)
char_to_ix = {ch: i for i, ch in enumerate(chars)}
ix_to_char = {i: ch for i, ch in enumerate(chars)}

print(f"Total Characters in text: {len(text_data)}")
print(f"Unique Vocabulary Size: {vocab_size}")
print(f"Sample Mapping: {list(char_to_ix.items())[:5]}")

# 3. Custom Dataset Class
class ShakespeareDataset(Dataset):
    def __init__(self, text, char_to_ix, seq_length=20):
        self.text = text
        self.char_to_ix = char_to_ix
        self.seq_length = seq_length

    def __len__(self):
        # We can extract (Length - Window_Size) sequences
        return len(self.text) - self.seq_length

    def __getitem__(self, idx):
        # Grab a chunk of text
        chunk = self.text[idx : idx + self.seq_length + 1]
        
        # Convert to integers
        encoded = [self.char_to_ix[c] for c in chunk]
        
        # Input: 0 to End-1 (e.g., "Hell")
        # Target: 1 to End   (e.g., "ello")
        input_seq = torch.tensor(encoded[:-1], dtype=torch.long)
        target_seq = torch.tensor(encoded[1:], dtype=torch.long)
        
        return input_seq, target_seq

# Hyperparameters
SEQ_LEN = 30
BATCH_SIZE = 16

dataset = ShakespeareDataset(text_data, char_to_ix, SEQ_LEN)
dataloader = DataLoader(dataset, batch_size=BATCH_SIZE, shuffle=True, drop_last=True)

# Sanity Check
x_batch, y_batch = next(iter(dataloader))
print(f"Input Shape: {x_batch.shape}") # [Batch, Seq_Len]
print(f"Input Example: {[ix_to_char[i.item()] for i in x_batch[0][:5]]}")
print(f"Target Example: {[ix_to_char[i.item()] for i in y_batch[0][:5]]}")

Total Characters in text: 353
Unique Vocabulary Size: 34
Sample Mapping: [('\n', 0), (' ', 1), ("'", 2), (',', 3), ('-', 4)]
Input Shape: torch.Size([16, 30])
Input Example: ['t', 'e', 'd', ' ', 't']
Target Example: ['e', 'd', ' ', 't', 'o']


<div class="note-box">
    <h2>Step 2: The GRU Model</h2>
    <p>We use a <strong>GRU (Gated Recurrent Unit)</strong>. It is similar to an LSTM but uses two gates (Update and Reset) instead of three and keeps no separate cell state, so it has fewer parameters and is usually faster to train.</p>
    <p><strong>Architecture Details</strong>:</p>
    <ol>
        <li><strong>Embedding</strong>: Maps integer IDs to vectors ($E \in \mathbb{R}^{V \times D}$).</li>
        <li><strong>GRU Layer</strong>: Processes the sequence. Returns <em>Output</em> (features for every step) and <em>Hidden</em> (final memory state).</li>
        <li><strong>Linear Head</strong>: Projects the GRU features back to vocabulary size to predict the next character logits.</li>
    </ol>
    <p>Unlike text classification where we only used the <em>last</em> hidden state, here we use the output from <strong>every time step</strong> because we make a prediction for every character.</p>
</div>
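To show what the Update and Reset gates do, here is a rough sketch of a single GRU step on scalars in plain Python. It follows the gate equations documented for `torch.nn.GRU`, with the simplifying assumption that each gate has one merged bias; the weights in `params` are made-up values, and `gru_step` is a hypothetical helper, not part of the model above.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def gru_step(x, h_prev, p):
    """One GRU update for a scalar input x and a scalar state h_prev."""
    r = sigmoid(p["w_ir"] * x + p["w_hr"] * h_prev + p["b_r"])  # reset gate
    z = sigmoid(p["w_iz"] * x + p["w_hz"] * h_prev + p["b_z"])  # update gate
    # Candidate state: the reset gate decides how much old memory feeds in
    n = math.tanh(p["w_in"] * x + p["b_in"] + r * (p["w_hn"] * h_prev + p["b_hn"]))
    # The update gate blends the old state (z near 1) with the candidate (z near 0)
    return (1.0 - z) * n + z * h_prev

# Hypothetical weights, chosen only for illustration
params = {
    "w_ir": 0.5, "w_hr": -0.3, "b_r": 0.0,
    "w_iz": 0.8, "w_hz": 0.1, "b_z": 0.0,
    "w_in": 1.2, "w_hn": 0.7, "b_in": 0.0, "b_hn": 0.0,
}

h = 0.0  # start from a zero state, as PyTorch does when none is given
for step, x in enumerate([1.0, -0.5, 0.25]):
    h = gru_step(x, h, params)
    print(f"step {step}: h = {h:.4f}")
```

In the real layer `x` and `h` are vectors and the weights are matrices, but the blending logic is the same.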

In [3]:
class TextGenerator(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim):
        super().__init__()
        
        # 1. Embedding
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        
        # 2. GRU Layer
        # batch_first=True ensures input shape is (Batch, Seq, Feature)
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        
        # 3. Output Layer
        # Transforms hidden features into logits (unnormalized scores) for the next char
        self.fc = nn.Linear(hidden_dim, vocab_size)
        
    def forward(self, x, hidden=None):
        # x shape: [batch, seq_len]
        
        embeds = self.embedding(x) # [batch, seq_len, embed_dim]
        
        # Run RNN
        # If 'hidden' is None, PyTorch automatically initializes it to 0s
        output, hidden = self.gru(embeds, hidden)
        
        # output shape: [batch, seq_len, hidden_dim]
        
        # Predict the next character for every step in the sequence
        prediction = self.fc(output) # [batch, seq_len, vocab_size]
        
        return prediction, hidden

# Model Config
EMBED_DIM = 32
HIDDEN_DIM = 64

model = TextGenerator(vocab_size, EMBED_DIM, HIDDEN_DIM).to(device)
print(model)

TextGenerator(
  (embedding): Embedding(34, 32)
  (gru): GRU(32, 64, batch_first=True)
  (fc): Linear(in_features=64, out_features=34, bias=True)
)
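
As a rough check on the model's size, the sketch below counts the parameters of each layer by hand. It assumes PyTorch's default layout for a single-layer, unidirectional `nn.GRU` (three weight blocks, each with input weights, hidden weights, and two bias vectors); `count_parameters` is a hypothetical helper.

```python
def count_parameters(vocab_size, embed_dim, hidden_dim):
    """Parameter counts per layer for the Embedding -> GRU -> Linear stack."""
    embedding = vocab_size * embed_dim
    # GRU: 3 weight blocks (reset gate, update gate, candidate state),
    # each with input weights, hidden weights, and two bias vectors
    gru = 3 * (hidden_dim * embed_dim + hidden_dim * hidden_dim + 2 * hidden_dim)
    linear = hidden_dim * vocab_size + vocab_size
    return {"embedding": embedding, "gru": gru, "fc": linear,
            "total": embedding + gru + linear}

counts = count_parameters(vocab_size=34, embed_dim=32, hidden_dim=64)
for name, n in counts.items():
    print(f"{name:>9}: {n:,}")
```

Replacing the factor 3 with 4 gives the count for an LSTM layer of the same size, which is where the GRU's savings come from.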


<div class="note-box">
    <h2>Step 3: Training</h2>
    <p>We use <strong>CrossEntropyLoss</strong>. There is a small shape mismatch we need to handle:</p>
    <ul>
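        <li>Model Output: <span class="code-inline">[Batch, Seq_Len, Vocab_Size]</span></li>
        <li>Target: <span class="code-inline">[Batch, Seq_Len]</span></li>
    </ul>
    <p>PyTorch's CrossEntropyLoss expects the class dimension second: logits of shape <span class="code-inline">(N, Classes)</span> with targets of shape <span class="code-inline">(N)</span>. Our class dimension is last, so we <strong>flatten</strong> the Batch and Sequence dimensions together before calculating the loss. This treats every time step as an independent classification example.</p>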
</div>
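The flattening step can be illustrated without PyTorch. The sketch below uses nested Python lists in place of tensors and a hand-written `cross_entropy` helper (hypothetical, written for this example) that averages the negative log-probability of the correct class; the logits and targets are made-up numbers.

```python
import math

def cross_entropy(logit_rows, targets):
    """Mean negative log-likelihood over N rows of logits."""
    total = 0.0
    for logits, target in zip(logit_rows, targets):
        peak = max(logits)
        log_sum_exp = peak + math.log(sum(math.exp(v - peak) for v in logits))
        total += log_sum_exp - logits[target]  # -log softmax(logits)[target]
    return total / len(targets)

# Hypothetical toy batch: Batch=2, Seq_Len=3, Vocab_Size=4
logits = [
    [[2.0, 0.1, 0.1, 0.1], [0.1, 2.0, 0.1, 0.1], [0.1, 0.1, 2.0, 0.1]],
    [[0.1, 0.1, 0.1, 2.0], [2.0, 0.1, 0.1, 0.1], [0.1, 2.0, 0.1, 0.1]],
]
targets = [[0, 1, 2], [3, 0, 1]]

# Flatten [Batch, Seq_Len, Vocab] -> [Batch*Seq_Len, Vocab]
flat_logits = [step for sequence in logits for step in sequence]
# Flatten [Batch, Seq_Len] -> [Batch*Seq_Len]
flat_targets = [t for sequence in targets for t in sequence]

print(len(flat_logits), len(flat_logits[0]))  # 6 4
print(len(flat_targets))                      # 6
print(f"loss = {cross_entropy(flat_logits, flat_targets):.4f}")
```

Both structures are flattened in the same batch-major order, so row `i` of the logits still lines up with target `i`.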

In [4]:
optimizer = optim.Adam(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

epochs = 100

print("--- Starting Training ---")
for epoch in range(epochs):
    epoch_loss = 0
    model.train()
    
    for x, y in dataloader:
        x, y = x.to(device), y.to(device)
        
        optimizer.zero_grad()
        
        # Forward pass
        # We ignore hidden state here (stateless training between batches)
        predictions, _ = model(x) 
        
        # Reshape for Loss
        # Flattening Batch and Sequence length into one long dimension
        # Preds: [Batch*Seq, Vocab], Targets: [Batch*Seq]
        loss = criterion(predictions.reshape(-1, vocab_size), y.reshape(-1))
        
        loss.backward()
        optimizer.step()
        
        epoch_loss += loss.item()
        
    if (epoch+1) % 10 == 0:
        print(f"Epoch {epoch+1}/{epochs} | Loss: {epoch_loss/len(dataloader):.4f}")

--- Starting Training ---
Epoch 10/100 | Loss: 0.1297
Epoch 20/100 | Loss: 0.1137
Epoch 30/100 | Loss: 0.1066
Epoch 40/100 | Loss: 0.1061
Epoch 50/100 | Loss: 0.1032
Epoch 60/100 | Loss: 0.1027
Epoch 70/100 | Loss: 0.1013
Epoch 80/100 | Loss: 0.1011
Epoch 90/100 | Loss: 0.1017
Epoch 100/100 | Loss: 0.1005
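
One way to read these numbers, sketched below: CrossEntropyLoss is measured in nats (natural log), so a model that guesses uniformly would score ln(vocab_size), and exp(-loss) gives the geometric-mean probability the model assigns to the correct character. The two constants are copied from the outputs shown above.

```python
import math

vocab_size = 34      # from the vocabulary printout above
final_loss = 0.1005  # epoch 100 value from the training log above

# A model that guesses uniformly scores ln(vocab_size) nats per character
uniform_loss = math.log(vocab_size)

# exp(-loss) is the geometric-mean probability given to the correct character
avg_correct_prob = math.exp(-final_loss)

print(f"uniform baseline loss: {uniform_loss:.4f}")
print(f"mean probability of the correct character: {avg_correct_prob:.3f}")
```

On a 353-character training text, a value this high most likely reflects memorization of the sonnet rather than generalization.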


<div class="note-box">
    <h2>Step 4: Generation with Temperature</h2>
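    <p>During generation, we feed the model's own output back as input for the next step. Always picking the most likely character tends to get the model stuck in loops (e.g., "the the the"), so we sample from the predicted distribution instead and use <strong>Temperature Sampling</strong> to control how random that sampling is.</p>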
    <p>The math involves dividing the logits ($z$) by a temperature $T$ before the Softmax:</p>

$$ P_i = \frac{e^{z_i / T}}{\sum_j e^{z_j / T}} $$

<ul>
        <li><strong>High T (&gt; 1.0)</strong>: Flattens the distribution. Low probability characters get boosted. <em>Result: Random, creative, prone to typos.</em></li>
        <li><strong>Low T (&lt; 1.0)</strong>: Sharpens the distribution. High probability characters get boosted. <em>Result: Conservative, repetitive, safe.</em></li>
    </ul>
</div>
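The effect of the temperature can be seen on a toy example. This sketch uses plain Python, with `random.Random.choices` standing in for `torch.multinomial`; the three logits are hypothetical scores, and both helper functions are written only for this illustration.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Return softmax(logits / temperature) as a list of probabilities."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_index(probs, rng):
    """Weighted random draw of one index, proportional to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

logits = [2.0, 1.0, 0.1]  # hypothetical scores for three characters
rng = random.Random(0)    # fixed seed so the run is repeatable

for T in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, T)
    draws = [sample_index(probs, rng) for _ in range(1000)]
    share_top = draws.count(0) / len(draws)
    print(f"T={T}: probs=" + ", ".join(f"{p:.3f}" for p in probs)
          + f" | top choice drawn {share_top:.0%} of the time")
```

As T shrinks, the top character's probability approaches 1 and sampling behaves more and more like argmax; as T grows, the distribution approaches uniform.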

In [5]:
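def generate_text(start_str, length=100, temperature=1.0):
    # Guard against inputs that would break the loop below
    if not start_str:
        raise ValueError("start_str must contain at least one character")
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    model.eval()
    
    # 1. Initialize with start string
    # Characters missing from the vocabulary fall back to index 0
    input_idxs = [char_to_ix.get(c, 0) for c in start_str]
    input_tensor = torch.tensor(input_idxs, dtype=torch.long).unsqueeze(0).to(device) # [1, seq_len]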
    
    # We maintain the hidden state throughout generation
    hidden = None
    generated_text = start_str
    
    with torch.no_grad():
        # 2. Process the seed text to build up 'context' (memory)
        # We pass the whole sequence, but we only care about the FINAL hidden state
        output, hidden = model(input_tensor, hidden)
        
        # Take the logits for the very last character in the sequence
        last_logits = output[:, -1, :]
        
        for _ in range(length):
            # 3. Apply Temperature
            # Divide logits by temp.
            # temp < 1 widens the gaps between logits (sharper peaks);
            # temp > 1 shrinks them (flatter distribution).
            logits = last_logits / temperature
            probs = torch.softmax(logits, dim=1)
            
            # 4. Sample from the distribution (Weighted Random)
            # We don't use argmax, or it would be deterministic
            next_char_idx = torch.multinomial(probs, 1).item()
            next_char = ix_to_char[next_char_idx]
            
            generated_text += next_char
            
            # 5. Prepare next input
            # The predicted char becomes the input for the next step
            input_tensor = torch.tensor([[next_char_idx]]).to(device)
            
            # Pass new input + OLD hidden state
            output, hidden = model(input_tensor, hidden)
            last_logits = output[:, -1, :]
            
    print(f"--- Temp {temperature} ---\n{generated_text}\n")

# Test
print("Generating from seed 'But ':\n")
generate_text("But ", length=200, temperature=0.5) # Safe
generate_text("But ", length=200, temperature=1.2) # Chaotic

Generating from seed 'But ':

--- Temp 0.5 ---
But thou, contracted to thine own bright eyes,
Feed'st thy light's flame with self-substantial fuel,
Making a famine where abundance lies,
Thyself thy foe, to thy sweet self too cruel.
Making a famine whe

--- Temp 1.2 ---
But as the riper should by time decease,
His tender heir might bear his memory:
But thou, contracted to thine own bright eyes,
Feed'st thy light's flame with self-substantial fuel,
Making a famine where a

