# Baseline LM Sampling Analysis (CPU)

This notebook implements a **simple, end-to-end baseline pipeline** in one place:
- load TinyStories,
- train/use a BPE tokenizer,
- build 256-token chunks,
- train a small decoder-only Transformer,
- run sanity checks,
- save artifacts.

It keeps the core tokenizer design from `tokenizer.py`, but allows small in-notebook adjustments without editing the original file.

## 1) Set Up Notebook Environment and MCP Jupyter Workflow

This notebook is designed to be executed cell-by-cell (including via MCP notebook tools).

`RUN_MODE` controls scale:
- `quick` → development on ~100k stories
- `full` → baseline submission run on full train split

In [1]:
# Optional: uncomment if running in a fresh environment
# %pip install -q datasets torch tqdm tokenizers

import os
import json
import math
import time
import random
import platform
import subprocess
from pathlib import Path
from dataclasses import dataclass
from collections import defaultdict

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
from datasets import load_dataset
from tqdm.auto import tqdm

from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.trainers import BpeTrainer
from tokenizers.pre_tokenizers import ByteLevel
from tokenizers.decoders import ByteLevel as ByteLevelDecoder
from tokenizers.processors import TemplateProcessing

RUN_MODE = "full"  # "quick" or "full"
WORKSPACE = Path.cwd()
ARTIFACTS_DIR = WORKSPACE / "artifacts"
ARTIFACTS_DIR.mkdir(exist_ok=True)

SEED = 42
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)

if torch.backends.mps.is_available() and torch.backends.mps.is_built():
    DEVICE = torch.device("mps")
else:
    DEVICE = torch.device("cpu")
    # os.cpu_count() can return None, so fall back to 2 before halving
    torch.set_num_threads(max(1, (os.cpu_count() or 2) // 2))

print(f"Device: {DEVICE}")
print(f"MPS available: {torch.backends.mps.is_available()}")
print(f"Run mode: {RUN_MODE}")
print(f"Workspace: {WORKSPACE}")

Device: mps
MPS available: True
Run mode: full
Workspace: /Users/m3/Documents/uni/s26/genai/lm_sampling_analysis


## 2) Load the Tokenizer Backend and Inspect Its Internal Structure

We use Hugging Face `tokenizers` BPE and inspect the internal components:
- model (BPE),
- pre-tokenizer (ByteLevel),
- decoder,
- learned vocabulary/special token mappings after training.

In [2]:
base_tok = Tokenizer(BPE(unk_token="<UNK>"))
base_tok.pre_tokenizer = ByteLevel(add_prefix_space=False)
base_tok.decoder = ByteLevelDecoder()

print("Tokenizer backend:", type(base_tok).__name__)
print("Model:", type(base_tok.model).__name__)
print("Pre-tokenizer:", type(base_tok.pre_tokenizer).__name__)
print("Decoder:", type(base_tok.decoder).__name__)

Tokenizer backend: Tokenizer
Model: BPE
Pre-tokenizer: ByteLevel
Decoder: ByteLevel


## 3) Create an In-Notebook Tokenizer Variant (No Source File Changes)

This variant preserves the original BPE principle (byte-level base + merge sequence), while adding notebook-only helpers:
- save/load,
- special-token registration,
- convenience encode/decode wrappers.
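
To make the "byte-level base + merge sequence" principle concrete, here is a minimal pure-Python sketch of BPE. It is an illustration only, not the Hugging Face implementation: the function names (`learn_bpe_merges`, `apply_merge`, `encode_with_merges`) are hypothetical, and it skips the ByteLevel pre-tokenization step, so merges here may cross spaces.

```python
from collections import Counter


def apply_merge(seq, pair):
    """Replace every adjacent occurrence of `pair` in `seq` with the joined symbol."""
    merged, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            merged.append(seq[i] + seq[i + 1])
            i += 2
        else:
            merged.append(seq[i])
            i += 1
    return merged


def learn_bpe_merges(texts, num_merges):
    """Learn an ordered merge list, starting from single UTF-8 bytes."""
    sequences = [[bytes([b]) for b in text.encode("utf-8")] for text in texts]
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter()
        for seq in sequences:
            pair_counts.update(zip(seq, seq[1:]))
        if not pair_counts:
            break
        best = max(pair_counts, key=pair_counts.get)  # most frequent adjacent pair
        merges.append(best)
        sequences = [apply_merge(seq, best) for seq in sequences]
    return merges


def encode_with_merges(text, merges):
    """Encode by replaying the learned merges in the order they were learned."""
    seq = [bytes([b]) for b in text.encode("utf-8")]
    for pair in merges:
        seq = apply_merge(seq, pair)
    return seq


toy_merges = learn_bpe_merges(["low lower lowest", "low low"], num_merges=3)
toy_tokens = encode_with_merges("lowest", toy_merges)
print(toy_merges)
print(toy_tokens)
```

Joining the resulting tokens always reproduces the original bytes, which is the property the notebook-only helpers rely on when decoding.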

In [3]:
class HFNotebookBPETokenizer:
    def __init__(self):
        self.tokenizer = Tokenizer(BPE(unk_token="<UNK>"))
        self.tokenizer.pre_tokenizer = ByteLevel(add_prefix_space=False)
        self.tokenizer.decoder = ByteLevelDecoder()
        self.special_tokens = {}

    def train(self, corpus_sents, vocab_size=10_000, special_tokens=("<PAD>", "<EOS>", "<UNK>")):
        trainer = BpeTrainer(vocab_size=vocab_size, special_tokens=list(special_tokens), show_progress=True)
        self.tokenizer.train_from_iterator(corpus_sents, trainer=trainer)

        vocab = self.tokenizer.get_vocab()
        self.special_tokens = {token: int(vocab[token]) for token in special_tokens if token in vocab}

        if "<EOS>" in self.special_tokens:
            self.tokenizer.post_processor = TemplateProcessing(
                single="$A <EOS>",
                pair="$A <EOS> $B:1 <EOS>:1",
                special_tokens=[("<EOS>", self.special_tokens["<EOS>"])],
            )

    @property
    def vocabulary(self):
        return set(self.tokenizer.get_vocab().values())

    def encode(self, text: str, add_eos: bool = False):
        # After training, the post-processor appends <EOS> to every encoding;
        # strip it again unless the caller asked for it.
        ids = self.tokenizer.encode(text).ids
        eos_id = self.special_tokens.get("<EOS>")
        if not add_eos and eos_id is not None and ids and ids[-1] == eos_id:
            return ids[:-1]
        return ids

    def encode_batch_with_eos(self, texts):
        return [enc.ids for enc in self.tokenizer.encode_batch(texts)]

    def decode_ids(self, ids):
        return self.tokenizer.decode(ids, skip_special_tokens=False)

    def to_state(self):
        return {
            "tokenizer_json": self.tokenizer.to_str(),
            "special_tokens": self.special_tokens,
        }

    @classmethod
    def from_state(cls, state):
        tok = cls()
        tok.tokenizer = Tokenizer.from_str(state["tokenizer_json"])
        tok.special_tokens = {k: int(v) for k, v in state.get("special_tokens", {}).items()}
        return tok

    def save(self, path: Path):
        path = Path(path)
        path.write_text(self.tokenizer.to_str(), encoding="utf-8")

## 4) Apply Minimal Tokenization Adjustments While Preserving Core Principles

We train BPE with the same core algorithm, then add minimal localized behavior:
- register `<PAD>`, `<EOS>`, and `<UNK>` as special tokens through the trainer, so they receive IDs 0, 1, and 2,
- append `<EOS>` at sequence boundaries in preprocessing.
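
As a rough illustration of the second point, the sketch below closes each story with an end-of-sequence ID while concatenating stories into one token stream. In the notebook the `<EOS>` token is appended by the tokenizer's post-processor; here it is appended by hand, and `flatten_with_eos`, `EOS_ID_TOY`, and the toy ID lists are hypothetical names and values chosen for the example.

```python
# Hypothetical toy value; the real ID comes from the trained tokenizer.
EOS_ID_TOY = 1


def flatten_with_eos(encoded_stories, eos_id):
    """Concatenate per-story ID lists, closing each story with eos_id."""
    stream = []
    for ids in encoded_stories:
        stream.extend(ids)
        stream.append(eos_id)
    return stream


toy_stories = [[317, 253, 14], [270, 630], [43, 359, 16]]
toy_stream = flatten_with_eos(toy_stories, EOS_ID_TOY)
print(toy_stream)  # [317, 253, 14, 1, 270, 630, 1, 43, 359, 16, 1]
```

Because every story ends with the same marker, the model can learn where one story stops even after the stream is cut into fixed-length chunks.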

In [4]:
MAX_STORIES_QUICK = 100_000
TOKENIZER_VOCAB_SIZE = 10_000
CONTEXT_LEN = 256

print("Loading TinyStories train split...")
train_ds = load_dataset("roneneldan/TinyStories", split="train")

if RUN_MODE == "quick":
    train_ds = train_ds.select(range(min(MAX_STORIES_QUICK, len(train_ds))))

train_texts = [row["text"] for row in train_ds]
print(f"Stories used: {len(train_texts):,}")

tok = HFNotebookBPETokenizer()
start = time.time()
tok.train(train_texts, vocab_size=TOKENIZER_VOCAB_SIZE, special_tokens=("<PAD>", "<EOS>", "<UNK>"))
end = time.time()

print(f"Tokenizer trained in {(end-start)/60:.2f} min")
print("Vocabulary size:", len(tok.vocabulary))
print("Special tokens:", tok.special_tokens)

Loading TinyStories train split...
Stories used: 2,119,719



Tokenizer trained in 2.26 min
Vocabulary size: 10000
Special tokens: {'<PAD>': 0, '<EOS>': 1, '<UNK>': 2}


## 5) Implement Invariant Checks for Vocabulary, Special Tokens, and Normalization

In [5]:
assert len(tok.vocabulary) <= TOKENIZER_VOCAB_SIZE, "Vocab unexpectedly exceeds target"
assert "<PAD>" in tok.special_tokens and "<EOS>" in tok.special_tokens and "<UNK>" in tok.special_tokens

sample = "Once upon a time, a little cat played with a red ball."
enc = tok.encode(sample, add_eos=True)
dec = tok.decode_ids(enc)

print("Encoded length:", len(enc))
print("Decoded preview:", dec[:120])
print("Invariant checks passed ✅")

Encoded length: 15
Decoded preview: Once upon a time, a little cat played with a red ball.<EOS>
Invariant checks passed ✅


## 6) Integrate the Tokenizer Variant into the Implementation Pipeline

The pipeline below uses the notebook tokenizer variant for both training data preparation and later generation/inference usage.

In [6]:
def encode_story_with_eos(text: str, tokenizer):
    return tokenizer.encode(text, add_eos=True)

def flatten_encoded_stories(texts, tokenizer, batch_size=1024):
    all_ids = []
    for start_idx in tqdm(range(0, len(texts), batch_size), desc="Encoding stories"):
        batch = texts[start_idx:start_idx + batch_size]
        batch_ids = tokenizer.encode_batch_with_eos(batch)
        for ids in batch_ids:
            all_ids.extend(ids)
    return all_ids

all_token_ids = flatten_encoded_stories(train_texts, tok)
print("Total tokens:", f"{len(all_token_ids):,}")
print("First 20 token ids:", all_token_ids[:20])

Total tokens: 464,966,457
First 20 token ids: [317, 253, 14, 156, 294, 343, 397, 261, 492, 156, 3629, 213, 206, 655, 16, 210, 600, 201, 179, 2859]


## 7) Encode Dataset Samples and Build Batching Utilities

In [7]:
def build_chunks(token_ids, context_len=256):
    usable = (len(token_ids) // context_len) * context_len
    token_ids = token_ids[:usable]
    chunks = np.array(token_ids, dtype=np.int64).reshape(-1, context_len)
    return chunks

chunks = build_chunks(all_token_ids, context_len=CONTEXT_LEN)
print("Chunks shape:", chunks.shape)

split_idx = int(0.95 * len(chunks))
train_chunks = chunks[:split_idx]
val_chunks = chunks[split_idx:]
print("Train chunks:", train_chunks.shape, "Val chunks:", val_chunks.shape)

PAD_ID = tok.special_tokens["<PAD>"]
EOS_ID = tok.special_tokens["<EOS>"]

class LMDataset(Dataset):
    def __init__(self, array_2d):
        self.data = torch.tensor(array_2d, dtype=torch.long)

    def __len__(self):
        return self.data.shape[0]

    def __getitem__(self, idx):
        x = self.data[idx, :-1]
        y = self.data[idx, 1:]
        attention_mask = torch.ones_like(x)
        return x, y, attention_mask

BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 4  # effective batch size = 32

train_loader = DataLoader(LMDataset(train_chunks), batch_size=BATCH_SIZE, shuffle=True, num_workers=0)
val_loader = DataLoader(LMDataset(val_chunks), batch_size=BATCH_SIZE, shuffle=False, num_workers=0)

x0, y0, m0 = next(iter(train_loader))
print("Batch x/y/mask shapes:", x0.shape, y0.shape, m0.shape)

Chunks shape: (1816275, 256)
Train chunks: (1725461, 256) Val chunks: (90814, 256)
Batch x/y/mask shapes: torch.Size([8, 255]) torch.Size([8, 255]) torch.Size([8, 255])
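
The chunking and the one-token input/target shift used in this section can be checked on a toy stream. This is a plain-Python sketch of the same idea, not the NumPy code from the cell; `chunk_and_shift` is a hypothetical helper. The second half repeats the arithmetic on the token count printed earlier in the notebook.

```python
def chunk_and_shift(stream, context_len):
    """Cut a token stream into full chunks, then build (input, target) pairs."""
    usable = (len(stream) // context_len) * context_len  # drop the ragged tail
    chunks = [stream[i:i + context_len] for i in range(0, usable, context_len)]
    # Inputs are all tokens but the last; targets are the same chunk shifted by one.
    return [(chunk[:-1], chunk[1:]) for chunk in chunks]


toy_pairs = chunk_and_shift(list(range(10)), context_len=4)
print(toy_pairs)  # [([0, 1, 2], [1, 2, 3]), ([4, 5, 6], [5, 6, 7])]

# Same arithmetic on the counts reported by the notebook outputs
total_tokens = 464_966_457
n_chunks = total_tokens // 256
n_train = int(0.95 * n_chunks)
n_val = n_chunks - n_train
print(n_chunks, n_train, n_val)  # 1816275 1725461 90814
```

This is why each batch has 255 positions rather than 256: one token per chunk is consumed by the shift.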


In [8]:
class CausalSelfAttention(nn.Module):
    def __init__(self, d_model, n_heads, dropout):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.head_dim = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.proj = nn.Linear(d_model, d_model)
        self.drop = nn.Dropout(dropout)

    def forward(self, x):
        B, T, C = x.shape
        qkv = self.qkv(x)
        q, k, v = qkv.chunk(3, dim=-1)

        q = q.view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        k = k.view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        v = v.view(B, T, self.n_heads, self.head_dim).transpose(1, 2)

        scores = (q @ k.transpose(-2, -1)) / math.sqrt(self.head_dim)
        mask = torch.tril(torch.ones(T, T, device=x.device)).bool()
        scores = scores.masked_fill(~mask, torch.finfo(scores.dtype).min)
        attn = F.softmax(scores, dim=-1)
        attn = self.drop(attn)

        out = attn @ v
        out = out.transpose(1, 2).contiguous().view(B, T, C)
        return self.proj(out)

class TransformerBlock(nn.Module):
    def __init__(self, d_model, n_heads, ff_dim, dropout):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = CausalSelfAttention(d_model, n_heads, dropout)
        self.ln2 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(
            nn.Linear(d_model, ff_dim),
            nn.GELU(),
            nn.Dropout(dropout),
            nn.Linear(ff_dim, d_model),
            nn.Dropout(dropout),
        )

    def forward(self, x):
        x = x + self.attn(self.ln1(x))
        x = x + self.ff(self.ln2(x))
        return x

class TinyGPT(nn.Module):
    def __init__(self, vocab_size, context_len=256, d_model=256, n_heads=4, n_layers=4, ff_dim=1024, dropout=0.1):
        super().__init__()
        self.context_len = context_len
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(context_len, d_model)
        self.drop = nn.Dropout(dropout)
        self.blocks = nn.ModuleList([
            TransformerBlock(d_model, n_heads, ff_dim, dropout) for _ in range(n_layers)
        ])
        self.ln_f = nn.LayerNorm(d_model)
        self.lm_head = nn.Linear(d_model, vocab_size, bias=False)

    def forward(self, input_ids):
        B, T = input_ids.shape
        pos = torch.arange(T, device=input_ids.device).unsqueeze(0)
        x = self.tok_emb(input_ids) + self.pos_emb(pos)
        x = self.drop(x)
        for block in self.blocks:
            x = block(x)
        x = self.ln_f(x)
        logits = self.lm_head(x)
        return logits

VOCAB_SIZE_MODEL = max(tok.vocabulary) + 1
model = TinyGPT(
    vocab_size=VOCAB_SIZE_MODEL,
    context_len=CONTEXT_LEN - 1,
    d_model=256,
    n_heads=4,
    n_layers=4,
    ff_dim=1024,
    dropout=0.1,
).to(DEVICE)

n_params = sum(p.numel() for p in model.parameters())
print(f"Model params: {n_params/1e6:.2f}M")

Model params: 8.34M


## 8) Run End-to-End Smoke Tests in Notebook Cells

This section validates tokenization → batching → model forward/backward on the selected `DEVICE` (MPS when available, otherwise CPU).
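
One quick check for the forward pass: a freshly initialized model should assign roughly uniform probability over the vocabulary, so its cross-entropy should sit near ln(V). The sketch below computes that reference value for V = 10,000 in plain Python; `uniform_cross_entropy` is a hypothetical helper, and the comparison with the first training losses printed in this notebook (about 9.34 and 9.38) is only approximate because random initial logits are not exactly uniform.

```python
import math


def uniform_cross_entropy(vocab_size):
    """Cross-entropy of a softmax over identical logits, for any target token."""
    logits = [0.0] * vocab_size
    log_norm = math.log(sum(math.exp(z) for z in logits))  # log of the softmax denominator
    return -(logits[0] - log_norm)  # -log(1 / vocab_size)


expected_initial_loss = uniform_cross_entropy(10_000)
print(f"{expected_initial_loss:.4f}")  # 9.2103
```

An initial loss far above this value would suggest a problem with initialization or with the logits/target alignment.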

In [9]:
@torch.no_grad()
def evaluate_loss(model, loader, max_batches=50):
    model.eval()
    losses = []
    for i, (x, y, _) in enumerate(loader):
        if i >= max_batches:
            break
        x, y = x.to(DEVICE), y.to(DEVICE)
        logits = model(x)
        loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), y.reshape(-1))
        losses.append(loss.item())
    model.train()
    return float(np.mean(losses)) if losses else float("nan")

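def train_steps(model, loader, optimizer, max_steps=200, grad_accum_steps=1):
    # max_steps counts micro-batches, not optimizer updates: the optimizer steps once
    # every grad_accum_steps batches, and the loop ends after one pass over `loader`.
    model.train()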
    losses = []
    step = 0
    optimizer.zero_grad(set_to_none=True)

    for x, y, _ in loader:
        x, y = x.to(DEVICE), y.to(DEVICE)
        logits = model(x)
        loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), y.reshape(-1))
        (loss / grad_accum_steps).backward()

        if (step + 1) % grad_accum_steps == 0:
            torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
            optimizer.step()
            optimizer.zero_grad(set_to_none=True)

        losses.append(loss.item())
        step += 1
        if step >= max_steps:
            break

    return losses

# Forward-pass smoke test
x_smoke, y_smoke, _ = next(iter(train_loader))
x_smoke = x_smoke.to(DEVICE)
with torch.no_grad():
    logits_smoke = model(x_smoke)
print("Smoke logits shape:", tuple(logits_smoke.shape))

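# Overfit sanity check on 512 chunks (one pass = 64 batches, i.e. 16 optimizer
# updates with GRAD_ACCUM_STEPS=4, so max_steps=250 is not reached)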
sanity_chunks = train_chunks[:512]
sanity_loader = DataLoader(LMDataset(sanity_chunks), batch_size=BATCH_SIZE, shuffle=True, num_workers=0)

sanity_model = TinyGPT(
    vocab_size=VOCAB_SIZE_MODEL,
    context_len=CONTEXT_LEN - 1,
    d_model=256,
    n_heads=4,
    n_layers=4,
    ff_dim=1024,
    dropout=0.1,
).to(DEVICE)

sanity_opt = torch.optim.AdamW(sanity_model.parameters(), lr=3e-4)
sanity_losses = train_steps(sanity_model, sanity_loader, sanity_opt, max_steps=250, grad_accum_steps=GRAD_ACCUM_STEPS)

print(f"Sanity initial loss: {sanity_losses[0]:.4f}")
print(f"Sanity final loss:   {sanity_losses[-1]:.4f}")
print("Sanity check pass:", sanity_losses[-1] < sanity_losses[0])

Smoke logits shape: (8, 255, 10000)
Sanity initial loss: 9.3805
Sanity final loss:   7.3497
Sanity check pass: True


In [10]:
# Short baseline train run: 10,000 micro-batches = 2,500 optimizer updates
# (raise max_steps for a stronger baseline)
main_opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
train_losses = train_steps(model, train_loader, main_opt, max_steps=10000, grad_accum_steps=GRAD_ACCUM_STEPS)
val_loss = evaluate_loss(model, val_loader, max_batches=40)
val_ppl = math.exp(val_loss) if np.isfinite(val_loss) else float("inf")

print(f"Main train initial loss: {train_losses[0]:.4f}")
print(f"Main train final loss:   {train_losses[-1]:.4f}")
print(f"Validation loss:         {val_loss:.4f}")
print(f"Validation perplexity:   {val_ppl:.2f}")

Main train initial loss: 9.3417
Main train final loss:   2.8716
Validation loss:         2.7249
Validation perplexity:   15.26


## 9) Compare Original vs Modified Tokenizer Outputs on Target Cases

The cell below encodes each case with and without the trailing `<EOS>` token, using the same trained tokenizer.

In [11]:
cases = [
    "Once upon a time, a little cat was happy.",
    "The sun is bright, and the kids can play outside!",
    "I have 2 apples, you have 3 apples."
]

for text in cases:
    ids_no_eos = tok.encode(text, add_eos=False)
    ids_with_eos = tok.encode(text, add_eos=True)

    print("=" * 80)
    print("TEXT:", text)
    print("len no_eos:", len(ids_no_eos), "| len with_eos:", len(ids_with_eos))
    print("no_eos first ids:", ids_no_eos[:20])
    print("with_eos first ids:", ids_with_eos[:20])
    print("decode with_eos preview:", tok.decode_ids(ids_with_eos)[:120])

TEXT: Once upon a time, a little cat was happy.
len no_eos: 11 | len with_eos: 12
no_eos first ids: [328, 344, 156, 293, 14, 156, 294, 598, 179, 301, 16]
with_eos first ids: [328, 344, 156, 293, 14, 156, 294, 598, 179, 301, 16, 1]
decode with_eos preview: Once upon a time, a little cat was happy.<EOS>
TEXT: The sun is bright, and the kids can play outside!
len no_eos: 12 | len with_eos: 13
no_eos first ids: [270, 630, 304, 1012, 14, 162, 160, 1195, 368, 256, 575, 3]
with_eos first ids: [270, 630, 304, 1012, 14, 162, 160, 1195, 368, 256, 575, 3, 1]
decode with_eos preview: The sun is bright, and the kids can play outside!<EOS>
TEXT: I have 2 apples, you have 3 apples.
len no_eos: 10 | len with_eos: 11
no_eos first ids: [43, 359, 7348, 2061, 14, 243, 359, 1869, 2061, 16]
with_eos first ids: [43, 359, 7348, 2061, 14, 243, 359, 1869, 2061, 16, 1]
decode with_eos preview: I have 2 apples, you have 3 apples.<EOS>


## 10) Persist Artifacts and Reproducibility Metadata

In [12]:
tokenizer_state_path = ARTIFACTS_DIR / "tokenizer_state.json"
model_ckpt_path = ARTIFACTS_DIR / "tinygpt_baseline.pt"
run_meta_path = ARTIFACTS_DIR / "run_metadata.json"

tokenizer_state = tok.to_state()
with open(tokenizer_state_path, "w", encoding="utf-8") as f:
    json.dump(tokenizer_state, f)

tok.save(ARTIFACTS_DIR / "tokenizer.json")

torch.save(
    {
        "model_state_dict": model.state_dict(),
        "config": {
            "vocab_size": VOCAB_SIZE_MODEL,
            "context_len": CONTEXT_LEN - 1,
            "d_model": 256,
            "n_heads": 4,
            "n_layers": 4,
            "ff_dim": 1024,
            "dropout": 0.1,
        },
    },
    model_ckpt_path,
)

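def safe_cmd(cmd):
    # Return the command's stdout, or None if it fails (e.g. outside a git repository).
    # stderr is not captured, so the command's own error message still reaches the cell output.
    try:
        return subprocess.check_output(cmd, shell=True, text=True).strip()
    except Exception:
        return None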

metadata = {
    "timestamp": time.strftime("%Y-%m-%d %H:%M:%S"),
    "seed": SEED,
    "run_mode": RUN_MODE,
    "platform": platform.platform(),
    "python_version": platform.python_version(),
    "torch_version": torch.__version__,
    "numpy_version": np.__version__,
    "dataset": "roneneldan/TinyStories",
    "stories_used": len(train_texts),
    "context_len": CONTEXT_LEN,
    "tokenizer_vocab_target": TOKENIZER_VOCAB_SIZE,
    "tokenizer_vocab_actual": len(tok.vocabulary),
    "special_tokens": tok.special_tokens,
    "model_params": int(sum(p.numel() for p in model.parameters())),
    "train_loss_initial": float(train_losses[0]) if len(train_losses) else None,
    "train_loss_final": float(train_losses[-1]) if len(train_losses) else None,
    "val_loss": float(val_loss),
    "val_ppl": float(val_ppl),
    "git_commit": safe_cmd("git rev-parse --short HEAD"),
    "workspace": str(WORKSPACE),
}

with open(run_meta_path, "w", encoding="utf-8") as f:
    json.dump(metadata, f, indent=2)

print("Saved:")
print("-", tokenizer_state_path)
print("-", ARTIFACTS_DIR / "tokenizer.json")
print("-", model_ckpt_path)
print("-", run_meta_path)

Saved:
- /Users/m3/Documents/uni/s26/genai/lm_sampling_analysis/artifacts/tokenizer_state.json
- /Users/m3/Documents/uni/s26/genai/lm_sampling_analysis/artifacts/tokenizer.json
- /Users/m3/Documents/uni/s26/genai/lm_sampling_analysis/artifacts/tinygpt_baseline.pt
- /Users/m3/Documents/uni/s26/genai/lm_sampling_analysis/artifacts/run_metadata.json


huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
fatal: not a git repository (or any of the parent directories): .git


## 11) Generate Sample Stories

These cells generate a few readable stories from the trained model using prompt-based decoding.
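
The decoding rule used here (temperature scaling followed by top-k filtering) can be sketched without PyTorch. The function below is an illustrative stand-in operating on a plain list of logits, not the notebook's `generate_story`; `sample_top_k` and the toy logits are hypothetical.

```python
import math
import random


def sample_top_k(logits, temperature=0.9, top_k=40, rng=random):
    """Sample one index from `logits` after temperature scaling and top-k filtering."""
    if temperature <= 0:
        # Greedy decoding: pick the highest-scoring index.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [z / temperature for z in logits]
    k = min(top_k, len(scaled)) if top_k and top_k > 0 else len(scaled)
    cutoff = sorted(scaled, reverse=True)[k - 1]  # k-th largest scaled logit
    kept = [(i, z) for i, z in enumerate(scaled) if z >= cutoff]
    max_z = max(z for _, z in kept)
    # Unnormalized softmax weights, shifted by the max for numerical stability.
    weights = [math.exp(z - max_z) for _, z in kept]
    return rng.choices([i for i, _ in kept], weights=weights, k=1)[0]


toy_logits = [2.0, 1.0, 0.5, -1.0, -3.0]
toy_rng = random.Random(0)
draws = [sample_top_k(toy_logits, temperature=0.9, top_k=2, rng=toy_rng) for _ in range(200)]
print(sorted(set(draws)))  # with top_k=2, only indices 0 and 1 can appear
greedy = sample_top_k(toy_logits, temperature=0.0)
print(greedy)  # 0
```

Lower temperatures sharpen the distribution toward the top token, while a smaller `top_k` removes the low-probability tail entirely.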

In [13]:
@torch.no_grad()
def generate_story(
    model,
    tokenizer,
    prompt,
    max_new_tokens=120,
    temperature=0.9,
    top_k=40,
):
    model.eval()

    eos_id = tokenizer.special_tokens.get("<EOS>")
    input_ids = tokenizer.encode(prompt, add_eos=False)
    if len(input_ids) == 0:
        input_ids = [eos_id] if eos_id is not None else [0]

    x = torch.tensor([input_ids], dtype=torch.long, device=DEVICE)

    for _ in range(max_new_tokens):
        x_cond = x[:, -model.context_len:]
        logits = model(x_cond)
        next_logits = logits[:, -1, :]

        if temperature <= 0:
            next_token = torch.argmax(next_logits, dim=-1, keepdim=True)
        else:
            next_logits = next_logits / temperature
            if top_k is not None and top_k > 0:
                top_k = min(top_k, next_logits.shape[-1])
                values, _ = torch.topk(next_logits, top_k)
                cutoff = values[:, -1].unsqueeze(-1)
                next_logits = torch.where(
                    next_logits < cutoff,
                    torch.full_like(next_logits, torch.finfo(next_logits.dtype).min),
                    next_logits,
                )
            probs = F.softmax(next_logits, dim=-1)
            next_token = torch.multinomial(probs, num_samples=1)

        x = torch.cat([x, next_token], dim=1)

        if eos_id is not None and next_token.item() == eos_id:
            break

    output_ids = x[0].tolist()

    if eos_id is not None and eos_id in output_ids:
        eos_pos = output_ids.index(eos_id)
        output_ids = output_ids[:eos_pos]

    return tokenizer.decode_ids(output_ids)


In [14]:
sample_prompts = [
    "Once upon a time",
    "The little cat",
    "In a small red house",
    "Tom and Anna went to the park",
    "The sun was warm and"
]

for i, prompt in enumerate(sample_prompts, start=1):
    story = generate_story(
        model=model,
        tokenizer=tok,
        prompt=prompt,
        max_new_tokens=120,
        temperature=0.9,
        top_k=40,
    )

    print(f"\n{'='*90}")
    print(f"Story {i} | Prompt: {prompt!r}")
    print('-'*90)
    print(story)



Story 1 | Prompt: 'Once upon a time'
------------------------------------------------------------------------------------------
Once upon a time, there was a little boy who wanted to help. Every day they would play the park for a walk.

One day, he saw a big pile of animals, white, blue, the clouds. He wanted to play with it, so he asked his friend if he could help his friends to play with.

When his friend, the man said, "The man, can I have some blocks. We are very creative." Sammy smiled and said, "Yes, please! It's very fun!" They played together every day and became fun.

Story 2 | Prompt: 'The little cat'
------------------------------------------------------------------------------------------
The little cat said, "I'm afraid of fun!"

The two friends were happy and they both laughed with their work together. They all were happy because they had a great day and enjoyed spending time, but it's time to go on the slide. 

But one day, a big bird named Billy came to find the rabbit