# Unified LLM Generation Pipeline

This notebook provides an interface to the unified generation pipeline.
All logic lives in `generation.py`; this notebook is only for interactive testing and batch runs.

The pipeline handles both Paleolithic and Holocene people automatically based on the sampled birth year.
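As a rough illustration of choosing an era from a birth year, here is a minimal sketch. It is not the logic in `generation.py`: the function name `era_for_birth_year` and the cutoff constant are hypothetical. The cutoff assumes the Holocene began around 9700 BC, and years are signed integers with BC negative.

```python
# Hypothetical cutoff: the Holocene is conventionally dated from roughly 9700 BC.
# The real pipeline may use a different boundary or representation.
HOLOCENE_START_YEAR = -9700


def era_for_birth_year(birth_year: int) -> str:
    """Return a coarse era label for a signed birth year (BC is negative)."""
    if birth_year < HOLOCENE_START_YEAR:
        return "paleolithic"
    return "holocene"


# Example birth years: one deep-past, one ancient, one modern.
for year in (-30000, -2954, 1995):
    print(year, era_for_birth_year(year))
```

Keeping the decision in one small function means the rest of a pipeline can branch on a label rather than repeating the year comparison.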

In [1]:
import copy

import dill
from tqdm import tqdm

from generation import (
    generate_person,
    generate_batch,
    generate_batch_parallel,
    # Individual steps if needed
    generate_geography,
    generate_demographics,
    generate_structured_incidents,
    generate_historical_context,
    generate_name,
    generate_narrative_plan,
    generate_narrative,
    run_pipeline,
    reset_to_stage,
)
from llm_utils import GenerationContext, extract_json
from person import sample_year, sample_person, Person

# Batch Generation

In [3]:
'''# Cost: $3.62 before this run -> $10.52 after, i.e. $6.90 for 30 stories = $0.23 per story
test_people = generate_batch_parallel(n=30, model="gpt-5.2", workers=30)'''

Sampling 30 people...
Generating 30 people with 30 parallel workers...


100%|███████████████████████████████████████████| 30/30 [05:24<00:00, 10.82s/it]

Done: 30 generated
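The per-story figure in the cell comment can be reproduced with a short calculation. This sketch assumes that $3.62 and $10.52 are the cumulative costs before and after the run; the variable names are illustrative only.

```python
# Assumption: these are cumulative dollar costs before and after the batch run.
cost_before = 3.62
cost_after = 10.52
n_stories = 30

batch_cost = cost_after - cost_before
cost_per_story = batch_cost / n_stories

print(f"batch: ${batch_cost:.2f}, per story: ${cost_per_story:.2f}")
```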





In [4]:
'''with open('test_examples.pkl', 'wb') as f:
    dill.dump(test_people, f)
%run export.py test_examples.pkl'''

Loading people from test_examples.pkl...
Found 30 people
Removing existing markdown files from ../_lives...
  Removed 37 files
  Exported 10 people...
  Exported 20 people...
  Exported 30 people...

Successfully exported 30 people to ../_lives
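The save step above can be checked with a round trip before exporting. The following is a minimal sketch that swaps in the standard-library `pickle` module for `dill` so it runs without extra dependencies, and uses plain dicts as hypothetical stand-ins for `Person` objects. The helper names `save_people` and `load_people` are not part of the project.

```python
import os
import pickle
import tempfile


def save_people(people, path):
    """Serialize a list of people to a file (pickle used here in place of dill)."""
    with open(path, "wb") as f:
        pickle.dump(people, f)


def load_people(path):
    """Load a list of people previously written by save_people."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Hypothetical stand-in records; the real notebook stores Person objects.
people = [
    {"birth_year_str": "1995 AD", "sex": "M"},
    {"birth_year_str": "2954 BC", "sex": "M"},
]

# Write to a temporary directory so no real file is overwritten.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "people.pkl")
    save_people(people, path)
    restored = load_people(path)

print(f"Round trip preserved {len(restored)} people")
```

Comparing the restored list with the original before running an export is a cheap way to catch serialization problems early.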


In [7]:
# =============================================================================
# Test V2 naming on problematic cases
# =============================================================================

# Load existing people to test on
with open('test_examples.pkl', 'rb') as f:
    test_people = dill.load(f)

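# Find interesting test cases by birth year and ethnicity
def summarize_person(p):
    """Return a one-line summary: birth year | sex | ethnicity | language."""
    # Truncate long fields; note that "..." is appended even when nothing was cut.
    eth = p.demographics.get('ethnicity', 'unknown')[:50]
    lang = p.demographics.get('language', 'unknown')[:40]
    return f"{p.birth_year_str} | {p.sex} | {eth}... | {lang}..."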

print("Available test cases:")
for i, p in enumerate(test_people):
    print(f"  [{i}] {summarize_person(p)}")

Available test cases:
  [0] 1995 AD | M | Han (汉族)... | Southwestern Mandarin (home vernacular) ...
  [1] 1791 AD | F | Kannada-speaking Deccan village culture (Karnataka... | Kannada + Telugu...
  [2] 975 AD | M | Local Adivasi/forest communities of the northern W... | Western Indo-Aryan tribal language (Bhil...
  [3] 1857 AD | F | Mestiza / chola (mixed ancestry; culturally Andean... | Spanish and Quechua (bilingual, Spanish-...
  [4] 1669 AD | M | Nilotic or Maa-speaking pastoralist peoples (Maasa... | Maa (Eastern Nilotic) + multiple Bantu l...
  [5] 2954 BC | M | Pre-ceramic Atlantic Forest hunter-gatherer (inlan... | Unknown/extinct local language isolate o...
  [6] 1920 AD | F | Ukrainian (East Slavic, Right-Bank rural peasant c... | Ukrainian (local Right-Bank/Central Ukra...
  [7] 440 AD | M | Tamil (Tamilakam; Kaveri delta/Chola-country Tamil... | Dravidian (Southern; local Tamilakam ver...
  [8] 679 AD | F | Andhra (Telugu-speaking; Vengi/Andhra desa local r... | South-Centr