# Unified LLM Generation Pipeline

This notebook is an interactive front end to the unified generation pipeline.
All logic lives in `generation.py`; the notebook is only for interactive testing and batch runs.

The pipeline handles both Paleolithic and Holocene people automatically based on the sampled birth year.
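
As a rough illustration of what choosing the era from a birth year could look like, the sketch below classifies a signed year against the conventional start of the Holocene (about 9700 BCE). The function name `era_for_year`, the constant, and the exact cutoff are assumptions made for this example; the project's real rule is not shown in this notebook and may differ.

```python
# Hypothetical sketch, not the project's actual implementation.
# Assumed cutoff: the conventional start of the Holocene, roughly 9700 BCE.
HOLOCENE_START_YEAR = -9700


def era_for_year(birth_year: int) -> str:
    """Classify a signed year (negative = BCE) as 'Holocene' or 'Paleolithic'."""
    return "Holocene" if birth_year >= HOLOCENE_START_YEAR else "Paleolithic"


print(era_for_year(797))      # Holocene
print(era_for_year(-30000))   # Paleolithic
```

The signed-year convention matches the `birth_year` values that appear later in the narrative plan output (for example `-314`).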

In [1]:
# Standard library
import copy

# Third-party
import dill
from tqdm import tqdm

# Project modules
from generation import (
    generate_person,
    generate_batch,
    generate_batch_parallel,
    # Individual steps if needed
    generate_geography,
    generate_demographics,
    generate_structured_incidents,
    generate_historical_context,
    generate_name,
    generate_narrative_plan,
    generate_narrative,
    run_pipeline,
    reset_to_stage,
)
from llm_utils import GenerationContext, extract_json
from person import sample_year, sample_person, Person

In [4]:
person = sample_person()

In [5]:
print(person.to_prompt_string())

Era: Holocene
Birth year: 797 AD
Birth date: August 6, 797 AD
Age at death: 50
Death date: December 11, 847 AD
Sex: F
Lifestyle: Rural
Sexual orientation: heterosexual
Handedness: right
Height (% rank): 47
Physical attractiveness (% rank): 75
Birthplace latitude: 14.24
Birthplace longitude: 98.01
Birthplace modern region: Tanintharyi
Birthplace modern country: Myanmar
Birthplace modern address: ရေဖြူမြို့နယ်, ထားဝယ်ခရိုင်, တနင်္သာရီတိုင်း, မြန်မာ
Birthplace altitude: 42
Birthplace biome: None
Birthplace ecotype: None
Birthplace mean temp: 26.980499267578125
Birthplace warmest month temp: 35.14432907104492
Birthplace coldest month temp: 19.30927848815918
Birthplace annual precipitation: 5358.0
Birthplace map url: https://www.google.com/maps/place/14.24,98.01/@14.24,98.01,5z
Birthplace population density: 3.559999942779541
Birthplace cropland percentage: 0.61
Birthplace pasture percentage: 0.0
Birthplace rangeland percentage: 0.0
Birthplace urban_area percentage: 0.0
Birthplace irrigated

In [6]:
# Manually override the sampled attractiveness rank (75 above) before running the pipeline.
person.attractiveness_percentile = 99

In [7]:
run_pipeline(person, model='gpt-5.2', quiet=False, show_cost=True)

Era: Holocene
Year: 797 AD
Sex: F
Age at death: 50
Location: Myanmar

Step 2: Generating demographics...
Generating 17 demographic attributes...
  ethnicity... -> Local Monic coastal population (Tavoy/Dawei area; Mon/Tavoyan continuum)
  language... -> Monic (Austroasiatic) only
  religion... -> Other minority practice (e.g., strong orientation to trade-associated cults or unusual local sects) within a Monic context
  number_of_siblings... -> 1
    Siblings: #1 of 2
  household_structure_at_birth... -> Two-parent nuclear household (mother and father in a recognized union) with no other co-resident kin
  household_social_status... -> Free commoners (ordinary rural household: smallholders, fishers, foragers; not elite)
  father_occupation... -> Mixed subsistence (smallholder cultivation plus seasonal fishing/forest product gathering)
  mother_occupation... -> Fish processing/gleaning focus (drying/salting fish, shellfish gathering, shoreline harvesting) alongside household duties
  menta

<person.Person at 0x186b33d10>

In [9]:
print(person.messages)



# Batch Generation

In [2]:
# Disabled cell, kept for reference.
# cost prio, $5.03 (prev $1.24)
# test_people_2 = generate_batch_parallel(n=20, model="gpt-5.2", workers=20)

In [3]:
# Disabled cell, kept for reference.
# with open('test_examples_2.pkl', 'wb') as f:
#     dill.dump(test_people_2 + olds, f)
# %run export.py test_examples_2.pkl

In [2]:
with open('test_examples_2.pkl', 'rb') as f:
    examples = dill.load(f)

In [3]:
test = copy.deepcopy(examples[0])
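
The deep copy matters if `reset_to_stage` and the `generate_*` steps modify the object in place (an assumption here, since their code is not shown in this notebook). The sketch below uses a plain dictionary as a stand-in for a `Person` to show why a shallow copy would not protect the loaded example.

```python
import copy

# Stand-in for a loaded example; the real objects are Person instances.
original = {"narrative_plan": {"siblings": ["Devadatta"]}, "narrative": "..."}

shallow = copy.copy(original)
deep = copy.deepcopy(original)

# A shallow copy shares nested objects, so this edit leaks into the original.
shallow["narrative_plan"]["siblings"].append("Somadatta")
print(original["narrative_plan"]["siblings"])  # ['Devadatta', 'Somadatta']

# A deep copy owns its nested objects, so the original is untouched.
deep["narrative_plan"]["siblings"].clear()
print(original["narrative_plan"]["siblings"])  # ['Devadatta', 'Somadatta']
```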

In [4]:
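# Rewind the copy to the narrative_plan stage, then regenerate the plan and
# the narrative with one shared context so both requests land in one cost summary.
reset_to_stage(test, 'narrative_plan')
ctx = GenerationContext(model="gpt-5.2", quiet=False, show_cost=True)
generate_narrative_plan(test, ctx)
generate_narrative(test, ctx)
ctx.finish()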

Generating narrative plan (adult)...
  Plan: 5 siblings, 0 children, 4 life phases
Generating narrative (adult)...
  Generated 1456 words

=== Cost Summary: gpt-5.2 ===
Requests: 2
Input tokens: 27,450
  Cached: 24,064 (87.7%)
Output tokens: 4,263
Total cost: $0.0499
Avg per request: 13725 in, 2132 out
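
The derived figures in this summary can be reproduced from the raw token counts. The sketch below is one way to do that arithmetic; the function name `summarize_usage` and its return shape are hypothetical, and the rounding used by the real summary code is an assumption.

```python
def summarize_usage(requests, input_tokens, cached_tokens, output_tokens):
    """Recompute the cached share and per-request averages (hypothetical helper)."""
    if requests <= 0:
        raise ValueError("requests must be positive")
    cached_pct = 100 * cached_tokens / input_tokens if input_tokens else 0.0
    return {
        "cached_pct": round(cached_pct, 1),
        "avg_in": round(input_tokens / requests),
        "avg_out": round(output_tokens / requests),
    }


# Token counts copied from the summary above.
summary = summarize_usage(
    requests=2, input_tokens=27_450, cached_tokens=24_064, output_tokens=4_263
)
print(summary)  # {'cached_pct': 87.7, 'avg_in': 13725, 'avg_out': 2132}
```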


In [5]:
test.narrative_plan

{'siblings': [{'name': 'Devadatta',
   'sex': 'M',
   'birth_year': -314,
   'death_year': -255,
   'death_age': 59,
   'narrative_role': 'Eldest surviving brother; steady, authoritative presence who increasingly speaks for the household in village matters. Rudrasena respects his competence but feels constrained by his control over land decisions.'},
  {'name': 'Somadatta',
   'sex': 'M',
   'birth_year': -312,
   'death_year': -312,
   'death_age': 0,
   'narrative_role': 'Infant brother who died soon after birth; remembered only through the mother’s occasional quiet mention during rites for the dead and anniversaries.'},
  {'name': 'Dhanapala',
   'sex': 'M',
   'birth_year': -309,
   'death_year': -245,
   'death_age': 64,
   'narrative_role': 'Second surviving older brother; practical and sharp-tongued, good with cattle and bargaining. Often teases Rudrasena for being quiet, but later becomes an ally during the debt years.'},
  {'name': 'Haridatta',
   'sex': 'M',
   'birth_year': 

In [6]:
print(test.narrative)

Rudrasena was born in a hamlet of the Upper Gangetic plain, south of the main river routes, where Mauryan officers counted fields and assessed dues and the village headman carried messages and demands between the countryside and the towns. His family spoke an Indo-Aryan vernacular and lived by the plough. At home they honored protective spirits of place and lineage and made household offerings—water poured at a threshold, a lamp set near the cooking fire, a little rice and ghee placed aside—along with the rites a Brahmin performed for major moments.

His father, Brahmadatta, kept two oxen for the plough and treated the cattle shed as a second hearth. He could be sharp with his sons and he watched their work. His mother, Subhadra, ran the inside of the household: the grinding stone, the pots, the grain bins, the ropes of firewood drying under the eaves. She remembered losses. Two of her earlier sons had died as infants—Somadatta and Haridatta—and she spoke their names only during offeri

In [8]:
# Print every message in the conversation, separated by a horizontal rule.
for message in test.messages:
    print(message['content'])
    print('\n\n---\n\n')

This is part of a "random lives" project - simulating randomly selecting a person from human history.

For this historical person, you will be asked demographic questions. For each question, provide a probability distribution
representing how common each option was among people matching this person's known characteristics
(birth time, location, age, sex, personality, lifestyle, etc.). Each question should be answered conditional on all previously
generated information.

YOUR GOAL: Estimate the TRUE HISTORICAL FREQUENCIES, not what seems interesting or diverse.
- If 90% of people in this demographic had characteristic X, assign it 90% probability
- Focus on ordinary people, not exceptional individuals or elites
- Boring and repetitive answers are often historically correct
- Some personality extremes can represent substantial limitations and strongly influence a person's life trajectory

TECHNICAL REQUIREMENTS:
- Probabilities must sum to exactly 1.0
- Categories should be mutually excl
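
The sum-to-one requirement in this prompt can be checked on the client side before sampling. The sketch below is a minimal illustration, not the pipeline's own code: the helper names are hypothetical, the example probabilities are made up, and a small tolerance is used because floating-point sums rarely equal exactly 1.0.

```python
import math
import random


def validate_distribution(dist, tol=1e-6):
    """Raise ValueError unless dist maps options to non-negative weights summing to 1."""
    if not dist:
        raise ValueError("distribution is empty")
    if any(p < 0 for p in dist.values()):
        raise ValueError("probabilities must be non-negative")
    total = math.fsum(dist.values())
    if not math.isclose(total, 1.0, abs_tol=tol):
        raise ValueError(f"probabilities sum to {total}, expected 1.0")


def sample_option(dist, rng):
    """Draw one option according to its probability."""
    validate_distribution(dist)
    options = list(dist)
    weights = [dist[option] for option in options]
    return rng.choices(options, weights=weights, k=1)[0]


# Made-up example distribution, for illustration only.
example = {"option A": 0.9, "option B": 0.1}
print(sample_option(example, random.Random(0)))
```

Using dictionary keys guarantees that option labels are unique, but it cannot verify that the categories are mutually exclusive in meaning; that still depends on the model's answer.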

# Narrative Prompt Experimentation (V2)

In [4]:
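# Regenerate only the narrative for the first 10 examples. Each person gets a
# fresh context, so a separate cost summary is printed per run.
for person in examples[:10]:
    reset_to_stage(person, 'narrative')

    ctx = GenerationContext(model="gpt-5.2", quiet=False, show_cost=True)
    generate_narrative(person, ctx)
    ctx.finish()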

Generating narrative (adult)...
  Generated 1564 words

=== Cost Summary: gpt-5.2 ===
Requests: 1
Input tokens: 15,659
Output tokens: 1,972
Total cost: $0.0393
Avg per request: 15659 in, 1972 out
Generating narrative (adult)...
  Generated 1464 words

=== Cost Summary: gpt-5.2 ===
Requests: 1
Input tokens: 17,464
  Cached: 14,976 (85.8%)
Output tokens: 1,855
Total cost: $0.0235
Avg per request: 17464 in, 1855 out
Generating narrative (adult)...
  Generated 1583 words

=== Cost Summary: gpt-5.2 ===
Requests: 1
Input tokens: 16,832
  Cached: 14,976 (89.0%)
Output tokens: 2,032
Total cost: $0.0245
Avg per request: 16832 in, 2032 out
Generating narrative (infant)...
  Generated 266 words

=== Cost Summary: gpt-5.2 ===
Requests: 1
Input tokens: 6,899
Output tokens: 329
Total cost: $0.0119
Avg per request: 6899 in, 329 out
Generating narrative (adult)...
  Generated 1473 words

=== Cost Summary: gpt-5.2 ===
Requests: 1
Input tokens: 17,912
  Cached: 16,000 (89.3%)
Output tokens: 1,842
Total 
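
Because each run prints its own summary, the batch total has to be added up separately. The sketch below parses `Total cost:` lines from captured log text and sums them. The helper name `total_cost` is hypothetical, and the line format is taken from the output shown here. Only the four complete totals are included, since the fifth is cut off in the captured output.

```python
import re

# Matches lines such as "Total cost: $0.0393" in the captured output.
COST_LINE = re.compile(r"^Total cost: \$(\d+\.\d+)", re.MULTILINE)


def total_cost(log_text):
    """Sum every complete 'Total cost' line found in the log text."""
    return sum(float(value) for value in COST_LINE.findall(log_text))


log = """\
=== Cost Summary: gpt-5.2 ===
Total cost: $0.0393
=== Cost Summary: gpt-5.2 ===
Total cost: $0.0235
=== Cost Summary: gpt-5.2 ===
Total cost: $0.0245
=== Cost Summary: gpt-5.2 ===
Total cost: $0.0119
Total 
"""

print(f"${total_cost(log):.4f}")  # $0.0992
```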

In [5]:
with open('test_examples_3.pkl', 'wb') as f:
    dill.dump(examples, f)
%run export.py test_examples_3.pkl

Loading people from test_examples_3.pkl...
Found 37 people
Removing existing markdown files from ../_lives...
  Removed 37 files
  Exported 10 people...
  Exported 20 people...
  Exported 30 people...

Successfully exported 37 people to ../_lives
