# Unified LLM Generation Pipeline

This notebook provides an interface to the unified generation pipeline.
All logic lives in `generation.py`; this notebook is for interactive testing and batch runs.

The pipeline handles both Paleolithic and Holocene people automatically based on the sampled birth year.
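Here is a small sketch of how the era split might work. The cutoff is an assumption and is not read from `generation.py`. It uses the conventional start of the Holocene, about 11,700 years before AD 2000 (roughly 9700 BC). The helper names `signed_year` and `classify_era` are hypothetical. Year labels follow the `'26360 BC'` / `'2021 AD'` format seen in the outputs below.

```python
import re

# ASSUMPTION: hypothetical cutoff, not taken from generation.py. The Holocene is
# conventionally dated from ~11,700 years before AD 2000, i.e. about 9700 BC.
HOLOCENE_START = -9700  # signed year, BC written as negative


def signed_year(label: str) -> int:
    """Convert a label like '26360 BC' or '2021 AD' to a signed integer year.

    BC years become negative. The BC/AD system has no year 0, so '1 BC'
    maps to -1 and '1 AD' maps to 1.
    """
    match = re.fullmatch(r"\s*(\d+)\s*(BC|AD)\s*", label)
    if match is None:
        raise ValueError(f"unrecognised year label: {label!r}")
    value = int(match.group(1))
    return -value if match.group(2) == "BC" else value


def classify_era(label: str) -> str:
    """Label years before the assumed Holocene start as 'Paleolithic'."""
    return "Holocene" if signed_year(label) >= HOLOCENE_START else "Paleolithic"


print(classify_era("26360 BC"))
print(classify_era("2296 BC"))
```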

In [1]:
import copy

import dill
from tqdm import tqdm

from generation import (
    generate_person,
    generate_batch,
    # Individual steps, if needed
    generate_geography,
    generate_demographics,
    generate_life_events,
    generate_narrative,
    quality_check,
)
from llm_utils import GenerationContext
from person import sample_person, Person

## Batch Generation

In [10]:
#quick_test = generate_batch(n=1, model="gpt-5.2", quiet=False, show_cost=True)

In [10]:
#test_people = generate_batch(n=10, model="gpt-5.2", quiet=False, show_cost=True)

 10%|████▎                                      | 1/10 [02:40<24:07, 160.87s/it]

  1: Paleolithic, 26360 BC, Age 58 - 1157 words


 20%|████████▌                                  | 2/10 [06:18<25:55, 194.46s/it]

  2: Holocene, 470 AD, Age 27 - 1367 words


 30%|████████████▉                              | 3/10 [07:57<17:34, 150.60s/it]

  3: Holocene, 2296 BC, Age 0 - 317 words


 40%|█████████████████▏                         | 4/10 [10:27<15:02, 150.41s/it]

  4: Holocene, 2005 AD, Age alive - 1014 words


 50%|█████████████████████▌                     | 5/10 [14:07<14:37, 175.45s/it]

  5: Holocene, 3388 BC, Age 51 - 1100 words


 60%|█████████████████████████▊                 | 6/10 [17:36<12:27, 187.00s/it]

  6: Holocene, 1255 AD, Age 43 - 1268 words


 70%|██████████████████████████████             | 7/10 [20:06<08:44, 174.71s/it]

  7: Paleolithic, 36308 BC, Age 81 - 1263 words


 80%|██████████████████████████████████▍        | 8/10 [21:29<04:51, 145.50s/it]

  8: Holocene, 365 BC, Age 0 - 334 words


 90%|██████████████████████████████████████▋    | 9/10 [22:48<02:04, 124.72s/it]

  9: Holocene, 1936 AD, Age 0 - 263 words


100%|██████████████████████████████████████████| 10/10 [23:59<00:00, 143.98s/it]

  10: Holocene, 2021 AD, Age alive - 359 words
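The per-person summary lines above are regular enough to tabulate. This is a minimal sketch that assumes the log format stays exactly as printed (`N: Era, Year, Age X - W words`). `parse_summary` is a hypothetical helper, and the log lines are copied from this run.

```python
import re
from collections import Counter, defaultdict
from statistics import mean

# Summary lines copied from the 10-person batch run above.
BATCH_LOG = """\
  1: Paleolithic, 26360 BC, Age 58 - 1157 words
  2: Holocene, 470 AD, Age 27 - 1367 words
  3: Holocene, 2296 BC, Age 0 - 317 words
  4: Holocene, 2005 AD, Age alive - 1014 words
  5: Holocene, 3388 BC, Age 51 - 1100 words
  6: Holocene, 1255 AD, Age 43 - 1268 words
  7: Paleolithic, 36308 BC, Age 81 - 1263 words
  8: Holocene, 365 BC, Age 0 - 334 words
  9: Holocene, 1936 AD, Age 0 - 263 words
  10: Holocene, 2021 AD, Age alive - 359 words
"""

# One summary line: "N: Era, Year BC|AD, Age <number|alive> - W words"
LINE_RE = re.compile(
    r"^\s*(?P<idx>\d+): (?P<era>\w+), (?P<year>\d+ (?:BC|AD)), "
    r"Age (?P<age>\w+) - (?P<words>\d+) words\s*$",
    re.MULTILINE,
)


def parse_summary(text: str) -> list:
    """Return one dict per summary line, with the word count as an int."""
    rows = []
    for m in LINE_RE.finditer(text):
        row = m.groupdict()
        row["words"] = int(row["words"])
        rows.append(row)
    return rows


rows = parse_summary(BATCH_LOG)
era_counts = Counter(r["era"] for r in rows)

# Group word counts by whether the person died at age 0.
words_by_group = defaultdict(list)
for r in rows:
    group = "died at age 0" if r["age"] == "0" else "older or alive"
    words_by_group[group].append(r["words"])

print(dict(era_counts))
for group, words in words_by_group.items():
    print(f"{group}: n={len(words)}, mean words={mean(words):.0f}")
```

In this run, the three people who died at age 0 got narratives of 263 to 334 words. Six of the seven others got over 1,000 words. The exception is person 10 (born 2021, alive, 359 words).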





In [11]:
#with open('test_examples.pkl', 'wb') as f:
#    dill.dump(test_people,f)

In [2]:
with open('test_examples.pkl', 'rb') as f:
    pickled_people = dill.load(f)
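Several cells in this notebook write `dill` checkpoints with `'wb'`. Re-running one of them would overwrite an earlier file. For example, the commented-out dump in the step-by-step section also targets `test_examples.pkl`. Below is a hedged sketch of a save helper that refuses to overwrite by default. `save_people` and `load_people` are hypothetical names. `dill` is used only through the `dump`/`load` functions it shares with `pickle`.

```python
import pickle
from typing import Any


def save_people(path: str, people: Any, serializer: Any = pickle, overwrite: bool = False) -> None:
    """Write `people` to `path` with a pickle-compatible serializer.

    `serializer` is any module with pickle-style dump/load functions; pass
    `dill` to match this notebook. By default the file is opened in 'xb'
    (exclusive-create) mode, which raises FileExistsError rather than
    silently replacing an existing checkpoint.
    """
    mode = "wb" if overwrite else "xb"
    with open(path, mode) as f:
        serializer.dump(people, f)


def load_people(path: str, serializer: Any = pickle) -> Any:
    """Read back whatever save_people wrote."""
    with open(path, "rb") as f:
        return serializer.load(f)
```

In this notebook, a call would look like `save_people('test_examples_events.pkl', working_people, serializer=dill)`. Passing `overwrite=True` makes replacing a file an explicit choice.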

In [13]:
pickled_people[9].to_dict()

{'Era': 'Holocene',
 'Birth year': '2021 AD',
 'Birth date': 'January 15, 2021 AD',
 'Age at death': 'alive',
 'Death date': None,
 'Sex': 'F',
 'Lifestyle': 'Urban',
 'Birthplace latitude': 26.98,
 'Birthplace longitude': -101.49,
 'Birthplace modern region': 'Coahuila',
 'Birthplace modern country': 'Mexico',
 'Birthplace modern address': 'La Cruz, Frontera, Coahuila, México',
 'Birthplace altitude': 562,
 'Birthplace biome': 'Deserts & Xeric Shrublands',
 'Birthplace ecotype': 'Chihuahuan desert',
 'Birthplace mean temp': 21.824875,
 'Birthplace warmest month temp': 35.846,
 'Birthplace coldest month temp': 4.576,
 'Birthplace annual precipitation': 360.0,
 'Birthplace map url': 'https://www.google.com/maps/@26.98,-101.49,8z',
 'Birthplace population density': 456.73,
 'Birthplace cropland percentage': 8.51,
 'Birthplace pasture percentage': 0.0,
 'Birthplace rangeland percentage': 64.02,
 'Birthplace urban_area percentage': 7.84,
 'Birthplace irrigated_not_rice percentage': 5.47,
 

In [3]:
#for i in pickled_people:
#    print(i.to_prompt_string())
#    print(i.events)
#    print('\n')
#    print(i.narrative, '\n\n\n\n\n')

## Step-by-Step Generation

For debugging or custom workflows, you can run each step individually.
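Chaining the steps by hand is error-prone during debugging, so a small runner that stops after a chosen step can help. This is a sketch under stated assumptions. `run_steps` is a hypothetical helper. Each step is assumed to take `(person, ctx)` and update the person in place, as the cells in this section call them. The step order is inferred from the import list. Judging by the commented-out check in the cell below that calls `generate_demographics`, Paleolithic people may also need `generate_geography` first. That is an inference, not confirmed.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple

# ASSUMPTION: a step takes (person, ctx) and updates the person in place.
Step = Callable[[Any, Any], None]


def run_steps(
    person: Any,
    ctx: Any,
    steps: Sequence[Tuple[str, Step]],
    stop_after: Optional[str] = None,
) -> List[str]:
    """Run named steps in order, optionally stopping after `stop_after`.

    Returns the names of the steps that ran, so it is easy to confirm
    where a debugging run stopped.
    """
    names = [name for name, _ in steps]
    if stop_after is not None and stop_after not in names:
        raise ValueError(f"unknown step {stop_after!r}; expected one of {names}")
    completed: List[str] = []
    for name, step in steps:
        step(person, ctx)
        completed.append(name)
        if name == stop_after:
            break
    return completed


# Hypothetical usage in this notebook (order inferred, not confirmed):
# steps = [
#     # ("geography", generate_geography),  # possibly Paleolithic only
#     ("demographics", generate_demographics),
#     ("life_events", generate_life_events),
#     ("narrative", generate_narrative),
#     ("quality_check", quality_check),
# ]
# run_steps(persons[0], ctx, steps, stop_after="life_events")
```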

In [4]:
example_num = 15
persons = [sample_person() for _ in range(example_num)]
ctx = GenerationContext(model='gpt-5.2', quiet=False, show_cost=True)

In [5]:
# testing mental disorder condition
for i, p in enumerate(persons):
    if p.years_lived() > 5:
        print('\n\nPerson',i)
        print(p.to_dict())
        #if p.era == 'Paleolithic':
        #    generate_geography(p, ctx)
        generate_demographics(p, ctx)

#with open('test_examples.pkl', 'wb') as f:
#    dill.dump(persons,f)



Person 0
{'Era': 'Holocene', 'Birth year': '2013 AD', 'Age at death': 'alive', 'Sex': 'M', 'Lifestyle': 'Rural', 'Birthplace latitude': -10.33, 'Birthplace longitude': 18.05, 'Birthplace modern region': 'Malange', 'Birthplace modern country': 'Angola', 'Birthplace modern address': 'Quitapa, Malanje, Angola', 'Birthplace altitude': 1222, 'Birthplace biome': 'Tropical & Subtropical Grasslands, Savannas & Shrublands', 'Birthplace ecotype': 'Angolan Miombo woodlands', 'Birthplace mean temp': 20.757334, 'Birthplace warmest month temp': 29.972, 'Birthplace coldest month temp': 9.799, 'Birthplace annual precipitation': 1399.0, 'Birthplace map url': 'https://www.google.com/maps/place/10.33S+18.05E', 'Birthplace population density': 23.55, 'Birthplace cropland percentage': 0.11, 'Birthplace pasture percentage': 8.88, 'Birthplace rangeland percentage': 0.06, 'Birthplace urban_area percentage': 0.12, 'Birthplace irrigated_not_rice percentage': 0.0, 'Birthplace irrigated_rice percentage': 0.0, '

In [12]:
for i in persons:
    if i.years_lived()>5:
        print(i.messages[:-14])

[{'role': 'user', 'content': 'This is part of a "random lives" project - simulating randomly selecting a person from human history.\n\nFor this historical person, you will be asked demographic questions. For each question, provide a probability distribution\nrepresenting how common each option was among people matching this person\'s known characteristics\n(birth time, location, age, sex, personality, lifestyle, etc.). Each question should be answered conditional on all previously\ngenerated information.\n\nYOUR GOAL: Estimate the TRUE HISTORICAL FREQUENCIES, not what seems interesting or diverse.\n- If 90% of people in this demographic had characteristic X, assign it 90% probability\n- Focus on ordinary people, not exceptional individuals or elites\n- Boring and repetitive answers are often historically correct\n- Some personality extremes can represent substantial limitations and strongly influence a person\'s life trajectory\n\nTECHNICAL REQUIREMENTS:\n- Probabilities must sum to ex

In [8]:
#with open('test_examples_mental.pkl', 'wb') as f:
#    dill.dump(persons,f)

In [13]:
for i in persons:
    if i.years_lived()>5:
        generate_life_events(i, ctx)
        generate_narrative(i, ctx)
        quality_check(i, ctx)

Generating potential life events...
  Generated 10 potential events
  Sampled 4 events
  Final events: 4
Generating narrative (adolescent)...
  Generated 855 words
Running quality check...
  Fixed 4 issues
Generating potential life events...
  Generated 12 potential events
  Sampled 4 events
  Final events: 4
Generating narrative (adult)...
  Generated 1699 words
Running quality check...
  Fixed 8 issues
Generating potential life events...
  Generated 11 potential events
  Sampled 2 events
  Final events: 2
Generating narrative (adult)...
  Generated 1615 words
Running quality check...
  Fixed 7 issues
Generating potential life events...
  Generated 10 potential events
  Sampled 1 events
  Final events: 1
Generating narrative (adult)...
  Generated 1539 words
Running quality check...
  Fixed 6 issues
Generating potential life events...
  Generated 11 potential events
  Sampled 9 events
  Final events: 9
Generating narrative (adult)...
  Generated 1433 words
Running quality check...
  F
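The cell above runs three API-backed steps per person in a single loop. One uncaught exception therefore stops the rest of the batch. Below is a hedged sketch of a more forgiving loop. `run_batch` is a hypothetical helper, and the `years_lived() > 5` filter is copied from the cell. Catching broad `Exception` is a deliberate debugging-time choice, not something `generation.py` is known to require.

```python
from typing import Any, Callable, Dict, Sequence, Tuple

Step = Callable[[Any, Any], None]


def run_batch(
    persons: Sequence[Any],
    ctx: Any,
    steps: Sequence[Step],
    min_years: int = 5,
) -> Dict[int, Tuple[str, Exception]]:
    """Apply `steps` to each person who lived more than `min_years` years.

    If a step raises (for example on a failed API call), the error is
    recorded, that person's remaining steps are skipped, and the loop
    continues with the next person instead of aborting the whole batch.
    Returns {person_index: (step_name, exception)} for every failure.
    """
    failures: Dict[int, Tuple[str, Exception]] = {}
    for idx, person in enumerate(persons):
        if person.years_lived() <= min_years:
            continue
        for step in steps:
            try:
                step(person, ctx)
            except Exception as exc:  # keep going; inspect failures afterwards
                failures[idx] = (getattr(step, "__name__", repr(step)), exc)
                break
    return failures


# Hypothetical usage in this notebook:
# failures = run_batch(persons, ctx, [generate_life_events, generate_narrative, quality_check])
```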

In [14]:
for i in persons:
    if i.years_lived()>5:
        print(i.to_prompt_string())
        print(i.events)
        print('\n')
        print(i.narrative)
        print('\n\n\n\n')

Era: Holocene
Birth year: 2013 AD
Age at death: alive
Sex: M
Lifestyle: Rural
Birthplace latitude: -10.33
Birthplace longitude: 18.05
Birthplace modern region: Malange
Birthplace modern country: Angola
Birthplace modern address: Quitapa, Malanje, Angola
Birthplace altitude: 1222
Birthplace biome: Tropical & Subtropical Grasslands, Savannas & Shrublands
Birthplace ecotype: Angolan Miombo woodlands
Birthplace mean temp: 20.757333755493164
Birthplace warmest month temp: 29.972000122070312
Birthplace coldest month temp: 9.798999786376953
Birthplace annual precipitation: 1399.0
Birthplace map url: https://www.google.com/maps/place/10.33S+18.05E
Birthplace population density: 23.549999237060547
Birthplace cropland percentage: 0.11
Birthplace pasture percentage: 8.88
Birthplace rangeland percentage: 0.06
Birthplace urban_area percentage: 0.12
Birthplace irrigated_not_rice percentage: 0.0
Birthplace irrigated_rice percentage: 0.0
Birthplace rainfed_not_rice percentage: 0.11
Birthplace rainfed_

In [4]:
#with open('test_examples.pkl', 'rb') as f:
#    pickled_people = dill.load(f)
#working_people = pickled_people[:19]
#
#ctx = GenerationContext(model='gpt-5.2', quiet=False, show_cost=True)
#for i in working_people:
#    generate_life_events(i, ctx)
#
#with open('test_examples_events.pkl', 'wb') as f:
#    dill.dump(working_people, f)

In [5]:
with open('test_examples_events.pkl', 'rb') as f:
    pickled_people = dill.load(f)

In [6]:
ctx = GenerationContext(model='gpt-5.2', quiet=False, show_cost=True)
for i in pickled_people:
    generate_narrative(i, ctx)

Generating narrative (adult)...
  Generated 1694 words
Generating narrative (adult)...
  Generated 1689 words
Generating narrative (adolescent)...
  Generated 906 words
Generating narrative (infant)...
  Generated 295 words
Generating narrative (child)...
  Generated 361 words
Generating narrative (adolescent)...
  Generated 911 words
Generating narrative (adult)...
  Generated 1691 words
Generating narrative (adult)...
  Generated 1610 words
Generating narrative (infant)...
  Generated 253 words
Generating narrative (infant)...
  Generated 276 words
Generating narrative (infant)...
  Generated 266 words
Generating narrative (child)...
  Generated 402 words
Generating narrative (adult)...
  Generated 1435 words
Generating narrative (adult)...
  Generated 1749 words
Generating narrative (infant)...
  Generated 289 words
Generating narrative (child)...
  Generated 375 words
Generating narrative (adult)...
  Generated 1441 words
Generating narrative (infant)...
  Generated 298 words
Gener

In [7]:
ctx = GenerationContext(model='gpt-5.2', quiet=False, show_cost=True)
qc_people = []
for i in pickled_people:
    # Work on a deep copy so the pre-QC version in pickled_people is left unchanged
    i2 = copy.deepcopy(i)
    quality_check(i2, ctx)
    qc_people.append(i2)

Running quality check...
  Fixed 7 issues
Running quality check...
  Fixed 7 issues
Running quality check...
  Fixed 5 issues
Running quality check...
  Fixed 5 issues
Running quality check...
  Fixed 4 issues
Running quality check...
  Fixed 6 issues
Running quality check...
  Fixed 6 issues
Running quality check...
  Fixed 6 issues
Running quality check...
  Fixed 4 issues
Running quality check...
  Fixed 5 issues
Running quality check...
  Fixed 6 issues
Running quality check...
  Fixed 6 issues
Running quality check...
  Fixed 6 issues
Running quality check...
  Fixed 8 issues
Running quality check...
  Fixed 6 issues
Running quality check...
  Fixed 4 issues
Running quality check...
  Fixed 5 issues
Running quality check...
  Fixed 5 issues
Running quality check...
  Fixed 5 issues


In [8]:
for i in qc_people:
    print(i.to_prompt_string())
    print(i.events)
    print('\n')
    print(i.narrative)
    print('\n\n\n\n')

Era: Holocene
Birth year: 427 AD
Age at death: 38
Sex: M
Lifestyle: Rural
Birthplace latitude: 21.99
Birthplace longitude: 45.03
Birthplace modern region: Ar Riyād
Birthplace modern country: Saudi Arabia
Birthplace modern address: محافظة وادي الدواسر, منطقة الرياض, السعودية
Birthplace altitude: 791
Birthplace biome: Deserts & Xeric Shrublands
Birthplace ecotype: Arabian Desert and East Sahero-Arabian xeric shrublands
Birthplace mean temp: 28.01245880126953
Birthplace warmest month temp: 44.28900146484375
Birthplace coldest month temp: 10.133000373840332
Birthplace annual precipitation: 108.0
Birthplace map url: https://www.google.com/maps/place/21.99N+45.03E
Birthplace population density: 0.12
Birthplace cropland percentage: 0.05
Birthplace pasture percentage: 0.0
Birthplace rangeland percentage: 0.0
Birthplace urban_area percentage: 0.0
Birthplace irrigated_not_rice percentage: 0.0
Birthplace irrigated_rice percentage: 0.0
Birthplace rainfed_not_rice percentage: 0.05
Birthplace rainfe

In [9]:
qc_people[0].messages

[{'role': 'user',
  'content': 'This is part of a "random lives" project - simulating randomly selecting a person from human history.\n\nFor this historical person, you will be asked demographic questions. For each question, provide a probability distribution\nrepresenting how common each option was among people matching this person\'s known characteristics\n(birth time, location, age, sex, lifestyle, etc.). Each question should be answered conditional on all previously\ngenerated information.\n\nYOUR GOAL: Estimate the TRUE HISTORICAL FREQUENCIES, not what seems interesting or diverse.\n- If 90% of people in this demographic had characteristic X, assign it 90% probability\n- Focus on ordinary people, not exceptional individuals or elites\n- Boring and repetitive answers are often historically correct\n\nTECHNICAL REQUIREMENTS:\n- Probabilities must sum to exactly 1.0\n- Categories should be mutually exclusive\n- IMPORTANT: You must condition on ALL previously provided information.\n\n
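The raw `messages` list is long. The output above shows it as a list of `{'role': ..., 'content': ...}` dicts. Below is a sketch for skimming it. It assumes every entry has exactly those two string keys. Only the first `user` entry is visible here, so the other roles are an assumption. `summarize_messages` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, List, Tuple


def summarize_messages(
    messages: List[Dict[str, str]], preview: int = 60
) -> Tuple[Dict[str, Tuple[int, int]], List[str]]:
    """Per-role (message count, character count), plus a one-line preview per message."""
    counts: Counter = Counter()
    chars: Counter = Counter()
    previews: List[str] = []
    for i, msg in enumerate(messages):
        role, content = msg["role"], msg["content"]
        counts[role] += 1
        chars[role] += len(content)
        snippet = content[:preview].replace("\n", " ")
        previews.append(f"{i:3d} {role:<9} {snippet}")
    totals = {role: (counts[role], chars[role]) for role in counts}
    return totals, previews


# Hypothetical usage in this notebook:
# totals, previews = summarize_messages(qc_people[0].messages)
# print(totals)
# print("\n".join(previews))
```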