# Unified LLM Generation Pipeline

This notebook provides an interface to the unified generation pipeline.
All logic lives in `generation.py`; this notebook is only for interactive testing and batch runs.

The pipeline handles both Paleolithic and Holocene people automatically based on the sampled birth year.
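
As a rough illustration of the era dispatch described above, the sketch below classifies a signed birth year. It is not taken from `generation.py`: the function name `classify_era`, the constant `HOLOCENE_START_YEAR`, the signed-year convention (negative for BC), and the exact cutoff are all assumptions made for this example.

```python
# Hypothetical sketch; the real dispatch in generation.py may differ.
# Assumption: birth years are signed integers, negative for BC.
# The Holocene is conventionally dated from about 11,700 years ago (~9700 BC).
HOLOCENE_START_YEAR = -9700


def classify_era(birth_year: int) -> str:
    """Return the era label used to pick a prompt set for this person."""
    if birth_year < HOLOCENE_START_YEAR:
        return "Paleolithic"
    return "Holocene"


print(classify_era(-168147))  # Paleolithic
print(classify_era(1850))     # Holocene
```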

In [1]:
import dill
from tqdm import tqdm

from generation import (
    generate_person,
    generate_batch,
    generate_batch_parallel,
    # Individual steps if needed
    generate_geography,
    generate_demographics,
    generate_structured_incidents,
    generate_historical_context,
    generate_name,
    generate_narrative_plan,
    generate_narrative,
    run_pipeline,
    reset_to_stage
)
from llm_utils import GenerationContext, extract_json

from person import sample_year, sample_person, Person

import copy

# Batch Generation

In [4]:
# Disabled after the run whose output is shown below; uncomment to re-run.
# test_people = generate_batch_parallel(n=150, model="gpt-5.2", workers=50)

Sampling 150 people...
Generating 150 people with 50 parallel workers...


100%|█████████████████████████████████████████| 150/150 [13:15<00:00,  5.30s/it]

Done: 150 generated





In [7]:
# Disabled after the run whose output is shown below; uncomment to re-run.
# with open('batch2_0100_0249.pkl', 'wb') as f:
#     dill.dump(test_people, f)
# %run export.py batch2_0100_0249.pkl --start-index 100

Loading people from batch2_0100_0249.pkl...
Found 150 people
Removing existing markdown files from ../_lives_pending...
  Removed 7 files
  Exported 10 people...
  Exported 20 people...
  Exported 30 people...
  Exported 40 people...
  Exported 50 people...
  Exported 60 people...
  Exported 70 people...
  Exported 80 people...
  Exported 90 people...
  Exported 100 people...
  Exported 110 people...
  Exported 120 people...
  Exported 130 people...
  Exported 140 people...
  Exported 150 people...

Successfully exported 150 people to ../_lives_pending
  Index range: 0100 - 0249
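
The pickle names used in this notebook appear to encode the batch number and the inclusive index range (`batch2_0100_0249.pkl` holds 150 people starting at index 100). A small helper like the one below could keep the file name and `--start-index` consistent. The helper `batch_filename` is hypothetical and only mirrors the naming pattern seen here; it is not part of `export.py`.

```python
def batch_filename(batch_number: int, start_index: int, count: int) -> str:
    """Build a name like batch2_0100_0249.pkl from a start index and a count.

    Hypothetical helper: it reproduces the pattern visible in this notebook,
    with a zero-padded, inclusive index range.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    end_index = start_index + count - 1  # inclusive upper bound
    return f"batch{batch_number}_{start_index:04d}_{end_index:04d}.pkl"


print(batch_filename(2, 100, 150))  # batch2_0100_0249.pkl
print(batch_filename(1, 0, 100))    # batch1_0000_0099.pkl
```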


# Checking existing generation

In [4]:
with open('batch1_0000_0099.pkl', 'rb') as f:
    people = dill.load(f)

In [7]:
print(people[49].geo_messages)

[{'role': 'user', 'content': 'For a person born in Africa around 168147 BC, provide a probability distribution over sub-regions where they might have lived.\n\nThe sub-regions should be:\n- MUTUALLY EXCLUSIVE and EXHAUSTIVE for inhabited areas\n- Weighted by estimated POPULATION DISTRIBUTION, not archaeological fame\n- Use simple, short names for the different sub-regions. Do not use parentheticals.\n\nConsider:\n- Population density patterns\n- Ice sheets during glacial periods\n- Lower sea levels exposing land bridges and coastlines\n\nTECHNICAL REQUIREMENTS:\n- Probabilities must sum to exactly 1.0\n- Provide 5-10 sub-regions\n\nFORMAT:\nBrief reasoning, then as JSON:\n{"subregions": {"subregion name 1": probability1, ...}}'}, {'role': 'assistant', 'content': 'Around 168,000 years ago (late Middle Pleistocene, Marine Isotope Stage 6), Africa’s population would have been small and patchy, concentrated where water and predictable resources existed. Likely higher densities were along w
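
The geography prompt above asks for brief reasoning followed by a JSON object with 5-10 sub-regions whose probabilities sum to 1.0. The sketch below shows one way to check a stored reply against those requirements. The helpers `parse_subregions` and `check_subregions` are hypothetical and are not the `extract_json` imported from `llm_utils`, whose signature is not shown here. The example reply is placeholder data, not real model output.

```python
import json
import math


def parse_subregions(reply: str) -> dict:
    """Find the first JSON object in the reply that has a "subregions" key."""
    decoder = json.JSONDecoder()
    pos = reply.find("{")
    while pos != -1:
        try:
            # raw_decode parses one JSON value and ignores trailing text.
            obj, _ = decoder.raw_decode(reply[pos:])
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict) and "subregions" in obj:
            return obj["subregions"]
        pos = reply.find("{", pos + 1)
    raise ValueError("no subregions JSON object found in reply")


def check_subregions(subregions: dict, tol: float = 1e-6) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if not 5 <= len(subregions) <= 10:
        problems.append(f"expected 5-10 sub-regions, got {len(subregions)}")
    if any(p < 0 for p in subregions.values()):
        problems.append("negative probability")
    total = sum(subregions.values())
    if not math.isclose(total, 1.0, abs_tol=tol):
        problems.append(f"probabilities sum to {total}, not 1.0")
    return problems


# Placeholder reply for illustration only.
example_reply = (
    "Brief reasoning would appear here.\n"
    '{"subregions": {"A": 0.2, "B": 0.2, "C": 0.2, "D": 0.2, "E": 0.2}}'
)
print(check_subregions(parse_subregions(example_reply)))
```

On real data, the reply text would come from the assistant entry in `geo_messages`, for example `people[49].geo_messages[1]['content']`, assuming the second message is the assistant's answer as the printed output suggests.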

In [5]:
# Smoke test: sample 100 people, printing the index so a failure can be located.
for i in range(100):
    print(i)
    sample_person()

0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
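
The loop above stops at the first exception, so it reports only one failing index per run. A variant that records every failure and keeps going might look like the sketch below. The function `smoke_test` is hypothetical; it takes the sampler as an argument so it can be tried without the project code, and in this notebook it would presumably be called as `smoke_test(sample_person)`.

```python
def smoke_test(sampler, n: int = 100) -> list:
    """Call sampler() n times and collect (index, error) for each failure."""
    failures = []
    for i in range(n):
        try:
            sampler()
        except Exception as exc:  # broad on purpose: this is a smoke test
            failures.append((i, repr(exc)))
    return failures


# Stand-in sampler for illustration: fails on every 40th call.
calls = {"count": 0}


def flaky_sampler():
    calls["count"] += 1
    if calls["count"] % 40 == 0:
        raise RuntimeError("sampling failed")


failures = smoke_test(flaky_sampler, n=100)
print(len(failures), "failures at indices", [i for i, _ in failures])
```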
