# Unified LLM Generation Pipeline

This notebook provides an interface to the unified generation pipeline.
The pipeline logic lives in `generation.py`; this notebook is for interactive testing, batch runs, and prompt experiments.

The pipeline handles both Paleolithic and Holocene people automatically based on the sampled birth year.
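The era routing described above amounts to a single branch on birth year. The following is a minimal sketch, assuming negative years mean BCE and a cutoff at the start of the Holocene (about 11,700 years before 2000 CE). `HOLOCENE_START_YEAR` and `classify_era` are hypothetical names, not the API of `generation.py`, whose actual cutoff and year convention may differ.

```python
# Hypothetical cutoff: approximate start of the Holocene (~11,700 years before 2000 CE).
# Negative years are BCE here; generation.py may use a different convention or boundary.
HOLOCENE_START_YEAR = -9700


def classify_era(birth_year: int) -> str:
    """Return the era label used to pick a prompt family, based only on birth year."""
    return "holocene" if birth_year >= HOLOCENE_START_YEAR else "paleolithic"


print(classify_era(-30000))  # paleolithic
print(classify_era(1850))    # holocene
```

Keeping the decision in one function means every later step can branch on the same label rather than re-deriving it from the year.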

In [1]:
import copy

import dill
from tqdm import tqdm

from generation import (
    generate_person,
    generate_batch,
    generate_batch_parallel,
    # Individual steps if needed
    generate_geography,
    generate_demographics,
    generate_structured_incidents,
    generate_historical_context,
    generate_name,
    generate_narrative_plan,
    generate_narrative,
    quality_check,
    reset_to_stage,
)
from llm_utils import GenerationContext, extract_json
from person import sample_person, Person

# Batch Generation

In [2]:
'''# cost prio, $5.03 (prev $1.24)
test_people_2 = generate_batch_parallel(n=20, model="gpt-5.2", workers=20)'''
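The disabled cell above runs 20 generations on 20 workers. Because each person's generation spends most of its time waiting on API calls, a thread pool is one plausible way to parallelize it. This is a sketch of that approach only, not the implementation in `generation.py`; `generate_batch_parallel_sketch` and `fake_generate_person` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def generate_batch_parallel_sketch(generate_one, n, workers=4, **kwargs):
    """Call generate_one(**kwargs) n times on a thread pool; return results in order.

    Hypothetical stand-in for generation.generate_batch_parallel, which may handle
    errors, retries, and cost tracking differently.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Threads suit I/O-bound work such as waiting on model API responses.
        futures = [pool.submit(generate_one, **kwargs) for _ in range(n)]
        # .result() blocks until each call finishes and re-raises any worker exception.
        return [f.result() for f in futures]


def fake_generate_person(model):
    # Placeholder for a real per-person generation call.
    return {"model": model}


people = generate_batch_parallel_sketch(fake_generate_person, n=5, workers=2, model="gpt-5.2")
print(len(people))  # 5
```

With `workers` equal to `n`, as in the disabled cell, every request is in flight at once; a smaller pool trades speed for gentler rate-limit behavior.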

In [3]:
'''with open('test_examples_2.pkl', 'wb') as f:
    dill.dump(test_people_2 + olds, f)
%run export.py test_examples_2.pkl'''

In [4]:
with open('test_examples_2.pkl', 'rb') as f:
    examples = dill.load(f)
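The save and load cells follow the standard pickle round trip: `dill.dump`/`dill.load` mirror `pickle.dump`/`pickle.load`, and dill additionally serializes objects such as lambdas and closures, which may be why the notebook uses it. The sketch below uses the standard-library `pickle` so it runs without dill installed; `FakePerson`, `save_people`, and `load_people` are hypothetical stand-ins, not part of `person.py`.

```python
import io
import pickle
from dataclasses import dataclass, field


@dataclass
class FakePerson:
    """Hypothetical stand-in for person.Person with only a few fields."""
    name: str
    narrative: str = ""
    messages: list = field(default_factory=list)


def save_people(people, f):
    # dill.dump(people, f) has the same call shape.
    pickle.dump(people, f)


def load_people(f):
    # dill.load(f) has the same call shape.
    return pickle.load(f)


buf = io.BytesIO()  # in-memory file in place of test_examples_2.pkl
save_people([FakePerson("Ama"), FakePerson("Tiro", narrative="draft")], buf)
buf.seek(0)
restored = load_people(buf)
print([p.name for p in restored])  # ['Ama', 'Tiro']
```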

# Narrative Prompt Experimentation (V2)

In [25]:
NARRATIVE_BASE_PROMPT_V2 = '''Write a narrative biography for this historical person.

TASK:
- Use the provided name
- Weave demographic details in naturally; you don't need to include everything
- Give names to recurring people (family, spouse, close associates); minor one-off figures can remain unnamed
- Avoid anachronisms

VOICE:
- Plain contemporary English, no subheadings
- Omniscient narrator: state facts confidently, no hedging ("likely", "probably", "perhaps", "may have")
- When data presents options or ranges, make concrete choices
- Vary sentence rhythm. Mix lengths. Fragments sometimes.

TIME:
- Write as continuous narrative, not discrete blocks, following chronological order
- Vary how you mark time passing: ages, seasons, life events, relative time, or dates. For people born before 1000 BC, prefer ages and relative time over absolute dates.
- Discrete events should be assigned to specific times. If the narrative plan is not specific, choose an option and make the narrative specific.
- Vary paragraph openings. Avoid starting too many paragraphs with temporal phrases, and avoid redundant temporal phrases

PROSE STYLE:
- Write actively and directly. State facts plainly and concretely.
- Avoid passive, abstract, or distanced descriptions.
- Include specific details so the reader can see what you mean.
- No figurative language (no metaphors, similes, or personification)
- No archaic inversions, poetic flourishes, or proverbs
- State what happened. Do not comment on it, interpret it, or frame it poetically.
- Do not write lines that exist to sound wise or poignant. If a sentence is reaching for literary effect, cut it or replace it with a plain one.
- Describe what was there, not what wasn't. Avoid defining people or situations by negatives.

HISTORICAL INTEGRATION:
- Include historical framing throughout so readers can follow
- Assume an intelligent reader who can look things up but isn't a specialist
- Early in the narrative, briefly orient the reader to the political and cultural situation: what polity or power structure governed this area, what ethnic/linguistic group the person belonged to, and how that world related to larger historical forces. Keep it short and concrete—a sentence or two, not a paragraph of background.

PERSONALITY:
- Show traits through action, not summary
- Do not name personality traits. Show the behavior and let the reader infer the trait.
- Don't soften negative traits—low agreeableness causes friction, low conscientiousness causes real failures, low intelligence shows in limited understanding and poor decision-making

AVOID THESE PHRASES:
"life went on", "work continued", "people remembered", "was known for", "in those days", "as was common", "like so many", "he suffered", "it was not X, it was Y", "A did not do X. A did Y", "no kings ruled"
'''

HUMAN_PARTICULARITY_PROMPT_V2 = '''
HUMAN PARTICULARITY:
Include 2-4 specific human details that bring the person to life:
- Friendships: Not just family/spouse, but people they chose to spend time with, who made them laugh or who they trusted
- Small pleasures: What they enjoyed - a particular food, a time of day, a seasonal activity, hobbies, stories they told, gambling, singing, a place they liked to sit
- Habits or routines: Morning rituals, how they did their work, where they went when troubled
- Things that annoyed them or they avoided
- Sources of quiet pride or satisfaction
- Humor: Moments of teasing, jokes, laughter
These details should feel plausible for the time, place, and personality, but must be made specific and idiosyncratic.
'''

AGE_PROMPTS_V2 = {
    "infant": """
LENGTH: 150-300 words

FOCUS:
- The infant cannot express personality—focus on parents, household, circumstances
- Can be told entirely from the parents' perspective
- A few vivid details about the household and the infant's brief life
""",

    "child": """
LENGTH: 200-400 words

FOCUS:
- Personality can show in limited, age-appropriate ways (a habit, a preference, how they played)
- Focus on a few vivid moments rather than a full arc
- Show the household and family context
""",

    "adolescent": """
LENGTH: 400-700 words

FOCUS:
- Show emerging adult roles and relationships
- Personality should be visible through behavior and choices
- Include family dynamics and any work or responsibilities
""",

    "adult": """
LENGTH: 600-1000 words

FOCUS:
- Mosaic of everyday episodes
- Include relationships and family changes
- Show work, community, and how they navigated their world
"""
}

ALIVE_PROMPT_V2 = """
ENDING:
- End in an ordinary moment, not on a cliffhanger or dramatic note
- The narrative must end no later than late 2025—do not project events into the future
- End with a present-tense snapshot of their current life
"""

DEAD_PROMPT_V2 = """
ENDING:
- Include the death concretely
- Do not dwell on aftermath or sentimentalize
- Avoid: "breathing slowed", "fever rose/burned", "eyes closed", "slipped away", "grew weaker", "stopped breathing"
"""

In [26]:
def _age_category_v2(person):
    # Living people are always bucketed as adults (30 is a stand-in age); only age_at_death is read.
    age = person.age_at_death if person.age_at_death != "alive" else 30
    if age < 3: return "infant"
    if age < 11: return "child"
    if age < 19: return "adolescent"
    return "adult"

def generate_narrative_v2(person, ctx):
    """Build the V2 narrative prompt for `person`, call the model, and store the reply.

    The prompt and the reply are appended to `person.messages`, so the model sees the
    earlier pipeline steps (including any narrative plan) as conversation history.
    """
    age_cat = _age_category_v2(person)
    
    # Base rules, plus human-particularity details for adolescents and adults only
    full_prompt = NARRATIVE_BASE_PROMPT_V2
    if age_cat in ["adolescent", "adult"]:
        full_prompt += HUMAN_PARTICULARITY_PROMPT_V2
    full_prompt += AGE_PROMPTS_V2[age_cat]
    full_prompt += ALIVE_PROMPT_V2 if person.is_alive() else DEAD_PROMPT_V2
    
    # Optional inputs from earlier stages; each block is skipped if missing or empty
    if hasattr(person, 'narrative_plan') and person.narrative_plan:
        full_prompt += "\n\nIMPORTANT: Follow the narrative plan you created earlier. Use the exact timeline for siblings, children, and incidents. The plan ensures temporal consistency."
    
    # Incidents and historical context are both read as dicts with 'event' and 'timing' keys
    if hasattr(person, 'structured_incidents') and person.structured_incidents:
        incidents_str = "\n".join(f"- {e.get('event', '')} ({e.get('timing', 'unknown')})"
                                  for e in person.structured_incidents)
        full_prompt += "\n\nPersonal incidents to incorporate:\n" + incidents_str
    
    if hasattr(person, 'historical_context') and person.historical_context:
        context_str = "\n".join(f"- {e.get('event', '')} ({e.get('timing', 'unknown')})"
                                for e in person.historical_context)
        full_prompt += "\n\nHistorical context to incorporate:\n" + context_str
    
    person.messages.append({"role": "user", "content": full_prompt})
    person.narrative = ctx.call(person.messages)
    person.messages.append({"role": "assistant", "content": person.narrative})
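Each `AGE_PROMPTS_V2` entry states a target length, but `generate_narrative_v2` does not check the reply against it. A quick post-generation check can parse the range back out of the prompt text and count words. This is a sketch; `AGE_LENGTHS`, `parse_length_range`, `check_length`, and the 10% `slack` tolerance are all assumptions, not part of the pipeline.

```python
import re

# Copies of the LENGTH lines from AGE_PROMPTS_V2; the full prompts parse the same way.
AGE_LENGTHS = {
    "infant": "LENGTH: 150-300 words",
    "child": "LENGTH: 200-400 words",
    "adolescent": "LENGTH: 400-700 words",
    "adult": "LENGTH: 600-1000 words",
}


def parse_length_range(prompt):
    """Extract (low, high) from a 'LENGTH: low-high words' line."""
    m = re.search(r"LENGTH:\s*(\d+)\s*-\s*(\d+)\s*words", prompt)
    if m is None:
        raise ValueError("no LENGTH line found")
    return int(m.group(1)), int(m.group(2))


def check_length(narrative, age_cat, prompts=AGE_LENGTHS, slack=0.1):
    """Return (word_count, within_range), tolerating `slack` as a fraction beyond either bound."""
    low, high = parse_length_range(prompts[age_cat])
    n = len(narrative.split())
    return n, low * (1 - slack) <= n <= high * (1 + slack)


print(check_length("word " * 250, "infant"))  # (250, True)
print(check_length("word " * 50, "adult"))    # (50, False)
```

In the notebook the call would be `check_length(person.narrative, _age_category_v2(person), AGE_PROMPTS_V2)`.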

In [None]:
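for person in examples[:5]:
    # Roll the person back to the 'narrative' stage before regenerating with the V2 prompt
    reset_to_stage(person, 'narrative')
    
    # A fresh context per person, so each cost summary covers a single narrative
    ctx = GenerationContext(model="gpt-5.2", quiet=False, show_cost=True)
    generate_narrative_v2(person, ctx)
    ctx.finish()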


=== Cost Summary: gpt-5.2 ===
Requests: 1
Input tokens: 15,399
  Cached: 13,952 (90.6%)
Output tokens: 1,794
Total cost: $0.0215
Avg per request: 15399 in, 1794 out

=== Cost Summary: gpt-5.2 ===
Requests: 1
Input tokens: 17,204
  Cached: 14,976 (87.0%)
Output tokens: 1,981
Total cost: $0.0245
Avg per request: 17204 in, 1981 out

=== Cost Summary: gpt-5.2 ===
Requests: 1
Input tokens: 16,572
  Cached: 14,976 (90.4%)
Output tokens: 1,924
Total cost: $0.0231
Avg per request: 16572 in, 1924 out

=== Cost Summary: gpt-5.2 ===
Requests: 1
Input tokens: 6,643
  Cached: 5,760 (86.7%)
Output tokens: 292
Total cost: $0.0047
Avg per request: 6643 in, 292 out
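The four summaries above report per-run figures only. Totaling them shows the overall spend and cache share for this experiment; the numbers below are copied from those summaries, and the `runs` structure is just an illustrative container.

```python
# Figures copied from the four cost summaries printed above.
runs = [
    {"input": 15_399, "cached": 13_952, "output": 1_794, "cost": 0.0215},
    {"input": 17_204, "cached": 14_976, "output": 1_981, "cost": 0.0245},
    {"input": 16_572, "cached": 14_976, "output": 1_924, "cost": 0.0231},
    {"input": 6_643, "cached": 5_760, "output": 292, "cost": 0.0047},
]

total_in = sum(r["input"] for r in runs)
total_cached = sum(r["cached"] for r in runs)
total_out = sum(r["output"] for r in runs)
total_cost = sum(r["cost"] for r in runs)

# Cached share is cached input tokens divided by all input tokens.
print(f"input {total_in:,} tokens, {total_cached / total_in:.1%} cached, output {total_out:,} tokens")
print(f"total ${total_cost:.4f} across {len(runs)} runs")
```

Across these runs about 89% of input tokens (49,664 of 55,818) were served from cache, for a combined cost of $0.0738.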


In [None]:
with open('test_examples_3.pkl', 'wb') as f:
    dill.dump(examples, f)
%run export.py test_examples_3.pkl