# 01 - Single Agent from Personal Data

Can we create an LLM persona from personal data that responds to surveys like the real person?

This notebook tests the idea with a simple approach:
1. Load personal context (CV, writing samples, etc.)
2. Create a persona from that context
3. Ask it questions via a survey
4. Check if the answers match reality (using interactive widgets)
5. Test with fresh questions (blind comparison)
6. Compare CV-based persona vs free-form text persona
7. Test response consistency (same question, different phrasings)

In [1]:
# Setup - ensure we can import from src
import sys
sys.path.insert(0, '../src')

from centuria.models import Persona, Question, Survey
from centuria.persona import create_persona
from centuria.survey import ask_question, run_survey, estimate_survey_cost

## Step 1: Load Context

The persona needs information about the person; the more relevant the context, the better the responses should be. Let's try with my CV:

In [2]:
from centuria.data import load_files

my_context = load_files(['../data/personal/cv.pdf'])

In [3]:
# Create the persona
persona = create_persona(
    name="My Persona",
    context=my_context
)

print(f"Created persona: {persona.name}")
print(f"Context length: {len(persona.context.split())} words")

Created persona: My Persona
Context length: 1149 words


## Understanding the Prompts

Before asking questions, let's see what prompts are being sent to the LLM. This is where prompt engineering improvements can be made to create more accurate personas.

In [4]:
from centuria.survey import (
    SYSTEM_TEMPLATE,
    USER_TEMPLATE_SINGLE_SELECT,
    USER_TEMPLATE_OPEN_ENDED,
    build_system_prompt,
    build_user_prompt,
)

print("=" * 60)
print("SYSTEM PROMPT TEMPLATE")
print("=" * 60)
print(SYSTEM_TEMPLATE)
print("\n")
print("=" * 60)
print("USER PROMPT TEMPLATE (Single Select)")
print("=" * 60)
print(USER_TEMPLATE_SINGLE_SELECT)
print("\n")
print("=" * 60)
print("USER PROMPT TEMPLATE (Open Ended)")
print("=" * 60)
print(USER_TEMPLATE_OPEN_ENDED)

SYSTEM PROMPT TEMPLATE
You are role-playing as {name}. Answer all questions as this person would, based on the context provided.

<context>
{context}
</context>

Guidelines:
- Respond authentically as this person based on the context
- Draw on the context to inform your answers, preferences, and opinions
- If the context doesn't cover something, make reasonable inferences consistent with what you know about this person
- Stay in character throughout


USER PROMPT TEMPLATE (Single Select)
Question: {question}

Options: {options}

Reply with ONLY the option you choose, nothing else.


USER PROMPT TEMPLATE (Open Ended)
Question: {question}

Provide a brief response.


### Example: What the LLM actually sees

Here's what the filled-in prompts look like for this persona. The system prompt contains the persona context, while the user prompt contains just the question.

**Areas for optimization:**
- System prompt: Role-playing instructions, context formatting, guidelines for handling gaps
- User prompt: Question framing, response format instructions
- Context: What personal data to include, how to structure it

In [5]:
# Example question to demonstrate the prompts
example_q = Question(
    id="example",
    text="Which programming language do you prefer?",
    question_type="single_select",
    options=["Python", "JavaScript", "Rust", "Go"]
)

system_prompt = build_system_prompt(persona)
user_prompt = build_user_prompt(example_q)

print("=" * 60)
print("SYSTEM PROMPT (sent once per conversation)")
print("=" * 60)
print(system_prompt[:500] + "..." if len(system_prompt) > 500 else system_prompt)
print(f"\n[... {len(system_prompt)} total characters ...]")
print("\n")
print("=" * 60)
print("USER PROMPT (sent for each question)")
print("=" * 60)
print(user_prompt)

SYSTEM PROMPT (sent once per conversation)
You are role-playing as My Persona. Answer all questions as this person would, based on the context provided.

<context>
SEAN GREAVES Email: seanwgreaves@gmail.com 
GitHub: github.com/ribenamaplesyrup 
Portfolio: seangreaves.xyz 
EMPLOYMENT  
 
APPLIED AI ENGINEER - THE AUTONOMY INSTITUTE (MAY 2023 - PRESENT) 
A leading think-tank developing data-driven tools for sustainable economic planning  
• Led the institute's strategic exploration of generative AI, establishing a new applied AI research c...

[... 8625 total characters ...]


USER PROMPT (sent for each question)
Question: Which programming language do you prefer?

Options: Python, JavaScript, Rust, Go

Reply with ONLY the option you choose, nothing else.


## Step 2: Ask Single Questions

Test the persona with individual questions before running a full survey.

In [6]:
# Single select question
q1 = Question(
    id="q1",
    text="Which programming language do you prefer?",
    question_type="single_select",
    options=["Python", "JavaScript", "Rust", "Go"]
)

response = await ask_question(persona, q1)
print(f"Question: {q1.text}")
print(f"Response: {response.response}")

Question: Which programming language do you prefer?
Response: Python


Now what about an open-ended question?

In [7]:
# Open-ended question
q2 = Question(
    id="q2",
    text="What motivates you in your work? Answer in less than 20 words.",
    question_type="open_ended"
)

response = await ask_question(persona, q2)
print(f"Question: {q2.text}")
print(f"Response: {response.response}")

Question: What motivates you in your work? Answer in less than 20 words.
Response: Pioneering AI applications to drive impactful societal change and enhance transparency fuels my passion and motivation.


A crucial piece of data we've skipped over here is **cost**. It's one of the key components of the business case for using AI personas rather than human respondents. If the cost were very high, we might opt for humans, but here it is very low:

In [8]:
print(f"\n--- Cost ---")
print(f"Tokens: {response.prompt_tokens:,} prompt + {response.completion_tokens:,} completion")
print(f"Cost: ${response.cost:.6f}")


--- Cost ---
Tokens: 1,808 prompt + 19 completion
Cost: $0.002630


However, it's perhaps more useful to estimate how much a query will cost before we send it. That way we can scope the cost of running a survey on a specific number of agents.

In [9]:
# Estimate cost for a single question BEFORE sending it
from centuria.llm import estimate_cost

system = build_system_prompt(persona)
user = build_user_prompt(q2)

estimate = estimate_cost(user, system=system, estimated_completion_tokens=20)

print("COST ESTIMATE (before sending)")
print("="*60)
print(f"Prompt tokens:               {estimate.prompt_tokens:,}")
print(f"Est. completion tokens:      {estimate.completion_tokens:,}")
print(f"Est. cost:                   ${estimate.cost:.6f}")

# Now actually run it and compare
actual_response = await ask_question(persona, q2)

print(f"\nACTUAL COST (after sending)")
print("="*60)
print(f"Prompt tokens:               {actual_response.prompt_tokens:,}")
print(f"Completion tokens:           {actual_response.completion_tokens:,}")
print(f"Actual cost:                 ${actual_response.cost:.6f}")

COST ESTIMATE (before sending)
Prompt tokens:               1,808
Est. completion tokens:      20
Est. cost:                   $0.004760

ACTUAL COST (after sending)
Prompt tokens:               1,808
Completion tokens:           16
Actual cost:                 $0.002440


**Why is the actual cost lower than the estimate?**

The estimate assumes full price for all prompt tokens, but some API providers offer **prompt caching** server-side (OpenAI applies it automatically to sufficiently long prompts, while Anthropic typically requires opting in). Since the same system prompt (your CV context) was already sent earlier in this notebook, a likely explanation is that the provider cached that prefix and charged a reduced rate for the repeated tokens. The estimate also budgets 20 completion tokens, whereas the actual response used 16.

This is good news for the use case: running the same persona against many questions means the context can be cached, so each additional question can cost less than the first.

**Note:** This is different from LiteLLM's local caching, which can return identical results at zero cost if you re-run the exact same prompt. The provider-side caching still makes API calls but at reduced input token rates.
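
The effect of a cached prefix on cost can be sketched as simple arithmetic. The prices and the cached-token discount below are made-up placeholders, not any provider's real rates, and `estimate_request_cost` is a hypothetical helper rather than centuria's `estimate_cost`.

```python
def estimate_request_cost(
    prompt_tokens: int,
    completion_tokens: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
    cached_tokens: int = 0,
    cached_discount: float = 0.5,
) -> float:
    """Return the cost in dollars of one request.

    cached_tokens is the part of the prompt billed at a reduced rate;
    cached_discount is the fraction of the input price charged for it.
    """
    if not 0 <= cached_tokens <= prompt_tokens:
        raise ValueError("cached_tokens must be between 0 and prompt_tokens")
    uncached = prompt_tokens - cached_tokens
    billable_input = uncached + cached_tokens * cached_discount
    input_cost = billable_input * input_price_per_mtok / 1_000_000
    output_cost = completion_tokens * output_price_per_mtok / 1_000_000
    return input_cost + output_cost


# Placeholder prices (dollars per million tokens), chosen only for illustration.
full_price = estimate_request_cost(
    1808, 20, input_price_per_mtok=2.0, output_price_per_mtok=8.0
)
with_cache = estimate_request_cost(
    1808, 20, input_price_per_mtok=2.0, output_price_per_mtok=8.0, cached_tokens=1024
)
print(f"No caching:   ${full_price:.6f}")
print(f"With caching: ${with_cache:.6f}")
```

An estimator that ignores caching corresponds to `cached_tokens=0`, which is why it acts as an upper bound on the input cost.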

## Step 3: Run a Survey

Run multiple questions together as a survey.

In [10]:
# Define the survey
mini_survey = Survey(
    id="persona_validation",
    name="Persona Validation Survey",
    questions=[
        # === CV-INFERABLE QUESTIONS ===
        Question(
            id="career_priority",
            text="What matters most to you in a job?",
            question_type="single_select",
            options=["Impact on society", "Financial compensation", "Learning opportunities", "Work-life balance", "Autonomy and creativity"]
        ),
        Question(
            id="tech_stance",
            text="How do you feel about AI's impact on employment?",
            question_type="single_select",
            options=["Net positive - creates more jobs than it destroys", "Net negative - mass displacement is coming", "Neutral - it will transform jobs but balance out", "Too early to tell"]
        ),
        Question(
            id="work_style",
            text="How do you prefer to approach complex problems?",
            question_type="single_select",
            options=["Deep solo research then collaborate", "Immediate team brainstorming", "Build a prototype first, discuss later", "Map out the theory before any implementation"]
        ),
        Question(
            id="industry_interest",
            text="Which sector would you most like to work in?",
            question_type="single_select",
            options=["Climate/sustainability tech", "Healthcare/biotech", "Finance/fintech", "Government/public sector", "Pure research/academia"]
        ),
        Question(
            id="skill_development",
            text="If you had 3 months to learn anything, what would you prioritise?",
            question_type="single_select",
            options=["Technical depth (new frameworks, languages)", "Domain expertise (economics, biology, etc.)", "Leadership and management skills", "Creative skills (writing, design)", "Starting a business"]
        ),
        
        # === CULTURAL/POLITICAL QUESTIONS ===
        Question(
            id="ubi_stance",
            text="What's your view on Universal Basic Income?",
            question_type="single_select",
            options=["Strongly support - essential for the future", "Cautiously support - worth piloting", "Skeptical - prefer targeted interventions", "Oppose - undermines work incentives"]
        ),
        Question(
            id="privacy_tradeoff",
            text="How do you feel about trading personal data for free services?",
            question_type="single_select",
            options=["Acceptable - fair exchange", "Uncomfortable but unavoidable", "Strongly oppose - privacy is sacred", "Depends entirely on what data and what service"]
        ),
        Question(
            id="institutions_trust",
            text="Which institution do you trust most to act in the public interest?",
            question_type="single_select",
            options=["National government", "Local government", "Large corporations", "Non-profits and NGOs", "Academic institutions", "None of the above"]
        ),
        Question(
            id="optimism_future",
            text="How do you feel about the next 20 years for humanity?",
            question_type="single_select",
            options=["Very optimistic - best time to be alive", "Cautiously optimistic - progress will continue", "Anxious - major challenges ahead", "Pessimistic - decline seems likely"]
        ),
        Question(
            id="controversy_take",
            text="Which controversial opinion are you most sympathetic to?",
            question_type="single_select",
            options=["Most meetings should be emails", "Remote work is strictly better than office work", "Cryptocurrency has no legitimate use case", "Social media does more harm than good", "Economic growth should not be the primary goal"]
        ),
    ]
)

# Estimate cost for the full survey
survey_estimate = estimate_survey_cost(persona, mini_survey, num_agents=1)

print("SURVEY COST ESTIMATE")
print("="*60)
print(f"Questions:                   {len(mini_survey.questions)}")
print(f"Prompt tokens per agent:     {survey_estimate.prompt_tokens:,}")
print(f"Est. completion tokens:      {survey_estimate.completion_tokens:,}")
print(f"Est. cost per agent:         ${survey_estimate.cost_per_agent:.4f}")
print()
print("Scale projections:")
for n in [10, 100, 1000]:
    scaled = estimate_survey_cost(persona, mini_survey, num_agents=n)
    print(f"  {n:>5} agents: ${scaled.total_cost:.2f}")

SURVEY COST ESTIMATE
Questions:                   10
Prompt tokens per agent:     18,417
Est. completion tokens:      50
Est. cost per agent:         $0.0469

Scale projections:
     10 agents: $0.47
    100 agents: $4.69
   1000 agents: $46.92


In [11]:
# Run the survey (this is where the API calls happen)
survey_response = await run_survey(persona, mini_survey)

print(f"Survey: {mini_survey.name}")
print(f"\n{'='*60}")
print("CV-INFERABLE QUESTIONS")
print('='*60)
for r in survey_response.responses[:5]:
    q = next(q for q in mini_survey.questions if q.id == r.question_id)
    print(f"\nQ: {q.text}")
    print(f"A: {r.response}")

print(f"\n{'='*60}")
print("CULTURAL/POLITICAL QUESTIONS (harder to infer)")
print('='*60)
for r in survey_response.responses[5:]:
    q = next(q for q in mini_survey.questions if q.id == r.question_id)
    print(f"\nQ: {q.text}")
    print(f"A: {r.response}")

print(f"\n{'='*60}")
print("ACTUAL COST")
print('='*60)
print(f"Total tokens: {survey_response.total_tokens:,}")
print(f"Total cost:   ${survey_response.total_cost:.4f}")

Survey: Persona Validation Survey

CV-INFERABLE QUESTIONS

Q: What matters most to you in a job?
A: Impact on society

Q: How do you feel about AI's impact on employment?
A: Neutral - it will transform jobs but balance out

Q: How do you prefer to approach complex problems?
A: Deep solo research then collaborate

Q: Which sector would you most like to work in?
A: Climate/sustainability tech

Q: If you had 3 months to learn anything, what would you prioritise?
A: Domain expertise (economics, biology, etc.)

CULTURAL/POLITICAL QUESTIONS (harder to infer)

Q: What's your view on Universal Basic Income?
A: Cautiously support - worth piloting

Q: How do you feel about trading personal data for free services?
A: Depends entirely on what data and what service

Q: Which institution do you trust most to act in the public interest?
A: Non-profits and NGOs

Q: How do you feel about the next 20 years for humanity?
A: Cautiously optimistic - progress will continue

Q: Which controversial opinion ar

## Step 4: Check Accuracy

Compare the persona's answers to your actual answers using the interactive widgets below.
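
Underneath the widgets, the comparison is an exact string match per question, grouped by category. A stripped-down sketch of that scoring, without any widgets, is shown here. `score_by_category` and the sample answers are hypothetical, and the whitespace and case normalisation is an addition that the notebook cell does not apply.

```python
def normalise(answer: str) -> str:
    """Trim whitespace and lowercase so trivial formatting differences still match."""
    return answer.strip().lower()


def score_by_category(
    persona_answers: dict[str, str],
    actual_answers: dict[str, str],
    categories: dict[str, list[str]],
) -> dict[str, tuple[int, int]]:
    """Return {category: (correct, total)} over questions present in both answer sets."""
    scores = {}
    for category, question_ids in categories.items():
        answered = [
            qid for qid in question_ids
            if qid in persona_answers and qid in actual_answers
        ]
        correct = sum(
            normalise(persona_answers[qid]) == normalise(actual_answers[qid])
            for qid in answered
        )
        scores[category] = (correct, len(answered))
    return scores


# Made-up answers, purely to show the shape of the data.
sample_scores = score_by_category(
    persona_answers={"career_priority": "Impact on society ", "ubi_stance": "Oppose"},
    actual_answers={"career_priority": "Impact on society", "ubi_stance": "Strongly support"},
    categories={"cv": ["career_priority"], "cultural": ["ubi_stance"]},
)
for category, (correct, total) in sample_scores.items():
    pct = f"{correct / total:.0%}" if total else "n/a"
    print(f"{category}: {correct}/{total} ({pct})")
```

Normalising before comparing guards against a model reply that differs from the option text only by stray whitespace or capitalisation.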

In [12]:
import ipywidgets as widgets
from IPython.display import display, clear_output

# Build question lookup and persona answers lookup
question_lookup = {q.id: q for q in mini_survey.questions}
persona_answers = {r.question_id: r.response for r in survey_response.responses}

# Create dropdown widgets for each question
answer_widgets = {}
widget_containers = []

cv_questions = ["career_priority", "tech_stance", "work_style", "industry_interest", "skill_development"]

# CV-inferable questions section
cv_header = widgets.HTML("<h3>CV-Inferable Questions</h3>")
widget_containers.append(cv_header)

for qid in cv_questions:
    q = question_lookup[qid]
    label = widgets.HTML(f"<b>{q.text}</b>")
    dropdown = widgets.Dropdown(
        options=["-- Select your answer --"] + q.options,
        value="-- Select your answer --",
        layout=widgets.Layout(width='400px')
    )
    answer_widgets[qid] = dropdown
    widget_containers.append(widgets.VBox([label, dropdown], layout=widgets.Layout(margin='0 0 15px 0')))

# Cultural/political questions section
cultural_questions = ["ubi_stance", "privacy_tradeoff", "institutions_trust", "optimism_future", "controversy_take"]
cultural_header = widgets.HTML("<h3>Cultural/Political Questions</h3>")
widget_containers.append(cultural_header)

for qid in cultural_questions:
    q = question_lookup[qid]
    label = widgets.HTML(f"<b>{q.text}</b>")
    dropdown = widgets.Dropdown(
        options=["-- Select your answer --"] + q.options,
        value="-- Select your answer --",
        layout=widgets.Layout(width='400px')
    )
    answer_widgets[qid] = dropdown
    widget_containers.append(widgets.VBox([label, dropdown], layout=widgets.Layout(margin='0 0 15px 0')))

# Submit button and output area
submit_button = widgets.Button(
    description="Check Accuracy",
    button_style='primary',
    layout=widgets.Layout(width='150px', margin='20px 0')
)
output_area = widgets.Output()

def on_submit(b):
    with output_area:
        clear_output()
        
        # Collect answers
        my_answers = {}
        unanswered = []
        for qid, w in answer_widgets.items():
            if w.value != "-- Select your answer --":
                my_answers[qid] = w.value
            else:
                unanswered.append(question_lookup[qid].text)
        
        if unanswered:
            print("Please answer all questions before submitting.")
            print(f"\nUnanswered: {len(unanswered)} questions")
            return
        
        # Calculate accuracy
        cv_correct = 0
        cv_total = 0
        cultural_correct = 0
        cultural_total = 0
        
        print("COMPARISON: Persona vs Reality")
        print("=" * 60)
        
        for qid in cv_questions + cultural_questions:
            q = question_lookup[qid]
            persona_answer = persona_answers[qid]
            actual_answer = my_answers[qid]
            match = persona_answer == actual_answer
            
            if qid in cv_questions:
                cv_total += 1
                if match:
                    cv_correct += 1
            else:
                cultural_total += 1
                if match:
                    cultural_correct += 1
            
            status = "✓" if match else "✗"
            print(f"\n{status} {q.text}")
            print(f"   Persona: {persona_answer}")
            print(f"   Actual:  {actual_answer}")
        
        print(f"\n{'=' * 60}")
        print("ACCURACY SUMMARY")
        print("=" * 60)
        print(f"CV-inferable questions:  {cv_correct}/{cv_total} ({cv_correct/cv_total:.0%})")
        print(f"Cultural/political:      {cultural_correct}/{cultural_total} ({cultural_correct/cultural_total:.0%})")
        total = cv_correct + cultural_correct
        all_total = cv_total + cultural_total
        print(f"Overall:                 {total}/{all_total} ({total/all_total:.0%})")

submit_button.on_click(on_submit)

# Display everything
display(widgets.VBox(widget_containers + [submit_button, output_area]))


## Step 5: Fresh Questions

Let's test the persona with a new set of questions you haven't seen yet. Answer these yourself first, then run the persona and compare.

In [13]:
# Define fresh questions
fresh_questions = [
    Question(
        id="decision_making",
        text="When making important decisions, do you rely more on data or intuition?",
        question_type="single_select",
        options=["Strongly data-driven", "Mostly data with some intuition", "Equal balance", "Mostly intuition with some data", "Strongly intuition-driven"]
    ),
    Question(
        id="team_role",
        text="In a team project, which role do you naturally gravitate towards?",
        question_type="single_select",
        options=["The leader who coordinates", "The ideas person who innovates", "The executor who gets things done", "The analyst who evaluates options", "The communicator who keeps everyone aligned"]
    ),
    Question(
        id="learning_style",
        text="How do you prefer to learn a new technical skill?",
        question_type="single_select",
        options=["Read documentation thoroughly first", "Jump in and learn by doing", "Watch tutorials and videos", "Take a structured course", "Learn from a mentor or colleague"]
    ),
    Question(
        id="risk_appetite",
        text="How would you describe your appetite for career risk?",
        question_type="single_select",
        options=["Very risk-averse - stability is key", "Somewhat cautious - calculated risks only", "Moderate - willing to take reasonable chances", "Risk-tolerant - growth requires discomfort", "Risk-seeking - high risk, high reward"]
    ),
    Question(
        id="work_environment",
        text="What type of work environment brings out your best work?",
        question_type="single_select",
        options=["Quiet, focused solo time", "Collaborative open spaces", "Flexible mix of both", "High-pressure deadlines", "Autonomous with minimal oversight"]
    ),
    Question(
        id="tech_adoption",
        text="How quickly do you adopt new technologies or tools?",
        question_type="single_select",
        options=["Early adopter - try everything new", "Early majority - adopt once proven useful", "Late majority - wait until it's standard", "Laggard - only when absolutely necessary"]
    ),
    Question(
        id="conflict_resolution",
        text="How do you typically handle disagreements at work?",
        question_type="single_select",
        options=["Direct confrontation to resolve quickly", "Seek compromise and middle ground", "Avoid conflict, let things settle naturally", "Escalate to a third party if needed", "Use data and evidence to settle disputes"]
    ),
    Question(
        id="success_definition",
        text="How do you primarily define professional success?",
        question_type="single_select",
        options=["Impact and contribution to society", "Financial achievement and security", "Recognition and reputation in your field", "Personal growth and learning", "Work-life balance and wellbeing"]
    ),
]

fresh_survey = Survey(
    id="fresh_validation",
    name="Fresh Validation Survey",
    questions=fresh_questions
)

print(f"Created {len(fresh_questions)} fresh questions for validation")

Created 8 fresh questions for validation


### Your Answers First

Answer these questions yourself **before** seeing the persona's responses. This prevents bias from seeing what the AI chose.

In [14]:
# Create widgets for your answers (answer BEFORE running the persona)
fresh_answer_widgets = {}
fresh_widget_containers = []

fresh_header = widgets.HTML("<h3>Answer these questions as yourself:</h3>")
fresh_widget_containers.append(fresh_header)

for q in fresh_questions:
    label = widgets.HTML(f"<b>{q.text}</b>")
    dropdown = widgets.Dropdown(
        options=["-- Select your answer --"] + q.options,
        value="-- Select your answer --",
        layout=widgets.Layout(width='400px')
    )
    fresh_answer_widgets[q.id] = dropdown
    fresh_widget_containers.append(widgets.VBox([label, dropdown], layout=widgets.Layout(margin='0 0 15px 0')))

# Lock answers button
lock_button = widgets.Button(
    description="Lock My Answers",
    button_style='success',
    layout=widgets.Layout(width='150px', margin='20px 0')
)
lock_output = widgets.Output()

user_fresh_answers = {}

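def on_lock(b):
    with lock_output:
        clear_output()
        # Collect into a local dict first so a partial submission never
        # leaks into user_fresh_answers (the next cell checks that dict).
        answers = {}
        unanswered = []
        for qid, w in fresh_answer_widgets.items():
            if w.value == "-- Select your answer --":
                q = next(q for q in fresh_questions if q.id == qid)
                unanswered.append(q.text)
            else:
                answers[qid] = w.value
        
        if unanswered:
            print(f"Please answer all questions first. ({len(unanswered)} remaining)")
            return
        
        # All questions answered: publish the answers for the comparison cell
        user_fresh_answers.update(answers)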
        
        # Disable all dropdowns
        for w in fresh_answer_widgets.values():
            w.disabled = True
        lock_button.disabled = True
        
        print("✓ Answers locked! Now run the next cell to see the persona's responses.")

lock_button.on_click(on_lock)

display(widgets.VBox(fresh_widget_containers + [lock_button, lock_output]))


### Run the Persona

After locking your answers, run this cell to see how the persona responds to the same questions.

In [16]:
# Run the fresh survey with the persona
if not user_fresh_answers:
    print("Please lock your answers in the previous cell first!")
else:
    print("Running persona on fresh questions...")
    fresh_response = await run_survey(persona, fresh_survey)
    
    # Build comparison
    fresh_persona_answers = {r.question_id: r.response for r in fresh_response.responses}
    
    correct = 0
    total = len(fresh_questions)
    
    print("\nCOMPARISON: Fresh Questions")
    print("=" * 60)
    
    for q in fresh_questions:
        persona_ans = fresh_persona_answers[q.id]
        actual_ans = user_fresh_answers[q.id]
        match = persona_ans == actual_ans
        if match:
            correct += 1
        
        status = "✓" if match else "✗"
        print(f"\n{status} {q.text}")
        print(f"   Persona: {persona_ans}")
        print(f"   Actual:  {actual_ans}")
    
    print(f"\n{'=' * 60}")
    print("FRESH QUESTIONS ACCURACY")
    print("=" * 60)
    print(f"Correct: {correct}/{total} ({correct/total:.0%})")
    print(f"\nCost: ${fresh_response.total_cost:.4f}")

Running persona on fresh questions...

COMPARISON: Fresh Questions

✗ When making important decisions, do you rely more on data or intuition?
   Persona: Mostly data with some intuition
   Actual:  Equal balance

✓ In a team project, which role do you naturally gravitate towards?
   Persona: The ideas person who innovates
   Actual:  The ideas person who innovates

✗ How do you prefer to learn a new technical skill?
   Persona: Jump in and learn by doing
   Actual:  Watch tutorials and videos

✗ How would you describe your appetite for career risk?
   Persona: Moderate - willing to take reasonable chances
   Actual:  Somewhat cautious - calculated risks only

✓ What type of work environment brings out your best work?
   Persona: Flexible mix of both
   Actual:  Flexible mix of both

✗ How quickly do you adopt new technologies or tools?
   Persona: Early adopter - try everything new
   Actual:  Late majority - wait until it's standard

✗ How do you typically handle disagreements at work

## Results Summary

How well did the personas match reality?

**Key observations:**
- CV-inferable questions (career, work style, skills) should have higher accuracy since the context contains relevant information
- Cultural/political questions test whether the persona can extrapolate from limited context
- Fresh questions provide a blind test to avoid confirmation bias
- The CV vs free-form comparison tests whether structured data produces better personas than unstructured self-description
- Response consistency measures how stable the persona's answers are across different phrasings of the same question

See `01_notes.md` for detailed commentary on limitations and improvements.
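
One simple way to quantify response consistency is the share of phrasings whose answer agrees with the most common answer. This is a sketch of one possible metric, not necessarily the one this notebook uses; `consistency_score` and the sample responses are hypothetical.

```python
from collections import Counter


def consistency_score(responses: list[str]) -> float:
    """Fraction of responses that agree with the most common (modal) response."""
    if not responses:
        raise ValueError("responses must not be empty")
    counts = Counter(r.strip() for r in responses)
    _, modal_count = counts.most_common(1)[0]
    return modal_count / len(responses)


# Made-up persona answers to three phrasings of the same question.
sample_responses = ["Python", "Python", "Rust"]
print(f"Consistency: {consistency_score(sample_responses):.0%}")
```

A score of 1.0 means every phrasing produced the same answer; lower values indicate that the wording of the question is influencing the persona.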

## Step 6: Free-form Context Comparison

How much does structured context (like a CV) matter compared to free-form text? 

In this step, you'll write a brief description of yourself in your own words. We'll create a new persona from this text and compare its accuracy against the CV-based persona on the same questions.

**Hypothesis:** The CV provides structured, factual information that may lead to higher accuracy on career-related questions, while free-form text might capture personality and values better.

In [17]:
# Free-form text input widget
freeform_header = widgets.HTML("""
<h3>Describe yourself in your own words</h3>
<p style="color: #666;">Write a few paragraphs about yourself - your background, interests, values, 
work style, opinions, etc. Don't look at the survey questions while writing this. 
Aim for 100-300 words.</p>
""")

freeform_textarea = widgets.Textarea(
    placeholder='Write about yourself here...\n\nFor example:\n- What do you do for work?\n- What are your interests and passions?\n- What values guide your decisions?\n- How would friends describe you?\n- What are your views on technology, society, work?',
    layout=widgets.Layout(width='600px', height='250px')
)

word_count_label = widgets.HTML("<p><i>Word count: 0</i></p>")

def update_word_count(change):
    words = len(change['new'].split())
    word_count_label.value = f"<p><i>Word count: {words}</i></p>"

freeform_textarea.observe(update_word_count, names='value')

# Store the context when submitted
freeform_context = {"text": ""}

submit_freeform_button = widgets.Button(
    description="Create Persona",
    button_style='primary',
    layout=widgets.Layout(width='150px', margin='10px 0')
)
freeform_output = widgets.Output()

def on_submit_freeform(b):
    with freeform_output:
        clear_output()
        text = freeform_textarea.value.strip()
        
        if len(text.split()) < 20:
            print("Please write at least 20 words to create a meaningful persona.")
            return
        
        freeform_context["text"] = text
        freeform_textarea.disabled = True
        submit_freeform_button.disabled = True
        
        print(f"✓ Context saved! ({len(text.split())} words)")
        print("\nRun the next cell to create the persona and compare accuracy.")

submit_freeform_button.on_click(on_submit_freeform)

display(widgets.VBox([
    freeform_header,
    freeform_textarea,
    word_count_label,
    submit_freeform_button,
    freeform_output
]))


In [18]:
# Create persona from free-form text and run the survey
if not freeform_context["text"]:
    print("Please submit your free-form description in the previous cell first!")
else:
    # Create the free-form persona
    freeform_persona = create_persona(
        name="Free-form Persona",
        context=freeform_context["text"]
    )
    
    print(f"Created persona from free-form text")
    print(f"Context length: {len(freeform_persona.context.split())} words")
    print(f"(CV persona had: {len(persona.context.split())} words)")
    print("\nRunning survey on free-form persona...")
    
    # Run the same mini_survey on the free-form persona
    freeform_survey_response = await run_survey(freeform_persona, mini_survey)
    
    print("✓ Survey complete!")

Created persona from free-form text
Context length: 20 words
(CV persona had: 1149 words)

Running survey on free-form persona...
✓ Survey complete!


### Compare Accuracy: CV vs Free-form

Now let's compare how well each persona matched your actual answers. You'll need to have completed Step 4 first (where you submitted your actual answers).

In [20]:
# Compare CV persona vs Free-form persona accuracy
# First, collect the user's actual answers from the widgets in Step 4
user_actual_answers = {}
for qid, w in answer_widgets.items():
    if w.value != "-- Select your answer --":
        user_actual_answers[qid] = w.value

if not user_actual_answers:
    print("Please complete Step 4 first (submit your actual answers using the widgets).")
elif 'freeform_survey_response' not in globals():
    print("Please run the previous cell first to survey the free-form persona.")
else:
    # Build answer lookups
    cv_answers = {r.question_id: r.response for r in survey_response.responses}
    freeform_answers = {r.question_id: r.response for r in freeform_survey_response.responses}
    
    # Calculate accuracy for each
    cv_cv_correct = 0  # CV persona on CV-inferable questions
    cv_cultural_correct = 0
    ff_cv_correct = 0  # Free-form persona on CV-inferable questions  
    ff_cultural_correct = 0
    
    print("SIDE-BY-SIDE COMPARISON")
    print("=" * 80)
    print(f"{'Question':<35} {'Your Answer':<20} {'CV':<8} {'Free-form':<8}")
    print("-" * 80)
    
    all_questions = cv_questions + cultural_questions
    
    for qid in all_questions:
        if qid not in user_actual_answers:
            continue
            
        q = question_lookup[qid]
        actual = user_actual_answers[qid]
        cv_ans = cv_answers[qid]
        ff_ans = freeform_answers[qid]
        
        cv_match = cv_ans == actual
        ff_match = ff_ans == actual
        
        # Track scores
        if qid in cv_questions:
            if cv_match: cv_cv_correct += 1
            if ff_match: ff_cv_correct += 1
        else:
            if cv_match: cv_cultural_correct += 1
            if ff_match: ff_cultural_correct += 1
        
        cv_status = "✓" if cv_match else "✗"
        ff_status = "✓" if ff_match else "✗"
        
        # Truncate question text for display
        q_short = q.text[:33] + ".." if len(q.text) > 35 else q.text
        actual_short = actual[:18] + ".." if len(actual) > 20 else actual
        
        print(f"{q_short:<35} {actual_short:<20} {cv_status:<8} {ff_status:<8}")
    
    # Summary statistics
    cv_total_cv = len([q for q in cv_questions if q in user_actual_answers])
    cv_total_cultural = len([q for q in cultural_questions if q in user_actual_answers])
    
    print("\n" + "=" * 80)
    print("ACCURACY SUMMARY")
    print("=" * 80)
    print(f"{'Category':<25} {'CV Persona':<20} {'Free-form Persona':<20}")
    print("-" * 80)
    
    if cv_total_cv > 0:
        cv_pct = cv_cv_correct / cv_total_cv
        ff_pct = ff_cv_correct / cv_total_cv
        print(f"{'CV-inferable questions':<25} {cv_cv_correct}/{cv_total_cv} ({cv_pct:.0%}){'':<8} {ff_cv_correct}/{cv_total_cv} ({ff_pct:.0%})")
    
    if cv_total_cultural > 0:
        cv_pct = cv_cultural_correct / cv_total_cultural
        ff_pct = ff_cultural_correct / cv_total_cultural
        print(f"{'Cultural/political':<25} {cv_cultural_correct}/{cv_total_cultural} ({cv_pct:.0%}){'':<8} {ff_cultural_correct}/{cv_total_cultural} ({ff_pct:.0%})")
    
    total_cv = cv_cv_correct + cv_cultural_correct
    total_ff = ff_cv_correct + ff_cultural_correct
    total_q = cv_total_cv + cv_total_cultural
    
    if total_q > 0:
        print("-" * 80)
        cv_pct = total_cv / total_q
        ff_pct = total_ff / total_q
        print(f"{'OVERALL':<25} {total_cv}/{total_q} ({cv_pct:.0%}){'':<8} {total_ff}/{total_q} ({ff_pct:.0%})")
        
        # Verdict
        print("\n" + "=" * 80)
        print("VERDICT")
        print("=" * 80)
        if total_cv > total_ff:
            diff = total_cv - total_ff
            print(f"CV persona was MORE accurate by {diff} question(s)")
        elif total_ff > total_cv:
            diff = total_ff - total_cv
            print(f"Free-form persona was MORE accurate by {diff} question(s)")
        else:
            print("Both personas achieved the SAME accuracy!")
        
        print("\nContext comparison:")
        print(f"  CV context:        {len(persona.context.split()):,} words")
        print(f"  Free-form context: {len(freeform_persona.context.split()):,} words")

SIDE-BY-SIDE COMPARISON
Question                            Your Answer          CV       Free-form
--------------------------------------------------------------------------------
What matters most to you in a job?  Impact on society    ✓        ✓       
How do you feel about AI's impact.. Net negative - mas.. ✗        ✗       
How do you prefer to approach com.. Immediate team bra.. ✗        ✗       
Which sector would you most like .. Healthcare/biotech   ✗        ✗       
If you had 3 months to learn anyt.. Domain expertise (.. ✓        ✗       
What's your view on Universal Bas.. Strongly support -.. ✗        ✗       
How do you feel about trading per.. Acceptable - fair .. ✗        ✗       
Which institution do you trust mo.. National government  ✗        ✗       
How do you feel about the next 20.. Very optimistic - .. ✗        ✗       
Which controversial opinion are y.. Most meetings shou.. ✗        ✓       

ACCURACY SUMMARY
Category                  CV Persona           Free

## Step 7: Response Consistency Test

A reliable persona should give consistent answers when asked the same question in different ways. This step tests **response consistency**: for each base question, the share of rephrased versions on which the persona gives its most common answer.

We'll:
1. Define 10 base questions
2. Create 10 phrasings of each question (same meaning, different wording)
3. Run all 100 questions through the persona
4. Calculate consistency per question and overall

**Why this matters:** Low consistency suggests the persona's answers are sensitive to question wording rather than reflecting stable preferences. High consistency indicates robust, reliable responses.
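
As a small worked illustration of the metric (a sketch, not part of the `centuria` package; `consistency_score` is a hypothetical helper name), consistency is taken here to be the fraction of responses that equal the most common response. The sample answers mirror the 6/3/1 split reported for the team-role question in the results later in this notebook.

```python
from collections import Counter


def consistency_score(responses):
    """Fraction of responses equal to the most common (modal) response.

    Hypothetical helper for illustration only; it is not a centuria function.
    """
    if not responses:
        raise ValueError("responses must not be empty")
    # most_common(1) returns a list with one (response, count) pair
    _, modal_count = Counter(responses).most_common(1)[0]
    return modal_count / len(responses)


# Ten answers to ten phrasings of one base question
sample = (
    ["Collaborate as an equal member"] * 6
    + ["Lead and direct the team"] * 3
    + ["Focus on specialized expertise"]
)

print(f"{consistency_score(sample):.0%}")  # 6 of the 10 answers match the mode
```

With five options per question, the lowest score this measure can reach over ten phrasings is 20% (two answers per option), so a score near that floor would indicate answers that are close to evenly spread.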

In [21]:
# Define 10 base questions, each with 10 different phrasings
# All phrasings share the same options to enable direct comparison

consistency_questions = {
    "work_motivation": {
        "options": ["Money and financial security", "Making a positive impact", "Personal growth and learning", "Recognition and status", "Work-life balance"],
        "phrasings": [
            "What motivates you most in your career?",
            "What's the primary driver of your professional life?",
            "When choosing a job, what matters most to you?",
            "What gets you out of bed for work each morning?",
            "What's your main motivation at work?",
            "What do you value most in your professional life?",
            "What's the biggest factor in your career satisfaction?",
            "What drives your career decisions?",
            "What's most important to you in a job?",
            "What keeps you engaged in your work?",
        ]
    },
    "weekend_preference": {
        "options": ["Socializing with friends", "Quiet time alone", "Outdoor activities", "Creative hobbies", "Productive tasks and errands"],
        "phrasings": [
            "How do you prefer to spend your weekends?",
            "What's your ideal weekend activity?",
            "When Saturday comes, what do you usually do?",
            "How do you typically unwind on weekends?",
            "What's your go-to weekend plan?",
            "How do you like to spend your free time on weekends?",
            "What does a perfect weekend look like for you?",
            "When you have time off, how do you spend it?",
            "What's your preferred way to enjoy the weekend?",
            "How do you usually occupy your weekends?",
        ]
    },
    "conflict_approach": {
        "options": ["Address it directly and immediately", "Take time to cool off first", "Seek a mediator or third party", "Avoid confrontation if possible", "Focus on finding compromise"],
        "phrasings": [
            "How do you typically handle conflict?",
            "What's your approach when disagreements arise?",
            "How do you deal with confrontation?",
            "When conflict occurs, what do you do?",
            "What's your conflict resolution style?",
            "How do you respond to disagreements?",
            "What's your strategy for handling disputes?",
            "When you're in conflict, how do you react?",
            "How do you navigate disagreements with others?",
            "What's your usual response to conflict situations?",
        ]
    },
    "learning_method": {
        "options": ["Reading books and articles", "Hands-on experimentation", "Video tutorials and courses", "Discussion with experts", "Structured formal education"],
        "phrasings": [
            "How do you prefer to learn new things?",
            "What's your ideal learning method?",
            "How do you best absorb new information?",
            "What's your preferred way to acquire new skills?",
            "How do you typically approach learning?",
            "What learning style works best for you?",
            "How do you like to pick up new knowledge?",
            "What's your go-to method for learning?",
            "How do you prefer to be taught new things?",
            "What's your most effective learning approach?",
        ]
    },
    "decision_style": {
        "options": ["Careful analysis of all options", "Trust my gut instinct", "Seek advice from others", "Make quick decisions and adapt", "Delay until absolutely necessary"],
        "phrasings": [
            "How do you make important decisions?",
            "What's your decision-making style?",
            "How do you approach big choices?",
            "What's your process for making decisions?",
            "How do you typically decide on important matters?",
            "What's your approach to decision-making?",
            "How do you handle major decisions?",
            "What's your strategy when facing tough choices?",
            "How do you go about making significant decisions?",
            "What's your method for reaching important decisions?",
        ]
    },
    "stress_response": {
        "options": ["Exercise or physical activity", "Talk to friends or family", "Spend time alone to recharge", "Distract myself with entertainment", "Work through it systematically"],
        "phrasings": [
            "How do you cope with stress?",
            "What do you do when you're stressed?",
            "How do you handle stressful situations?",
            "What's your go-to stress relief method?",
            "How do you manage stress in your life?",
            "What helps you deal with stress?",
            "How do you typically respond to stress?",
            "What's your strategy for handling stress?",
            "How do you unwind when stressed?",
            "What do you do to relieve stress?",
        ]
    },
    "team_preference": {
        "options": ["Lead and direct the team", "Collaborate as an equal member", "Work independently within the team", "Support and help others succeed", "Focus on specialized expertise"],
        "phrasings": [
            "What role do you prefer in a team?",
            "How do you like to work in group settings?",
            "What's your preferred team dynamic?",
            "How do you typically function in a team?",
            "What role suits you best in collaborative work?",
            "How do you prefer to contribute to a team?",
            "What's your ideal position in a group project?",
            "How do you like to participate in team efforts?",
            "What team role do you gravitate towards?",
            "How do you prefer to engage in teamwork?",
        ]
    },
    "change_attitude": {
        "options": ["Embrace it enthusiastically", "Accept it cautiously", "Resist unless necessary", "Adapt quickly and move on", "Analyze before responding"],
        "phrasings": [
            "How do you respond to change?",
            "What's your attitude towards change?",
            "How do you handle unexpected changes?",
            "What's your reaction when things change?",
            "How do you deal with change in your life?",
            "What's your approach to handling change?",
            "How do you typically respond to new situations?",
            "What's your attitude when facing change?",
            "How do you adapt to changes?",
            "What's your typical response to change?",
        ]
    },
    "success_measure": {
        "options": ["Financial achievement", "Personal happiness", "Impact on others", "Professional recognition", "Work-life balance"],
        "phrasings": [
            "How do you measure success?",
            "What does success mean to you?",
            "How do you define personal success?",
            "What's your measure of a successful life?",
            "How do you know when you've succeeded?",
            "What indicates success to you?",
            "How do you evaluate your own success?",
            "What's your personal definition of success?",
            "How do you judge whether you're successful?",
            "What does being successful look like to you?",
        ]
    },
    "communication_style": {
        "options": ["Direct and to the point", "Diplomatic and tactful", "Detailed and thorough", "Casual and friendly", "Formal and professional"],
        "phrasings": [
            "How would you describe your communication style?",
            "What's your preferred way of communicating?",
            "How do you typically express yourself?",
            "What's your communication approach?",
            "How do you prefer to convey information?",
            "What's your style when communicating with others?",
            "How do you usually communicate?",
            "What communication style fits you best?",
            "How would others describe your communication?",
            "What's your natural way of communicating?",
        ]
    },
}

# Count total questions
total_base = len(consistency_questions)
total_variations = sum(len(q["phrasings"]) for q in consistency_questions.values())
print("Consistency test configuration:")
print(f"  Base questions:     {total_base}")
print(f"  Phrasings each:     {len(next(iter(consistency_questions.values()))['phrasings'])}")
print(f"  Total API calls:    {total_variations}")

Consistency test configuration:
  Base questions:     10
  Phrasings each:     10
  Total API calls:    100


In [22]:
# Build 10 surveys, each containing one phrasing of each base question
# This groups questions by survey rather than by base question to reduce prompt caching effects

consistency_surveys = []

for survey_idx in range(10):
    questions = []
    for base_id, q_data in consistency_questions.items():
        questions.append(Question(
            id=f"{base_id}_v{survey_idx}",
            text=q_data["phrasings"][survey_idx],
            question_type="single_select",
            options=q_data["options"]
        ))
    
    consistency_surveys.append(Survey(
        id=f"consistency_survey_{survey_idx}",
        name=f"Consistency Survey {survey_idx + 1}",
        questions=questions
    ))

print(f"Created {len(consistency_surveys)} surveys with {len(consistency_surveys[0].questions)} questions each")

# Estimate cost
total_questions = sum(len(s.questions) for s in consistency_surveys)
single_estimate = estimate_survey_cost(persona, consistency_surveys[0], num_agents=1)
print(f"\nEstimated cost: ${single_estimate.cost_per_agent * len(consistency_surveys):.4f} for all {total_questions} questions")

Created 10 surveys with 10 questions each

Estimated cost: $0.4672 for all 100 questions


In [23]:
# Run all consistency surveys
print("Running consistency test (100 questions across 10 surveys)...")
print("=" * 60)

consistency_results = {}  # base_question_id -> list of responses
total_cost = 0

for i, survey in enumerate(consistency_surveys):
    print(f"Survey {i + 1}/{len(consistency_surveys)}...", end=" ", flush=True)
    response = await run_survey(persona, survey)
    total_cost += response.total_cost
    
    # Collect responses by base question
    for r in response.responses:
        # Extract base question id (remove _v0, _v1, etc.)
        base_id = r.question_id.rsplit("_v", 1)[0]
        consistency_results.setdefault(base_id, []).append(r.response)
    
    print(f"done (${response.total_cost:.4f})")

print("=" * 60)
print(f"Total cost: ${total_cost:.4f}")

Running consistency test (100 questions across 10 surveys)...
Survey 1/10... done ($0.0254)
Survey 2/10... done ($0.0254)
Survey 3/10... done ($0.0254)
Survey 4/10... done ($0.0254)
Survey 5/10... done ($0.0275)
Survey 6/10... done ($0.0254)
Survey 7/10... done ($0.0254)
Survey 8/10... done ($0.0254)
Survey 9/10... done ($0.0254)
Survey 10/10... done ($0.0254)
Total cost: $0.2563


In [24]:
# Calculate consistency metrics
from collections import Counter

print("RESPONSE CONSISTENCY ANALYSIS")
print("=" * 80)

question_consistency = {}
detailed_results = []

for base_id, responses in consistency_results.items():
    # Count frequency of each response
    counter = Counter(responses)
    most_common_response, most_common_count = counter.most_common(1)[0]
    
    # Consistency = % of responses that match the most common response
    consistency = most_common_count / len(responses)
    question_consistency[base_id] = consistency
    
    # Get the first phrasing as the "base question" label
    base_question = consistency_questions[base_id]["phrasings"][0]
    
    detailed_results.append({
        "base_id": base_id,
        "question": base_question,
        "consistency": consistency,
        "most_common": most_common_response,
        "distribution": dict(counter),
        "total": len(responses)
    })

# Sort by consistency (lowest first to highlight problem areas)
detailed_results.sort(key=lambda x: x["consistency"])

# Display results
print(f"\n{'Question':<45} {'Consistency':>12} {'Most Common Response':<25}")
print("-" * 80)

for r in detailed_results:
    q_short = r["question"][:43] + ".." if len(r["question"]) > 45 else r["question"]
    mc_short = r["most_common"][:23] + ".." if len(r["most_common"]) > 25 else r["most_common"]
    pct = f"{r['consistency']:.0%}"
    print(f"{q_short:<45} {pct:>12} {mc_short:<25}")

# Overall consistency
overall_consistency = sum(question_consistency.values()) / len(question_consistency)

print("\n" + "=" * 80)
print("CONSISTENCY SUMMARY")
print("=" * 80)
print(f"Overall consistency:     {overall_consistency:.1%}")
print(f"Most consistent:         {max(question_consistency.values()):.0%} ({max(question_consistency, key=question_consistency.get)})")
print(f"Least consistent:        {min(question_consistency.values()):.0%} ({min(question_consistency, key=question_consistency.get)})")

# Interpretation
print("\n" + "=" * 80)
print("INTERPRETATION")
print("=" * 80)
if overall_consistency >= 0.9:
    print("Excellent consistency - the persona gives highly stable responses.")
elif overall_consistency >= 0.7:
    print("Good consistency - responses are reasonably stable with some variation.")
elif overall_consistency >= 0.5:
    print("Moderate consistency - notable variation in responses to rephrased questions.")
else:
    print("Low consistency - responses vary significantly based on question wording.")

RESPONSE CONSISTENCY ANALYSIS

Question                                       Consistency Most Common Response     
--------------------------------------------------------------------------------
What role do you prefer in a team?                     60% Collaborate as an equal..
How do you respond to change?                          70% Analyze before responding
How do you cope with stress?                           80% Work through it systema..
How do you typically handle conflict?                  90% Focus on finding compro..
How would you describe your communication s..          90% Detailed and thorough    
What motivates you most in your career?               100% Making a positive impact 
How do you prefer to spend your weekends?             100% Creative hobbies         
How do you prefer to learn new things?                100% Hands-on experimentation 
How do you make important decisions?                  100% Careful analysis of all..
How do you measure success?           

### Consistency Insights

**What affects consistency?**
- Questions with clear, factual answers (based on CV content) tend to be more consistent
- Abstract or value-based questions may show more variation
- Questions where multiple options could reasonably apply tend to be less consistent

**Implications for survey design:**
- High-consistency questions are more reliable for population-level insights
- Low-consistency questions may need clearer framing or should be weighted differently
- Consider running multiple phrasings and taking the modal response for important questions
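
One way to act on the last suggestion is sketched below. It is a minimal illustration under stated assumptions, not something the notebook or `centuria` provides: `modal_response` and its `min_share` threshold are hypothetical names, and the 0.5 default is an arbitrary choice. The helper returns the most common answer across phrasings, but declines to pick one when the top answers are tied or the leading answer falls below the threshold.

```python
from collections import Counter


def modal_response(responses, min_share=0.5):
    """Return (answer, share) for the most common answer.

    Returns (None, share) when the top answer is tied for first place or
    its share is below min_share. Hypothetical helper; min_share is an
    assumed threshold, not a value taken from the notebook.
    """
    if not responses:
        raise ValueError("responses must not be empty")
    ranked = Counter(responses).most_common()
    top_answer, top_count = ranked[0]
    share = top_count / len(responses)
    tied = len(ranked) > 1 and ranked[1][1] == top_count
    if tied or share < min_share:
        return None, share
    return top_answer, share


# Answers to ten phrasings of the "change" question (7/2/1 split)
answers = (
    ["Analyze before responding"] * 7
    + ["Adapt quickly and move on"] * 2
    + ["Embrace it enthusiastically"]
)
answer, share = modal_response(answers)
print(answer, f"{share:.0%}")
```

Returning `None` rather than an arbitrary winner keeps ambiguous questions visible, so they can be reworded or reported separately instead of being silently resolved.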

In [25]:
# Detailed breakdown - show response distribution for each question
print("DETAILED RESPONSE DISTRIBUTIONS")
print("=" * 80)
print("Shows all responses given across the 10 phrasings of each question.\n")

for r in detailed_results:
    print(f"Q: {r['question']}")
    print(f"   Consistency: {r['consistency']:.0%}")
    print(f"   Responses:")
    
    # Sort by frequency
    sorted_dist = sorted(r['distribution'].items(), key=lambda x: -x[1])
    for response, count in sorted_dist:
        bar = "█" * count
        print(f"      {count:>2}x | {bar:<10} | {response}")
    print()

DETAILED RESPONSE DISTRIBUTIONS
Shows all responses given across the 10 phrasings of each question.

Q: What role do you prefer in a team?
   Consistency: 60%
   Responses:
       6x | ██████     | Collaborate as an equal member
       3x | ███        | Lead and direct the team
       1x | █          | Focus on specialized expertise

Q: How do you respond to change?
   Consistency: 70%
   Responses:
       7x | ███████    | Analyze before responding
       2x | ██         | Adapt quickly and move on
       1x | █          | Embrace it enthusiastically

Q: How do you cope with stress?
   Consistency: 80%
   Responses:
       8x | ████████   | Work through it systematically
       1x | █          | Exercise or physical activity
       1x | █          | Spend time alone to recharge

Q: How do you typically handle conflict?
   Consistency: 90%
   Responses:
       9x | █████████  | Focus on finding compromise
       1x | █          | Address it directly and immediately

Q: How would you de