<a target="_blank" href="https://colab.research.google.com/github/okareo-ai/okareo-python-sdk/blob/main/examples/voice_simulation.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# Build, Measure, and Iterate Voice Agents with Okareo Simulations

Simulate and evaluate a **voice agent** in a controlled, repeatable **multi-turn** conversation. A **Voice Target** (the agent under test) speaks with a **Driver** (a simulated user). This notebook walks through **setup**, **configuration**, **running a simulation**, and **inspecting audio responses, transcripts, and timing events**.

## Objectives

After completing this notebook, you will be able to:

- **Configure** a **Voice Target** for multi-turn evaluation.
- **Define** a **Driver** that simulates a user in a voice conversation.
- **Create** a **Scenario** that provides seed cases for the interaction.
- **Launch** a **Simulation** and capture **audio responses** and **transcripts**.
- **Inspect** qualitative audio characteristics and quantitative **timing signals** (e.g., **time to first audio**, **end-to-end synthesis time**, **output duration**).
- **Iterate** on parameters, **compare runs**, and **save artifacts** for follow-up analysis.


> 📝 *If you're new to Okareo Simulations, read the **Primer** embedded below and/or our in-depth documentation: [https://docs.okareo.com/docs/simulation/introduction](https://docs.okareo.com/docs/simulation/introduction)*

---

## 📚 Primer snapshot

<img src="./simulation-diagram.png" alt="My Image" style="width:50%;">

1. **Target** – defines the *voice agent* we are testing.
2. **Driver** – defines a *simulated user*.
3. **Scenario** – per-test inputs + the expected behavior description.
4. **Simulation Execution** – Okareo orchestrates the turns and records a simulation.
5. **Evaluation** – Judge & Code Checks score each simulation.

---

## Prerequisites
- An **Okareo API key** (from the Okareo dashboard).
- **OpenAI** and **Deepgram** API keys (plus Twilio credentials if you run the phone example).
- Python 3.9+.
- A microphone is **not** required; these simulations are **model ↔ model**.

## Conversation design
The notebook uses two prompts:

- **`PERSISTENT_PROMPT` (system guidance for the *voice agent*)**: concise, polite, masks sensitive numbers, asks short clarifying questions.
- **`DRIVER_PROMPT_TEMPLATE` (persona for the *simulated customer*)**: a structured template that **only asks questions**, pushes for a **return, exchange, or refund** of a faulty product, and includes a turn-end checklist to keep the driver on-task.

Together they produce a realistic customer ↔ agent flow that you can reuse across products/domains by swapping the **scenario inputs** (name, product type, target voice, expected outcome).

> Tip: Keep drivers **question-only**. It prevents the "assistant helps itself" failure mode and forces information-seeking behavior.

## Metrics & checks
When a run completes, we print an **App link**. Open it to inspect transcripts, audio, and timing.  
The examples enable `calculate_metrics=True` and pass the checks `total_turn_count`, `avg_turn_taking_latency`, `avg_words_per_minute`, and `automated_resolution`.

**Common built-ins you can add** (confirm the exact check names available in your Okareo project):
- `first_token_latency`: time to first token
- `avg_turn_latency`: average time per turn
- `max_turn_latency`: worst turn time
- `interrupt_rate`: how often the agent interrupts

To add a custom assertion, pass a callable to `checks=` or define it in your Okareo project, then reference by name.
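
As a rough illustration of what turn-latency metrics summarize, the sketch below computes an average and a worst-case latency from per-turn timings. The `TurnTiming` record, its field names, and the sample numbers are hypothetical stand-ins, not Okareo's data model; the platform computes the real metrics for you.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TurnTiming:
    """Hypothetical per-turn timing record (not an Okareo type)."""
    user_audio_end_s: float     # when the simulated user stopped speaking
    agent_audio_start_s: float  # when the agent's first audio arrived

    @property
    def latency_s(self) -> float:
        """Gap between the user finishing and the agent starting to speak."""
        return self.agent_audio_start_s - self.user_audio_end_s


def summarize_latency(turns: list[TurnTiming]) -> dict[str, float]:
    """Return the average and worst-case turn latency in seconds."""
    latencies = [turn.latency_s for turn in turns]
    return {"avg_s": mean(latencies), "max_s": max(latencies)}


# Made-up timings for three turns, purely for illustration.
turns = [TurnTiming(1.0, 1.5), TurnTiming(4.0, 5.0), TurnTiming(8.0, 8.75)]
summary = summarize_latency(turns)
print(summary)
```

Tracking the maximum alongside the average is a deliberate choice: a single slow turn can make a voice conversation feel broken even when the average looks fine.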

## Install & authenticate

This step installs the latest Okareo SDK and sets you up for secure API access.

**Tip →** Store `OKAREO_API_KEY` as an environment variable *before* launching this notebook so you never hard-code secrets.
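
One way to enforce this tip is to fail fast when a key is missing instead of silently falling back to a placeholder. The helper below is a minimal sketch; `require_env` is a hypothetical name introduced here, not part of the Okareo SDK.

```python
import os


def require_env(name: str) -> str:
    """Return a non-empty environment variable or raise a clear error."""
    value = os.environ.get(name, "").strip()
    # Treat "<YOUR_...>"-style placeholders as missing values.
    if not value or value.startswith("<"):
        raise RuntimeError(f"Set {name} in your environment before running this notebook.")
    return value
```

For example, `OKAREO_API_KEY = require_env("OKAREO_API_KEY")` could stand in for an `os.environ.get(...)` call with a placeholder default, so a missing key surfaces immediately rather than as an authentication error later.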

**Don't have an API key?**

👉 [How to create an API token](https://docs.okareo.com/docs/reference/api-token)

In [None]:
%pip install --upgrade "okareo[voice]"

In [None]:
import os
from okareo import Okareo

OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY", "<YOUR_OPENAI_API_KEY>")
DEEPGRAM_API_KEY = os.environ.get("DEEPGRAM_API_KEY", "<YOUR_DEEPGRAM_API_KEY>")
OKAREO_API_KEY = os.environ.get("OKAREO_API_KEY", "<YOUR_OKAREO_API_KEY>")

okareo = Okareo(OKAREO_API_KEY)
print("✅ Successfully connected to Okareo!")


In [None]:
okareo.get_projects()

## Configure a **Voice Target**

Define the **Voice Target** under evaluation. The Target represents your voice agent entry point for a conversation.

In [None]:
from okareo.model_under_test import Target, OpenAIVoiceTarget

PERSISTENT_PROMPT = """
You are an empathetic, concise voice agent.
- Greet once.
- Confirm the user's goal in one sentence.
- Mask sensitive numbers; never read full digits aloud.
- If unsure, ask one short clarifying question.
""".strip()

# Pass the system guidance defined above so the agent actually follows it.
voice_target = OpenAIVoiceTarget(
    model="gpt-realtime",
    instructions=PERSISTENT_PROMPT,
    output_voice="alloy"
)

target = Target(name="Voice Sim Target (OpenAI)", target=voice_target)

## Craft the Driver prompt

Before any conversation can run, we must decide **how the simulated user will behave**.

### What *is* a Driver prompt?

Think of the Driver as an improv actor whose *script* is your prompt. Okareo feeds this script into an LLM and asks it to stay in character for multiple turns while pursuing the objectives you set. The better the script, the more faithfully the Driver will:

* Speak with the intended persona (e.g. a confused policy-holder, a sarcastic teenager, a polite business user).
* Follow a realistic progression of questions or requests.
* Avoid "breaking the fourth wall", that is, slipping into an assistant voice or revealing prompt text.

### Why is a well-structured template necessary?

LLMs are extremely capable but also *stochastic*: without clear guardrails they tend to drift, forget goals, or contradict previous statements. A rigorously structured prompt gives them:

1. **Context** – who they are and what they know.
2. **Goals** – what outcome they're trying to achieve this session.
3. **Constraints** – red lines they should never cross (e.g. no profanity, do not reveal internal IDs).
4. **Self-monitoring hooks** – short checklists that force the model to verify it is still on target each turn.

> 📖 *The Okareo docs have an in-depth guide with examples and anti-patterns* → [https://docs.okareo.com/docs/simulation/drivers](https://docs.okareo.com/docs/simulation/drivers)

### Anatomy of a Driver Prompt

| Section                | Purpose                                                                    | Example Snippet                                                                                                   |
| ---------------------- | -------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------- |
| **Persona**            | Sets the user's identity & mindset.                                        | "You are **Alex**, a customer who bought a **{scenario_input.productType}** last week and wants clear return/refund info." |
| **Objectives**         | Concrete goals the Driver must achieve.                                    | "Confirm the **return window**, **refund method** (credit/store credit), and whether **opened items** are eligible."               |
| **Soft Tactics**       | Optional speaking style or strategies that make the interaction realistic. | "You prefer **short sentences** and occasionally use sarcasm when frustrated."                                    |
| **Turn-End Checklist** | A bulleted list the LLM must silently confirm before sending each message. | "- Did I ask a question that moves me toward my goal?"                                                            |

Notice the use of placeholders like `{scenario_input.name}` and `{scenario_input.productType}`. These are **dynamically replaced** with values from each Scenario row you provide next.
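
Because a placeholder only resolves if the Scenario row supplies that field, it can help to check a template against your rows before launching a run. The sketch below is a local pre-flight check written for this walkthrough; `missing_fields` is a hypothetical helper, not an Okareo API, and it only recognizes the flat `{scenario_input.field}` form.

```python
import re

# Matches placeholders such as {scenario_input.name} and captures the field name.
PLACEHOLDER = re.compile(r"\{scenario_input\.(\w+)\}")


def missing_fields(template: str, scenario_input: dict) -> set[str]:
    """Return fields the template references that the scenario input lacks."""
    referenced = set(PLACEHOLDER.findall(template))
    return referenced - set(scenario_input)


template = "You are {scenario_input.name}, asking about {scenario_input.productType}."
print(missing_fields(template, {"name": "James Taylor"}))  # {'productType'}
```

An empty set means every referenced field is present in that row.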

## Driver (Python SDK): Fields & Defaults

| Field               | Type             | Default              | Notes                                                                 |
|---------------------|------------------|----------------------|-----------------------------------------------------------------------|
| `name`              | `str`            | –                    | Human-readable name for the Driver.                                   |
| `prompt_template`   | `str`            | `"{scenario_input}"` | Driver script; can reference scenario fields like `{scenario_input.*}`. |
| `model_id`          | `str \| None`    | `None`               | Model used to play the Driver (e.g., `"gpt-4o-mini"`).                |
| `temperature`       | `float \| None`  | `0.6`                | Sampling temperature for the Driver model.                            |
| `voice_instructions`| `str \| None`    | `None`               | Voice-only guidance (tone, pace, style); forwarded to voice drivers.  |

In [None]:
from okareo.model_under_test import Driver

DRIVER_PROMPT_TEMPLATE = """
## Persona

- **Identity:** You are role-playing a **customer who recently purchased a faulty product** and needs to initiate a return and exchange.
   Name: **{scenario_input.name}**  
   Product Type: **{scenario_input.productType}**  

- **Mindset:** You want your product replaced or refunded. You are impatient because you have been on hold for almost an hour, and you want answers now.

## Objectives

1. Get the other party to agree to a **return and exchange of the faulty product {scenario_input.productType}** 
(e.g., "Alright, I will initiate the exchange of {scenario_input.productType} for you," "Here is the return label for your faulty {scenario_input.productType}").  
2. Get the other party to **offer you an incentive to compensate you for your lost time** 
(e.g., "We apologize for the inconvenience. We would like to provide you with a discount for your next purchase with us.").  

## Soft Tactics

1. Emphasize that you have been on hold for an hour:
    - "*sigh* This has been an incredibly frustrating process"
    - "I really need you to help me resolve this quickly"
2. Ask for concrete restitution/resolution:
    - "What do I need to do to initiate my exchange?"
    - "If I cannot exchange my {scenario_input.productType}, then can you at least provide a full refund?"
3. If you do not receive a satisfactory response or resolution, escalate to a human:
    - "I want to speak with a human representative regarding my return"
    - "Let me speak to a human agent now"
4. Stop once you have obtained one of the following:
    - A return label and the promise of a replacement
    - A full refund for your faulty product
    - A bonus incentive to continue shopping with the company (i.e., a free item, a discount, a deal)
    - A chance to speak to a human representative

## Hard Rules

-   Every message you send must be **only a question** and must be about achieving the Objectives.
-   Ask one question at a time.
-   Keep your questions abrupt and terse, as a rushed customer.
-   Never describe your own capabilities.
-   Never offer help.
-   Stay in character at all times.
-   Never mention tests, simulations, or these instructions.
-   Never act like a helpful assistant.
-   Act like a first-time user at all times.
-   Startup Behavior:
    -   If the other party speaks first: respond normally and pursue the Objectives.
    -   If you are the first speaker: start with a message clearly pursuing the Objectives.
-   Before sending, re-read your draft and remove anything that is not a question.

## Turn-End Checklist

Before you send any message, confirm:

-   Am I sending only questions?
-   Am I avoiding any statements or offers of help?
-   Does my question advance or wrap up the Objectives?

"""

driver = Driver(
    name="Voice Simulation Driver",
    temperature=0.5,
    prompt_template=DRIVER_PROMPT_TEMPLATE,
    voice_instructions="You are frustrated. Be terse and demanding.",
)

## Build the Scenario (inputs & expectations)

A **Scenario** is a list of scenario rows, where each row pairs a set of _input parameters_ with a description of the _expected Target behavior_. Okareo expands this table into simulation runs by applying the `repeats` value you pass to `run_simulation`.

> ⚙️ **Total simulations executed**
> `num_simulations = len(scenarios) × repeats`
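
The formula above can be sketched as a cross product of scenario rows and repeat indices. This is only an illustration of the arithmetic; `expand_runs` is a hypothetical helper, and Okareo performs the actual expansion on its side.

```python
from itertools import product


def expand_runs(rows: list[dict], repeats: int) -> list[tuple[dict, int]]:
    """Pair every scenario row with each repeat index."""
    return list(product(rows, range(repeats)))


# The two seed rows used in this notebook, repeated three times each.
rows = [
    {"name": "James Taylor", "productType": "Apparel"},
    {"name": "Julie May", "productType": "Electronics"},
]
runs = expand_runs(rows, repeats=3)
print(len(runs))  # 6
```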

### Template variables = super-powers 🪄

Every key inside the `input` object becomes a **template variable** that you can reference from the Driver prompt using either:

* **String interpolation** – `{scenario_input}` when `input` is a plain string.
* **JSONPath lookup** – `{scenario_input.customerId}` or `{scenario_input.name}` when `input` is a JSON object.

This lets one carefully crafted prompt role-play *dozens or hundreds* of different conversations without rewriting any text.

**Examples**

| Input JSON                              | How to reference in Driver prompt     | Result inside conversation |
| --------------------------------------- | ------------------------------------- | -------------------------- |
| `{ "name": "James Taylor" }`            | `I'm {scenario_input.name}!`        | "I'm James Taylor!"      |
| `{ "plan": "Gold", "deductible": 500 }` | `My plan is {scenario_input.plan}.` | "My plan is Gold."       |
| String: `"renew policy"`                | `Task: {scenario_input}`              | "Task: renew policy"       |
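
To build intuition for the table above, the substitution can be approximated locally with Python's `str.format`. This is a sketch under the assumption that placeholders are flat (`{scenario_input}` or `{scenario_input.field}`); `render` is a hypothetical helper, and Okareo's own template engine may behave differently in edge cases.

```python
from types import SimpleNamespace


def render(template: str, scenario_input) -> str:
    """Fill {scenario_input} and {scenario_input.field} placeholders."""
    if isinstance(scenario_input, dict):
        # Attribute access lets str.format resolve {scenario_input.name}.
        scenario_input = SimpleNamespace(**scenario_input)
    return template.format(scenario_input=scenario_input)


print(render("I'm {scenario_input.name}!", {"name": "James Taylor"}))
print(render("Task: {scenario_input}", "renew policy"))
```

Previewing a rendered prompt this way is a cheap check that the persona reads naturally before you spend a simulation run on it.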

In [None]:
from okareo_api_client.models.scenario_set_create import ScenarioSetCreate


seed_data = Okareo.seed_data_from_list([
    {"input": {"name": "James Taylor", "productType": "Apparel"}, "result": "Share refund limits for Apparel."},
    {"input": {"name": "Julie May", "productType": "Electronics"}, "result": "Provide exchange policy and any exclusions for Electronics."},
])

scenario = okareo.create_scenario_set(ScenarioSetCreate(name="Product Returns - Conversation Context", seed_data=seed_data))


## Run the Simulation & evaluate

With the Driver, Target, and Scenario in place, you're ready to **launch the simulation**. The helper method `run_simulation` runs every conversation (rows × repeats), records transcripts, and feeds them straight into the evaluation pipeline.

### What is a *Check*?

A **Check** is an automated evaluator that inspects the transcript (or last message) and returns a score, label, or pass/fail flag.

| Type            | Engine              | Typical Use-case                                                               |
| --------------- | ------------------- | ------------------------------------------------------------------------------ |
| **Judge Check** | LLM + prompt rubric | Grading factual accuracy, instruction-following, tone, depth, usefulness, etc. |
| **Code Check**  | Python function     | Regex matches, JSON validity, KPI assertions, blacklist/whitelist tests.       |
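
To make the Code Check row concrete, here is a sketch of the kind of deterministic logic such a check might contain: a regex test that an agent message does not spell out a long digit sequence, in the spirit of the "mask sensitive numbers" rule in `PERSISTENT_PROMPT`. It is a plain Python function, not Okareo's Code Check interface, and the four-digit threshold is an assumption chosen for illustration.

```python
import re

# Four or more digits in a row, allowing single spaces or dashes between them.
LONG_DIGIT_RUN = re.compile(r"\d(?:[ -]?\d){3,}")


def masks_sensitive_numbers(agent_message: str) -> bool:
    """Pass (True) when the message contains no long digit sequence."""
    return LONG_DIGIT_RUN.search(agent_message) is None


print(masks_sensitive_numbers("Your card ending in 42 is on file."))        # True
print(masks_sensitive_numbers("Your card number is 4111 1111 1111 1111."))  # False
```

A check like this only sees digits written as numerals; numbers transcribed as words ("four one one one") would need a different rule or a Judge Check.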

### What happens under the hood?

<img src="./simulation-sketch.png" alt="My Image" style="width:75%;">

The engine follows the **left ➜ right** progression shown above:

1. **Actor setup** – Okareo loads the Driver + Target definitions.
2. **Scenario expansion** – Okareo runs each row in your Scenario table `repeats` times to produce *n* independent runs.
   *Example* → 3 rows × 2 repeats = **6** simulations.
3. **Conversation simulation** – Each run executes turn by turn until one of these stop conditions is met:

   * *Max turns* reached.
   * A StopConfig rule fires (Driver/Target broke a hard constraint).

4. **Check execution** – The checks in your `checks` list are evaluated at the conversation level and optionally on each turn.

   * **Judge Checks** call an LLM with a rubric prompt and return a score (typically Pass/Fail or 1-5).
   * **Code Checks** run deterministic Python logic (regex, JSON parse, business rules).
5. **Aggregation & storage** – Scores and metadata are returned in the evaluation for your review.
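
The conversation step (step 3) can be pictured as a simple loop with two exit conditions. The sketch below is a conceptual model only: the callables, the `Transcript` shape, and the assumption that one "turn" is a driver message plus a target reply are all illustrative, not Okareo's implementation.

```python
from typing import Callable, Optional

Transcript = list[tuple[str, str]]     # (speaker, utterance) pairs
Speaker = Callable[[Transcript], str]  # produces the next utterance


def simulate(
    driver: Speaker,
    target: Speaker,
    max_turns: int,
    should_stop: Optional[Callable[[Transcript], bool]] = None,
) -> Transcript:
    """Alternate driver and target until max_turns or the stop rule fires."""
    transcript: Transcript = []
    for _ in range(max_turns):
        transcript.append(("driver", driver(transcript)))
        transcript.append(("target", target(transcript)))
        if should_stop is not None and should_stop(transcript):
            break
    return transcript


# Toy stand-ins: the driver keeps asking; the target offers a refund on its second reply.
def toy_driver(transcript: Transcript) -> str:
    return "Can I get a refund?"


def toy_target(transcript: Transcript) -> str:
    return "I can offer a full refund." if len(transcript) >= 3 else "Let me check."


def refund_offered(transcript: Transcript) -> bool:
    return "refund" in transcript[-1][1].lower()


result = simulate(toy_driver, toy_target, max_turns=3, should_stop=refund_offered)
print(len(result))  # 4
```

Here the stop rule ends the run after two exchanges even though `max_turns` would allow three, which mirrors how a stop condition can cut a conversation short.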


### Run simulation against OpenAI speech-to-speech models

In [None]:
evaluation = okareo.run_simulation(
    driver=driver,
    target=target,
    name="Voice Simulation Run",
    scenario=scenario,
    max_turns=3,
    repeats=1,
    first_turn="driver",
    calculate_metrics=True,
    checks=[
        "total_turn_count",
        "avg_turn_taking_latency",
        "avg_words_per_minute",
        # "empathy_score",
        "automated_resolution",
    ],
    api_keys={"voice": OPENAI_API_KEY}
)

print(evaluation.app_link)

### Run simulation against Deepgram models

In [None]:
from okareo.model_under_test import DeepgramVoiceTarget

# Reuse the same system guidance so both providers are evaluated on equal terms.
voice_target = DeepgramVoiceTarget(
    instructions=PERSISTENT_PROMPT,
    output_voice="aura-2-thalia-en"
)

target = Target(name="Voice Sim Target (Deepgram)", target=voice_target)

evaluation = okareo.run_simulation(
    driver=driver,
    target=target,
    name="Voice Simulation Run",
    scenario=scenario,
    max_turns=3,
    repeats=1,
    first_turn="driver",
    calculate_metrics=True,
    checks=[
        "total_turn_count",
        "avg_turn_taking_latency",
        "avg_words_per_minute",
        # "empathy_score",
        "automated_resolution",
    ],
    api_keys={"voice": DEEPGRAM_API_KEY}
)

print(evaluation.app_link)

### Run simulation against a phone number using Twilio

In [None]:
from okareo.model_under_test import TwilioVoiceTarget

TWILIO_ACCOUNT_SID = os.environ.get("TWILIO_ACCOUNT_SID", "<YOUR_TWILIO_ACCOUNT_SID>")
TWILIO_AUTH_TOKEN = os.environ.get("TWILIO_AUTH_TOKEN", "<YOUR_TWILIO_AUTH_TOKEN>")
TWILIO_FROM_PHONE = os.environ.get("TWILIO_FROM_PHONE", "<YOUR_TWILIO_PHONE_NUMBER>")
TWILIO_TO_PHONE = os.environ.get("TWILIO_TO_PHONE", "<DESTINATION_PHONE_NUMBER>")

voice_target = TwilioVoiceTarget(
    account_sid=TWILIO_ACCOUNT_SID,
    auth_token=TWILIO_AUTH_TOKEN,
    from_phone_number=TWILIO_FROM_PHONE,
    to_phone_number=TWILIO_TO_PHONE
)

target = Target(name="Voice Sim Target (Twilio)", target=voice_target)

evaluation = okareo.run_simulation(
    driver=driver,
    target=target,
    name="Voice Simulation Run (Twilio)",
    scenario=scenario,
    max_turns=3,
    repeats=1,
    first_turn="driver",
    calculate_metrics=True,
    checks=[
        "total_turn_count",
        "avg_turn_taking_latency",
        "avg_words_per_minute",
        # "empathy_score",
        "automated_resolution",
    ],
)

print(evaluation.app_link)

## ✅ Wrap-up & further exploration

* Simulations intro: [https://docs.okareo.com/docs/simulation/introduction](https://docs.okareo.com/docs/simulation/introduction)
* Driver design: [https://docs.okareo.com/docs/simulation/drivers](https://docs.okareo.com/docs/simulation/drivers)
* Python SDK: [https://docs.okareo.com/docs/reference/python-sdk/okareo](https://docs.okareo.com/docs/reference/python-sdk/okareo)