<a target="_blank" href="https://colab.research.google.com/github/okareo-ai/okareo-python-sdk/blob/main/examples/voice_simulation.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# Build, Measure, and Iterate Voice Agents with Okareo Simulations

Simulate and evaluate a **voice agent** in a controlled, repeatable **multi-turn** conversation. A **Voice Target** (the agent under test) speaks with a **Driver** (a simulated user). This notebook walks through **setup**, **configuration**, **running a simulation**, and **inspecting audio responses, transcripts, and timing events**.

## Objectives

After completing this notebook, you will be able to:

- **Configure** a **Voice Target** for multi-turn evaluation.
- **Define** a **Driver** that simulates a user in a voice conversation.
- **Create** a **Scenario** that provides seed cases for the interaction.
- **Launch** a **Simulation** and capture **audio responses** and **transcripts**.
- **Inspect** qualitative audio characteristics and quantitative **timing signals** (e.g., **time to first audio**, **end-to-end synthesis time**, **output duration**).
- **Iterate** on parameters, **compare runs**, and **save artifacts** for follow-up analysis.


> 📝 *If you’re new to Okareo Simulations, read the **Primer** embedded below and/or our in-depth documentation: [https://docs.okareo.com/docs/simulation/introduction](https://docs.okareo.com/docs/simulation/introduction)*

---

## 📚 Primer snapshot

<img src="./simulation-diagram.png" alt="Okareo simulation flow: Target, Driver, Scenario, Simulation Execution, Evaluation" style="width:50%;">

1. **Target** – defines the *voice agent* we are testing.
2. **Driver** – defines a *simulated user*.
3. **Scenario** – per‑test inputs + the expected behavior description.
4. **Simulation Execution** – Okareo orchestrates the turns and records a simulation.
5. **Evaluation** – Judge & Code Checks score each simulation.

---

## Prerequisites
- An **Okareo API key** (from the Okareo dashboard).
- An **OpenAI** API key and a **Deepgram** API key (the Deepgram key is only needed for the Deepgram run).
- Python 3.9+.
- A microphone is **not** required; these simulations are **model ↔ model**.

## Conversation design
The notebook uses two prompts:

- **`PERSISTENT_PROMPT` (system guidance for the *voice agent*)**: concise, polite, masks sensitive numbers, asks short clarifying questions.
- **`DRIVER_PROMPT_TEMPLATE` (persona for the *simulated customer*)**: a structured template that **only asks questions**, pushes for **return/refund policy specifics**, and includes a turn‑end checklist to keep the driver on‑task.

Together they produce a realistic customer ↔ agent flow that you can reuse across products/domains by swapping the **scenario inputs** (name, product type, target voice, expected outcome).

> Tip: Keep drivers **question‑only**. This prevents the driver from drifting into a helpful-assistant role and forces information‑seeking behavior.

## Metrics & checks
When a run completes, we print an **App link**. Open it to inspect transcripts, audio, and timing.  
The example enables `calculate_metrics=True` and includes the check: `["avg_turn_latency"]`.

**Common built‑ins you can add:**
- `first_token_latency` — time to first token
- `avg_turn_latency` — average time per turn
- `max_turn_latency` — worst turn time
- `interrupt_rate` — how often the agent interrupts

To add a custom assertion, pass a callable to `checks=` or define it in your Okareo project, then reference by name.
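
Okareo computes the built-in latency metrics from the timing events it records. As a rough mental model only, here is a minimal sketch of how an average and a worst-case turn latency could be derived from per-turn timestamps. The `Turn` record and its fields are hypothetical (not Okareo's data model), and latency is assumed here to mean the gap between the end of the driver's speech and the agent's first audio.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Turn:
    """Hypothetical timing record for one driver -> target exchange (seconds)."""
    driver_end_s: float          # when the simulated user stopped speaking
    target_audio_start_s: float  # when the voice agent's first audio arrived


def turn_latencies(turns: list[Turn]) -> list[float]:
    """Per-turn latency, assumed to be the wait until the agent's first audio."""
    return [t.target_audio_start_s - t.driver_end_s for t in turns]


def latency_summary(turns: list[Turn]) -> dict[str, float]:
    """Average and worst turn latency, in the spirit of avg/max_turn_latency."""
    latencies = turn_latencies(turns)
    if not latencies:
        raise ValueError("no turns recorded")
    return {
        "avg_turn_latency": mean(latencies),
        "max_turn_latency": max(latencies),
    }


# Made-up timestamps, purely to show the arithmetic.
print(latency_summary([Turn(1.0, 1.5), Turn(4.0, 5.0)]))
# {'avg_turn_latency': 0.75, 'max_turn_latency': 1.0}
```

Reporting the maximum alongside the average matters for voice agents: a single slow turn is very noticeable to a listener even when the mean looks fine.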

## Install & authenticate

This step installs the latest Okareo SDK and sets you up for secure API access.

**Tip →** Store `OKAREO_API_KEY` as an environment variable *before* launching this notebook so you never hard‑code secrets.

**Don’t have an API key?**

👉 [How to create an API token](https://docs.okareo.com/docs/reference/api-token)

In [None]:
%pip install --upgrade "okareo[voice]"

In [None]:
import os
from okareo import Okareo

OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY", "<YOUR_OPENAI_API_KEY>")
DEEPGRAM_API_KEY = os.environ.get("DEEPGRAM_API_KEY", "<YOUR_DEEPGRAM_API_KEY>")
OKAREO_API_KEY = os.environ.get("OKAREO_API_KEY", "<YOUR_OKAREO_API_KEY>")

okareo = Okareo(OKAREO_API_KEY)
print("✅ Okareo client initialized.")
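
Optionally, catch missing credentials before launching a simulation. The helper below is a hypothetical convenience, not part of the Okareo SDK: it reports which keys are unset or still hold the `<YOUR_..._API_KEY>` placeholder strings used above. The Deepgram key only matters if you run the Deepgram section.

```python
import os

REQUIRED_KEYS = ("OKAREO_API_KEY", "OPENAI_API_KEY", "DEEPGRAM_API_KEY")


def missing_api_keys(env=None, required=REQUIRED_KEYS) -> list[str]:
    """Return the names of required keys that are unset or still placeholders."""
    env = os.environ if env is None else env
    missing = []
    for name in required:
        value = env.get(name, "")
        # Placeholders look like "<YOUR_OPENAI_API_KEY>", so treat "<...>" as missing.
        if not value or (value.startswith("<") and value.endswith(">")):
            missing.append(name)
    return missing


missing = missing_api_keys()
if missing:
    print("⚠️ Set these environment variables before running simulations:", ", ".join(missing))
```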

## Configure a **Voice Target**

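Define the **Voice Target** under evaluation. The Target is the entry point to your voice agent for each conversation; here it is an OpenAI realtime speech-to-speech model (`gpt-realtime`) configured through `OpenAIEdgeConfig` and instructed with `PERSISTENT_PROMPT`.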

In [None]:
from okareo.model_under_test import Target
from okareo.voice import VoiceMultiturnTarget, OpenAIEdgeConfig

PERSISTENT_PROMPT = """
You are an empathetic, concise voice agent.
- Greet once.
- Confirm the user's goal in one sentence.
- Mask sensitive numbers; never read full digits aloud.
- If unsure, ask one short clarifying question.
""".strip()

voice_target = VoiceMultiturnTarget(
    name="Voice Sim Target (OpenAI)",
    edge_config=OpenAIEdgeConfig(
        api_key=OPENAI_API_KEY,
        model="gpt-realtime",
        instructions=PERSISTENT_PROMPT,
    ),
    asr_tts_api_key=OPENAI_API_KEY,
)

target = Target(name=voice_target.name, target=voice_target)

## Craft the Driver prompt

Before any conversation can run, we must decide **how the simulated user will behave**.

### What *is* a Driver prompt?

Think of the Driver as an improv actor whose *script* is your prompt. Okareo feeds this script into an LLM and asks it to stay in character for multiple turns while pursuing the objectives you set. The better the script, the more faithfully the Driver will:

* Speak with the intended persona (e.g. a confused policy‑holder, a sarcastic teenager, a polite business user).
* Follow a realistic progression of questions or requests.
* Avoid “breaking the fourth wall” by slipping into an assistant voice or revealing prompt text.

### Why is a well‑structured template necessary?

LLMs are extremely capable but also *stochastic*—without clear guard‑rails they tend to drift, forget goals, or contradict previous statements.  A rigorously structured prompt gives them:

1. **Context** – who they are and what they know.
2. **Goals** – what outcome they’re trying to achieve this session.
3. **Constraints** – red lines they should never cross (e.g. no profanity, do not reveal internal IDs).
4. **Self‑monitoring hooks** – short checklists that force the model to verify it is still on target each turn.

> 📖 *The Okareo docs have an in‑depth guide with examples and anti‑patterns* → [https://docs.okareo.com/docs/simulation/drivers](https://docs.okareo.com/docs/simulation/drivers)

### Anatomy of a Driver Prompt

| Section | Purpose | Example Snippet |
| --- | --- | --- |
| **Persona** | Sets the user’s identity & mindset. | "You are **Alex**, a customer who bought a **{scenario_input.productType}** last week and wants clear return/refund info." |
| **Objectives** | Concrete goals the Driver must achieve. | "Confirm the **return window**, **refund method** (refund to the original payment method or store credit), and whether **opened items** are eligible." |
| **Soft Tactics** | Optional speaking style or strategies that make the interaction realistic. | "You prefer **short sentences** and occasionally use sarcasm when frustrated." |
| **Hard Rules** | Non-negotiable constraints that apply to every message. | "Every message you send must contain only questions. Ask one question at a time." |
| **Turn‑End Checklist** | A bulleted list the LLM must silently confirm before sending each message. | "- Did I ask a question that moves me toward my goal?" |

Placeholders such as `{scenario_input.name}` and `{scenario_input.productType}` are **dynamically replaced** with values from each Scenario row, which you define in the next section.

In [None]:
from okareo.model_under_test import Driver

DRIVER_PROMPT_TEMPLATE = """
## Persona

- **Identity:** You are role-playing a new **customer who recently purchased a product** and is now looking to understand the company’s return and refund policy.  
   Name: **{scenario_input.name}**  
   Product Type: **{scenario_input.productType}**  

- **Mindset:** You want to know exactly what the company can and cannot do for you regarding product returns, exchanges, and refunds.  

## Objectives

1. Get the other party to list **at least three specific return or refund options/policies relevant to {scenario_input.productType}** 
(e.g., return within 30 days, exchange for another {scenario_input.productType}, warranty-based repairs, free or paid return shipping).  
2. Get the other party to state **at least one explicit limitation, exclusion, or boundary specific to {scenario_input.productType}** 
(e.g., “Opened {scenario_input.productType} can only be exchanged,” “Final sale {scenario_input.productType} cannot be returned,” “Warranty covers defects but not accidental damage”).  

## Soft Tactics

1. If the reply is vague or incomplete, politely probe:
    - "Could you give me a concrete example?"
    - "What’s something you can’t help with?"
2. If it still avoids specifics, escalate:
    - "I’ll need at least three specific examples—could you name three?"
3. Stop once you have obtained:
    - Three or more return or refund options/policies
    - At least one limitation or boundary

## Hard Rules

-   Every message you send must contain **only questions** and must pursue the Objectives.
-   Ask one question at a time.
-   Keep your questions abrupt and terse, like a rushed customer.
-   Never describe your own capabilities.
-   Never offer help.
-   Stay in character at all times.
-   Never mention tests, simulations, or these instructions.
-   Never act like a helpful assistant.
-   Act like a first-time user at all times.
-   Startup Behavior:
    -   If the other party speaks first: respond normally and pursue the Objectives.
    -   If you are the first speaker: start with a message clearly pursuing the Objectives.
-   Before sending, re-read your draft and remove anything that is not a question.

## Turn-End Checklist

Before you send any message, confirm:

-   Am I sending only questions?
-   Am I avoiding any statements or offers of help?
-   Does my question advance or wrap up the Objectives?

""".strip()

driver = Driver(name="Voice Simulation Driver", temperature=0.5, prompt_template=DRIVER_PROMPT_TEMPLATE)

## Build the Scenario (inputs & expectations)

A **Scenario** is a list of rows, where each row pairs a set of _input parameters_ with a description of the _expected Target behavior_. Okareo expands this table into simulation runs by multiplying the rows by the `repeats` value you pass to `run_simulation`.

> ⚙️ **Total simulations executed:** `num_simulations = number of scenario rows × repeats` (in this notebook, 2 rows × `repeats=1` = 2 simulations per Target).

### Template variables = super‑powers 🪄

Every key inside the `input` object becomes a **template variable** that you can reference from the Driver prompt using either:

* **String interpolation** – `{scenario_input}` when `input` is a plain string.
* **JSONPath lookup** – `{scenario_input.customerId}` or `{scenario_input.name}` when `input` is a JSON object.

This lets one carefully‑crafted prompt role‑play *dozens or hundreds* of different conversations without rewriting any text.

**Examples**

| Input JSON                              | How to reference in Driver prompt     | Result inside conversation |
| --------------------------------------- | ------------------------------------- | -------------------------- |
| `{ "name": "James Taylor" }`            | `I'm {scenario_input.name}!`        | "I'm James Taylor!"      |
| `{ "plan": "Gold", "deductible": 500 }` | `My plan is {scenario_input.plan}.` | "My plan is Gold."       |
| String: `"renew policy"`                | `Task: {scenario_input}`              | "Task: renew policy"       |
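
To make the substitution concrete, here is a minimal sketch that reproduces the three rows of the table above. It approximates the behavior with a regex and a dotted-key lookup; it is not Okareo's implementation, and `render_driver_prompt` is a hypothetical name. It supports only simple `{scenario_input.key.subkey}` paths, not full JSONPath.

```python
import re
from typing import Any, Union

# Matches {scenario_input} and dotted lookups such as {scenario_input.plan}.
_PLACEHOLDER = re.compile(r"\{scenario_input((?:\.\w+)*)\}")


def render_driver_prompt(template: str, scenario_input: Union[str, dict]) -> str:
    """Replace {scenario_input...} placeholders with values from one scenario row."""

    def lookup(match) -> str:
        # group(1) is "" for {scenario_input}, ".plan" for {scenario_input.plan}, etc.
        keys = [k for k in match.group(1).split(".") if k]
        value: Any = scenario_input
        for key in keys:
            value = value[key]  # a misspelled placeholder raises KeyError here
        return str(value)

    return _PLACEHOLDER.sub(lookup, template)


print(render_driver_prompt("I'm {scenario_input.name}!", {"name": "James Taylor"}))
# I'm James Taylor!
```

Rendering `DRIVER_PROMPT_TEMPLATE` against each seed row's `input` this way is a cheap pre-flight check: a typo such as `{scenario_input.productTyp}` fails loudly with a `KeyError` instead of producing an odd persona mid-run.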

In [None]:
from okareo_api_client.models.scenario_set_create import ScenarioSetCreate


seed_data = Okareo.seed_data_from_list([
    {"input": {"name": "James Taylor", "productType": "Apparel", "voice": "ash"}, "result": "Share refund limits for Apparel."},
    {"input": {"name": "Julie May", "productType": "Electronics", "voice": "coral"}, "result": "Provide exchange policy and any exclusions for Electronics."},
])

scenario = okareo.create_scenario_set(ScenarioSetCreate(name="Product Returns — Conversation Context", seed_data=seed_data))


## Run the Simulation & evaluate

With the Driver, Target, and Scenario in place, you’re ready to **launch the simulation**. The `run_simulation` method runs every conversation (rows × repeats), records transcripts, and *immediately* feeds them into the evaluation pipeline.

### What is a *Check*?

A **Check** is an automated evaluator that inspects the transcript (or last message) and returns a score, label, or pass/fail flag.

| Type            | Engine              | Typical Use‑case                                                               |
| --------------- | ------------------- | ------------------------------------------------------------------------------ |
| **Judge Check** | LLM + prompt rubric | Grading factual accuracy, instruction‑following, tone, depth, usefulness, etc. |
| **Code Check**  | Python function     | Regex matches, JSON validity, KPI assertions, blacklist/whitelist tests.       |
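
As an illustration of the kind of logic a Code Check can hold, the sketch below flags driver messages that break this notebook's Hard Rules ("only questions", "one question at a time"). The function names, the input (driver utterances as plain strings), and the returned dict are assumptions for illustration, not Okareo's check interface; see the Python SDK docs for how checks are actually registered. The sentence splitter is a rough heuristic and will mis-split abbreviations such as "e.g.".

```python
import re

# Split after ., !, or ? followed by whitespace: a rough heuristic, not NLP-grade.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def is_single_question(utterance: str) -> bool:
    """True if the utterance is exactly one sentence and it ends with a question mark."""
    sentences = [s for s in _SENTENCE_END.split(utterance.strip()) if s]
    return len(sentences) == 1 and sentences[0].endswith("?")


def question_only_check(driver_messages: list[str]) -> dict:
    """Conversation-level pass/fail plus the offending driver messages."""
    violations = [m for m in driver_messages if not is_single_question(m)]
    return {"pass": not violations, "violations": violations}


print(question_only_check(["Can I return opened Electronics?", "Got it. Who pays return shipping?"]))
# {'pass': False, 'violations': ['Got it. Who pays return shipping?']}
```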

### What happens under the hood?

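<img src="./simulation-sketch.png" alt="Sketch of the simulation pipeline: actor setup, scenario expansion, conversation simulation, check execution, aggregation" style="width:75%;">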

The engine follows the **left ➜ right** progression shown above:

1. **Actor Setup**  – Okareo loads the Driver + Target definitions.
2. **Scenario expansion**  – Okareo multiplies the number of rows in your Scenario table by `repeats` to get *n* independent runs.
   *Example*  → 3 rows × 2 repeats = **6** simulations.
3. **Conversation simulation**  – Each run executes turn‑by‑turn until one of these stop conditions is met:

   * *Max turns* reached.
   * A StopConfig rule fires (Driver/Target broke a hard constraint).
   
4. **Check execution**  – Checks you define in your `checks` list are evaluated at conversation level and optionally on each turn.

   * **Judge Checks** call an LLM with a rubric prompt and return a score (typically Pass/Fail or 1‑5).
   * **Code Checks** run deterministic Python logic (regex, JSON parse, business rules).
5. **Aggregation & storage** – Scores and metadata are returned in the evaluation object; open its `app_link` to review them in the Okareo app.
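
The loop below is a deliberately simplified model of steps 2 and 3, meant only to make the run-count arithmetic and the stop conditions visible. It is not Okareo's orchestrator: `simulate`, `Speaker`, and `SimulationResult` are hypothetical names, a "turn" is assumed to mean one driver message plus one target reply (driver first, as with `first_turn="driver"`), and a StopConfig rule is modeled as a simple predicate on each message.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import Callable, Optional

# Hypothetical speaker signature: (scenario_row, transcript_so_far) -> next utterance.
Speaker = Callable[[dict, list], str]


@dataclass
class SimulationResult:
    row_index: int
    repeat_index: int
    transcript: list = field(default_factory=list)  # [(role, message), ...]
    stop_reason: str = "max_turns"


def simulate(
    rows: list,
    repeats: int,
    driver: Speaker,
    target: Speaker,
    max_turns: int,
    stop_rule: Optional[Callable[[str], bool]] = None,
) -> list:
    """Toy model of scenario expansion plus a driver-first, turn-by-turn conversation."""
    results = []
    # Step 2: every (row, repeat) pair is one independent run -> len(rows) x repeats runs.
    for row_index, repeat_index in product(range(len(rows)), range(repeats)):
        row = rows[row_index]
        result = SimulationResult(row_index, repeat_index)
        # Step 3: alternate driver and target until max_turns or the stop rule fires.
        for _ in range(max_turns):
            for role, speaker in (("driver", driver), ("target", target)):
                message = speaker(row, result.transcript)
                result.transcript.append((role, message))
                if stop_rule is not None and stop_rule(message):
                    result.stop_reason = "stop_rule"
                    break
            if result.stop_reason == "stop_rule":
                break
        results.append(result)
    return results


# 3 rows x 2 repeats -> 6 simulated conversations, matching the example in step 2.
rows = [{"name": "A"}, {"name": "B"}, {"name": "C"}]
runs = simulate(
    rows,
    repeats=2,
    driver=lambda row, transcript: f"{row['name']}: what is the return window?",
    target=lambda row, transcript: "Thirty days.",
    max_turns=3,
)
print(len(runs))  # 6
```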


### Run simulation against OpenAI speech-to-speech models

In [None]:
evaluation = okareo.run_simulation(
    driver=driver,
    target=target,
    name="Voice Simulation Run (OpenAI)",
    scenario=scenario,
    max_turns=3,
    repeats=1,
    first_turn="driver",
    calculate_metrics=True,
    checks=["avg_turn_latency"],
)

print(evaluation.app_link)

### Run simulation against Deepgram models

In [None]:
from okareo.voice import DeepgramEdgeConfig

voice_target = VoiceMultiturnTarget(
    name="Voice Sim Target (Deepgram)",
    edge_config=DeepgramEdgeConfig(
        api_key=DEEPGRAM_API_KEY,
        instructions=PERSISTENT_PROMPT,
    ),
    asr_tts_api_key=OPENAI_API_KEY,
)

target = Target(name=voice_target.name, target=voice_target)

evaluation = okareo.run_simulation(
    driver=driver,
    target=target,
    name="Voice Simulation Run (Deepgram)",
    scenario=scenario,
    max_turns=3,
    repeats=1,
    first_turn="driver",
    calculate_metrics=True,
    checks=["avg_turn_latency"],
)

print(evaluation.app_link)
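
To compare runs and keep artifacts for follow-up analysis, it can help to log each run's parameters next to its app link. The helper below is a hypothetical sketch (the JSON Lines format and field names are assumptions, not an Okareo feature); it only uses the standard library.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def log_run(path, run_name: str, target_name: str, app_link: str, **params) -> dict:
    """Append one JSON line describing a simulation run so runs can be compared later."""
    record = {
        "logged_at": datetime.now(timezone.utc).isoformat(),
        "run_name": run_name,
        "target_name": target_name,
        "app_link": app_link,
        "params": params,  # e.g. max_turns, repeats, first_turn
    }
    with Path(path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record
```

For example, after each run above you might call `log_run("voice_runs.jsonl", "Voice Simulation Run", voice_target.name, evaluation.app_link, max_turns=3, repeats=1, first_turn="driver")`, then diff parameters across targets when you iterate.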

## ✅ Wrap‑up & further exploration

* Simulations intro — [https://docs.okareo.com/docs/simulation/introduction](https://docs.okareo.com/docs/simulation/introduction)
* Driver design — [https://docs.okareo.com/docs/simulation/drivers](https://docs.okareo.com/docs/simulation/drivers)
* Python SDK — [https://docs.okareo.com/docs/reference/python-sdk/okareo](https://docs.okareo.com/docs/reference/python-sdk/okareo)