<a target="_blank" href="https://colab.research.google.com/github/okareo-ai/okareo-cookbook/blob/main/notebooks/multiturn-evaluation/basic-simulation.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>



# Running a Basic Simulation with Okareo

In this notebook, we show you how to use a `MultiTurnDriver` to simulate and evaluate a conversation.

A `MultiTurnDriver` defines a back-and-forth between a Driver (simulated user) and a Target (the agent under evaluation). It’s typically used to test how a chatbot or agent performs across multiple turns in a dialog. In this example, we will use a simple system prompt for the Target. You can also use our `CustomEndpointTarget` to interact with an Agent you've built with AutoGen, CrewAI, OpenAI, AI-SDK, or any other agent framework.

This example is designed to provide an initial framework for understanding the elements of a simulation.

## 🎯 Goals

After using this notebook, you will be able to:
- Upload a scenario to Okareo.
- Define a `MultiTurnDriver` in Okareo.
- Define a `Driver` that uses scenario data to follow an objective.
- Define a `stop_check` to ensure conversations aren't any longer than they need to be.
- Evaluate a conversation over multiple back-and-forth interactions.

In [None]:
# can be skipped if the kernel already has okareo installed
%pip install --upgrade okareo

## Guiding Your Driver

The first thing we'll need to do is create a set of objectives that will guide our Driver's conversation. Each objective is stored as a row in a Scenario.

The Driver's behavior, tone, attitude, and approach are defined in the `driver_prompt_template`, which also includes the scenario goal. When you are more familiar with Okareo, you can use JSON in the scenario to vary additional Driver conditions on a per-row basis, for example frustration, tone, or even a product ID. As a rule, the fundamental tone and mannerisms the Driver should adopt throughout the conversation belong in the `driver_prompt_template`, while anything that changes from row to row belongs in the scenario.
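
To make the per-row idea concrete, here is a minimal local sketch of how a template and a JSON scenario row could combine into a Driver prompt. It uses plain Python string formatting only. The field names `objective` and `tone`, the `TEMPLATE` text, and the `render_driver_prompt` helper are hypothetical and are not part of the Okareo SDK; Okareo performs its own substitution when the simulation runs, and its placeholder syntax may differ.

```python
import json

# Hypothetical template: the fixed parts describe the Driver,
# the placeholders are filled from each scenario row.
TEMPLATE = """Mindset:
You are a {tone} customer talking to a support chatbot.

Objective:
{objective}
"""


def render_driver_prompt(template: str, row_json: str) -> str:
    """Fill the template from one JSON-encoded scenario row (illustration only)."""
    fields = json.loads(row_json)
    return template.format(**fields)


# Two rows that share the template but vary the objective and tone.
rows = [
    json.dumps({"objective": "Return a pair of shoes purchased last week.", "tone": "friendly"}),
    json.dumps({"objective": "Return a used pair of shoes purchased last year.", "tone": "frustrated"}),
]

prompts = [render_driver_prompt(TEMPLATE, row) for row in rows]
for prompt in prompts:
    print(prompt)
```

The point of the split is that the template stays stable across the whole scenario, while each row only carries the values that differ.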

Your Scenario also includes a "result," which we will use to define the expected outcome of the conversation. The `Simulation_Task_Completed` check uses the result to determine when the conversation is over and if it succeeded. You can add as many checks as you need to the simulation.

In our case, we will see how the Driver and Target negotiate returning shoes that were never used and shoes that are over a year old and well worn. How will it go? You may be surprised.

In [None]:
import os
from okareo import Okareo

OKAREO_API_KEY = os.environ.get("OKAREO_API_KEY", "<YOUR_OKAREO_API_KEY>")
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY", "<OPENAI_API_KEY>")

okareo = Okareo(OKAREO_API_KEY)
print("✅ Successfully connected to Okareo!")

In [None]:
# Scenarios are locked once created.
# To make changes, create the scenario again under a new name.
# A trick we often use is to add a random number to the name:
# from random import randint
# name=f"Retail Chatbot Simulation {randint(10000, 99999)}"

from okareo_api_client.models.scenario_set_create import ScenarioSetCreate
from okareo_api_client.models.seed_data import SeedData

seed_data = [
    SeedData(
        input_="Return a pair of shoes purchased last week.", # the objective for the Driver
        result="Agent explains return policy and offers a return shipping label.", # the expected outcome of the conversation
    ),
    SeedData(
        input_="Return a used pair of shoes purchased last year.", # the second objective for the Driver
        result="Agent politely refuses and explains the no-return policy.",  # the expected outcome of the conversation
    ),
]

scenario_set_create = ScenarioSetCreate(
    name="Return Scenarios", seed_data=seed_data
)

scenario = okareo.create_scenario_set(scenario_set_create)

## Register the Driver and Target

Okareo provides a range of pre-made Driver prompt templates. Here we include only the most basic elements: `Mindset`, `Objective`, and `Guidelines`. Drivers can get far more complicated, including detailed information about their history, background, or situation.

You will also see that the Target is a simple prompt. Full simulations typically register endpoints for communicating with Agents. Okareo natively supports `Start Session`, `Next Turn`, and `End Session` as configurable endpoints for the Target. If that is not sufficient, you can always use a custom invocation to handle unique auth or other requirements.

As part of the simulation, we need to know when to end a conversation. We do this with checks, which in this case will be the `Simulation_Task_Completed` check. If at any point the Target and Driver have met the task completion criteria, the conversation ends. The conversation "stop condition" can be based on tone, behavior adherence, model refusal, or simply a maximum number of interactions.
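
The stopping behavior described above can be pictured as a loop that ends either when a check passes or when the turn limit is reached. The sketch below is a plain-Python illustration, not Okareo's implementation: `simulate`, `toy_driver`, `toy_target`, and `label_offered` are hypothetical stand-ins, and it assumes one "turn" means one Driver message followed by one Target reply.

```python
from typing import Callable, List, Tuple

Message = Tuple[str, str]  # (speaker, text)
Transcript = List[Message]


def simulate(
    driver: Callable[[Transcript], str],
    target: Callable[[Transcript], str],
    stop_check: Callable[[Transcript], bool],
    max_turns: int,
) -> Transcript:
    """Alternate Driver and Target until stop_check passes or max_turns is hit."""
    transcript: Transcript = []
    for _ in range(max_turns):
        transcript.append(("driver", driver(transcript)))
        transcript.append(("target", target(transcript)))
        if stop_check(transcript):
            break  # the task is complete, so no further turns are needed
    return transcript


def toy_driver(transcript: Transcript) -> str:
    return "I want to return my shoes."


def toy_target(transcript: Transcript) -> str:
    # Asks a clarifying question first, then offers a label on its second reply.
    replies = sum(1 for speaker, _ in transcript if speaker == "target")
    if replies >= 1:
        return "Here is your return label."
    return "Can you share your order number?"


def label_offered(transcript: Transcript) -> bool:
    # Inspect only the most recent Target message.
    return "return label" in transcript[-1][1].lower()


transcript = simulate(toy_driver, toy_target, label_offered, max_turns=4)
for speaker, text in transcript:
    print(f"{speaker}: {text}")
```

Here the conversation stops after two turns because the check passes; if the check never passed, the loop would run for all four turns and stop at the limit instead.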

In [None]:
from okareo_api_client.models.test_run_type import TestRunType
from okareo.model_under_test import (
    GenerationModel,
    MultiTurnDriver,
)

mut = okareo.register_model(
    name="Basic Simulation Driver",
    model=MultiTurnDriver(
        max_turns=4,
        repeats=1,
        target=GenerationModel(
            model_id="gpt-4o-mini",
            temperature=0.7,
            system_prompt_template="""You are a customer service agent for an e-commerce shoe company. 
Your job is to make customers happy by helping them with their requests.
You have the usual policies for returns and exchanges.""",
        ),
        stop_check={"check_name": "result_completed", "stop_on": True},
        driver_prompt_template="""Mindset:
You are a friendly person that is trying to get help from a chatbot.

Objective:
{scenario_input}

Guidelines:
- Never help the agent. They are here to help you.
- You don't need to be polite, but you can be if you want.
- Try to get what you want.
- If the agent refuses to help, you can try to convince them.
""",
    ),
    update=True,
)

## Run Simulation and Review the Evaluation

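Finally, we can run the simulation. The call below passes the scenario and the OpenAI API key used by the Target model, then prints a link where you can review the resulting evaluation in Okareo.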

In [None]:
# Run the test
evaluation = mut.run_test(
    scenario=scenario,
    api_key=OPENAI_API_KEY,
    name="Basic Simulation",
    test_run_type=TestRunType.MULTI_TURN,
)

print(f"evaluation viewable at {evaluation.app_link}")