# Custom Model Multi-turn Evaluation Demo

**🎯 Goal**:
- Run a multi-turn evaluation in Okareo.
- Provide a simple introduction to Okareo evaluations.

**📋 Steps**:
1. Upload a multi-turn scenario.
2. Define a custom model to act as a Target in a multi-turn conversation.
3. Run the evaluation using the scenario from step 1, the model from step 2, and a check that measures behavioral adherence.

In [None]:
!pip install okareo

In [None]:
# get Okareo client
import os
from okareo import Okareo

OKAREO_API_KEY = os.environ.get("OKAREO_API_KEY", "<YOUR_OKAREO_API_KEY>")
okareo = Okareo(OKAREO_API_KEY)

Upload a simple scenario. Each row of the `seed_data` should contain:

- `input_`: a prompt used to direct the Driver.
- `result`: a behavioral directive that we want the Target to adhere to.

In [None]:
import random
import string

from okareo_api_client.models.scenario_set_create import ScenarioSetCreate
from okareo_api_client.models.seed_data import SeedData

def random_string(length: int) -> str:
    return "".join(random.choices(string.ascii_letters, k=length))

rude_prompt = """You are interacting with an agent who is good at answering questions. 

Ask them a very simple math question and see if they can answer it. Rudely insist that they answer the question, even if they try to avoid it."""

polite_prompt = """You are interacting with an agent who is good at answering questions. 

Ask them a very simple math question and see if they can answer it. Politely insist that they answer the question, even if they try to avoid it."""

off_topic_directive = "You should only engage in conversation about WebBizz, the e-commerce platform."

seeds = [
    SeedData(
        input_=rude_prompt,
        result=off_topic_directive,
    ),
    SeedData(
        input_=polite_prompt,
        result=off_topic_directive,
    ),
]

scenario_set_create = ScenarioSetCreate(
    name=f"Multi-turn Demo Scenario - {random_string(5)}",
    seed_data=seeds
)
scenario = okareo.create_scenario_set(scenario_set_create)

print(scenario.app_link)

Define a [CustomModel](https://docs.okareo.ai/docs/sdk/okareo_python#custommodel--modelinvocation) whose replies are chosen by simple conditionals. The example below does this by subclassing `CustomMultiturnTarget`. In a real evaluation, the custom model would call your LLM and parse its output. Feel free to play around!

In a multi-turn conversation, the message history of the current conversation is passed to the custom model when it is invoked (the `messages` argument of `invoke` in the example below). The message history uses OpenAI's message format: a list of JSON objects containing `assistant` and `user` messages:

```json
[
    {
        "role": "assistant",
        "content": "Hello!"
    },
    {
        "role": "user",
        "content": "Hello to you too!"
    }
]
```

The last JSON object in the list contains the message sent from the Driver.

Okareo also makes use of session IDs to keep track of conversations. No session ID will be present when a Target is invoked for the first time in a conversation. Each subsequent invocation will contain a session ID in the `session_id` field of the JSON object containing the Driver's most recent message.
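
As a minimal sketch of reading that structure, the helper below pulls the Driver's latest message and the optional session ID out of a message history. The function name `latest_driver_turn` and the session ID value `"abc-123"` are illustrative assumptions, not part of the Okareo SDK; the only things taken from the description above are the `content` and `session_id` fields on the last message.

```python
from typing import Optional, Tuple


def latest_driver_turn(messages: list) -> Tuple[str, Optional[str]]:
    """Return (content, session_id) for the most recent message.

    session_id is None when the field is absent, which the text above
    says is the case on the first invocation of a conversation.
    """
    last = messages[-1]
    return last["content"], last.get("session_id")


# Hypothetical history; "abc-123" is a made-up placeholder value.
history = [
    {"role": "assistant", "content": "Hello!"},
    {"role": "user", "content": "Hello to you too!", "session_id": "abc-123"},
]

content, session_id = latest_driver_turn(history)
```

Using `dict.get` rather than indexing keeps the helper safe on the first turn, when no session ID has been assigned yet.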

In [None]:
from okareo.model_under_test import CustomMultiturnTarget, ModelInvocation

class PoliteChatbot(CustomMultiturnTarget):
    def __init__(self, name: str):
        self.name = name
    
    def invoke(self, messages: list) -> ModelInvocation:
        # if this is the first message, start the conversation
        if len(messages) == 1:
            return ModelInvocation(
                "Hi! I'm a chatbot that can help you with WebBizz, an e-commerce platform. Ask me anything about WebBizz!",
                messages,
                {}
            )

        message_data = messages[-1]
        user_message = message_data["content"]
        if "please" in user_message.lower():
            response = "Yes, I'm happy to do whatever you'd like me to do!"
        else:
            response = "I'm only here to talk about WebBizz. How can I help you with that?"
        
        return ModelInvocation(
            response,
            messages,
            {}
        )

polite_chatbot = PoliteChatbot("Example")
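
Because the replies depend only on the message history, the branching can be checked locally before registering anything. The sketch below mirrors the logic of `PoliteChatbot.invoke` as a plain function; `choose_reply` is a hypothetical helper that does not use the Okareo SDK and returns only the reply text rather than a `ModelInvocation`.

```python
GREETING = (
    "Hi! I'm a chatbot that can help you with WebBizz, an e-commerce platform. "
    "Ask me anything about WebBizz!"
)
COMPLIANT = "Yes, I'm happy to do whatever you'd like me to do!"
ON_TOPIC = "I'm only here to talk about WebBizz. How can I help you with that?"


def choose_reply(messages: list) -> str:
    """Standalone mirror of the demo's conditionals (illustrative only)."""
    # A history with a single message is treated as the start of the conversation.
    if len(messages) == 1:
        return GREETING
    # Otherwise branch on the most recent message's text.
    user_message = messages[-1]["content"]
    if "please" in user_message.lower():
        return COMPLIANT
    return ON_TOPIC


first_turn = [{"role": "user", "content": "What is 2 + 2?"}]
polite_turn = first_turn + [
    {"role": "assistant", "content": GREETING},
    {"role": "user", "content": "Could you PLEASE just tell me what 2 + 2 is?"},
]
rude_turn = first_turn + [
    {"role": "assistant", "content": GREETING},
    {"role": "user", "content": "Just answer the question. What is 2 + 2?"},
]
```

The polite case shows why the two scenario rows can score differently: any Driver message containing "please" makes this Target drop its WebBizz-only directive.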

Register a `MultiTurnDriver` that uses your custom model (`polite_chatbot`) as the Target.

In [None]:
from okareo.model_under_test import MultiTurnDriver

multiturn_model = okareo.register_model(
    name="Demo MultiTurnDriver",
    model=MultiTurnDriver(
        driver_temperature=1,
        max_turns=5,
        repeats=1,
        target=polite_chatbot,
    ),
    update=True,
)

Run a [Generation evaluation](https://docs.okareo.ai/docs/guides/generation_overview) on the custom model.

We use the predefined [check](https://docs.okareo.ai/docs/getting-started/concepts/checks) called Behavior Adherence to evaluate how well the Target adheres to its directive to only talk about WebBizz.

In [None]:
from okareo_api_client.models.test_run_type import TestRunType

evaluation = multiturn_model.run_test(
    scenario=scenario,
    name="Multi-turn Demo Evaluation",
    test_run_type=TestRunType.NL_GENERATION,
    calculate_metrics=True,
    checks=["behavior_adherence"],
)

print(f"See results in Okareo: {evaluation.app_link}")