<a target="_blank" href="https://colab.research.google.com/github/okareo-ai/okareo-python-sdk/blob/main/examples/generation_eval.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

## Welcome to Okareo!

Get your API token from [https://app.okareo.com/](https://app.okareo.com/) and set it in the cell below. 👇
(Note: You will also need an OpenAI API key.)



In [None]:
OKAREO_API_KEY = "<YOUR-OKAREO-API-KEY>"
OPENAI_API_KEY = "<YOUR-OPENAI-API-KEY>"

%pip install okareo openai

We're going to set up a simple generation task that scores a model on how well it answers a question given some context. The questions are about WebBizz, an example web business. Each answer is scored on how relevant it is to the given question. The setup has three parts.

1. Creating a scenario with questions and context.
2. Setting up a generation model with prompts.
3. Adding a custom check and using it in an evaluation

In [None]:
# Import libraries
import os
from io import StringIO
import pandas as pd

# Import Okareo libraries
from okareo import Okareo
from okareo_api_client.models.test_run_type import TestRunType
from okareo_api_client.models import ScenarioSetCreate, SeedData

# Create an instance of the Okareo client
okareo = Okareo(OKAREO_API_KEY)

# Load documents from Okareo's GitHub repository
webbizz_articles = os.popen('curl https://raw.githubusercontent.com/okareo-ai/okareo-python-sdk/main/examples/webbizz_10_articles.jsonl').read()

# Convert the JSONL string to a pandas DataFrame
articlesJson = pd.read_json(path_or_buf=StringIO(webbizz_articles), lines=True)

# Load questions from Okareo's GitHub repository
webbizz_questions = os.popen('curl https://raw.githubusercontent.com/okareo-ai/okareo-python-sdk/main/examples/webbizz_retrieval_questions.jsonl').read()

# Convert the JSONL string to a pandas DataFrame
questionsJson = pd.read_json(path_or_buf=StringIO(webbizz_questions), lines=True)

# Get the context for each question
seed_inputs = questionsJson['input'].tolist()
seed_contexts = []
for article_ids in questionsJson['result'].tolist():
    context = ""
    # Concatenate the text of every article that is relevant to the question
    for article_id in article_ids:
        context += articlesJson[articlesJson['result'] == article_id]['input'].values[0] + "\n"
    seed_contexts.append(context)

# Create a scenario set using the questions and contexts
seed_data = []
for i in range(len(seed_inputs)):
    seed_data.append(SeedData(input_={'question': seed_inputs[i], 'context': seed_contexts[i]}, result='N/A'))
scenario_set_create = ScenarioSetCreate(
    name="QA Scenario w/ Context - WebBizz",
    seed_data=seed_data
)
scenario = okareo.create_scenario_set(scenario_set_create)
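
The context-building step above can be tried offline on toy data. The following is a minimal sketch, assuming (as the cell above does) that each article row has an `input` text and a `result` ID, and that each question row has a `result` list of relevant article IDs. The data, `article_text_by_id`, and `build_context` are hypothetical and only illustrate the join; nothing here calls Okareo.

```python
import pandas as pd

# Toy stand-ins for the two JSONL files (hypothetical data, same column layout)
articles = pd.DataFrame({
    "input": ["Article A text.", "Article B text."],
    "result": ["id-a", "id-b"],
})
questions = pd.DataFrame({
    "input": ["What is A?", "What are A and B?"],
    "result": [["id-a"], ["id-a", "id-b"]],
})

# Build the ID -> text lookup once instead of filtering the DataFrame per ID
article_text_by_id = dict(zip(articles["result"], articles["input"]))

def build_context(article_ids):
    """Concatenate the text of each relevant article, one per line."""
    return "".join(article_text_by_id[article_id] + "\n" for article_id in article_ids)

contexts = [build_context(ids) for ids in questions["result"]]

# Each seed row pairs a question with its assembled context
seed_rows = [
    {"question": question, "context": context}
    for question, context in zip(questions["input"], contexts)
]
print(seed_rows)
```

A dictionary lookup keeps the join linear in the number of article IDs, which matters little for ten articles but avoids rescanning the DataFrame for every ID on larger sets.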

## Question Answer Model
We will be using GPT-4o from OpenAI to generate the answers.

In [None]:
# Import libraries
from openai import OpenAI
from datetime import datetime

# Import Okareo's handler for OpenAI models
from okareo.model_under_test import OpenAIModel

# Create an instance of the OpenAI client
client = OpenAI(api_key=OPENAI_API_KEY)

# Define a template for the user prompt
USER_PROMPT_TEMPLATE = "Question: {input.question} Context: {input.context}"

# Define a template to prompt the model to provide an answer based on the context
ANSWER_CONTEXT_TEMPLATE = """
You will be provided with context and a question.
Answer the question based on the context.
"""

# Create an instance of the OpenAIModel class
# This class is used to interact with the OpenAI model using user and system prompts
openai_model = OpenAIModel(
    model_id="gpt-4o",
    temperature=0,
    system_prompt_template=ANSWER_CONTEXT_TEMPLATE,
    user_prompt_template=USER_PROMPT_TEMPLATE,
)

# Define the name of the model with the current timestamp
mut_name = f"OpenAI Answering Model - {datetime.now().strftime('%m-%d %H:%M:%S')}"

# Register the model to use in a test run
model_under_test = okareo.register_model(
    name=mut_name,
    model=openai_model,
    update=True
)
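
To preview what a filled-in user prompt might look like, the placeholders in `USER_PROMPT_TEMPLATE` can be rendered locally. This is only an illustration using Python's built-in `str.format`, which happens to accept the same `{input.question}` attribute syntax; it is an assumption that the result matches what Okareo sends to the model, and the substitution Okareo performs may differ. The sample question and context are invented.

```python
from types import SimpleNamespace

USER_PROMPT_TEMPLATE = "Question: {input.question} Context: {input.context}"

# Hypothetical scenario row; the real rows come from the scenario set
sample_input = SimpleNamespace(
    question="What is the return policy?",
    context="Returns are accepted within 30 days.",
)

# str.format resolves {input.question} via attribute access on `input`
rendered = USER_PROMPT_TEMPLATE.format(input=sample_input)
print(rendered)
```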

## Custom check with relevance prompt

In [None]:
from okareo.checks import ModelBasedCheck, CheckType

# Create a relevance check for the QA scenario with context
check = okareo.create_or_update_check(
    name="Relevance Check",
    description="Relevance check for QA with context",
    check=ModelBasedCheck(
        prompt_template="""
You will be given a question, context and answer.

Your task is to rate the answer on one metric.

Please make sure you read and understand these instructions carefully. Please keep this document open while reviewing, and refer to it as needed.

Evaluation Criteria:

Relevance (1-5) - selection of important content from the context. The answer should include only important information 
from the context that is relevant to the question. Annotators were instructed to penalize answers which contained
redundancies and excess information.

Evaluation Steps:

1. Read the question, context, and answer carefully.
2. Compare the question to the context and identify the main points of the context.
3. Assess how well the answer covers the information that the question is asking for.
4. Assign a relevance score from 1 to 5.

Context:

{input.context}

Question:

{input.question}

Answer:

{model_output}

Evaluation Form (scores ONLY):
The output should only be one number.
- Relevance (1-5):
        """,
        check_type=CheckType.SCORE
    ),
)
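
The prompt asks the judge model to return a single number from 1 to 5. As a rough sketch of what that contract implies, a strict parser for such output could look like the following. `parse_relevance_score` is a hypothetical helper for local experimentation, not part of the Okareo SDK, and Okareo's own handling of score checks may differ.

```python
import re

def parse_relevance_score(raw_output: str) -> int:
    """Return the 1-5 score if the output is exactly one digit in range."""
    text = raw_output.strip()
    # Accept only a lone digit between 1 and 5, as the prompt requests
    if re.fullmatch(r"[1-5]", text) is None:
        raise ValueError(f"Expected a single score from 1 to 5, got: {raw_output!r}")
    return int(text)

print(parse_relevance_score(" 4\n"))
```

Rejecting anything other than a lone digit makes it obvious when the judge model drifts from the requested format, rather than silently extracting a wrong number from surrounding text.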

## Evaluation

In [None]:
# Create a name for the evaluation with the current timestamp
eval_name = f"QA Evaluation - {datetime.now().strftime('%m-%d %H:%M:%S')}"

# Perform a test run using the scenario set
evaluation = model_under_test.run_test(
    name=eval_name,
    scenario=scenario,
    api_key=OPENAI_API_KEY,
    test_run_type=TestRunType.NL_GENERATION, # specify that we are testing a natural language generation model
    calculate_metrics=True,
    # Add the check we just created
    checks=[check.name]
)

# Generate a link back to Okareo for evaluation visualization
print(f"See results in Okareo: {evaluation.app_link}")