# Scenarios in Okareo: An Introduction

<a target="_blank" href="https://colab.research.google.com/github/okareo-ai/okareo-python-sdk/blob/mason/add-unified-generator-example-notebooks/examples/generated_scenarios_walkthrough.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

## 🎯 Goals

After using this notebook, you will be able to:
- Upload your test cases to Okareo as a Seed scenario by:
    1. Uploading a file
    2. Defining input data statically
- Generate new synthetic test cases using Scenario generators
- Chain generators together to make more complex test cases

First, import the Okareo library and use your [API key](https://docs.okareo.com/docs/guides/environment#setting-up-your-okareo-environment) to authenticate.

In [1]:
import os
from okareo import Okareo

OKAREO_API_KEY = os.environ["OKAREO_API_KEY"]
okareo = Okareo(OKAREO_API_KEY)

## Uploading a Seed Scenario

Here we use an existing `.jsonl` file to create a seed scenario with the `upload_scenario_set` method. The data here includes short articles about a fictitious company called "WebBizz."

In [None]:
file_path = "./webbizz_10_articles.jsonl"
scenario_name = "Retrieval Articles Scenario"
try:
    # upload the local file to Okareo
    source_scenario = okareo.upload_scenario_set(file_path=file_path, scenario_name=scenario_name)
except Exception:
    print(f"- Loading file {file_path} to Okareo failed. Temporarily downloading the file from GitHub...")
    import requests

    # the local file is most likely missing, so fetch a copy from GitHub
    file_url = "https://raw.githubusercontent.com/okareo-ai/okareo-python-sdk/main/examples/webbizz_10_articles.jsonl"
    response = requests.get(file_url)
    response.raise_for_status()  # stop here if the download itself failed
    with open(file_path, "wb") as f:
        f.write(response.content)

    # retry the upload with the downloaded file
    source_scenario = okareo.upload_scenario_set(file_path=file_path, scenario_name=scenario_name)

    # delete the temporary copy (`os` was imported in the first cell)
    os.remove(file_path)
print(source_scenario.app_link)
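
If you want to upload your own data instead of the WebBizz articles, the file needs to be in JSON Lines format. The sketch below assumes each line is a JSON object with `input` and `result` keys, mirroring the `input`/`result` fields used throughout this notebook. The rows, the file name, and the helpers `write_jsonl` and `read_jsonl` are hypothetical and only illustrate the layout.

```python
import json
import os
import tempfile

# Hypothetical seed rows; `input` and `result` are assumed to be the expected keys.
rows = [
    {"input": "Example article text about shipping.", "result": "article-1"},
    {"input": "Example article text about returns.", "result": "article-2"},
]

def write_jsonl(rows, path):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")

def read_jsonl(path):
    """Read a JSON Lines file back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

path = os.path.join(tempfile.mkdtemp(), "my_seed_rows.jsonl")
write_jsonl(rows, path)
loaded = read_jsonl(path)
print(f"{len(loaded)} rows written to {path}")
```

A file produced this way could then be passed as `file_path` to `upload_scenario_set`, as in the cell above.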

With a seed scenario defined, let's use Okareo to generate some synthetic scenarios.

## Rephrasing generator

Suppose we want a scenario in which the same `result`s should be retrieved even when the `input`s are worded slightly differently. We can achieve this with the Rephrasing generator, which attempts to reword each sentence in a given `input`.

In [5]:
from okareo_api_client.models import ScenarioType

generated_scenario = okareo.generate_scenarios(
    source_scenario=source_scenario.scenario_id,
    name="Retrieval Articles Scenario: Rephrased",
    number_examples=1, # number of examples to generate per row in seed scenario
    generation_type=ScenarioType.REPHRASE_INVARIANT
)

print(generated_scenario)
print(generated_scenario.app_link)

ScenarioSetResponse(scenario_id='c7beefb0-126e-410f-9f4d-083280a04e72', project_id='21b8a05b-f5b6-4578-a36a-fc264036d9d3', time_created=datetime.datetime(2024, 3, 5, 21, 12, 43, 419312), type='REPHRASE_INVARIANT', tags=['seed:983e5d3f-d6bd-4d8d-a74e-c488fdeb0545'], name='Retrieval Articles Scenario: Rephrased', seed_data=[], scenario_count=0, scenario_input=[], app_link='https://app.okareo.com/project/21b8a05b-f5b6-4578-a36a-fc264036d9d3/scenario/c7beefb0-126e-410f-9f4d-083280a04e72', additional_properties={})
https://app.okareo.com/project/21b8a05b-f5b6-4578-a36a-fc264036d9d3/scenario/c7beefb0-126e-410f-9f4d-083280a04e72


Now let's compare the generated scenarios to their corresponding seed scenarios.

In [43]:
# helper functions to compare the seed/generated articles
def make_points_to_dict(points):
    """
    Group scenario inputs by their `result`.

    Note: using the `result`s as keys will not generalize to every scenario,
    but it works for the scenarios in this notebook.
    """
    d = {}
    for p in points:
        # `result` may be a string or a list; for a list, use its first element
        res = p.result if isinstance(p.result, str) else p.result[0]
        d.setdefault(res, []).append(p.input_)
    return d

def compare_seed_and_generated(seed_scenario, generated_scenario):
    gen_points = okareo.get_scenario_data_points(generated_scenario.scenario_id)
    seed_points = okareo.get_scenario_data_points(seed_scenario.scenario_id)
    gen_d = make_points_to_dict(gen_points)
    seed_d = make_points_to_dict(seed_points)
    for i, key in enumerate(seed_d):
        print("-"*8 + f"Example {i}" + "-"*8)
        print(f"Source: {seed_d[key][0]}")
        # a seed row may have no generated counterpart, so default to an empty list
        for j, generated_input in enumerate(gen_d.get(key, [])):
            print("-"*26)
            print(f"Generated #{j}: {generated_input}")

In [22]:
compare_seed_and_generated(source_scenario, generated_scenario)

--------Example 0--------
Source: WebBizz is dedicated to providing our customers with a seamless online shopping experience. Our platform is designed with user-friendly interfaces to help you browse and select the best products suitable for your needs. We offer a wide range of products from top brands and new entrants, ensuring diversity and quality in our offerings. Our 24/7 customer support is ready to assist you with any queries, from product details, shipping timelines, to payment methods. We also have a dedicated FAQ section addressing common concerns. Always ensure you are logged in to enjoy personalized product recommendations and faster checkout processes.
--------------------------
Generated #0: WebBizz prioritizes a smooth digital shopping journey for our customers. Our platform is tailored with straightforward interfaces for easier product browsing and selection. We present a variety of items from leading brands and emerging ones, guaranteeing an assortment and excellence i

## Term Relevance generator

Suppose we want to generate a scenario based on keywords from the same `input`s. We can generate such a scenario using the Term Relevance generator, which extracts the most relevant terms based on [term frequency-inverse document frequency (tf-idf)](https://en.wikipedia.org/wiki/Tf%E2%80%93idf).
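
The generator's internals are not shown in this notebook, so the following is only a rough, standalone illustration of how tf-idf ranks terms. The function `top_tfidf_terms`, the tokenization, and the sample documents are assumptions made for this sketch, not Okareo's implementation. A term scores highly when it is frequent in one document but rare across the others.

```python
import math
import re
from collections import Counter

def top_tfidf_terms(docs, k=3):
    """Return the k highest-scoring tf-idf terms for each document."""
    tokenized = [re.findall(r"[a-z]+", d.lower()) for d in docs]
    n_docs = len(tokenized)
    # document frequency: the number of documents that contain each term
    df = Counter(term for tokens in tokenized for term in set(tokens))
    results = []
    for tokens in tokenized:
        tf = Counter(tokens)
        scores = {
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        # sort by score (descending), then alphabetically so ties are stable
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:k])
    return results

# Made-up documents, used only to exercise the function.
docs = [
    "fast shipping and free returns on shipping orders",
    "secure payment and fast checkout",
    "free newsletter with exclusive deals",
]
for doc, terms in zip(docs, top_tfidf_terms(docs)):
    print(terms, "<-", doc)
```

Words shared by several documents, such as "fast" or "free" here, receive a lower score than words unique to one document.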

In [None]:
generated_scenario_tr = okareo.generate_scenarios(
    source_scenario=source_scenario.scenario_id,
    name="Retrieval Articles Scenario: Term Relevance",
    number_examples=1, # number of examples to generate per seed data
    generation_type=ScenarioType.TERM_RELEVANCE_INVARIANT
)

print(generated_scenario_tr.app_link)

https://app.okareo.com/project/21b8a05b-f5b6-4578-a36a-fc264036d9d3/scenario/3cb9aebd-5e12-4563-ae1f-e1873e277e6e


In [None]:
compare_seed_and_generated(source_scenario, generated_scenario_tr)

--------Example 0--------
Source: Navigating WebBizz is a breeze with our advanced search functionalities. Whether you're looking for a specific product or exploring our latest collections, our dynamic filters and sorting options will guide you effortlessly. You can sort products based on popularity, price, and ratings to find the perfect fit for your needs.
--------------------------
Generated #0: advanced breeze filters
--------------------------
Generated #1: dynamic breeze filters
--------Example 1--------
Source: We're proud of our diverse product catalog that caters to customers globally. From eco-friendly products to exclusive designer collaborations, WebBizz curates its collections with care. We also frequently feature limited-time offers and flash sales, so keep an eye out for these exclusive deals!
--------------------------
Generated #0: care catalog caters
--------------------------
Generated #1: exclusive catalog caters
--------Example 2--------
Source: Subscribing to our 

## Seed Scenario Creation via Static Definition 

In addition to uploading a list of JSON objects from a `.jsonl` file, we can also define a scenario statically by constructing `SeedData` objects explicitly. The following cell shows the required imports and arguments.

In [58]:
from okareo_api_client.models import ScenarioSetCreate, SeedData

# list of statically defined seed data
seed_data=[
    SeedData(input_="The quick brown fox jumps over the lazy dog", result="result1"),
    SeedData(input_="The rain in Spain falls mainly on the plain", result="result2"),
    SeedData(input_="Lorem ipsum dolor sit amet, consectetur adipiscing elit", result="result3")
]

# request for scenario set creation 
scenario_set_create = ScenarioSetCreate(
    name="Statically Defined Scenario: Seed",
    generation_type=ScenarioType.SEED,
    number_examples=len(seed_data), # here, the number of statically defined seed rows
    seed_data=seed_data
)

source_scenario_static = okareo.create_scenario_set(scenario_set_create)

print(source_scenario_static.app_link)

https://app.okareo.com/project/21b8a05b-f5b6-4578-a36a-fc264036d9d3/scenario/930457a0-1153-4049-a397-56aa25c84101


## Misspelling generator

Using this statically defined scenario, suppose we want to introduce errors into the `input`s. The Misspelling generator does this by emulating human typing errors: it replaces random characters in the input with adjacent keys on a typical QWERTY keyboard.
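
To make the idea of an adjacent-key typo concrete, here is a minimal offline sketch. It is not Okareo's implementation: the names `QWERTY_ROWS`, `NEIGHBORS`, and `misspell` are hypothetical, and as a simplification only left and right neighbors within the same keyboard row count as adjacent.

```python
import random

QWERTY_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]

def build_neighbors(rows):
    """Map each letter to its left/right neighbors on the same keyboard row."""
    neighbors = {}
    for row in rows:
        for i, ch in enumerate(row):
            adjacent = []
            if i > 0:
                adjacent.append(row[i - 1])
            if i < len(row) - 1:
                adjacent.append(row[i + 1])
            neighbors[ch] = adjacent
    return neighbors

NEIGHBORS = build_neighbors(QWERTY_ROWS)

def misspell(text, rng):
    """Replace one randomly chosen letter with a neighboring key."""
    positions = [i for i, ch in enumerate(text) if ch.lower() in NEIGHBORS]
    if not positions:
        return text  # nothing to change, e.g. digits only
    i = rng.choice(positions)
    replacement = rng.choice(NEIGHBORS[text[i].lower()])
    if text[i].isupper():
        replacement = replacement.upper()  # preserve the original casing
    return text[:i] + replacement + text[i + 1:]

rng = random.Random(0)  # seeded so repeated runs give the same typo
print(misspell("The quick brown fox jumps over the lazy dog", rng))
```

Each call changes exactly one letter, which is similar in spirit to typos such as "jumpd" or "dpg".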

In [59]:
generated_scenario_mis = okareo.generate_scenarios(
    source_scenario=source_scenario_static.scenario_id,
    name="Statically Defined Scenario: Misspelling",
    number_examples=2, # number of examples to generate per seed data
    generation_type=ScenarioType.COMMON_MISSPELLINGS
)

print(generated_scenario_mis)
print(generated_scenario_mis.app_link)

ScenarioSetResponse(scenario_id='bb18a587-a191-427a-bb7d-7a4eb1f91224', project_id='21b8a05b-f5b6-4578-a36a-fc264036d9d3', time_created=datetime.datetime(2024, 3, 5, 23, 42, 33, 971629), type='COMMON_MISSPELLINGS', tags=['seed:930457a0-1153-4049-a397-56aa25c84101'], name='Statically Defined Scenario: Misspelling', seed_data=[], scenario_count=0, scenario_input=[], app_link='https://app.okareo.com/project/21b8a05b-f5b6-4578-a36a-fc264036d9d3/scenario/bb18a587-a191-427a-bb7d-7a4eb1f91224', additional_properties={})
https://app.okareo.com/project/21b8a05b-f5b6-4578-a36a-fc264036d9d3/scenario/bb18a587-a191-427a-bb7d-7a4eb1f91224


In [60]:
compare_seed_and_generated(source_scenario_static, generated_scenario_mis)

--------Example 0--------
Source: The quick brown fox jumps over the lazy dog
--------------------------
Generated #0: The quick brown fox jumpd over the lazy dog
--------------------------
Generated #1: The quick brown fox jumps over the lazy dpg
--------Example 1--------
Source: The rain in Spain falls mainly on the plain
--------------------------
Generated #0: The rain in Spain falld mainly on the plain
--------------------------
Generated #1: The rain in Spain fslls mainly on the plain
--------Example 2--------
Source: Lorem ipsum dolor sit amet, consectetur adipiscing elit
--------------------------
Generated #0: Lorem ipsum dolor sit amet, consectetue adipiscing elit
--------------------------
Generated #1: Lorwm ipsum dolor sit amet, consectetur adipiscing elit


## Contraction generator

Contractions and abbreviations are common in human-written documents. The Contraction generator lets you generate a scenario that tries to replace strings in the `input`s with common abbreviations.

In [63]:
generated_scenario_con = okareo.generate_scenarios(
    source_scenario=source_scenario_static.scenario_id,
    name="Statically Defined Scenario: Contractions",
    number_examples=1, # number of examples to generate per seed data
    generation_type=ScenarioType.COMMON_CONTRACTIONS
)

print(generated_scenario_con.app_link)

https://app.okareo.com/project/21b8a05b-f5b6-4578-a36a-fc264036d9d3/scenario/03f9e05e-8c37-4f00-b73d-03ed8af1530d


In [64]:
compare_seed_and_generated(source_scenario_static, generated_scenario_con)

--------Example 0--------
Source: The quick brown fox jumps over the lazy dog
--------------------------
Generated #0: The quck brown fox jumps over the lazy dog
--------Example 1--------
Source: The rain in Spain falls mainly on the plain
--------------------------
Generated #0: The rain in Spain falls mainl on the plain
--------Example 2--------
Source: Lorem ipsum dolor sit amet, consectetur adipiscing elit
--------------------------
Generated #0: Lorem ipsum dolor sit amet, consecttur adipiscing elit


## Reverse Question generator

Suppose we would like to generate some common user queries that are based on the original `input`s. The Reverse Question generator enables this by generating questions where the answer is contained in the original input.

In [29]:
generated_scenario_question = okareo.generate_scenarios(
    source_scenario=source_scenario.scenario_id,
    name="Retrieval Articles Scenario: Reverse Question",
    number_examples=1, # number of examples to generate per seed data
    generation_type=ScenarioType.TEXT_REVERSE_QUESTION
)

print(generated_scenario_question)
print(generated_scenario_question.app_link)

ScenarioSetResponse(scenario_id='964dd104-8a21-4d16-b00c-afc85577bb4d', project_id='21b8a05b-f5b6-4578-a36a-fc264036d9d3', time_created=datetime.datetime(2024, 3, 5, 22, 54, 56, 242334), type='TEXT_REVERSE_QUESTION', tags=['seed:983e5d3f-d6bd-4d8d-a74e-c488fdeb0545'], name='Retrieval Articles Scenario: Reverse Question', seed_data=[], scenario_count=0, scenario_input=[], app_link='https://app.okareo.com/project/21b8a05b-f5b6-4578-a36a-fc264036d9d3/scenario/964dd104-8a21-4d16-b00c-afc85577bb4d', additional_properties={})
https://app.okareo.com/project/21b8a05b-f5b6-4578-a36a-fc264036d9d3/scenario/964dd104-8a21-4d16-b00c-afc85577bb4d


In [44]:
compare_seed_and_generated(source_scenario, generated_scenario_question)

--------Example 0--------
Source: Navigating WebBizz is a breeze with our advanced search functionalities. Whether you're looking for a specific product or exploring our latest collections, our dynamic filters and sorting options will guide you effortlessly. You can sort products based on popularity, price, and ratings to find the perfect fit for your needs.
--------------------------
Generated #0: What features does WebBizz provide to enhance your browsing experience?
--------Example 1--------
Source: We're proud of our diverse product catalog that caters to customers globally. From eco-friendly products to exclusive designer collaborations, WebBizz curates its collections with care. We also frequently feature limited-time offers and flash sales, so keep an eye out for these exclusive deals!
--------------------------
Generated #0: Which types of products does WebBizz pride itself in offering to its global customers?
--------Example 2--------
Source: Subscribing to our newsletter give

## Conditional generator + Chaining generators

Now suppose we would like to rephrase the questions we generated with the Reverse Question generator. The Conditional generator tries to do this by adding a qualifying phrase to the beginning of each question.

Observe that the `source_scenario` argument below uses the output of the previous generation (`generated_scenario_question`), meaning the generated Reverse Question scenario is the input to the Conditional generator. This is one method for *chaining* generators together, which lets you layer different generators on one another and create composite generations.

In [68]:
# generate a Conditional scenario with the generated Reverse Question scenario as a source
generated_scenario_conditional = okareo.generate_scenarios(
    source_scenario=generated_scenario_question.scenario_id,
    name="Retrieval Articles Scenario: Conditional",
    number_examples=1, # number of examples to generate per seed data
    generation_type=ScenarioType.CONDITIONAL
)

print(generated_scenario_conditional.app_link)

https://app.okareo.com/project/21b8a05b-f5b6-4578-a36a-fc264036d9d3/scenario/3e2463fc-898b-493a-b8f2-3f4951e1dc1e
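
For longer chains, the same pattern can be wrapped in a small loop. The helper below is a sketch, not part of the Okareo SDK: `chain_generators` is a hypothetical name, and it relies only on the `generate_scenarios` keyword arguments and the `scenario_id` attribute already used in this notebook. It assumes `client` is an authenticated `Okareo` instance.

```python
def chain_generators(client, seed_scenario_id, steps, number_examples=1):
    """
    Apply each (name, generation_type) step to the output of the previous step.

    Returns the generated scenarios in order, one per step.
    """
    scenarios = []
    source_id = seed_scenario_id
    for name, generation_type in steps:
        generated = client.generate_scenarios(
            source_scenario=source_id,
            name=name,
            number_examples=number_examples,
            generation_type=generation_type,
        )
        scenarios.append(generated)
        # the scenario just generated becomes the source for the next step
        source_id = generated.scenario_id
    return scenarios

# Example usage (requires the authenticated `okareo` client from the first cell):
# chained = chain_generators(
#     okareo,
#     source_scenario.scenario_id,
#     [
#         ("Retrieval Articles Scenario: Reverse Question", ScenarioType.TEXT_REVERSE_QUESTION),
#         ("Retrieval Articles Scenario: Conditional", ScenarioType.CONDITIONAL),
#     ],
# )
```

Keeping every intermediate scenario in the returned list makes it possible to compare any two adjacent steps with `compare_seed_and_generated`.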


In [69]:
compare_seed_and_generated(generated_scenario_question, generated_scenario_conditional)

--------Example 0--------
Source: Can you tell me about a platform I could use for a user-friendly online shopping experience, that provides 24/7 customer support and personalized product recommendations?
--------------------------
Generated #0: When searching for a platform that provides a user-friendly shopping experience, along with 24/7 customer support and personalized product recommendations, could you recommend one?
--------Example 1--------
Source: What measures does WebBizz take to ensure the security of personal and financial data?
--------------------------
Generated #0: Given the measures that WebBizz employs, how is the security of personal and financial data ensured?
--------Example 2--------
Source: What are some exclusive benefits offered to the 'Premium Club' members by WebBizz?
--------------------------
Generated #0: Given the 'Premium Club' membership at WebBizz, what would be some exclusive benefits offered?
--------Example 3--------
Source: How does the 'Wishlist'