<a target="_blank" href="https://colab.research.google.com/github/okareo-ai/okareo-python-sdk/blob/main/examples/generating_classification_scenarios.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

## Generate synthetic data to use in classification

1. Install Okareo's Python SDK: `pip install okareo`

2. Get your API token from [https://app.okareo.com/](https://app.okareo.com/).  
   (Note: You will need to register first.)

3. Go directly to the API settings by clicking the button under **"1. Create API Token"**. You can skip all other steps.

4. Add your generated API token to the cell below. 👇


In [1]:
OKAREO_API_KEY = "YOUR_API_KEY"
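A token pasted directly into a cell is easy to leak when the notebook is shared. As a minimal sketch, assuming you export the token in an environment variable named `OKAREO_API_KEY`, you could read it at runtime instead. The variable name and the `resolve_api_key` helper are illustrative choices for this sketch, not part of the Okareo SDK.

```python
import os


def resolve_api_key(env=os.environ, placeholder="YOUR_API_KEY"):
    """Return the API key from the environment, or fail with a clear message.

    Assumption: the token is exported as OKAREO_API_KEY. Use whatever
    variable name your own setup defines.
    """
    key = env.get("OKAREO_API_KEY", "").strip()
    if not key or key == placeholder:
        raise RuntimeError("Set the OKAREO_API_KEY environment variable first.")
    return key
```

With the variable set, `OKAREO_API_KEY = resolve_api_key()` can stand in for the hard-coded assignment above.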

In [None]:
%pip install okareo 

This notebook generates the data used to train the model in the companion training notebook:

<a target="_blank" href="https://colab.research.google.com/github/okareo-ai/okareo-python-sdk/blob/main/examples/classification_eval_training.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

We start by manually creating a set of seed questions.

Those seed questions are used to create a Scenario Set in Okareo.

In [2]:
from okareo import Okareo
import os
import tempfile
import json

okareo = Okareo(OKAREO_API_KEY)

# Start with sample data for complaints, returns, and pricing
rows = [
    {"input": "How often do you update your prices based on market trends?", "result": "pricing"},
    {"input": "Why is the price of Product X higher than that of your competitor?", "result": "pricing"},
    {"input": "Can you explain the price difference between similar products from different brands?", "result": "pricing"},
    {"input": "Do you offer price matching with other online retailers?", "result": "pricing"},
    {"input": "Could you provide more details on the bulk pricing for office supplies?", "result": "pricing"},
    {"input": "What factors influence the pricing changes for home appliances?", "result": "pricing"},
    {"input": "How do I initiate a return for a recently purchased item?", "result": "returns"},
    {"input": "What is the time frame within which I can return an unwanted product?", "result": "returns"},
    {"input": "Is there a fee associated with returning an item?", "result": "returns"},
    {"input": "What documentation do I need to provide for a return request?", "result": "returns"},
    {"input": "Can I return an item if I've lost the original packaging?", "result": "returns"},
    {"input": "Are there any exceptions to your return policy for clearance items?", "result": "returns"},
    {"input": "How can I track the status of my return?", "result": "returns"},
    {"input": "Can I return online purchases to a physical store location?", "result": "returns"},
    {"input": "How do I file a formal complaint about a negative experience with customer service?", "result": "complaints"},
    {"input": "Why haven't I received a resolution for my previous complaint?", "result": "complaints"},
    {"input": "What steps are taken after I submit a complaint?", "result": "complaints"},
    {"input": "How long does it take for the complaints team to respond?", "result": "complaints"},
    {"input": "How can I provide detailed feedback about the service I received?", "result": "complaints"},
    {"input": "I bought a shirt online, but it doesn't fit. How can I return it?", "result": "returns"},
    {"input": "I want to return a gift I received. How can I do that without the receipt?", "result": "returns"},
    {"input": "The product I ordered is not what I received. How can I return it?", "result": "returns"},
    {"input": "I'm not satisfied with the quality of the product. Can I return it?", "result": "returns"},
    {"input": "The item I bought is faulty. How do I return it?", "result": "returns"},
    {"input": "I received my order late and no longer need it. Can I return it?", "result": "returns"},
    {"input": "The product I ordered doesn't match the description. Can I return it?", "result": "returns"},
    {"input": "The product is not working as expected, I'd like to return it?", "result": "returns"},
    {"input": "I had a poor experience with your customer service. Who can I talk to about this?", "result": "complaints"},
    {"input": "My product broke after just a week of use. How can this issue be resolved?", "result": "complaints"},
    {"input": "I'm not happy with the service I received. Who can I complain to?", "result": "complaints"},
    {"input": "I have a problem with the delivery of my order. Who can I speak to?", "result": "complaints"},
    {"input": "I received a faulty product. Who can I complain to?", "result": "complaints"},
    {"input": "I have a complaint about a staff member. Who should I contact?", "result": "complaints"},
    {"input": "My order was missing an item. Who can I speak to about this?", "result": "complaints"},
    {"input": "Why has the price of your product increased recently?", "result": "pricing"},
    {"input": "I found a similar product cheaper elsewhere. Do you offer price matching?", "result": "pricing"},
    {"input": "Why is there a difference in price between your online and in-store products?", "result": "pricing"},
    {"input": "Can you explain the pricing of your subscription service?", "result": "pricing"},
    {"input": "Do you offer a lower price for bulk purchases?", "result": "pricing"},
    {"input": "Why is the price of the product different in my country?", "result": "pricing"},
    {"input": "Do you have a price guarantee policy?", "result": "pricing"},
    {"input": "Why did the price of the item in my cart change?", "result": "pricing"},
]

temp_dir = tempfile.gettempdir()
file_path = os.path.join(temp_dir, "seed_data_sample.jsonl")

# Write one JSON object per line (.jsonl); write-only mode is sufficient here
with open(file_path, "w") as file:
    for row in rows:
        file.write(json.dumps(row) + "\n")
    

# Create scenario set with seed data file
source_scenario = okareo.upload_scenario_set(file_path=file_path, scenario_name="Blog Training Set")
print(source_scenario.app_link)

# make sure to clean up tmp file
os.remove(file_path)

https://app.okareo.com/project/57101004-ed10-4301-bc64-5576f2aa3513/scenario/98f37d03-7e59-41fe-af3a-b9b0fda7513a
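Before uploading a seed file, it can help to read it back and check how many rows each label has, since a heavily skewed seed set carries over into the generated data. The sketch below uses only the standard library. The helper names `write_jsonl` and `label_counts` and the file name `seed_data_check.jsonl` are made up for this illustration.

```python
import json
import os
import tempfile
from collections import Counter


def write_jsonl(rows, path):
    """Write one JSON object per line, the layout used for the seed file."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")


def label_counts(path):
    """Read a JSONL file back and count rows per "result" label."""
    counts = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():  # ignore blank lines
                counts[json.loads(line)["result"]] += 1
    return counts


# Three rows taken from the seed data above.
sample_rows = [
    {"input": "Do you have a price guarantee policy?", "result": "pricing"},
    {"input": "How can I track the status of my return?", "result": "returns"},
    {"input": "Is there a fee associated with returning an item?", "result": "returns"},
]

path = os.path.join(tempfile.gettempdir(), "seed_data_check.jsonl")
write_jsonl(sample_rows, path)
counts = label_counts(path)
os.remove(path)  # clean up the temporary file
print(dict(counts))
```

Running `label_counts` on the full seed file gives a quick view of how the pricing, returns, and complaints rows are distributed.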


From the seed Scenario Set, we create a new Scenario Set using Okareo's generator.

In [3]:
from okareo_api_client.models import ScenarioType
# Use scenario set id or scenario set object from previous step as source for generation
rephrased_scenario = okareo.generate_scenarios(
    source_scenario=source_scenario,
    name="Blog - train - rephrase",
    number_examples=3,
    generation_type=ScenarioType.REPHRASE_INVARIANT
)

print(rephrased_scenario.app_link)

https://app.okareo.com/project/57101004-ed10-4301-bc64-5576f2aa3513/scenario/53d63adc-1d52-45fa-841a-f5b418109bec


In [8]:
import pandas as pd
# Create the output directory if needed, then export input/result pairs
os.makedirs("blog_data", exist_ok=True)
dps = okareo.get_scenario_data_points(rephrased_scenario.scenario_id)
formatted = [{"input": dp.input_, "result": dp.result} for dp in dps]
pd.DataFrame(formatted).to_csv("blog_data/rephrased.csv", index=False)
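The same fetch-and-export pattern repeats for each generated Scenario Set, so it can be wrapped in one helper. The sketch below is a hypothetical refactoring rather than part of the SDK. It relies only on `get_scenario_data_points` and the `input_` and `result` attributes used in the cell above, and it writes with the standard `csv` module instead of pandas so it runs without extra dependencies. `FakeClient` is a stand-in used only for a dry run.

```python
import csv
import os
import tempfile
from types import SimpleNamespace


def export_scenario_to_csv(client, scenario_id, path):
    """Fetch a scenario set's data points and write them as input/result rows.

    `client` is any object exposing get_scenario_data_points(scenario_id),
    as the `okareo` client does in the cell above.
    """
    data_points = list(client.get_scenario_data_points(scenario_id))
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["input", "result"])
        writer.writeheader()
        for dp in data_points:
            writer.writerow({"input": dp.input_, "result": dp.result})
    return len(data_points)


class FakeClient:
    """Hypothetical stand-in for the Okareo client, for a dry run only."""

    def get_scenario_data_points(self, scenario_id):
        return [SimpleNamespace(input_="Can I return a gift?", result="returns")]


demo_path = os.path.join(tempfile.gettempdir(), "export_demo.csv")
written = export_scenario_to_csv(FakeClient(), "demo-id", demo_path)
with open(demo_path, newline="", encoding="utf-8") as f:
    demo_rows = list(csv.DictReader(f))
os.remove(demo_path)  # clean up the temporary file
```

In the notebook, the real `okareo` client and a scenario set ID would take the place of `FakeClient()` and `"demo-id"`.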

In [4]:
spelling_scenario = okareo.generate_scenarios(
    source_scenario=source_scenario,
    name="Blog - train - spelling",
    number_examples=3,
    generation_type=ScenarioType.COMMON_MISSPELLINGS
)

print(spelling_scenario.app_link)

https://app.okareo.com/project/57101004-ed10-4301-bc64-5576f2aa3513/scenario/eb11b2ef-ea0e-492d-bfe2-6a6ccdf1bfbc


In [9]:
dps = okareo.get_scenario_data_points(spelling_scenario.scenario_id)
formatted = [{"input": dp.input_, "result": dp.result} for dp in dps]
pd.DataFrame(formatted).to_csv("blog_data/spelling.csv", index=False)

In [5]:
contr_scenario = okareo.generate_scenarios(
    source_scenario=source_scenario,
    name="Blog - train - contractions",
    number_examples=3,
    generation_type=ScenarioType.COMMON_CONTRACTIONS
)

print(contr_scenario.app_link)

https://app.okareo.com/project/57101004-ed10-4301-bc64-5576f2aa3513/scenario/f32ab75a-388c-4782-96d6-e8a43f6e02a8


In [10]:
dps = okareo.get_scenario_data_points(contr_scenario.scenario_id)
formatted = [{"input": dp.input_, "result": dp.result} for dp in dps]
pd.DataFrame(formatted).to_csv("blog_data/contr.csv", index=False)

In [6]:
cond_scenario = okareo.generate_scenarios(
    source_scenario=source_scenario,
    name="Blog - train - conditional",
    number_examples=3,
    generation_type=ScenarioType.CONDITIONAL
)

print(cond_scenario.app_link)

https://app.okareo.com/project/57101004-ed10-4301-bc64-5576f2aa3513/scenario/6f756046-f5eb-4e56-b19e-2ebee0902b27


In [11]:
dps = okareo.get_scenario_data_points(cond_scenario.scenario_id)
formatted = [{"input": dp.input_, "result": dp.result} for dp in dps]
pd.DataFrame(formatted).to_csv("blog_data/cond.csv", index=False)
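The cells above leave four separate CSV files in `blog_data/`. Whether the training notebook expects them separately or merged is not stated here. If a single file is more convenient, a minimal sketch for concatenating them and dropping exact duplicate rows could look like this. The `combine_csvs` helper and the demo file names are assumptions made for illustration.

```python
import csv
import os
import tempfile


def combine_csvs(paths, out_path):
    """Concatenate input/result CSV files, skipping exact duplicate rows."""
    seen = set()
    combined = []
    for p in paths:
        with open(p, newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f):
                key = (row["input"], row["result"])
                if key not in seen:
                    seen.add(key)
                    combined.append({"input": row["input"], "result": row["result"]})
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["input", "result"])
        writer.writeheader()
        writer.writerows(combined)
    return combined


def _write(path, rows):
    """Write a small demo CSV with the same two columns."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["input", "result"])
        writer.writeheader()
        writer.writerows(rows)


# Dry run on two tiny files that share one row.
demo_dir = tempfile.mkdtemp()
first = os.path.join(demo_dir, "first.csv")
second = os.path.join(demo_dir, "second.csv")
_write(first, [{"input": "Can I return it?", "result": "returns"}])
_write(second, [
    {"input": "Can I return it?", "result": "returns"},  # duplicate of the row in first.csv
    {"input": "Why did the price change?", "result": "pricing"},
])
merged = combine_csvs([first, second], os.path.join(demo_dir, "train.csv"))
print(len(merged), "unique rows")
```

For the files written above, the call would take the four `blog_data/*.csv` paths and an output path of your choosing.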