# Rephrasing Generator

This is a demo of the Okareo rephrasing generator. The rephrasing generator is useful when you would like to automatically create test cases with semantically identical content but different wording. This generator can be used to ensure that slightly different inputs to your LLM/RAG system do not substantially change the results.
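
As a rough illustration of what such an invariance check could look like, the sketch below compares a system's output on a source input with its outputs on rephrased inputs. Everything here is an assumption made for illustration: `is_invariant` and `toy_model` are hypothetical names, the threshold is arbitrary, and the character-level similarity from `difflib` is only a crude stand-in for whichever metric suits your application.

```python
from difflib import SequenceMatcher

def is_invariant(model, source, rephrasings, threshold=0.8):
    """Return True if every rephrasing yields an output close to the source output.

    `model` is any callable mapping an input string to an output string
    (hypothetical stand-in for an LLM/RAG call).
    """
    baseline = model(source)
    return all(
        SequenceMatcher(None, baseline, model(text)).ratio() >= threshold
        for text in rephrasings
    )

# Toy stand-in for a real system: labels a description by keyword.
def toy_model(text):
    return "action" if "sword" in text.lower() else "other"

stable = is_invariant(toy_model, "A sword-slashing sequel", ["A sequel full of sword fights"])
unstable = is_invariant(toy_model, "A sword-slashing sequel", ["A calm puzzle game"])
print(stable, unstable)
```

In practice the rephrased inputs would come from the generated scenario rather than from hand-written strings.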

This generator takes the inputs of a scenario and rephrases them at the clause level, i.e., the generator respects the punctuation of the source.

For example, suppose that `[A, ..., E]` are strings. If a given input is structured as follows:

```A, B. C; D -- E!```

Then the rephrase generator will produce the following:

```rephrase(A), rephrase(B). rephrase(C); rephrase(D) -- rephrase(E)!```
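
The clause-level behavior can be mimicked with a short sketch. This is not Okareo's implementation: the delimiter set is an assumption, `rephrase_by_clause` is a hypothetical helper, and `tag` is a placeholder that only wraps each clause so the structure is visible (a real rephraser would call a language model).

```python
import re

# Assumed clause delimiters: a spaced double dash, or one of , . ; ! ?
# followed by optional whitespace. The capturing group keeps them in the split.
CLAUSE_DELIMITERS = re.compile(r"(\s*--\s*|[,.;!?]\s*)")

def rephrase_by_clause(text, rephrase):
    """Apply `rephrase` to each clause and keep the original punctuation."""
    parts = CLAUSE_DELIMITERS.split(text)
    # re.split with a capturing group alternates: clauses at even indices,
    # delimiters at odd indices. Empty clauses are passed through unchanged.
    return "".join(
        rephrase(part) if i % 2 == 0 and part else part
        for i, part in enumerate(parts)
    )

def tag(clause):
    """Placeholder rephraser that marks where a real rephrasing would go."""
    return f"rephrase({clause})"

print(rephrase_by_clause("A, B. C; D -- E!", tag))
```

Only the text between delimiters is passed to the rephraser, so the punctuation of the source survives unchanged.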

To showcase the rephrase generator, we load a subset of the [Game Recommendations on Steam dataset (Kaggle)](https://www.kaggle.com/datasets/antonkozyriev/game-recommendations-on-steam) and pick a few representative samples.

In [1]:
import pandas as pd

filename = 'data/games_desc.csv'
df = pd.read_csv(filename)

N_data = 3
data_desc = df['description'].iloc[:N_data].reset_index(drop=True)
data_desc.head()

0    Enter the dark underworld of Prince of Persia ...
1    Monaco: What's Yours Is Mine is a single playe...
2    Escape Dead Island is a Survival-Mystery adven...
Name: description, dtype: object

Set up the Okareo client with the API key read from the `OKAREO_API_KEY` environment variable (for example, one populated from a `.env` file).

In [3]:
import os
from okareo import Okareo
from okareo_api_client.models import ScenarioSetCreate, ScenarioSetGenerate, SeedData, ScenarioType

OKAREO_API_KEY = os.environ["OKAREO_API_KEY"]
okareo = Okareo(OKAREO_API_KEY)
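
The cell above reads the key from the process environment, so a key stored in a `.env` file has to be loaded first. The following is a minimal standard-library sketch, assuming the file contains simple `KEY=VALUE` lines; `load_env_file` is a hypothetical helper and not part of the Okareo SDK. The `python-dotenv` package offers a more complete alternative.

```python
import os

def load_env_file(path=".env"):
    """Copy KEY=VALUE pairs from a file into os.environ (hypothetical helper)."""
    with open(path, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.strip()
            # Skip blank lines, comments, and lines without an assignment.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Variables already set in the environment take precedence.
            os.environ.setdefault(key.strip(), value.strip().strip("'\""))
```

Calling `load_env_file()` before `os.environ["OKAREO_API_KEY"]` is read would make the key available, provided the `.env` file sits in the working directory.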

Create the seed scenario using the game descriptions as inputs and a placeholder result, then print the link to the scenario in the Okareo UI.

In [4]:
scenario_set_create = ScenarioSetCreate(name="Rephrasing Generator: Seed Scenario",
                                        generation_type=ScenarioType.SEED,
                                        number_examples=N_data,
                                        seed_data=[SeedData(input_=data_desc.iloc[i], result="example result") for i in range(N_data)]
                                        )

source_scenario = okareo.create_scenario_set(scenario_set_create)

print(source_scenario.app_link)

https://app.okareo.com/project/21b8a05b-f5b6-4578-a36a-fc264036d9d3/scenario/f44c20cf-ecce-41fd-8fe9-c9208df9038b


Use the rephrasing generator to create two examples for each data point in the seed scenario, then print the link to the generated scenario in the UI.

In [5]:
number_examples = 2 # number of examples to generate for each example in the seed scenario

generated_scenario = okareo.generate_scenario_set(
    ScenarioSetGenerate(
        source_scenario_id=source_scenario.scenario_id,
        name="Rephrasing Generator: Generated Scenario",
        number_examples=number_examples,
        generation_type=ScenarioType.REPHRASE_INVARIANT
    )
)
print(generated_scenario.app_link)

'https://app.okareo.com/project/21b8a05b-f5b6-4578-a36a-fc264036d9d3/scenario/efe18377-658e-494d-b7d1-eb0cb6fafe7a'

Compare each generated description to the corresponding source description. 

In [8]:
# compare the descriptions
generated_points = okareo.get_scenario_data_points(generated_scenario.scenario_id)

for i in range(N_data):
    print("-"*8 + f"Example {i}" + "-"*8)
    print(f"Source: {data_desc[i]}")
    for j in range(number_examples):
        print(f"Generated #{j}: {generated_points[number_examples*i+j].input_}")
    

--------Example 0--------
Source: Enter the dark underworld of Prince of Persia Warrior Within, the sword-slashing sequel to the critically acclaimed Prince of Persia: The Sands of Time™. Hunted by Dahaka, an immortal incarnation of Fate seeking divine retribution, the Prince embarks upon a path of both carnage and mystery to defy his preordained death.
Generated #0: Immerse yourself in the shadowy world of Prince of Persia Warrior Within, the epic follow-up to the well-reviewed Prince of Persia: The Sands of Time™. Pursued relentlessly by Dahaka, a deathless embodiment of Destiny seeking heavenly punishment, the Prince sets out on a journey filled with violence and enigma to resist his foreseen demise.
Generated #1: Step into the sinister realm of Prince of Persia Warrior Within, the action-packed second part to the highly praised Prince of Persia: The Sands of Time™. Chased by Dahaka, an undying manifestation of Destiny demanding divine vengeance, the Prince commences a campaign of d