<a href="https://colab.research.google.com/github/higherbar-ai/open-chat-studio-sim/blob/main/src/simulate-queries.ipynb" target="_parent"><img alt="Open In Colab" src="https://colab.research.google.com/assets/colab-badge.svg"/></a>

# Execute fixed queries

This notebook executes a fixed set of single-turn queries against an Open Chat Studio experiment.

## Running in Google Colab

Before running this notebook in Google Colab, configure the secrets listed below: click the key button in the left sidebar, add each secret, and turn on its toggle to give this notebook access to it. These are the secrets used by this notebook:

- `OCS_API_KEY`: your Open Chat Studio API key
- `ATHINA_API_KEY`: your Athina API key (optional; only if you want to export results to Athina)
- `EXPERIMENT_ID`: the ID of the experiment you want to issue queries to
- `PARTICIPANT_ID`: the participant ID to use for the queries

## Running in a local environment

The first time you run the first code cell in this notebook, it will output a template configuration file at `~/.ocs/.env`. Edit that file to specify your configuration parameters (see above for their descriptions).
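
As a rough illustration of what a completed configuration might contain, the sketch below checks a set of `KEY=VALUE` lines for the settings the first code cell requires (`OCS_API_KEY`, `EXPERIMENT_ID`, and `PARTICIPANT_ID`). The `KEY=VALUE` layout is an assumption based on the `.env` file name, and `parse_env_lines` and `missing_settings` are hypothetical helpers, not part of this repository.

```python
import io

# Settings the first code cell refuses to run without
REQUIRED_KEYS = ["OCS_API_KEY", "EXPERIMENT_ID", "PARTICIPANT_ID"]

def parse_env_lines(lines):
    """Parse KEY=VALUE lines, skipping blanks and # comments (assumed format)."""
    settings = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings

def missing_settings(settings):
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not settings.get(key)]

# Example with placeholder values (not real credentials)
sample = io.StringIO(
    "OCS_API_KEY=your-key-here\n"
    "ATHINA_API_KEY=\n"
    "EXPERIMENT_ID=\n"
    "PARTICIPANT_ID=open-chat-studio-sim\n"
)
print(missing_settings(parse_env_lines(sample)))  # → ['EXPERIMENT_ID']
```

Leaving `ATHINA_API_KEY` empty is fine, since the Athina export is optional.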

## Selecting or uploading your queries to run

The second code cell will prompt you to select or upload a .csv file with the queries you want to run. This file should have the following columns:

- `query`: the query to send to the experiment
- `query_id`: (optional) a unique identifier for the query (if not provided, row number will be used)
- `expected_response`: (optional) the expected response to the query
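
If you do not yet have an input file, a minimal sketch like the following writes one with the columns listed above. The file name `sample_queries.csv`, the helper `write_queries_csv`, and the row contents are all hypothetical placeholders.

```python
import csv

# Hypothetical sample rows; only the `query` column is required
sample_queries = [
    {
        "query_id": "greeting-1",
        "query": "Hello, what can you help me with?",
        "expected_response": "A short description of the assistant's purpose.",
    },
    {
        "query_id": "hours-1",
        "query": "What are your opening hours?",
        "expected_response": "",
    },
]

def write_queries_csv(path, rows):
    """Write rows to a CSV file with the columns this notebook reads."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["query_id", "query", "expected_response"])
        writer.writeheader()
        writer.writerows(rows)

write_queries_csv("sample_queries.csv", sample_queries)
```

The `csv` module quotes fields that contain commas or line breaks, which is safer than assembling the file by hand.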

## Where results go

The results of the queries will be saved to a file called `query_results.csv`. If you're running in Google Colab, the file is written to `/content`; click the folder button in the sidebar to view and download it. If you're running locally, it is written to the `ocs` subdirectory of your home directory (`~/ocs`).

If an Athina API key is configured, the results will also be exported to an Athina dataset.

In [None]:
# First, we need to set up our environment handler
try:
    # Try importing directly (local environment)
    from colab_or_local_env import ColabOrLocalEnv  # type: ignore[import]
except ImportError:
    # If import fails, we're probably in Colab, so fetch the file
    import requests
    
    # Fetch the environment handler
    url = "https://raw.githubusercontent.com/higherbar-ai/open-chat-studio-sim/main/src/colab_or_local_env.py"
    response = requests.get(url)
    response.raise_for_status()
    
    # Save it to the current directory
    with open("colab_or_local_env.py", "w", encoding="utf-8") as f:
        f.write(response.text)
    
    # Now we can import it
    from colab_or_local_env import ColabOrLocalEnv
    
# set log level to WARNING
import logging
logging.basicConfig(level=logging.WARNING)

# Initialize our environment
env = ColabOrLocalEnv(
    github_repo="higherbar-ai/open-chat-studio-sim",
    requirements_path="requirements.txt",
    module_paths=["src/ocs_api.py", "src/ocs_simulation_support.py"],
    config_path="~/.ocs/.env",
    config_template={
        "OCS_API_KEY": "",
        "ATHINA_API_KEY": "",
        "EXPERIMENT_ID": "",
        "PARTICIPANT_ID": "open-chat-studio-sim",
    }
)

# Set up the environment (install requirements, fetch modules if needed)
env.setup_environment()

# Internal configuration
api_timeout_seconds = 300      # how long to give API calls before timing out
api_num_retries = 3            # how many times to retry API calls before giving up
api_retry_delay_seconds = 2    # how long to wait between retries
continue_on_error = True       # whether to record errors and continue (if False, errors will halt execution)

# Get API keys from environment
ocs_api_key = env.get_config_setting("OCS_API_KEY")
athina_api_key = env.get_config_setting("ATHINA_API_KEY")
participant_id = env.get_config_setting("PARTICIPANT_ID")
experiment_id = env.get_config_setting("EXPERIMENT_ID")

# Validate required configuration
if not all([ocs_api_key, participant_id, experiment_id]):
    raise ValueError("Please supply at least OCS_API_KEY, PARTICIPANT_ID, and EXPERIMENT_ID in your secrets or configuration file.")

# Output files to ~/ocs directory if local, otherwise /content if Google Colab
if env.is_colab:
    output_path_prefix = "/content"
else:
    import os
    output_path_prefix = os.path.expanduser("~/ocs")
    os.makedirs(output_path_prefix, exist_ok=True)

# Initialize OCS API support
from ocs_api import OCSAPIClient    # type: ignore[import]
ocs_api_client = OCSAPIClient(
    api_key=ocs_api_key, 
    timeout_seconds=api_timeout_seconds, 
    num_retries=api_num_retries, 
    retry_wait_seconds=api_retry_delay_seconds
)

# Report results
print(f"Configuration loaded for {'Colab' if env.is_colab else 'local'} environment, OCS API initialized.")

## Select or upload your queries to run

The code cell below will prompt you to select or upload a .csv file. That file should have the following columns:

- `query`: the query to send to the experiment
- `query_id`: (optional) a unique identifier for the query (if not provided, row number will be used)
- `expected_response`: (optional) the expected response to the query
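
Before running a long batch, it can help to check the file against the column rules above. The sketch below is one possible pre-flight check using only the standard library; `check_queries_csv` is a hypothetical helper, and treating duplicate `query_id` values as a problem is an assumption drawn from the description of `query_id` as a unique identifier.

```python
import csv
import io

def check_queries_csv(csv_file):
    """Return a list of problems found in an open queries CSV (empty if none)."""
    reader = csv.DictReader(csv_file)
    columns = reader.fieldnames or []
    if "query" not in columns:
        return ["missing required column: query"]
    problems = []
    seen_ids = set()
    for row_number, row in enumerate(reader, start=1):
        # Short rows are padded with None, so guard before stripping
        if not (row["query"] or "").strip():
            problems.append(f"row {row_number}: empty query")
        query_id = (row.get("query_id") or "").strip()
        if query_id:
            if query_id in seen_ids:
                problems.append(f"row {row_number}: duplicate query_id {query_id!r}")
            seen_ids.add(query_id)
    return problems

# Example with in-memory data (hypothetical content)
sample = io.StringIO("query_id,query\nq1,Hello\nq1,\n")
print(check_queries_csv(sample))
# → ['row 2: empty query', "row 2: duplicate query_id 'q1'"]
```

To check a real file, open it with `open(path, newline="", encoding="utf-8")` and pass the file object to the function.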

In [None]:
# prompt for the CSV file with queries to run
queries_to_run_files = env.get_input_files("CSV file with queries to run")

# check for exactly one file
if len(queries_to_run_files) != 1:
    raise ValueError("Please select exactly one CSV file with queries to run.")

# convert to str before checking the extension (case-insensitive)
queries_to_run_file = str(queries_to_run_files[0])
if not queries_to_run_file.lower().endswith(".csv"):
    raise ValueError("Please select a CSV file with queries to run.")

## Execute queries

The following code cell reads the list of queries to run from the .csv file selected or uploaded above, executes each query in its own new session, and saves the results to the `query_results.csv` file.

Your .csv file should have the following columns:

- `query`: the query to send to the experiment
- `query_id`: (optional) a unique identifier for the query (if not provided, row number will be used)
- `expected_response`: (optional) the expected response to the query

`query_results.csv` will have the following columns:

- `query_id`: the unique identifier for the query
- `session_id`: the unique identifier for the session
- `query`: the query sent to the AI assistant
- `response`: the response received from the AI assistant
- `expected_response`: (optional) the expected response for the query (if provided in the input file)
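
Once a results file exists, the columns above can be used to separate successful queries from failed ones. With `continue_on_error` enabled, the code cell below records a failure as a response beginning with `ERROR: `, so a minimal sketch (with `split_results` as a hypothetical helper and made-up sample rows) might look like this:

```python
import csv
import io

def split_results(csv_file):
    """Split rows of a results CSV into (succeeded, failed) lists."""
    succeeded, failed = [], []
    for row in csv.DictReader(csv_file):
        # The notebook records failures as a response beginning with "ERROR: "
        if row["response"].startswith("ERROR: "):
            failed.append(row)
        else:
            succeeded.append(row)
    return succeeded, failed

# Example with in-memory data shaped like query_results.csv (hypothetical content)
sample = io.StringIO(
    '"query_id","session_id","query","response"\n'
    '"1","abc","Hello","Hi there!"\n'
    '"2","","Hours?","ERROR: request timed out"\n'
)
ok, errors = split_results(sample)
print(f"{len(ok)} succeeded, {len(errors)} failed: {[row['query_id'] for row in errors]}")
# → 1 succeeded, 1 failed: ['2']
```

This check is a heuristic: an assistant response that itself starts with `ERROR: ` would be counted as a failure. A failed row may also have an empty `session_id` if the error occurred before the session was created.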

In [None]:
import csv
import pandas as pd
import os

# load input file using pandas
queries_to_run = pd.read_csv(queries_to_run_file)

# run through and execute each of the queries
results = []
for index, row in queries_to_run.iterrows():
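    # initialize query details (blank cells are read as NaN, so fall back to defaults)
    query_id = row.get("query_id")
    query_id = str(index + 1) if pd.isna(query_id) else str(query_id)
    query = row["query"]
    expected_response = row.get("expected_response")
    expected_response = "" if pd.isna(expected_response) else expected_response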
    response = ""
    session_id = ""
    
    # report out
    print(f"Executing query {query_id}...")
    
    # execute query, catching and logging any errors
    try:
        # create a new session for the query
        api_response = ocs_api_client.create_experiment_session(experiment_id, participant_id)
        session_id = api_response["id"]
    
        # send the query to the experiment
        api_response = ocs_api_client.send_new_api_message(experiment_id, query, session_id)
        response = api_response["response"]
    except Exception as e:
        if continue_on_error:
            # log the error and continue to the next query
            logging.error(f"Continuing following query error: {str(e)}")
            response = f"ERROR: {str(e)}"
        else:
            # raise the error to halt execution
            raise

    # add to results
    results.append({
        "query_id": query_id,
        "session_id": session_id,
        "query": query,
        "response": response
    })
    # optionally add expected response to results
    if "expected_response" in queries_to_run.columns:
        results[-1]["expected_response"] = expected_response

# save results to output .csv file
output_file = os.path.join(output_path_prefix, "query_results.csv")
output_rows = []
fieldnames=["query_id", "session_id", "query", "response"]
# if there's an expected_response column in the input file, include it in the output
if "expected_response" in queries_to_run.columns:
    fieldnames.append("expected_response")
with open(output_file, "w", newline="", encoding="utf-8") as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames, quoting=csv.QUOTE_NONNUMERIC, escapechar='\\')
    writer.writeheader()
    for result in results:
        # output and record for potential next steps
        writer.writerow(result)
        output_rows.append(result)

# report results
print()
print(f"Queries executed and {len(results)} results saved to {output_file}.")

## Optional: Export results to Athina dataset

If an Athina API key is configured, the results can be exported to an Athina dataset. The dataset will be named `queries-{experiment_id}-{timestamp}` and will contain the rows from the `query_results.csv` file.

In [None]:
from ocs_simulation_support import athina_create_dataset    # type: ignore[import]

# optionally export the results to an Athina dataset
if athina_api_key:
    # push new dataset to Athina (one timestamp shared by the name and description)
    timestamp = pd.Timestamp.now()
    dataset_name = f"queries-{experiment_id}-{timestamp.strftime('%Y%m%d%H%M%S')}"
    dataset_description = f"Simulated queries for experiment {experiment_id} at {timestamp}"
    try:
        dataset = athina_create_dataset(athina_api_key=athina_api_key, dataset_name=dataset_name, dataset_description=dataset_description, dataset_rows=output_rows)
    except Exception as e:
        print(f"Failed to create Athina dataset: {e}")
    else:
        print(f"Results exported to Athina dataset {dataset.id} (name: {dataset_name}).")