# Execute fixed queries

This notebook executes a fixed set of single-turn queries against an Open Chat Studio experiment. Before running, you should:

## 1. Set up the project overall

Follow the overall project setup instructions in `README.md`.

## 2. Configure your .ini file

The notebook begins by loading credentials and configuration from an `.ini` file at `~/.ocs/open-chat-studio-sim.ini`, where `~` refers to the current user's home directory. The file contents should follow this format:

    [ocs]
    ocs-api-key=YOURKEYHERE
    experiment-id=YOURIDHERE
    participant-id=open-chat-studio-sim
    
    [files]
    input-path-prefix=~/ocs-sim/inputs
    output-path-prefix=~/ocs-sim/outputs
    
    [athina]
    athina-api-key=

You can get started quickly by:

1. Copying the `example-open-chat-studio-sim.ini` file to `~/.ocs/open-chat-studio-sim.ini`.

2. Editing `~/.ocs/open-chat-studio-sim.ini` as follows:

    a. Add your Open Chat Studio API key

    b. Add the experiment ID for the experiment you want to issue queries to
 
    c. Adjust the input and output path prefixes as appropriate (where `~` refers to your user home directory)

    d. Optionally, add an Athina API key if you want to export results as a dataset in Athina
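
As a quick sanity check on the steps above, the following sketch reports which settings are missing or empty in the `.ini` file. It is only an illustration: the `find_missing_settings` helper and the `REQUIRED_KEYS` table are hypothetical names, not part of the notebook or its supporting modules, and treating the Athina key as optional while requiring the other keys is an assumption based on the format shown above.

```python
import configparser
import os

# Assumed set of required settings, taken from the .ini format shown above.
# The [athina] section is left out because its key is optional.
REQUIRED_KEYS = {
    "ocs": ["ocs-api-key", "experiment-id", "participant-id"],
    "files": ["input-path-prefix", "output-path-prefix"],
}


def find_missing_settings(ini_path):
    """Return 'section/key' strings for settings that are absent or empty."""
    parser = configparser.RawConfigParser()
    # read() returns the list of files it parsed; an empty list means
    # the file could not be found or opened
    if not parser.read(os.path.expanduser(ini_path)):
        raise FileNotFoundError(f"No .ini file found at {ini_path}")
    missing = []
    for section, keys in REQUIRED_KEYS.items():
        for key in keys:
            # fallback="" covers both a missing section and a missing key
            value = parser.get(section, key, fallback="")
            if not value.strip():
                missing.append(f"{section}/{key}")
    return missing
```

Calling `find_missing_settings("~/.ocs/open-chat-studio-sim.ini")` should return an empty list once the file is fully filled in. Note that placeholder values such as `YOURKEYHERE` are not detected by this check.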

## 3. Save the input file to your configured input path

Your configured input path should include a `queries_to_run.csv` file with the following columns:

- `query`: the query to send to the experiment
- `query_id`: (optional) a unique identifier for the query (if not provided, row number will be used)
- `expected_response`: (optional) the expected response to the query
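
To make the expected layout concrete, here is a minimal sketch that writes a small `queries_to_run.csv` with the three columns listed above. The `write_sample_queries` helper and the sample rows are hypothetical and purely illustrative; replace the rows with your own queries, and pass your configured input path as `input_dir`.

```python
import csv
import os


def write_sample_queries(input_dir, rows):
    """Write rows (a list of dicts) to queries_to_run.csv in input_dir and return the file path."""
    os.makedirs(input_dir, exist_ok=True)
    path = os.path.join(input_dir, "queries_to_run.csv")
    with open(path, "w", newline="", encoding="utf-8") as csvfile:
        writer = csv.DictWriter(
            csvfile, fieldnames=["query_id", "query", "expected_response"]
        )
        writer.writeheader()
        writer.writerows(rows)
    return path


# Placeholder content for illustration only; these are not real test queries.
SAMPLE_ROWS = [
    {"query_id": "greeting", "query": "Hello, what can you help me with?", "expected_response": ""},
    {"query_id": "hours", "query": "What are your opening hours?", "expected_response": "We are open 9am to 5pm."},
]
```

For example, `write_sample_queries(os.path.expanduser("~/ocs-sim/inputs"), SAMPLE_ROWS)` would create the file under the input path used in the example `.ini` file above.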

Finally, note that supporting code for this notebook can be found in `ocs_api.py` and `ocs_simulation_support.py`.

In [1]:
import logging
import configparser
import os

# set log level to WARNING
logging.basicConfig(level=logging.WARNING)

# load credentials and other configuration from local ini file
inifile_location = os.path.expanduser("~/.ocs/open-chat-studio-sim.ini")
inifile = configparser.RawConfigParser()
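# read() silently skips files it cannot open, so fail early with a clear message
if not inifile.read(inifile_location):
    raise FileNotFoundError(f"Configuration file not found: {inifile_location}")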

# load configuration
ocs_api_key = inifile.get("ocs", "ocs-api-key")
experiment_id = inifile.get("ocs", "experiment-id")
participant_id = inifile.get("ocs", "participant-id")
input_path_prefix = os.path.expanduser(inifile.get("files", "input-path-prefix"))
output_path_prefix = os.path.expanduser(inifile.get("files", "output-path-prefix"))
# the Athina key is optional, so fall back to an empty string if it is absent
athina_api_key = inifile.get("athina", "athina-api-key", fallback="")

# initialize OCS API support
from ocs_api import OCSAPIClient
ocs_api_client = OCSAPIClient(ocs_api_key)

# report results
print("Local configuration loaded, OCS API initialized.")

Local configuration loaded, OCS API initialized.


## Execute queries

The following code block reads a list of queries to run from the `queries_to_run.csv` file in the configured input path, executes them, and saves the results to the `query_results.csv` file in the configured output path. 

`queries_to_run.csv` should have the following columns:

- `query`: the query to send to the experiment
- `query_id`: (optional) a unique identifier for the query (if not provided, row number will be used)
- `expected_response`: (optional) the expected response to the query

`query_results.csv` will have the following columns:

- `query_id`: the unique identifier for the query
- `session_id`: the unique identifier for the session
- `query`: the query sent to the AI assistant
- `response`: the response received from the AI assistant
- `expected_response`: (optional) the expected response for the query (if provided in the input file)

In [2]:
import csv
import pandas as pd

# load input file using pandas
input_file = os.path.join(input_path_prefix, "queries_to_run.csv")
queries_to_run = pd.read_csv(input_file)

# run through and execute each of the queries
results = []
for index, row in queries_to_run.iterrows():
    # initialize query details
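    # fall back to the row number when query_id is missing or blank
    query_id = row.get("query_id")
    if pd.isna(query_id):
        query_id = index + 1
    query_id = str(query_id)
    query = row["query"]
    # treat a blank expected_response cell as an empty string rather than NaN
    expected_response = row.get("expected_response", "")
    if pd.isna(expected_response):
        expected_response = ""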
    response = ""
    session_id = ""
    
    # execute query, catching and logging any errors
    try:
        # create a new session for the query
        api_response = ocs_api_client.create_experiment_session(experiment_id, participant_id)
        session_id = api_response["id"]
    
        # send the query to the experiment
        api_response = ocs_api_client.send_new_api_message(experiment_id, query, session_id)
        response = api_response["response"]
    except Exception as e:
        # log the error and continue to the next query
        logging.error(f"Continuing following query error: {e}")
        response = f"ERROR: {e}"

    # add to results
    results.append({
        "query_id": query_id,
        "session_id": session_id,
        "query": query,
        "response": response
    })
    # optionally add expected response to results
    if "expected_response" in queries_to_run.columns:
        results[-1]["expected_response"] = expected_response

# save results to output .csv file
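os.makedirs(output_path_prefix, exist_ok=True)  # create the output directory if it doesn't exist yet
output_file = os.path.join(output_path_prefix, "query_results.csv")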
output_rows = []
fieldnames = ["query_id", "session_id", "query", "response"]
# if there's an expected_response column in the input file, include it in the output
if "expected_response" in queries_to_run.columns:
    fieldnames.append("expected_response")
with open(output_file, "w", newline="") as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames, quoting=csv.QUOTE_NONNUMERIC, escapechar='\\')
    writer.writeheader()
    for result in results:
        # output and record for potential next steps
        writer.writerow(result)
        output_rows.append(result)

# report results
print(f"Queries executed and {len(results)} results saved to {output_file}.")

Queries executed and 3 results saved to /Users/crobert/Files/ocs-sim/outputs/query_results.csv.
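
Because failed queries are recorded with a response of the form `ERROR: <message>` rather than stopping the run, it can be useful to check the output file for failures afterwards. The sketch below is one way to do that; `split_results` is a hypothetical helper, not part of the notebook's supporting code, and it assumes no genuine assistant response begins with `ERROR:`.

```python
import csv


def split_results(results_path):
    """Return (succeeded, failed) lists of rows read from query_results.csv."""
    succeeded, failed = [], []
    with open(results_path, newline="", encoding="utf-8") as csvfile:
        for row in csv.DictReader(csvfile):
            # failed queries are stored with an "ERROR: ..." response
            if row["response"].startswith("ERROR:"):
                failed.append(row)
            else:
                succeeded.append(row)
    return succeeded, failed
```

In the notebook, `succeeded, failed = split_results(output_file)` followed by printing the `query_id` of each row in `failed` would list the queries worth re-running.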


## Optional: Export results to Athina dataset

If an Athina API key is configured, the results can be exported to an Athina dataset. The dataset will be named `queries-{experiment_id}-{timestamp}` and will contain the rows from the `query_results.csv` file.

In [3]:
from ocs_simulation_support import athina_create_dataset

# optionally export the results to an Athina dataset
if athina_api_key:
    # push new dataset to Athina
    dataset_name = f"queries-{experiment_id}-{pd.Timestamp.now().strftime('%Y%m%d%H%M%S')}"
    dataset_description = f"Simulated queries for experiment {experiment_id} at {pd.Timestamp.now()}"
    try:
        dataset = athina_create_dataset(athina_api_key=athina_api_key, dataset_name=dataset_name, dataset_description=dataset_description, dataset_rows=output_rows)
    except Exception as e:
        print(f"Failed to create Athina dataset: {e}")
    else:
        print(f"Results exported to Athina dataset {dataset.id} (name: {dataset_name}).")

Results exported to Athina dataset 1b08d3ef-366a-4ecf-ac51-8c0a03c6358a (name: queries-f721dce8-1e6e-4aff-a7ab-81459b255ed8-20240905135958).
