# Replay conversations

This notebook replays conversations using an Open Chat Studio experiment. Before running it, complete the following steps:

## 1. Set up the project overall

Follow the overall project setup instructions in `README.md`.

## 2. Configure your .ini file

The notebook begins by loading credentials and configuration from an `.ini` file located at `~/.ocs/open-chat-studio-sim.ini`. The `~` in the path refers to the current user's home directory, and the `.ini` file contents should follow this format:

    [ocs]
    ocs-api-key=YOURKEYHERE
    experiment-id=YOURIDHERE
    participant-id=open-chat-studio-sim
    
    [files]
    input-path-prefix=~/ocs-sim/inputs
    output-path-prefix=~/ocs-sim/outputs
    
    [athina]
    athina-api-key=

You can get started quickly by:

1. Copying the `example-open-chat-studio-sim.ini` file to `~/.ocs/open-chat-studio-sim.ini`.

2. Editing `~/.ocs/open-chat-studio-sim.ini` as follows:

    a. Add your Open Chat Studio API key

    b. Add the experiment ID for the experiment you want to replay conversations with
 
    c. Adjust the input and output path prefixes as appropriate (where `~` refers to your user home directory)

    d. Optionally, add an Athina API key if you want to export results as a dataset in Athina
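As a quick sanity check before running the notebook, the sketch below reports any sections or keys from the format above that are missing from an `.ini` file. The helper name `missing_ini_keys` is hypothetical and not part of this project; it relies only on Python's standard `configparser` module and takes the file contents as a string.

```python
import configparser

# Sections and keys the notebook reads (taken from the format shown above).
REQUIRED_KEYS = {
    "ocs": ["ocs-api-key", "experiment-id", "participant-id"],
    "files": ["input-path-prefix", "output-path-prefix"],
    "athina": ["athina-api-key"],  # the value may be left empty
}


def missing_ini_keys(ini_text):
    """Return "section/key" entries missing from the given .ini text.

    Hypothetical helper for illustration; not part of this project.
    """
    parser = configparser.RawConfigParser()
    parser.read_string(ini_text)
    missing = []
    for section, keys in REQUIRED_KEYS.items():
        for key in keys:
            # has_option is False when either the section or the key is absent
            if not parser.has_option(section, key):
                missing.append(f"{section}/{key}")
    return missing


# Example: participant-id and the whole [athina] section are left out.
example = """
[ocs]
ocs-api-key=YOURKEYHERE
experiment-id=YOURIDHERE

[files]
input-path-prefix=~/ocs-sim/inputs
output-path-prefix=~/ocs-sim/outputs
"""
print(missing_ini_keys(example))
```

To check a real file, read its contents first, for example with `open(os.path.expanduser("~/.ocs/open-chat-studio-sim.ini")).read()`, and pass the result to the helper.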

## 3. Save the input file to your configured input path

Your configured input path should include a `conversations_to_replay.csv` file that follows the format of experiment session exports in Open Chat Studio. The columns we rely on here are:

- `Message ID`: the unique identifier for the chat message
- `Message Type`: the chat message type (`human` or `ai`)
- `Message Content`: the chat message
- `Session ID`: the unique session ID for the conversation
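A minimal sketch for confirming that an export contains the columns listed above is shown here. The helper `missing_columns` is hypothetical (it is not part of `ocs_api.py` or `ocs_simulation_support.py`), and the sample rows are made-up data standing in for a real export.

```python
import csv
import io

# Columns the notebook relies on (from the list above).
REQUIRED_COLUMNS = ["Message ID", "Message Type", "Message Content", "Session ID"]


def missing_columns(csv_file):
    """Return the required columns that are absent from the CSV header.

    Hypothetical helper for illustration; `csv_file` is any open text file.
    """
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []  # fieldnames is None for an empty file
    return [col for col in REQUIRED_COLUMNS if col not in header]


# Tiny in-memory stand-in for conversations_to_replay.csv (made-up data).
sample = io.StringIO(
    "Message ID,Message Type,Message Content,Session ID\n"
    "1,human,Hello,abc\n"
    "2,ai,Hi there,abc\n"
)
print(missing_columns(sample))
```

For a file on disk, the same function can be called inside `with open(path, newline="") as f:`.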

Finally, note that supporting code for this notebook can be found in `ocs_api.py` and `ocs_simulation_support.py`.

In [1]:
import logging
import configparser
import os

# set log level to WARNING
logging.basicConfig(level=logging.WARNING)

# load credentials and other configuration from local ini file
inifile_location = os.path.expanduser("~/.ocs/open-chat-studio-sim.ini")
inifile = configparser.RawConfigParser()
inifile.read(inifile_location)

# load configuration
ocs_api_key = inifile.get("ocs", "ocs-api-key")
experiment_id = inifile.get("ocs", "experiment-id")
participant_id = inifile.get("ocs", "participant-id")
input_path_prefix = os.path.expanduser(inifile.get("files", "input-path-prefix"))
output_path_prefix = os.path.expanduser(inifile.get("files", "output-path-prefix"))
athina_api_key = inifile.get("athina", "athina-api-key")

# initialize OCS API support
from ocs_api import OCSAPIClient
ocs_api_client = OCSAPIClient(ocs_api_key)

# report results
print("Local configuration loaded, OCS API initialized.")

Local configuration loaded, OCS API initialized.


## Replay conversations

The following code block reads each human message from the `conversations_to_replay.csv` file in the configured input path, fetches a new AI response for each (using _the original conversation history_), and saves the results to the `replayed_conversations.csv` file in the configured output path.

`conversations_to_replay.csv` should follow the format of Open Chat Studio experiment session exports. The columns used here are:

- `Message ID`: the unique identifier for the chat message
- `Message Type`: the chat message type (`human` or `ai`)
- `Message Content`: the chat message
- `Session ID`: the unique session ID for the conversation

`replayed_conversations.csv` will have the following columns:

- `message_id`: the unique identifier for the original query
- `session_id`: the unique identifier for the _original_ experiment session being replayed (links conversations)
- `replay_session_id`: the unique identifier for the _new_ experiment session created during replay
- `query`: the query sent to the AI assistant
- `response`: the response received from the AI assistant
- `orig_response`: the original response received from the AI assistant
- `context`: the raw conversation history included at the time of replay
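To illustrate how the `context` column relates to the original conversation history, here is a small sketch that pairs each human message with the AI reply that follows it and records the history preceding that query. The function `build_replay_steps` is a hypothetical, API-free stand-in for the replay loop; it takes rows as plain dictionaries keyed by the input column names and makes no calls to Open Chat Studio.

```python
import json


def build_replay_steps(rows):
    """Pair each human message with the AI reply that follows it.

    Each step records the history that preceded the query, mirroring the
    `context` column. Hypothetical helper for illustration only.
    """
    steps = []
    current_session = None
    history = []
    query = ""
    for row in rows:
        if row["Session ID"] != current_session:
            # a new conversation starts with an empty history
            current_session = row["Session ID"]
            history = []
            query = ""
        if row["Message Type"] == "human":
            query = row["Message Content"]
        elif query and row["Message Type"] == "ai":
            steps.append({
                "session_id": current_session,
                "query": query,
                "orig_response": row["Message Content"],
                "context": json.dumps(history),
            })
            # later steps see the original exchange, not a replayed one
            history.append({"role": "user", "content": query})
            history.append({"role": "assistant", "content": row["Message Content"]})
    return steps


# Made-up two-turn conversation.
rows = [
    {"Session ID": "s1", "Message Type": "human", "Message Content": "Hi"},
    {"Session ID": "s1", "Message Type": "ai", "Message Content": "Hello!"},
    {"Session ID": "s1", "Message Type": "human", "Message Content": "Help?"},
    {"Session ID": "s1", "Message Type": "ai", "Message Content": "Sure."},
]
for step in build_replay_steps(rows):
    print(step["query"], "|", step["context"])
```

The first step has an empty context, while the second carries the original first exchange. Each replayed query is therefore evaluated against what the assistant originally said, not against any newly generated reply.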

In [2]:
import csv
import pandas as pd
import json

# load input file using pandas
input_file = os.path.join(input_path_prefix, "conversations_to_replay.csv")
conversations_to_replay = pd.read_csv(input_file)

# run through and replay each step of each conversation
results = []
orig_session_id = ""
orig_messages = []
user_message = ""
user_message_id = ""
session_id = ""
for index, row in conversations_to_replay.iterrows():
    if orig_session_id != row["Session ID"]:
        # initialize for new conversations
        orig_session_id = row["Session ID"]
        orig_messages = []
        user_message = ""
        user_message_id = ""
    
    if row["Message Type"] == "human":
        # remember user message, but only process when we get to the original AI response
        user_message = row["Message Content"]
        user_message_id = row["Message ID"]
    elif user_message and row["Message Type"] == "ai":
        # remember original AI response
        orig_response = row["Message Content"]
        
        # replay conversation step, catching and logging any errors
        session_id = ""  # reset so a failed step never reports the previous step's session
        try:
            # create a new session for the step, including the original conversation history
            api_response = ocs_api_client.create_experiment_session(experiment_id, participant_id, orig_messages)
            session_id = api_response["id"]

            # send the user message to the experiment
            api_response = ocs_api_client.send_new_api_message(experiment_id, user_message, session_id)
            response = api_response["response"]
        except Exception as e:
            # log the error and continue to the next query
            logging.error(f"Continuing following error fetching conversation response: {e}")
            response = f"ERROR: {e}"
    
        # add to results
        results.append({
            "message_id": user_message_id,
            "session_id": orig_session_id,
            "replay_session_id": session_id,
            "query": user_message,
            "response": response,
            "orig_response": orig_response,
            "context": json.dumps(orig_messages)
        })
        
        # add original exchange to message history
        orig_messages.append({
            "role": "user",
            "content": user_message
        })
        orig_messages.append({
            "role": "assistant",
            "content": orig_response
        })

# save results to output .csv file
output_file = os.path.join(output_path_prefix, "replayed_conversations.csv")
output_rows = []
fieldnames = ["message_id", "session_id", "replay_session_id", "query", "response", "orig_response", "context"]
with open(output_file, "w", newline="") as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames, quoting=csv.QUOTE_NONNUMERIC, escapechar='\\')
    writer.writeheader()
    for result in results:
        # output and record for potential next steps
        writer.writerow(result)
        output_rows.append(result)

# report results
print(f"Conversations replayed and {len(results)} results saved to {output_file}.")

Conversations replayed and 12 results saved to /Users/crobert/Files/ocs-sim/outputs/replayed_conversations.csv.


## Optional: Export results to Athina dataset

If an Athina API key is configured, the results can be exported to an Athina dataset. The dataset will be named `replayed-conversations-{experiment_id}-{timestamp}` and will contain the rows from the `replayed_conversations.csv` file.

In [3]:
from ocs_simulation_support import athina_create_dataset

# optionally export the results to an Athina dataset
if athina_api_key:
    # push new dataset to Athina, using one timestamp for both the name and the description
    now = pd.Timestamp.now()
    dataset_name = f"replayed-conversations-{experiment_id}-{now.strftime('%Y%m%d%H%M%S')}"
    dataset_description = f"Replayed conversations for experiment {experiment_id} at {now}"
    try:
        dataset = athina_create_dataset(athina_api_key=athina_api_key, dataset_name=dataset_name, dataset_description=dataset_description, dataset_rows=output_rows)
    except Exception as e:
        print(f"Failed to create Athina dataset: {e}")
    else:
        print(f"Results exported to Athina dataset {dataset.id} (name: {dataset_name}).")

Results exported to Athina dataset 3fcc1d12-10b2-4a27-95e9-729005248455 (name: replayed-conversations-f721dce8-1e6e-4aff-a7ab-81459b255ed8-20240905142035).
