# Example - from TCR Data to Prompts to Analysis Results

---

This notebook helps reproduce the results in the TCR paper by performing the following steps:
* Create the prompts
* Run the prompts (not included)
* Parse the responses
* Analyze the results as a binary classification problem, focusing on the "not discussed" class


In [None]:
from utils.benchmark_utils import *

## Create Prompts

In this section, we convert the TCR data into prompts using a predefined prompt template, at several snippet lengths. Apart from the prompt template, all settings match the benchmark experiments in the TCR paper.

* [**REQUIRED**] To update the prompt template, overwrite the `prompt_topic_relevance` function in the cell below. It is currently a placeholder that contains no actual prompt instructions.

* To change other settings, modify the functions in `benchmark_utils.py`.
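Because `prompt_topic_relevance` fills its template with `str.format`, any literal braces in your own template (for example, a JSON output format shown to the model) must be doubled as `{{` and `}}`. The minimal sketch below illustrates this with a hypothetical template and a hypothetical `fill_template` helper; only the placeholder names `input_trans`, `input_topics`, and `len_topics` come from the notebook, and the numbered topic-list layout is an assumption.

```python
# A hypothetical template; only the three placeholder names match the notebook.
EXAMPLE_TEMPLATE = '''Transcript:
{input_trans}

Topics ({len_topics} in total):
{input_topics}

Return one relevance score per topic as JSON, e.g. {{"scores": [0, 1]}}.'''


def fill_template(template: str, transcript: str, topics: list) -> str:
    """Fill the placeholders; literal braces in the template are written as {{ and }}."""
    # Hypothetical topic layout: one numbered topic per line, starting at 0
    str_topics = '\n'.join(f'{i}. {t}' for i, t in enumerate(topics))
    return template.format(
        input_trans=transcript,
        input_topics=str_topics,
        len_topics=len(topics),
    )


prompt = fill_template(EXAMPLE_TEMPLATE, 'A: Hi.\nB: Hello.', ['greetings', 'budget'])
print(prompt)
```

Forgetting to double a brace typically raises a `KeyError` or `ValueError` from `str.format`, so it is worth calling the function once on a sample transcript after editing the template.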

In [None]:
# ========================================================================================================
# Overwrite the following functions with your own implementation
# ========================================================================================================

def prompt_topic_relevance(str_trans, str_topics, len_topics, quotes_style=''):
    '''
    Generate a prompt based on the template and inputs to determine the relevance of topics in a conversation transcript.
    Parameters:
        str_trans (str): The conversation transcript.
        str_topics (str): The list of topics being discussed.
        len_topics (int): The length of the topic list.
        quotes_style (str): Optional. Specifies the style of quotes to unify. Valid values are "single", "double", or an empty string. Default is an empty string.
    Returns:
        str: The generated prompt.
    '''
    prompt_template = '''HERE IS THE PROMPT TEMPLATE THAT TAKES THE FOLLOWING INPUTS: {input_trans}, {input_topics}, {len_topics}. '''
    # Strip the 4-space indentation that a multi-line template picks up inside this function
    prompt_template = prompt_template.replace('\n    ', '\n')
    str_trans = unify_quotes(str_trans, style=quotes_style)
    str_prompt = prompt_template.format(input_trans=str_trans, input_topics=str_topics, len_topics=len_topics)
    return str_prompt

In [None]:
import os

# Inputs (folder that includes the JSON files in the TCR data format)
# os.path.join avoids invalid backslash escapes such as '\e' and works on every OS
input_folder = os.path.join('data', 'example_aug')

# Outputs (folder to save the generated prompts in JSON format)
output_folder = os.path.join('tmp', 'prompts')
prep_folder(output_folder)

# Generate prompts
t_pfiles = list(locate_suffix_files(input_folder, suffix='json'))
list_prompt = generate_relevance_prompts(t_pfiles)

# Write to JSON: one JSON object per line
prompt_path = write_prompts_to_json(list_prompt, output_folder)
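The comment in the cell above suggests that the prompts are written one JSON object per line (JSON Lines). Assuming that is the format, the minimal sketch below shows how such a file can be written and read back with the standard library, for example to spot-check prompts before sending them to a model. The helper names and record fields are hypothetical; check `write_prompts_to_json` in `benchmark_utils.py` for the actual layout.

```python
import json
import os
import tempfile


def write_jsonl(records, path):
    """Write one JSON object per line (JSON Lines)."""
    with open(path, 'w', encoding='utf-8') as f:
        for rec in records:
            f.write(json.dumps(rec, ensure_ascii=False) + '\n')


def read_jsonl(path):
    """Read a JSON Lines file back into a list of dicts, skipping blank lines."""
    with open(path, encoding='utf-8') as f:
        return [json.loads(line) for line in f if line.strip()]


# Hypothetical records; the real field names come from generate_relevance_prompts
records = [
    {'promptId': 'example_0_300_5', 'prompt': 'PROMPT TEXT 1'},
    {'promptId': 'example_0_300_7', 'prompt': 'PROMPT TEXT 2'},
]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'prompts.json')
    write_jsonl(records, path)
    loaded = read_jsonl(path)
print(len(loaded), loaded[0]['promptId'])
```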

## Run Prompts

Run the prompts generated above with your selected model.

We assume all responses are stored in a single JSON file of `promptId: response` key-value pairs. In the sections below, this file path is referred to as `result_path`.

An example of the contents of the `result_path` file:
```
{
    "ICSI_aug_addTopics_w_topics_Bed004_variation_addToics_0_300_5": "THIS IS A RANDOM LIST [1, 3, 2, 0]",
    "ICSI_aug_addTopics_w_topics_Bed009_variation_addToics_0_300_7": "THIS IS A RANDOM LIST [0, 0, 2, 1]"
}
```
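A minimal sketch of this step, assuming the prompts are available as a `{promptId: prompt}` dictionary. `call_model` is a hypothetical placeholder for your model client, and `run_prompts` is not part of `benchmark_utils.py`; the only requirement taken from the text above is that the output file maps each `promptId` to its raw response.

```python
import json
import os
import tempfile


def call_model(prompt: str) -> str:
    """Hypothetical placeholder: replace with a call to your selected model."""
    return 'THIS IS A RANDOM LIST [0, 1]'


def run_prompts(prompts: dict, result_path: str) -> dict:
    """Run every prompt and save all responses as {promptId: response} in one JSON file."""
    responses = {pid: call_model(text) for pid, text in prompts.items()}
    with open(result_path, 'w', encoding='utf-8') as f:
        json.dump(responses, f, ensure_ascii=False, indent=4)
    return responses


# Usage: write the responses, then read the file back to confirm its structure
with tempfile.TemporaryDirectory() as tmp:
    result_path = os.path.join(tmp, 'results.json')
    responses = run_prompts({'example_0_300_5': 'PROMPT TEXT'}, result_path)
    with open(result_path, encoding='utf-8') as f:
        saved = json.load(f)
print(saved)
```

Keeping the raw responses unparsed in this file lets you change `prompt_styles` later without rerunning the model.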

## Analyze the Results

Depending on your prompts, the responses may be formatted differently. Overwrite the `prompt_styles` function to match the structure of your responses.

The process below assumes that each response contains a list of scores with the same length as the input topic list.
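Under that assumption, a response such as `"THIS IS A RANDOM LIST [1, 3, 2, 0]"` from the example above can be parsed by locating the list that follows a known prefix. The sketch below uses a regular expression plus `ast.literal_eval`; `extract_scores` is a hypothetical helper that illustrates the idea, not the logic of `parse_responses`.

```python
import ast
import re


def extract_scores(response: str, prefix: str, n_topics: int) -> list:
    """Parse the first [...] list that follows `prefix` and check its length."""
    match = re.search(re.escape(prefix) + r'\s*(\[[^\]]*\])', response)
    if match is None:
        raise ValueError(f'No list found after {prefix!r}: {response!r}')
    # literal_eval safely turns the "[1, 3, 2, 0]" text into a Python list
    scores = ast.literal_eval(match.group(1))
    if len(scores) != n_topics:
        raise ValueError(f'Expected {n_topics} scores, got {len(scores)}')
    return [int(s) for s in scores]


print(extract_scores('THIS IS A RANDOM LIST [1, 3, 2, 0]', 'THIS IS A RANDOM LIST', 4))
# → [1, 3, 2, 0]
```

Raising on a missing list or a length mismatch, rather than silently skipping, mirrors the notebook's choice to stop when `error_contents` is not empty.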

In [None]:
# ========================================================================================================
# Overwrite the following functions with your own implementation
# ========================================================================================================

def prompt_styles():
    '''
    Define the response parts and style examples for parsing responses.
    Returns:
        dict: A dictionary containing the response parts as keys and their types as values.
        list or str: The style example(s) to be replaced in the responses.
    '''
    response_parts = {
        'THIS IS A RANDOM LIST': 'list',
    }
    list_to_remove = ['']
    return response_parts, list_to_remove

In [None]:
import os

# Get ground truth data
df_gt = get_gt_data(prompt_path)
# Name of the prompt file without its .json extension
fname = os.path.splitext(os.path.basename(prompt_path))[0]

In [None]:
# Get Topic-Conversation Relevance results
result_path = 'PATH_TO_YOUR_RESULTS_JSON_FILE'  # replace with the path to your responses file
response_parts, list_to_remove = prompt_styles()
all_responses = get_raw_completions(result_path)
df_model, error_contents = parse_responses(all_responses, df_gt, list_to_remove, response_parts)
if len(error_contents) > 0:
    raise ValueError(f'To continue, handle the following error contents: {error_contents}')

In [None]:
# Analyze the results
df_all_join, list_buckets, threshold_pairs = join_results(df_gt, df_model, fname, output_folder)
# prompt_size is the second-to-last '_'-separated field of promptId (e.g. '300' in '..._0_300_5')
df_all_join['prompt_size'] = df_all_join['promptId'].str.split('_').str[-2]
for psg, psdf in df_all_join.groupby('prompt_size'):
    for b in list_buckets:
        for p in threshold_pairs[b]:
            ptitle = f"\n {'=' * 10} Evaluation Results for {psg} snippets; classification {b}; threshold {p} {'=' * 10}"
            metrics, metrics_binary = calculate_metrics(psdf, b, p)
            display_metrics(metrics, metrics_binary, title=ptitle)
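For intuition about the binary view of the results, the sketch below computes precision, recall, and F1 with "not discussed" as the positive class. It assumes, hypothetically, that a lower score means less relevant and that a topic counts as "not discussed" when its score is at or below a single threshold; `calculate_metrics` and `threshold_pairs` may define the classes and thresholds differently. The scores in the usage line are illustrative, not real results.

```python
def not_discussed_metrics(gt_scores, pred_scores, threshold):
    """Precision/recall/F1 with 'not discussed' as the positive class.

    Assumption: a topic is 'not discussed' when its score is <= threshold,
    applied to both ground-truth and predicted scores.
    """
    gt = [g <= threshold for g in gt_scores]
    pred = [p <= threshold for p in pred_scores]
    tp = sum(g and p for g, p in zip(gt, pred))        # correctly flagged as not discussed
    fp = sum(p and not g for g, p in zip(gt, pred))    # flagged, but actually discussed
    fn = sum(g and not p for g, p in zip(gt, pred))    # missed "not discussed" topics
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {'precision': precision, 'recall': recall, 'f1': f1}


# Illustrative scores only
result = not_discussed_metrics([0, 0, 2, 1], [0, 1, 2, 0], threshold=0)
print(result)
# → {'precision': 0.5, 'recall': 0.5, 'f1': 0.5}
```

Focusing on the positive "not discussed" class makes errors on the rarer, harder case visible, which overall accuracy can hide when most topics are discussed.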