# Semantic Similarity Example

## Setup imports and API keys

First, we'll need to set our API keys. If we are in DEBUG mode, we don't need to use real OpenAI or Hegel AI API keys, so for now we'll set them to empty strings.

In [1]:
import os
os.environ['DEBUG']="1"
os.environ['HEGELAI_API_KEY'] = ""
os.environ['OPENAI_API_KEY'] = ""

Then we'll import the relevant `prompttools` modules to set up our experiment.

In [2]:
from typing import Dict, List
from prompttools.experiment.openai_completion_experiment import (
    OpenAICompletionExperiment,
)
from prompttools.harness.prompt_template_harness import (
    PromptTemplateExperimentationHarness,
)

## Run experiments

Next, we create our test inputs. For this example, we'll use prompt templates, which use [Jinja](https://jinja.palletsprojects.com/en/3.1.x/) for templating.

In [3]:
prompt_templates = ["Echo the following input: {{input}}", "Repeat the following input: {{input}}"]
user_inputs = [{"input": "This is a test"}, {"input": "This is not a test"}]
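To preview what will be sent to the model, the templates and inputs can be expanded by hand. The sketch below is only an illustration: `render` is a hypothetical helper that uses plain string replacement as a stand-in for Jinja rendering, and it is not how `prompttools` builds its prompts internally.

```python
from itertools import product

prompt_templates = ["Echo the following input: {{input}}", "Repeat the following input: {{input}}"]
user_inputs = [{"input": "This is a test"}, {"input": "This is not a test"}]


def render(template: str, variables: dict) -> str:
    """Hypothetical stand-in for Jinja: fills simple {{name}} placeholders only."""
    for name, value in variables.items():
        template = template.replace("{{" + name + "}}", str(value))
    return template


# Every template is paired with every input, giving 2 x 2 = 4 prompts.
messages = [render(t, v) for t, v in product(prompt_templates, user_inputs)]
for message in messages:
    print(message)
```

Each template is combined with each input, so two templates and two inputs yield four prompts.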

Now we can define an experimentation harness for our inputs and model. We could also pass model arguments if, for example, we wanted to change the model temperature.

In [4]:
harness = PromptTemplateExperimentationHarness("gpt-3.5-turbo", prompt_templates, user_inputs)

We can then run the experiment to get results.

In [5]:
harness.prepare()
harness.run()
harness.visualize()

| | messages | response(s) | latency |
|---|---|---|---|
| 0 | Echo the following input: This is a test | `[\n\nThe Los Angeles Dodgers won the World Series in 2020]` | 4e-06 |
| 1 | Echo the following input: This is not a test | `[\n\nThe Los Angeles Dodgers won the World Series in 2020]` | 2e-06 |
| 2 | Repeat the following input: This is a test | `[\n\nThe Los Angeles Dodgers won the World Series in 2020]` | 2e-06 |
| 3 | Repeat the following input: This is not a test | `[\n\nThe Los Angeles Dodgers won the World Series in 2020]` | 2e-06 |


You can use the `pivot` keyword argument to view results by the template and inputs that created them.

In [6]:
harness.visualize(pivot=True)

| user_input | `Echo the following input: {{input}}` | `Repeat the following input: {{input}}` |
|---|---|---|
| `{'input': 'This is a test'}` | `[\n\nThe Los Angeles Dodgers won the World Series in 2020]` | `[\n\nThe Los Angeles Dodgers won the World Series in 2020]` |
| `{'input': 'This is not a test'}` | `[\n\nThe Los Angeles Dodgers won the World Series in 2020]` | `[\n\nThe Los Angeles Dodgers won the World Series in 2020]` |


## Evaluate the model response

To evaluate the results, we'll define an eval function. Since we are prompting the model to echo our input, we can use semantic distance to check whether the model's response is similar to the user input. A lower distance means a closer match.

In [7]:
from typing import Dict, List, Tuple
import chromadb
chroma_client = chromadb.Client()


def extract_responses(output: Dict) -> List[str]:
    # Pull the generated text out of each completion choice
    return [choice["text"] for choice in output["choices"]]


# Define an evaluation function that assigns a score to each inference:
# the smallest embedding distance between the user input and any response
# (lower means more similar)
def check_similarity(input_pair: Tuple[str, Dict[str, str]], results: Dict, metadata: Dict) -> float:
    collection = chroma_client.create_collection(name="test_collection")
    collection.add(
        documents=[dict(input_pair[1])["input"]],
        ids=["id1"]
    )
    query_results = collection.query(
        query_texts=extract_responses(results),
        n_results=1
    )
    chroma_client.delete_collection("test_collection")
    # `distances` holds one list per query text, each with a single entry
    return min(distances[0] for distances in query_results["distances"])




Let's test our similarity function.

In [8]:
check_similarity((prompt_templates[0], user_inputs[0]), {"choices": [{"text": "This is a test"}, {"text": "This is a text"}]}, {})



0.0
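To build intuition for the score returned by `check_similarity`, here is a toy sketch of a semantic distance. It assumes a squared L2 distance between unit-length vectors, and `toy_embed` is a hypothetical bag-of-words stand-in for a real embedding model. The numbers it produces will not match Chroma's; only the ordering is meant to carry over.

```python
import math
from collections import Counter
from typing import Dict


def toy_embed(text: str) -> Dict[str, float]:
    """Hypothetical stand-in for an embedding model: unit-length word counts."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {word: c / norm for word, c in counts.items()}


def squared_l2(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Squared Euclidean distance between two sparse vectors."""
    keys = set(a) | set(b)
    return sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys)


reference = toy_embed("This is a test")
same = squared_l2(reference, toy_embed("This is a test"))
close = squared_l2(reference, toy_embed("This is not a test"))
far = squared_l2(reference, toy_embed("The Dodgers won"))

# Identical text has zero distance; unrelated text has the largest distance.
print(same, close, far)
```

An exact echo scores 0.0, a near echo scores slightly above that, and text sharing no words with the input scores highest, which is why a low score suggests the model echoed the input.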

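Finally, we can evaluate and visualize the results. The `did_echo` column holds the distance returned by `check_similarity`, so lower values indicate a closer echo. The responses shown here are DEBUG-mode placeholders rather than real completions, so the distances are large.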

In [9]:
harness.evaluate("did_echo", check_similarity, use_input_pairs=True)
harness.visualize()



| | messages | response(s) | latency | did_echo |
|---|---|---|---|---|
| 0 | Echo the following input: This is a test | `[\n\nThe Los Angeles Dodgers won the World Series in 2020]` | 4e-06 | 1.893674 |
| 1 | Echo the following input: This is not a test | `[\n\nThe Los Angeles Dodgers won the World Series in 2020]` | 2e-06 | 1.872977 |
| 2 | Repeat the following input: This is a test | `[\n\nThe Los Angeles Dodgers won the World Series in 2020]` | 2e-06 | 1.893674 |
| 3 | Repeat the following input: This is not a test | `[\n\nThe Los Angeles Dodgers won the World Series in 2020]` | 2e-06 | 1.872977 |
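One way to compare templates is to average the `did_echo` distance per template. The sketch below copies the scores from the table above by hand; it is only an illustration of reading the results, not a `prompttools` feature.

```python
from collections import defaultdict
from statistics import mean

# (prompt template, did_echo distance) pairs copied from the table above.
scores = [
    ("Echo the following input: {{input}}", 1.893674),
    ("Echo the following input: {{input}}", 1.872977),
    ("Repeat the following input: {{input}}", 1.893674),
    ("Repeat the following input: {{input}}", 1.872977),
]

by_template = defaultdict(list)
for template, distance in scores:
    by_template[template].append(distance)

# Lower mean distance means the responses were closer to the inputs.
mean_distance = {template: mean(d) for template, d in by_template.items()}
for template, value in mean_distance.items():
    print(f"{template}: {value:.6f}")
```

Here both templates get the same mean, since every response in this run is identical. With real completions, the template with the lower mean distance would be the better echo prompt.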
