# Experiment Results

This notebook shows the results of the experiments. The results are read from the log files, whose paths are defined in the following cell.

In [2]:
import pandas as pd
import plotly.express as px

ERRORDETECTION_CUSTOM_DATASET = pd.read_csv(
    "keep_logs/ed_customDataset/experiment-results.csv", index_col=False
)
ERRORDETECTION_PROMPTS = pd.read_csv(
    "keep_logs/ed_prompts/experiment-results.csv", index_col=False
)
MODEL_COMPARISON = pd.read_csv(
    "keep_logs/test_ed_Meta_vs_Bloke_good_prompt/experiment-results.csv",
    index_col=False,
)
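
Because every later cell assumes certain columns exist in these files, a quick check right after loading can save a confusing `KeyError` further down. The following is a minimal sketch: the helper `missing_columns` and the example frame are hypothetical, and the column set is taken from the columns this notebook accesses (the prompts experiment additionally reads a `Prompt` column).

```python
import pandas as pd

# Columns that the plotting cells of this notebook access.
REQUIRED_COLUMNS = {"Name", "Namespace", "Runtime", "Number of Rows", "F1"}


def missing_columns(df: pd.DataFrame, required=REQUIRED_COLUMNS) -> set:
    """Return the required column names that are absent from df."""
    return set(required) - set(df.columns)


# Hypothetical stand-in for a loaded results file (values are made up).
example = pd.DataFrame({"Name": ["a"], "Namespace": ["x.y.z"], "F1": [0.3]})
print(missing_columns(example))
```

An empty set means the frame has everything the plotting cells need.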

## Model Comparison

This experiment compares the performance of the models from TheBloke and Meta.

In [12]:
# Prefix a dummy date so the runtime strings can be shown on a time axis.
MODEL_COMPARISON["Runtime HH:MM:SS"] = MODEL_COMPARISON["Runtime"].apply(
    lambda x: f"1970-01-01 {x}"
)
MODEL_COMPARISON["Full Name"] = (
    MODEL_COMPARISON["Name"]
    + " - "
    + MODEL_COMPARISON["Namespace"].apply(lambda x: x.split(".")[2])
)
n_rows = MODEL_COMPARISON["Number of Rows"][0]

df_bloke = MODEL_COMPARISON[MODEL_COMPARISON["Namespace"].str.startswith("Bloke")]
df_meta = MODEL_COMPARISON[MODEL_COMPARISON["Namespace"].str.startswith("Meta")]

for name, df in {"Meta": df_meta, "TheBloke": df_bloke}.items():
    fig = px.box(
        df,
        y="Runtime HH:MM:SS",
        x="Full Name",
        title=f"Runtime with {n_rows} rows for {name}",
    )
    fig.update_yaxes(
        tickformat="%H:%M:%S",
        range=["1970-01-01 00:00:00", df["Runtime HH:MM:SS"].max()],
    )
    fig.show()
    fig = px.box(
        df, y="F1", x="Full Name", title=f"F1 with {n_rows} rows for {name}"
    )
    fig.update_yaxes(range=[0, 0.5])
    fig.show()
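
The cell above turns each runtime into a timestamp on a dummy date so that it can be plotted on a time axis. For numeric summaries, a timedelta view may be more convenient. The sketch below assumes the `Runtime` column holds `HH:MM:SS` strings; the sample values are hypothetical.

```python
import pandas as pd

# Hypothetical runtimes in the HH:MM:SS form the "Runtime" column is assumed to use.
runtimes = pd.Series(["00:01:30", "00:02:00", "01:00:00"])

# Timedelta view: convenient for numeric summaries such as the median.
seconds = pd.to_timedelta(runtimes).dt.total_seconds()

# Datetime view on a dummy date, mirroring the plotting cell above.
as_datetime = pd.to_datetime(runtimes.map(lambda x: f"1970-01-01 {x}"))

print(seconds.median())
print(as_datetime.max())
```

Note that the dummy-date form only works for runtimes below 24 hours, since a string such as `1970-01-01 25:00:00` is not a valid timestamp.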

## Different prompts

This experiment tests the performance of the model when detecting errors with different prompts.

In [None]:
ERRORDETECTION_PROMPTS["Runtime HH:MM:SS"] = ERRORDETECTION_PROMPTS["Runtime"].apply(
    lambda x: f"1970-01-01 {x}"
)
ERRORDETECTION_PROMPTS["Full Name"] = (
    ERRORDETECTION_PROMPTS["Name"]
    + " - "
    + ERRORDETECTION_PROMPTS["Namespace"].apply(lambda x: x.split(".")[1])
)
n_rows = ERRORDETECTION_PROMPTS["Number of Rows"][0]

prompts_df = ERRORDETECTION_PROMPTS[["Name", "Prompt"]].drop_duplicates()
for index in prompts_df.index:
    print("Name:  ", prompts_df["Name"][index])
    print("Prompt:", prompts_df["Prompt"][index])
    print("-------------")

fig = px.box(
    ERRORDETECTION_PROMPTS,
    y="Runtime HH:MM:SS",
    x="Full Name",
    title=f"Runtime with {n_rows} rows",
)
fig.update_yaxes(tickformat="%H:%M:%S")
fig.show()
fig = px.box(
    ERRORDETECTION_PROMPTS, y="F1", x="Full Name", title=f"F1 for {n_rows} rows"
)
fig.show()

Name:   Answer either yes or no
Prompt: You are a helpful assistant who is great in finding errors in tabular data. You always answer in a single word, either yes or no.\n\nQ: Is there an error in {attr}?\n{context}\n\nA:
-------------
Name:   Answer in a single word
Prompt: You are a helpful assistant who is great in finding errors in tabular data. You always answer in a single word.\n\nQ: Is there an error in {attr}?\n{context}\n\nA:
-------------
Name:   Precise and short
Prompt: You are a helpful assistant who is great in finding errors in tabular data. You answer as precise and short as possible.\n\nQ: Is there an error in {attr}?\n{context}\n\nA:
-------------
Name:   Deep breath
Prompt: You are a helpful assistant who is great in finding errors in tabular data. Take a deep breath and than answer the following question.\n\nQ: Is there an error in {attr}?\n{context}\n\nA:
-------------
Name:   No prompt introduction
Prompt: Q: Is there an error in {attr}?\n{context}\n\nA:
-------------
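
Each prompt contains the placeholders `{attr}` and `{context}`. How the experiment code fills them is not shown in this notebook; the sketch below assumes plain `str.format` substitution, and the helper `build_prompt` as well as the attribute and context values are hypothetical.

```python
# Template text copied from the "Answer either yes or no" prompt above.
TEMPLATE = (
    "You are a helpful assistant who is great in finding errors in tabular data. "
    "You always answer in a single word, either yes or no.\n\n"
    "Q: Is there an error in {attr}?\n{context}\n\nA:"
)


def build_prompt(template: str, attr: str, context: str) -> str:
    """Fill the two placeholders of a prompt template (assumed mechanism)."""
    return template.format(attr=attr, context=context)


# Hypothetical attribute name and row context, for illustration only.
prompt = build_prompt(
    TEMPLATE, attr="zip_code", context="city: Berlin, zip_code: 1O115"
)
print(prompt)
```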

## Custom Dataset

This experiment tests the performance of the model when detecting errors on a custom dataset.

In [None]:
ERRORDETECTION_CUSTOM_DATASET["Runtime HH:MM:SS"] = ERRORDETECTION_CUSTOM_DATASET[
    "Runtime"
].apply(lambda x: f"1970-01-01 {x}")
ERRORDETECTION_CUSTOM_DATASET["Full Name"] = (
    ERRORDETECTION_CUSTOM_DATASET["Name"]
    + " - "
    + ERRORDETECTION_CUSTOM_DATASET["Namespace"].apply(lambda x: x.split(".")[1])
)
n_rows = ERRORDETECTION_CUSTOM_DATASET["Number of Rows"][0]
fig = px.box(
    ERRORDETECTION_CUSTOM_DATASET,
    y="Runtime HH:MM:SS",
    x="Full Name",
    title=f"Runtime with {n_rows} rows",
)
fig.update_yaxes(tickformat="%H:%M:%S")
fig.show()
fig = px.box(
    ERRORDETECTION_CUSTOM_DATASET, y="F1", x="Full Name", title=f"F1 for {n_rows} rows"
)
fig.show()
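
The three experiments derive the same two plotting columns and differ only in which part of the namespace they use. One possible way to factor this out is sketched below; the helper `add_plot_columns` and the example row are hypothetical, and the namespace layout is made up for illustration.

```python
import pandas as pd


def add_plot_columns(df: pd.DataFrame, namespace_part: int) -> pd.DataFrame:
    """Return a copy of df with the two derived columns used for plotting."""
    out = df.copy()
    # Dummy date prefix, as in the cells above.
    out["Runtime HH:MM:SS"] = out["Runtime"].apply(lambda x: f"1970-01-01 {x}")
    # Combine the run name with one dot-separated part of the namespace.
    out["Full Name"] = (
        out["Name"]
        + " - "
        + out["Namespace"].apply(lambda x: x.split(".")[namespace_part])
    )
    return out


# Hypothetical row; real namespaces may look different.
example = pd.DataFrame(
    {"Name": ["run"], "Namespace": ["pkg.module.variant"], "Runtime": ["00:10:00"]}
)
result = add_plot_columns(example, namespace_part=1)
print(result[["Runtime HH:MM:SS", "Full Name"]])
```

Returning a copy keeps the loaded frames unchanged, so a cell can be re-run without stacking derived columns on stale data.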