In [1]:
import os

import dspy
import yaml

import core
import examples
import metrics

In [2]:
# Read the OpenAI API key from ../keys.yaml.
with open(os.path.join("..", "keys.yaml"), "r") as file:
    config = yaml.safe_load(file)
openai_api_key = config["openai_api_key"]

# Language model used to generate the narratives.
llm = dspy.OpenAI(model="gpt-4o", api_key=openai_api_key, max_tokens=2000)

In [3]:
data = examples.load_examples("examples.json")

# Metrics used to score each generated narrative.
exp_metrics = metrics.Metrics(
    [
        metrics.accuracy,
        metrics.fluency,
        metrics.completeness,
        metrics.conciseness,
        metrics.context_awareness,
    ],
    verbose=False,
)

explingo = core.Explingo(
    llm=llm,
    context="The model predicts house prices",
    examples=data,
    metric=exp_metrics,
)

# BASIC PROMPTING

TODO: Update Explingo's prompting method to accept several different prompt options.
DSPy does not support this by default, but we use DSPy for the evaluations.
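One possible shape for the multiple-prompt-option idea in the TODO is a small registry that maps an option name to a prompt-building function. This is only a sketch: `basic_prompt`, `rationalized_prompt`, `PROMPT_OPTIONS`, and `build_prompt` are hypothetical names, not part of Explingo or DSPy, and the prompt wording is illustrative.

```python
from typing import Callable, Dict


# Hypothetical prompt builders; the names and wording are assumptions.
def basic_prompt(explanation: str, explanation_format: str) -> str:
    return (
        "Convert the explanation into a human-readable narrative.\n"
        f"Explanation: {explanation}\n"
        f"Explanation Format: {explanation_format}\n"
        "Narrative:"
    )


def rationalized_prompt(explanation: str, explanation_format: str) -> str:
    # Reuses the basic prompt and asks for an extra rationalization field.
    return basic_prompt(explanation, explanation_format) + "\nRationalization:"


# Registry mapping an option name to the function that builds its prompt.
PROMPT_OPTIONS: Dict[str, Callable[[str, str], str]] = {
    "basic": basic_prompt,
    "rationalized": rationalized_prompt,
}


def build_prompt(option: str, explanation: str, explanation_format: str) -> str:
    """Look up a prompt option by name and build the prompt text."""
    if option not in PROMPT_OPTIONS:
        raise ValueError(f"Unknown prompt option: {option!r}")
    return PROMPT_OPTIONS[option](explanation, explanation_format)


prompt = build_prompt("basic", "(odor, none, 0.14)", "SHAP feature contribution")
```

A registry like this keeps the set of options in one place, so adding an option would not require changing the code that runs the experiment.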

In [4]:
# Evaluate on every loaded explanation; the first example's format is used for all of them.
test_explanations = [d.explanation for d in data]
test_explanation_format = data[0].explanation_format

explingo.run_experiment(test_explanations, test_explanation_format, prompt_type="basic", max_iters=5)

(7.212000000000001,
 accuracy             2.000
 fluency              1.000
 completeness         2.000
 conciseness          0.212
 context_awareness    2.000
 dtype: float64)
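In this output, the first element of the tuple equals the sum of the per-metric averages listed after it. The check below reproduces that arithmetic from the printed values. Whether `run_experiment` computes the total exactly this way internally is an assumption based only on the numbers shown.

```python
# Per-metric averages copied from the "basic" run output.
basic_scores = {
    "accuracy": 2.000,
    "fluency": 1.000,
    "completeness": 2.000,
    "conciseness": 0.212,
    "context_awareness": 2.000,
}

# Summing the per-metric averages reproduces the reported total.
total = sum(basic_scores.values())
print(round(total, 3))  # 7.212
```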

In [6]:
explingo.run_experiment(test_explanations, test_explanation_format, prompt_type="few-shot", max_iters=5)

(6.859999999999999,
 accuracy             2.00
 fluency              1.00
 completeness         2.00
 conciseness          0.26
 context_awareness    1.60
 dtype: float64)

In [8]:
explingo.run_experiment(test_explanations, test_explanation_format, prompt_type="bootstrap-few-shot", max_iters=5)

100%|██████████| 25/25 [00:00<00:00, 168.89it/s]


Bootstrapped 4 full traces after 25 examples in round 0.


(6.856,
 accuracy             2.000
 fluency              1.200
 completeness         1.200
 conciseness          0.456
 context_awareness    2.000
 dtype: float64)
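To compare the three prompt types side by side, the printed per-metric scores can be collected into one structure. The sketch below copies the numbers from the outputs in this notebook and assumes, as those numbers suggest, that the total is the sum of the per-metric averages.

```python
# Per-metric averages copied from the three run_experiment outputs.
results = {
    "basic": {
        "accuracy": 2.000, "fluency": 1.000, "completeness": 2.000,
        "conciseness": 0.212, "context_awareness": 2.000,
    },
    "few-shot": {
        "accuracy": 2.00, "fluency": 1.00, "completeness": 2.00,
        "conciseness": 0.26, "context_awareness": 1.60,
    },
    "bootstrap-few-shot": {
        "accuracy": 2.000, "fluency": 1.200, "completeness": 1.200,
        "conciseness": 0.456, "context_awareness": 2.000,
    },
}

# Total score per prompt type (sum of per-metric averages).
totals = {name: sum(scores.values()) for name, scores in results.items()}
best_overall = max(totals, key=totals.get)

# Prompt type with the highest score on each metric (first one wins on ties).
metric_names = list(results["basic"])
best_per_metric = {
    metric: max(results, key=lambda name: results[name][metric])
    for metric in metric_names
}

for name, total in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<20} {total:.3f}")
print("best overall:", best_overall)
```

With `max_iters=5`, the differences between totals are small, so this comparison is better read as a quick sanity check than as a ranking of the prompt types.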

In [9]:
llm.inspect_history(n=1)




You are helping users understand an ML model's prediction. Given an explanation and information about the model,
convert the explanation into a human-readable narrative.

---

Follow the following format.

Context: what the ML model predicts

Explanation: explanation of an ML model's prediction

Explanation Format: format the explanation is given in

Narrative: human-readable narrative version of the explanation

Rationalization: explains why given features may be relevant

---

Context: The model predicts whether a mushroom is poisonous

Explanation: (odor, none, 0.14), (gill-size, broad, 0.06), (gill-spacing, crowded, 0.06), (spore-print-color, brown, 0.03), (stalk-surface-above-ring, smooth, 0.03)

Explanation Format: SHAP feature contribution in (feature_name, feature_value, contribution) format

Narrative: The model predicts that the mushroom is poisonous primarily because it has no odor, which contributes significantly to the prediction. Additionally, the broad gill size and c
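The explanation in the prompt uses the `(feature_name, feature_value, contribution)` format. As a minimal sketch, a string in that format can be parsed into typed tuples, for example to check which feature a narrative should emphasize. `parse_explanation` is a hypothetical helper, not part of Explingo, and it assumes feature names and values contain no commas or parentheses.

```python
import re
from typing import List, Tuple


def parse_explanation(text: str) -> List[Tuple[str, str, float]]:
    """Parse '(name, value, contribution), ...' into (str, str, float) tuples."""
    triples = []
    # Each parenthesized group holds one comma-separated triple.
    for group in re.findall(r"\(([^()]*)\)", text):
        name, value, contribution = [part.strip() for part in group.split(",")]
        triples.append((name, value, float(contribution)))
    return triples


explanation = (
    "(odor, none, 0.14), (gill-size, broad, 0.06), (gill-spacing, crowded, 0.06), "
    "(spore-print-color, brown, 0.03), (stalk-surface-above-ring, smooth, 0.03)"
)

features = parse_explanation(explanation)

# Feature with the largest absolute contribution.
top_feature = max(features, key=lambda f: abs(f[2]))
print(top_feature)  # ('odor', 'none', 0.14)
```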