<a href="https://colab.research.google.com/github/edgarbc/LLM_optimizer/blob/main/DSPy_tutorial.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Tutorial to demonstrate how to use DSPy for prompt engineering.

By Edgar Bermudez - github: edgarbc

Based on https://learnbybuilding.ai/tutorials/a-gentle-introduction-to-dspy



June, 2024

In [6]:
!pip install dspy
!pip install openai

Collecting dspy
  Downloading dspy-0.1.5-py3-none-any.whl (1.3 kB)
Collecting dspy-ai==2.4.5 (from dspy)
  Downloading dspy_ai-2.4.5-py3-none-any.whl (197 kB)
Collecting backoff~=2.2.1 (from dspy-ai==2.4.5->dspy)
  Downloading backoff-2.2.1-py3-none-any.whl (15 kB)
Collecting joblib~=1.3.2 (from dspy-ai==2.4.5->dspy)
  Downloading joblib-1.3.2-py3-none-any.whl (302 kB)
Collecting openai<2.0.0,>=0.28.1 (from dspy-ai==2.4.5->dspy)
  Downloading openai-1.35.13-py3-none-any.whl (328 kB)
Collecting ujson (from dspy-ai==2.4.5->dspy)
  Downloading ujson-5.10.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (

In [10]:
# imports
import requests
from bs4 import BeautifulSoup
import dspy

Load the API key. Google Colab lets you define secrets and load them from the notebook: click the key icon in the left sidebar and add `OPENAI_API_KEY`.

In [1]:
from google.colab import userdata
import os
api_key = userdata.get('OPENAI_API_KEY')

# Set the API key as an environment variable
os.environ["OPENAI_API_KEY"] = api_key
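Outside Colab, `google.colab.userdata` is not available. A minimal sketch of a fallback, assuming the key has already been exported as the `OPENAI_API_KEY` environment variable. The helper name `get_api_key` is hypothetical and is not part of Colab or the OpenAI client:

```python
import os

def get_api_key(env=None, name="OPENAI_API_KEY"):
    """Return the API key from an environment mapping, or fail with a clear message.

    Hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running the notebook.")
    return key

# Usage with an explicit mapping, so the example does not depend on the real environment.
print(get_api_key({"OPENAI_API_KEY": "sk-example"}))
```

Failing early with a readable message is easier to debug than an authentication error raised later by the first API call.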

# 2. Data extraction
Extract some grug brain developer text from the page's HTML using BeautifulSoup. See https://grugbrain.dev

In [3]:
res = requests.get("https://grugbrain.dev/")
soup = BeautifulSoup(res.text, 'html.parser')
raw_text = [p.text for p in soup.find_all('p') if p.text]

In [4]:
# show some raw text examples
raw_text[:5]

['this collection of thoughts on software development gathered by grug brain developer',
 'grug brain developer not so smart, but grug brain developer program many long year and learn some things\nalthough mostly still confused',
 'grug brain developer try collect learns into small, easily digestible and funny page, not only for you, the young grug, but also for him\nbecause as grug brain developer get older he forget important things, like what had for breakfast or if put pants on',
 'big brained developers are many, and some not expected to like this, make sour face',
 'THINK they are big brained developers many, many more, and more even definitely probably maybe not like this, many\nsour face (such is internet)']

# 3. LLM setup
Set up GPT-3.5 Turbo and a helper class that builds the chat messages used to translate Grug language into plain English.

In [7]:
from functools import cache
from openai import OpenAI

# Initialize an OpenAI client (reads OPENAI_API_KEY from the environment).
client = OpenAI()
openai_model_name = "gpt-3.5-turbo"

class BuildMessages:
    """Hold system and user prompt templates and render them as chat messages."""
    def __init__(self, system_prompt, user_prompt):
        self.system_prompt = system_prompt
        self.user_prompt = user_prompt

    def render(self, **kwargs):
        # Fill the {placeholders} in both templates with the same keyword arguments.
        sys = self.system_prompt.format(**kwargs)
        user = self.user_prompt.format(**kwargs)
        return [
            {"role": "system", "content": sys},
            {"role": "user", "content": user},
        ]

@cache  # Cache results so the same text is never sent to the API twice.
def translate_grug(grug_text):
    prompt = BuildMessages(
        "You are an expert in deciphering strange text. The user will provide text written by someone named Grug and you will provide the translation.",
        """Translate the following text into plain english: '{text}'.

    Do not respond with any other text. Only provide that text. Now take a deep breath and begin.""",
    )
    result = client.chat.completions.create(messages=prompt.render(text=grug_text), model=openai_model_name)
    return result.choices[0].message.content
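To see what `BuildMessages.render` hands to the chat completions endpoint without spending an API call, the class can be exercised on its own. The sketch below repeats the class from the cell above and uses shortened, made-up prompt strings:

```python
class BuildMessages:
    def __init__(self, system_prompt, user_prompt):
        self.system_prompt = system_prompt
        self.user_prompt = user_prompt

    def render(self, **kwargs):
        # Both templates are formatted with the same keyword arguments.
        sys = self.system_prompt.format(**kwargs)
        user = self.user_prompt.format(**kwargs)
        return [
            {"role": "system", "content": sys},
            {"role": "user", "content": user},
        ]

# Shortened prompts, made up for this illustration.
prompt = BuildMessages(
    "You translate text written by Grug.",
    "Translate the following text into plain english: '{text}'.",
)
messages = prompt.render(text="complexity bad")
for message in messages:
    print(message["role"], "->", message["content"])
```

Because `render` relies on `str.format`, a literal `{` or `}` in either template has to be written as `{{` or `}}`.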

Now translate the first 10 paragraphs with the translation function.

In [8]:
dataset = []
for grug_text in raw_text[:10]:
    translated = translate_grug(grug_text)
    dataset.append({"grug_text":grug_text, "plain_english":translated})

Now construct DSPy examples using the translated dataset above.

In [11]:
examples = []
for row in dataset:
    examples.append(dspy.Example(grug_text=row["grug_text"], plain_english=row["plain_english"]).with_inputs("plain_english"))

In [12]:
print(examples)

[Example({'grug_text': 'this collection of thoughts on software development gathered by grug brain developer', 'plain_english': 'Grug is a software developer who has collected thoughts on software development.'}) (input_keys={'plain_english'}), Example({'grug_text': 'grug brain developer not so smart, but grug brain developer program many long year and learn some things\nalthough mostly still confused', 'plain_english': 'Grug is not very smart, but Grug has been developing programs for many years and has learned some things. However, Grug is still mostly confused.'}) (input_keys={'plain_english'}), Example({'grug_text': 'grug brain developer try collect learns into small, easily digestible and funny page, not only for you, the young grug, but also for him\nbecause as grug brain developer get older he forget important things, like what had for breakfast or if put pants on', 'plain_english': 'Grug, as a brain developer, tries to collect learning into small, easily digestible, and funny p

# 4. Training dataset preparation

In [14]:
from random import shuffle

def split_for_train_test(values, test_size=1/3.0):
    # Note: shuffle() reorders `values` in place, so the caller's list changes too.
    shuffle(values)
    # Number of training items; the remainder becomes the test set.
    train = int(len(values) - test_size * len(values))
    print(train)
    return values[:train], values[train:]

train, test = split_for_train_test(examples)

6
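`split_for_train_test` shuffles its argument in place, so `examples` itself is reordered and the split differs on every run. A sketch of a variant that leaves the input untouched and accepts a seed. The name `split_reproducibly` and the `seed` parameter are assumptions for illustration, not part of the original notebook:

```python
import random

def split_reproducibly(values, test_size=1 / 3.0, seed=0):
    """Shuffle a copy of `values` with a fixed seed and split it into train and test."""
    rng = random.Random(seed)      # private generator, does not touch global state
    shuffled = list(values)        # copy, so the caller's list keeps its order
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) - test_size * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]

items = list(range(10))
train_items, test_items = split_reproducibly(items)
print(len(train_items), len(test_items))
```

With ten items this yields six training items and four test items, the same sizes as the split above.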


In [15]:
train[0]

Example({'grug_text': '(note: grug once think big brained but learn hard way)', 'plain_english': 'Grug used to think he was very intelligent, but he learned the hard way that he was not as smart as he thought.'}) (input_keys={'plain_english'})

# 5. DSPy setup
Define signatures for the translation task

In [16]:
import dspy
class GrugTranslation(dspy.Signature):
    "Translate plain english to Grug text."
    plain_english = dspy.InputField()
    grug_text = dspy.OutputField()

In [18]:
# define the model
turbo = dspy.OpenAI(model='gpt-3.5-turbo', max_tokens=1000)
# config settings
dspy.settings.configure(lm=turbo)
# define the signature for the grug translation function
from dspy.signatures.signature import signature_to_template
grug_translation_as_template = signature_to_template(GrugTranslation)


In [21]:
print(f"DSPy template: {str(grug_translation_as_template)}")
print(f"Translation test: {grug_translation_as_template.query(examples[0])}")


DSPy template: Template(Translate plain english to Grug text., ['Plain English:', 'Grug Text:'])
Translation test: Plain English: Grug used to think he was very intelligent, but he learned the hard way that he was not as smart as he thought.
Grug Text: (note: grug once think big brained but learn hard way)
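The template output shows how the signature's field names turn into prompt prefixes (`plain_english` becomes `Plain English:`). The sketch below imitates that rendering in plain Python to make the mapping concrete. It is an illustration only: `field_prefix` and `render_example` are hypothetical names, and DSPy's own prefix inference and template code are more involved.

```python
def field_prefix(name):
    """Turn a snake_case field name into a prompt prefix, e.g. 'grug_text' -> 'Grug Text:'."""
    return name.replace("_", " ").title() + ":"

def render_example(field_names, values):
    """Render one line per field, in the given order."""
    return "\n".join(f"{field_prefix(name)} {values[name]}" for name in field_names)

example = {
    "plain_english": "Complexity is bad.",
    "grug_text": "complexity bad",
}
print(render_example(["plain_english", "grug_text"], example))
```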


Now let's look at the signature attributes of the class (inherited from `dspy.Signature`).

In [22]:
GrugTranslation.signature
GrugTranslation.with_instructions

Define the DSPy module that applies the prompting technique, here chain of thought.

In [23]:
class CoT(dspy.Module):
    def __init__(self):
        super().__init__()
        self.prog = dspy.ChainOfThought(GrugTranslation)

    def forward(self, plain_english):
        return self.prog(plain_english=plain_english)
c = CoT()

In [24]:
c.forward("You should not construct complex systems.")

Prediction(
    rationale='produce the grug_text. We want to simplify things and avoid unnecessary complications.',
    grug_text='You no make big big thing.'
)

# 6. Metrics
Define a readability index.

In [25]:
# https://apps.dtic.mil/sti/tr/pdf/AD0667273.pdf
import re

def automated_readability_index(text):
    characters = len(re.sub(r'\s+', '', text))  # Count characters (ignoring whitespace)
    words = len(text.split())  # Count words by splitting the text
    # Count sentences by finding a period, exclamation mark, question mark, or newline.
    # The newline is a small addition, since grug doesn't seem to use punctuation.
    sentences = len(re.findall(r'[.!?\n]', text))
    if words == 0 or sentences == 0:  # Prevent division by zero
        return 0
    # Calculate the Automated Readability Index (ARI)
    ari = (4.71 * (characters / words)) + (0.5 * (words / sentences)) - 21.43
    return round(ari, 2)

Test the automated readability index (ARI) on the examples.

In [30]:
print("    Eng    \tGrug")
for ex in examples:
    source_ari = automated_readability_index(ex.plain_english)
    grug_ari = automated_readability_index(ex.grug_text)
    print(f"ARI {source_ari} \t=> {grug_ari}")

    Eng    	Grug
ARI 8.3 	=> 0
ARI 8.78 	=> -3.95
ARI 8.33 	=> 0
ARI 11.65 	=> 0
ARI 8.7 	=> 22.95
ARI 12.69 	=> 0
ARI 10.59 	=> 14.62
ARI 9.81 	=> 14.12
ARI 14.04 	=> 0
ARI 7.62 	=> 13.98
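A score of 0 in the Grug column does not mean the text is trivially readable: the function returns 0 whenever it finds no `.`, `!`, `?` or newline, which is the case for most of the short Grug lines. The sketch below repeats the function from the cell above and adds a hypothetical variant, `ari_with_fallback`, that counts punctuation-free text as a single sentence. The fallback is an assumption for illustration, not part of the original metric.

```python
import re

def automated_readability_index(text):
    # Same logic as the notebook cell above.
    characters = len(re.sub(r"\s+", "", text))
    words = len(text.split())
    sentences = len(re.findall(r"[.!?\n]", text))
    if words == 0 or sentences == 0:
        return 0
    ari = (4.71 * (characters / words)) + (0.5 * (words / sentences)) - 21.43
    return round(ari, 2)

def ari_with_fallback(text):
    # Hypothetical variant: text without any delimiter counts as one sentence.
    characters = len(re.sub(r"\s+", "", text))
    words = len(text.split())
    if words == 0:
        return 0
    sentences = max(1, len(re.findall(r"[.!?\n]", text)))
    ari = (4.71 * (characters / words)) + (0.5 * (words / sentences)) - 21.43
    return round(ari, 2)

for text in ["complexity bad", "is fine!"]:
    print(repr(text), automated_readability_index(text), ari_with_fallback(text))
```

For `is fine!` there are 7 non-whitespace characters, 2 words and 1 sentence, so the index is 4.71 * 3.5 + 0.5 * 2 - 21.43, roughly -3.95, which is the negative value that appears in the output above. For `complexity bad` the original function returns 0 because no sentence delimiter is found.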


LLM-as-a-judge for automated evaluation

In [31]:
# https://dspy-docs.vercel.app/docs/building-blocks/metrics#intermediate-using-ai-feedback-for-your-metric
class AssessBasedOnQuestion(dspy.Signature):
    """Given the assessed text provide a yes or no to the assessment question."""
    assessed_text = dspy.InputField(format=str)
    assessment_question = dspy.InputField(format=str)
    assessment_answer = dspy.OutputField(desc="Yes or No")

In [32]:
example_question_assessment = dspy.Example(assessed_text="This is a test.", assessment_question="Is this a test?", assessment_answer="Yes").with_inputs("assessed_text", "assessment_question")
print(signature_to_template(AssessBasedOnQuestion).query(example_question_assessment))

Assessed Text: This is a test.
Assessment Question: Is this a test?
Assessment Answer: Yes


The judge LLM will be GPT-4 Turbo.
The judge decides whether the translated text and the original text have the same meaning (similarity metric).


In [33]:
gpt4T = dspy.OpenAI(model='gpt-4-turbo', max_tokens=500)
def similarity_metric(truth, pred, trace=None):
    truth_grug_text = truth.grug_text
    proposed_grug_text = pred.grug_text
    similarity_question = f"""Does the assessed text have the same meaning as the gold_standard text provided?
Gold Standard: "{truth_grug_text}"
Provide only a yes or no answer."""
    with dspy.context(lm=gpt4T):
        assessor = dspy.Predict(AssessBasedOnQuestion)
        raw_similarity_result = assessor(assessed_text=proposed_grug_text, assessment_question=similarity_question)
    print(raw_similarity_result) # for debugging
    raw_similarity = raw_similarity_result.assessment_answer.lower().strip()
    same_meaning = raw_similarity == 'yes'
    return same_meaning
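The comparison `raw_similarity == 'yes'` only passes when the judge returns the bare word. In the logs further down, the judge often echoes the prompt or prefixes its answer (for example `'Assessment Answer: Yes'`), so those answers count as failures. A sketch of a more tolerant parser follows; `parse_yes_no` is a hypothetical helper, not a DSPy function:

```python
import re

def parse_yes_no(raw_answer):
    """Return True for yes, False for no, and None if neither word is found.

    Only the last non-empty line is inspected, since an echoed prompt
    puts the actual answer at the end.
    """
    lines = [line.strip() for line in raw_answer.splitlines() if line.strip()]
    if not lines:
        return None
    match = re.search(r"\b(yes|no)\b", lines[-1].lower())
    if match is None:
        return None
    return match.group(1) == "yes"

samples = [
    "Yes",
    "Assessment Answer: Yes",
    "Assessed Text: ...\nAssessment Answer: No",
    "unsure",
]
for sample in samples:
    print(repr(sample), "->", parse_yes_no(sample))
```

Inside the metric, this could replace the string comparison with something like `same_meaning = parse_yes_no(raw_similarity_result.assessment_answer) is True`, so that an unparseable answer still counts as a failure.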

In [34]:
def ari_metric(truth, pred, trace=None):
    truth_grug_text = truth.grug_text
    proposed_grug_text = pred.grug_text

    gold_ari = automated_readability_index(truth_grug_text)
    pred_ari = automated_readability_index(proposed_grug_text)
    print(f"ARI {gold_ari} => {pred_ari}")
    ari_result = pred_ari <= 7.01
    return ari_result

Define the overall metric: a prediction passes only if the judge says it has the same meaning as the reference text and its automated readability index is about 7 or lower (the code uses a threshold of 7.01).

In [35]:
def overall_metric(provided_example, predicted, trace=None):
    similarity = similarity_metric(provided_example, predicted, trace)
    ari = ari_metric(provided_example, predicted, trace)
    if similarity and ari:
        return True
    return False

# 7. Prompt optimization


In [36]:
# run optimization
from dspy.teleprompt import BootstrapFewShot
config = dict(max_bootstrapped_demos=4, max_labeled_demos=4)
optimizer = BootstrapFewShot(metric=overall_metric, **config)
optimizer.max_errors = 1 # helpful to debug errors faster
optimized_cot = optimizer.compile(CoT(), trainset=train, valset=test)


 17%|█▋        | 1/6 [00:04<00:22,  4.51s/it]

Prediction(
    assessment_answer='Assessed Text: grug think smart, but find out not as smart as think\nAssessment Question: Does the assessed text have the same meaning as the gold_standard text provided?\nGold Standard: "(note: grug once think big brained but learn hard way)"\nAssessment Answer: Yes'
)
ARI 0 => 0


 33%|███▎      | 2/6 [00:06<00:13,  3.29s/it]

Prediction(
    assessment_answer='Assessed Text: That text appears to already be in plain English and does not require translation.\nAssessment Question: Does the assessed text have the same meaning as the gold_standard text provided?\nGold Standard: "is fine!"\nAssessment Answer: Yes'
)
ARI -3.95 => 8.78


 50%|█████     | 3/6 [00:09<00:08,  2.92s/it]

Prediction(
    assessment_answer='Assessed Text: "Hard stuff not good."\nAssessment Question: Does the assessed text have the same meaning as the gold_standard text provided?\nGold Standard: "complexity bad"\nAssessment Answer: Yes'
)
ARI 0 => 2.94


 67%|██████▋   | 4/6 [00:13<00:06,  3.25s/it]

Prediction(
    assessment_answer='Assessed Text: Grug is coder who gather thinkings on code making.\nAssessment Question: Does the assessed text have the same meaning as the gold_standard text provided?\nGold Standard: "this collection of thoughts on software development gathered by grug brain developer"\nAssessment Answer: Yes'
)
ARI 0 => 5.05


 83%|████████▎ | 5/6 [00:19<00:04,  4.35s/it]

Prediction(
    assessment_answer='Assessed Text: grug, brain developer, try collect learning into small, easy funny pages. not just for young grug, but for him too. as grug get older, forget important things like breakfast or pants.\nAssessment Question: Does the assessed text have the same meaning as the gold_standard text provided?\nGold Standard: "grug brain developer try collect learns into small, easily digestible and funny page, not only for you, the young grug, but also for him because as grug brain developer get older he forget important things, like what had for breakfast or if put pants on"\nAssessment Answer: Yes'
)
ARI 22.95 => 6.98


100%|██████████| 6/6 [00:20<00:00,  3.48s/it]

Prediction(
    assessment_answer='Assessment Answer: Yes'
)
ARI 0 => 0
Bootstrapped 0 full traces after 6 examples in round 0.
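The log ends with `Bootstrapped 0 full traces`, which is consistent with no prediction passing `overall_metric`: a demonstration is only kept when the metric returns `True`. To make that mechanism concrete, here is a deliberately simplified sketch of the bootstrapping idea in plain Python. It is not DSPy's implementation; `bootstrap_demos`, `toy_program` and `toy_metric` are hypothetical stand-ins.

```python
def bootstrap_demos(program, trainset, metric, max_bootstrapped_demos=4):
    """Run the program on each training example and keep outputs the metric accepts."""
    demos = []
    for example in trainset:
        if len(demos) >= max_bootstrapped_demos:
            break
        prediction = program(example["plain_english"])
        if metric(example, prediction):
            demos.append({"plain_english": example["plain_english"], "grug_text": prediction})
    return demos

def toy_program(plain_english):
    # Stand-in for the LM call: lowercase the text and drop a few filler words.
    filler = {"is", "a", "the", "should", "you"}
    return " ".join(word for word in plain_english.lower().split() if word not in filler)

def toy_metric(example, prediction):
    # Stand-in for overall_metric: exact match with the reference text.
    return prediction == example["grug_text"]

trainset = [
    {"plain_english": "Complexity is bad", "grug_text": "complexity bad"},
    {"plain_english": "It is fine", "grug_text": "is fine!"},
]
demos = bootstrap_demos(toy_program, trainset, toy_metric)
print(len(demos), "demo(s) kept")

# A metric that never passes yields no demonstrations at all.
print(len(bootstrap_demos(toy_program, trainset, lambda example, prediction: False)), "demo(s) kept")
```

A metric that is too strict, or one that misreads the judge's answer, leaves the optimizer with nothing to add to the prompt.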





In [37]:
# output evaluation
from dspy.evaluate import Evaluate
individual_metrics = [similarity_metric, ari_metric]
for metric in individual_metrics:
    evaluate = Evaluate(metric=metric, devset=train, num_threads=1, display_progress=True, display_table=5)
    evaluate(optimized_cot)

Average Metric: 0 / 1  (0.0):  17%|█▋        | 1/6 [00:03<00:19,  3.92s/it]

Prediction(
    assessment_answer='Assessed Text: grug used to think big brain, but learn hard way not as smart as think\nAssessment Question: Does the assessed text have the same meaning as the gold_standard text provided?\nGold Standard: "(note: grug once think big brained but learn hard way)"\nAssessment Answer: Yes'
)


Average Metric: 0 / 2  (0.0):  33%|███▎      | 2/6 [00:04<00:08,  2.03s/it]

Prediction(
    assessment_answer='Assessed Text: That text appears to already be in plain English and does not require translation.\nAssessment Question: Does the assessed text have the same meaning as the gold_standard text provided?\nGold Standard: "is fine!"\nAssessment Answer: Yes'
)


Average Metric: 0 / 3  (0.0):  50%|█████     | 3/6 [00:07<00:06,  2.19s/it]

Prediction(
    assessment_answer='Assessed Text: "Complexity bad."\nAssessment Question: Does the assessed text have the same meaning as the gold_standard text provided?\nGold Standard: "complexity bad"\nAssessment Answer: Yes'
)


Average Metric: 0 / 4  (0.0):  67%|██████▋   | 4/6 [00:08<00:03,  1.76s/it]

Prediction(
    assessment_answer='Assessment Answer: Yes'
)


Average Metric: 1 / 5  (20.0):  83%|████████▎ | 5/6 [00:09<00:01,  1.77s/it]

Prediction(
    assessment_answer='Yes'
)


Average Metric: 1 / 6  (16.7): 100%|██████████| 6/6 [00:11<00:00,  1.88s/it]

Prediction(
    assessment_answer='Assessment Answer: Yes'
)
Average Metric: 1 / 6  (16.7%)





| | example_grug_text | plain_english | rationale | pred_grug_text | similarity_metric |
|---|---|---|---|---|---|
| 0 | (note: grug once think big brained but learn hard way) | Grug used to think he was very intelligent, but he learned the hard way that he was not as smart as he thought. | realize that Grug is not as smart as he thought. We... | grug used to think big brain, but learn hard way not as smart as think | False |
| 1 | is fine! | That text appears to already be in plain English and does not require translation. | not translate this text. We simply leave it as is. | That text appears to already be in plain English and does not require translation. | False |
| 2 | complexity bad | "Complexity is bad." | produce the Grug text. We need to simplify this statement and convey it in a straightforward manner. | "Complexity bad." | False |
| 3 | this collection of thoughts on software development gathered by grug brain developer | Grug is a software developer who has collected thoughts on software development. | produce the grug_text. We will break down the sentence and translate each part into Grug text. | grug software developer collect thought on software developer | False |
| 4 | grug brain developer try collect learns into small, easily digestible and funny page, not only for you, the young grug, but also for him because... | Grug, as a brain developer, tries to collect learning into small, easily digestible, and funny pages. These are not only for you, the young Grug,... | produce the grug_text. We will break down the sentence and translate each part into Grug text. | grug brain developer try collect learning into small, easy digest, funny page. not only for you, young grug, but also for him. as grug, brain... | ✔️ [True] |


Average Metric: 3 / 6  (50.0): 100%|██████████| 6/6 [00:00<00:00, 289.54it/s]

ARI 0 => 0
ARI -3.95 => 8.78
ARI 0 => 17.25
ARI 0 => 0
ARI 22.95 => 10.78
ARI 0 => 0
Average Metric: 3 / 6  (50.0%)





| | example_grug_text | plain_english | rationale | pred_grug_text | ari_metric |
|---|---|---|---|---|---|
| 0 | (note: grug once think big brained but learn hard way) | Grug used to think he was very intelligent, but he learned the hard way that he was not as smart as he thought. | realize that Grug is not as smart as he thought. We... | grug used to think big brain, but learn hard way not as smart as think | ✔️ [True] |
| 1 | is fine! | That text appears to already be in plain English and does not require translation. | not translate this text. We simply leave it as is. | That text appears to already be in plain English and does not require translation. | False |
| 2 | complexity bad | "Complexity is bad." | produce the Grug text. We need to simplify this statement and convey it in a straightforward manner. | "Complexity bad." | False |
| 3 | this collection of thoughts on software development gathered by grug brain developer | Grug is a software developer who has collected thoughts on software development. | produce the grug_text. We will break down the sentence and translate each part into Grug text. | grug software developer collect thought on software developer | ✔️ [True] |
| 4 | grug brain developer try collect learns into small, easily digestible and funny page, not only for you, the young grug, but also for him because... | Grug, as a brain developer, tries to collect learning into small, easily digestible, and funny pages. These are not only for you, the young Grug,... | produce the grug_text. We will break down the sentence and translate each part into Grug text. | grug brain developer try collect learning into small, easy digest, funny page. not only for you, young grug, but also for him. as grug, brain... | False |
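The `Average Metric` lines report the share of examples for which the metric returned `True`. A sketch of that bookkeeping in plain Python, using the predicted ARI values from the log above and the 7.01 threshold from `ari_metric`. The helper `average_metric` is hypothetical and is not the `Evaluate` class:

```python
def average_metric(results):
    """Return (passed, total, percentage) for a list of boolean metric results."""
    passed = sum(1 for result in results if result)
    total = len(results)
    pct = round(100 * passed / total, 1) if total else 0.0
    return passed, total, pct

# Predicted ARI values as printed in the evaluation log (right-hand side of "=>").
predicted_ari = [0, 8.78, 17.25, 0, 10.78, 0]
results = [ari <= 7.01 for ari in predicted_ari]

passed, total, pct = average_metric(results)
print(f"Average Metric: {passed} / {total}  ({pct}%)")
```

Three of the six predictions fall at or below the threshold, which reproduces the 50.0% reported for the ARI metric.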


In [38]:
# have a look
optimized_cot.forward("You should not construct complex systems.")

Prediction(
    rationale='avoid confusion and keep things simple for Grug to understand.',
    grug_text='no build complex system'
)

In [39]:
# save the optimized model
optimized_cot.save(path="/tmp/model.json")



## Summary

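This tutorial used DSPy for prompt engineering on a small translation task. A dataset was built by translating Grug text into plain English with GPT-3.5 Turbo, a `GrugTranslation` signature and a chain-of-thought module were defined, and `BootstrapFewShot` was run against a metric that combines an LLM judge for meaning with the automated readability index. In the run shown above, the optimizer bootstrapped 0 full traces, and on the six training examples the similarity metric passed 1 of 6 and the ARI metric 3 of 6.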