In this example notebook, we mine-tune an AI program's sense of humor by giving it feedback on its jokes. Run the cells in order.

In [None]:
# Install the required packages (python-dotenv provides the `dotenv` module imported below)
%pip install dspy
%pip install python-dotenv
%pip install ipywidgets
%pip install IPython

In [1]:
#import
import dspy
import os
import dotenv

#setup dspy
dotenv.load_dotenv(os.path.expanduser("~/.env"))  # load the OpenAI API key from a .env file in your home directory
llm = dspy.OpenAI(model='gpt-4o', max_tokens=4096, temperature=0.8)
#llm = dspy.OpenAI(model='gpt-4o', max_tokens=4096, api_key="sk-") #or you can set the API key directly here
dspy.settings.configure(lm=llm)



In the next cell, you will be prompted for the kind of joke you want. Five jokes will be generated.

In [2]:
query = input("What kind of jokes do you want to generate?") #from interactive input
#query = "Jokes about Elvis." #alternatively write the string directly into the code

# 1) Declare with a signature, and request 5 jokes.
comedian = dspy.ChainOfThought('query -> joke', n=5)

# 2) Call with input argument.
response = comedian(query=query)

# 3) Access the outputs.
response.completions.joke


['Why did the baby elephant bring a suitcase to the zoo? Because it wanted to pack its trunk!',
 "Why did the baby kangaroo get in trouble at school? Because he couldn't stop jumping to conclusions!",
 "Why don't baby ducks tell jokes? Because they always crack up!",
 'Why did the baby kitten bring a pencil to bed? Because it wanted to draw some purr-fect dreams!',
 'Why did the baby lion always get lost? Because he couldn\'t stop "lion" around!']

Here is the human feedback part. For each joke, you must provide a rating (1-5) and give feedback.

In [3]:
def get_user_rating(joke):
    while True:
        try:
            rating = int(input(f"{joke}\n---\nPlease rate on a 5-point scale (1-5): "))
            if 1 <= rating <= 5:
                return rating  # Return the validated integer rating
            else:
                print("Invalid input. Please enter a number between 1 and 5.")
        except ValueError:
            print("Invalid input. Please enter a number.")


def get_user_feedback(joke):
    feedback = input(f"{joke}\n---\nPlease provide your feedback: ")
    return feedback

history = []

# Get rating and feedback for each joke
for joke, rationale in zip(response.completions.joke, response.completions.rationale):
    print(joke)
    rating = get_user_rating(joke)
    feedback = get_user_feedback(joke)
    this_joke = {
        'query': query,
        'rationale': rationale,  # paired by position, so duplicate jokes keep their own rationale
        'joke': joke,
        'rating': rating,
        'feedback': feedback
    }
    history.append(this_joke)


Why did the baby elephant bring a suitcase to the zoo? Because it wanted to pack its trunk!


Why did the baby kangaroo get in trouble at school? Because he couldn't stop jumping to conclusions!
Why don't baby ducks tell jokes? Because they always crack up!
Why did the baby kitten bring a pencil to bed? Because it wanted to draw some purr-fect dreams!
Why did the baby lion always get lost? Because he couldn't stop "lion" around!
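
The range check inside `get_user_rating` can also be separated from the `input()` call, which makes it easy to verify without typing at a prompt. The following is a minimal sketch; `parse_rating` is a hypothetical helper that is not part of the notebook.

```python
def parse_rating(text, low=1, high=5):
    """Return the rating as an int, or None if text is not a whole number in [low, high].

    Hypothetical helper for illustration; the notebook does this check inline.
    """
    try:
        rating = int(text)
    except ValueError:
        # Non-numeric text and decimals such as "3.5" end up here
        return None
    return rating if low <= rating <= high else None


# Try a few inputs a user might type
for raw in ["4", " 5 ", "0", "funny", "3.5"]:
    print(repr(raw), "->", parse_rating(raw))
```

A prompt loop could then call `parse_rating(input(...))` and repeat until the result is not `None`.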


The `history` variable stores every joke together with its query, rationale, rating, and feedback.

In [4]:
display(history)

[{'query': 'jokes about baby animals',
  'rationale': "Query: jokes about baby animals\nReasoning: Let's think step by step in order to produce the joke. We should consider a common characteristic or behavior of baby animals that can be humorously exaggerated or word-play that is easy to understand.",
  'joke': 'Why did the baby elephant bring a suitcase to the zoo? Because it wanted to pack its trunk!',
  'rating': 4,
  'feedback': 'moderately funny'},
 {'query': 'jokes about baby animals',
  'rationale': "Query: jokes about baby animals  \nReasoning: Let's think step by step in order to produce the joke. We can start by considering characteristics or behaviors of baby animals that are endearing or amusing. For example, baby animals are often cute, clumsy, curious, and have unique names that can be used cleverly in wordplay. By combining these traits, we can create a lighthearted and funny joke.",
  'joke': "Why did the baby kangaroo get in trouble at school? Because he couldn't stop 

Now we convert each `history` entry into a `dspy.Example`.

In [5]:
from dspy import Example
history_examples = [
    Example(base=item).with_inputs('query','rating','feedback') for item in history #what should the example inputs be?
]
display(history_examples)



[Example({'query': 'jokes about baby animals', 'rationale': "Query: jokes about baby animals\nReasoning: Let's think step by step in order to produce the joke. We should consider a common characteristic or behavior of baby animals that can be humorously exaggerated or word-play that is easy to understand.", 'joke': 'Why did the baby elephant bring a suitcase to the zoo? Because it wanted to pack its trunk!', 'rating': 4, 'feedback': 'moderately funny'}) (input_keys={'rating', 'feedback', 'query'}),
 Example({'query': 'jokes about baby animals', 'rationale': "Query: jokes about baby animals  \nReasoning: Let's think step by step in order to produce the joke. We can start by considering characteristics or behaviors of baby animals that are endearing or amusing. For example, baby animals are often cute, clumsy, curious, and have unique names that can be used cleverly in wordplay. By combining these traits, we can create a lighthearted and funny joke.", 'joke': "Why did the baby kangaroo g
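
`with_inputs` marks which fields of an example count as inputs; the remaining fields are treated as labels. The plain-Python sketch below illustrates that split on one `history`-style record. `split_record` is a hypothetical helper that does not call DSPy, and the rationale string is a placeholder.

```python
def split_record(record, input_keys):
    """Split a dict into (inputs, labels) by key. Hypothetical helper, not a DSPy API."""
    inputs = {k: v for k, v in record.items() if k in input_keys}
    labels = {k: v for k, v in record.items() if k not in input_keys}
    return inputs, labels


record = {
    'query': 'jokes about baby animals',
    'rationale': '(rationale text)',  # placeholder for the model's reasoning
    'joke': 'Why did the baby elephant bring a suitcase to the zoo? Because it wanted to pack its trunk!',
    'rating': 4,
    'feedback': 'moderately funny',
}

inputs, labels = split_record(record, {'query', 'rating', 'feedback'})
print(sorted(inputs))  # keys marked as inputs
print(sorted(labels))  # remaining keys
```

With the inputs chosen in the cell above, `rationale` and `joke` are the fields left as labels.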

Here we optimize the comedian to our joke preferences using BootstrapFewShot. Each 1-5 rating is divided by 5, giving a float between 0.2 and 1.0 that serves as the optimization metric. The optimized program is compiled as `minetuned`.

In [7]:
class ComedianModule(dspy.Module):
    def __init__(self):
        super().__init__()
        self.signature = comedian.signature
        self.predictor = dspy.ChainOfThought(self.signature)

    def forward(self, query, rating=None, feedback=None):
        result = self.predictor(query=query, rating=rating, feedback=feedback)
        return dspy.Prediction(rationale=result.rationale, joke=result.joke)

from dspy.teleprompt.bootstrap import BootstrapFewShot
minetuned = BootstrapFewShot(
    metric=lambda example, prediction, *args: float(example['rating']/5) # scale the 1-5 rating to 0.2-1.0
).compile(
    student=ComedianModule(),
    trainset=history_examples,
)



 80%|████████  | 4/5 [00:00<00:00, 381.14it/s]

Bootstrapped 4 full traces after 5 examples in round 0.
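
The lambda passed as `metric` can be written as a named function, which makes its behavior easier to see. This is a sketch that mirrors the lambda above; `rating_metric` is a hypothetical name. Note that it reads only the stored human rating and ignores the `prediction` argument.

```python
def rating_metric(example, prediction, *args):
    """Scale a 1-5 human rating to a float in [0.2, 1.0].

    Mirrors the notebook's lambda; `prediction` is accepted but unused.
    """
    return float(example['rating'] / 5)


# Plain dicts stand in for dspy.Example objects here (an assumption for illustration).
for rating in range(1, 6):
    print(rating, rating_metric({'rating': rating}, prediction=None))
```

Because the lowest possible rating is 1, the metric never returns 0.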





Optionally, we can inspect the LLM call history to see what happened. Increase `n` to go back further.

In [None]:
llm.inspect_history(n=1)

Now we can see whether `minetuned` is any funnier than the base model.

In [9]:
am_i_funny_now = minetuned(query=query)

display(am_i_funny_now.joke)


'Why did the baby bunny carry a pencil? Because it wanted to draw some "hare"-raising pictures!'