# A Practical Intro to DSPy

### Basics:

In [1]:
import dspy

llm = dspy.LM(model = "openai/gpt-4o-mini")
dspy.configure(lm=llm)

In [None]:
# Defining the module with a simple signature: "question -> answer: int"
simple_model = dspy.Predict("question -> answer: int")

In [3]:
simple_model(
  question="""I have 5 different balls and I randomly select 4. 
    How many possible combinations of the balls I can get?"""
)

Prediction(
    answer=5
)

In [4]:
# Prints the history of the model's interactions, including the prompt, input question and the generated answer.
dspy.inspect_history(n = 1)





[2026-02-02T20:02:13.820693]

System message:

Your input fields are:
1. `question` (str):
Your output fields are:
1. `answer` (int):
All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## question ## ]]
{question}

[[ ## answer ## ]]
{answer}        # note: the value you produce must be a single int value

[[ ## completed ## ]]
In adhering to this structure, your objective is: 
        Given the fields `question`, produce the fields `answer`.


User message:

[[ ## question ## ]]
I have 5 different balls and I randomly select 4. 
    How many possible combinations of the balls I can get?

Respond with the corresponding output fields, starting with the field `[[ ## answer ## ]]` (must be formatted as a valid Python int), and then ending with the marker for `[[ ## completed ## ]]`.


Response:

[[ ## answer ## ]]
5

[[ ## completed ## ]]
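
The logged response uses `[[ ## field ## ]]` markers to separate output fields. As a rough illustration of how such a reply could be turned back into typed values, here is a minimal sketch. The function name `parse_marked_fields` and the regular expression are assumptions made for this example; this is not DSPy's own adapter code.

```python
import re

# Illustrative parser for the "[[ ## field ## ]]" layout seen in the log.
# A simplified sketch, not DSPy's actual implementation.
FIELD_HEADER = re.compile(r"^\[\[ ## (\w+) ## \]\][ \t]*$", re.MULTILINE)

def parse_marked_fields(text: str) -> dict:
    """Split a response into {field_name: raw_value}, skipping the 'completed' marker."""
    fields = {}
    matches = list(FIELD_HEADER.finditer(text))
    for i, match in enumerate(matches):
        name = match.group(1)
        if name == "completed":
            continue
        # A field's value runs until the next marker (or the end of the text).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        fields[name] = text[match.end():end].strip()
    return fields

raw_response = "[[ ## answer ## ]]\n5\n\n[[ ## completed ## ]]"
parsed = parse_marked_fields(raw_response)
answer = int(parsed["answer"])  # the signature declares `answer: int`
print(parsed, answer)
```

The final `int(...)` cast mirrors the `answer: int` annotation in the signature: the model only ever produces text, so the declared type has to be enforced when the reply is parsed.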







### Adding structured output and enabling reasoning with the ChainOfThought module:

In [None]:
# You can also configure the adapter to get the output in a specific format. For example, if you want the output in JSON format, you can use the JSONAdapter.
dspy.configure(adapter = dspy.JSONAdapter())

In [6]:
# Chain of Thought (CoT) allows the model to generate intermediate reasoning steps before arriving at the final answer. This can often lead to more accurate and explainable results, especially for complex problems.
cot_model = dspy.ChainOfThought("question -> answer: int")
cot_model(question="""I have 5 different balls and I randomly select 4. How many possible combinations of the balls I can get?""")

Prediction(
    reasoning='To find the number of combinations of selecting 4 balls from 5 different balls, we use the combination formula C(n, k) = n! / (k! * (n - k)!), where n is the total number of items to choose from (5 balls) and k is the number of items to choose (4 balls). So, C(5, 4) = 5! / (4! * (5 - 4)!) = 5! / (4! * 1!) = (5 * 4!) / (4! * 1) = 5. There are 5 possible combinations of selecting 4 balls from 5.',
    answer=5
)
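
With `JSONAdapter` configured, the model is asked to reply with a JSON object instead of marker-delimited fields. The sketch below shows, under stated assumptions, how such a reply could be checked and coerced to the declared types. `parse_json_prediction` is a hypothetical helper written for this example, and the `raw` string is a shortened stand-in for a model reply.

```python
import json

def parse_json_prediction(raw: str) -> dict:
    """Parse a JSON reply and coerce `answer` to int (hypothetical helper, not a DSPy API)."""
    data = json.loads(raw)
    missing = {"reasoning", "answer"} - data.keys()
    if missing:
        raise ValueError(f"Missing fields: {sorted(missing)}")
    # The signature declares `answer: int`; int() accepts both 5 and "5".
    data["answer"] = int(data["answer"])
    return data

# Shortened stand-in for the kind of JSON the model returns.
raw = '{"reasoning": "C(5, 4) = 5! / (4! * 1!) = 5.", "answer": 5}'
result = parse_json_prediction(raw)
print(result["answer"])
```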

In [7]:
# This will show you the full trajectory of the model's reasoning, including the final answer.
dspy.inspect_history(n = 1)





[2026-02-02T20:02:17.139531]

System message:

Your input fields are:
1. `question` (str):
Your output fields are:
1. `reasoning` (str): 
2. `answer` (int):
All interactions will be structured in the following way, with the appropriate values filled in.

Inputs will have the following structure:

[[ ## question ## ]]
{question}

Outputs will be a JSON object with the following fields.

{
  "reasoning": "{reasoning}",
  "answer": "{answer}        # note: the value you produce must be a single int value"
}
In adhering to this structure, your objective is: 
        Given the fields `question`, produce the fields `answer`.


User message:

[[ ## question ## ]]
I have 5 different balls and I randomly select 4. How many possible combinations of the balls I can get?

Respond with a JSON object in the following order of fields: `reasoning`, then `answer` (must be formatted as a valid Python int).


Response:

{
  "reasoning": "To find the number of 

In [8]:
print(cot_model(question="""I have 25 different balls and I randomly select 9. 
  How many possible combinations of the balls I can get?"""))

Prediction(
    reasoning='To find the number of combinations of selecting 9 balls from 25, we can use the combination formula which is given by C(n, k) = n! / (k! * (n - k)!), where n is the total number of items to choose from and k is the number of items to choose. In this case, n = 25 and k = 9. So, we calculate C(25, 9) = 25! / (9! * (25 - 9)!) = 25! / (9! * 16!). Using this formula and performing the calculations, we find that the number of combinations is 2042975.',
    answer=2042975
)
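
Language models can make arithmetic slips on larger numbers, so it is worth checking answers like this one independently. The snippet below evaluates the same formula, C(n, k) = n! / (k! * (n - k)!), with the standard library; the helper name `combinations` is just for illustration.

```python
import math

def combinations(n: int, k: int) -> int:
    """C(n, k) = n! / (k! * (n - k)!), computed with exact integer arithmetic."""
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

print(combinations(5, 4))    # → 5
print(combinations(25, 9))   # → 2042975

# math.comb (Python 3.8+) computes the same quantity directly.
assert combinations(25, 9) == math.comb(25, 9)
```

Both model answers in this section agree with the exact values.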


### Configuring MLflow for tracing and logging:

In [9]:
import mlflow

# Tell MLflow about the server URI.
mlflow.set_tracking_uri("http://127.0.0.1:5000")

# Create a unique name for your experiment.
mlflow.set_experiment("DSPy")
mlflow.dspy.autolog()



In [None]:
# Creating a REPL (Read-Eval-Print Loop) tool that evaluates mathematical expressions in DSPy's sandboxed Python interpreter.
from dspy import PythonInterpreter

def evaluate_math(expr: str) -> str:
  """Execute Python code in a sandbox and return its output as a string."""
  # `expr` runs inside the sandbox, so it must import any module it uses (e.g. `import math`).
  with PythonInterpreter() as interp:
    return str(interp(expr))

In [14]:
# The ReAct module lets the model decide when to call tools (like our evaluate_math function) while it reasons. At each step the model writes down a thought and, if it needs to evaluate a mathematical expression, calls evaluate_math with that expression.
react_model = dspy.ReAct(
  signature="question -> answer: int", 
  tools=[evaluate_math]
)

response = react_model(question="""I have 25 different balls and I randomly 
  select 9. How many possible combinations of the balls I can get?""")

print(response.answer) 

2042975


In [15]:
response.trajectory

{'thought_0': 'To find the number of combinations of 25 balls taken 9 at a time, I can use the combination formula C(n, k) = n! / (k!(n - k)!), where n is the total number of items and k is the number of items to choose. In this case, n is 25 and k is 9.',
 'tool_name_0': 'evaluate_math',
 'tool_args_0': {'expr': 'math.comb(25, 9)'},
 'observation_0': 'Execution error in evaluate_math: \nTraceback (most recent call last):\n  File "/Users/hariharansampathkumar/Documents/projects/ai_engineering/Prompting/meta_prompting/.venv/lib/python3.12/site-packages/dspy/predict/react.py", line 111, in forward\n    trajectory[f"observation_{idx}"] = self.tools[pred.next_tool_name](**pred.next_tool_args)\n                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n  File "/Users/hariharansampathkumar/Documents/projects/ai_engineering/Prompting/meta_prompting/.venv/lib/python3.12/site-packages/dspy/utils/callback.py", line 343, in sync_wrapper\n    raise exception\n  
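
The trajectory above records an execution error for the first tool call; the traceback is cut off, so the cause is not visible here. For simple arithmetic, one hedged alternative is a tool that evaluates expressions in-process against a small whitelist of syntax instead of a full interpreter. The sketch below is an assumption-laden illustration: `safe_evaluate_math` is a hypothetical name, it is not part of DSPy, and it does not guard against very expensive computations such as huge exponents.

```python
import ast
import math

# Hypothetical helper, not part of DSPy. Only arithmetic operators, numeric
# constants, and calls into the `math` module are accepted.
_ALLOWED_NODES = (
    ast.Expression, ast.BinOp, ast.UnaryOp, ast.Call, ast.Attribute,
    ast.Name, ast.Load, ast.Constant,
    ast.Add, ast.Sub, ast.Mult, ast.Div, ast.FloorDiv, ast.Mod, ast.Pow,
    ast.USub, ast.UAdd,
)

def safe_evaluate_math(expr: str) -> str:
    """Evaluate a math expression such as 'math.comb(25, 9)' and return the result as a string."""
    tree = ast.parse(expr, mode="eval")
    for node in ast.walk(tree):
        if not isinstance(node, _ALLOWED_NODES):
            raise ValueError(f"Disallowed syntax: {type(node).__name__}")
        if isinstance(node, ast.Name) and node.id != "math":
            raise ValueError(f"Unknown name: {node.id}")
        if isinstance(node, ast.Attribute) and node.attr.startswith("_"):
            raise ValueError("Private attributes are not allowed")
    # No builtins are exposed; the only visible name is `math`.
    result = eval(compile(tree, "<expr>", "eval"), {"__builtins__": {}}, {"math": math})
    return str(result)

print(safe_evaluate_math("math.comb(25, 9)"))  # → 2042975
print(safe_evaluate_math("(3 + 4) * 2"))       # → 14
```

Because it has a docstring and type hints like `evaluate_math`, a function of this shape could be passed to `dspy.ReAct` through `tools=[...]` in the same way.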

### Defining and using a complex Signature with a class:

In [None]:
# Defining a custom signature using Classes and type annotations. This allows you to create more complex and structured inputs and outputs for your models.
from typing import Literal, List

class NPSTopic(dspy.Signature):
  """Classify NPS topics"""

  comment: str = dspy.InputField()
  answer: List[Literal['Slow or Unreliable Shipping', 
    'Inaccurate Product Descriptions or Photos', 
    'Limited Size or Shade Availability', 'Difficult Product Discovery',
    'Unresponsive or Generic Customer Support', 
    'Website or App Bugs', 'Confusing Loyalty or Discount Systems', 
    'Complicated Returns or Exchanges', 'Customs and Import Charges', 
    'Damaged or Incorrect Items']] = dspy.OutputField()

In [17]:
nps_topic_model = dspy.ChainOfThought(NPSTopic)

In [18]:
response = nps_topic_model(
  comment = """Absolutely frustrated! Every time I find something I love, 
    it's sold out in my size. What's the point of having a wishlist 
    if nothing is ever available?""")

print(response.answer)

['Limited Size or Shade Availability']
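
The `Literal` annotation restricts the output to a fixed label set. If predictions are stored or post-processed outside DSPy, the same set can be reused to check them. The sketch below assumes that use case; `invalid_topics` is a hypothetical helper and `"Out of Stock"` is a made-up label used only to show a rejection.

```python
from typing import List, Literal, get_args

# Same label set as the NPSTopic signature.
Topic = Literal[
    "Slow or Unreliable Shipping",
    "Inaccurate Product Descriptions or Photos",
    "Limited Size or Shade Availability",
    "Difficult Product Discovery",
    "Unresponsive or Generic Customer Support",
    "Website or App Bugs",
    "Confusing Loyalty or Discount Systems",
    "Complicated Returns or Exchanges",
    "Customs and Import Charges",
    "Damaged or Incorrect Items",
]
ALLOWED_TOPICS = set(get_args(Topic))

def invalid_topics(labels: List[str]) -> List[str]:
    """Return the labels that are not in the allowed set (empty list if all are valid)."""
    return [label for label in labels if label not in ALLOWED_TOPICS]

print(invalid_topics(["Limited Size or Shade Availability"]))  # → []
print(invalid_topics(["Out of Stock"]))                        # → ['Out of Stock']
```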


### Optimizing Few-shot examples for the Prompt using DSPy Optimizer: (when the answers are numeric/quantifiable)

The example below shows how to use a DSPy optimizer to select the best few-shot examples for a prompt.

It uses the `BootstrapFewShot` optimizer, which runs the program on the training examples and keeps the ones whose outputs pass the metric, together with their reasoning traces, as demonstrations in the prompt.
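
The selection step can be pictured as a simple loop: run the program on each training example, score the result with the metric, and keep the passing traces as demonstrations. The toy below is a simplified illustration of that idea only. DSPy's actual optimizer has more options and steps than shown here, and `bootstrap_demos`, `toy_program`, and `exact_match` are hypothetical names with a fake "model" so that it runs without an API key.

```python
from types import SimpleNamespace

def bootstrap_demos(program, trainset, metric, max_demos):
    """Keep (question, reasoning, answer) traces whose predictions pass the metric."""
    demos = []
    for example in trainset:
        if len(demos) >= max_demos:
            break
        prediction = program(example.question)
        if metric(example, prediction):
            demos.append({
                "question": example.question,
                "reasoning": prediction.reasoning,
                "answer": prediction.answer,
            })
    return demos

def toy_program(question: str) -> SimpleNamespace:
    """Stand-in for an LLM call: always adds the operands, so it fails on '*'."""
    left, _operator, right = question.split()
    total = int(left) + int(right)  # deliberately ignores the operator
    return SimpleNamespace(reasoning=f"Add {left} and {right}: {total}.", answer=str(total))

def exact_match(example, prediction) -> bool:
    return example.answer == prediction.answer

trainset = [
    SimpleNamespace(question="10 + 20", answer="30"),
    SimpleNamespace(question="3 * 6", answer="18"),
    SimpleNamespace(question="7 + 1", answer="8"),
]
demos = bootstrap_demos(toy_program, trainset, exact_match, max_demos=3)
print([d["question"] for d in demos])  # → ['10 + 20', '7 + 1']
```

The failed trace for `3 * 6` is dropped, so only reasoning that led to a correct answer would reach the prompt.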

In [20]:
import dspy

# 1. SETUP: Configure your Language Model (The "Worker")
# Set OPENAI_API_KEY in your environment (or pass api_key="..." to dspy.LM)
lm = dspy.LM("openai/gpt-5-mini")
dspy.configure(lm=lm)

# 2. DEFINE SIGNATURE: The "Interface" (What you want, not how to do it)
# We tell DSPy: "Input is a question, Output is the answer."
class MathTutor(dspy.Signature):
    """Answer math questions with a clear final number."""
    question = dspy.InputField()
    answer = dspy.OutputField(desc="The final numerical answer")

# 3. DEFINE MODULE: The "Logic"
# We wrap our signature in 'ChainOfThought'. 
# This tells DSPy: "Don't just guess. Force the model to think step-by-step."
math_module = dspy.ChainOfThought(MathTutor)

# 4. DATA: The "Training Set" (Small examples to teach the model)
# DSPy uses these to learn what a "good" interaction looks like.
train_data = [
    dspy.Example(question="What is 10 + 20?", answer="30").with_inputs("question"),
    dspy.Example(question="If I have 5 apples and eat 2, how many left?", answer="3").with_inputs("question"),
    dspy.Example(question="What is 100 divided by 4?", answer="25").with_inputs("question"),
]

# 5. THE META-PROMPTING STEP (The "Compiler") 🚀
# This is where the magic happens. We use 'BootstrapFewShot'.
# It acts as a teacher: it runs the examples, sees how the model thinks, 
# and saves the BEST "reasoning paths" into the final prompt.
from dspy.teleprompt import BootstrapFewShot

# Define a simple success metric (Exact Match)
def validate_answer(example, prediction, trace=None):
    return example.answer == prediction.answer

# The Compiler
teleprompter = BootstrapFewShot(metric=validate_answer, max_bootstrapped_demos=3)

print("Compiling (Optimizing) the prompt... this takes a few seconds...")
compiled_math_tutor = teleprompter.compile(math_module, trainset=train_data)

# 6. RUN IT
# Now we use the "Compiled" version.
my_question = "If I buy 3 packs of 6 sodas, how many sodas do I have?"
pred = compiled_math_tutor(question=my_question)

print(f"\nQuestion: {my_question}")
print(f"Reasoning (Generated by DSPy): {pred.reasoning}")
print(f"Final Answer: {pred.answer}")

# Optional: Inspect the actual prompt DSPy wrote for you
lm.inspect_history(n=1)

Compiling (Optimizing) the prompt... this takes a few seconds...


100%|██████████| 3/3 [00:15<00:00,  5.23s/it]


Bootstrapped 3 full traces after 2 examples for up to 1 rounds, amounting to 3 attempts.





Question: If I buy 3 packs of 6 sodas, how many sodas do I have?
Reasoning (Generated by DSPy): Multiply number of packs by sodas per pack: 3 × 6 = 18.
Final Answer: 18




[2026-02-04T00:27:55.197271]

System message:

Your input fields are:
1. `question` (str):
Your output fields are:
1. `reasoning` (str): 
2. `answer` (str): The final numerical answer
All interactions will be structured in the following way, with the appropriate values filled in.

Inputs will have the following structure:

[[ ## question ## ]]
{question}

Outputs will be a JSON object with the following fields.

{
  "reasoning": "{reasoning}",
  "answer": "{answer}"
}
In adhering to this structure, your objective is: 
        Answer math questions with a clear final number.


User message:

[[ ## question ## ]]
What is 10 + 20?


Assistant message:

{
  "reasoning": "Add 10 and 20: 10 + 20 = 30.",
  "answer": "30"
}


User message:

[[ ## question ## ]]
If I have 5 appl
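
The exact-match metric in this section compares strings, so `"18"` and `"18.0"` would count as a mismatch. A more tolerant metric is sketched below under the assumption that answers are plain numbers, possibly with surrounding whitespace or thousands separators. `to_number` and `numeric_match` are hypothetical names; the `(example, prediction, trace=None)` signature follows the metric used above.

```python
import math
from types import SimpleNamespace

def to_number(text):
    """Convert '18', ' 18.0 ' or '1,000' to a float; return None if it is not numeric."""
    cleaned = str(text).strip().replace(",", "")
    try:
        return float(cleaned)
    except ValueError:
        return None

def numeric_match(example, prediction, trace=None) -> bool:
    """Compare answers as numbers instead of as raw strings."""
    expected = to_number(example.answer)
    actual = to_number(prediction.answer)
    if expected is None or actual is None:
        return False
    return math.isclose(expected, actual)

# SimpleNamespace objects stand in for dspy.Example and dspy.Prediction here.
example = SimpleNamespace(answer="18")
print(numeric_match(example, SimpleNamespace(answer="18.0")))      # → True
print(numeric_match(example, SimpleNamespace(answer="eighteen")))  # → False
```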

### Optimizing Few-shot examples for the Prompt using DSPy Optimizer: (when the answers are creative/text)

The example below again uses the `BootstrapFewShot` optimizer to select few-shot examples, together with their reasoning traces, for the prompt.

The key distinction from the previous example:
- The outputs in the training examples are free-form text rather than numbers, so an exact-match metric does not work. Instead, the metric calls an `LLM` to judge each generated output.
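
The shape of an LLM-based metric can be sketched without any API calls by passing the judge in as a callable. In the sketch below, `parse_verdict`, `make_judge_metric`, and `stub_judge` are hypothetical names, and the stub's rule (an email is "professional" if it starts with "Hi") exists only to make the example runnable. The field names `email`, `email_text`, and `assessment` mirror the code in this section.

```python
from types import SimpleNamespace

def parse_verdict(assessment: str) -> bool:
    """Treat 'Yes', ' yes. ' or "'Yes'" as a pass; anything else as a fail."""
    normalized = assessment.strip().strip(".!\"'").lower()
    return normalized == "yes"

def make_judge_metric(judge):
    """Build a metric with the (example, prediction, trace) signature from a judge callable."""
    def metric(example, prediction, trace=None) -> bool:
        response = judge(email_text=prediction.email)
        return parse_verdict(response.assessment)
    return metric

def stub_judge(email_text: str) -> SimpleNamespace:
    """Stand-in for an LLM judge, used here so the sketch runs offline."""
    verdict = "Yes." if email_text.startswith("Hi") else "No"
    return SimpleNamespace(assessment=verdict)

metric = make_judge_metric(stub_judge)
print(metric(None, SimpleNamespace(email="Hi Team, the meeting moved to 5 PM.")))  # → True
print(metric(None, SimpleNamespace(email="fix it now")))                           # → False
```

Normalizing the verdict before comparing matters because a judge model may answer "Yes." or add whitespace even when told to reply strictly with "Yes" or "No".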

In [21]:
import dspy

# 1. SETUP MODELS
# Here a single model acts as both the "Student" and the "Judge"; using a stronger model for the "Judge" is optional.
lm = dspy.LM("openai/gpt-4o-mini")
dspy.configure(lm=lm)

# 2. THE STUDENT (The Task you want to optimize)
class EmailWriter(dspy.Signature):
    """Write a professional email based on the user's raw notes."""
    raw_notes = dspy.InputField(desc="Rough bullet points")
    email = dspy.OutputField(desc="A professional email")

# 3. THE JUDGE (The Metric)
# This is a special signature just for evaluation.
class ProfessionalJudge(dspy.Signature):
    """Assess if the email is professional and concise. Output 'Yes' or 'No'."""
    email_text = dspy.InputField()
    assessment = dspy.OutputField(desc="Return strictly 'Yes' or 'No'")

# 4. DEFINE THE METRIC FUNCTION
def is_professional_metric(example, prediction, trace=None):
    # We call the LLM *inside* the metric function to grade the answer
    judge = dspy.Predict(ProfessionalJudge)
    response = judge(email_text=prediction.email)
    
    # Return True if the Judge said "Yes" (ignoring case and surrounding whitespace), else False
    return response.assessment.strip().lower() == "yes"

# 5. DATA (Training Examples)
train_data = [
    dspy.Example(raw_notes="meeting pushed to 5pm, sorry", email="Hi Team, The meeting has been rescheduled to 5 PM. Apologies for the delay.").with_inputs("raw_notes"),
    dspy.Example(raw_notes="need the report by friday or else", email="Hi there, Could you please ensure the report is submitted by Friday? Thanks.").with_inputs("raw_notes"),
]

# 6. OPTIMIZE (The Meta-Loop)
# The Teleprompter will now generate emails, have the 'Judge' grade them,
# and keep the generated examples that got a "Yes" from the Judge as few-shot demos.
from dspy.teleprompt import BootstrapFewShot

teleprompter = BootstrapFewShot(metric=is_professional_metric, max_bootstrapped_demos=2)
compiled_email_writer = teleprompter.compile(dspy.ChainOfThought(EmailWriter), trainset=train_data)

# 7. TEST
print(compiled_email_writer(raw_notes="u gotta fix the bug fast"))

100%|██████████| 2/2 [01:07<00:00, 33.92s/it]


Bootstrapped 2 full traces after 1 examples for up to 1 rounds, amounting to 2 attempts.




Prediction(
    reasoning="The raw notes convey an urgent request to address a bug quickly. The phrasing 'u gotta' indicates a casual tone, but the urgency suggests it is important to communicate the need for a prompt resolution while still maintaining professionalism in the email.",
    email="Subject: Urgent: Bug Fix Needed\n\nHi [Recipient's Name],\n\nI hope you're doing well. I wanted to bring to your attention a bug that requires immediate attention. Please prioritize fixing it as quickly as possible.\n\nThank you for your prompt response to this matter.\n\nBest,\n\n[Your Name]"
)


In [22]:
compiled_email_writer.inspect_history(n=1)





[2026-02-04T01:09:47.355721]

System message:

Your input fields are:
1. `raw_notes` (str): Rough bullet points
Your output fields are:
1. `reasoning` (str): 
2. `email` (str): A professional email
All interactions will be structured in the following way, with the appropriate values filled in.

Inputs will have the following structure:

[[ ## raw_notes ## ]]
{raw_notes}

Outputs will be a JSON object with the following fields.

{
  "reasoning": "{reasoning}",
  "email": "{email}"
}
In adhering to this structure, your objective is: 
        Write a professional email based on the user's raw notes.


User message:

[[ ## raw_notes ## ]]
meeting pushed to 5pm, sorry


Assistant message:

{
  "reasoning": "The raw notes indicate a scheduling change for a meeting, which needs to be communicated to the relevant parties. The email serves to inform them of the new meeting time courteously and apologetically for any inconvenience caused.",
  "email": "Sub