## Building Blocks of DSPy

- **dspy.Signature**: defines the input/output contract of a DSPy module.
- **dspy.Module**: defines the logic for interacting with LLMs.

## Advantages

- LM-agnostic programming
- Seamless productization
- Automatic program optimization

In [1]:
import json
import dspy
import mlflow
from typing import Literal
from dotenv import load_dotenv, find_dotenv


_ = load_dotenv(find_dotenv())

dspy.settings.configure(lm=dspy.LM("openai/gpt-4o-mini"), track_usage=True)

mlflow.set_tracking_uri("http://127.0.0.1:5000")
mlflow.set_experiment("my-local-experiment-part-1")
mlflow.dspy.autolog(
    log_evals=True, log_compiles=True, log_traces_from_compile=True
)
mlflow.tracing.disable_notebook_display()

# A local MLflow server must be running at the tracking URI above.
# Start it from a separate terminal with:
# mlflow server --host 127.0.0.1 --port 5000

## Class-based Signature Example

In [2]:
class SlangEquivalence(dspy.Signature):
    """Find an English equivalent of a Portuguese slang expression and classify its formality level."""

    # inputs
    portuguese_slang: str = dspy.InputField()
    # outputs
    english_slang: str = dspy.OutputField()
    alternative_slangs: list[str] = dspy.OutputField(
        desc="More equivalent alternative slangs."
    )
    formality_level: Literal["L1", "L2", "L3", "L4", "L5"] = dspy.OutputField(
        desc="Slang formality level. Lesser the number more informal."
    )

## String-based Signature Example

In [3]:
slang_equivalance = dspy.make_signature(
    "portuguese_slang -> english_slang: str, alternative_slangs: list[str]"
)
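
The string form packs the same contract into one line: input fields to the left of `->`, output fields to the right, each optionally annotated with a type. The sketch below is only an illustration of how such a string can be read. It is not DSPy's own parser, and the helper names `split_top_level` and `parse_signature` are hypothetical. It assumes that a field without an annotation (such as `portuguese_slang` here) is treated as `str`.

```python
def split_top_level(text: str) -> list[str]:
    """Split on commas that are not nested inside square brackets."""
    parts, depth, current = [], 0, []
    for ch in text:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:
        parts.append(tail)
    return parts


def parse_signature(spec: str) -> dict[str, dict[str, str]]:
    """Return {'inputs': {name: type}, 'outputs': {name: type}}."""
    inputs, outputs = spec.split("->")
    parsed = {}
    for side, fields in (("inputs", inputs), ("outputs", outputs)):
        parsed[side] = {}
        for field in split_top_level(fields):
            name, _, type_name = field.partition(":")
            # Assumption: fields without an annotation are treated as str.
            parsed[side][name.strip()] = type_name.strip() or "str"
    return parsed


parsed = parse_signature(
    "portuguese_slang -> english_slang: str, alternative_slangs: list[str]"
)
print(parsed)
```

Compared with the class-based version, the string form has no place for a docstring, per-field descriptions, or the `formality_level` output, which is why the later examples use `SlangEquivalence`.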

## LM Interaction via Module Examples

In [4]:
predict = dspy.Predict(slang_equivalance)
output = predict(portuguese_slang="Boiar")

In [5]:
output

Prediction(
    english_slang='To be clueless or to not understand something.',
    alternative_slangs=['to zone out', 'to space out', 'to be lost', 'to be in the dark']
)

In [6]:
print(
    f"English equivalent slang: {output['english_slang']}\nAlternative slangs: {output.alternative_slangs}"
)

English equivalent slang: To be clueless or to not understand something.
Alternative slangs: ['to zone out', 'to space out', 'to be lost', 'to be in the dark']


In [7]:
cot = dspy.ChainOfThought(SlangEquivalence)
output = cot(portuguese_slang="quebrar o galho")

In [8]:
output

Prediction(
    reasoning='"Quebrar o galho" is a Portuguese slang that means to help someone out or to find a workaround for a problem. In English, a similar expression would be "to lend a hand" or "to help out." This phrase conveys a sense of informal assistance or support.',
    english_slang='lend a hand',
    alternative_slangs=['help out', 'give a hand', 'pitch in'],
    formality_level='L2'
)

In [9]:
dspy.inspect_history(n=1)

[2025-07-25T21:22:12.523101]

System message:

Your input fields are:
1. `portuguese_slang` (str):
Your output fields are:
1. `reasoning` (str): 
2. `english_slang` (str): 
3. `alternative_slangs` (list[str]): More equivalent alternative slangs.
4. `formality_level` (Literal['L1', 'L2', 'L3', 'L4', 'L5']): Slang formality level. Lesser the number more informal.
All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## portuguese_slang ## ]]
{portuguese_slang}

[[ ## reasoning ## ]]
{reasoning}

[[ ## english_slang ## ]]
{english_slang}

[[ ## alternative_slangs ## ]]
{alternative_slangs}        # note: the value you produce must adhere to the JSON schema: {"type": "array", "items": {"type": "string"}}

[[ ## formality_level ## ]]
{formality_level}        # note: the value you produce must exactly match (no extra characters) one of: L1; L2; L3; L4; L5

[[ ## completed ## ]]
In adhering to this structure, your objective 
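
The prompt above asks the model to answer in sections, each introduced by a `[[ ## field_name ## ]]` header and closed by a `[[ ## completed ## ]]` marker. The sketch below shows one way such a reply could be split back into fields. It is a hedged illustration, not DSPy's actual parsing code: `parse_sections` is a hypothetical helper and the completion text is hand-written to follow the template.

```python
import json
import re

# Matches a header line such as "[[ ## english_slang ## ]]".
FIELD_HEADER = re.compile(r"^\[\[ ## (\w+) ## \]\][ \t]*$", re.MULTILINE)


def parse_sections(completion: str) -> dict[str, str]:
    """Map each `[[ ## name ## ]]` header to the text that follows it."""
    matches = list(FIELD_HEADER.finditer(completion))
    fields = {}
    for current, following in zip(matches, matches[1:] + [None]):
        end = following.start() if following else len(completion)
        name = current.group(1)
        if name != "completed":  # the closing marker carries no value
            fields[name] = completion[current.end():end].strip()
    return fields


# Hypothetical completion, hand-written to follow the template above.
completion = """[[ ## reasoning ## ]]
It means doing someone a quick favor.

[[ ## english_slang ## ]]
lend a hand

[[ ## alternative_slangs ## ]]
["help out", "pitch in"]

[[ ## formality_level ## ]]
L2

[[ ## completed ## ]]
"""

fields = parse_sections(completion)
# Every section is plain text; list-typed fields still need JSON decoding.
alternatives = json.loads(fields["alternative_slangs"])
print(fields["english_slang"], alternatives, fields["formality_level"])
```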

## Changing the Adapter

In [10]:
dspy.settings.configure(lm=dspy.LM("openai/gpt-4o"))
dspy.configure(adapter=dspy.JSONAdapter())

In [11]:
cot = dspy.ChainOfThought(SlangEquivalence)
output = cot(portuguese_slang="quebrar o galho")



In [12]:
output

Prediction(
    reasoning='"Quebrar o galho" is a Portuguese slang expression that means to help someone out or to do a favor, often in a situation where a quick or temporary solution is needed. The English equivalent would be "to help out" or "to lend a hand."',
    english_slang='help out',
    alternative_slangs=['lend a hand', 'do a favor', 'give a hand'],
    formality_level='L3'
)
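
With `JSONAdapter`, the model is asked for a single JSON object instead of delimiter-separated sections, so the reply can be decoded with a standard JSON parser and checked against the signature's field types. The following is a minimal sketch of that idea under stated assumptions: `parse_json_reply` is a hypothetical helper, the reply is hand-written to resemble the output above, and the checks only approximate whatever validation the adapter itself performs.

```python
import json
from typing import Literal, get_args

FormalityLevel = Literal["L1", "L2", "L3", "L4", "L5"]
REQUIRED_FIELDS = {
    "reasoning",
    "english_slang",
    "alternative_slangs",
    "formality_level",
}


def parse_json_reply(raw: str) -> dict:
    """Decode a JSON reply and check the fields declared by SlangEquivalence."""
    data = json.loads(raw)
    missing = REQUIRED_FIELDS - set(data)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if not isinstance(data["alternative_slangs"], list):
        raise ValueError("alternative_slangs must be a JSON array")
    if data["formality_level"] not in get_args(FormalityLevel):
        raise ValueError(
            f"unexpected formality_level: {data['formality_level']!r}"
        )
    return data


# Hypothetical reply, hand-written to resemble the output shown above.
raw_reply = json.dumps(
    {
        "reasoning": "Means to help someone out.",
        "english_slang": "help out",
        "alternative_slangs": ["lend a hand", "do a favor", "give a hand"],
        "formality_level": "L3",
    }
)
reply = parse_json_reply(raw_reply)
print(reply["alternative_slangs"])
```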

In [13]:
dspy.inspect_history(n=1)

[2025-07-25T21:22:18.859567]

System message:

Your input fields are:
1. `portuguese_slang` (str):
Your output fields are:
1. `reasoning` (str): 
2. `english_slang` (str): 
3. `alternative_slangs` (list[str]): More equivalent alternative slangs.
4. `formality_level` (Literal['L1', 'L2', 'L3', 'L4', 'L5']): Slang formality level. Lesser the number more informal.
All interactions will be structured in the following way, with the appropriate values filled in.

Inputs will have the following structure:

[[ ## portuguese_slang ## ]]
{portuguese_slang}

Outputs will be a JSON object with the following fields.

{
  "reasoning": "{reasoning}",
  "english_slang": "{english_slang}",
  "alternative_slangs": "{alternative_slangs}        # note: the value you produce must adhere to the JSON schema: {\"type\": \"array\", \"items\": {\"type\": \"string\"}}",
  "formality_level": "{formality_level}        # note: the value you produce must exactly match (no extra characters) one 

## Creating a Custom Module

In [14]:
dspy.settings.configure(
    lm=dspy.LM("openai/gpt-4o-mini"),
    adapter=dspy.ChatAdapter(),
    track_usage=False,
)

In [15]:
class SkillIdentifier(dspy.Signature):
    """Given a job description, identify the most important technical skills needed for a successful job application."""

    job_description: str = dspy.InputField()
    tech_skills: list[str] = dspy.OutputField(
        desc="Technical skills ordered from most to least important"
    )


class SkillSummary(dspy.Signature):
    """Given a technical skill, write a short summary that explains what it is about and illustrates it with examples. Also include project-based ideas that would demonstrate knowledge of the skill."""

    tech_skill: str = dspy.InputField()
    skill_summary: str = dspy.OutputField(
        desc="A Markdown-formatted summary section"
    )
    project_based_examples: list[str] = dspy.OutputField()


class DetailJob(dspy.Module):
    def __init__(self):
        super().__init__()
        self.skill_identifier = dspy.Predict(SkillIdentifier)
        self.skill_summary = dspy.Predict(SkillSummary)

    def forward(self, job_description, top_k=3):
        # Step 1: extract the skills, ordered from most to least important.
        skills_resp = self.skill_identifier(job_description=job_description)
        print(f"## IDENTIFIED SKILLS: {skills_resp.tech_skills}")

        # Step 2: summarize only the top_k most important skills.
        for item in skills_resp.tech_skills[:top_k]:
            details_resp = self.skill_summary(tech_skill=item)

            print("=" * 50)
            print(f"* SKILL: {item}")
            print("=" * 50)
            print(f">> SUMMARY:\n{details_resp.skill_summary}")
            print(
                f"---\n>> PROJECT BASED IDEAS:\n{details_resp.project_based_examples}"
            )

In [16]:
description = """
About the Role As an AI Research Engineer, you will: Build and maintain LLM-based and agentic product features, including
rigorous evaluation, benchmarking and backtesting. Create systems that can improve over time based on feedback from humans
in the loop, such as prompt optimisation and fine-tuning. Explore applications of the latest developments of AI to enhance
our product features, explainability, speed and accuracy. What You’ll Do AI features: Use the latest in LLMs, agents and
computer vision to build systems that are faster, more thorough and more accurate than humans. Evals: Build benchmarks,
evaluate backtests and analyse results to quantify our performance and prevent regressions. MLOps: Ensure that we maintain
good practices around collecting and managing datasets, backtests and benchmarks to maximise development velocity Prompt
Engineering: Develop and test prompts to optimise model performance and deliver high-quality outputs. Model Fine-Tuning:
Adapt and fine-tune large language models and vision models for specific use cases. Custom Model Training: Train and deploy
bespoke AI models tailored to solve unique compliance challenges. Collaboration: Work closely with the engineering and
product teams to translate customer needs into actionable AI solutions. Iterate: Continuously iterate on our AI systems to
improve performance, reduce errors, and deliver value to customers. Our Culture 1. Own With Urgency As a startup, delivering
value quickly is our superpower. We achieve this by valuing speed and ownership over perfection. We're not afraid to get our
hands dirty, experiment, and iterate quickly to achieve our goals, and completely own the outcome. 2. Transparency We believe
in open communication and full visibility across teams and roles. Decisions, successes, and failures are shared openly to
foster trust and collaboration. 3. Customer First, Team Second, Self Last Our priority is creating value for our customers.
We then focus on building a supportive, growth-oriented team environment, putting individual needs last to ensure collective
success. What We’re Looking For Experience: 3+ years in an AI research or engineering role, with experience building and
testing agentic systems. Technical Expertise: Hands-on experience with prompt engineering, fine-tuning pre-trained models,
and training custom models, including vision models. Experience using TypeScript a plus. Product Mindset: Strong
understanding of how AI can solve real-world problems, with a focus on customer needs. Interests: Ability to stay updated with
the latest advancements in AI and apply them effectively. Collaboration: Excellent communication skills to work across teams
and explain complex AI concepts to non-technical stakeholders. Ownership: A proactive, problem-solving mindset with the
ability to take full responsibility for projects and outcomes. Growth-Oriented: Excited to learn new skills, tackle challenges,
and adapt as the company scales.
"""

In [17]:
job_detail = DetailJob()
job_detail(job_description=description)

## IDENTIFIED SKILLS: ['Prompt Engineering', 'Model Fine-Tuning', 'Custom Model Training', 'MLOps', 'LLM Development', 'Computer Vision', 'Benchmarking and Evaluation', 'TypeScript']
* SKILL: Prompt Engineering
>> SUMMARY:
Prompt Engineering is the practice of designing and refining input prompts to effectively communicate with AI models, particularly in natural language processing. It involves understanding how to frame questions or statements to elicit the most accurate and relevant responses from AI systems. This skill is crucial for maximizing the utility of AI tools, as the quality of the output is often directly related to the clarity and specificity of the input. For example, a well-structured prompt can lead to more insightful answers, while vague prompts may yield less useful information.
---
>> PROJECT BASED IDEAS:
['Develop a chatbot that uses prompt engineering to provide customer support, ensuring that the prompts guide users to articulate their issues clearly.', 'Create a

## Saving and Loading

In [18]:
job_detail.save("data/dspy_modules/jobdetail.json", save_program=False)

In [19]:
job_detail.load("data/dspy_modules/jobdetail.json")

In [20]:
job_detail.save("data/dspy_modules/jobdetail/", save_program=True)

In [21]:
loaded_module = dspy.load("data/dspy_modules/jobdetail/")

In [22]:
# loaded_module()

## Optimizing Agents

### RAG Agent

In [23]:
mlflow.set_experiment("my-local-experiment-part-2")

<Experiment: artifact_location='mlflow-artifacts:/262207438023260723', creation_time=1753487661794, experiment_id='262207438023260723', last_update_time=1753487661794, lifecycle_stage='active', name='my-local-experiment-part-2', tags={}>

In [24]:
def search_wikipedia(query: str) -> list[str]:
    """Retrieve the top 3 Wikipedia abstracts for a query from a hosted ColBERTv2 index."""
    results = dspy.ColBERTv2(url="http://20.102.90.50:2017/wiki17_abstracts")(
        query, k=3
    )
    return [x["text"] for x in results]


react = dspy.ReAct("question -> answer", tools=[search_wikipedia])

In [25]:
# load train dataset
trainset = []
with open("data/dspy_data/trainset.jsonl", "r") as f:
    for line in f:
        trainset.append(
            dspy.Example(**json.loads(line)).with_inputs("question")
        )

# load validation dataset
valset = []
with open("data/dspy_data/valset.jsonl", "r") as f:
    for line in f:
        valset.append(dspy.Example(**json.loads(line)).with_inputs("question"))

In [26]:
# check data example
print(trainset[85])

Example({'question': 'So Long, Scarecrow is titled in reference to which 1939 musical fantasy film?', 'answer': 'The Wizard of Oz'}) (input_keys={'question'})
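
The loader above reads one JSON object per line (JSONL), and the printed example suggests each line carries a `question` key (the input) and an `answer` key (the label). The sketch below shows that file layout with an in-memory stand-in so it runs without the data files. The second record and the `read_jsonl` helper are hypothetical, and plain dictionaries are used in place of `dspy.Example`.

```python
import io
import json

# Two JSONL lines: the first mirrors the example printed above,
# the second is a hypothetical placeholder record.
jsonl_text = (
    '{"question": "So Long, Scarecrow is titled in reference to which 1939 '
    'musical fantasy film?", "answer": "The Wizard of Oz"}\n'
    '{"question": "placeholder question", "answer": "placeholder answer"}\n'
)


def read_jsonl(handle) -> list[dict]:
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in handle if line.strip()]


records = read_jsonl(io.StringIO(jsonl_text))
for record in records:
    # "question" is the model input; "answer" is the gold label for the metric.
    if not {"question", "answer"} <= set(record):
        raise ValueError(f"record is missing a required key: {record}")

print(len(records), records[0]["answer"])
```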


In [27]:
tp = dspy.MIPROv2(
    metric=dspy.evaluate.answer_exact_match, auto="light", num_threads=16
)
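
The optimizer is driven by an exact-match metric, which only rewards a prediction when it equals the gold answer after light text normalization. The sketch below illustrates the idea with SQuAD-style normalization (lowercasing, dropping punctuation and English articles, collapsing whitespace). It is an approximation written for this explanation, not the code behind `dspy.evaluate.answer_exact_match`, and the function names and sample pairs are hypothetical.

```python
import re
import string


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(gold: str, predicted: str) -> bool:
    """True when both answers are identical after normalization."""
    return normalize(gold) == normalize(predicted)


def score(pairs: list[tuple[str, str]]) -> float:
    """Percentage of (gold, predicted) pairs that match exactly."""
    return 100.0 * sum(exact_match(g, p) for g, p in pairs) / len(pairs)


pairs = [
    ("The Wizard of Oz", "the wizard of oz."),    # differs only in case and punctuation
    ("The Wizard of Oz", "Wizard of Oz (1939)"),  # extra token, so no match
]
print(score(pairs))
```

Because the metric is this strict, a correct but verbose answer counts as a miss, which is one reason optimized prompts tend to push the agent toward short, canonical answers.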

In [28]:
# dspy.cache.load_memory_cache("./memory_cache.pkl")

In [None]:
optimized_react = tp.compile(
    react,
    trainset=trainset,
    valset=valset,
    requires_permission_to_run=False,
)

In [None]:
optimized_react.react.signature

In [None]:
optimized_react.react.demos

In [None]:
evaluator = dspy.Evaluate(
    metric=dspy.evaluate.answer_exact_match,
    devset=valset,
    display_table=True,
    display_progress=True,
    num_threads=24,
)

In [None]:
original_score = evaluator(react)
print(f"Original score: {original_score}")

In [None]:
optimized_score = evaluator(optimized_react)
print(f"Optimized score: {optimized_score}")