<center>
<img src="https://supportvectors.ai/logo-poster-transparent.png" width="400px" style="opacity:0.7">
</center>

In [1]:
%run supportvectors-common.ipynb


<div style="color:#aaa;font-size:8pt">
<hr/>
&copy; SupportVectors. All rights reserved. <blockquote>This notebook is the intellectual property of SupportVectors, and part of its training material. 
Only participants in SupportVectors workshops are currently allowed to study the notebooks, for educational purposes; copying or using them for any other purpose without written permission is prohibited.

<b> These notebooks are chapters and sections from Asif Qamar's textbook that he is writing on Data Science. So we request you to not circulate the material to others.</b>
 </blockquote>
 <hr/>
</div>



# DSPy 
(https://dspy.ai/)


## Motivation: Why Do We Need DSPy?

Language models (LLMs) like GPT-4, Claude, and Llama are powerful tools for natural language processing tasks. However, achieving high-quality results often requires extensive prompt engineering, chaining multiple prompts, and retrieving relevant information—all of which can be difficult to optimize manually.

### The Traditional LLM Development Challenge

Traditional approaches to building AI applications with language models face several critical limitations:

**1. Prompt Engineering Overhead**
- Crafting effective prompts requires extensive trial and error
- Each task requires custom prompt design and optimization
- Manual prompt tuning doesn't scale to complex applications

**2. Model Brittleness**
- A prompt that works well on one model may fail on another
- Performance degrades when switching between different model versions
- No systematic way to adapt prompts across model families

**3. Scaling Complexity**
- As tasks become more complex, manually composing reasoning steps becomes unmanageable
- Multi-step reasoning requires careful orchestration of multiple LLM calls
- Retrieval-augmented generation (RAG) systems need sophisticated prompt engineering

**4. Lack of Systematic Optimization**
- No principled way to improve prompts based on feedback
- Manual optimization doesn't leverage the full potential of modern LLMs
- Difficult to maintain and iterate on complex AI systems

### The DSPy Solution

[DSPy](https://dspy.ai/) (Declarative Self-Improving Python) addresses these challenges by providing a **declarative framework for building modular AI software**. Instead of wrangling prompts or training jobs, DSPy enables you to:

- **Build AI software from natural-language modules** and compose them generically
- **Compile AI programs into effective prompts and weights** for your language models
- **Iterate fast on structured code** rather than brittle strings
- **Make AI software more reliable, maintainable, and portable** across models and strategies

Think of DSPy as a **higher-level language for AI programming**—like the shift from assembly to C or pointer arithmetic to SQL. It abstracts away the complexity of prompt engineering and provides systematic optimization algorithms.

## What is DSPy?

DSPy is a **declarative framework for building modular AI software** that allows you to **iterate fast on structured code**, rather than brittle strings, and offers algorithms that **compile AI programs into effective prompts and weights** for your language models.

### Core Philosophy

Instead of wrangling prompts or training jobs, DSPy enables you to **build AI software from natural-language modules** and to _generically compose them_ with different models, inference strategies, or learning algorithms. This makes AI software **more reliable, maintainable, and portable** across models and strategies.

### Key Components

**1. Modules** - Help you describe AI behavior as code, not strings
- Define structured interfaces for AI tasks
- Compose complex reasoning pipelines
- Work across different models and strategies

**2. Optimizers** - Tune the prompts and weights of your AI modules
- Systematic algorithms for improving performance
- Learn from examples and feedback
- Compose different optimization strategies

**3. Ecosystem** - Advances open-source AI research
- Large community contributing modules and optimizers
- Distributed improvement of compositional architectures
- Faster iteration and better programs over time

### Supported Models

DSPy works with a wide range of language models:
- **OpenAI**: GPT-4, GPT-4o-mini, GPT-3.5-turbo
- **Anthropic**: Claude-3-opus, Claude-3-sonnet, Claude-3-haiku
- **Google**: Gemini-2.5-flash, Gemini-1.5-pro
- **Databricks**: Databricks-llama-4-maverick
- **Local Models**: Llama, Mistral, and other open-source models
- **Other Providers**: Via LiteLLM integration

## How DSPy Works

DSPy follows a systematic approach to building and optimizing AI programs. Here's how the framework operates:

### Step 1: Define Signatures (Task Descriptions)

Instead of writing direct prompts, DSPy uses **signatures** to define what your AI program should do. Signatures specify the inputs and outputs in a structured way.

**Example: Question Answering Task**
```python
import dspy

# Define the signature for answering questions from context
class AnswerQuestion(dspy.Signature):
    """Answer a question based on the given context."""
    context: str = dspy.InputField()
    question: str = dspy.InputField()
    answer: str = dspy.OutputField()
```

This signature tells DSPy:
- **Inputs**: `context` (the source material) and `question` (what to answer)
- **Output**: `answer` (the response we want to optimize)
- **Task**: Answer questions based on provided context

### Step 2: Create Modules (AI Components)

DSPy modules are the building blocks that implement your signatures. They handle the actual LLM interactions and reasoning.

```python
# Configure your language model
lm = dspy.LM('openai/gpt-4o-mini')
dspy.configure(lm=lm)

# Create a module that uses Chain-of-Thought reasoning
predictor = dspy.ChainOfThought(AnswerQuestion)

# Use the module
response = predictor(
    context="Quantum mechanics explains the behavior of particles at a subatomic level.",
    question="What is quantum mechanics?"
)
print(response.answer)
```

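**Key Benefits:**
- The module builds the prompt from the signature, so you do not write one by hand
- The prompting strategy (here, chain of thought) is chosen by swapping the module, not by rewriting strings
- The same program runs on different models, since only the `dspy.LM` configuration changes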

### Step 3: Optimization with Examples

DSPy's power comes from its optimization algorithms that learn from examples to improve performance.

```python
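# Prepare training examples; with_inputs() marks which fields are inputs
examples = [
    dspy.Example(
        context="Photosynthesis converts sunlight into energy.",
        question="What is photosynthesis?",
        answer="A process where plants convert sunlight into energy."
    ).with_inputs("context", "question"),
    dspy.Example(
        context="Gravity is a force that attracts objects toward each other.",
        question="What is gravity?",
        answer="A force that pulls objects toward one another."
    ).with_inputs("context", "question"),
]

# A simple illustrative metric: case-insensitive exact match on the answer
def answer_match(example, prediction, trace=None):
    return example.answer.strip().lower() == prediction.answer.strip().lower()

# Optimize using MIPROv2, which requires a metric.
# Two examples only illustrate the API; a real trainset needs many more.
optimizer = dspy.MIPROv2(metric=answer_match, auto="light")
optimized_predictor = optimizer.compile(predictor, trainset=examples)

# The optimized predictor is called exactly like the original
response = optimized_predictor(
    context="Machine learning is a subset of artificial intelligence.",
    question="What is machine learning?"
)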
```
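Optimizers score candidate programs with a metric function. The block below is a minimal sketch of a token-overlap F1 metric, assuming the `(example, prediction, trace=None)` calling convention used for DSPy metrics. The name `token_f1` and the `SimpleNamespace` stand-ins are illustrative and not part of DSPy; they only mimic attribute access on an example and a prediction, so the sketch runs without an LM.

```python
from types import SimpleNamespace

def token_f1(example, prediction, trace=None):
    """Token-level F1 between the gold answer and the predicted answer."""
    gold = example.answer.lower().split()
    pred = prediction.answer.lower().split()
    if not gold or not pred:
        return 0.0
    # Count overlapping tokens, respecting multiplicity
    remaining = list(gold)
    overlap = 0
    for tok in pred:
        if tok in remaining:
            remaining.remove(tok)
            overlap += 1
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)

# Stand-ins for an example and two predictions (attribute access only)
gold = SimpleNamespace(answer="A force that pulls objects toward one another.")
same = SimpleNamespace(answer="A force that pulls objects toward one another.")
partial = SimpleNamespace(answer="A force between objects")

print(token_f1(gold, same))
print(token_f1(gold, partial))
```

A graded metric like this gives partial credit to near-miss answers, which exact match would score as zero.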

### How Optimization Works

DSPy optimizers like MIPROv2 follow a sophisticated process:

1. **Bootstrapping**: Run your program on various inputs to collect traces of input/output behavior
2. **Grounded Proposals**: Use your program's code, data, and traces to draft potential instructions
3. **Discrete Search**: Sample mini-batches, propose instruction combinations, and evaluate performance
4. **Surrogate Learning**: Update models to improve proposals over time

The result is a program that automatically learns better prompting strategies and reasoning patterns from your examples.
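The discrete-search step can be pictured with a toy, LM-free sketch. This is not MIPROv2: candidates are tried round-robin rather than proposed by a surrogate model, and `toy_score` is a hypothetical stand-in for running the program and applying a metric. It only shows the idea of scoring candidate instructions on random mini-batches and keeping the best average.

```python
import random

def minibatch_search(candidates, dataset, score, n_trials=30, batch_size=4, seed=0):
    """Return the candidate with the best average mini-batch score."""
    rng = random.Random(seed)
    totals = {c: 0.0 for c in candidates}
    counts = {c: 0 for c in candidates}
    for trial in range(n_trials):
        # Round-robin over candidates; a real optimizer picks proposals adaptively
        cand = candidates[trial % len(candidates)]
        batch = rng.sample(dataset, batch_size)
        totals[cand] += sum(score(cand, ex) for ex in batch) / batch_size
        counts[cand] += 1
    averages = {c: totals[c] / counts[c] for c in candidates if counts[c]}
    best = max(averages, key=averages.get)
    return best, averages

# Toy data: 6 of 8 questions can only be answered from the context
dataset = [{"needs_context": i < 6} for i in range(8)]

def toy_score(instruction, example):
    """Hypothetical scorer: an instruction succeeds if it covers what the example needs."""
    text = instruction.lower()
    if "answer" not in text:
        return 0.0
    return 1.0 if ("context" in text or not example["needs_context"]) else 0.0

candidates = [
    "Reply briefly.",
    "Answer the question.",
    "Answer using only the given context.",
]
best, averages = minibatch_search(candidates, dataset, toy_score)
print(best)
```

Mini-batches keep each evaluation cheap; the averages become more reliable as a candidate is sampled more often.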

## Benefits of Using DSPy

### 1. Eliminates Manual Prompt Engineering

**Traditional Approach:**
```python
# Manual prompt crafting - brittle and time-consuming
prompt = """
Given the following context: {context}
Answer this question: {question}
Please provide a clear and accurate answer.
"""
```

**DSPy Approach:**
```python
# Declarative signature - DSPy handles the prompting
class AnswerQuestion(dspy.Signature):
    context: str = dspy.InputField()
    question: str = dspy.InputField()
    answer: str = dspy.OutputField()
```

Given training data and a metric, a DSPy optimizer can then tune the prompt for this signature, which reduces the need for manual prompt engineering.

### 2. Model Portability

The same DSPy program works seamlessly across different language models:

```python
# Works with OpenAI
lm = dspy.LM('openai/gpt-4o-mini')
dspy.configure(lm=lm)

# Switch to Anthropic - no code changes needed
lm = dspy.LM('anthropic/claude-3-sonnet')
dspy.configure(lm=lm)

# Switch to local model - still works
lm = dspy.LM('ollama/llama3')
dspy.configure(lm=lm)
```

### 3. Systematic Optimization

DSPy provides powerful optimizers that systematically improve your programs:

- **MIPROv2**: Advanced prompt optimization with grounded proposals
- **BootstrapFinetune**: Fine-tune models for specific tasks
- **BetterTogether**: Compose multiple optimization strategies
- **Ensemble**: Combine multiple optimized programs for better performance

### 4. Advanced RAG Capabilities

DSPy excels at building sophisticated Retrieval-Augmented Generation systems:

```python
# Single-hop RAG: retrieve passages, then answer with chain-of-thought.
# Assumes a retrieval model has been set via dspy.configure(rm=...).
class RAG(dspy.Module):
    def __init__(self):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=3)
        self.generate_answer = dspy.ChainOfThought("context, question -> answer")
    
    def forward(self, question):
        # Retrieve returns a Prediction; the passages are in .passages
        passages = self.retrieve(question).passages
        return self.generate_answer(context=passages, question=question)
```

### 5. Composable and Maintainable

DSPy programs are modular and composable:

```python
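# Build complex AI systems by composing simple modules.
# SearchModule, SummarizeModule, and AnalyzeModule are placeholders for
# dspy.Module subclasses you would define yourself.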
class ResearchAgent(dspy.Module):
    def __init__(self):
        self.search = SearchModule()
        self.summarize = SummarizeModule()
        self.analyze = AnalyzeModule()
    
    def forward(self, query):
        results = self.search(query)
        summary = self.summarize(results)
        analysis = self.analyze(summary)
        return analysis
```

### 6. Real-World Performance

DSPy has demonstrated significant performance improvements in real applications:

- **Banking77 Classification**: Improved GPT-4o-mini performance from 66% to 87%
- **Math Reasoning**: Enhanced performance on complex mathematical problems
- **Multi-hop RAG**: Better retrieval and reasoning in complex information systems
- **Agent Systems**: More reliable and maintainable AI agents

### 7. Community and Ecosystem

DSPy benefits from a large, active community:

- **250+ contributors** from Stanford NLP and beyond
- **Tens of thousands** of users building modular LM programs
- **Open-source research** advancing the field
- **Production applications** across various industries

## Getting Started

To begin using DSPy, install it and set up your language model:

```bash
uv add dspy # this is already included in pyproject.toml
```

```python
import dspy

# Configure your language model
lm = dspy.LM("openai/gpt-4o-mini", api_key="YOUR_API_KEY")
dspy.configure(lm=lm)

# Start building AI programs!
```

## Using DSPy
We now try out some examples adapted from the DSPy [home page](https://dspy.ai/).

*Note*:
To connect DSPy to a local LM served by Ollama, use the snippet below:

```python
import dspy
lm = dspy.LM('ollama_chat/llama3.2', api_base='http://localhost:11434', api_key='')
dspy.configure(lm=lm)
```

However, in the demonstration below, we connect to OpenAI's `gpt-4o-mini` because some of the reasoning-based interactions work better with it. Remember to define `OPENAI_API_KEY` in the `.env` file.

In [2]:
import dspy
lm = dspy.LM('openai/gpt-4o-mini')
dspy.configure(lm=lm)

In the example below, the string `'question -> response'` passed to `dspy.Predict` is an inline signature: it declares one input field, `question`, and one output field, `response`. Fields default to type `str` unless a type is given.
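To make the inline syntax concrete, here is a minimal plain-Python sketch of how such a string can be split into typed input and output fields. The function `parse_inline_signature` is hypothetical and is not DSPy's parser; in particular, it does not handle types that themselves contain commas, such as `dict[str, list[str]]`.

```python
def parse_inline_signature(spec: str):
    """Split 'a, b -> c: float' into ({'a': 'str', 'b': 'str'}, {'c': 'float'})."""
    inputs_part, arrow, outputs_part = spec.partition("->")
    if not arrow:
        raise ValueError("an inline signature must contain '->'")

    def parse_fields(part):
        fields = {}
        for item in part.split(","):
            name, _, type_name = item.partition(":")
            name = name.strip()
            if not name:
                raise ValueError(f"empty field name in {part!r}")
            # Fields without an explicit type default to str
            fields[name] = type_name.strip() or "str"
        return fields

    return parse_fields(inputs_part), parse_fields(outputs_part)

print(parse_inline_signature("question -> response"))
print(parse_inline_signature("question -> answer: float"))
```

The second call mirrors the typed form `"question -> answer: float"` that appears later in this notebook.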

In [3]:
predict = dspy.Predict('question -> response')
predict(question="should curly braces appear on their own line?")

Prediction(
    response="The placement of curly braces on their own line largely depends on the coding style and conventions being followed. In languages like Java or C#, it's common to place the opening curly brace at the end of the preceding line, while others, like Python, use indentation and do not require curly braces at all. Ultimately, it's a matter of personal or team preference and should be consistent throughout the codebase."
)

In [4]:
dspy.inspect_history(n=1)

[2025-11-19T15:34:37.882127]

System message:

Your input fields are:
1. `question` (str):
Your output fields are:
1. `response` (str):
All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## question ## ]]
{question}

[[ ## response ## ]]
{response}

[[ ## completed ## ]]
In adhering to this structure, your objective is: 
        Given the fields `question`, produce the fields `response`.


User message:

[[ ## question ## ]]
should curly braces appear on their own line?

Respond with the corresponding output fields, starting with the field `[[ ## response ## ]]`, and then ending with the marker for `[[ ## completed ## ]]`.


Response:

[[ ## response ## ]]
The placement of curly braces on their own line largely depends on the coding style and conventions being followed. In languages like Java or C#, it's common to place the opening curly brace at the end of the preceding line, while others, l
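The history above shows that the model is asked to reply in a delimited format, with each output field introduced by a `[[ ## name ## ]]` marker. As a rough sketch of how such a reply could be turned back into named fields, the helper below splits on those markers. The name `parse_marked_fields` is hypothetical and this is not DSPy's own adapter code.

```python
import re

FIELD_MARKER = re.compile(r"\[\[ ## (\w+) ## \]\]")

def parse_marked_fields(completion: str) -> dict:
    """Split a completion on '[[ ## name ## ]]' markers into {name: text}."""
    # With one capture group, split() yields [preamble, name1, text1, name2, text2, ...]
    parts = FIELD_MARKER.split(completion)
    fields = {}
    for name, text in zip(parts[1::2], parts[2::2]):
        if name != "completed":  # the closing marker carries no content
            fields[name] = text.strip()
    return fields

sample = (
    "[[ ## reasoning ## ]]\nStyle guides differ.\n\n"
    "[[ ## response ## ]]\nIt depends on the style guide.\n\n"
    "[[ ## completed ## ]]"
)
print(parse_marked_fields(sample))
```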

In [5]:
cot = dspy.ChainOfThought('question -> response')
cot(question="should curly braces appear on their own line?")

Prediction(
    reasoning="The placement of curly braces often depends on the coding style or guidelines being followed. In languages like Java and C#, it's common to place opening curly braces on the same line as the control statement, while closing braces may be on their own line for clarity. However, others may prefer the braces to be on their own lines to visually separate blocks of code, which can enhance readability. Ultimately, consistency within a codebase is key, and adhering to a project's specific style guide is advisable.",
    response="Curly braces can be placed on their own lines or together with the corresponding statement, depending on the coding style guidelines being followed. There isn't a strict rule; it often comes down to personal or team preference for clarity and readability."
)

In [6]:
dspy.inspect_history(n=1)

[2025-11-19T15:40:27.103838]

System message:

Your input fields are:
1. `question` (str):
Your output fields are:
1. `reasoning` (str): 
2. `response` (str):
All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## question ## ]]
{question}

[[ ## reasoning ## ]]
{reasoning}

[[ ## response ## ]]
{response}

[[ ## completed ## ]]
In adhering to this structure, your objective is: 
        Given the fields `question`, produce the fields `response`.


User message:

[[ ## question ## ]]
should curly braces appear on their own line?

Respond with the corresponding output fields, starting with the field `[[ ## reasoning ## ]]`, then `[[ ## response ## ]]`, and then ending with the marker for `[[ ## completed ## ]]`.


Response:

[[ ## reasoning ## ]]
The placement of curly braces often depends on the coding style or guidelines being followed. In languages like Java and C#, it's common to place openi

In [7]:
from rich import print as rprint

In [8]:
class GuardrailedResponse(dspy.Signature):
    """Generate a guardrailed response to a question."""

    question: str = dspy.InputField(desc="Never answer any question about Elon Musk")
    response: str = dspy.OutputField(desc="a guardrailed response to the question.  If the question is about Elon Musk, respond with 'I am sorry, I cannot answer that question.'")

guardrailed_response = dspy.ChainOfThought(GuardrailedResponse)

In [9]:
guardrailed_response(question="What is the capital of France?")

Prediction(
    reasoning='The question is not about Elon Musk and is straightforward, asking for a geographical fact.',
    response='The capital of France is Paris.'
)

In [10]:
guardrailed_response(question="Who is Elon Musk?")

Prediction(
    reasoning='The question asks for information about Elon Musk, which falls under the category of topics I am not allowed to discuss.',
    response='I am sorry, I cannot answer that question.'
)

# Some common use-cases for DSPy

## Multi-stage pipelines


The code block below demonstrates a multi-stage pipeline using DSPy's framework.

### 1. `Outline` Class (DSPy Signature)

```python
class Outline(dspy.Signature):
    """Outline a thorough overview of a topic."""
    
    topic: str = dspy.InputField()
    title: str = dspy.OutputField()
    sections: list[str] = dspy.OutputField()
    section_subheadings: dict[str, list[str]] = dspy.OutputField(desc="mapping from section headings to subheadings")
```

**What it does:**
- **`dspy.Signature`**: DSPy's way of defining the input/output contract for a language model task. A signature declares what inputs a module expects and what outputs it should produce.

- **`dspy.InputField()`**: Defines the input parameter `topic` that the language model will receive.

- **`dspy.OutputField()`**: Defines the expected outputs:
  - `title`: The main title of the article
  - `sections`: A list of section headings
  - `section_subheadings`: A dictionary mapping section headings to their subheadings

- **`desc` parameter**: Provides additional context to the language model about what the output should contain.

### 2. `DraftSection` Class (DSPy Signature)

```python
class DraftSection(dspy.Signature):
    """Draft a top-level section of an article."""
    
    topic: str = dspy.InputField()
    section_heading: str = dspy.InputField()
    section_subheadings: list[str] = dspy.InputField()
    content: str = dspy.OutputField(desc="markdown-formatted section")
```

**What it does:**
- Another DSPy Signature that defines the contract for drafting individual article sections
- Takes the topic, section heading, and subheadings as inputs
- Outputs markdown-formatted content for that specific section

### 3. `DraftArticle` Class (DSPy Module)

```python
class DraftArticle(dspy.Module):
    def __init__(self):
        self.build_outline = dspy.ChainOfThought(Outline)
        self.draft_section = dspy.ChainOfThought(DraftSection)
    
    def forward(self, topic):
        ...  # full implementation is shown in the code cell below
```

**What it does:**
- **`dspy.Module`**: The base class for creating composable DSPy components. Modules are the building blocks of a DSPy program and can be combined to create complex AI systems.

- **`dspy.ChainOfThought`**: A built-in module that implements chain-of-thought prompting. It wraps a Signature and adds a `reasoning` output field ahead of the signature's own outputs, so the language model reasons step by step before giving the final answer.

- **`forward()` method**: This is the main execution method that:
  1. First calls `build_outline` to create an article outline
  2. Then iterates through each section and calls `draft_section` to generate content
  3. Returns a `dspy.Prediction` object containing the final article

### Key DSPy Concepts Demonstrated:

1. **Modularity**: The code breaks down article generation into smaller, reusable components (outline generation and section drafting).

2. **Composability**: The `DraftArticle` module combines two other modules (`build_outline` and `draft_section`) to create a more complex system.

3. **Declarative Programming**: Instead of writing prompts manually, you declare what inputs and outputs you want, and DSPy handles the prompt engineering.

4. **Chain of Thought**: The `ChainOfThought` wrapper encourages the language model to reason through the task step-by-step, which typically improves output quality.

The final line `article = draft_article(topic="Quantum Chromodynamics")` demonstrates how to use the module: call the instance with a topic and it returns a `dspy.Prediction` holding the article's `title` and a list of drafted `sections`.
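Because the pipeline is ordinary Python, its control flow can be exercised without any LM calls. The sketch below mirrors the loop in `DraftArticle.forward` but takes the two stages as plain callables; `fake_outline` and `fake_section` are hypothetical stubs that stand in for the two `ChainOfThought` modules, and `SimpleNamespace` stands in for `dspy.Prediction`.

```python
from types import SimpleNamespace

def draft_article(topic, build_outline, draft_section):
    """Outline first, then draft one section per outline heading."""
    outline = build_outline(topic=topic)
    sections = []
    for heading, subheadings in outline.section_subheadings.items():
        section = draft_section(
            topic=outline.title,
            section_heading=f"## {heading}",
            section_subheadings=[f"### {s}" for s in subheadings],
        )
        sections.append(section.content)
    return SimpleNamespace(title=outline.title, sections=sections)

# Hypothetical stubs: deterministic replacements for the LM-backed stages
def fake_outline(topic):
    return SimpleNamespace(
        title=f"Overview of {topic}",
        section_subheadings={"Introduction": ["What is it?"], "History": ["Origins"]},
    )

def fake_section(topic, section_heading, section_subheadings):
    body = "\n".join(section_subheadings)
    return SimpleNamespace(content=f"{section_heading}\n{body}")

article_stub = draft_article("Tea", fake_outline, fake_section)
print(article_stub.title)
print(article_stub.sections)
```

Stubbing the stages this way is one option for unit-testing the orchestration logic separately from model quality.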

In [11]:
class Outline(dspy.Signature):
    """Outline a thorough overview of a topic."""

    topic: str = dspy.InputField()
    title: str = dspy.OutputField()
    sections: list[str] = dspy.OutputField()
    section_subheadings: dict[str, list[str]] = dspy.OutputField(desc="mapping from section headings to subheadings")

class DraftSection(dspy.Signature):
    """Draft a top-level section of an article."""

    topic: str = dspy.InputField()
    section_heading: str = dspy.InputField()
    section_subheadings: list[str] = dspy.InputField()
    content: str = dspy.OutputField(desc="markdown-formatted section")

class DraftArticle(dspy.Module):
    def __init__(self):
        self.build_outline = dspy.ChainOfThought(Outline)
        self.draft_section = dspy.ChainOfThought(DraftSection)

    def forward(self, topic):
        outline = self.build_outline(topic=topic)
        sections = []
        for heading, subheadings in outline.section_subheadings.items():
            section, subheadings = f"## {heading}", [f"### {subheading}" for subheading in subheadings]
            section = self.draft_section(topic=outline.title, section_heading=section, section_subheadings=subheadings)
            sections.append(section.content)
        return dspy.Prediction(title=outline.title, sections=sections)

draft_article = DraftArticle()
article = draft_article(topic="Quantum Chromodynamics")

In [12]:
import re

def replace_latex_delimiters(text):
    # Replace all occurrences of \[ and \] with $
    text = re.sub(r'\\\[', r'$', text)  # Replace \[ with $
    text = re.sub(r'\\\]', r'$', text)  # Replace \] with $
    
    # Replace all occurrences of \( and \) with $
    text = re.sub(r'\\\(', r'$', text)  # Replace \( with $
    text = re.sub(r'\\\)', r'$', text)  # Replace \) with $
        
    return text
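The helper above maps both display delimiters (`\[ ... \]`) and inline delimiters (`\( ... \)`) to a single `$`. A possible variant, sketched below under the assumption that the Markdown renderer treats `$$ ... $$` as display math, keeps the two kinds distinct. The name `to_dollar_math` is illustrative.

```python
import re

def to_dollar_math(text: str) -> str:
    r"""Map \[ and \] to $$, and \( and \) to $."""
    text = re.sub(r"\\\[|\\\]", "$$", text)  # display math delimiters
    text = re.sub(r"\\\(|\\\)", "$", text)   # inline math delimiters
    return text

print(to_dollar_math(r"Energy \(E\) obeys \[E = mc^2\]"))
```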

In [13]:
from IPython.display import Markdown, display
display(Markdown(f'# {article.title}'))
for section in article.sections:
    display(Markdown(replace_latex_delimiters(section)))

# Overview of Quantum Chromodynamics

## Introduction to Quantum Chromodynamics

### What is QCD?
Quantum Chromodynamics (QCD) is the quantum field theory that describes the interactions between quarks and gluons, the fundamental constituents of matter. QCD is based on the principles of quantum mechanics and special relativity, and it employs the concept of color charge, which is analogous to electric charge in electromagnetism but has three types—commonly referred to as red, green, and blue. The core idea of QCD is that these color charges interact through the exchange of gluons, the force carriers that mediate the strong force. This interaction is responsible for binding quarks together to form protons, neutrons, and other baryons, as well as binding gluons together in a unique way that leads to confinement.

### Historical Context
The development of QCD was a significant milestone in the history of particle physics, emerging in the late 20th century as part of the broader framework known as the Standard Model. Its roots can be traced back to the observations of deep inelastic scattering experiments in the 1960s, which revealed that protons and neutrons were not elementary particles but were composed of smaller constituents called quarks. The formulation of QCD began to take shape in the early 1970s, culminating in the development of the theory by David Gross, Frank Wilczek, and H. David Politzer, who were later awarded the Nobel Prize for their contributions.

### Significance in Particle Physics
QCD plays a pivotal role in our understanding of the fundamental forces of nature. As one of the four fundamental interactions—alongside electromagnetism, weak nuclear force, and gravity—QCD is essential for explaining phenomena such as the mass of protons and neutrons, the stability of atomic nuclei, and the behavior of matter under extreme conditions, such as in neutron stars or during the early moments of the universe. Furthermore, ongoing research in QCD helps to bridge the gap between theoretical predictions and experimental observations at high-energy particle colliders, enhancing our understanding of the strong force and its implications for the universe’s structure.

## Key Concepts in QCD

### Quarks and Gluons
At the core of Quantum Chromodynamics are two types of fundamental particles known as quarks and gluons. Quarks are the building blocks of hadrons, such as protons and neutrons, and come in six varieties, referred to as "flavors": up, down, charm, strange, top, and bottom. Gluons, on the other hand, are the force carriers for the strong interaction, analogous to how photons mediate electromagnetic forces. They are unique in that they also possess color charge, allowing them to interact with each other in addition to quarks.

### Color Charge
Color charge is a property of quarks and gluons that is central to the strong force. Unlike electric charge, color charge comes in three types, designated as red, green, and blue, with corresponding anticolor charges. Quarks always combine in such a way that they form color-neutral particles, a requirement known as color confinement. This results in the observable fact that free quarks are never found in isolation.

### Confinement and Asymptotic Freedom
Confinement is the phenomenon whereby quarks and gluons are perpetually confined within hadrons, preventing the isolation of individual quarks. Asymptotic freedom, conversely, describes the behavior of quarks and gluons at extremely short distances; at these scales, they interact weakly, allowing them to behave almost as free particles. This dual behavior is key to understanding the complexities of strong interactions and is a fundamental aspect of QCD.

## Mathematical Framework of QCD

### Lagrangian Formalism
At the heart of Quantum Chromodynamics is its Lagrangian, which encapsulates the dynamics of quarks and gluons through a non-Abelian gauge theory. The QCD Lagrangian can be expressed as follows:

$
\mathcal{L}_{\text{QCD}} = \sum_{q} \bar{\psi}_q(i\gamma^\mu D_\mu - m_q)\psi_q - \frac{1}{4}G^{\mu\nu}_aG_{\mu\nu}^a
$

Here, $\psi_q$ represents the quark fields, $m_q$ is the quark mass, $D_\mu$ is the covariant derivative that accounts for the interaction with the gluon fields, and $G^{\mu\nu}_a$ is the gluon field strength tensor. This form encapsulates the interaction between quarks, mediated by gluons, and reflects the fundamental symmetry properties intrinsic to the color charge of particles.

### Feynman Diagrams
Feynman diagrams serve as a powerful visual tool for calculating particle interactions in QCD. Each line and vertex in the diagram represents specific particles and their interactions, enabling physicists to systematically compute scattering amplitudes. The basic elements include:

- Quark lines, depicted as solid lines.
- Gluon lines, represented by wavy or curly lines, indicating the exchange of force carriers.
- Interaction vertices, where lines meet, corresponding to fundamental QCD interactions.

These diagrams facilitate not just comprehension of complex processes such as jet production in high-energy collisions, but also provide a framework for perturbative calculations essential in predicting outcomes of particle interactions.

### Renormalization
Given that QCD is a quantum field theory, it exhibits divergences that require careful treatment through renormalization. The process involves redefining mass and charge parameters to absorb infinities arising in loop calculations. Renormalization ensures that physical predictions remain finite and measurable. In QCD, various renormalization schemes, such as the $\overline{\text{MS}}$ (modified minimal subtraction), are employed to facilitate perturbative calculations, particularly at small coupling constants where the theory is most reliable. This crucial aspect of the mathematical framework allows for precise predictions that align with experimental results.

## Experimental Evidence and Discoveries

### Deep Inelastic Scattering
Deep Inelastic Scattering (DIS) is one of the pivotal experiments that provided direct evidence for the existence of quarks within protons and neutrons. Conducted at high-energy particle colliders, DIS involves bombarding a proton with a high-energy electron. The resulting scattering events reveal information about the internal structure of the proton, indicating that it is not a simple particle, but rather a collection of quarks and gluons. Measurements from these experiments have confirmed the predictions of QCD regarding the distribution of quarks and have supported the concept that these particles engage in strong interactions.

### Jet Production
Jet production is another crucial piece of evidence for QCD, observed in high-energy collisions where quarks and gluons are produced. When a high-energy collision occurs, quarks are freed from their confinement and emit radiation, resulting in the formation of collimated streams of particles known as jets. The characteristics of these jets—such as their distribution and multiplicity—are consistent with QCD predictions, providing further validation for the theory. The studies of jet production at particle accelerators like the Large Hadron Collider (LHC) have shown that the behavior of jets complies with the underlying principles of QCD, helping physicists to understand the dynamics of strong interactions.

### Lattice QCD
Lattice QCD is a computational approach that allows physicists to study QCD in a non-perturbative regime by discretizing space-time into a lattice. This method enables simulations of quark and gluon interactions, offering insights into phenomena that are difficult to observe experimentally. Recent advancements in lattice QCD have yielded critical insights into binding energies, mass calculations of hadrons, and the nature of the QCD phase transition. By comparing lattice calculations with experimental results, researchers have been able to refine their understanding of the strong force and its implications for particle physics.

## Implications of QCD in Modern Physics

### QCD and the Standard Model
Quantum Chromodynamics is crucial for the Standard Model of particle physics, which unifies the electromagnetic, weak, and strong interactions. It describes how quarks and gluons interact through the exchange of color charge, providing insights into the asymptotic freedom of quarks at high energies. This has been pivotal in explaining phenomena observed in high-energy particle collisions, such as those at the Large Hadron Collider (LHC). The predictions made by QCD have been confirmed through various experimental results, including the discovery of the Higgs boson.

### Role in Cosmology
In cosmology, QCD plays a vital role in understanding the early universe's development. During the first moments after the Big Bang, the universe was in a state of extremely high temperature and density, where quarks and gluons existed freely in what is known as the quark-gluon plasma. The dynamics of QCD are essential to explain the transition from this state to the formation of protons and neutrons, leading to the baryogenesis process and the subsequent formation of atomic nuclei. Understanding QCD thus helps to elucidate the conditions that led to the cosmic structure we observe today.

### Future Directions
The future of research in Quantum Chromodynamics is rich with possibilities. There are ongoing efforts to refine our understanding of confinement—a phenomenon where quarks are never found in isolation. Advances in lattice QCD, a non-perturbative approach to solve QCD on a discretized space-time lattice, aim to provide clearer predictions and a deeper understanding of the strong force. Furthermore, studying QCD in the context of high-energy astrophysical phenomena and heavy-ion collisions can reveal more about matter under extreme conditions and potentially unlock answers to unsolved questions in the universe's evolution.

## Agents

In [14]:
import os
SERPER_API_KEY = os.getenv("SERPER_API_KEY")

In [15]:
import requests

In [16]:
def google_search(query: str, num: int = 3):
    """Search Google using the Serper API and return the top organic results."""
    url = "https://google.serper.dev/search"
    headers = {"X-API-KEY": SERPER_API_KEY, "Content-Type": "application/json"}
    payload = {"q": query, "num": num}

    # Serper takes the query as a JSON body in a POST request.
    response = requests.post(url, json=payload, headers=headers, timeout=30)

    if response.status_code == 200:
        # "organic" holds the regular search results.
        return response.json().get("organic", [])
    # On failure, return the error as text so the caller (or agent) can see it.
    return f"Error: {response.status_code}, {response.text}"

In [17]:
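def evaluate_math(expression: str):
    """Evaluate a math expression with DSPy's Python interpreter and return the result."""
    return dspy.PythonInterpreter({}).execute(expression)

def search_wikipedia(query: str):
    """Placeholder tool: the ColBERTv2 retrieval is commented out, and the agent below does not use it."""
    #results = dspy.ColBERTv2(url='http://20.102.90.50:2017/wiki17_abstracts')(query, k=3)
    #return [x['text'] for x in results]
    return 5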

react = dspy.ReAct("question -> answer: float", tools=[evaluate_math, google_search])

pred = react(question="What is 5 multiplied by the Hardy-Ramanujan number?")
rprint(pred.answer)
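
At each ReAct step the LM names a tool and supplies its arguments as JSON; the tool is then run and its result is fed back into the trajectory. The following is a minimal sketch of only that dispatch step, not DSPy's actual implementation. `safe_eval` is a hypothetical stand-in for `evaluate_math`, and `TOOL_REGISTRY` and `run_tool_call` are names made up for illustration.

```python
import ast
import json
import operator

# Arithmetic operators the hypothetical evaluator accepts.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def safe_eval(expression: str) -> float:
    """Evaluate +, -, *, / over numeric literals (stand-in for evaluate_math)."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError(f"Unsupported expression: {expression!r}")
    return float(walk(ast.parse(expression, mode="eval")))

# Map tool names to callables, mirroring the tools=[...] list given to the agent.
TOOL_REGISTRY = {"evaluate_math": safe_eval}

def run_tool_call(tool_name: str, tool_args_json: str):
    """Look up the named tool and call it with the JSON-encoded keyword arguments."""
    kwargs = json.loads(tool_args_json)
    return TOOL_REGISTRY[tool_name](**kwargs)

observation = run_tool_call("evaluate_math", '{"expression": "5 * 1729"}')
print(observation)
```

Keeping the tools in a name-to-function mapping is what lets the model choose a tool by emitting just its name.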

In [18]:
dspy.inspect_history()

[2025-11-19T15:51:27.106802]

System message:

Your input fields are:
1. `question` (str): 
2. `trajectory` (str):
Your output fields are:
1. `reasoning` (str): 
2. `answer` (float):
All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## question ## ]]
{question}

[[ ## trajectory ## ]]
{trajectory}

[[ ## reasoning ## ]]
{reasoning}

[[ ## answer ## ]]
{answer}        # note: the value you produce must be a single float value

[[ ## completed ## ]]
In adhering to this structure, your objective is: 
        Given the fields `question`, produce the fields `answer`.


User message:

[[ ## question ## ]]
What is 5 multiplied by the Hardy-Ramanujan number?

[[ ## trajectory ## ]]
[[ ## thought_0 ## ]]
The Hardy-Ramanujan number, also known as the taxi-cab number, is 1729. I can multiply 5 by 1729 to find the answer.

[[ ## tool_name_0 ## ]]
evaluate_math

[[ ## tool_args_0 ## ]]
{"expression": "5 * 1729"}

[[ #

## Information Extraction

In [19]:
class ExtractInfo(dspy.Signature):
    """Extract structured information from text."""

    text: str = dspy.InputField()
    title: str = dspy.OutputField()
    headings: list[str] = dspy.OutputField()
    entities: list[dict[str, str]] = dspy.OutputField(desc="a list of entities and their metadata")

module = dspy.Predict(ExtractInfo)

text = "Stampede-like situation at New Delhi Railway Station was reported as a huge crowd reached the New Delhi railway station for Mahakumbh. Most of these passengers were those who did not have confirmed train tickets. According to the Railways, the situation is currently under control now."
response = module(text=text)

rprint(response.title)
rprint(response.headings)
rprint(response.entities)

In [20]:
dspy.inspect_history()

[2025-11-19T15:55:57.338134]

System message:

Your input fields are:
1. `text` (str):
Your output fields are:
1. `title` (str): 
2. `headings` (list[str]): 
3. `entities` (list[dict[str, str]]): a list of entities and their metadata
All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## text ## ]]
{text}

[[ ## title ## ]]
{title}

[[ ## headings ## ]]
{headings}        # note: the value you produce must adhere to the JSON schema: {"type": "array", "items": {"type": "string"}}

[[ ## entities ## ]]
{entities}        # note: the value you produce must adhere to the JSON schema: {"type": "array", "items": {"type": "object", "additionalProperties": {"type": "string"}}}

[[ ## completed ## ]]
In adhering to this structure, your objective is: 
        Extract structured information from text.


User message:

[[ ## text ## ]]
Stampede-like situation at New Delhi Railway Station was reported as a huge crowd reac

## Classification

In [21]:
from typing import Literal

class Classify(dspy.Signature):
    """Classify subject of the given sentence"""

    sentence: str = dspy.InputField()
    subject: Literal['Health', 'Math', 'Politics'] = dspy.OutputField()
    confidence: float = dspy.OutputField()

classify = dspy.Predict(Classify)
sentences = ["Regularly monitoring lipid levels is important.", 
             "A one-to-one and onto function is invertible",
             "Opinions are generally getting polarized, it is difficult to find neutral media.  It is either strongly pro or strongly anti government"]

for sentence in sentences:
    pred = classify(sentence=sentence)
    rprint(f"{pred}  :  {sentence}")

## RAG

In [22]:

rag = dspy.ChainOfThought('context, question -> response')

question = "What's the Ramanujan Hardy number and how did it become famous?"
rprint(rag(context=google_search(question), question=question))
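
The call above hands whatever `google_search` returns straight to the model as `context`, including the error string if the request fails. A minimal sketch of turning the results into a plain-text context first might look like the following. It assumes each result is a dict with `title` and `snippet` keys, which is how Serper's organic results are commonly shaped but is not confirmed by this notebook; `format_context` and the sample record are made up for illustration.

```python
def format_context(results) -> str:
    """Turn search results into a numbered plain-text context block."""
    if isinstance(results, str):
        # google_search returns an error string on failure; surface it here
        # instead of passing it to the LM as if it were evidence.
        raise RuntimeError(results)
    lines = []
    for i, item in enumerate(results, start=1):
        title = item.get("title", "")
        snippet = item.get("snippet", "")
        lines.append(f"[{i}] {title}: {snippet}")
    return "\n".join(lines)

# Hand-written sample record, not real search output.
sample_results = [
    {"title": "1729 (number)", "snippet": "1729 is the Hardy-Ramanujan number.", "link": "https://example.com"},
]
print(format_context(sample_results))
```

With a helper like this, the RAG call would become `rag(context=format_context(google_search(question)), question=question)`, and a failed search stops the run rather than silently producing an answer without evidence.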

In [23]:
dspy.inspect_history(n=1)

[2025-11-19T15:56:38.922599]

System message:

Your input fields are:
1. `context` (str): 
2. `question` (str):
Your output fields are:
1. `reasoning` (str): 
2. `response` (str):
All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## context ## ]]
{context}

[[ ## question ## ]]
{question}

[[ ## reasoning ## ]]
{reasoning}

[[ ## response ## ]]
{response}

[[ ## completed ## ]]
In adhering to this structure, your objective is: 
        Given the fields `context`, `question`, produce the fields `response`.


User message:

[[ ## context ## ]]
Error: 400, {"message":"Not enough credits","statusCode":400}

[[ ## question ## ]]
What's the Ramanujan Hardy number and how did it become famous?

Respond with the corresponding output fields, starting with the field `[[ ## reasoning ## ]]`, then `[[ ## response ## ]]`, and then ending with the marker for `[[ ## completed ## ]]`.


Response:

[[ ## rea

In [24]:
rprint(rag)

## Math

In [25]:
# Named math_solver rather than math to avoid shadowing Python's standard math module.
math_solver = dspy.ChainOfThought("question -> answer: float")
prediction = math_solver(question="8 queens are placed on a chess-board. What is the probability that none of them attacks each other?")
rprint(prediction)

In [26]:
display(Markdown(replace_latex_delimiters(prediction.reasoning)))

The problem of placing 8 queens on a chessboard such that none of them can attack each other is a well-known combinatorial problem. The total number of ways to place 8 queens on an 8x8 chessboard is given by the number of placements of 8 queens without any restrictions. This is equivalent to choosing 8 out of 64 squares, where the order doesn't matter, which is calculated as combinations. However, since each queen can attack in vertical, horizontal, and diagonal lines, we need to ensure that they are placed such that they do not threaten each other.

The solution to the 8-queens problem, where 8 queens can be placed without threatening each other, is known to have 92 unique solutions. Therefore, the probability $ P $ that a random placement of 8 queens does not result in any two queens attacking each other is given by the ratio of successful outcomes (92 solutions) to the total possible configurations (which is $ 64^8 $ since each queen can go in one of the 64 squares independently). 

So, we calculate:
$ P = \frac{92}{64^8} $

Calculating $ 64^8 $:
$ 64^8 = (2^6)^8 = 2^{48} = 281474976710656 $

Now we compute the probability:
$ P = \frac{92}{281474976710656} \approx 3.265 \times 10^{-15} $

Thus, the probability that none of the queens attack each other is a very small number.
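
The reasoning above divides by $64^8$, which counts ordered placements that may reuse a square. As a cross-check under a different assumption, namely that the 8 queens occupy 8 distinct squares chosen uniformly at random, the denominator becomes $\binom{64}{8}$. The following sketch only carries out that arithmetic; the count of 92 solutions is taken from the text above.

```python
from math import comb

SOLUTIONS = 92  # non-attacking placements of 8 queens, as stated above

# Number of ways to choose 8 distinct squares out of 64 (order ignored).
total_placements = comb(64, 8)

# Probability that a uniformly random choice of 8 squares is non-attacking.
probability = SOLUTIONS / total_placements

print(total_placements)
print(f"{probability:.3e}")
```

Under this assumption the probability is on the order of $10^{-8}$, several orders of magnitude larger than the $64^8$ figure, which shows how much the answer depends on the counting model chosen.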

## Multi-modal image classification

In [27]:
class DogPictureSignature(dspy.Signature):
    """Output the dog breed of the dog in the image."""
    image_1: dspy.Image = dspy.InputField(desc="An image of a dog")
    answer: str = dspy.OutputField(desc="The dog breed of the dog in the image")

image_url = "https://picsum.photos/id/237/200/300"
classify = dspy.Predict(DogPictureSignature)
classify(image_1=dspy.Image.from_url(image_url))

Prediction(
    answer='Labrador Retriever'
)

## MCP Tools
Refer to `dspy_mcp.py` and `mcp_server.py`.

# DSPy Optimizers

A DSPy optimizer is an algorithm that can tune the parameters of a DSPy program (i.e., the prompts and/or the LM weights) to maximize the metrics you specify, like accuracy.

A typical DSPy optimizer takes three things:

- Your DSPy program.  This may be a single module (e.g., dspy.Predict) or a complex multi-module program.

- Your metric. This is a function that evaluates the output of your program, and assigns it a score (higher is better).

- A few training inputs. The set may be very small (e.g., only 5 or 10 examples) and incomplete (only inputs to your program, without any labels).
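
As an illustration of the second item, a metric can be as small as an exact-match check. The sketch below uses the `(example, prediction, trace=None)` argument order that the metrics later in this notebook follow; `SimpleNamespace` objects stand in for `dspy.Example` and `dspy.Prediction` so the snippet runs without an LM, and the name `exact_match` is made up for illustration.

```python
from types import SimpleNamespace

def exact_match(example, prediction, trace=None) -> bool:
    """Return True when the predicted label equals the gold label."""
    return example.label == prediction.label

# Stand-ins for a gold example and two program outputs.
gold = SimpleNamespace(label="pin_blocked")
good_prediction = SimpleNamespace(label="pin_blocked")
bad_prediction = SimpleNamespace(label="balance")

print(exact_match(gold, good_prediction), exact_match(gold, bad_prediction))
```

A boolean result works as a score because `True` counts as 1 and `False` as 0 when averaged, so higher is still better.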

## What does a DSPy Optimizer tune? How does it tune them?

Different optimizers in DSPy improve your program's quality in different ways:

- Synthesizing good few-shot examples for every module, like dspy.BootstrapRS.

- Proposing and intelligently exploring better natural-language instructions for every prompt, like dspy.MIPROv2 and dspy.GEPA.

- Building datasets for your modules and using them to finetune the LM weights in your system, like dspy.BootstrapFinetune.

## Which optimizer should I use?
Ultimately, finding the ‘right’ optimizer and the best configuration for your task will require experimentation. Success in DSPy is still an iterative process: getting the best performance on your task will require you to explore and iterate.

That being said, here's the general guidance on getting started:

- If you have very few examples (around 10), start with BootstrapFewShot.

- If you have more data (50 examples or more), try BootstrapFewShotWithRandomSearch.

- If you prefer to do instruction optimization only (i.e. you want to keep your prompt 0-shot), use MIPROv2 configured for 0-shot optimization.

- If you’re willing to use more inference calls to perform longer optimization runs (e.g. 40 trials or more), and have enough data (e.g. 200 examples or more to prevent overfitting) then try MIPROv2.

- If you have been able to use one of these with a large LM (e.g., 7B parameters or above) and need a very efficient program, finetune a small LM for your task with BootstrapFinetune.
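
The rules of thumb above can be written down as a small decision helper. This is only a sketch of the guidance, not part of DSPy: `suggest_optimizer` and its parameters are hypothetical, and the choice for dataset sizes between roughly 10 and 50 examples (which the guidance does not cover) is an assumption that falls back to BootstrapFewShot.

```python
def suggest_optimizer(num_examples: int, zero_shot_only: bool = False,
                      long_run_ok: bool = False) -> str:
    """Map the getting-started guidance to an optimizer name."""
    if zero_shot_only:
        # Instruction-only optimization keeps the prompt 0-shot.
        return "MIPROv2 (0-shot configuration)"
    if long_run_ok and num_examples >= 200:
        # Longer runs (e.g. 40+ trials) with enough data to limit overfitting.
        return "MIPROv2"
    if num_examples >= 50:
        return "BootstrapFewShotWithRandomSearch"
    # Very few examples; also the assumed fallback for the 10-50 range.
    return "BootstrapFewShot"

for n in (10, 60, 250):
    print(n, suggest_optimizer(n), suggest_optimizer(n, long_run_ok=True))
```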

## Automatic Finetuning
This optimizer is used to fine-tune the underlying LLM(s).

BootstrapFinetune: Distills a prompt-based DSPy program into weight updates. The output is a DSPy program that has the same steps, but where each step is conducted by a finetuned model instead of a prompted LM. See the classification fine-tuning tutorial for a complete example.

## Optimizers to tune model weights using DSPy

Let's walk through a quick example of fine-tuning the LM weights within a DSPy program. We'll apply it to a simple 77-way classification task.

1. Dataset preparation

    - Loads the 77 intent labels (like "pending_cash_withdrawal", "balance", etc.) from Banking77.

    - Takes the first 2000 training samples.

    - Wraps each into a dspy.Example.

    - Adds two things:

        - label = ground truth intent (supervised signal).

        - hint = same label (only available during training).

    - .with_inputs("text", "hint") marks inputs for the module (so during training DSPy can provide both, but at test time you don’t need the hint).

In [None]:
import random
from typing import Literal
from dspy.datasets import DataLoader
from datasets import load_dataset

# Load the Banking77 dataset.
CLASSES = load_dataset("PolyAI/banking77", split="train", trust_remote_code=True).features['label'].names
kwargs = dict(fields=("text", "label"), input_keys=("text",), split="train", trust_remote_code=True)

# Load the first 2000 examples from the dataset, and assign a hint to each *training* example.
trainset = [
    dspy.Example(x, hint=CLASSES[x.label], label=CLASSES[x.label]).with_inputs("text", "hint")
    for x in DataLoader().from_huggingface(dataset_name="PolyAI/banking77", **kwargs)[:2000]
]
random.Random(0).shuffle(trainset)

In [None]:
rprint(trainset[:5])

2. Configuration

Uses OpenAI’s GPT-4o-mini as the LM backend for this program.

Experimental mode allows you to use features like ChainOfThoughtWithHint.

Signature: mapping from text → label.
Then you overwrite the label field’s type so it must be one of the 77 Banking intents (Literal[...CLASSES...]).

ChainOfThoughtWithHint = variant of CoT that can optionally consume the hint during training.

- At train time: it sees (text, hint) and learns to reason consistently.

- At inference time: it only sees text.
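
The type constraint on `label` relies on a plain Python feature: subscripting `Literal` with a tuple is the same as listing the values one by one. A minimal sketch with three stand-in labels (the real list has 77) shows what the resulting type contains; `SAMPLE_CLASSES`, `SampleLabel`, and `is_valid_label` are names made up for illustration.

```python
from typing import Literal, get_args

# Tiny stand-in for the 77 Banking77 intent names.
SAMPLE_CLASSES = ["balance", "pin_blocked", "pending_cash_withdrawal"]

# Literal[("a", "b")] is equivalent to Literal["a", "b"].
SampleLabel = Literal[tuple(SAMPLE_CLASSES)]

def is_valid_label(value: str) -> bool:
    """Check whether a string is one of the allowed literal values."""
    return value in get_args(SampleLabel)

print(get_args(SampleLabel))
print(is_valid_label("pin_blocked"), is_valid_label("weather"))
```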

In [None]:
dspy.configure(lm=dspy.LM('gpt-4o-mini-2024-07-18'))
dspy.settings.experimental = True
# Define the DSPy module for classification. It will use the hint at training time, if available.
signature = dspy.Signature("text -> label").with_updated_fields('label', type_=Literal[tuple(CLASSES)])
classify = dspy.ChainOfThoughtWithHint(signature)

- BootstrapFinetune = DSPy’s self-distillation / finetuning compiler.

    - It runs the teacher LM (GPT-4o-mini) on your trainset to generate traces (CoT reasoning + labels).

    - Then it fine-tunes a student model (by default, the same LM unless you configure a smaller one) using those traces.

- metric = here just checks accuracy (pred.label == gold.label).

- compile() = runs the whole pipeline:

    - Teacher generates rationales/labels.

    - DSPy fine-tunes the student LM with those.

    - Returns a new program (optimized_classifier) that now uses the finetuned student.
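
The bootstrapping step described above, in which the teacher produces outputs and only those that pass the metric are kept as fine-tuning data, can be sketched in plain Python. This is an illustration of the idea rather than DSPy's implementation: `bootstrap_filter`, `toy_teacher`, and the three examples are hypothetical, and `SimpleNamespace` stands in for DSPy's example and prediction objects.

```python
from types import SimpleNamespace

def bootstrap_filter(examples, teacher, metric):
    """Run the teacher on each example and keep only the traces the metric accepts."""
    kept = []
    for example in examples:
        prediction = teacher(example)
        if metric(example, prediction):
            kept.append((example, prediction))
    return kept

toy_examples = [
    SimpleNamespace(text="I forgot my PIN", label="pin_blocked"),
    SimpleNamespace(text="Where is my card?", label="card_arrival"),
    SimpleNamespace(text="Why was I charged?", label="card_payment_fee_charged"),
]

def toy_teacher(example):
    # Hypothetical teacher that mislabels any text mentioning "charged".
    label = "balance" if "charged" in example.text else example.label
    return SimpleNamespace(reasoning="(toy rationale)", label=label)

def label_matches(example, prediction, trace=None):
    return example.label == prediction.label

finetune_data = bootstrap_filter(toy_examples, toy_teacher, label_matches)
print(len(finetune_data), "of", len(toy_examples), "traces kept")
```

Filtering by the metric means the student is trained only on teacher outputs that were judged correct.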

In [None]:
# Optimize via BootstrapFinetune.
optimizer = dspy.BootstrapFinetune(metric=(lambda x, y, trace=None: x.label == y.label), num_threads=24)
optimized_classifier = optimizer.compile(classify, trainset=trainset)

optimized_classifier(text="What does a pending cash withdrawal mean?")

[BootstrapFinetune] Preparing the student and teacher programs...
[BootstrapFinetune] Bootstrapping data...
Average Metric: 2.00 / 2 (100.0%):   0%|          | 1/2000 [00:00<00:15, 131.13it/s]

Average Metric: 1972.00 / 2000 (98.6%): 100%|██████████| 2000/2000 [00:07<00:00, 263.51it/s] 


2025/02/16 01:41:53 INFO dspy.evaluate.evaluate: Average Metric: 1972 / 2000 (98.6%)


[BootstrapFinetune] Preparing the train data...
[BootstrapFinetune] Collected data for 2000 examples
[BootstrapFinetune] After filtering with the metric, 1972 examples remain
[BootstrapFinetune] Using 1972 data points for fine-tuning the model: gpt-4o-mini-2024-07-18
[BootstrapFinetune] Starting LM fine-tuning...
[BootstrapFinetune] 1 fine-tuning job(s) to start
[BootstrapFinetune] Starting 1 fine-tuning job(s)...
[OpenAI Provider] Validating the data format
[OpenAI Provider] Saving the data to a file
[OpenAI Provider] Data saved to /home/chandar/.dspy_cache/finetune/b4c1b369e7d19080.jsonl
[OpenAI Provider] Uploading the data to the provider
[OpenAI Provider] Starting remote training
[OpenAI Provider] Job started with the OpenAI Job ID ftjob-081hMRhv27ujEQ7a6tIR00jM
[OpenAI Provider] Waiting for training to complete
[OpenAI Provider] 2025-02-16 01:41:57 Validating training file: file-9gPuW5bKmvuy2Ld3kaF7vw
[OpenAI Provider] 2025-02-16 01:44:06 Fine-tuning job started
[OpenAI Provider] 

Prediction(
    reasoning='A pending cash withdrawal refers to a transaction that has been initiated but has not yet been completed or processed. This status indicates that the request to withdraw cash is still being handled by the bank or financial institution, and the funds have not yet been deducted from the account. It is important for users to understand this status as it affects their available balance and can lead to confusion if they are unaware that the transaction is still in progress.',
    label='pending_cash_withdrawal'
)

## Saving and loading the fine-tuned DSPy program

In [None]:
optimized_classifier.save("./dspy_program/program.pkl", save_program=False)


In [None]:
loaded_dspy_program = dspy.ChainOfThoughtWithHint(signature) # load program with same signature
loaded_dspy_program.load("./dspy_program/program.pkl")



## Comparing the fine-tuned program with the original DSPy program

In [None]:
rprint("Before optimization:\n", classify)
rprint("\nAfter optimization:\n", loaded_dspy_program)


In [None]:
rprint(classify(text="What does a pending cash withdrawal mean?"))

In [None]:
dspy.inspect_history()

[2025-03-06T20:12:12.269445]

System message:

Your input fields are:
1. `text` (str)

Your output fields are:
1. `reasoning` (str)
2. `label` (Literal['activate_my_card', 'age_limit', 'apple_pay_or_google_pay', 'atm_support', 'automatic_top_up', 'balance_not_updated_after_bank_transfer', 'balance_not_updated_after_cheque_or_cash_deposit', 'beneficiary_not_allowed', 'cancel_transfer', 'card_about_to_expire', 'card_acceptance', 'card_arrival', 'card_delivery_estimate', 'card_linking', 'card_not_working', 'card_payment_fee_charged', 'card_payment_not_recognised', 'card_payment_wrong_exchange_rate', 'card_swallowed', 'cash_withdrawal_charge', 'cash_withdrawal_not_recognised', 'change_pin', 'compromised_card', 'contactless_not_working', 'country_support', 'declined_card_payment', 'declined_cash_withdrawal', 'declined_transfer', 'direct_debit_payment_not_recognised', 'disposable_card_limits', 'edit_personal_details', 'exchange_charge', 'exchange_rate', 'exchange_via_ap

In [None]:
rprint(loaded_dspy_program(text="What does a pending cash withdrawal mean?"))

In [None]:
dspy.inspect_history()

[2025-03-06T20:12:36.297994]

System message:

Your input fields are:
1. `text` (str)

Your output fields are:
1. `reasoning` (str)
2. `label` (Literal['activate_my_card', 'age_limit', 'apple_pay_or_google_pay', 'atm_support', 'automatic_top_up', 'balance_not_updated_after_bank_transfer', 'balance_not_updated_after_cheque_or_cash_deposit', 'beneficiary_not_allowed', 'cancel_transfer', 'card_about_to_expire', 'card_acceptance', 'card_arrival', 'card_delivery_estimate', 'card_linking', 'card_not_working', 'card_payment_fee_charged', 'card_payment_not_recognised', 'card_payment_wrong_exchange_rate', 'card_swallowed', 'cash_withdrawal_charge', 'cash_withdrawal_not_recognised', 'change_pin', 'compromised_card', 'contactless_not_working', 'country_support', 'declined_card_payment', 'declined_cash_withdrawal', 'declined_transfer', 'direct_debit_payment_not_recognised', 'disposable_card_limits', 'edit_personal_details', 'exchange_charge', 'exchange_rate', 'exchange_via_ap

## DSPy Evaluate class

This class is used to evaluate the performance of a DSPy program. To use it, you provide an evaluation dataset and a metric function.
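
Conceptually, the reported score is the metric averaged over the evaluation set and expressed as a percentage. The sketch below shows only that calculation and assumes nothing about the internals of `dspy.Evaluate`; `average_metric`, `toy_program`, and the four examples are hypothetical, and the toy program takes the whole example object rather than keyword inputs to keep the snippet short.

```python
from types import SimpleNamespace

def average_metric(devset, program, metric) -> float:
    """Score every example with the metric and return the mean as a percentage."""
    scores = [float(metric(example, program(example))) for example in devset]
    return 100.0 * sum(scores) / len(scores)

toy_devset = [
    SimpleNamespace(text="a", label="x"),
    SimpleNamespace(text="b", label="y"),
    SimpleNamespace(text="c", label="x"),
    SimpleNamespace(text="d", label="y"),
]

def toy_program(example):
    # Hypothetical program that always predicts "x" except for text "b".
    return SimpleNamespace(label="y" if example.text == "b" else "x")

score = average_metric(toy_devset, toy_program, lambda ex, pred: ex.label == pred.label)
print(f"{score:.1f}%")
```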

In [None]:
# Define an evaluator that we can re-use.
# Note: devset is the training set here, so these scores are not held-out estimates.
evaluate = dspy.Evaluate(devset=trainset, metric=(lambda x, y, trace=None: x.label == y.label), num_threads=24,
                         display_progress=True, display_table=2)

In [None]:
rprint(evaluate(loaded_dspy_program))

Average Metric: 16.00 / 16 (100.0%):   1%|          | 15/2000 [00:00<00:21, 91.85it/s] 

Average Metric: 1995.00 / 2000 (99.8%): 100%|██████████| 2000/2000 [00:07<00:00, 253.19it/s] 


2025/03/06 20:13:26 INFO dspy.evaluate.evaluate: Average Metric: 1995 / 2000 (99.8%)


| | text | example_label | hint | reasoning | pred_label | `<lambda>` |
|---|---|---|---|---|---|---|
| 0 | What if I type in the wrong PIN too many times? | pin_blocked | pin_blocked | The question is about the consequences of entering the wrong PIN m... | pin_blocked | ✔️ [True] |
| 1 | What if I need to use GBP instead of USD? | exchange_via_app | exchange_via_app | The user is inquiring about using GBP instead of USD, which indica... | exchange_via_app | ✔️ [True] |


In [None]:
rprint(evaluate(classify))

Average Metric: 1972.00 / 2000 (98.6%): 100%|██████████| 2000/2000 [00:07<00:00, 271.99it/s] 


2025/03/06 20:13:45 INFO dspy.evaluate.evaluate: Average Metric: 1972 / 2000 (98.6%)


| | text | example_label | hint | reasoning | pred_label | `<lambda>` |
|---|---|---|---|---|---|---|
| 0 | What if I type in the wrong PIN too many times? | pin_blocked | pin_blocked | The question is about the consequences of entering the wrong PIN m... | pin_blocked | ✔️ [True] |
| 1 | What if I need to use GBP instead of USD? | exchange_via_app | exchange_via_app | The user is inquiring about using GBP instead of USD, which relate... | exchange_via_app | ✔️ [True] |


## Conclusion

DSPy represents a paradigm shift in working with LLMs, moving from imperative programming (writing explicit prompts) to declarative, self-improving programs. By letting DSPy optimize the interactions with the model, we can achieve:

- Higher-quality outputs,

- Less manual tuning,

- Better generalization across different models.