# DSPy Tutorial: Programmatic LLM Optimization for Insurance

This tutorial walks through **DSPy**, a framework that originated at Stanford for "programming, not prompting" language models. DSPy treats prompts as optimizable programs, so LLM behavior can be improved systematically against a metric rather than by hand-editing strings.

## What You'll Learn

1. DSPy core concepts: Signatures, Modules, and Optimizers
2. Building Weather Verification with ReAct
3. Building Eligibility Determination with ChainOfThought
4. Prompt optimization with BootstrapFewShot and MIPRO, with GEPA as a further option
5. Exporting optimized prompts to other frameworks
6. MLflow integration for experiment tracking

## Prerequisites

- Python 3.10+
- OpenAI API key (or Anthropic)
- Basic understanding of ML concepts (training data, evaluation metrics)

---

## Why DSPy?

| Traditional Prompting | DSPy Approach |
|----------------------|---------------|
| Manual prompt engineering | Automated optimization |
| Trial and error | Systematic metrics |
| Prompts as strings | Prompts as programs |
| Hard to maintain | Modular and composable |

## 1. Installation & Setup

In [None]:
# !pip install dspy httpx beautifulsoup4 python-dotenv mlflow

In [None]:
import os
from dotenv import load_dotenv

load_dotenv()

api_key = os.getenv("OPENAI_API_KEY")
api_base = os.getenv("OPENAI_API_BASE", "https://api.openai.com/v1")
model_name = os.getenv("MODEL_NAME", "gpt-4o-mini")

print(f"Model: {model_name}")

In [None]:
import dspy

# Configure DSPy
lm = dspy.LM(
    model=f"openai/{model_name}",
    api_key=api_key,
    api_base=api_base
)
dspy.configure(lm=lm)

print("DSPy configured")

## 2. Core Concepts

### 2.1 Signatures

A signature declares a module's input and output fields; its docstring becomes the task description:

```python
class MySignature(dspy.Signature):
    """Docstring becomes the task description."""
    input_field: str = dspy.InputField(desc="What this input represents")
    output_field: str = dspy.OutputField(desc="What to generate")
```

### 2.2 Modules

Modules implement reasoning patterns:

- `dspy.Predict`: Direct generation
- `dspy.ChainOfThought`: Step-by-step reasoning
- `dspy.ReAct`: Reasoning + Actions (tool use)
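
The ReAct pattern alternates between a model-chosen action and the observation a tool returns. The control loop can be sketched without any LLM at all. In the sketch below, `scripted_policy` is a hypothetical stand-in for the language model and both toy tools are invented for illustration; none of these names come from DSPy.

```python
from typing import Callable

# Toy tools standing in for real API calls (hypothetical).
def lookup_coordinates(place: str) -> str:
    return "-27.47,153.03" if place == "Brisbane" else "unknown"

def lookup_weather(coords: str) -> str:
    return "Thunderstorm" if coords == "-27.47,153.03" else "No data"

TOOLS: dict[str, Callable[[str], str]] = {
    "lookup_coordinates": lookup_coordinates,
    "lookup_weather": lookup_weather,
}

def scripted_policy(task: str, history: list[tuple[str, str, str]]) -> tuple[str, str]:
    """Hypothetical stand-in for the LLM: choose the next (action, argument)."""
    if not history:
        return "lookup_coordinates", task
    if len(history) == 1:
        return "lookup_weather", history[0][2]  # feed the previous observation forward
    return "finish", history[-1][2]

def react_loop(task: str, max_iters: int = 5) -> str:
    history: list[tuple[str, str, str]] = []  # (action, argument, observation)
    for _ in range(max_iters):
        action, argument = scripted_policy(task, history)
        if action == "finish":
            return argument
        observation = TOOLS[action](argument)
        history.append((action, argument, observation))
    return "max iterations reached"

print(react_loop("Brisbane"))
```

In a real agent the policy is an LLM call that sees the task plus the accumulated history, and `max_iters` bounds how many tool calls it may make.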

### 2.3 Optimizers (Teleprompters)

Optimizers improve prompts automatically:

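- `BootstrapFewShot`: Builds few-shot demonstrations by running the program on training data and keeping outputs that pass the metric
- `MIPRO` (`MIPROv2` in recent releases): Optimizes instructions and few-shot demonstrations for multi-stage programs
- `GEPA`: Reflective prompt evolution (the name stands for Genetic-Pareto)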
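
Conceptually, a bootstrapping optimizer runs the current program on training inputs and keeps only the cases that pass the metric, then reuses them as few-shot demonstrations. The sketch below is a deliberately simplified illustration of that idea, not DSPy's implementation; `toy_program`, `exact_match`, and `bootstrap_demos` are hypothetical names.

```python
def toy_program(report: str) -> str:
    """Stand-in for an LLM call: a crude keyword rule."""
    return "APPROVED" if "Thunderstorm" in report else "DENIED"

def exact_match(expected: str, predicted: str) -> bool:
    return expected == predicted

def bootstrap_demos(trainset, program, metric, max_demos=2):
    """Keep up to max_demos examples whose prediction passes the metric."""
    demos = []
    for report, expected in trainset:
        predicted = program(report)
        if metric(expected, predicted):
            demos.append({"weather_report": report, "decision": predicted})
        if len(demos) >= max_demos:
            break
    return demos

trainset = [
    ("Events: Thunderstorm, Wind Gust", "APPROVED"),
    ("Events: Thunderstorm", "REVIEW"),  # the toy program gets this one wrong
    ("Events: Light Rain", "DENIED"),
]
demos = bootstrap_demos(trainset, toy_program, exact_match)
print(len(demos))
```

Only examples the program already handles correctly become demonstrations, which is why the quality of the metric and of the training data matters so much.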

## 3. Define Signatures

In [None]:
# Weather Verification Signature
class WeatherVerificationSignature(dspy.Signature):
    """Verify weather conditions for an insurance claim location."""
    
    task: str = dspy.InputField(
        desc="Task describing location and date to verify"
    )
    
    weather_report: str = dspy.OutputField(
        desc="Structured weather report with location, coordinates, events, thunderstorm status, wind status"
    )

# Eligibility Signature
class EligibilitySignature(dspy.Signature):
    """Determine CAT event eligibility based on weather verification report."""
    
    weather_report: str = dspy.InputField(
        desc="Weather verification report with events and conditions"
    )
    
    decision: str = dspy.OutputField(
        desc="One of: APPROVED, REVIEW, or DENIED"
    )
    
    reasoning: str = dspy.OutputField(
        desc="Brief explanation for the decision"
    )
    
    confidence: str = dspy.OutputField(
        desc="Confidence level: High, Medium, or Low"
    )

print("Signatures defined")
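
Because `decision` is declared as a plain `str`, the model may return variants such as `approved.` or `Decision: REVIEW`. One possible post-processing step is sketched below. `normalize_decision` is a hypothetical helper, not part of DSPy, and its fallback to `REVIEW` is an assumption chosen so that ambiguous outputs go to a human.

```python
import re

VALID_DECISIONS = ("APPROVED", "REVIEW", "DENIED")

def normalize_decision(raw: str, fallback: str = "REVIEW") -> str:
    """Map free-text model output onto one of the three allowed labels."""
    upper = raw.upper()
    found = [d for d in VALID_DECISIONS if re.search(rf"\b{d}\b", upper)]
    # Exactly one label must be present; otherwise defer to the fallback.
    return found[0] if len(found) == 1 else fallback

print(normalize_decision("Decision: approved."))
print(normalize_decision("Either APPROVED or DENIED"))
```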

## 4. Define Tools for ReAct Agent

In [None]:
import httpx
from bs4 import BeautifulSoup

def geocode_location(location: str) -> str:
    """
    Convert a location name to latitude/longitude coordinates.
    Use this to get coordinates before fetching weather data.
    
    Args:
        location: Address or place name (e.g., 'Brisbane, QLD')
    
    Returns:
        String with location details and coordinates
    """
    try:
        with httpx.Client() as client:
            r = client.get(
                "https://nominatim.openstreetmap.org/search",
                params={"q": f"{location}, Australia", "format": "json", "limit": 1},
                headers={"User-Agent": "InsuranceBot/1.0"},
                timeout=10.0
            )
            if r.status_code == 200 and r.json():
                d = r.json()[0]
                return f"Location: {d['display_name']}\nLatitude: {d['lat']}\nLongitude: {d['lon']}"
            return "Error: Location not found"
    except Exception as e:
        return f"Error: {e}"

def get_bom_weather(latitude: str, longitude: str, date: str) -> str:
    """
    Fetch weather observations from Australian Bureau of Meteorology.
    
    Args:
        latitude: Latitude coordinate as string
        longitude: Longitude coordinate as string
        date: Date in YYYY-MM-DD format
    
    Returns:
        Weather report with events found
    """
    try:
        year, month, day = date.split("-")
        url = "https://reg.bom.gov.au/cgi-bin/climate/storms/get_storms.py"
        params = {
            "begin_day": day, "begin_month": month, "begin_year": year,
            "end_day": day, "end_month": month, "end_year": year,
            "lat": float(latitude), "lng": float(longitude),
            "event": "all", "distance_from_point": "50", "states": "all"
        }
        
        with httpx.Client() as client:
            r = client.get(url, params=params, timeout=15.0)
            if r.status_code != 200:
                return f"Error: HTTP {r.status_code}"
            
            soup = BeautifulSoup(r.text, 'html.parser')
            events = []
            for row in soup.find_all('tr')[1:]:
                cells = row.find_all('td')
                if len(cells) >= 2:
                    event = cells[0].get_text(strip=True)
                    if event:
                        events.append(event)
            
            has_thunder = any('thunder' in e.lower() or 'lightning' in e.lower() for e in events)
            has_wind = any('wind' in e.lower() or 'gust' in e.lower() for e in events)
            
            return f"""Date: {date}
Events: {', '.join(events) if events else 'None'}
Has Thunderstorm: {has_thunder}
Has Strong Wind: {has_wind}"""
    except Exception as e:
        return f"Error: {e}"

# Test
print(geocode_location("Brisbane, QLD"))
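
The keyword matching inside `get_bom_weather` can be factored into a pure function so it can be checked without network access. A sketch, where `classify_events` is a hypothetical helper that mirrors the same substring rules:

```python
def classify_events(events: list[str]) -> dict[str, bool]:
    """Apply the same keyword rules as get_bom_weather to a list of event names."""
    lowered = [e.lower() for e in events]
    return {
        "has_thunderstorm": any("thunder" in e or "lightning" in e for e in lowered),
        "has_strong_wind": any("wind" in e or "gust" in e for e in lowered),
    }

print(classify_events(["Thunderstorm", "Wind Gust 85km/h"]))
print(classify_events(["Light Rain"]))
```

Note that substring matching flags any event whose name contains "wind", regardless of speed, so a wind-speed threshold would need separate handling.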

## 5. Create DSPy Modules

In [None]:
# Weather Agent using ReAct
weather_agent = dspy.ReAct(
    signature=WeatherVerificationSignature,
    tools=[geocode_location, get_bom_weather],
    max_iters=5
)

# Test
print("Testing Weather Agent...")
weather_result = weather_agent(task="Verify weather for Brisbane, QLD on 2025-03-07")
print(f"\nWeather Report:\n{weather_result.weather_report}")

In [None]:
# Eligibility Agent using ChainOfThought
eligibility_agent = dspy.ChainOfThought(EligibilitySignature)

# Test
print("Testing Eligibility Agent...")
elig_result = eligibility_agent(weather_report=weather_result.weather_report)
print(f"\nDecision: {elig_result.decision}")
print(f"Reasoning: {elig_result.reasoning}")
print(f"Confidence: {elig_result.confidence}")

## 6. Create Training Examples

In [None]:
# Training examples for eligibility determination
training_examples = [
    dspy.Example(
        weather_report="""Location: Brisbane, Queensland, Australia
Coordinates: -27.4698, 153.0251
Date: 2025-03-07
Events: Thunderstorm, Wind Gust 85km/h, Heavy Rain
Has Thunderstorm: True
Has Strong Wind: True""",
        decision="APPROVED",
        reasoning="Both thunderstorm and strong wind confirmed at valid Australian location within last 90 days",
        confidence="High"
    ).with_inputs("weather_report"),
    
    dspy.Example(
        weather_report="""Location: Sydney, New South Wales, Australia
Coordinates: -33.8688, 151.2093
Date: 2025-03-07
Events: Light Rain, Overcast
Has Thunderstorm: False
Has Strong Wind: False""",
        decision="DENIED",
        reasoning="No severe weather events detected - only light rain and overcast conditions",
        confidence="High"
    ).with_inputs("weather_report"),
    
    dspy.Example(
        weather_report="""Location: Melbourne, Victoria, Australia
Coordinates: -37.8136, 144.9631
Date: 2025-03-07
Events: Thunderstorm
Has Thunderstorm: True
Has Strong Wind: False""",
        decision="REVIEW",
        reasoning="Only thunderstorm detected, strong wind not confirmed - needs manual review",
        confidence="Medium"
    ).with_inputs("weather_report"),
    
    dspy.Example(
        weather_report="""Location: Perth, Western Australia, Australia
Coordinates: -31.9505, 115.8605
Date: 2025-01-15
Events: Wind Gust 70km/h
Has Thunderstorm: False
Has Strong Wind: True""",
        decision="REVIEW",
        reasoning="Only strong wind detected without thunderstorm - partial conditions met",
        confidence="Medium"
    ).with_inputs("weather_report"),
    
    dspy.Example(
        weather_report="""Location: Darwin, Northern Territory, Australia
Coordinates: -12.4634, 130.8456
Date: 2025-02-20
Events: Thunderstorm, Wind Gust 95km/h, Hail
Has Thunderstorm: True
Has Strong Wind: True""",
        decision="APPROVED",
        reasoning="Severe weather confirmed with both thunderstorm and strong winds plus hail",
        confidence="High"
    ).with_inputs("weather_report"),
]

print(f"Created {len(training_examples)} training examples")
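
The evaluation later in this tutorial reuses these same examples as its `devset`, which tends to overstate gains. With more data, a held-out split is the usual remedy. A minimal sketch follows; `split_examples` is a hypothetical helper and the string labels stand in for `dspy.Example` objects.

```python
import random

def split_examples(examples: list, dev_fraction: float = 0.4, seed: int = 0):
    """Shuffle a copy of the examples and split it into (train, dev)."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_dev = max(1, round(len(shuffled) * dev_fraction))
    return shuffled[n_dev:], shuffled[:n_dev]

# Placeholder labels standing in for the five training examples.
train, dev = split_examples(["ex1", "ex2", "ex3", "ex4", "ex5"])
print(len(train), len(dev))
```

The optimizer would then receive `train` as its `trainset`, and the evaluator would receive `dev`.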

## 7. Define Evaluation Metric

In [None]:
def eligibility_metric(example, prediction, trace=None):
    """
    Evaluate eligibility prediction.
    
    Returns score based on:
    - Decision match (primary)
    - Confidence appropriateness (secondary)
    """
    # Check decision match
    expected_decision = example.decision.upper().strip()
    predicted_decision = prediction.decision.upper().strip()
    
    decision_match = expected_decision == predicted_decision
    
    if not decision_match:
        return 0.0
    
    # Bonus for appropriate confidence
    expected_conf = example.confidence.lower().strip()
    predicted_conf = prediction.confidence.lower().strip()
    
    conf_match = expected_conf == predicted_conf
    
    return 1.0 if conf_match else 0.8

# Test metric
from dspy.evaluate import Evaluate

evaluator = Evaluate(
    devset=training_examples,
    metric=eligibility_metric,
    num_threads=1,
    display_progress=True
)

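# Evaluate baseline (scored on the training examples, so treat the number as optimistic)
baseline_result = evaluator(eligibility_agent)
# Recent DSPy releases return a result object with a .score attribute; older ones return a float
baseline_score = float(getattr(baseline_result, "score", baseline_result))
print(f"\nBaseline Score: {baseline_score}%")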
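
To see how the metric scores individual predictions, it can be exercised with plain stand-in objects. The sketch below restates the same scoring rule in compact form and uses `SimpleNamespace` in place of DSPy examples and predictions, which is an assumption made only to keep it self-contained.

```python
from types import SimpleNamespace

def eligibility_metric(example, prediction, trace=None):
    """Same scoring rule as above: 0.0 on a wrong decision, 0.8 or 1.0 otherwise."""
    if example.decision.upper().strip() != prediction.decision.upper().strip():
        return 0.0
    same_conf = example.confidence.lower().strip() == prediction.confidence.lower().strip()
    return 1.0 if same_conf else 0.8

gold = SimpleNamespace(decision="APPROVED", confidence="High")
cases = [
    SimpleNamespace(decision="approved ", confidence="high"),   # full match
    SimpleNamespace(decision="APPROVED", confidence="Medium"),  # decision only
    SimpleNamespace(decision="REVIEW", confidence="High"),      # wrong decision
]
scores = [eligibility_metric(gold, p) for p in cases]
print(scores)  # [1.0, 0.8, 0.0]
print(f"{100 * sum(scores) / len(scores):.1f}%")  # 60.0%
```

The evaluator reports the mean of these per-example scores as a percentage.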

## 8. Optimize with BootstrapFewShot

In [None]:
from dspy.teleprompt import BootstrapFewShot

# BootstrapFewShot: Generate examples from the model itself
bootstrap_optimizer = BootstrapFewShot(
    metric=eligibility_metric,
    max_bootstrapped_demos=3,
    max_labeled_demos=3
)

# Compile
print("Optimizing with BootstrapFewShot...")
optimized_bootstrap = bootstrap_optimizer.compile(
    eligibility_agent,
    trainset=training_examples
)

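# Evaluate
bootstrap_result = evaluator(optimized_bootstrap)
# Recent DSPy releases return a result object with a .score attribute; older ones return a float
bootstrap_score = float(getattr(bootstrap_result, "score", bootstrap_result))
improvement = bootstrap_score - float(getattr(baseline_score, "score", baseline_score))
print(f"\nBootstrap Score: {bootstrap_score}%")
print(f"Improvement: {improvement:.1f} percentage points")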

## 9. Optimize with MIPRO (Advanced)

In [None]:
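try:
    # Recent DSPy releases expose this optimizer as MIPROv2; the older MIPRO class
    # and its num_candidates/num_batches arguments are not available there.
    from dspy.teleprompt import MIPROv2
    
    # MIPRO: instruction and demonstration optimization
    mipro_optimizer = MIPROv2(
        metric=eligibility_metric,
        auto="light"  # smallest preset search budget
    )
    
    print("Optimizing with MIPROv2 (this may take a while)...")
    optimized_mipro = mipro_optimizer.compile(
        eligibility_agent,
        trainset=training_examples
    )
    
    mipro_result = evaluator(optimized_mipro)
    mipro_score = float(getattr(mipro_result, "score", mipro_result))
    print(f"\nMIPRO Score: {mipro_score}%")
except ImportError:
    print("MIPROv2 not available in this DSPy version")
    optimized_mipro = optimized_bootstrap
except Exception as e:
    # Arguments differ between releases, and five examples may be too few
    # for the optimizer's internal train/validation split.
    print(f"MIPROv2 optimization failed ({e}); using the BootstrapFewShot result")
    optimized_mipro = optimized_bootstrap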

## 10. Extract and Export Optimized Prompts

In [None]:
def extract_optimized_prompt(module, name="Eligibility") -> str:
    """
    Extract optimized prompt from DSPy module for use in other frameworks.
    """
    prompt_parts = [f"# {name} Agent (DSPy Optimized)\n"]
    
    # Get signature
    sig = getattr(module, 'signature', None)
    if sig is None and hasattr(module, 'predict'):
        sig = module.predict.signature
    
    if sig and sig.__doc__:
        prompt_parts.append(f"## Task\n{sig.__doc__}\n")
    
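    # Get demos (few-shot examples); ChainOfThought keeps them on its inner Predict module
    demos = getattr(module, 'demos', None)
    if not demos and hasattr(module, 'predict'):
        demos = getattr(module.predict, 'demos', [])
    if demos: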
        prompt_parts.append("## Examples\n")
        for i, demo in enumerate(demos, 1):
            prompt_parts.append(f"### Example {i}")
            for key, value in demo.items() if hasattr(demo, 'items') else vars(demo).items():
                if not key.startswith('_'):
                    val_str = str(value)[:200] + "..." if len(str(value)) > 200 else str(value)
                    prompt_parts.append(f"**{key}**: {val_str}")
            prompt_parts.append("")
    
    # Get field descriptions
    if sig:
        prompt_parts.append("## Input Fields")
        # field_name avoids shadowing the function's `name` argument
        for field_name, field in sig.input_fields.items():
            desc = ""
            if hasattr(field, 'json_schema_extra') and field.json_schema_extra:
                desc = field.json_schema_extra.get('desc', '')
            prompt_parts.append(f"- **{field_name}**: {desc}")
        
        prompt_parts.append("\n## Output Fields")
        for field_name, field in sig.output_fields.items():
            desc = ""
            if hasattr(field, 'json_schema_extra') and field.json_schema_extra:
                desc = field.json_schema_extra.get('desc', '')
            prompt_parts.append(f"- **{field_name}**: {desc}")
    
    return "\n".join(prompt_parts)

# Extract
optimized_prompt = extract_optimized_prompt(optimized_bootstrap)
print("Extracted Optimized Prompt:")
print("=" * 60)
print(optimized_prompt)

In [None]:
# Format optimized prompt for use in other frameworks
def save_for_framework(prompt: str, framework: str):
    """
    Format the optimized prompt as source text for a specific framework.
    Returns the generated text; writing it to a file is left to the caller.
    """
    
    if framework == "pydantic_ai":
        return f'''"""DSPy-optimized system prompt for Pydantic AI."""

SYSTEM_PROMPT = """
{prompt}

## Business Rules
- APPROVED: Both thunderstorms AND strong winds in valid Australian location
- REVIEW: Only one severe weather type detected
- DENIED: No severe weather or outside Australia
"""
'''
    
    elif framework == "langchain":
        # LangChain reads {name} as a template variable, so double any literal braces
        escaped = prompt.replace("{", "{{").replace("}", "}}")
        return f'''from langchain_core.prompts import ChatPromptTemplate

# DSPy-optimized prompt template
ELIGIBILITY_PROMPT = ChatPromptTemplate.from_messages([
    ("system", """{escaped}"""),
    ("human", "Determine eligibility for:\\n\\n{{weather_report}}")
])
'''
    
    return prompt

# Generate for different frameworks
pydantic_prompt = save_for_framework(optimized_prompt, "pydantic_ai")
print("Pydantic AI format:")
print(pydantic_prompt[:500] + "...")
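
`save_for_framework` returns the generated source as a string. Persisting it is a separate step, sketched below with the standard library. `write_prompt_module` and the file name are hypothetical, and a placeholder string is used instead of real optimizer output.

```python
from pathlib import Path
import tempfile

def write_prompt_module(source: str, directory: Path, filename: str) -> Path:
    """Write generated source text to directory/filename and return the path."""
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / filename
    path.write_text(source, encoding="utf-8")
    return path

# Demo in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    out = write_prompt_module('SYSTEM_PROMPT = "..."\n', Path(tmp) / "prompts", "eligibility_prompt.py")
    print(out.name, out.read_text(encoding="utf-8").strip())
```

In the notebook, the first argument would be the string returned by `save_for_framework(optimized_prompt, "pydantic_ai")`.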

## 11. Complete Claims Pipeline

In [None]:
def process_claim_dspy(location: str, date: str, use_optimized: bool = True):
    """
    Process insurance claim using DSPy agents.
    """
    
    print("=" * 60)
    print(f"Processing: {location} on {date}")
    print(f"Using {'optimized' if use_optimized else 'baseline'} model")
    print("=" * 60)
    
    # Step 1: Weather verification with ReAct
    print("\n[Weather Agent] Verifying...")
    weather_result = weather_agent(task=f"Verify weather for {location} on {date}")
    print(f"Report: {weather_result.weather_report[:200]}...")
    
    # Step 2: Eligibility with optimized module
    print("\n[Eligibility Agent] Determining...")
    agent = optimized_bootstrap if use_optimized else eligibility_agent
    elig_result = agent(weather_report=weather_result.weather_report)
    
    print("\n" + "=" * 60)
    print("DECISION:")
    print("=" * 60)
    print(f"Decision: {elig_result.decision}")
    print(f"Reasoning: {elig_result.reasoning}")
    print(f"Confidence: {elig_result.confidence}")
    
    return {
        "weather_report": weather_result.weather_report,
        "decision": elig_result.decision,
        "reasoning": elig_result.reasoning,
        "confidence": elig_result.confidence
    }

# Test
result = process_claim_dspy("Brisbane, QLD", "2025-03-07", use_optimized=True)
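
Since the business rules are simple, a deterministic check can flag cases where the LLM's decision disagrees with them. The sketch below assumes the report contains `Has Thunderstorm:` and `Has Strong Wind:` lines in the format used by the training examples, which a free-text ReAct output does not guarantee. It also ignores the location rule, and `rule_based_decision` is a hypothetical helper.

```python
def rule_based_decision(weather_report: str) -> str:
    """Derive the expected decision from the 'Has ...' flags in a report."""
    flags = {}
    for line in weather_report.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().startswith("Has "):
            flags[key.strip()] = value.strip().lower() == "true"
    # Count how many of the two severe-weather conditions are met.
    hits = sum(flags.get(k, False) for k in ("Has Thunderstorm", "Has Strong Wind"))
    return {2: "APPROVED", 1: "REVIEW", 0: "DENIED"}[hits]

report = """Date: 2025-03-07
Events: Thunderstorm
Has Thunderstorm: True
Has Strong Wind: False"""
print(rule_based_decision(report))  # REVIEW
```

A mismatch between this value and the model's decision is a reasonable trigger for manual review.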

---

## 12. MLflow Integration

In [None]:
import mlflow
from datetime import datetime

mlflow.set_experiment("dspy-claims-optimization")

def run_tracked_claim(location: str, date: str, use_optimized: bool = True):
    """Run claim with MLFlow tracking."""
    
    run_name = f"dspy_{location.split(',')[0]}_{'opt' if use_optimized else 'base'}"
    
    with mlflow.start_run(run_name=run_name):
        mlflow.log_params({
            "framework": "dspy",
            "model": model_name,
            "location": location,
            "date": date,
            "use_optimized": use_optimized,
            "optimizer": "BootstrapFewShot" if use_optimized else "None"
        })
        
        start = datetime.now()
        result = process_claim_dspy(location, date, use_optimized)
        duration = (datetime.now() - start).total_seconds()
        
        mlflow.log_metrics({"duration_seconds": duration})
        mlflow.log_text(result["weather_report"], "weather_report.txt")
        mlflow.set_tags({
            "decision": result["decision"],
            "confidence": result["confidence"]
        })
        
        print(f"\nLogged: {result['decision']}, {duration:.2f}s")
        return result

# Run tracked
run_tracked_claim("Brisbane, QLD", "2025-03-07", use_optimized=True)
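
`datetime.now()` follows the system clock, which can be adjusted while a run is in progress. For duration metrics, `time.perf_counter()` is a monotonic alternative. A small sketch of a reusable timer follows; the `timed` context manager is a hypothetical helper, and the summation is a placeholder for the claim-processing call.

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(results: dict, key: str = "duration_seconds"):
    """Store the elapsed time of the with-block in results[key]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Recorded even if the block raises, so failed runs are still timed.
        results[key] = time.perf_counter() - start

metrics: dict = {}
with timed(metrics):
    sum(range(100_000))  # placeholder for process_claim_dspy(...)
print(sorted(metrics))
```

The resulting dictionary has the shape `mlflow.log_metrics` expects.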

In [None]:
# Log optimization results
with mlflow.start_run(run_name="optimization_comparison"):
    mlflow.log_params({
        "optimizer": "BootstrapFewShot",
        "training_examples": len(training_examples),
        "max_bootstrapped_demos": 3
    })
    
    mlflow.log_metrics({
        "baseline_score": baseline_score,
        "optimized_score": bootstrap_score,
        "improvement": bootstrap_score - baseline_score
    })
    
    mlflow.log_text(optimized_prompt, "optimized_prompt.md")
    
    print("Optimization comparison logged")

In [None]:
# Compare baseline vs optimized
print("Comparing baseline vs optimized...\n")

test_cases = [
    ("Brisbane, QLD", "2025-03-07"),
    ("Sydney, NSW", "2025-03-07"),
]

for loc, dt in test_cases:
    print(f"\n{'='*60}")
    print(f"Testing: {loc}")
    
    # Baseline
    base = run_tracked_claim(loc, dt, use_optimized=False)
    
    # Optimized  
    opt = run_tracked_claim(loc, dt, use_optimized=True)
    
    print(f"\nBaseline: {base['decision']}")
    print(f"Optimized: {opt['decision']}")

In [None]:
# View experiment
exp = mlflow.get_experiment_by_name("dspy-claims-optimization")
runs = mlflow.search_runs(experiment_ids=[exp.experiment_id])

print("\nExperiment Runs:")
cols = ['run_id', 'params.location', 'params.use_optimized', 'metrics.duration_seconds', 'tags.decision']
cols = [c for c in cols if c in runs.columns]
print(runs[cols].to_string())

## 13. Summary & Key Takeaways

### What We Covered

1. **DSPy Signatures**: Declarative input/output specifications
2. **DSPy Modules**: ChainOfThought, ReAct for different reasoning patterns
3. **Tools**: Integrating Python functions with ReAct agents
4. **Optimization**: BootstrapFewShot for automatic prompt improvement
5. **Export**: Extracting prompts for use in other frameworks
6. **MLflow**: Tracking experiments and comparing results

### DSPy Strengths

- **Systematic optimization** - far less manual prompt tweaking
- **Composable modules** - build complex pipelines from simple parts
- **Portable prompts** - export optimized prompts to any framework
- **Metrics-driven** - optimize based on actual performance

### DSPy Challenges

- Learning curve (different paradigm)
- Less control over exact prompt text
- Optimization requires good training data

### For Insurance Teams

- **Use DSPy when**: You need to optimize prompts systematically
- **Combine with**: Other frameworks for deployment (Pydantic AI for production)
- **Best practice**: Optimize with DSPy, export to your production framework

### Next Steps

1. Add more training examples
2. Try different optimizers (MIPRO, GEPA)
3. Export optimized prompts to Pydantic AI
4. Set up automated optimization pipelines

In [None]:
print("Tutorial complete!")