# Haystack Tutorial: Building AI Agents for Insurance

This tutorial introduces **Haystack** by deepset, a framework designed for building production-ready NLP pipelines and AI agents. Haystack is particularly strong for RAG (Retrieval-Augmented Generation) applications.

## What You'll Learn

1. Haystack core concepts: Components and Pipelines
2. Building a Weather Verification Agent with tools
3. Building a Claims Eligibility Agent
4. Creating custom pipeline components
5. Integrating DSPy for prompt optimization
6. Using MLflow for experiment tracking

## Prerequisites

- Python 3.10+
- OpenAI API key
- Basic Python knowledge

## 1. Installation & Setup

In [None]:
# Install Haystack
# !pip install haystack-ai

# Additional dependencies
# !pip install httpx beautifulsoup4 python-dotenv mlflow

In [None]:
import os
from dotenv import load_dotenv

load_dotenv()

api_key = os.getenv("OPENAI_API_KEY")
if not api_key:
    raise ValueError("Please set OPENAI_API_KEY")

api_base = os.getenv("OPENAI_API_BASE", "https://api.openai.com/v1")
model_name = os.getenv("MODEL_NAME", "gpt-4o-mini")

print(f"Model: {model_name}")
print(f"API: {api_base}")
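
Later cells repeat the expression `api_base if api_base != "https://api.openai.com/v1" else None` each time a generator is built. A small helper could centralize that check. This is only a sketch: `DEFAULT_OPENAI_BASE` and `resolve_api_base` are hypothetical names introduced for illustration, not part of Haystack or the OpenAI SDK.

```python
from typing import Optional

# Hypothetical constant and helper, introduced here for illustration only.
DEFAULT_OPENAI_BASE = "https://api.openai.com/v1"

def resolve_api_base(api_base: Optional[str]) -> Optional[str]:
    """Return None for the default OpenAI endpoint, otherwise the custom URL."""
    if not api_base:
        return None
    # Ignore a trailing slash so both spellings of the default are treated alike.
    if api_base.rstrip("/") == DEFAULT_OPENAI_BASE:
        return None
    return api_base

print(resolve_api_base("https://api.openai.com/v1"))
print(resolve_api_base("http://localhost:8000/v1"))
```

Passing `None` lets the client fall back to its own default endpoint, which is what the repeated expression achieves inline.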

## 2. Core Concepts: Components and Pipelines

Haystack uses a **component-based architecture**:

### Components
- Self-contained units of functionality
- Have defined inputs and outputs
- Can be generators, retrievers, converters, etc.

### Pipelines
- Connect components together
- Define data flow between components
- Can be linear, branching, or looping

```python
from haystack import Pipeline
from haystack.components.generators.chat import OpenAIChatGenerator

pipeline = Pipeline()
pipeline.add_component("llm", OpenAIChatGenerator())
```

In [None]:
from haystack import Pipeline
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.utils import Secret

# Create a simple chat generator.
# Haystack 2.x expects api_key as a Secret rather than a plain string.
generator = OpenAIChatGenerator(
    model=model_name,
    api_key=Secret.from_token(api_key),
    api_base_url=api_base if api_base != "https://api.openai.com/v1" else None
)

# Test
messages = [ChatMessage.from_user("Hello! What's 2+2?")]
result = generator.run(messages=messages)
print(result["replies"][0].text)

## 3. Define Tools as Custom Components

In [None]:
import httpx
from bs4 import BeautifulSoup
from typing import Dict, Any
from haystack import component

@component
class GeocodeComponent:
    """Component to geocode locations."""
    
    @component.output_types(result=Dict[str, Any])
    def run(self, location: str) -> Dict[str, Any]:
        try:
            with httpx.Client() as client:
                response = client.get(
                    "https://nominatim.openstreetmap.org/search",
                    params={"q": f"{location}, Australia", "format": "json", "limit": 1},
                    headers={"User-Agent": "InsuranceBot/1.0"},
                    timeout=10.0
                )
                
                # Parse the body once; treat a non-200 status or an empty list as "not found"
                results = response.json() if response.status_code == 200 else []
                if not results:
                    return {"result": {"error": f"Location not found: {location}"}}
                
                data = results[0]
                return {
                    "result": {
                        "location": data["display_name"],
                        "latitude": float(data["lat"]),
                        "longitude": float(data["lon"])
                    }
                }
        except Exception as e:
            return {"result": {"error": str(e)}}

# Test
geocoder = GeocodeComponent()
print(geocoder.run("Brisbane, QLD"))
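
Because `run` mixes the HTTP call with response handling, the parsing logic is hard to test offline. One option is to factor it into a pure function, sketched below. `parse_nominatim_results` is a hypothetical helper; the `display_name`, `lat`, and `lon` keys are the ones the component above already reads, and the sample payload is hand-written rather than real API output.

```python
from typing import Any, Dict, List

def parse_nominatim_results(results: List[Dict[str, Any]], location: str) -> Dict[str, Any]:
    """Convert a decoded search response into the component's result dict."""
    if not results:
        return {"error": f"Location not found: {location}"}
    first = results[0]
    try:
        return {
            "location": first["display_name"],
            "latitude": float(first["lat"]),
            "longitude": float(first["lon"]),
        }
    except (KeyError, TypeError, ValueError) as exc:
        # A missing key or non-numeric coordinate is reported instead of raised.
        return {"error": f"Unexpected geocoder response: {exc!r}"}

# Offline check with an illustrative payload (not captured from the live service).
sample = [{"display_name": "Brisbane, Queensland, Australia", "lat": "-27.4698", "lon": "153.0251"}]
print(parse_nominatim_results(sample, "Brisbane, QLD"))
print(parse_nominatim_results([], "Nowhere"))
```

With this split, `run` would only perform the request and delegate to the helper, so the error paths can be exercised without network access.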

In [None]:
@component
class BOMWeatherComponent:
    """Component to fetch BOM weather data."""
    
    @component.output_types(result=Dict[str, Any])
    def run(self, latitude: float, longitude: float, date: str) -> Dict[str, Any]:
        try:
            year, month, day = date.split("-")
            
            url = "https://reg.bom.gov.au/cgi-bin/climate/storms/get_storms.py"
            params = {
                "begin_day": day, "begin_month": month, "begin_year": year,
                "end_day": day, "end_month": month, "end_year": year,
                "lat": latitude, "lng": longitude,
                "event": "all", "distance_from_point": "50", "states": "all"
            }
            
            with httpx.Client() as client:
                response = client.get(url, params=params, timeout=15.0)
                
                if response.status_code != 200:
                    return {"result": {"error": f"HTTP {response.status_code}"}}
                
                soup = BeautifulSoup(response.text, 'html.parser')
                events = []
                for row in soup.find_all('tr')[1:]:
                    cells = row.find_all('td')
                    if len(cells) >= 2:
                        event = cells[0].get_text(strip=True)
                        if event:
                            events.append(event)
                
                has_thunder = any('thunder' in e.lower() or 'lightning' in e.lower() for e in events)
                has_wind = any('wind' in e.lower() or 'gust' in e.lower() for e in events)
                
                return {
                    "result": {
                        "date": date,
                        "events": events,
                        "has_thunderstorm": has_thunder,
                        "has_strong_wind": has_wind
                    }
                }
        except Exception as e:
            return {"result": {"error": str(e)}}

# Test
weather = BOMWeatherComponent()
print(weather.run(-27.4698, 153.0251, "2025-03-07"))
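
`run` splits the date string on hyphens without checking it, so an input such as `07/03/2025` fails with an unpacking error that is caught and returned as an opaque message. A hedged sketch of validating the date first follows. `split_iso_date` is a hypothetical helper, and the zero-padded output simply mirrors what `date.split("-")` yields for a well-formed `YYYY-MM-DD` string; whether the BOM endpoint requires padding is not something this sketch verifies.

```python
from datetime import datetime
from typing import Tuple

def split_iso_date(date: str) -> Tuple[str, str, str]:
    """Validate a YYYY-MM-DD string and return (year, month, day) as strings."""
    try:
        parsed = datetime.strptime(date, "%Y-%m-%d")
    except ValueError as exc:
        # Re-raise with a message that names the expected format.
        raise ValueError(f"Expected date as YYYY-MM-DD, got {date!r}") from exc
    return parsed.strftime("%Y"), parsed.strftime("%m"), parsed.strftime("%d")

year, month, day = split_iso_date("2025-03-07")
print(year, month, day)
```

Calling this at the top of `run` would also reject impossible dates such as `2025-02-30` before any request is sent.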

## 4. Create Weather Verification Pipeline

In [None]:
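import json
from typing import Optional
from haystack.utils import Secret

@component
class WeatherReportGenerator:
    """Generate weather verification report using LLM."""
    
    def __init__(self, model: str, api_key: str, api_base: Optional[str] = None):
        # OpenAIChatGenerator expects a Secret, so wrap the raw key string
        self.generator = OpenAIChatGenerator(
            model=model,
            api_key=Secret.from_token(api_key),
            api_base_url=api_base
        )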
        
    @component.output_types(report=str)
    def run(self, location_data: Dict, weather_data: Dict) -> Dict[str, str]:
        system_msg = """You are a Weather Verification Agent. Create a structured weather report.
        Include: Location, coordinates, date, events found, thunderstorm status, wind status."""
        
        user_msg = f"""Create weather verification report from:
        
Location Data: {json.dumps(location_data)}
Weather Data: {json.dumps(weather_data)}"""
        
        messages = [
            ChatMessage.from_system(system_msg),
            ChatMessage.from_user(user_msg)
        ]
        
        result = self.generator.run(messages=messages)
        return {"report": result["replies"][0].text}

# Create component
report_gen = WeatherReportGenerator(
    model=model_name,
    api_key=api_key,
    api_base=api_base if api_base != "https://api.openai.com/v1" else None
)

In [None]:
def run_weather_pipeline(location: str, date: str) -> str:
    """Run weather verification pipeline."""
    
    print(f"\n[Weather Pipeline] Starting for {location} on {date}")
    
    # Step 1: Geocode
    geo_result = geocoder.run(location)
    location_data = geo_result["result"]
    print(f"  Geocoded: {location_data}")
    
    if "error" in location_data:
        return f"Error: {location_data['error']}"
    
    # Step 2: Get weather
    weather_result = weather.run(
        latitude=location_data["latitude"],
        longitude=location_data["longitude"],
        date=date
    )
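    weather_data = weather_result["result"]
    print(f"  Weather: {weather_data}")
    
    # Stop early if the weather lookup failed, mirroring the geocoding check above
    if "error" in weather_data:
        return f"Error: {weather_data['error']}"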
    
    # Step 3: Generate report
    report_result = report_gen.run(location_data, weather_data)
    
    print("  Report generated")
    return report_result["report"]

# Test
weather_report = run_weather_pipeline("Brisbane, QLD", "2025-03-07")
print("\n" + "=" * 60)
print(weather_report)

## 5. Create Eligibility Pipeline

In [None]:
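from typing import Optional
from haystack.utils import Secret

@component
class EligibilityDecisionComponent:
    """Determine CAT event eligibility."""
    
    def __init__(self, model: str, api_key: str, api_base: Optional[str] = None):
        # OpenAIChatGenerator expects a Secret, so wrap the raw key string
        self.generator = OpenAIChatGenerator(
            model=model,
            api_key=Secret.from_token(api_key),
            api_base_url=api_base
        )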
        
    @component.output_types(decision=str)
    def run(self, weather_report: str) -> Dict[str, str]:
        system_msg = """You are a Claims Eligibility Agent.
        
Rules:
- APPROVED: Both thunderstorms AND strong winds detected in Australia
- REVIEW: Only one severe weather type detected
- DENIED: No severe weather or outside Australia

Provide: DECISION, REASONING, CONFIDENCE (High/Medium/Low)"""
        
        messages = [
            ChatMessage.from_system(system_msg),
            ChatMessage.from_user(f"Determine eligibility:\n\n{weather_report}")
        ]
        
        result = self.generator.run(messages=messages)
        return {"decision": result["replies"][0].text}

# Create component
eligibility = EligibilityDecisionComponent(
    model=model_name,
    api_key=api_key,
    api_base=api_base if api_base != "https://api.openai.com/v1" else None
)
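
The rules in the system prompt above are simple enough to evaluate deterministically, which gives a baseline for cross-checking the LLM's answer. The sketch below assumes the `has_thunderstorm` and `has_strong_wind` flags produced by `BOMWeatherComponent`; `rule_based_decision` and its `in_australia` parameter are hypothetical and not part of the pipeline in this tutorial.

```python
def rule_based_decision(has_thunderstorm: bool, has_strong_wind: bool,
                        in_australia: bool = True) -> str:
    """Apply the APPROVED / REVIEW / DENIED rules without calling an LLM."""
    if not in_australia:
        return "DENIED"
    # Count how many of the two severe weather types were detected.
    detected = int(has_thunderstorm) + int(has_strong_wind)
    if detected == 2:
        return "APPROVED"
    if detected == 1:
        return "REVIEW"
    return "DENIED"

print(rule_based_decision(True, True))
print(rule_based_decision(True, False))
print(rule_based_decision(False, False))
```

A disagreement between this function and the model's decision is a useful signal that the generated report or the prompt needs a closer look.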

In [None]:
def run_full_pipeline(location: str, date: str) -> Dict[str, str]:
    """Run complete claims pipeline."""
    
    print("=" * 60)
    print(f"Processing claim: {location} on {date}")
    print("=" * 60)
    
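    # Step 1: Weather verification
    weather_report = run_weather_pipeline(location, date)
    
    # run_weather_pipeline returns an "Error: ..." string on failure; don't ask the LLM to judge it
    if weather_report.startswith("Error:"):
        print(f"\n{weather_report}")
        return {"weather_report": weather_report, "decision": "UNKNOWN"}
    
    # Step 2: Eligibility
    print("\n[Eligibility] Determining...")
    elig_result = eligibility.run(weather_report)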
    
    print("\n" + "=" * 60)
    print("FINAL DECISION:")
    print("=" * 60)
    print(elig_result["decision"])
    
    return {
        "weather_report": weather_report,
        "decision": elig_result["decision"]
    }

# Test
result = run_full_pipeline("Brisbane, QLD", "2025-03-07")

---

## 6. DSPy Integration

In [None]:
# !pip install dspy

In [None]:
import dspy

dspy_lm = dspy.LM(model=f"openai/{model_name}", api_key=api_key, api_base=api_base)
dspy.configure(lm=dspy_lm)

class EligibilitySignature(dspy.Signature):
    """Determine CAT event eligibility."""
    weather_report: str = dspy.InputField()
    decision: str = dspy.OutputField(desc="APPROVED, REVIEW, or DENIED")
    reasoning: str = dspy.OutputField()

elig_module = dspy.ChainOfThought(EligibilitySignature)

# Test
test = "Brisbane. -27.47, 153.02. Thunderstorm, Wind Gust. Has Thunderstorm: True. Has Strong Wind: True."
r = elig_module(weather_report=test)
print(f"Decision: {r.decision}")

In [None]:
# Training examples
examples = [
    dspy.Example(weather_report="Brisbane. Thunder, Wind. Has Thunderstorm: True. Has Strong Wind: True.",
                 decision="APPROVED", reasoning="Both conditions met").with_inputs("weather_report"),
    dspy.Example(weather_report="Sydney. Rain. Has Thunderstorm: False. Has Strong Wind: False.",
                 decision="DENIED", reasoning="No severe weather").with_inputs("weather_report"),
    dspy.Example(weather_report="Melbourne. Thunder only. Has Thunderstorm: True. Has Strong Wind: False.",
                 decision="REVIEW", reasoning="One condition met").with_inputs("weather_report"),
]

# Optimize
from dspy.teleprompt import BootstrapFewShot
from dspy.evaluate import Evaluate

def metric(ex, pred, trace=None):
    return ex.decision.upper() == pred.decision.upper()

evaluator = Evaluate(devset=examples, metric=metric, num_threads=1)
baseline = evaluator(elig_module)
# Depending on the DSPy version, Evaluate returns a bare score or a result object with a .score attribute
print(f"Baseline: {getattr(baseline, 'score', baseline)}%")

optimizer = BootstrapFewShot(metric=metric, max_bootstrapped_demos=2)
optimized = optimizer.compile(elig_module, trainset=examples)
opt_score = evaluator(optimized)
print(f"Optimized: {getattr(opt_score, 'score', opt_score)}%")
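
Here the evaluator scores the module on the same three examples the optimizer trains on, so the optimized score says little about generalization. With a larger labelled set, a held-out split per decision class is one way to get a more honest number. The sketch below uses plain dicts in place of `dspy.Example` so it runs without DSPy; `stratified_split` is a hypothetical helper and the generated items are synthetic.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

def stratified_split(items: List[Dict[str, str]], dev_fraction: float = 0.3,
                     seed: int = 0) -> Tuple[List[Dict[str, str]], List[Dict[str, str]]]:
    """Split items into (train, dev), keeping every decision class in train."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for item in items:
        by_label[item["decision"]].append(item)
    train, dev = [], []
    for group in by_label.values():
        rng.shuffle(group)
        # Never move a class's last remaining item into the dev set.
        n_dev = min(int(len(group) * dev_fraction), len(group) - 1)
        dev.extend(group[:n_dev])
        train.extend(group[n_dev:])
    return train, dev

# Synthetic data: ten reports per decision label.
data = [{"weather_report": f"report {i}", "decision": label}
        for label in ("APPROVED", "REVIEW", "DENIED") for i in range(10)]
train, dev = stratified_split(data)
print(len(train), len(dev))
```

The `train` items would then feed the optimizer and the `dev` items the evaluator, after converting each dict back into an example object.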

In [None]:
# Create DSPy-enhanced Haystack component
@component
class DSPyEligibilityComponent:
    """Haystack component using DSPy-optimized module."""
    
    def __init__(self, dspy_module):
        self.module = dspy_module
        
    @component.output_types(decision=str, reasoning=str)
    def run(self, weather_report: str) -> Dict[str, str]:
        result = self.module(weather_report=weather_report)
        return {
            "decision": result.decision,
            "reasoning": result.reasoning
        }

dspy_elig = DSPyEligibilityComponent(optimized)

# Test
r = dspy_elig.run(test)
print(f"DSPy Decision: {r['decision']}")

---

## 7. MLflow Integration

In [None]:
import mlflow
from datetime import datetime

mlflow.set_experiment("haystack-claims")

def run_tracked(location: str, date: str, use_dspy: bool = False):
    run_name = f"haystack_{location.split(',')[0]}_{date}"
    if use_dspy:
        run_name += "_dspy"
    
    with mlflow.start_run(run_name=run_name):
        mlflow.log_params({"framework": "haystack", "model": model_name, "location": location, "date": date, "use_dspy": use_dspy})
        
        start = datetime.now()
        
        weather_report = run_weather_pipeline(location, date)
        
        if use_dspy:
            elig_result = dspy_elig.run(weather_report)
            decision_text = f"Decision: {elig_result['decision']}\nReasoning: {elig_result['reasoning']}"
        else:
            elig_result = eligibility.run(weather_report)
            decision_text = elig_result["decision"]
        
        duration = (datetime.now() - start).total_seconds()
        
        decision = "UNKNOWN"
        for d in ["APPROVED", "DENIED", "REVIEW"]:
            if d in decision_text.upper():
                decision = d
                break
        
        mlflow.log_metrics({"duration": duration, "success": 1})
        mlflow.set_tags({"decision": decision})
        
        print(f"Logged: {decision}, {duration:.2f}s")
        return {"decision": decision, "duration": duration}

# Run
run_tracked("Brisbane, QLD", "2025-03-07", use_dspy=False)
run_tracked("Brisbane, QLD", "2025-03-07", use_dspy=True)
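
The loop in `run_tracked` picks the first label that appears anywhere in the text, checking `APPROVED` before the others, so reasoning such as "cannot be APPROVED" can mask a `DENIED` verdict. One alternative is to look for an explicit decision line first. The sketch below assumes the model follows the `DECISION: ...` layout requested in the system prompt; `extract_decision` is a hypothetical helper.

```python
import re

LABELS = ("APPROVED", "REVIEW", "DENIED")

def extract_decision(text: str) -> str:
    """Prefer an explicit 'Decision: X' line; else accept a lone label; else UNKNOWN."""
    match = re.search(r"DECISION\W+(APPROVED|REVIEW|DENIED)\b", text, flags=re.IGNORECASE)
    if match:
        return match.group(1).upper()
    # Fallback: accept the text only if exactly one label occurs as a whole word.
    upper = text.upper()
    present = [label for label in LABELS if re.search(rf"\b{label}\b", upper)]
    return present[0] if len(present) == 1 else "UNKNOWN"

print(extract_decision("DECISION: DENIED\nREASONING: cannot be APPROVED without wind"))
print(extract_decision("Could be APPROVED or DENIED"))
```

Returning `UNKNOWN` for ambiguous text keeps the logged tag honest instead of guessing between labels.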

## 8. Summary

### Covered
- Haystack components and pipelines
- Custom tool components
- DSPy integration via components
- MLflow tracking

### Strengths
- Great for RAG applications
- Production-ready design
- Modular components

### Challenges
- Learning curve for component system
- More verbose than some frameworks

In [None]:
print("Tutorial complete!")