# CrewAI Tutorial: Building AI Agent Crews for Insurance

This tutorial introduces **CrewAI**, a framework designed around the metaphor of AI "crews" - teams of agents with defined roles working together on tasks.

## What You'll Learn

1. CrewAI's core concepts: Agents, Tasks, Tools, and Crews
2. Building a Weather Verification Agent with tools
3. Building a Claims Eligibility Agent for business logic
4. Orchestrating agent crews with task dependencies
5. Integrating DSPy for prompt optimization
6. Using MLflow for experiment tracking

## Prerequisites

- Python 3.10+
- OpenAI API key (or compatible API)
- Basic Python knowledge

---

## CrewAI Philosophy

CrewAI takes a **role-based approach** to agents:

| Concept | Description |
|---------|-------------|
| **Agent** | An autonomous unit with a role, goal, and backstory |
| **Task** | A specific piece of work assigned to an agent |
| **Tool** | A capability (function) that agents can use |
| **Crew** | A team of agents that collaborate on tasks |

The framework is intentionally **verbose** - you define rich persona descriptions for each agent. This can improve outputs but increases token usage.
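
To get a feel for that cost, here is a rough sketch that estimates how many prompt tokens a backstory adds. It assumes the backstory is sent with every LLM call the agent makes, and it uses the common rule of thumb of about 4 characters per token for English text. `estimate_tokens` and `backstory_overhead` are hypothetical helpers, not part of CrewAI; a real tokenizer is needed for accurate numbers.

```python
import math

CHARS_PER_TOKEN = 4  # assumption: rough average for English text


def estimate_tokens(text: str) -> int:
    """Approximate token count from character length (heuristic only)."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def backstory_overhead(backstory: str, llm_calls: int) -> int:
    """Approximate tokens spent re-sending the backstory across LLM calls."""
    return estimate_tokens(backstory) * llm_calls


sample_backstory = "You are an experienced meteorologist who verifies weather claims."
print(estimate_tokens(sample_backstory))
print(backstory_overhead(sample_backstory, llm_calls=5))
```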

## 1. Installation & Setup

In [None]:
# Install CrewAI
# !pip install crewai crewai-tools

# Additional dependencies
# !pip install httpx beautifulsoup4 python-dotenv

In [None]:
import os
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

# CrewAI uses OPENAI_API_KEY by default
api_key = os.getenv("OPENAI_API_KEY")
if not api_key:
    raise ValueError("Please set OPENAI_API_KEY environment variable")

# For alternative APIs (e.g., z.ai), set these:
api_base = os.getenv("OPENAI_API_BASE", "https://api.openai.com/v1")
model_name = os.getenv("MODEL_NAME", "gpt-4o-mini")

# Set environment variables for CrewAI
os.environ["OPENAI_API_BASE"] = api_base
os.environ["OPENAI_MODEL_NAME"] = model_name

print(f"Using model: {model_name}")
print(f"API base: {api_base}")

## 2. Core Concepts: The Crew Metaphor

CrewAI uses a team/crew metaphor that maps well to business processes:

### 2.1 Agents = Team Members

Each agent has:
- **Role**: Their job title/function
- **Goal**: What they're trying to achieve
- **Backstory**: Context that shapes their behavior

```python
from crewai import Agent

agent = Agent(
    role="Senior Weather Analyst",
    goal="Accurately verify weather conditions for insurance claims",
    backstory="You are an experienced meteorologist who specializes in...",
    tools=[my_tool],
    verbose=True
)
```

### 2.2 Tasks = Work Items

Tasks define specific work:

```python
from crewai import Task

task = Task(
    description="Verify weather for {location} on {date}",
    expected_output="Structured weather report",
    agent=weather_agent
)
```
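
The `{location}` and `{date}` placeholders in the description are template variables. In CrewAI they are normally filled in when the crew starts, via `crew.kickoff(inputs={...})`. The sketch below only mimics that substitution with plain `str.format` so it runs without CrewAI; `fill_template` is a hypothetical helper and not how the library implements it.

```python
def fill_template(template: str, inputs: dict[str, str]) -> str:
    """Substitute {name} placeholders with the matching values from inputs."""
    return template.format(**inputs)


description = "Verify weather for {location} on {date}"
inputs = {"location": "Brisbane, QLD", "date": "2025-03-07"}

print(fill_template(description, inputs))
# Verify weather for Brisbane, QLD on 2025-03-07
```

Later in this tutorial the task descriptions are built with f-strings inside `create_tasks`, which gives the same result without placeholders.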

### 2.3 Crews = Teams

Crews orchestrate agents and tasks:

```python
from crewai import Crew, Process

crew = Crew(
    agents=[agent1, agent2],
    tasks=[task1, task2],
    process=Process.sequential  # or Process.hierarchical
)
```

## 3. Define Tools for Weather Verification

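In CrewAI, a tool can be defined by wrapping a function with the `@tool` decorator. The function's docstring becomes the description the agent sees, so it should say when and how to use the tool.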

In [None]:
import httpx
from bs4 import BeautifulSoup
from crewai.tools import tool

@tool("Geocode Location")
def geocode_location(location: str) -> str:
    """
    Convert a location name to latitude/longitude coordinates.
    Use this to get coordinates before fetching weather data.
    
    Args:
        location: Address or place name (e.g., 'Brisbane, QLD, Australia')
    
    Returns:
        String with coordinates or error message
    """
    try:
        with httpx.Client() as client:
            response = client.get(
                "https://nominatim.openstreetmap.org/search",
                params={
                    "q": location,
                    "countrycodes": "au",  # restrict results to Australia
                    "format": "json",
                    "limit": 1
                },
                headers={"User-Agent": "InsuranceWeatherBot/1.0"},
                timeout=10.0
            )
            
            if response.status_code != 200:
                return f"Error: Geocoding failed with status {response.status_code}"
            
            data = response.json()
            if not data:
                return f"Error: Location not found: {location}"
            
            result = data[0]
            return f"""Location: {result['display_name']}
Latitude: {result['lat']}
Longitude: {result['lon']}"""
    except Exception as e:
        return f"Error geocoding: {str(e)}"

# Test the tool (the decorator wraps the function in a tool object, so call .run())
print(geocode_location.run(location="Brisbane, QLD"))

In [None]:
@tool("Get BOM Weather")
def get_bom_weather(latitude: float, longitude: float, date: str) -> str:
    """
    Fetch weather observations from Australian Bureau of Meteorology.
    
    Args:
        latitude: Latitude coordinate (e.g., -27.4698)
        longitude: Longitude coordinate (e.g., 153.0251)
        date: Date in YYYY-MM-DD format
    
    Returns:
        String with weather events or error message
    """
    try:
        # Parse date
        year, month, day = date.split("-")
        
        url = "https://reg.bom.gov.au/cgi-bin/climate/storms/get_storms.py"
        params = {
            "begin_day": day,
            "begin_month": month,
            "begin_year": year,
            "end_day": day,
            "end_month": month,
            "end_year": year,
            "lat": latitude,
            "lng": longitude,
            "event": "all",
            "distance_from_point": "50",
            "states": "all"
        }
        
        with httpx.Client() as client:
            response = client.get(url, params=params, timeout=15.0)
            
            if response.status_code != 200:
                return f"Error: BOM API returned status {response.status_code}"
            
            # Parse HTML
            soup = BeautifulSoup(response.text, 'html.parser')
            rows = soup.find_all('tr')
            
            events = []
            for row in rows[1:]:
                cells = row.find_all('td')
                if len(cells) >= 2:
                    event_type = cells[0].get_text(strip=True)
                    if event_type:
                        events.append(event_type)
            
            # Analyze events
            has_thunderstorm = any('thunder' in e.lower() or 'lightning' in e.lower() for e in events)
            has_strong_wind = any('wind' in e.lower() or 'gust' in e.lower() for e in events)
            
            return f"""Weather Report for {date}
Coordinates: ({latitude}, {longitude})
Events Found: {', '.join(events) if events else 'None'}
Has Thunderstorm: {has_thunderstorm}
Has Strong Wind: {has_strong_wind}
Total Events: {len(events)}"""
    except Exception as e:
        return f"Error fetching weather: {str(e)}"

# Test the tool (the decorator wraps the function in a tool object, so call .run())
print(get_bom_weather.run(latitude=-27.4698, longitude=153.0251, date="2025-03-07"))

## 4. Create Agents

Now let's create our agents with rich personas.

In [None]:
from crewai import Agent, LLM

# Configure LLM (for non-OpenAI APIs)
llm = LLM(
    model=f"openai/{model_name}",
    api_key=api_key,
    base_url=api_base
)

# Weather Verification Agent
weather_agent = Agent(
    role="Senior Weather Verification Specialist",
    goal="Accurately verify weather conditions for insurance claim locations using official BOM data",
    backstory="""You are an experienced meteorologist with 15 years at the Bureau of Meteorology.
    Now working as a specialist consultant for insurance companies, you verify whether severe
    weather events actually occurred at claimed locations. You are meticulous, data-driven,
    and never speculate beyond what the data shows. Your reports are used as evidence in
    claim decisions.""",
    tools=[geocode_location, get_bom_weather],
    llm=llm,
    verbose=True,
    allow_delegation=False  # This agent does its own work
)

print(f"Created agent: {weather_agent.role}")

In [None]:
# Claims Eligibility Agent
eligibility_agent = Agent(
    role="Senior Claims Eligibility Officer",
    goal="Determine CAT event eligibility based on verified weather data and business rules",
    backstory="""You are a senior claims officer with 10 years experience in catastrophe claims.
    You've processed thousands of CAT claims and know the eligibility rules inside out.
    You make fair, consistent decisions based on evidence and policy guidelines.
    You always explain your reasoning clearly for audit purposes.
    
    CAT Event Rules:
    - APPROVED: BOTH thunderstorms AND strong winds detected in valid Australian location
    - REVIEW: Only ONE severe weather type detected
    - DENIED: No severe weather or location outside Australia""",
    tools=[],  # No tools - pure reasoning
    llm=llm,
    verbose=True,
    allow_delegation=False
)

print(f"Created agent: {eligibility_agent.role}")

## 5. Define Tasks

Tasks specify what work needs to be done. Note the dependency: the eligibility task lists the weather task in its `context`, so it receives the weather report as input.
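
Conceptually, a dependency means the downstream task sees the upstream task's output as extra context. The toy pipeline below illustrates that data flow in plain Python, with ordinary functions standing in for agents. `ToyTask` and `run_sequential` are invented for illustration; they are not CrewAI classes and do not reflect its internals.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToyTask:
    name: str
    # The worker receives the outputs of the tasks listed in `context`.
    worker: Callable[[list[str]], str]
    context: list["ToyTask"] = field(default_factory=list)


def run_sequential(tasks: list[ToyTask]) -> dict[str, str]:
    """Run tasks in order, handing each one its dependencies' outputs."""
    outputs: dict[str, str] = {}
    for task in tasks:
        upstream = [outputs[dep.name] for dep in task.context]
        outputs[task.name] = task.worker(upstream)
    return outputs


weather = ToyTask("weather", lambda ctx: "Has Thunderstorm: True")
eligibility = ToyTask(
    "eligibility",
    lambda ctx: f"Decision based on: {ctx[0]}",
    context=[weather],
)

results = run_sequential([weather, eligibility])
print(results["eligibility"])
# Decision based on: Has Thunderstorm: True
```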

In [None]:
from crewai import Task

def create_tasks(location: str, date: str):
    """
    Create the task pipeline for a specific claim.
    """
    
    # Task 1: Weather Verification
    weather_task = Task(
        description=f"""Verify weather conditions for insurance claim:
        
        Location: {location}
        Date of Incident: {date}
        
        Steps:
        1. Use the geocode tool to get coordinates for the location
        2. Use the BOM weather tool to fetch weather data for those coordinates
        3. Compile a structured weather verification report
        
        Your report must include:
        - Verified location and coordinates
        - Date checked
        - All weather events found
        - Whether thunderstorms were detected (True/False)
        - Whether strong winds were detected (True/False)""",
        expected_output="""A structured weather verification report containing:
        - Location details with coordinates
        - Date of verification
        - List of weather events
        - Thunderstorm presence (True/False)
        - Strong wind presence (True/False)""",
        agent=weather_agent
    )
    
    # Task 2: Eligibility Determination (depends on Task 1)
    eligibility_task = Task(
        description="""Review the weather verification report and determine CAT event eligibility.
        
        Apply these rules:
        - APPROVED: Both thunderstorms AND strong winds were detected at a valid Australian location
        - REVIEW: Only ONE weather type (thunderstorms OR strong winds) was detected
        - DENIED: Neither was detected, or location is outside Australia
        
        Provide your decision with:
        1. DECISION: (APPROVED/REVIEW/DENIED)
        2. REASONING: Why this decision was made
        3. CONFIDENCE: (High/Medium/Low)
        4. RECOMMENDATIONS: Any follow-up actions""",
        expected_output="""A formal eligibility determination with:
        - Clear DECISION (APPROVED, REVIEW, or DENIED)
        - REASONING explaining the decision
        - CONFIDENCE level
        - Any RECOMMENDATIONS""",
        agent=eligibility_agent,
        context=[weather_task]  # This task receives output from weather_task
    )
    
    return [weather_task, eligibility_task]

# Create tasks for a test case
tasks = create_tasks("Brisbane, QLD, 4000", "2025-03-07")
print(f"Created {len(tasks)} tasks")

## 6. Assemble and Run the Crew

In [None]:
from crewai import Crew, Process

def run_claims_crew(location: str, date: str):
    """
    Run the claims verification crew for a specific claim.
    """
    
    # Create tasks for this claim
    tasks = create_tasks(location, date)
    
    # Assemble the crew
    crew = Crew(
        agents=[weather_agent, eligibility_agent],
        tasks=tasks,
        process=Process.sequential,  # Tasks run in order
        verbose=True
    )
    
    print("=" * 60)
    print(f"Processing claim for {location} on {date}")
    print("=" * 60)
    
    # Execute the crew
    result = crew.kickoff()
    
    return result

# Run the crew
result = run_claims_crew("Brisbane, QLD, 4000", "2025-03-07")
print("\n" + "=" * 60)
print("FINAL RESULT:")
print("=" * 60)
print(result)

## 7. Alternative: Hierarchical Process

CrewAI also supports hierarchical processes where a manager agent coordinates the crew.

In [None]:
# Create a manager agent
manager_agent = Agent(
    role="Claims Processing Manager",
    goal="Coordinate the claims verification process efficiently and ensure quality outcomes",
    backstory="""You are an experienced claims manager overseeing a team of specialists.
    You delegate work appropriately and ensure all steps are completed thoroughly.
    You review outputs before final decisions.""",
    llm=llm,
    verbose=True,
    allow_delegation=True  # Manager can delegate
)

def run_hierarchical_crew(location: str, date: str):
    """
    Run with hierarchical process - manager coordinates agents.
    """
    tasks = create_tasks(location, date)
    
    crew = Crew(
        agents=[weather_agent, eligibility_agent],
        tasks=tasks,
        manager_agent=manager_agent,
        process=Process.hierarchical,  # Manager coordinates
        verbose=True
    )
    
    return crew.kickoff()

# Uncomment to test hierarchical process
# result = run_hierarchical_crew("Sydney, NSW", "2025-03-07")

## 8. Testing Multiple Claims
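
Because the eligibility rules in this tutorial are deterministic, a plain-Python reference implementation can serve as a cross-check for the agents' decisions during testing. The sketch below reuses the keyword matching from the BOM tool and the APPROVED/REVIEW/DENIED rules from the eligibility agent. `classify_events` and `expected_decision` are hypothetical helper names.

```python
def classify_events(events: list[str]) -> tuple[bool, bool]:
    """Return (has_thunderstorm, has_strong_wind) using the tool's keywords."""
    lowered = [e.lower() for e in events]
    has_thunderstorm = any("thunder" in e or "lightning" in e for e in lowered)
    has_strong_wind = any("wind" in e or "gust" in e for e in lowered)
    return has_thunderstorm, has_strong_wind


def expected_decision(events: list[str], in_australia: bool = True) -> str:
    """Apply the tutorial's CAT rules without calling an LLM."""
    if not in_australia:
        return "DENIED"
    has_thunderstorm, has_strong_wind = classify_events(events)
    if has_thunderstorm and has_strong_wind:
        return "APPROVED"
    if has_thunderstorm or has_strong_wind:
        return "REVIEW"
    return "DENIED"


print(expected_decision(["Thunderstorm", "Wind Gust 85km/h"]))  # APPROVED
print(expected_decision(["Thunderstorm"]))                      # REVIEW
print(expected_decision(["Light Rain"]))                        # DENIED
```

Comparing the crew's decision with `expected_decision` on the tool's raw event list is one way to flag runs where the LLM drifted from the rules.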

In [None]:
import time

test_claims = [
    ("Brisbane, QLD, 4000", "2025-03-07"),
    ("Sydney, NSW, 2000", "2025-03-07"),
    ("Perth, WA, 6000", "2025-01-15"),
]

def test_all_claims():
    """Run all test claims."""
    results = []
    
    for location, date in test_claims:
        print(f"\n{'='*60}")
        print(f"Testing: {location} on {date}")
        print("="*60)
        
        try:
            result = run_claims_crew(location, date)
            results.append((location, date, "Success", str(result)[:200]))
        except Exception as e:
            results.append((location, date, "Error", str(e)))
        
        time.sleep(2)  # Rate limiting
    
    return results

# Uncomment to run all tests
# all_results = test_all_claims()

---

## 9. DSPy Integration for Prompt Optimization

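DSPy can help tune the prompts behind a CrewAI agent. Here it optimizes a standalone eligibility module, and the few-shot examples it selects are then copied into the agent's backstory.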
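
DSPy optimizers score candidate prompts with a metric function that compares a prediction against a labelled example. LLM outputs often carry stray whitespace, casing, or punctuation, so a forgiving comparison can help. The sketch below uses `SimpleNamespace` stand-ins instead of real DSPy objects so it runs on its own; `normalize_decision` and `decision_metric` are hypothetical names.

```python
import re
from types import SimpleNamespace

VALID_DECISIONS = ("APPROVED", "REVIEW", "DENIED")
_DECISION_RE = re.compile(r"\b(" + "|".join(VALID_DECISIONS) + r")\b")


def normalize_decision(text: str) -> str:
    """Return the first valid decision label found in text, else 'UNKNOWN'."""
    match = _DECISION_RE.search(text.upper())
    return match.group(1) if match else "UNKNOWN"


def decision_metric(example, prediction, trace=None) -> bool:
    """True when both sides normalize to the same valid decision."""
    expected = normalize_decision(example.decision)
    predicted = normalize_decision(prediction.decision)
    return expected != "UNKNOWN" and expected == predicted


example = SimpleNamespace(decision="APPROVED")
print(decision_metric(example, SimpleNamespace(decision=" approved. ")))  # True
print(decision_metric(example, SimpleNamespace(decision="Review")))       # False
```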

In [None]:
# Install DSPy if needed
# !pip install dspy

In [None]:
import dspy

# Configure DSPy
dspy_lm = dspy.LM(
    model=f"openai/{model_name}",
    api_key=api_key,
    api_base=api_base
)
dspy.configure(lm=dspy_lm)

print("DSPy configured")

In [None]:
# Create a DSPy module for eligibility determination
class EligibilitySignature(dspy.Signature):
    """Determine CAT event eligibility based on weather verification."""
    
    weather_report: str = dspy.InputField(
        desc="Weather verification report with events, location, and date"
    )
    
    decision: str = dspy.OutputField(
        desc="APPROVED, REVIEW, or DENIED"
    )
    
    reasoning: str = dspy.OutputField(
        desc="Explanation for the decision"
    )


eligibility_module = dspy.ChainOfThought(EligibilitySignature)

# Test
test_report = """Location: Brisbane, QLD. Lat: -27.47, Lon: 153.02.
Date: 2025-03-07. Events: Thunderstorm, Wind Gust.
Has Thunderstorm: True. Has Strong Wind: True."""

result = eligibility_module(weather_report=test_report)
print(f"Decision: {result.decision}")
print(f"Reasoning: {result.reasoning}")

In [None]:
# Training examples for optimization
training_examples = [
    dspy.Example(
        weather_report="""Brisbane, QLD. -27.47, 153.02. 2025-03-07.
        Events: Thunderstorm, Wind Gust 85km/h. Has Thunderstorm: True. Has Strong Wind: True.""",
        decision="APPROVED",
        reasoning="Both thunderstorm and strong wind confirmed at valid Australian location"
    ).with_inputs("weather_report"),
    
    dspy.Example(
        weather_report="""Sydney, NSW. -33.87, 151.21. 2025-03-07.
        Events: Light Rain. Has Thunderstorm: False. Has Strong Wind: False.""",
        decision="DENIED",
        reasoning="No severe weather events detected"
    ).with_inputs("weather_report"),
    
    dspy.Example(
        weather_report="""Melbourne, VIC. -37.81, 144.96. 2025-03-07.
        Events: Thunderstorm only. Has Thunderstorm: True. Has Strong Wind: False.""",
        decision="REVIEW",
        reasoning="Only one severe weather type detected"
    ).with_inputs("weather_report"),
]

print(f"Created {len(training_examples)} training examples")

In [None]:
# Optimize with BootstrapFewShot
from dspy.teleprompt import BootstrapFewShot
from dspy.evaluate import Evaluate

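def metric(example, prediction, trace=None):
    # Compare decisions case-insensitively, ignoring surrounding whitespace
    return example.decision.strip().upper() == prediction.decision.strip().upper()

def as_score(result) -> float:
    """Depending on the DSPy version, Evaluate returns a float or an object with .score."""
    return float(getattr(result, "score", result))

# Evaluate baseline (scored on the training examples, so this is an optimistic estimate)
evaluator = Evaluate(devset=training_examples, metric=metric, num_threads=1)
baseline_score = as_score(evaluator(eligibility_module))
print(f"Baseline accuracy: {baseline_score}%")

# Optimize
optimizer = BootstrapFewShot(metric=metric, max_bootstrapped_demos=2)
optimized_module = optimizer.compile(eligibility_module, trainset=training_examples)

optimized_score = as_score(evaluator(optimized_module))
print(f"Optimized accuracy: {optimized_score}%")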

### 9.1 Export Optimized Prompt to CrewAI

Fold the few-shot demos selected by the optimizer into an enhanced backstory for the CrewAI agent.

In [None]:
# Extract demos from optimized module
def build_enhanced_backstory(module) -> str:
    """
    Build an enhanced backstory from DSPy optimization results.
    """
    backstory = """You are a senior claims officer with 10 years experience in catastrophe claims.
You've processed thousands of CAT claims and make consistent decisions based on evidence.

## Decision Rules
- APPROVED: Both thunderstorms AND strong winds detected in valid Australian location
- REVIEW: Only ONE severe weather type detected
- DENIED: No severe weather or location outside Australia

## Example Decisions (from training)
"""
    
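    # Add demos if available. Depending on the DSPy version, demos live on the
    # module itself or on its inner predictor(s), so check both.
    demos = list(getattr(module, 'demos', None) or [])
    if not demos and hasattr(module, 'predictors'):
        for predictor in module.predictors():
            demos.extend(getattr(predictor, 'demos', None) or [])
    if demos:
        for i, demo in enumerate(demos, 1):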
            backstory += f"\n### Example {i}\n"
            if hasattr(demo, 'weather_report'):
                backstory += f"Report: {demo.weather_report[:100]}...\n"
            if hasattr(demo, 'decision'):
                backstory += f"Decision: {demo.decision}\n"
            if hasattr(demo, 'reasoning'):
                backstory += f"Reasoning: {demo.reasoning}\n"
    
    return backstory

enhanced_backstory = build_enhanced_backstory(optimized_module)
print("Enhanced backstory:")
print("=" * 60)
print(enhanced_backstory)

In [None]:
# Create enhanced eligibility agent
enhanced_eligibility_agent = Agent(
    role="Senior Claims Eligibility Officer (DSPy-Enhanced)",
    goal="Determine CAT event eligibility using optimized decision logic",
    backstory=enhanced_backstory,
    tools=[],
    llm=llm,
    verbose=True,
    allow_delegation=False
)

print("Created enhanced eligibility agent")

---

## 10. MLflow Integration for Experiment Tracking

In [None]:
# Install MLflow if needed
# !pip install mlflow

In [None]:
import mlflow
from datetime import datetime

# Set up experiment
mlflow.set_experiment("crewai-weather-claims")

print(f"MLflow tracking URI: {mlflow.get_tracking_uri()}")

In [None]:
def run_tracked_crew(location: str, date: str, use_enhanced: bool = False):
    """
    Run crew with MLflow tracking.
    """
    
    run_name = f"crew_{location.split(',')[0]}_{date}"
    if use_enhanced:
        run_name += "_enhanced"
    
    with mlflow.start_run(run_name=run_name):
        # Log parameters
        mlflow.log_params({
            "framework": "crewai",
            "model": model_name,
            "location": location,
            "date": date,
            "process_type": "sequential",
            "use_enhanced_agent": use_enhanced
        })
        
        start_time = datetime.now()
        
        try:
            # Select agent
            elig_agent = enhanced_eligibility_agent if use_enhanced else eligibility_agent
            
            # Create tasks
            tasks = create_tasks(location, date)
            # Update the eligibility task to use the selected agent
            tasks[1].agent = elig_agent
            
            # Create and run crew
            crew = Crew(
                agents=[weather_agent, elig_agent],
                tasks=tasks,
                process=Process.sequential,
                verbose=False  # Less noise for tracking
            )
            
            result = crew.kickoff()
            
            end_time = datetime.now()
            duration = (end_time - start_time).total_seconds()
            
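            # Extract decision: search the text after the "DECISION" label first,
            # then fall back to the earliest label anywhere in the output
            result_str = str(result)
            upper = result_str.upper()
            labels = ("APPROVED", "DENIED", "REVIEW")
            for text in (upper.partition("DECISION")[2], upper):
                hits = [(text.find(d), d) for d in labels if d in text]
                if hits:
                    break
            decision = min(hits)[1] if hits else "UNKNOWN"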
            
            # Log metrics
            mlflow.log_metrics({
                "duration_seconds": duration,
                "success": 1
            })
            
            # Log artifacts
            mlflow.log_text(result_str, "result.txt")
            
            # Log tags
            mlflow.set_tags({
                "decision": decision,
                "status": "success"
            })
            
            print("\nRun logged to MLflow")
            print(f"Decision: {decision}")
            print(f"Duration: {duration:.2f}s")
            
            return {"decision": decision, "duration": duration, "result": result_str}
            
        except Exception as e:
            mlflow.log_metrics({"success": 0})
            mlflow.set_tags({"status": "error", "error": str(e)[:100]})
            raise

# Run with tracking
result = run_tracked_crew("Brisbane, QLD", "2025-03-07", use_enhanced=False)

In [None]:
# Compare standard vs enhanced agent
print("Comparing standard vs enhanced agent...")

# Standard
standard_result = run_tracked_crew("Sydney, NSW", "2025-03-07", use_enhanced=False)

# Enhanced
enhanced_result = run_tracked_crew("Sydney, NSW", "2025-03-07", use_enhanced=True)

print(f"\nStandard: {standard_result['decision']} ({standard_result['duration']:.2f}s)")
print(f"Enhanced: {enhanced_result['decision']} ({enhanced_result['duration']:.2f}s)")

In [None]:
# Log DSPy optimization results
with mlflow.start_run(run_name="dspy_optimization"):
    mlflow.log_params({
        "optimizer": "BootstrapFewShot",
        "training_examples": len(training_examples)
    })
    
    mlflow.log_metrics({
        "baseline_accuracy": baseline_score,
        "optimized_accuracy": optimized_score,
        "improvement": optimized_score - baseline_score
    })
    
    mlflow.log_text(enhanced_backstory, "enhanced_backstory.txt")
    
    print("DSPy optimization logged")

In [None]:
# View experiment results
experiment = mlflow.get_experiment_by_name("crewai-weather-claims")
runs = mlflow.search_runs(experiment_ids=[experiment.experiment_id])

print("\nExperiment Runs:")
print("=" * 60)
if len(runs) > 0:
    cols = [c for c in runs.columns if c.startswith(('run_id', 'status', 'params.', 'metrics.', 'tags.'))]
    print(runs[cols[:10]].to_string())

## 11. Summary & Key Takeaways

### What We Covered

1. **CrewAI Concepts**: Agents with roles/goals/backstories, Tasks, Crews
2. **Tool Definition**: `@tool` decorator for functions
3. **Process Types**: Sequential and hierarchical orchestration
4. **DSPy Integration**: Optimize reasoning and enhance backstories
5. **MLflow Tracking**: Log experiments and compare results

### CrewAI Strengths

- Intuitive role-based agent design
- Good for complex multi-agent workflows
- Task dependencies handled cleanly
- Manager/hierarchical patterns built-in

### CrewAI Challenges

- Verbose (lots of boilerplate)
- Token-heavy (detailed backstories consume tokens)
- Less type safety than Pydantic AI
- Debugging can be challenging

### For Insurance Teams

- **Good for**: Teams that think in roles and workflows
- **Consider alternatives if**: You need strict type safety or minimal token usage

### Next Steps

1. Experiment with hierarchical processes
2. Add more training examples to DSPy
3. Try different models and compare in MLflow
4. Build more complex crew configurations

In [None]:
print("Tutorial complete!")