# AutoGen 0.4 Tutorial: Building AI Agents for Insurance

This tutorial introduces **AutoGen 0.4+**, Microsoft's framework for building multi-agent AI systems. AutoGen is designed for agents that collaborate through structured, multi-turn conversations.

## What You'll Learn

1. AutoGen's core concepts: Agents, Tools, and Group Chat
2. Building a Weather Verification Agent with tool calling
3. Building a Claims Eligibility Agent for business logic
4. Orchestrating multi-agent conversations
5. Integrating DSPy for prompt optimization
6. Using MLflow for experiment tracking

## Prerequisites

- Python 3.10+
- OpenAI API key (or compatible API like z.ai)
- Basic understanding of async Python

---

## Important: AutoGen 0.4 Breaking Changes

If you've used AutoGen before (v0.2.x), **version 0.4 is a complete rewrite**:

| AutoGen 0.2.x | AutoGen 0.4+ |
|---------------|---------------|
| `autogen.AssistantAgent` | `autogen_agentchat.agents.AssistantAgent` |
| `autogen.UserProxyAgent` | No longer needed to execute tools; tools are passed directly to agents |
| Synchronous API | Async-first (`await agent.on_messages()`) |
| `initiate_chat()` | `agent.run()`, or a team such as `RoundRobinGroupChat` with `run()` / `run_stream()` |

**Your old AutoGen code will NOT work.** This tutorial covers the new 0.4 API.
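Because the 0.4 API is async-first, the top-level `await` calls in the cells that follow rely on the notebook's already-running event loop. A standalone script needs an explicit entry point instead. A minimal sketch of that pattern, using a stand-in coroutine (`fake_agent_reply` is hypothetical and only simulates an agent call):

```python
import asyncio


async def fake_agent_reply(prompt: str) -> str:
    """Hypothetical stand-in for an awaited agent call (no LLM involved)."""
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    return f"echo: {prompt}"


async def main() -> str:
    # Inside a coroutine, awaiting works the same way as in a notebook cell.
    return await fake_agent_reply("hello")


if __name__ == "__main__":
    # Scripts must start the event loop themselves; notebooks already run one.
    print(asyncio.run(main()))
```

Calling `asyncio.run()` inside a notebook cell raises an error because a loop is already running, which is why the notebook cells use bare `await`.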

## 1. Installation & Setup

AutoGen 0.4 is split into multiple packages. Install the core packages:

In [None]:
# Install AutoGen 0.4+ packages
# !pip install autogen-agentchat "autogen-ext[openai]"

# Additional dependencies for our weather example
# !pip install httpx beautifulsoup4 python-dotenv

In [None]:
import os
import asyncio
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

# Verify API key is set
api_key = os.getenv("OPENAI_API_KEY")
if not api_key:
    raise ValueError("Please set OPENAI_API_KEY environment variable")

# Optional: Use alternative API endpoint (e.g., z.ai)
api_base = os.getenv("OPENAI_API_BASE", "https://api.openai.com/v1")
model_name = os.getenv("MODEL_NAME", "gpt-4o-mini")

print(f"Using model: {model_name}")
print(f"API base: {api_base}")

## 2. Core Concepts: AutoGen's Architecture

AutoGen 0.4 is built around several key concepts:

### 2.1 Agents

Agents are autonomous entities that can:
- Process messages
- Call tools (functions)
- Generate responses using an LLM

```python
# The main agent type in AutoGen 0.4
from autogen_agentchat.agents import AssistantAgent

agent = AssistantAgent(
    name="my_agent",
    model_client=model_client,  # LLM connection
    tools=[my_function],         # Python functions the agent can call
    system_message="Your role..."
)
```

### 2.2 Model Clients

Model clients handle communication with LLM providers:

```python
from autogen_ext.models.openai import OpenAIChatCompletionClient

client = OpenAIChatCompletionClient(
    model="gpt-4o-mini",
    api_key="...",
    base_url="..."  # For alternative APIs
)
```

### 2.3 Tools (Functions)

Tools are regular Python functions that agents can call. AutoGen automatically:
- Generates JSON schemas from function signatures
- Handles argument passing
- Supports both sync and async functions

```python
async def my_tool(param: str) -> str:
    """Tool description for the LLM."""
    return f"Result for {param}"
```
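To make the schema-generation idea concrete, here is a rough sketch of how a JSON-Schema-like description can be derived from a function signature using only the standard library. It is an illustration of the concept, not AutoGen's actual implementation; `sketch_tool_schema` and the type mapping are assumptions made for this example.

```python
import inspect
from typing import get_type_hints

# Assumed, simplified mapping from Python types to JSON Schema type names.
_JSON_TYPES = {
    str: "string",
    int: "integer",
    float: "number",
    bool: "boolean",
    dict: "object",
    list: "array",
}


def sketch_tool_schema(func) -> dict:
    """Build a JSON-Schema-like description of a function (illustrative only)."""
    hints = get_type_hints(func)
    properties = {}
    required = []
    for name, param in inspect.signature(func).parameters.items():
        # Unannotated or unmapped parameters fall back to "string" in this sketch.
        properties[name] = {"type": _JSON_TYPES.get(hints.get(name), "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }


async def my_tool(param: str, limit: int = 5) -> str:
    """Tool description for the LLM."""
    return f"Result for {param}"


schema = sketch_tool_schema(my_tool)
print(schema)
```

The sketch shows why type hints and docstrings matter: they are the only information the LLM receives about how to call the tool.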

### 2.4 Group Chat

Multi-agent orchestration happens through group chats:

```python
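from autogen_agentchat.conditions import MaxMessageTermination
from autogen_agentchat.teams import RoundRobinGroupChat

# agent1 and agent2 are AssistantAgent instances created as in section 2.1
team = RoundRobinGroupChat(
    participants=[agent1, agent2],
    termination_condition=MaxMessageTermination(max_messages=10)
)
```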
```
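The turn-taking itself is simple. The following plain-Python sketch shows the speaker order a round-robin team would produce under a message cap. It assumes each turn yields exactly one message and that the initial task counts toward the cap; `round_robin_turns` is a hypothetical helper, not an AutoGen function.

```python
from itertools import cycle


def round_robin_turns(participants: list[str], max_messages: int) -> list[str]:
    """Return the speaker order for a round-robin chat capped at max_messages.

    Assumptions in this sketch: the task is message 1, and every turn
    adds exactly one message to the conversation.
    """
    speakers = []
    order = cycle(participants)
    message_count = 1  # the task message
    while message_count < max_messages:
        speakers.append(next(order))
        message_count += 1
    return speakers


print(round_robin_turns(["weather", "eligibility"], max_messages=5))
```

In a real run, tool calls and other events can change how quickly the cap is reached, so treat the count as an upper bound on agent turns rather than an exact schedule.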

## 3. Define Tools for Weather Verification

Let's create the tools our Weather Agent will use. In AutoGen, tools are plain Python functions with type hints and docstrings; the ones below are async because they make network calls.

In [None]:
import httpx
from bs4 import BeautifulSoup
from typing import Optional

async def geocode_location(location: str, country: str = "Australia") -> dict:
    """
    Convert a location name to latitude/longitude coordinates using Nominatim.
    
    Args:
        location: Address or place name (e.g., 'Brisbane, QLD')
        country: Country to search within (default: Australia)
    
    Returns:
        Dictionary with lat, lon, and display_name, or error message
    """
    async with httpx.AsyncClient() as client:
        response = await client.get(
            "https://nominatim.openstreetmap.org/search",
            params={
                "q": f"{location}, {country}",
                "format": "json",
                "limit": 1
            },
            headers={"User-Agent": "InsuranceWeatherBot/1.0"},
            timeout=10.0
        )
        
        if response.status_code != 200:
            return {"error": f"Geocoding failed: HTTP {response.status_code}"}
        
        data = response.json()
        if not data:
            return {"error": f"Location not found: {location}"}
        
        result = data[0]
        return {
            "lat": float(result["lat"]),
            "lon": float(result["lon"]),
            "display_name": result["display_name"]
        }

# Test the geocoding tool
result = await geocode_location("Brisbane, QLD")
print(f"Geocoding result: {result}")

In [None]:
async def get_bom_weather(
    latitude: float,
    longitude: float,
    date: str
) -> dict:
    """
    Fetch weather observations from Australian Bureau of Meteorology.
    
    Args:
        latitude: Latitude coordinate (e.g., -27.4698)
        longitude: Longitude coordinate (e.g., 153.0251)
        date: Date in YYYY-MM-DD format
    
    Returns:
        Dictionary with weather events found at this location
    """
    # Parse date components
    try:
        year, month, day = date.split("-")
    except ValueError:
        return {"error": f"Invalid date format: {date}. Use YYYY-MM-DD"}
    
    # Query BOM storms database
    url = "https://reg.bom.gov.au/cgi-bin/climate/storms/get_storms.py"
    params = {
        "begin_day": day,
        "begin_month": month,
        "begin_year": year,
        "end_day": day,
        "end_month": month,
        "end_year": year,
        "lat": latitude,
        "lng": longitude,
        "event": "all",
        "distance_from_point": "50",  # 50km radius
        "states": "all"
    }
    
    async with httpx.AsyncClient() as client:
        response = await client.get(url, params=params, timeout=15.0)
        
        if response.status_code != 200:
            return {"error": f"BOM API failed: HTTP {response.status_code}"}
        
        # Parse HTML response
        soup = BeautifulSoup(response.text, 'html.parser')
        
        # Extract weather events from table
        events = []
        rows = soup.find_all('tr')
        
        for row in rows[1:]:  # Skip header row
            cells = row.find_all('td')
            if len(cells) >= 2:
                event_type = cells[0].get_text(strip=True)
                if event_type:
                    events.append(event_type)
        
        # Categorize events
        has_thunderstorm = any(
            'thunder' in e.lower() or 'lightning' in e.lower()
            for e in events
        )
        has_strong_wind = any(
            'wind' in e.lower() or 'gust' in e.lower()
            for e in events
        )
        
        return {
            "date": date,
            "latitude": latitude,
            "longitude": longitude,
            "events_found": events,
            "has_thunderstorm": has_thunderstorm,
            "has_strong_wind": has_strong_wind,
            "event_count": len(events)
        }

# Test the weather tool
weather = await get_bom_weather(-27.4698, 153.0251, "2025-03-07")
print(f"Weather data: {weather}")

## 4. Create the Weather Verification Agent

Now let's create an AutoGen agent that uses these tools to verify weather conditions.

In [None]:
from autogen_agentchat.agents import AssistantAgent
from autogen_ext.models.openai import OpenAIChatCompletionClient

# Create the model client
# For non-OpenAI endpoints, set base_url. Model names the client does not
# recognize also need a model_info argument describing their capabilities.
model_client = OpenAIChatCompletionClient(
    model=model_name,
    api_key=api_key,
    base_url=api_base if api_base != "https://api.openai.com/v1" else None
)

# Create the Weather Verification Agent
weather_agent = AssistantAgent(
    name="WeatherVerificationAgent",
    model_client=model_client,
    tools=[geocode_location, get_bom_weather],
    reflect_on_tool_use=True,  # have the model summarize tool results instead of returning raw output
    system_message="""You are a Weather Verification Agent for an Australian insurance company.

Your job is to verify weather conditions for insurance claims by:
1. Using geocode_location to convert addresses to coordinates
2. Using get_bom_weather to fetch weather data from the Bureau of Meteorology

Always provide a structured summary including:
- Location verified (with coordinates)
- Date checked
- Weather events found
- Whether thunderstorms were detected
- Whether strong winds were detected

Be precise and factual. Do not speculate about events not found in the data."""
)

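print(f"Weather Agent created: {weather_agent.name}")
# The agent wraps each function in a tool object, so read the names from the functions themselves
print(f"Tools available: {[f.__name__ for f in (geocode_location, get_bom_weather)]}")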

## 5. Create the Claims Eligibility Agent

This agent applies business rules to determine CAT event eligibility. It has no tools - just LLM reasoning.

In [None]:
# Create the Claims Eligibility Agent
eligibility_agent = AssistantAgent(
    name="ClaimsEligibilityAgent",
    model_client=model_client,
    tools=[],  # No tools - pure reasoning
    system_message="""You are a Claims Eligibility Agent for an Australian insurance company.

Your job is to determine whether a weather event qualifies as a Catastrophic (CAT) event
based on the weather verification data provided.

## CAT Event Eligibility Rules

**APPROVED** - Qualifies as CAT event if ALL conditions met:
- Location is within Australia (lat: -44 to -10, lon: 112 to 154)
- Date is within the last 90 days and not in the future
- BOTH thunderstorms AND strong winds were detected

**REVIEW** - Needs manual review if:
- Only ONE weather type (thunderstorms OR strong winds) was detected
- Location is near Australian borders

**DENIED** - Does not qualify if:
- Neither thunderstorms nor strong winds detected
- Location is outside Australia
- Date is invalid or too old

## Response Format

Always provide:
1. **Decision**: APPROVED, REVIEW, or DENIED
2. **Reasoning**: Brief explanation of why
3. **Confidence**: High, Medium, or Low
4. **Recommendations**: Any follow-up actions needed"""
)

print(f"Eligibility Agent created: {eligibility_agent.name}")
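Because the rules in the system message are deterministic, they can also be written as ordinary code and used to cross-check the LLM's decision. The sketch below is one reading of those rules, not the author's implementation. It leaves out the "near Australian borders" case because the rules do not define "near", and it takes `today` as a parameter so the 90-day window is testable. `check_cat_eligibility` is a hypothetical helper.

```python
from datetime import date

# Bounding box and age limit copied from the rules in the system message.
LAT_RANGE = (-44.0, -10.0)
LON_RANGE = (112.0, 154.0)
MAX_AGE_DAYS = 90


def check_cat_eligibility(
    lat: float,
    lon: float,
    incident_date: date,
    has_thunderstorm: bool,
    has_strong_wind: bool,
    today: date,
) -> str:
    """Return APPROVED, REVIEW, or DENIED under the simplified rules."""
    in_australia = (
        LAT_RANGE[0] <= lat <= LAT_RANGE[1]
        and LON_RANGE[0] <= lon <= LON_RANGE[1]
    )
    age_days = (today - incident_date).days
    date_ok = 0 <= age_days <= MAX_AGE_DAYS  # not in the future, not too old
    if not in_australia or not date_ok:
        return "DENIED"
    if has_thunderstorm and has_strong_wind:
        return "APPROVED"
    if has_thunderstorm or has_strong_wind:
        return "REVIEW"
    return "DENIED"


print(check_cat_eligibility(-27.47, 153.02, date(2025, 3, 7), True, True, today=date(2025, 3, 20)))
```

A check like this also highlights a limit of the prompt-only approach: the agent is never told the current date, so it cannot reliably apply the 90-day rule unless the date is supplied in the task.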

## 6. Single Agent Interaction

Before setting up multi-agent communication, let's test each agent individually.

In [None]:
from autogen_agentchat.messages import TextMessage
from autogen_core import CancellationToken

async def test_weather_agent():
    """Test the weather agent with a sample query."""
    
    # Create a user message
    user_message = TextMessage(
        content="Please verify weather conditions for Brisbane, QLD on 2025-03-07",
        source="user"
    )
    
    # Send to agent and get response
    response = await weather_agent.on_messages(
        messages=[user_message],
        cancellation_token=CancellationToken()
    )
    
    print("=" * 60)
    print("Weather Agent Response:")
    print("=" * 60)
    print(response.chat_message.content)
    
    return response.chat_message.content

# Run the test
weather_report = await test_weather_agent()

## 7. Multi-Agent Orchestration with Group Chat

AutoGen's power comes from its multi-agent orchestration. Let's set up a group chat where agents collaborate.

In [None]:
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_agentchat.conditions import TextMentionTermination, MaxMessageTermination
from autogen_agentchat.ui import Console

async def run_claims_pipeline(location: str, date: str):
    """
    Run the full claims verification pipeline with both agents.
    
    Flow: User -> Weather Agent -> Eligibility Agent -> Final Decision
    """
    
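    # Create termination conditions
    # Stop when the eligibility agent emits "DECISION:" or after 10 messages.
    # sources= restricts the text check to that agent, so the task text (which
    # also contains "DECISION:") is not matched; requires a release that supports it.
    termination = TextMentionTermination(
        "DECISION:", sources=["ClaimsEligibilityAgent"]
    ) | MaxMessageTermination(max_messages=10)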
    
    # Create the group chat team
    # RoundRobinGroupChat: Agents take turns in order
    team = RoundRobinGroupChat(
        participants=[weather_agent, eligibility_agent],
        termination_condition=termination
    )
    
    # Create the task
    task = f"""Process this insurance claim:
    
Location: {location}
Date of Incident: {date}

Weather Agent: Please verify the weather conditions for this location and date.
Eligibility Agent: Once weather data is available, determine CAT event eligibility.

Eligibility Agent must end with 'DECISION: [APPROVED/REVIEW/DENIED]'"""
    
    print("=" * 60)
    print(f"Processing claim for {location} on {date}")
    print("=" * 60)
    
    # Run the team and stream to console
    result = await Console(team.run_stream(task=task))
    
    return result

# Test the full pipeline
result = await run_claims_pipeline("Brisbane, QLD", "2025-03-07")
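The pipeline relies on the final message ending with a `DECISION:` line, and downstream code usually needs that value as data rather than free text. A minimal sketch of extracting it from a message string follows; `parse_decision` is a hypothetical helper, not part of AutoGen, and it treats the unfilled template `[APPROVED/REVIEW/DENIED]` as no decision.

```python
def parse_decision(text: str) -> str:
    """Extract the decision that follows the last 'DECISION:' marker."""
    _, marker, tail = text.upper().rpartition("DECISION:")
    if not marker:
        return "UNKNOWN"
    # Skip whitespace, brackets, and Markdown bold markers after the colon.
    candidate = tail.lstrip(" \t[*")
    for decision in ("APPROVED", "REVIEW", "DENIED"):
        if candidate.startswith(decision):
            # "APPROVED/REVIEW/DENIED" is the echoed template, not a decision.
            if candidate[len(decision):].startswith("/"):
                return "UNKNOWN"
            return decision
    return "UNKNOWN"


print(parse_decision("Both events confirmed.\nDECISION: APPROVED"))
```

Anchoring on the marker avoids false matches when the reasoning text mentions another outcome, for example "this would otherwise be denied".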

## 8. Alternative: Selector Group Chat

For more complex workflows, you can use `SelectorGroupChat` where an LLM decides which agent speaks next.

In [None]:
from autogen_agentchat.teams import SelectorGroupChat

async def run_selector_pipeline(location: str, date: str):
    """
    Run claims pipeline with intelligent agent selection.
    The model decides which agent should respond based on context.
    """
    
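    # sources= restricts the check to the eligibility agent, so the task text
    # (which also contains "DECISION:") is not matched
    termination = TextMentionTermination(
        "DECISION:", sources=["ClaimsEligibilityAgent"]
    ) | MaxMessageTermination(max_messages=10)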
    
    # SelectorGroupChat: LLM picks the next speaker
    team = SelectorGroupChat(
        participants=[weather_agent, eligibility_agent],
        model_client=model_client,  # Uses LLM to select speaker
        termination_condition=termination,
        selector_prompt="""Select the next agent based on the conversation:
        
- WeatherVerificationAgent: Select when location/weather needs to be verified
- ClaimsEligibilityAgent: Select when weather data is available and eligibility needs to be determined

Current agents: {participants}

Conversation so far:
{history}

Read the conversation above and select the most appropriate next speaker. Return only the agent name."""
    )
    
    task = f"""New insurance claim to process:
    
Location: {location}
Incident Date: {date}

Please verify weather and determine CAT eligibility.
End with 'DECISION: [APPROVED/REVIEW/DENIED]'"""
    
    print("=" * 60)
    print(f"Selector GroupChat: {location} on {date}")
    print("=" * 60)
    
    result = await Console(team.run_stream(task=task))
    return result

# Uncomment to test (uses more API calls due to selector)
# result = await run_selector_pipeline("Sydney, NSW", "2025-03-07")

## 9. Testing Multiple Claims

Let's test the pipeline with multiple scenarios to see different outcomes.

In [None]:
# Test cases representing different scenarios
test_claims = [
    # Likely to have storm activity (QLD)
    ("Brisbane, QLD, 4000", "2025-03-07"),
    # Suburban area (QLD)
    ("Mcdowall, QLD, 4053", "2025-03-07"),
    # Major city (NSW)
    ("Sydney, NSW, 2000", "2025-03-07"),
    # Likely calm weather (WA summer)
    ("Perth, WA, 6000", "2025-01-15"),
]

async def test_all_claims():
    """Run all test claims through the pipeline."""
    results = []
    
    for location, date in test_claims:
        print(f"\n{'=' * 60}")
        print(f"Testing: {location} on {date}")
        print("=" * 60)
        
        try:
            result = await run_claims_pipeline(location, date)
            results.append((location, date, "Success", result))
        except Exception as e:
            results.append((location, date, "Error", str(e)))
            print(f"Error: {e}")
        
        # Small delay to avoid rate limiting
        await asyncio.sleep(1)
    
    return results

# Run one test (comment out to run all)
result = await run_claims_pipeline("Brisbane, QLD, 4000", "2025-03-07")

# Uncomment to run all tests
# results = await test_all_claims()

---

## 10. DSPy Integration for Prompt Optimization

DSPy can optimize the prompts used in AutoGen agents. This section shows how to:
1. Create DSPy modules that mirror your agent behavior
2. Optimize prompts using DSPy compilers
3. Export optimized prompts back to AutoGen

### Why DSPy + AutoGen?

- **AutoGen**: Great for multi-agent orchestration
- **DSPy**: Great for prompt optimization
- **Combined**: Optimized prompts in sophisticated agent workflows

In [None]:
# Install DSPy if needed
# !pip install dspy

In [None]:
import dspy

# Configure DSPy to use the same LLM
dspy_lm = dspy.LM(
    model=f"openai/{model_name}",
    api_key=api_key,
    api_base=api_base
)
dspy.configure(lm=dspy_lm)

print(f"DSPy configured with model: {model_name}")

In [None]:
# Define a DSPy Signature for eligibility determination
class EligibilitySignature(dspy.Signature):
    """Determine if a weather event qualifies as a CAT event for insurance."""
    
    weather_report: str = dspy.InputField(
        desc="Weather verification report with events, location, and date"
    )
    
    decision: str = dspy.OutputField(
        desc="One of: APPROVED, REVIEW, or DENIED"
    )
    
    reasoning: str = dspy.OutputField(
        desc="Brief explanation for the decision"
    )
    
    confidence: str = dspy.OutputField(
        desc="Confidence level: High, Medium, or Low"
    )


# Create a DSPy module using Chain of Thought
eligibility_module = dspy.ChainOfThought(EligibilitySignature)

# Test it
test_report = """
Location: Brisbane, Queensland, Australia
Coordinates: -27.4698, 153.0251
Date: 2025-03-07
Events Found: Thunderstorm, Strong Wind Gust
Has Thunderstorm: True
Has Strong Wind: True
"""

result = eligibility_module(weather_report=test_report)
print(f"Decision: {result.decision}")
print(f"Reasoning: {result.reasoning}")
print(f"Confidence: {result.confidence}")

### 10.1 Optimize with Training Examples

DSPy can learn from examples to improve its prompts.

In [None]:
# Create training examples
training_examples = [
    dspy.Example(
        weather_report="""Location: Brisbane, QLD. Coordinates: -27.47, 153.02.
        Date: 2025-03-07. Events: Thunderstorm, Wind Gust 85km/h.
        Has Thunderstorm: True. Has Strong Wind: True.""",
        decision="APPROVED",
        reasoning="Both thunderstorm and strong wind confirmed in valid Australian location within 90 days",
        confidence="High"
    ).with_inputs("weather_report"),
    
    dspy.Example(
        weather_report="""Location: Sydney, NSW. Coordinates: -33.87, 151.21.
        Date: 2025-03-07. Events: Light Rain.
        Has Thunderstorm: False. Has Strong Wind: False.""",
        decision="DENIED",
        reasoning="No severe weather events detected - only light rain",
        confidence="High"
    ).with_inputs("weather_report"),
    
    dspy.Example(
        weather_report="""Location: Melbourne, VIC. Coordinates: -37.81, 144.96.
        Date: 2025-03-07. Events: Thunderstorm.
        Has Thunderstorm: True. Has Strong Wind: False.""",
        decision="REVIEW",
        reasoning="Only one severe weather type detected - needs manual review",
        confidence="Medium"
    ).with_inputs("weather_report"),
]

print(f"Created {len(training_examples)} training examples")

In [None]:
# Define a simple metric for evaluation
def eligibility_metric(example, prediction, trace=None):
    """Check if the prediction matches the expected decision."""
    # Normalize decisions for comparison
    expected = example.decision.upper().strip()
    predicted = prediction.decision.upper().strip()
    
    return expected == predicted

# Test current performance
from dspy.evaluate import Evaluate

evaluator = Evaluate(
    devset=training_examples,
    metric=eligibility_metric,
    num_threads=1,
    display_progress=True
)

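# Evaluate baseline
# Newer DSPy releases return a result object with a .score attribute; older ones return a float
baseline_result = evaluator(eligibility_module)
baseline_score = getattr(baseline_result, "score", baseline_result)
print(f"Baseline accuracy: {baseline_score}%")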
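The exact-match metric above counts a prediction such as `Approved.` or `**DENIED** - no events` as wrong even though the decision is correct. A more lenient variant is sketched below. `normalize_decision` and `lenient_eligibility_metric` are hypothetical names, and `SimpleNamespace` objects stand in for DSPy examples and predictions so the sketch runs without DSPy.

```python
from types import SimpleNamespace

VALID_DECISIONS = ("APPROVED", "REVIEW", "DENIED")


def normalize_decision(raw: str) -> str:
    """Map free-form text to one of the valid labels, or 'UNKNOWN'."""
    cleaned = raw.upper().strip(" \t\n.*[]:")
    for label in VALID_DECISIONS:
        if cleaned.startswith(label):
            return label
    return "UNKNOWN"


def lenient_eligibility_metric(example, prediction, trace=None) -> bool:
    """Compare decisions after normalization; unknown labels never match."""
    expected = normalize_decision(example.decision)
    predicted = normalize_decision(prediction.decision)
    return expected != "UNKNOWN" and expected == predicted


example = SimpleNamespace(decision="APPROVED")
prediction = SimpleNamespace(decision="Approved.")
print(lenient_eligibility_metric(example, prediction))
```

The function keeps the same `(example, prediction, trace=None)` signature as `eligibility_metric`, so it could be swapped in without other changes.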

In [None]:
# Optimize with BootstrapFewShot (simple optimizer)
from dspy.teleprompt import BootstrapFewShot

optimizer = BootstrapFewShot(
    metric=eligibility_metric,
    max_bootstrapped_demos=2,
    max_labeled_demos=2
)

# Compile the optimized module
optimized_eligibility = optimizer.compile(
    eligibility_module,
    trainset=training_examples
)

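# Evaluate optimized version
# Newer DSPy releases return a result object with a .score attribute; older ones return a float
optimized_result = evaluator(optimized_eligibility)
optimized_score = getattr(optimized_result, "score", optimized_result)
print(f"Optimized accuracy: {optimized_score}%")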

### 10.2 Export Optimized Prompt to AutoGen

Extract the optimized prompt and use it in AutoGen.

In [None]:
def extract_dspy_prompt(module) -> str:
    """
    Extract the prompt from a DSPy module for use in other frameworks.
    """
    # ChainOfThought keeps its signature and demos on an inner predictor;
    # fall back to the module itself for a plain dspy.Predict
    predictor = getattr(module, 'predict', module)
    sig = predictor.signature
    
    # Build prompt from signature
    prompt_parts = []
    
    # Add docstring as instructions
    if sig.__doc__:
        prompt_parts.append(f"Task: {sig.__doc__}\n")
    
    # Add any demos (few-shot examples) attached by the optimizer
    demos = getattr(predictor, 'demos', None) or []
    if demos:
        prompt_parts.append("## Examples\n")
        for i, demo in enumerate(demos, 1):
            prompt_parts.append(f"### Example {i}")
            for field, value in demo.items():
                prompt_parts.append(f"{field}: {value}")
            prompt_parts.append("")
    
    # Add input/output field descriptions
    prompt_parts.append("## Input Fields")
    for name, field in sig.input_fields.items():
        desc = field.json_schema_extra.get('desc', '') if hasattr(field, 'json_schema_extra') else ''
        prompt_parts.append(f"- {name}: {desc}")
    
    prompt_parts.append("\n## Output Fields")
    for name, field in sig.output_fields.items():
        desc = field.json_schema_extra.get('desc', '') if hasattr(field, 'json_schema_extra') else ''
        prompt_parts.append(f"- {name}: {desc}")
    
    return "\n".join(prompt_parts)

# Extract the optimized prompt
optimized_prompt = extract_dspy_prompt(optimized_eligibility)
print("Extracted prompt for AutoGen:")
print("=" * 60)
print(optimized_prompt)

In [None]:
# Create an optimized AutoGen agent with the DSPy-enhanced prompt
enhanced_system_message = f"""You are a Claims Eligibility Agent optimized with DSPy.

{optimized_prompt}

## Business Rules

**APPROVED**: Both thunderstorms AND strong winds detected in valid Australian location
**REVIEW**: Only one severe weather type detected
**DENIED**: No severe weather or invalid location/date

Always end your response with:
DECISION: [APPROVED/REVIEW/DENIED]
"""

# Create the enhanced agent
enhanced_eligibility_agent = AssistantAgent(
    name="EnhancedClaimsAgent",
    model_client=model_client,
    tools=[],
    system_message=enhanced_system_message
)

print("Enhanced eligibility agent created with DSPy-optimized prompt")

---

## 11. MLflow Integration for Experiment Tracking

MLflow helps you track experiments, compare agent performance, and manage model versions.

In [None]:
# Install MLFlow if needed
# !pip install mlflow

In [None]:
import mlflow
import json
from datetime import datetime

# Set up MLFlow experiment
mlflow.set_experiment("autogen-weather-claims")

print(f"MLFlow tracking URI: {mlflow.get_tracking_uri()}")
print(f"Experiment: autogen-weather-claims")

In [None]:
async def run_tracked_pipeline(location: str, date: str, run_name: str = None):
    """
    Run the claims pipeline with MLFlow tracking.
    """
    
    run_name = run_name or f"claim_{location.split(',')[0]}_{date}"
    
    with mlflow.start_run(run_name=run_name):
        # Log parameters
        mlflow.log_params({
            "framework": "autogen",
            "autogen_version": "0.4",
            "model": model_name,
            "location": location,
            "date": date,
            "orchestration_type": "RoundRobinGroupChat"
        })
        
        # Track timing
        start_time = datetime.now()
        
        try:
            # Create team for this run
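            # sources= restricts the check to the eligibility agent, so the task
            # text (which also contains "DECISION:") is not matched
            termination = TextMentionTermination(
                "DECISION:", sources=["EnhancedClaimsAgent"]
            ) | MaxMessageTermination(max_messages=10)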
            team = RoundRobinGroupChat(
                participants=[weather_agent, enhanced_eligibility_agent],
                termination_condition=termination
            )
            
            task = f"""Process insurance claim:
            Location: {location}
            Date: {date}
            End with 'DECISION: [APPROVED/REVIEW/DENIED]'"""
            
            # Run and collect messages
            messages = []
            final_content = ""
            async for message in team.run_stream(task=task):
                if hasattr(message, 'content'):
                    source = getattr(message, 'source', 'unknown')
                    if source != "user":
                        # Keep the untruncated text so the decision line is not cut off
                        final_content = str(message.content)
                    messages.append({
                        "source": source,
                        "content": str(message.content)[:500]  # Truncate long messages
                    })
            
            end_time = datetime.now()
            duration = (end_time - start_time).total_seconds()
            
            # Extract the decision that follows the last "DECISION:" marker
            _, marker, tail = final_content.upper().rpartition("DECISION:")
            decision = "UNKNOWN"
            for d in ["APPROVED", "DENIED", "REVIEW"]:
                if marker and tail.lstrip(" [*").startswith(d):
                    decision = d
                    break
            
            # Log metrics
            mlflow.log_metrics({
                "duration_seconds": duration,
                "message_count": len(messages),
                "success": 1
            })
            
            # Log artifacts
            mlflow.log_dict(messages, "conversation.json")
            
            # Log tags
            mlflow.set_tags({
                "decision": decision,
                "status": "success"
            })
            
            print(f"\nRun logged to MLFlow")
            print(f"Decision: {decision}")
            print(f"Duration: {duration:.2f}s")
            print(f"Messages: {len(messages)}")
            
            return {
                "decision": decision,
                "duration": duration,
                "messages": messages
            }
            
        except Exception as e:
            mlflow.log_metrics({"success": 0})
            mlflow.set_tags({"status": "error", "error": str(e)[:100]})
            raise

# Run with tracking
result = await run_tracked_pipeline("Brisbane, QLD", "2025-03-07")

In [None]:
# Log DSPy optimization results to MLFlow
with mlflow.start_run(run_name="dspy_optimization"):
    mlflow.log_params({
        "optimizer": "BootstrapFewShot",
        "training_examples": len(training_examples),
        "max_bootstrapped_demos": 2
    })
    
    mlflow.log_metrics({
        "baseline_accuracy": baseline_score,
        "optimized_accuracy": optimized_score,
        "improvement": optimized_score - baseline_score
    })
    
    # Save the optimized prompt as artifact
    mlflow.log_text(optimized_prompt, "optimized_prompt.txt")
    mlflow.log_text(enhanced_system_message, "enhanced_system_message.txt")
    
    print("DSPy optimization results logged to MLFlow")

In [None]:
# View experiment results
experiment = mlflow.get_experiment_by_name("autogen-weather-claims")
runs = mlflow.search_runs(experiment_ids=[experiment.experiment_id])

print("\nExperiment Runs:")
print("=" * 60)
if len(runs) > 0:
    display_cols = ['run_id', 'status', 'start_time']
    # Add metric columns if they exist
    for col in runs.columns:
        if col.startswith('metrics.') or col.startswith('params.') or col.startswith('tags.'):
            display_cols.append(col)
    print(runs[[c for c in display_cols if c in runs.columns]].to_string())
else:
    print("No runs found")
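For a quick comparison that does not depend on the MLflow UI, the dictionaries returned by `run_tracked_pipeline` can also be summarized directly. A small sketch follows, in which `summarize_runs` is a hypothetical helper and the sample values are made up purely for illustration:

```python
from collections import Counter
from statistics import mean


def summarize_runs(results: list[dict]) -> dict:
    """Count decisions and average the duration across pipeline results."""
    decisions = Counter(r["decision"] for r in results)
    durations = [r["duration"] for r in results]
    return {
        "runs": len(results),
        "decisions": dict(decisions),
        "mean_duration": mean(durations) if durations else 0.0,
    }


# Made-up results shaped like the return value of run_tracked_pipeline.
sample_results = [
    {"decision": "APPROVED", "duration": 2.0, "messages": []},
    {"decision": "DENIED", "duration": 4.0, "messages": []},
    {"decision": "APPROVED", "duration": 3.0, "messages": []},
]
print(summarize_runs(sample_results))
```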

## 12. Summary & Key Takeaways

### What We Covered

1. **AutoGen 0.4 Architecture**: Agents, Model Clients, Tools, and Group Chats
2. **Breaking Changes**: Complete rewrite from 0.2.x - old code won't work
3. **Tool Definition**: Simple async functions with type hints
4. **Multi-Agent Orchestration**: RoundRobinGroupChat and SelectorGroupChat
5. **DSPy Integration**: Optimize prompts and export to AutoGen
6. **MLflow Tracking**: Log experiments, metrics, and artifacts

### AutoGen Strengths

- Powerful multi-agent orchestration
- Clean async-first design
- Flexible conversation patterns
- Good for complex agent interactions

### AutoGen Challenges

- Steep learning curve (especially migrating from 0.2.x)
- Less type safety than Pydantic AI
- Documentation still catching up to 0.4 changes

### For Insurance Teams

- **Consider AutoGen if**: You need complex multi-agent workflows with conversation
- **Consider alternatives if**: You need maximum type safety (Pydantic AI) or simpler workflows

### Next Steps

1. Experiment with different GroupChat types
2. Add more training examples to DSPy
3. Compare results across different models
4. Explore AutoGen's human-in-the-loop features

In [None]:
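# Cleanup - reset agents for fresh runs
# Agents expose on_reset(cancellation_token); reset() is a team-level method
from autogen_core import CancellationToken

await weather_agent.on_reset(CancellationToken())
await eligibility_agent.on_reset(CancellationToken())
await enhanced_eligibility_agent.on_reset(CancellationToken())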

print("Agents reset. Tutorial complete!")