# Pydantic-AI Prompt Engineering Experiment

## Overview

This notebook demonstrates the Pydantic-AI-based symptom diagnosis classification approach:

- **Framework**: Pydantic-AI with Agent-based prompting
- **Approach**: Zero-shot prompt engineering (no training required)
- **Model**: An agent with structured output validation

## Key Differences from DSPy and LangChain

Unlike DSPy, which requires training/optimization:
- Pydantic-AI uses agent-based prompts (no optimizer selection needed)
- Built-in output validation with automatic retries via `ModelRetry`
- "Tuning" validates the agent configuration on train/val sets
- Evaluation recreates the agent from configuration

**Advantages over LangChain**:
- Better type safety with full Pydantic validation
- Simpler validation logic with built-in `ModelRetry`
- Cleaner system prompts via decorators
- More reliable structured output with native mode
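
The validate-and-retry idea behind `ModelRetry` can be illustrated without the library. The sketch below is plain Python, not Pydantic-AI's API: `RetryRequested`, `validate`, `classify`, `fake_model`, and the label set are all hypothetical names made up for this example. It only shows the control flow in which a rejected output is fed back to the model as feedback for the next attempt.

```python
from dataclasses import dataclass

# Hypothetical label set; the real diagnosis labels are not listed in this notebook.
ALLOWED_LABELS = {"allergy", "migraine", "common cold"}


class RetryRequested(Exception):
    """Stand-in for a retry signal such as Pydantic-AI's ModelRetry."""


@dataclass
class Diagnosis:
    label: str


def validate(raw: str) -> Diagnosis:
    """Normalize the raw model text and reject labels outside the allowed set."""
    label = raw.strip().lower()
    if label not in ALLOWED_LABELS:
        raise RetryRequested(
            f"'{raw}' is not a valid label; choose one of {sorted(ALLOWED_LABELS)}"
        )
    return Diagnosis(label=label)


def classify(model, symptoms: str, max_retries: int = 2) -> Diagnosis:
    """Call the model, feeding validation errors back until the output validates."""
    feedback = None
    for _ in range(max_retries + 1):
        raw = model(symptoms, feedback)
        try:
            return validate(raw)
        except RetryRequested as exc:
            feedback = str(exc)  # the next attempt sees why the last one failed
    raise RuntimeError(f"no valid output after {max_retries} retries")


def fake_model(symptoms, feedback):
    """Toy model: answers badly first, then corrects itself once it gets feedback."""
    return "Migraine " if feedback else "probably a headache?"


result = classify(fake_model, "throbbing pain on one side of the head")
print(result)
```

In this toy run the first answer fails validation, the error message is passed back, and the second answer is accepted. A framework with built-in retries handles this loop for you, which is the simplification the list above refers to.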

## Experiment Configuration

- **Project**: 9-langchain
- **Experiment**: pydantic-ai-pipeline
- **LLM Model**: ollama/qwen3:30b
- **Dataset**: 15 train examples, 5 validation examples, 10 test examples

## MLflow Tracking

Experiments are logged to MLflow with:
- Metrics (train/validation accuracy)
- Parameters (model config, agent details)
- Artifacts (prediction samples)
- Execution traces with token usage (via autolog)
- Model registration in MLflow registry
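
As a rough illustration of the accuracy metric, the sketch below assumes accuracy means the fraction of exact label matches. The `accuracy` helper and the sample labels are hypothetical and are not taken from the project code or its dataset.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that exactly match the reference labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        return 0.0
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Made-up predictions for a five-example validation split.
val_predictions = ["allergy", "migraine", "common cold", "allergy", "migraine"]
val_labels = ["allergy", "migraine", "common cold", "migraine", "migraine"]

val_accuracy = accuracy(val_predictions, val_labels)
print(f"Validation Accuracy: {val_accuracy:.4f}")
```

With only a handful of validation examples, each example shifts the score by a large step (0.2 for five examples), so small differences between runs should be read with caution.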

In [None]:
"""Import required modules for Pydantic-AI tuning experiments."""
import pandas as pd
from symptom_diagnosis_explorer.commands.classify.tune import (
    TuneCommand,
    TuneRequest,
)
from symptom_diagnosis_explorer.commands.classify.evaluate import (
    EvaluateCommand,
    EvaluateRequest,
)
from symptom_diagnosis_explorer.models.model_development import FrameworkType

In [None]:
import nest_asyncio

# Apply nest_asyncio to allow nested event loops
# This fixes "RuntimeError: This event loop is already running" errors
# when using Pydantic-AI agents in Jupyter notebooks
nest_asyncio.apply()
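
The error that `nest_asyncio` works around can be reproduced with the standard library alone. The sketch below is meant for a standalone Python script rather than this notebook, since the result changes once `nest_asyncio.apply()` has run. It imitates a notebook cell by running code inside an active event loop and then trying to drive that same loop again. The function names are made up for the example, and it is an assumption that a synchronous agent call fails in the same way.

```python
import asyncio


async def answer():
    return 42


async def notebook_cell():
    """Runs inside an already-running loop, much as a Jupyter cell does."""
    loop = asyncio.get_running_loop()
    coro = answer()
    try:
        # Blocking call on a loop that is already running.
        loop.run_until_complete(coro)
    except RuntimeError as exc:
        return str(exc)
    finally:
        coro.close()  # avoid a "coroutine was never awaited" warning
    return "no error"


message = asyncio.run(notebook_cell())
print(message)
```

On a stock asyncio loop, `message` should contain the "already running" text quoted in the comment above.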

In [None]:
"""Configuration for Pydantic-AI experiment."""

# Experiment configuration
PROJECT = "9-langchain"
EXPERIMENT_NAME = "pydantic-ai-pipeline"
FULL_EXPERIMENT_NAME = f"/symptom-diagnosis-explorer/{PROJECT}/{EXPERIMENT_NAME}"

# MLflow configuration
MLFLOW_TRACKING_URI = "http://localhost:5001"

# Model configuration
LM_MODEL = "ollama/qwen3:30b"
MODEL_NAME = "symptom-classifier-pydantic-ai"

# Dataset configuration
TRAIN_SIZE = 15
VAL_SIZE = 5
TEST_SIZE = 10

print("Configuration set")
print(f"Experiment: {FULL_EXPERIMENT_NAME}")
print(f"MLflow URI: {MLFLOW_TRACKING_URI}")
print(f"LLM: {LM_MODEL}")
print(f"Dataset: {TRAIN_SIZE} train, {VAL_SIZE} validation, {TEST_SIZE} test")
print("Framework: Pydantic-AI (agent-based prompting, no training required)")

In [None]:
"""Run Pydantic-AI agent creation and validation experiment."""

print("=" * 80)
print("EXPERIMENT: Pydantic-AI Agent-Based Prompting")
print("=" * 80)

# Create request with Pydantic-AI framework configuration
tune_request = TuneRequest(
    framework=FrameworkType.PYDANTIC_AI,
    train_size=TRAIN_SIZE,
    val_size=VAL_SIZE,
    model_name=MODEL_NAME,
    experiment_name=FULL_EXPERIMENT_NAME,
    experiment_project=PROJECT,
    lm_model=LM_MODEL,
    mlflow_tracking_uri=MLFLOW_TRACKING_URI,
    pydantic_ai_num_few_shot_examples=0,  # Zero-shot
)

# Execute tuning (creates agent and validates on train/val sets)
print("\nStarting Pydantic-AI agent creation...")
print("Note: Pydantic-AI doesn't require training - this validates the agent.")
tune_command = TuneCommand(tune_request)
tune_response = tune_command.execute()

print("\nPydantic-AI agent validation complete!")
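
To make the zero-shot setting concrete, the sketch below shows one way a few-shot count could shape a prompt. This is an assumption about what `pydantic_ai_num_few_shot_examples` controls, not the project's actual implementation. The `build_prompt` helper, the instruction text, and the example data are all hypothetical.

```python
def build_prompt(symptoms, examples, num_few_shot_examples=0):
    """Hypothetical helper: prepend up to N labeled examples to the query."""
    parts = ["Classify the symptoms into a diagnosis."]
    for ex_symptoms, ex_label in examples[:num_few_shot_examples]:
        parts.append(f"Symptoms: {ex_symptoms}\nDiagnosis: {ex_label}")
    # The query itself always comes last, with the label left open.
    parts.append(f"Symptoms: {symptoms}\nDiagnosis:")
    return "\n\n".join(parts)


# Made-up labeled example, used only when num_few_shot_examples > 0.
examples = [("sneezing and itchy eyes", "allergy")]

zero_shot = build_prompt("throbbing headache", examples, num_few_shot_examples=0)
few_shot = build_prompt("throbbing headache", examples, num_few_shot_examples=1)

print(zero_shot)
print("---")
print(few_shot)
```

With a count of 0 the prompt holds only the instruction and the query, so the train split serves purely for measuring accuracy rather than for supplying demonstrations.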

In [None]:
"""Display Pydantic-AI tuning results."""

print("\nPYDANTIC-AI AGENT VALIDATION RESULTS")
print("-" * 80)

# Display metrics
print("\nMetrics:")
print(f"  Train Accuracy:      {tune_response.metrics.train_accuracy:.4f}")
print(f"  Validation Accuracy: {tune_response.metrics.validation_accuracy:.4f}")
print(f"  Train Examples:      {tune_response.metrics.num_train_examples}")
print(f"  Validation Examples: {tune_response.metrics.num_val_examples}")

# Display model info
print("\nModel Registry:")
print(f"  Name:    {tune_response.model_info.name}")
print(f"  Version: {tune_response.model_info.version}")
print(f"  Run ID:  {tune_response.run_id}")

print("\nNote: Pydantic-AI agents use configuration, not learned parameters.")
print("The model registration stores metadata and allows version tracking.")
print("\nCheck MLflow for execution traces with token usage metrics!")

In [None]:
"""Evaluate Pydantic-AI agent on test set."""

print("\n" + "=" * 80)
print("TEST SET EVALUATION")
print("=" * 80)

print(f"\nEvaluating Pydantic-AI agent: {MODEL_NAME}")
print(f"  Validation Accuracy: {tune_response.metrics.validation_accuracy:.4f}")

# Evaluate on test examples
print(f"\nRunning evaluation on first {TEST_SIZE} test examples...")

# Create evaluation request
eval_request = EvaluateRequest(
    framework=FrameworkType.PYDANTIC_AI,
    model_name=MODEL_NAME,
    model_version=None,  # Use latest version
    split="test",
    eval_size=TEST_SIZE,
    experiment_name=FULL_EXPERIMENT_NAME,
    experiment_project=PROJECT,
    mlflow_tracking_uri=MLFLOW_TRACKING_URI,
)

# Execute evaluation
eval_command = EvaluateCommand()
eval_response = eval_command.execute(eval_request)

# Display results
print("\n" + "-" * 80)
print("TEST SET RESULTS")
print("-" * 80)
print("\nMetrics:")
print(f"  Test Accuracy:  {eval_response.accuracy:.4f}")
print(f"  Test Examples:  {eval_response.num_examples}")
print(f"  Split:          {eval_response.split}")
print(f"  Run ID:         {eval_response.run_id}")

print("\n" + "=" * 80)
print("Evaluation complete! Check MLflow for detailed prediction artifacts.")
print("=" * 80)