# 03 - PydanticAI report + tools

This notebook reads `artifacts/last_result.json` and generates a typed report.

- With an API key configured, it uses an LLM via `pydantic-ai`.
- Without an API key, it still demonstrates **typed outputs + tool calls** using `TestModel` (offline).

## API keys

Any of these work:

```bash
export OPENAI_API_KEY="..."
# or
export ANTHROPIC_API_KEY="..."

# or using the app prefix (also supports .env)
export CLEANLAB_DEMO_OPENAI_API_KEY="..."
```
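
To see which of these keys the current process can actually see, a small check along the following lines may help. It is only a sketch: `detect_provider` is a hypothetical helper (not part of `cleanlab_demo`), the lookup order is an assumption, and it reads the process environment only, so a key defined solely in `.env` will not show up here.

```python
import os

# Variable names come from the list above; the lookup order is an assumption.
KEY_VARS = {
    "OPENAI_API_KEY": "openai",
    "ANTHROPIC_API_KEY": "anthropic",
    "CLEANLAB_DEMO_OPENAI_API_KEY": "openai",
}


def detect_provider(env=None):
    """Return (variable, provider) for the first non-empty key, else None."""
    env = os.environ if env is None else env
    for var, provider in KEY_VARS.items():
        if env.get(var):  # empty strings count as "not set"
            return var, provider
    return None


found = detect_provider()
if found is None:
    print("No API key found: the notebook will run in offline (TestModel) mode.")
else:
    print(f"{found[1]} key found in {found[0]}")
```

Passing a plain dict as `env` makes the helper easy to try without touching the real environment.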


In [1]:
from pathlib import Path
import sys

cwd = Path.cwd()
if (cwd / "src").exists():
    sys.path.insert(0, str(cwd / "src"))
elif (cwd.parent / "src").exists():
    sys.path.insert(0, str(cwd.parent / "src"))


## 1) Ensure we have a result file

Run the `01_quickstart_cleanlab.ipynb` notebook, use the UI, or run:

```bash
cleanlab-demo run --dataset adult_income --max-rows 12000 --save-json artifacts/last_result.json
```


In [2]:
from cleanlab_demo.settings import settings

settings.ensure_dirs()
path = settings.artifacts_dir / "last_result.json"
path.exists(), path


(True,
 PosixPath('/Users/rezami/PycharmProjects/Cleanlab_demo/artifacts/last_result.json'))
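
Beyond checking that the file exists, it can be useful to confirm that it parses and contains the sections read later in this notebook. The sketch below assumes only the keys used further down (`variants` and `cleanlab_summary.datalab_issue_summary`); `load_result` and `peek` are hypothetical helpers, not part of `cleanlab_demo`.

```python
import json
from pathlib import Path
from typing import Any


def load_result(path: Path) -> dict[str, Any]:
    """Load the saved result, failing with a hint if the file is missing."""
    if not path.exists():
        raise FileNotFoundError(
            f"{path} not found; run the quickstart notebook or the CLI first"
        )
    return json.loads(path.read_text(encoding="utf-8"))


def peek(result: dict[str, Any]) -> dict[str, int]:
    """Count the rows in the two sections this notebook reads later."""
    summary = result.get("cleanlab_summary") or {}  # tolerate a null summary
    return {
        "variants": len(result.get("variants", [])),
        "datalab_issue_summary": len(summary.get("datalab_issue_summary", [])),
    }
```

In this notebook the call would be `peek(load_result(path))`, using the `path` defined in the cell above.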

## 2) Generate report (deterministic fallback + optional AI)

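`generate_ai_report(..., use_ai=True)` uses an LLM when an API key is available; otherwise it falls back to the deterministic report and prepends a hint to set `OPENAI_API_KEY` or `ANTHROPIC_API_KEY` (visible in the second output below).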

In [3]:
from cleanlab_demo.ai.report import generate_ai_report

print(generate_ai_report(path, use_ai=False))


{
  "headline": "california_housing / random_forest (regression)",
  "summary": "Found 0 potential label issues. Baseline primary metric: 0.8076667616499951. Best variant: baseline (0.8076667616499951).",
  "key_metrics": {
    "r2": 0.8076667616499951,
    "mae": 0.32582848840116263,
    "rmse": 0.5020310770739131
  },
  "recommended_next_steps": [
    "Start with outliers and near-duplicates detected by Datalab, then retrain and compare.",
    "Inspect top-ranked label issues and verify labels manually.",
    "Try training after removing/relabelling the worst issues and compare metrics.",
    "Compare at least 2 different models (linear + tree/boosting) to validate robustness.",
    "Consider feature engineering or trying more complex models - AUC is below 0.7.",
    "Review 1179 potential outliers flagged by Datalab.",
    "Review 175 potential near-duplicates flagged by Datalab."
  ]
}


In [4]:
# Uses LLM if keys are configured, otherwise falls back
print(generate_ai_report(path, use_ai=True))


{
  "headline": "california_housing / random_forest (regression)",
  "summary": "Found 0 potential label issues. Baseline primary metric: 0.8076667616499951. Best variant: baseline (0.8076667616499951).",
  "key_metrics": {
    "r2": 0.8076667616499951,
    "mae": 0.32582848840116263,
    "rmse": 0.5020310770739131
  },
  "recommended_next_steps": [
    "Set `OPENAI_API_KEY` or `ANTHROPIC_API_KEY` to enable the LLM-backed report.",
    "Start with outliers and near-duplicates detected by Datalab, then retrain and compare.",
    "Inspect top-ranked label issues and verify labels manually.",
    "Try training after removing/relabelling the worst issues and compare metrics.",
    "Compare at least 2 different models (linear + tree/boosting) to validate robustness.",
    "Consider feature engineering or trying more complex models - AUC is below 0.7.",
    "Review 1179 potential outliers flagged by Datalab.",
    "Review 175 potential near-duplicates flagged by Datalab."
  ]
}
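
The two outputs above differ only by the extra hint at the top of `recommended_next_steps`. The sketch below illustrates that fallback pattern in isolation. It is not the actual implementation of `generate_ai_report`: `report_with_fallback`, `deterministic_report`, and the toy `headline`/`steps` fields are all hypothetical.

```python
import os
from typing import Any, Callable, Optional

HINT = "Set `OPENAI_API_KEY` or `ANTHROPIC_API_KEY` to enable the LLM-backed report."


def deterministic_report(result: dict[str, Any]) -> dict[str, Any]:
    """Toy rule-based report; the input fields are invented for illustration."""
    return {
        "headline": result.get("headline", "untitled"),
        "recommended_next_steps": list(result.get("steps", [])),  # copy, don't alias
    }


def report_with_fallback(
    result: dict[str, Any],
    use_ai: bool,
    llm_report: Optional[Callable[[dict[str, Any]], dict[str, Any]]] = None,
    env: Optional[dict[str, str]] = None,
) -> dict[str, Any]:
    """Use the LLM path when requested and possible, else the rule-based one."""
    env = dict(os.environ) if env is None else env
    has_key = bool(env.get("OPENAI_API_KEY") or env.get("ANTHROPIC_API_KEY"))
    if use_ai and has_key and llm_report is not None:
        return llm_report(result)
    report = deterministic_report(result)
    if use_ai and not has_key:
        # AI was requested but no key was found: tell the reader how to enable it.
        report["recommended_next_steps"].insert(0, HINT)
    return report


demo = {"headline": "demo", "steps": ["Inspect top-ranked label issues."]}
print(report_with_fallback(demo, use_ai=True, env={}))
```

Keeping the deterministic path as the single fallback means the report always has the same shape, whether or not a key is configured.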


## 3) Direct `pydantic-ai` usage (typed output + tools)

This demonstrates:
- typed output (`MiniAnswer`)
- tools (Python functions exposed to the model)
- tool call trace (we list which tools were called)


In [5]:
import json
import os
from dataclasses import dataclass
from typing import Any

from pydantic import BaseModel, Field

from pydantic_ai import Agent, RunContext
from pydantic_ai.models.test import TestModel


class MiniAnswer(BaseModel):
    headline: str
    recommendations: list[str] = Field(default_factory=list)


@dataclass
class Deps:
    result: dict[str, Any]


def select_model() -> object:
    if os.getenv("OPENAI_API_KEY"):
        return "openai:gpt-4o-mini"
    if os.getenv("ANTHROPIC_API_KEY"):
        return "anthropic:claude-3-haiku-20240307"
    # Offline demo: still shows typed outputs + tools
    return TestModel(
        custom_output_args={
            "headline": "Offline pydantic-ai demo (TestModel)",
            "recommendations": [
                "Set OPENAI_API_KEY or ANTHROPIC_API_KEY to use a real LLM.",
                "Compare baseline vs pruned_retrain metrics in the variants table.",
                "Check Datalab issue summary for outliers/near-duplicates and iterate.",
            ],
        }
    )


agent = Agent(
    model=select_model(),
    output_type=MiniAnswer,
    deps_type=Deps,
    system_prompt=(
        "You are an ML engineer. Use the tools to inspect the experiment result and propose next steps. "
        "Return ONLY JSON matching the schema."
    ),
)


@agent.tool
def get_variant_table(ctx: RunContext[Deps]) -> list[dict[str, Any]]:
    """Return the model variants table (baseline/pruned/etc)."""
    return ctx.deps.result.get("variants", [])


@agent.tool
def get_datalab_summary(ctx: RunContext[Deps]) -> list[dict[str, Any]]:
    """Return Datalab issue summary rows."""
    return (ctx.deps.result.get("cleanlab_summary", {}) or {}).get("datalab_issue_summary", [])


deps = Deps(result=json.loads(path.read_text(encoding="utf-8")))
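# Jupyter already runs an event loop, so `agent.run_sync(...)` fails here with
# "RuntimeError: This event loop is already running". Await the async API instead.
run = await agent.run("Generate a short headline + next steps. Call tools.", deps=deps)
run.output
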

In [None]:
# Tool call trace (which tools were called)
tool_calls = []
for msg in run.all_messages():
    for part in getattr(msg, "parts", []):
        tool_name = getattr(part, "tool_name", None)
        if tool_name:
            tool_calls.append((type(part).__name__, tool_name))
tool_calls
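
The trace loop above relies only on duck typing: any message part that carries a non-empty `tool_name` is recorded. To see what that produces without running an agent, the sketch below applies the same logic to stand-in objects. The dataclasses here are local stand-ins written for this example (they only mimic the attributes the loop reads) and `trace_tool_calls` is a hypothetical helper.

```python
from dataclasses import dataclass, field
from typing import Any


# Local stand-ins for message parts; NOT the pydantic-ai classes.
@dataclass
class ToolCallPart:
    tool_name: str


@dataclass
class ToolReturnPart:
    tool_name: str
    content: Any = None


@dataclass
class TextPart:
    content: str


@dataclass
class Message:
    parts: list = field(default_factory=list)


def trace_tool_calls(messages: list) -> list[tuple[str, str]]:
    """Collect (part type, tool name) for every part that names a tool."""
    return [
        (type(part).__name__, part.tool_name)
        for msg in messages
        for part in getattr(msg, "parts", [])
        if getattr(part, "tool_name", None)
    ]


messages = [
    Message(parts=[TextPart("thinking...")]),  # no tool_name, so it is skipped
    Message(parts=[ToolCallPart("get_variant_table")]),
    Message(parts=[ToolReturnPart("get_variant_table", content=[])]),
]
print(trace_tool_calls(messages))
```

Because both the call and its return carry a `tool_name`, a single tool invocation shows up twice in this kind of trace; filter on the part type if only the calls are of interest.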
