# Eval Runner and Custom Evaluators

The eval runner discovers `eval_*.py` files, pairs datasets with target functions,
and produces structured reports. This notebook demonstrates discovery, execution,
and the custom evaluators provided by the core library.

In [None]:
from pathlib import Path

from agentic_patterns.core.evals import (
    discover_datasets,
    find_eval_files,
    run_evaluation,
    PrintOptions,
)

## Discovery

The runner scans a directory for `eval_*.py` files, then inspects each file for
`dataset_*` objects, `target_*` functions, and optional `scorer_*` functions,
matching them by name prefix.
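
To make the naming convention concrete, here is a minimal standard-library sketch of prefix-based discovery. It is not the implementation behind `find_eval_files` or `discover_datasets`; the helper names (`find_files_sketch`, `load_module_sketch`, `collect_by_prefix_sketch`) are hypothetical, and the non-recursive glob is an assumption.

```python
import importlib.util
from pathlib import Path
from types import ModuleType


def find_files_sketch(evals_dir: Path) -> list[Path]:
    """Return eval_*.py files directly under evals_dir, sorted for stable ordering."""
    # Assumption: a non-recursive scan; the real runner may differ.
    return sorted(evals_dir.glob("eval_*.py"))


def load_module_sketch(path: Path) -> ModuleType:
    """Import a file by path without requiring it to be on sys.path."""
    spec = importlib.util.spec_from_file_location(path.stem, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def collect_by_prefix_sketch(module: ModuleType) -> dict[str, dict[str, object]]:
    """Group module attributes by the dataset_/target_/scorer_ prefixes."""
    found: dict[str, dict[str, object]] = {"dataset_": {}, "target_": {}, "scorer_": {}}
    for attr_name, value in vars(module).items():
        for prefix, bucket in found.items():
            if attr_name.startswith(prefix):
                bucket[attr_name] = value
    return found
```

Under this sketch, a file named `eval_demo.py` that defines `dataset_x` and `target_x` would yield one entry in each of the `dataset_` and `target_` buckets and an empty `scorer_` bucket.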

In [None]:
evals_dir = Path(".")
eval_files = find_eval_files(evals_dir)
print(f"Found eval files: {[f.name for f in eval_files]}")

datasets = discover_datasets(eval_files, verbose=True)

## Running an Evaluation

Each discovered dataset is run against its target function.
`run_evaluation` returns the dataset name, a success flag, and the report,
which shows per-case results and aggregate scores.

In [None]:
print_options = PrintOptions(
    include_input=True,
    include_output=True,
    include_reasons=True,
)

for ds in datasets:
    name, success, report = await run_evaluation(ds, print_options, verbose=True)
    print(f"\n{name}: {'PASSED' if success else 'FAILED'}")

## Custom Evaluators

The core library provides four agent-specific evaluators that go beyond
basic string or type checks. They can be used in any dataset alongside
the built-in pydantic-evals evaluators. The cells that follow use two of
them, `OutputContainsJson` and `OutputMatchesSchema`.
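
As a rough illustration of the kind of check a JSON-oriented evaluator performs, the sketch below uses only the standard library. It is an assumption about the general idea, not the library's implementation: the real `OutputContainsJson` and `OutputMatchesSchema` may, for example, extract JSON embedded in surrounding text or validate through Pydantic. The function names are hypothetical.

```python
import json


def contains_json_sketch(output: str) -> bool:
    """True if the whole output parses as a JSON object or array."""
    try:
        parsed = json.loads(output)
    except (json.JSONDecodeError, TypeError):
        return False
    return isinstance(parsed, (dict, list))


def matches_fields_sketch(output: str, required: dict[str, type]) -> bool:
    """True if output is a JSON object holding each required key with the given type."""
    if not contains_json_sketch(output):
        return False
    parsed = json.loads(output)
    if not isinstance(parsed, dict):
        return False
    return all(
        isinstance(parsed.get(key), expected) for key, expected in required.items()
    )
```

With this sketch, `matches_fields_sketch('{"city": "Paris", "country": "France"}', {"city": str, "country": str})` passes, while the string `"not json at all"` fails both checks.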

In [None]:
from pydantic import BaseModel
from pydantic_evals import Case, Dataset

from agentic_patterns.core.evals import (
    OutputContainsJson,
    OutputMatchesSchema,
)

In [None]:
class CityInfo(BaseModel):
    city: str
    country: str


dataset_schema = Dataset(
    cases=[
        Case(
            name="valid_json",
            inputs="valid",
            expected_output='{"city": "Paris", "country": "France"}',
        ),
        # Deliberately malformed so the report includes a failing case.
        Case(
            name="invalid_json",
            inputs="invalid",
            expected_output="not json at all",
        ),
    ],
    evaluators=[
        OutputContainsJson(),
        OutputMatchesSchema(schema=CityInfo),
    ],
)


async def passthrough(inp: str) -> str:
    """Return the expected output as-is for demonstration."""
    # In a real eval, this would call an agent
    case = [c for c in dataset_schema.cases if c.inputs == inp][0]
    return case.expected_output


report = await dataset_schema.evaluate(passthrough)
report.print(include_input=True, include_output=True, include_reasons=True)

## CLI Usage

The same discovery and execution logic is available from the command line:

```bash
# Run all eval_*.py files in a directory
python -m agentic_patterns.core.evals --evals-dir agentic_patterns/examples/evals --verbose

# Filter to a specific dataset
python -m agentic_patterns.core.evals --evals-dir agentic_patterns/examples/evals --filter capitals
```

The CLI returns a non-zero exit code when any evaluation fails, making it suitable as a CI gate.
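
The CI-gate behaviour can be pictured with a small sketch that maps per-dataset pass/fail flags to a process exit code. The function name `exit_code_sketch` and the specific value `1` are assumptions for illustration; the source only states that the exit code is non-zero on failure.

```python
def exit_code_sketch(results: dict[str, bool]) -> int:
    """Return 0 only if every named evaluation passed, 1 otherwise."""
    failed = [name for name, ok in results.items() if not ok]
    for name in failed:
        print(f"FAILED: {name}")
    # Assumption: any failure maps to exit code 1.
    return 1 if failed else 0
```

A wrapper script could pass the result to `sys.exit(...)` so that a CI job stops when any dataset fails.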