# 02 — Custom Task Creation

This notebook shows how to:

- Create a new task definition (YAML) compatible with MedAISure
- Validate and load the task via `EvaluationHarness`
- Run a simple local model against the new task

We'll write a small YAML file into `bench/tasks/custom_demo.yaml`.


In [None]:
from pathlib import Path
import yaml
from bench.evaluation.harness import EvaluationHarness

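tasks_dir = Path("bench/tasks")  # relative to the current working directory
tasks_dir.mkdir(parents=True, exist_ok=True)  # create the directory if it is missing
custom_path = tasks_dir / "custom_demo.yaml"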

custom_task = {
    "name": "Custom Demo QA",
    "description": "A tiny example QA task created from a notebook.",
    "task_type": "qa",
    "input_schema": {
        "type": "object",
        "properties": {"text": {"type": "string"}},
        "required": ["text"],
    },
    "output_schema": {
        "type": "object",
        "properties": {"label": {"type": "string"}},
        "required": ["label"],
    },
    "metrics": [{"name": "clinical_accuracy"}, {"name": "reasoning_quality"}],
    "dataset": [
        {
            "input": {"text": "Patient presents with fever and cough."},
            "output": {"label": "flu-like"},
        },
        {
            "input": {"text": "Complaints of chest pain during exertion."},
            "output": {"label": "cardiac-risk"},
        },
    ],
}
custom_path.write_text(yaml.safe_dump(custom_task, sort_keys=False), encoding="utf-8")
print("Wrote", custom_path)
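
Before handing the file to the harness, it can help to confirm that every dataset example satisfies the schemas declared in the task. The sketch below is a standalone check that uses only the standard library. `check_against_schema` and `check_dataset` are hypothetical helpers, not part of MedAISure, and they cover only the `required` and `type` keywords used in this demo rather than full JSON Schema.

```python
# Map the JSON Schema type names used in this demo to Python types.
_TYPES = {"string": str, "object": dict}


def check_against_schema(value, schema):
    """Return a list of problems; an empty list means the value passed.

    Hypothetical helper: handles only `required` and per-property `type`.
    """
    if not isinstance(value, dict):
        return [f"expected an object, got {type(value).__name__}"]
    problems = []
    for key in schema.get("required", []):
        if key not in value:
            problems.append(f"missing required field {key!r}")
    for key, spec in schema.get("properties", {}).items():
        expected = _TYPES.get(spec.get("type"))
        if key in value and expected is not None and not isinstance(value[key], expected):
            problems.append(f"field {key!r} should be of type {spec['type']}")
    return problems


def check_dataset(task):
    """Check each dataset example against the task's input and output schemas."""
    problems = []
    for i, example in enumerate(task.get("dataset", [])):
        for side in ("input", "output"):
            schema = task[f"{side}_schema"]
            for msg in check_against_schema(example.get(side, {}), schema):
                problems.append(f"example {i} {side}: {msg}")
    return problems


# Illustrative task; the second example deliberately omits its label.
demo_task = {
    "input_schema": {
        "type": "object",
        "properties": {"text": {"type": "string"}},
        "required": ["text"],
    },
    "output_schema": {
        "type": "object",
        "properties": {"label": {"type": "string"}},
        "required": ["label"],
    },
    "dataset": [
        {"input": {"text": "Patient presents with fever and cough."},
         "output": {"label": "flu-like"}},
        {"input": {"text": "Complaints of chest pain during exertion."},
         "output": {}},
    ],
}
problems = check_dataset(demo_task)
print(problems)
```

In the notebook, `check_dataset(custom_task)` would return an empty list for the task defined above. This does not replace whatever validation the harness performs when it loads the file.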

Load the new task with `EvaluationHarness` and run the local demo model `bench.examples.mypkg.mylocal`. The task ID used below, `custom_demo`, matches the stem of the YAML file written above.


In [None]:
harness = EvaluationHarness(
    tasks_dir=str(tasks_dir), results_dir="results", cache_dir="cache"
)
info = harness.get_task_info("custom_demo")
info

In [None]:
report = harness.evaluate(
    model_id="demo-local",
    task_ids=["custom_demo"],
    model_type="local",
    module_path="bench.examples.mypkg.mylocal",
    model_path=None,
)
report.overall_scores

Inspect `report.detailed_results` for per-task metric results (`metrics_results`) and, under each entry's `metadata`, the per-example predictions.


In [None]:
(
    len(report.detailed_results),
    report.detailed_results[0].metrics_results,
    report.detailed_results[0].metadata.get("predictions"),
)
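
To sanity-check the reported scores by hand, one option is to compare the predictions with the reference outputs from the YAML. The sketch below assumes that predictions are a list of dicts carrying a `label` key, mirroring the task's `output` entries. The notebook does not confirm that shape, so inspect the actual value before reusing this. `exact_match_rate` is a hypothetical helper, not one of the MedAISure metrics.

```python
def exact_match_rate(predictions, references, key="label"):
    """Fraction of examples whose predicted `key` equals the reference `key`.

    Assumes two equal-length lists of dicts; this shape is an assumption
    made for illustration.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references differ in length")
    if not references:
        return 0.0
    hits = sum(
        1 for pred, ref in zip(predictions, references)
        if pred.get(key) == ref.get(key)
    )
    return hits / len(references)


# Hypothetical data shaped like the demo task's `output` entries.
references = [{"label": "flu-like"}, {"label": "cardiac-risk"}]
predictions = [{"label": "flu-like"}, {"label": "unknown"}]
rate = exact_match_rate(predictions, references)
print(rate)
```

A plain exact-match rate is unlikely to equal `clinical_accuracy` or `reasoning_quality`, whose definitions are not shown here. It only gives a quick, independent view of how many labels matched.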

Cleanup: to remove the demo task, delete `bench/tasks/custom_demo.yaml`.
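
The deletion can also be done from the notebook itself. The following is a minimal sketch using `pathlib`. `remove_task_file` is a hypothetical helper, and the example runs against a temporary directory so that nothing real is deleted.

```python
from pathlib import Path
import tempfile


def remove_task_file(path: Path) -> bool:
    """Delete `path` if it is an existing file; return True when removed."""
    if path.is_file():
        path.unlink()
        return True
    return False


# Demonstrated on a throwaway directory so no real task file is touched.
with tempfile.TemporaryDirectory() as tmp:
    demo_file = Path(tmp) / "custom_demo.yaml"
    demo_file.write_text("name: Custom Demo QA\n", encoding="utf-8")
    first = remove_task_file(demo_file)   # file existed, so it is removed
    second = remove_task_file(demo_file)  # already gone, nothing to do
print(first, second)
```

In the notebook, the equivalent call would be `remove_task_file(custom_path)`.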
