# 02b — Python Task Interface & Registration

This notebook shows how to:

- Define a `MedicalTask` programmatically (Python)
- Save it to YAML for reuse
- Register/load it via `TaskRegistry` / `TaskLoader`
- Evaluate it using `EvaluationHarness`

This complements `02_custom_task_creation.ipynb`, which takes the YAML-first route.

In [None]:
from pathlib import Path
from bench.models.medical_task import MedicalTask, TaskType
from bench.evaluation.task_registry import TaskRegistry
from bench.evaluation.harness import EvaluationHarness

# 1) Define a small QA task in Python
py_task = MedicalTask(
    task_id="python_demo_qa",
    task_type=TaskType.QA,
    name="Python Demo QA",
    description="Tiny QA task defined via Python API",
    inputs=[{"question": "What does BP stand for?"}],
    expected_outputs=[{"answer": "blood pressure"}],
    metrics=["clinical_correctness", "accuracy"],
    input_schema={"required": ["question"]},
    output_schema={"required": ["answer"]},
    dataset=[
        {
            "input": {"question": "Patient has fever and cough. Likely diagnosis?"},
            "output": {"answer": "flu-like"},
        },
        {
            "input": {"question": "What does BP stand for?"},
            "output": {"answer": "blood pressure"},
        },
    ],
)
py_task
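This notebook does not show whether `MedicalTask` checks `dataset` entries against `input_schema` and `output_schema`. Below is a minimal sketch of such a check. It assumes the schemas only use a `required` key, as above. `validate_dataset` and `missing_required` are hypothetical helpers and are not part of `bench`.

```python
from __future__ import annotations

from typing import Any


def missing_required(example: dict[str, Any], schema: dict[str, Any]) -> list[str]:
    """Return the keys listed in schema["required"] that are absent from example."""
    return [key for key in schema.get("required", []) if key not in example]


def validate_dataset(
    dataset: list[dict[str, Any]],
    input_schema: dict[str, Any],
    output_schema: dict[str, Any],
) -> list[str]:
    """Collect one human-readable problem per missing required key (hypothetical helper)."""
    problems = []
    for i, entry in enumerate(dataset):
        for part, schema in (("input", input_schema), ("output", output_schema)):
            for key in missing_required(entry.get(part, {}), schema):
                problems.append(f"entry {i}: {part} missing '{key}'")
    return problems


# The second entry is deliberately broken: its output lacks "answer".
dataset = [
    {"input": {"question": "What does BP stand for?"}, "output": {"answer": "blood pressure"}},
    {"input": {"question": "What does BP stand for?"}, "output": {}},
]
print(validate_dataset(dataset, {"required": ["question"]}, {"required": ["answer"]}))
# ["entry 1: output missing 'answer'"]
```

Running a check like this before saving catches malformed examples early, before they reach a harness run.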

2) Save to YAML under `bench/tasks/` so it can be loaded like other tasks.

In [None]:
tasks_dir = Path("bench/tasks")
tasks_dir.mkdir(parents=True, exist_ok=True)
yaml_path = tasks_dir / f"{py_task.task_id}.yaml"
py_task.save(yaml_path, format="yaml")
yaml_path, yaml_path.exists()
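A round-trip check is a cheap way to confirm that a saved task reloads unchanged. The sketch below makes three substitutions so it runs anywhere:

- It uses the standard library's `json` module in place of YAML, so PyYAML is not required.
- It writes to a temporary directory instead of `bench/tasks/`.
- Its dict is an assumed, simplified view of a task, not the exact layout `MedicalTask.save` writes.

The `glob` step shows discovery by file pattern. Whether `TaskLoader` discovers files this way is an assumption.

```python
import json
import tempfile
from pathlib import Path

# Assumed, simplified view of the task; not the real MedicalTask file layout.
task_dict = {
    "task_id": "python_demo_qa",
    "name": "Python Demo QA",
    "metrics": ["clinical_correctness", "accuracy"],
    "dataset": [
        {"input": {"question": "What does BP stand for?"},
         "output": {"answer": "blood pressure"}},
    ],
}

with tempfile.TemporaryDirectory() as tmp:
    tasks_dir = Path(tmp) / "tasks"
    tasks_dir.mkdir(parents=True, exist_ok=True)
    # File name derived from task_id, mirroring the cell above.
    path = tasks_dir / f"{task_dict['task_id']}.json"
    path.write_text(json.dumps(task_dict, indent=2), encoding="utf-8")

    # Find task files by pattern, then reload one and compare.
    found = sorted(p.stem for p in tasks_dir.glob("*.json"))
    reloaded = json.loads(path.read_text(encoding="utf-8"))

print(found)                  # ['python_demo_qa']
print(reloaded == task_dict)  # True
```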

3) Discover the saved task. `TaskRegistry` wraps `TaskLoader` and adds listing and filtering. Here we keep only tasks that declare metrics, then select ours by `task_id`.

In [None]:
reg = TaskRegistry(tasks_dir="bench/tasks")
reg.discover()
summary = [
    r for r in reg.list_available(has_metrics=True) if r.task_id == py_task.task_id
]
[(s.task_id, s.num_examples, s.metrics) for s in summary]
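The following toy in-memory registry illustrates the bookkeeping a registry performs. It is only a sketch. `MiniRegistry` and `TaskSummary` are hypothetical. Only the method names `register` and `list_available(has_metrics=...)` echo the calls used in this notebook. The real `TaskRegistry.register` takes a `MedicalTask`, and its signatures and filtering semantics may differ.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TaskSummary:
    """Hypothetical stand-in for the records returned by list_available()."""
    task_id: str
    num_examples: int
    metrics: list[str] = field(default_factory=list)


class MiniRegistry:
    """Toy in-memory registry; not the bench TaskRegistry implementation."""

    def __init__(self) -> None:
        self._tasks: dict[str, TaskSummary] = {}

    def register(self, summary: TaskSummary) -> None:
        # Registering the same task_id again overwrites the earlier entry.
        self._tasks[summary.task_id] = summary

    def list_available(self, has_metrics: bool | None = None) -> list[TaskSummary]:
        summaries = sorted(self._tasks.values(), key=lambda s: s.task_id)
        if has_metrics is None:
            return summaries  # no filter requested
        return [s for s in summaries if bool(s.metrics) == has_metrics]


reg = MiniRegistry()
reg.register(TaskSummary("python_demo_qa", 2, ["clinical_correctness", "accuracy"]))
reg.register(TaskSummary("draft_task", 0))  # hypothetical task with no metrics
print([(s.task_id, s.num_examples, s.metrics) for s in reg.list_available(has_metrics=True)])
# [('python_demo_qa', 2, ['clinical_correctness', 'accuracy'])]
```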

4) Evaluate the task with the local demo model (`model_type="local"`, implemented in `bench.examples.mypkg.mylocal`).

In [None]:
h = EvaluationHarness(tasks_dir="bench/tasks", results_dir="results", cache_dir="cache")
rep = h.evaluate(
    model_id="demo-local",
    task_ids=[py_task.task_id],
    model_type="local",
    module_path="bench.examples.mypkg.mylocal",
    model_path=None,
)
rep.overall_scores
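The overall scores come from metrics defined inside `bench`. This notebook does not show how they are computed. As a rough mental model for an `accuracy`-style metric on short QA answers, the sketch below implements normalized exact match. It is an illustrative assumption, not the harness's actual `accuracy` implementation. `normalize` and `exact_match_accuracy` are hypothetical names.

```python
from __future__ import annotations

import string


def normalize(text: str) -> str:
    """Lowercase, strip ASCII punctuation, and collapse whitespace."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())


def exact_match_accuracy(predictions: list[str], references: list[str]) -> float:
    """Fraction of predictions equal to their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)


refs = ["flu-like", "blood pressure"]
preds = ["Flu-like.", "Beats per minute"]  # illustrative model outputs
print(exact_match_accuracy(preds, refs))  # 0.5
```

Exact match is strict. A free-text answer such as "influenza-like illness" would score 0 here, which is one reason a metric like `clinical_correctness` is listed separately.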

Notes:
- For larger datasets, keep the data in external files and build the `dataset` entries from them at load time, or implement a loader that feeds inputs to `EvaluationHarness`.
- The YAML route (`02_custom_task_creation.ipynb`) remains the simplest for sharing tasks. The Python API is helpful for dynamic generation or programmatic pipelines.
- For direct registration without writing files, see `TaskRegistry.register(MedicalTask(...))`.
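The first note suggests building `dataset` entries from external data. The sketch below reads JSON Lines, one record per line, into the `{"input": ..., "output": ...}` shape used in this notebook. It assumes flat records with `question` and `answer` fields. `load_dataset_entries` is a hypothetical helper, and the file path in the comment is a placeholder.

```python
from __future__ import annotations

import io
import json
from typing import Any, Iterable, TextIO


def load_dataset_entries(
    stream: TextIO,
    input_keys: Iterable[str] = ("question",),
    output_keys: Iterable[str] = ("answer",),
) -> list[dict[str, Any]]:
    """Turn one JSON object per line into {"input": ..., "output": ...} entries."""
    input_keys, output_keys = tuple(input_keys), tuple(output_keys)
    entries = []
    for line_no, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        try:
            entries.append({
                "input": {k: record[k] for k in input_keys},
                "output": {k: record[k] for k in output_keys},
            })
        except KeyError as exc:
            raise ValueError(f"line {line_no}: missing field {exc}") from exc
    return entries


# In practice: open("path/to/data.jsonl", encoding="utf-8")  (placeholder path)
raw = io.StringIO(
    '{"question": "What does BP stand for?", "answer": "blood pressure"}\n'
    "\n"
    '{"question": "What does HR stand for?", "answer": "heart rate"}\n'
)
dataset = load_dataset_entries(raw)
print(len(dataset), dataset[1])
```

The resulting list can be passed as `dataset=` when constructing a `MedicalTask`.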
