Release v0.3.0

Latest

Latest

github-actions released this 28 Jun 03:20

· 1 commit to main since this release

0830385

Added

Self-describing results. A RunResult (and the persisted
cases/<key>/result.json) now carries the sample's input (the prompt turns
sent) and expected (the reference value, when the dataset provides one), so a
saved result can be read back without the original dataset. Both are optional
on the wire — input omitted when empty, expected when absent.
Docs diagrams. Five new committed SVGs visualise the model, each in its
topical guide: the end-to-end workflow (mira-workflow.svg — author →
plan → execute → score → report) in getting-started.md,
the entity hierarchy (mira-entities.svg — study ▸ eval ▸
dataset/subject/scorers/targets/axes, expanded into cases · trials ·
transcripts · scores) in authoring.md, the host ⇄
study run lifecycle (mira-run-lifecycle.svg — the protocol sequence for one
run) in how-it-works.md, the subject fan-in
(mira-subjects.svg — the three subject shapes normalising into one
Transcript) in subjects.md, and the scoring flow
(mira-scoring.svg — transcript surfaces → scorers → case verdict) in
scorers.md. Indexed in
docs/README.md.
JSONL and CSV report formats (--format jsonl / --format csv) for
un-aggregated, analysis-ready exports. jsonl writes one RunResult per line
(lossless — the line-delimited dual of json); csv is long-format, one row
per (case × score) with the case columns repeated and open-vocabulary
metrics/metadata flattened into stable metric.*/meta.* columns. Both
work anywhere --out/--format do (run, report, score); a --group-by
view is intentionally not folded in — the consumer aggregates the rows.
Per-case wall-clock timeout: give up on a case after a budget of seconds,
cancelling the in-flight run (best-effort cancel over the protocol) and
recording it as a failed case. Set it on the CLI (mira run --timeout SECONDS,
all targets), per target in mira.toml ([targets.LABEL].timeout), or as a
preset default ([presets.NAME].timeout). Precedence, first set wins:
--timeout > per-target > preset; unset ⇒ no limit. A timeout is non-retryable
(retrying would burn the same budget) and counts as a target failure.
Glob case selection. --targets, --samples (new), and --evals (new)
match the target label / sample id / eval name by glob (*, ?, [set],
{a,b}); a literal value stays an exact match. --axis values are globbed
too. A small dep-free matcher (mira::glob_match) backs both the host and the
in-process Runner (Runner::samples(…), glob-aware Runner::targets(…)).

Changed

BREAKING (preset): the preset filter key is replaced by per-dimension
samples (glob on sample id). targets/samples/evals in [presets.NAME]
now glob-match and accept either a single string or a list. The cross-cutting
case-key substring stays available as the positional mira run [filter].

Assets 8