Release v0.3.0

github-actions released this 28 Jun 03:20 (commit 0830385)

Added

  • Self-describing results. A RunResult (and the persisted
    cases/<key>/result.json) now carries the sample's input (the prompt turns
    sent) and expected (the reference value, when the dataset provides one), so a
    saved result can be read back without the original dataset. Both fields are
    optional on the wire: input is omitted when empty, and expected when absent.
  • Docs diagrams. Five new committed SVGs visualise the model, each in its
    topical guide: the end-to-end workflow (mira-workflow.svg: author → plan →
    execute → score → report) in getting-started.md; the entity hierarchy
    (mira-entities.svg: study ▸ eval ▸ dataset/subject/scorers/targets/axes,
    expanded into cases · trials · transcripts · scores) in authoring.md; the
    host ⇄ study run lifecycle (mira-run-lifecycle.svg: the protocol sequence
    for one run) in how-it-works.md; the subject fan-in (mira-subjects.svg: the
    three subject shapes normalising into one Transcript) in subjects.md; and
    the scoring flow (mira-scoring.svg: transcript surfaces → scorers → case
    verdict) in scorers.md. All five are indexed in docs/README.md.
  • JSONL and CSV report formats (--format jsonl / --format csv) for
    un-aggregated, analysis-ready exports. jsonl writes one RunResult per line
    (lossless: the line-delimited dual of json); csv is long-format, with one row
    per (case × score), the case columns repeated on each row, and open-vocabulary
    metrics/metadata flattened into stable metric.*/meta.* columns. Both formats
    work wherever --out/--format are accepted (run, report, score). A --group-by
    view is intentionally not folded in; the consumer aggregates the rows.
  • Per-case wall-clock timeout: give up on a case once it exceeds a budget in
    seconds, cancelling the in-flight run (best-effort cancel over the protocol)
    and recording it as a failed case. Set it on the CLI (mira run --timeout
    SECONDS, applied to all targets), per target in mira.toml
    ([targets.LABEL].timeout), or as a preset default ([presets.NAME].timeout).
    Precedence, first set wins: --timeout > per-target > preset; if none is set,
    there is no limit. A timeout is non-retryable (retrying would burn the same
    budget) and counts as a target failure.
  • Glob case selection. --targets, --samples (new), and --evals (new) match
    the target label, sample id, and eval name respectively by glob (*, ?,
    [set], {a,b}); a literal value is still an exact match. --axis values are
    globbed too. A small dependency-free matcher (mira::glob_match) backs both
    the host and the in-process Runner (Runner::samples(…), glob-aware
    Runner::targets(…)).
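To make the long-format CSV layout concrete, here is a minimal Rust sketch of the flattening. The Case and Score structs, their fields, and the case/target/scorer/value column names are hypothetical simplifications, not mira's actual RunResult schema. The sketch only illustrates the shape described above: one row per (case × score), case columns repeated on every row, and the sorted union of metric and metadata keys turned into metric.*/meta.* columns so every row shares one header.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical, simplified shapes; mira's real RunResult carries more.
struct Score {
    scorer: String,
    value: f64,
    metrics: BTreeMap<String, String>, // open-vocabulary per-score metrics
}

struct Case {
    key: String,
    target: String,
    metadata: BTreeMap<String, String>, // open-vocabulary per-case metadata
    scores: Vec<Score>,
}

/// Quote a field only when needed, doubling embedded quotes (RFC 4180 style).
fn csv_field(s: &str) -> String {
    if s.contains(|c: char| matches!(c, ',' | '"' | '\n' | '\r')) {
        format!("\"{}\"", s.replace('"', "\"\""))
    } else {
        s.to_string()
    }
}

fn csv_line(fields: &[String]) -> String {
    fields.iter().map(|f| csv_field(f)).collect::<Vec<_>>().join(",")
}

/// Long format: one row per (case × score), case columns repeated, and the
/// sorted union of metric/metadata keys as metric.*/meta.* columns.
fn to_long_csv(cases: &[Case]) -> String {
    let mut metric_keys = BTreeSet::new();
    let mut meta_keys = BTreeSet::new();
    for case in cases {
        meta_keys.extend(case.metadata.keys().cloned());
        for score in &case.scores {
            metric_keys.extend(score.metrics.keys().cloned());
        }
    }

    let mut header: Vec<String> = ["case", "target", "scorer", "value"]
        .iter()
        .map(|s| s.to_string())
        .collect();
    header.extend(metric_keys.iter().map(|k| format!("metric.{k}")));
    header.extend(meta_keys.iter().map(|k| format!("meta.{k}")));

    let mut out = csv_line(&header) + "\n";
    for case in cases {
        for score in &case.scores {
            let mut row = vec![
                case.key.clone(),
                case.target.clone(),
                score.scorer.clone(),
                score.value.to_string(),
            ];
            // An absent metric or metadata key becomes an empty cell.
            row.extend(metric_keys.iter().map(|k| score.metrics.get(k).cloned().unwrap_or_default()));
            row.extend(meta_keys.iter().map(|k| case.metadata.get(k).cloned().unwrap_or_default()));
            out += &csv_line(&row);
            out.push('\n');
        }
    }
    out
}
```

Sorting the key union (here via BTreeSet) is one way to keep the column order deterministic from run to run, whatever order the keys were first seen in.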
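The timeout precedence rule and its non-retryable classification can be sketched in a few lines of Rust. TimeoutSources, Failure, effective_timeout, run_with_budget, and is_retryable are hypothetical names for illustration only. The sketch enforces the budget with a thread and a channel's recv_timeout; it abandons the worker rather than stopping it, whereas mira additionally sends a best-effort cancel over its protocol, which is not modelled here.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// Hypothetical view of the three places a timeout can be set (None = unset).
struct TimeoutSources {
    cli: Option<u64>,        // mira run --timeout SECONDS (all targets)
    per_target: Option<u64>, // [targets.LABEL].timeout in mira.toml
    preset: Option<u64>,     // [presets.NAME].timeout
}

/// First set wins: --timeout > per-target > preset; None means no limit.
fn effective_timeout(src: &TimeoutSources) -> Option<Duration> {
    src.cli.or(src.per_target).or(src.preset).map(Duration::from_secs)
}

#[derive(Debug, PartialEq)]
enum Failure {
    Transient(String), // e.g. a dropped connection; may be worth retrying
    TimedOut,          // budget exhausted; a retry would burn it again
}

/// A timeout is non-retryable; this sketch treats every other failure as retryable.
fn is_retryable(f: &Failure) -> bool {
    !matches!(f, Failure::TimedOut)
}

/// Run `work` under an optional budget. On timeout the worker thread is
/// abandoned, not killed; mira's best-effort protocol cancel is not modelled.
fn run_with_budget<T, F>(budget: Option<Duration>, work: F) -> Result<T, Failure>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may be gone after a timeout; ignore the send error.
        let _ = tx.send(work());
    });
    let vanished = || Failure::Transient("worker exited without a result".into());
    match budget {
        None => rx.recv().map_err(|_| vanished()),
        Some(d) => rx.recv_timeout(d).map_err(|e| match e {
            RecvTimeoutError::Timeout => Failure::TimedOut,
            RecvTimeoutError::Disconnected => vanished(),
        }),
    }
}
```

Modelling each source as an Option turns "first set wins" into a plain chain of Option::or calls, with None at the end meaning no limit.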
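For readers unfamiliar with the glob syntax used by case selection, the following is a small dependency-free matcher in the same spirit, supporting *, ?, [set] (with ranges and ! or ^ negation), and {a,b} alternation. It is an illustrative stand-in, not the source of mira::glob_match; its handling of edge cases (an unbalanced { or an unclosed [ is taken literally) is an assumption.

```rust
/// Expand the first {a,b,...} group (recursively, so nested and later
/// groups are expanded too) into a list of brace-free alternatives.
fn expand_braces(pat: &str) -> Vec<String> {
    let chars: Vec<char> = pat.chars().collect();
    let Some(open) = chars.iter().position(|&c| c == '{') else {
        return vec![pat.to_string()];
    };
    let mut depth = 0;
    let mut close = None;
    let mut commas = Vec::new(); // commas at this group's own nesting level
    for (i, &c) in chars.iter().enumerate().skip(open) {
        match c {
            '{' => depth += 1,
            '}' => {
                depth -= 1;
                if depth == 0 {
                    close = Some(i);
                    break;
                }
            }
            ',' if depth == 1 => commas.push(i),
            _ => {}
        }
    }
    // Assumption: an unbalanced '{' is treated as a literal character.
    let Some(close) = close else {
        return vec![pat.to_string()];
    };
    let prefix: String = chars[..open].iter().collect();
    let suffix: String = chars[close + 1..].iter().collect();
    let mut bounds = vec![open];
    bounds.extend(commas);
    bounds.push(close);
    bounds
        .windows(2)
        .flat_map(|w| {
            let alt: String = chars[w[0] + 1..w[1]].iter().collect();
            expand_braces(&format!("{prefix}{alt}{suffix}"))
        })
        .collect()
}

/// Match `c` against the bracket class starting at p[0] == '['.
/// Returns (matched, pattern chars consumed), or None if the class is unclosed.
fn class_match(p: &[char], c: char) -> Option<(bool, usize)> {
    let mut i = 1;
    let negate = matches!(p.get(i).copied(), Some('!') | Some('^'));
    if negate {
        i += 1;
    }
    let start = i;
    let mut hit = false;
    while i < p.len() {
        // A ']' closes the class unless it is the first member.
        if p[i] == ']' && i > start {
            return Some((hit != negate, i + 1));
        }
        if i + 2 < p.len() && p[i + 1] == '-' && p[i + 2] != ']' {
            hit |= p[i] <= c && c <= p[i + 2]; // range such as a-z
            i += 3;
        } else {
            hit |= p[i] == c;
            i += 1;
        }
    }
    None
}

/// Backtracking matcher for *, ? and [set] over brace-free patterns.
fn match_here(p: &[char], t: &[char]) -> bool {
    match p.first().copied() {
        None => t.is_empty(),
        // '*' matches any run of characters, including none.
        Some('*') => (0..=t.len()).any(|i| match_here(&p[1..], &t[i..])),
        Some('?') => !t.is_empty() && match_here(&p[1..], &t[1..]),
        Some('[') => {
            let Some(&c) = t.first() else { return false };
            match class_match(p, c) {
                Some((true, used)) => match_here(&p[used..], &t[1..]),
                Some((false, _)) => false,
                // Assumption: an unclosed '[' is matched literally.
                None => c == '[' && match_here(&p[1..], &t[1..]),
            }
        }
        Some(lit) => t.first() == Some(&lit) && match_here(&p[1..], &t[1..]),
    }
}

/// Illustrative stand-in for mira::glob_match (not its real implementation).
fn glob_match(pattern: &str, text: &str) -> bool {
    let t: Vec<char> = text.chars().collect();
    expand_braces(pattern).iter().any(|alt| {
        let p: Vec<char> = alt.chars().collect();
        match_here(&p, &t)
    })
}
```

A pattern with no metacharacters only matches itself, which is how a literal value stays an exact match. The naive backtracking on * can be slow for pathological patterns, which is usually acceptable for short labels and ids.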

Changed

  • BREAKING (preset): the preset filter key is replaced by the per-dimension
    samples key (a glob on the sample id). The targets, samples, and evals keys
    in [presets.NAME] now glob-match and accept either a single string or a
    list. The cross-cutting case-key substring filter remains available as the
    positional [filter] argument of mira run.
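Because targets, samples, and evals in [presets.NAME] accept either a single string or a list, a loader has to normalise both shapes. The Rust sketch below does this with a hypothetical OneOrMany enum. The rule that a list selects a value when any of its patterns matches is our assumption, not something these notes state, and the glob matcher is passed in as a closure to keep the sketch self-contained.

```rust
/// Hypothetical model of a preset selector value, which may be written in
/// mira.toml either as a single string or as a list of strings.
#[derive(Debug, Clone)]
enum OneOrMany {
    One(String),
    Many(Vec<String>),
}

impl OneOrMany {
    /// Normalise both shapes to a list of glob patterns.
    fn patterns(&self) -> Vec<&str> {
        match self {
            OneOrMany::One(p) => vec![p.as_str()],
            OneOrMany::Many(ps) => ps.iter().map(String::as_str).collect(),
        }
    }

    /// Assumption: a value is selected when ANY listed pattern matches it.
    /// The glob matcher is injected to keep this sketch self-contained.
    fn selects(&self, value: &str, glob: impl Fn(&str, &str) -> bool) -> bool {
        self.patterns().into_iter().any(|p| glob(p, value))
    }
}
```

With serde, a field like this is commonly deserialised through an #[serde(untagged)] enum with the same two variants.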