Release v0.2.0

@github-actions github-actions released this 24 Jun 03:16
6c7202b

Added

  • skills.sh — install the Mira agent skill into a Claude Code skills directory
    so an agent can author and run evals. --global targets ~/.claude/skills/mira,
    --local (the default) targets ./.claude/skills/mira. It copies from a local
    checkout when present, else fetches from GitHub raw (--ref), so
    curl -fsSL .../skills.sh | sh works on a box that only has the prebuilt
    binary. Each run is a clean replace, so it also serves as the upgrade path.
  • Native TypeScript SDK (sdks/typescript, mira-eval) — author
    eval studies in TypeScript/Node with no Rust dependency: a zero-runtime-dep
    library over the protocol, with wire types and protocol metadata generated from
    schema/v1/ (a self-contained codegen.mjs --check drift guard, the TS dual of
    the Rust/Python guards), a serve() loop (incl. the execute/score split and
    list_samples pagination), a parity authoring API, and conformance + behaviour
    tests. Worked example: examples/greet-typescript. Publishes to npm as
    mira-eval via OIDC trusted publishing (publish-typescript in publish.yml),
    mirroring the Python PyPI flow.
  • Named launchers in mira.toml: [launchers.NAME] saves a study invocation
    (bin/example/cmd/uv/python/python3 + package/manifest_path),
    selected with --launcher NAME. default_launcher makes a bare mira run
    work; explicit launch flags override the named launcher, mirroring --preset.
  • cargo binstall mira-cli support: [package.metadata.binstall] points binstall
    at the prebuilt release tarballs, so the mira binary installs without a compile.
  • Polyglot launcher flags: mira --uv / --python / --python3 SCRIPT
    drive a non-Rust study directly (e.g. mira --python3 study.py run), replacing
    the verbose --cmd "python3 study.py". --cmd still works for an arbitrary
    command line.
  • mira help --full now surfaces a GUIDES section (each docs/ guide with a
    one-line scope, for progressive disclosure) and a link to the mira agent skill
    in LINKS, so an agent can self-orient to the docs and skill in one read. A
    drift guard keeps the guide list in sync with docs/README.md.
  • Run folders, save-by-default, and resume. Every mira run/mira score now
    saves a run folder under the results dir by default — <run_id>/ with
    meta.json, report.json/report.html, and one cases/<key>/result.json per
    finished case (written atomically as it lands). --dry-run opts out.
  • mira run --resume <run_id> reopens an interrupted run's folder, skips the cases
    already recorded under cases/, and runs only what's missing.
  • mira report <run_id> — new subcommand that re-renders a saved run's reports
    from its stored per-case results, with no study process and no re-execution.
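The run-folder and resume behaviour described above can be sketched in a few lines of Python. This is an illustration only, not Mira's implementation. It assumes the layout named in these notes (cases/<key>/result.json under a run folder) and uses the common temp-file-plus-rename technique for the atomic write. The function names write_case_result, finished_keys, and run_or_resume are hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path


def write_case_result(run_dir: Path, key: str, result: dict) -> Path:
    """Write cases/<key>/result.json atomically: temp file first, then rename."""
    case_dir = run_dir / "cases" / key
    case_dir.mkdir(parents=True, exist_ok=True)
    target = case_dir / "result.json"
    # The temp file lives in the same directory so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=case_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(result, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_name, target)  # readers see the old file or the new one, never a partial write
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return target


def finished_keys(run_dir: Path) -> set[str]:
    """Return the case keys that already have a result.json recorded."""
    cases = run_dir / "cases"
    if not cases.is_dir():
        return set()
    return {p.name for p in cases.iterdir() if (p / "result.json").is_file()}


def run_or_resume(run_dir: Path, planned: list[str], execute) -> list[str]:
    """Run only the planned cases that are missing from the run folder.

    `execute` is a hypothetical callable mapping a case key to a result dict.
    Returns the keys that were actually executed, in order.
    """
    done = finished_keys(run_dir)
    ran = []
    for key in planned:
        if key in done:
            continue  # already recorded under cases/, so skip it
        write_case_result(run_dir, key, execute(key))
        ran.append(key)
    return ran
```

Calling run_or_resume a second time on the same folder executes nothing, which is the property that makes an explicit resume safe: a case counts as finished only once its result.json is fully in place.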

Changed

  • The execution unit (one eval × sample × target × axis × trial) is now called a
    case (was "cell"): Cell/CellSpec → Case/CaseSpec, run_cells → run_cases, etc.
    The dataset-row builder .case(id, prompt) is now .sample(id, prompt), and the
    prebuilt-Sample adder .sample(Sample) is now .add_sample(Sample).
    The pub type Case = Sample alias is removed.

Removed

  • --checkpoint, --fresh, and --save on mira run/mira score, plus the
    mira::session::Session type. The single-file checkpoint is superseded by the
    always-saved run folder; resume is now explicit via --resume <run_id> (a fresh
    run mints a new id and reuses nothing, so there is no silent stale-result reuse).
    Configure the results dir via [results].dir in mira.toml (the --save <dir>
    override is gone).