You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This commit was created on GitHub.com and signed with GitHub’s verified signature.
Added
skills.sh — install the Mira agent skill into a Claude Code skills directory
so an agent can author and run evals. --global targets ~/.claude/skills/mira, --local (the default) targets ./.claude/skills/mira. It copies from a local
checkout when present, else fetches from GitHub raw (--ref), so curl -fsSL .../skills.sh | sh works on a box that only has the prebuilt
binary. Each run is a clean replace, so it also serves as the upgrade path.
Native TypeScript SDK (sdks/typescript, mira-eval) — author
eval studies in TypeScript/Node with no Rust dependency: a zero-runtime-dep
library over the protocol, with wire types and protocol metadata generated from schema/v1/ (a self-contained codegen.mjs --check drift guard, the TS dual of
the Rust/Python guards), a serve() loop (incl. the execute/score split and list_samples pagination), a parity authoring API, and conformance + behaviour
tests. Worked example: examples/greet-typescript. Publishes to npm as mira-eval via OIDC trusted publishing (publish-typescript in publish.yml),
mirroring the Python PyPI flow.
Named launchers in mira.toml: [launchers.NAME] saves a study invocation
(bin/example/cmd/uv/python/python3 + package/manifest_path),
selected with --launcher NAME. default_launcher makes a bare mira run
work; explicit launch flags override the named launcher, mirroring --preset.
cargo binstall mira-cli support: [package.metadata.binstall] points binstall
at the prebuilt release tarballs, so the mira binary installs without a compile.
Polyglot launcher flags — mira --uv / --python / --python3 SCRIPT
drive a non-Rust study directly (e.g. mira --python3 study.py run), replacing
the verbose --cmd "python3 study.py". --cmd still works for an arbitrary
command line.
mira help --full now surfaces a GUIDES section (each docs/ guide with a
one-line scope, for progressive disclosure) and a link to the mira agent skill
in LINKS, so an agent can self-orient to the docs and skill in one read. A
drift guard keeps the guide list in sync with docs/README.md.
Run folders, save-by-default, and resume. Every mira run/mira score now
saves a run folder under the results dir by default — <run_id>/ with meta.json, report.json/report.html, and one cases/<key>/result.json per
finished case (written atomically as it lands). --dry-run opts out.
mira run --resume <run_id> reopens an interrupted run's folder, skips the cases
already recorded under cases/, and runs only what's missing.
mira report <run_id> — new subcommand that re-renders a saved run's reports
from its stored per-case results, with no study process and no re-execution.
Changed
The execution unit (one eval × sample × target × axis × trial) is now called a case (was "cell"): Cell/CellSpec → Case/CaseSpec, run_cells → run_cases, etc. The dataset-row builder .case(id, prompt) → .sample(id, prompt), and the prebuilt-Sample adder .sample(Sample) → .add_sample(Sample).
The pub type Case = Sample alias is removed.
Removed
--checkpoint, --fresh, and --save on mira run/mira score, plus the mira::session::Session type. The single-file checkpoint is superseded by the
always-saved run folder; resume is now explicit via --resume <run_id> (a fresh
run mints a new id and reuses nothing, so there is no silent stale-result reuse).
Configure the results dir via [results].dir in mira.toml (the --save <dir>
override is gone).