Release v0.1.0 · everruns/mira

Initial public release. The crates, the Python SDK, and the protocol all start at
this version.

Highlights

Code-first eval framework — Eval = Dataset(Sample…) + Subject + [Scorer…] crossed with a provider-agnostic Target matrix, a broad built-in scorer vocabulary (text, tools, budgets, files, combinators, LLM-judge), and an in-process runner (#2).
The eval protocol (1.0) — newline-delimited JSON over stdio between the study and the host, with MAJOR.MINOR versioning, capability negotiation, and a machine-readable JSON Schema generated from the wire types (#16).
Native Python SDK — author studies in pure-stdlib Python (no Rust dependency); wire types and protocol metadata are generated from the schema with a drift guard (#22, #25).
Trials, pass@k, and seeds — first-class N-sampling for pass-rate and variance with an unbiased pass@k estimator and reproducible per-trial seeds (#24).
Multimodal & interactive evals — typed multimodal content (input attachments + graded output) and simulated-user multi-turn dialogs folded into one transcript (#28).
Provider-backed LLM judge + N/A semantics — LlmJudge scorers over OpenAI/Anthropic and a third "couldn't evaluate" state, so infra failures degrade to N/A instead of a false fail (#6, #8).
Adaptive matrix concurrency — bounded, provider-aware throttling that multiplexes runs over one pipe and backs off on rate limits (#4).
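
The notes do not show how the pass@k estimator from #24 is implemented. As a rough illustration, here is a minimal sketch of the commonly used unbiased estimator, 1 - C(n-c, k) / C(n, k), where n is the number of trials run for a sample and c is the number that passed. The function name `pass_at_k` and its signature are hypothetical and are not taken from Mira's API.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that at least one of k draws passes.

    Hypothetical helper, not Mira's API. Assumes n trials were run,
    c of them passed, and k draws are made without replacement.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Fewer than k failures exist, so any k draws include a pass.
    if n - c < k:
        return 1.0
    # 1 minus the probability that all k draws are failures.
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # 10 trials, 3 passed: estimate for k = 1 and k = 5.
    print(pass_at_k(10, 3, 1))
    print(pass_at_k(10, 3, 5))
```

For k = 1 the estimate reduces to the plain pass rate c / n; larger k rewards samples that pass at least occasionally across trials.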

What's Changed

Targets, not models: rename ModelSpec→Target + --axis/--preset selection (#34) by @chaliy
feat(protocol): reserve the study→host reverse-request channel seam (#32) by @chaliy
feat(protocol): cursor-paginated sample listing (1.10) (#31) by @chaliy
feat(protocol): promote multimodal output + capability params to the wire (1.11) (#30) by @chaliy
feat(protocol): cancel an in-flight run by id (protocol 1.8) (#29) by @chaliy
feat: multimodality, interactive multi-turn evals, and structured capability params (#28) by @chaliy
feat(protocol): typed, correlated event/log notifications (1.9) (#27) by @chaliy
feat(protocol): metadata columns for samples/models + report --group-by (#26) by @chaliy
feat(sdks): generate protocol metadata for the Python SDK drift guard (#25) by @chaliy
feat(protocol): trials/repetitions + seed with pass@k aggregation (#24) by @chaliy
feat(protocol): structured RPC errors (protocol 1.5) (#23) by @chaliy
feat(sdks): native Python SDK for authoring eval studies (#22) by @chaliy
feat(protocol): make metadata open-ended (string → JSON) (#21) by @chaliy
feat(cli): record environment metadata in saved runs (#20) by @chaliy
feat(cli): add AI-friendly mira help --full and reword tagline (#18) by @chaliy
feat(protocol): machine-readable JSON Schema generated from wire types (#16) by @chaliy
feat(cli): --save run archive with run ids, timestamps, and mira.toml (#15) by @chaliy
feat: split subject execution from scoring (execute/score, rescore) (#11) by @chaliy
feat(metrics): extensible numeric metrics map + generic budget scorers (#10) by @chaliy
feat: surface infrastructure errors as N/A (not failures), retryable (#8) by @chaliy
feat(scorer): N/A score state + provider-backed LLM judge (#6) by @chaliy
feat(exec): bounded, provider-aware, adaptive matrix concurrency (#4) by @chaliy
feat: live progress bar and session-backed checkpoints for mira run (#3) by @chaliy
Productionize the Mira eval-framework PoC into a published workspace (#2) by @chaliy
chore(protocol): reset protocol version to the 1.0 baseline (#33) by @chaliy
chore(just): add install recipe (#17) by @chaliy
chore(ship): resolve addressed PR review comments (#13) by @chaliy
chore(skills): adopt ship skill and split public/internal skill layout (#9) by @chaliy
docs: finish Target/expected rename in docs and examples (follow-up to #34) (#35) by @chaliy
docs: add docs index + public-docs spec, reconcile drift (#19) by @chaliy
docs(contributing): document main branch-protection gate (#14) by @chaliy
docs(readme): reframe as evals toolkit with overview diagram (#12) by @chaliy
docs: surface agentic-trajectory eval as a headline strength (#7) by @chaliy
docs: extensibility guide + custom-subject example (#5) by @chaliy
