What's New
A benchmark-driven deepening of v3.11.0's opt-in "cheap-first" test generation — everything still opt-in and off by default, so existing setups are unchanged.
We ran real model-vs-model benchmarks scored by a real oracle (mutation kill-rate + coverage) and shipped only what the measurements earned:
- Best-of-k generation — try a few diverse candidates, keep the first that passes the objective check (extra cost only when the first fails).
- Cross-model best-of-k — draw candidates from different model families (e.g. local + cloud) so they cover each other's failures. Measured ~+6 quality points over a single model.
- Self-test Goodhart guard — only an objective check (real run, coverage, mutation) can lift routing confidence; a model's own "passing" self-test never does.
- Cheap-first generation now also covers requirements → BDD/Gherkin generation.
- @ruvector/adversarial-verify: a reusable blind-refuter verification primitive, plus an opt-in output gate that drops unverified findings before they're emitted.
- MCP self-governance (default-deny policy + CI gate), cost-Pareto value scoring, witnessed finding delivery (Ed25519, fail-closed), and graceful optional-module loading.
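To make the best-of-k and Goodhart-guard items above concrete, here is a minimal TypeScript sketch of the idea. It is not agentic-qe's actual API: every name in it (`Candidate`, `Generator`, `ObjectiveCheck`, `bestOfK`, `liftConfidence`) is hypothetical, the round-robin over generators is just one assumed way to draw candidates from different model families, and the confidence step size is an arbitrary illustration.

```typescript
// Hypothetical types for illustration only; none of these come from agentic-qe.
type Candidate = { source: string; model: string; selfTestPassed: boolean };
type Generator = { model: string; generate: (attempt: number) => Candidate };
// Stands in for an objective check such as a real run, coverage, or mutation score.
type ObjectiveCheck = (candidate: Candidate) => boolean;

interface BestOfKResult {
  accepted: Candidate | null; // first candidate that passed, or null if none did
  attempts: number;           // how many candidates were actually generated
}

// Best-of-k with early exit: generate up to k candidates and keep the first one
// that passes the objective check, so extra cost is paid only after a failure.
function bestOfK(generators: Generator[], k: number, check: ObjectiveCheck): BestOfKResult {
  if (generators.length === 0) {
    throw new Error("at least one generator is required");
  }
  for (let i = 0; i < k; i++) {
    // Assumed strategy: rotate through generators so successive candidates
    // come from different model families (e.g. local, then cloud).
    const generator = generators[i % generators.length];
    const candidate = generator.generate(i);
    if (check(candidate)) {
      return { accepted: candidate, attempts: i + 1 };
    }
  }
  return { accepted: null, attempts: Math.max(0, k) };
}

// Goodhart guard: only objective evidence can raise routing confidence.
// The model's own self-test result is accepted as input but deliberately ignored.
function liftConfidence(
  current: number,
  evidence: { selfTestPassed: boolean; objectivePassed: boolean },
  step: number = 0.1, // arbitrary illustrative step size
): number {
  if (!evidence.objectivePassed) {
    return current; // a passing self-test alone never lifts confidence
  }
  return Math.min(1, current + step);
}
```

With a single generator this reduces to plain best-of-k; passing two or more generators gives the cross-model variant. In both cases the first attempt costs the same as single-shot generation, and later attempts run only if the objective check rejects the earlier ones.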
Note: the default free-tier model is now qwen3:30b-a3b (the previous 8B default was below the test-generation quality floor). This only affects users who opt into the free tier; set freeTierModel to fit your hardware.
Getting Started
npx agentic-qe init --auto
New to the cheap-first lane? See the Darwin-QE Self-Learning guide.
See CHANGELOG for full details.