v3.11.1: Benchmark-driven cheap-first test generation

@proffesor-for-testing proffesor-for-testing released this 25 Jun 10:37
· 2 commits to main since this release
c092dce

What's New

This release deepens v3.11.0's opt-in "cheap-first" test generation, guided by benchmark results. Everything remains opt-in and off by default, so existing setups are unchanged.

We ran real model-vs-model benchmarks scored by a real oracle (mutation kill-rate + coverage) and shipped only what the measurements earned:

  • Best-of-k generation — try a few diverse candidates, keep the first that passes the objective check (extra cost only when the first fails).
  • Cross-model best-of-k — draw candidates from different model families (e.g. local + cloud) so they cover each other's failures. Measured ~+6 quality points over a single model.
  • Self-test Goodhart guard — only an objective check (real run, coverage, mutation) can lift routing confidence; a model's own "passing" self-test never does.
  • Broadened to requirements → BDD/Gherkin generation.
  • @ruvector/adversarial-verify — a reusable blind-refuter verification primitive + an opt-in output gate that drops unverified findings before they're emitted.
  • MCP self-governance (default-deny policy + CI gate), cost-Pareto value scoring, witnessed finding delivery (Ed25519, fail-closed), and graceful optional-module loading.
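
To make the first three items concrete, here is a minimal sketch of cross-model best-of-k with an objective gate and a Goodhart guard on routing confidence. It is an illustration under stated assumptions, not agentic-qe's actual API: the names Candidate, Generator, ObjectiveCheck, bestOfK and updateConfidence, and the 0.1 confidence step, are all hypothetical.

```typescript
// Hypothetical types for illustration; none of these names come from agentic-qe.
type Candidate = { model: string; tests: string };
type Generator = { model: string; generate: (prompt: string) => Candidate };

// Stand-in for the objective oracle (real run, coverage, mutation kill-rate).
type ObjectiveCheck = (candidate: Candidate) => boolean;

interface BestOfKResult {
  accepted: Candidate | null; // first candidate that passed, or null if none did
  attempts: number;           // how many candidates were actually generated
}

// Try up to k generators in order and keep the first candidate that passes
// the objective check. Later generators are only invoked when earlier ones
// fail, so the extra cost is paid only on failure. Listing generators from
// different model families gives the cross-model variant.
function bestOfK(
  prompt: string,
  generators: Generator[],
  passes: ObjectiveCheck,
  k: number,
): BestOfKResult {
  let attempts = 0;
  for (const generator of generators.slice(0, k)) {
    attempts++;
    const candidate = generator.generate(prompt);
    if (passes(candidate)) {
      return { accepted: candidate, attempts };
    }
  }
  return { accepted: null, attempts };
}

// Goodhart guard: only an objective pass may raise routing confidence.
// The model's own self-test result is accepted as input but deliberately
// ignored. The 0.1 step is an arbitrary assumption for this sketch.
function updateConfidence(
  current: number,
  objectivePassed: boolean,
  _selfTestPassed: boolean,
): number {
  return objectivePassed ? Math.min(1, current + 0.1) : current;
}
```

In this sketch a cheap local generator would be listed first and a cloud generator second, so the cloud model is only called when the local candidate fails the objective check.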

Note: the default free-tier model is now qwen3:30b-a3b (the 8B model was below the test-generation quality floor). This only affects users who opt into the free tier; set freeTierModel to fit your hardware.

Getting Started

npx agentic-qe init --auto

New to the cheap-first lane? See the Darwin-QE Self-Learning guide.

See CHANGELOG for full details.