Release v3.11.1: Benchmark-driven cheap-first test generation · proffesor-for-testing/agentic-qe

What's New

A benchmark-driven deepening of v3.11.0's opt-in "cheap-first" test generation — everything still opt-in and off by default, so existing setups are unchanged.

We ran real model-vs-model benchmarks scored by a real oracle (mutation kill-rate + coverage) and shipped only what the measurements earned:

Best-of-k generation — try a few diverse candidates, keep the first that passes the objective check (extra cost only when the first fails).
Cross-model best-of-k — draw candidates from different model families (e.g. local + cloud) so they cover each other's failures. Measured ~+6 quality points over a single model.
Self-test Goodhart guard — only an objective check (real run, coverage, mutation) can lift routing confidence; a model's own "passing" self-test never does.
The cheap-first lane is broadened to requirements → BDD/Gherkin generation.
@ruvector/adversarial-verify — a reusable blind-refuter verification primitive + an opt-in output gate that drops unverified findings before they're emitted.
MCP self-governance (default-deny policy + CI gate), cost-Pareto value scoring, witnessed finding delivery (Ed25519, fail-closed), and graceful optional-module loading.
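A minimal sketch of how cross-model best-of-k and the self-test Goodhart guard could fit together. It is not agentic-qe's API: the function names (`bestOfK`, `updateConfidence`), the model labels, the 0.5 starting confidence, and the 0.2 update rate are all hypothetical. The stub oracle stands in for the real objective check (an actual run plus coverage and mutation scoring).

```typescript
// Hypothetical sketch: names, rates, and model labels are illustrative,
// not agentic-qe's actual API.

interface Candidate {
  model: string; // which model family produced this candidate
  code: string;  // the generated test source
}

// Evidence about a candidate. Only "objective" evidence (a real run,
// coverage, mutation score) is allowed to move routing confidence.
type Evidence =
  | { kind: "objective"; passed: boolean }
  | { kind: "self-test"; passed: boolean };

type Generate = (model: string, attempt: number) => Candidate;
type ObjectiveCheck = (candidate: Candidate) => boolean;

// Goodhart guard: a model's own self-test result is ignored entirely,
// so it can never raise (or otherwise change) routing confidence.
function updateConfidence(current: number, ev: Evidence, rate = 0.2): number {
  if (ev.kind !== "objective") return current;
  const target = ev.passed ? 1 : 0;
  return current + rate * (target - current); // exponential moving average
}

// Cross-model best-of-k: rotate through model families and stop at the
// first candidate that passes the objective check. Extra generations
// are only paid for when earlier candidates fail.
function bestOfK(
  models: string[],
  k: number,
  generate: Generate,
  check: ObjectiveCheck,
  confidence: Map<string, number>,
): { winner: Candidate | null; attempts: number } {
  if (models.length === 0) throw new Error("need at least one model");
  for (let i = 0; i < k; i++) {
    const model = models[i % models.length];
    // The attempt index lets a generator vary prompt or sampling settings.
    const candidate = generate(model, i);
    const passed = check(candidate);
    const prior = confidence.get(model) ?? 0.5;
    confidence.set(model, updateConfidence(prior, { kind: "objective", passed }));
    if (passed) return { winner: candidate, attempts: i + 1 };
  }
  return { winner: null, attempts: k };
}

// Usage with a stub oracle that only accepts the cloud model's output.
const confidence = new Map<string, number>();
const result = bestOfK(
  ["local-model", "cloud-model"],
  4,
  (model, attempt) => ({ model, code: `// candidate ${attempt}` }),
  (c) => c.model === "cloud-model",
  confidence,
);
console.log(result.attempts, result.winner?.model); // prints "2 cloud-model"
```

Rotating across model families, rather than resampling one model, is what lets candidates cover each other's failures. With a single-entry `models` list the sketch reduces to plain best-of-k.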

Note: the default free-tier model is now qwen3:30b-a3b (the 8B was below the test-generation quality floor). Only affects users who opt into the free tier — set freeTierModel to fit your hardware.

Getting Started

npx agentic-qe init --auto

New to the cheap-first lane? See the Darwin-QE Self-Learning guide.

See CHANGELOG for full details.
