Skip to content

v3.14.0 β€” testgen Test-Driven Repair via headless `claude -p`

Latest

Choose a tag to compare

@ruvnet ruvnet released this 22 Jun 21:16
ec1a187

✨ New feature β€” Test-Driven Repair

Closes the loop the existing TDD plugins didn't: we generate tests; now we also fix the code to satisfy them. Inspired by agent-harness-generator ADR-175; implemented via headless `claude -p` for tighter integration with our stack.

What you can do now

```bash

Failing CI test β†’ verified fix PR for pennies

npx ruflo@latest hooks testgen tdd-repair
--repo .
--test tests/failing.test.ts
--test-command "npx vitest run tests/failing.test.ts"
--confirm
```

Or call `mcp__claude-flow__testgen_tdd_repair` from any agent.

How it works

  1. Pre-flight: run the test command. Refuse if it already passes (catches `--test-command` typos).
  2. Spawn `claude -p` with capability-restricted toolset (`--allowedTools Read,Edit,Bash`) and hard-capped budget (`--max-budget-usd`).
  3. Focused prompt: read the failing test, fix the source, do NOT modify the test, verify by running it.
  4. Re-run the test. The exit code IS the fitness function β€” no LLM-as-judge.
  5. If green: emit success + per-attempt usage (`cost_usd`). If red after `--max-attempts`: emit failure receipts; workspace left as-is for review.

Safety posture

  • `--confirm` REQUIRED (dry-run otherwise)
  • Hard `--max-budget-usd` cap (default $5)
  • `--allowedTools Read,Edit,Bash` (no MCP, no network, no arbitrary writes)
  • Prompt forbids modifying the test or adding dependencies
  • 15 min hard timeout

Smoke proof

`docs/benchmarks/tdd-repair/SMOKE-2026-06-22.md` β€” built fixture with `add(a,b) β†’ a-b` bug; pre-repair 2/2 fail, post-repair 2/2 pass.

Cost ladder

Tier Model Per-attempt typical
1 Haiku (default) $0.02 – $0.20
2 Sonnet $0.30 – $2.00
3 Opus $1.50 – $8.00

Conformant mode (`--no-test-oracle`)

Scoped for a follow-up ADR. Today `--no-test-oracle` returns a clear config error.

Distribution

Package latest alpha v3alpha
`@claude-flow/cli` 3.14.0 3.14.0 3.14.0
`claude-flow` 3.14.0 3.14.0 3.14.0
`ruflo` 3.14.0 3.14.0 3.14.0

Cross-references

  • πŸ”— Inspired by: agent-harness-generator/packages/darwin-mode ADR-175
  • πŸ”— Smoke receipts: docs/benchmarks/tdd-repair/SMOKE-2026-06-22.md
  • πŸ”— Companion: testgen `test-gaps` skill (gap β†’ fix loop scoped for next iteration)

πŸ€– Generated with RuFlo