Summary
Integrate ClawTeam persona tests into CI/CD to catch regressions before merge.
Background
ClawTeam's 5 personas found 15 issues across 2 rounds of testing:
- Claw-1 (Quickstart): CLI backwards-compat, --version, port conflict
- Claw-2 (Builder): concurrency, system prompt, streaming
- Claw-3 (Embedder): 108MB memory leak, compiler warnings
- Claw-4 (Optimizer): Metal speedup, KV compression RSS
- Claw-5 (Researcher): GQA broken, non-determinism
These scenarios are currently run by hand, so any code change can reintroduce one of these issues without anyone noticing before merge.
Proposed CI Pipeline
On every PR (fast, ~3min):
clawteam-smoke:
- Claw-1 S1: pip install + quantcpp --help
- Claw-3 S3: leaks --atExit (memory leak check)
- Build: cmake + ctest 35/35
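As a rough illustration of how the smoke job could be driven, here is a minimal Python sketch that runs a list of named steps and collects the ones that fail. The step list, the `build` directory, and the names `SMOKE_STEPS`, `run_steps`, and `main` are assumptions made for this example, not existing project code. The `leaks --atExit` step is left out because the binary it should wrap is not specified here.

```python
import subprocess
from typing import Callable, List, Sequence, Tuple

Step = Tuple[str, Sequence[str]]

# Hypothetical step list: the exact commands and the "build" directory are
# assumptions for illustration and should be adjusted to the real scenarios.
SMOKE_STEPS: List[Step] = [
    ("Claw-1 S1: CLI help", ["quantcpp", "--help"]),
    # ctest --test-dir requires CMake 3.20 or newer.
    ("Build: ctest", ["ctest", "--test-dir", "build", "--output-on-failure"]),
]


def run_steps(steps: Sequence[Step], runner: Callable = subprocess.run) -> List[str]:
    """Run every step in order and return the names of the steps that failed."""
    failed = []
    for name, cmd in steps:
        try:
            ok = runner(cmd).returncode == 0
        except FileNotFoundError:
            # A missing executable counts as a failed step, not a crash.
            ok = False
        print(f"[{'PASS' if ok else 'FAIL'}] {name}")
        if not ok:
            failed.append(name)
    return failed


def main() -> int:
    """Return a process exit code: 0 if all smoke steps passed, 1 otherwise."""
    return 1 if run_steps(SMOKE_STEPS) else 0
```

A CI job could call `sys.exit(main())` from a small entry script so that any failed step fails the PR check. Running all steps instead of stopping at the first failure keeps the log complete for a single PR run.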
Weekly (full, ~30min):
clawteam-full:
- All 5 personas, all scenarios
- Acme 7/7 regression
- Wikitext 10/10 regression
- Model compatibility matrix
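The two regression lines above amount to comparing a pass count against a fixed baseline. A minimal sketch of such a gate follows, assuming each benchmark's result can be reduced to a `passed/total` string. The names `Score`, `BASELINES`, and `regressions` are hypothetical, and how `eval_acme.py` and `eval_wikitext.py` actually report results is not specified here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Score:
    passed: int
    total: int

    @classmethod
    def parse(cls, text: str) -> "Score":
        """Parse a 'passed/total' string such as '7/7'."""
        passed, total = text.strip().split("/")
        return cls(int(passed), int(total))

    def __str__(self) -> str:
        return f"{self.passed}/{self.total}"


# Baselines taken from the weekly job description (Acme 7/7, Wikitext 10/10).
BASELINES: Dict[str, Score] = {
    "acme": Score.parse("7/7"),
    "wikitext": Score.parse("10/10"),
}


def regressions(current: Dict[str, Score],
                baselines: Dict[str, Score] = BASELINES) -> List[str]:
    """Return one message per benchmark that is missing or below its baseline."""
    problems = []
    for name, base in baselines.items():
        got = current.get(name)
        if got is None:
            problems.append(f"{name}: no result (baseline {base})")
        elif got.passed < base.passed:
            problems.append(f"{name}: {got} (baseline {base})")
    return problems
```

For example, `regressions({"acme": Score.parse("7/7"), "wikitext": Score.parse("9/10")})` reports only the Wikitext entry, and the weekly job would fail whenever the returned list is non-empty.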
Existing Assets
docs/clawteam/personas/claw-{1-5}-*.md — scenario definitions
docs/feedback/2026-04-12_1400_claw-*.md — baseline results
.claude/commands/clawteam.md — skill definition
bench/rlv/eval/eval_acme.py — Acme benchmark
bench/rlv/eval/eval_wikitext.py — Wikitext benchmark
Priority: P1