
ClawTeam CI: automated persona-based QA on every PR #87

@unamedkr

Summary

Integrate ClawTeam persona tests into CI/CD to catch regressions before merge.

Background

ClawTeam's 5 personas found 15 issues across 2 rounds of testing:

  • Claw-1 (Quickstart): CLI backwards-compat, --version, port conflict
  • Claw-2 (Builder): concurrency, system prompt, streaming
  • Claw-3 (Embedder): 108MB memory leak, compiler warnings
  • Claw-4 (Optimizer): Metal speedup, KV compression RSS
  • Claw-5 (Researcher): GQA broken, non-determinism

These persona tests are currently run manually, so every code change risks an undetected regression.

Proposed CI Pipeline

On every PR (fast, ~3min):

clawteam-smoke:
  - Claw-1 S1: pip install + quantcpp --help
  - Claw-3 S3: leaks --atExit (memory leak check)
  - Build: cmake + ctest 35/35
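
As a rough illustration of how the smoke job above could be driven, here is a minimal Python sketch that runs a list of named commands and collects the ones that fail. The `run_steps` helper, the `SMOKE_STEPS` table, the `build` directory name, the `quantcpp` package name passed to pip, and the command handed to `leaks` are all assumptions made for this example; the issue only names the steps, not the exact invocations. Note that `leaks` is a macOS tool, so that step would need a macOS runner.

```python
import subprocess
import sys

# Hypothetical step table mirroring the smoke list in this issue.
# Package name, build directory, and the command given to `leaks` are
# assumptions, not taken from the repository.
SMOKE_STEPS = [
    ("Claw-1 S1: pip install", [sys.executable, "-m", "pip", "install", "quantcpp"]),
    ("Claw-1 S1: --help", ["quantcpp", "--help"]),
    ("Claw-3 S3: leaks", ["leaks", "--atExit", "--", "quantcpp", "--help"]),
    ("Build: configure", ["cmake", "-B", "build"]),
    ("Build: compile", ["cmake", "--build", "build"]),
    ("Build: ctest", ["ctest", "--test-dir", "build", "--output-on-failure"]),
]


def run_steps(steps):
    """Run each (name, argv) step in order; return the names that failed."""
    failed = []
    for name, argv in steps:
        try:
            ok = subprocess.run(argv, check=False).returncode == 0
        except FileNotFoundError:
            # A missing executable counts as a failed step, not a crash.
            ok = False
        print(f"[{'PASS' if ok else 'FAIL'}] {name}")
        if not ok:
            failed.append(name)
    return failed
```

A CI job could then end with `sys.exit(1 if run_steps(SMOKE_STEPS) else 0)` so that any failed step blocks the merge. Running every step instead of stopping at the first failure keeps the report complete for a single PR run.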

Weekly (full, ~30min):

clawteam-full:
  - All 5 personas, all scenarios
  - Acme 7/7 regression
  - Wikitext 10/10 regression
  - Model compatibility matrix
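
The regression lines above (Acme 7/7, Wikitext 10/10) imply a simple gate: fail the weekly job if any suite reports fewer passes than its baseline. A minimal sketch of that check follows. It assumes each suite reports a `passed/total` string, which is a guess about the output format; how `eval_acme.py` and `eval_wikitext.py` actually report results is not stated here. `BASELINES`, `parse_score`, and `find_regressions` are hypothetical names.

```python
import re

# Baselines quoted in this issue; the suite keys are hypothetical labels.
BASELINES = {"acme": (7, 7), "wikitext": (10, 10)}


def parse_score(text):
    """Parse a 'passed/total' string such as '7/7' into (passed, total)."""
    match = re.fullmatch(r"\s*(\d+)\s*/\s*(\d+)\s*", text)
    if match is None:
        raise ValueError(f"not a passed/total score: {text!r}")
    return int(match.group(1)), int(match.group(2))


def find_regressions(results, baselines=BASELINES):
    """Return one message per suite that is missing or below its baseline."""
    problems = []
    for suite, (want_passed, want_total) in baselines.items():
        if suite not in results:
            problems.append(f"{suite}: no result reported")
            continue
        passed, total = parse_score(results[suite])
        if passed < want_passed or total < want_total:
            problems.append(
                f"{suite}: {passed}/{total}, expected {want_passed}/{want_total}"
            )
    return problems
```

Treating a missing result as a failure is a deliberate choice in this sketch: a suite that silently stops running should not look like a pass.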

Existing Assets

  • docs/clawteam/personas/claw-{1-5}-*.md — scenario definitions
  • docs/feedback/2026-04-12_1400_claw-*.md — baseline results
  • .claude/commands/clawteam.md — skill definition
  • bench/rlv/eval/eval_acme.py — Acme benchmark
  • bench/rlv/eval/eval_wikitext.py — Wikitext benchmark
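
A full run needs to locate the persona scenario files listed above. The sketch below shows one way to do that, assuming the `claw-{1-5}-*.md` shorthand means one Markdown file per persona number 1 through 5; `find_persona_files` is a hypothetical helper, not something that exists in the repository.

```python
from pathlib import Path


def find_persona_files(repo_root):
    """Return persona scenario files for Claw-1 through Claw-5, sorted by name."""
    personas_dir = Path(repo_root) / "docs" / "clawteam" / "personas"
    # [1-5] is the glob character class matching the {1-5} shorthand.
    return sorted(personas_dir.glob("claw-[1-5]-*.md"))
```

Sorting keeps the run order stable across platforms, which makes week-to-week results easier to compare.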

Priority: P1

Labels: enhancement
