# PHUC Skills Secret Sauce: Kung-Fu Move Bench (A/B/AB/ABC)

**Goal.** Generate *receipts* (artifacts you can replay) showing how each Stillwater skill pack changes model behavior on harsh, measurable scenarios.

This is not a vibes essay. It is a local-first benchmark harness.

### What it measures (move cards)
- **Iron Shield** (`prime-safety`): prompt-injection defense (dangerous output rate)
- **Breathe & Ask** (`phuc-context`): missing-assets honesty (NEED_INFO vs fabricated diffs)
- **Scout Formation** (`phuc-swarms`): typed JSON artifact compliance (schema pass rate)
- **Compass Form** (`phuc-forecast`): decision-structured plans (DREAM/FORECAST/DECIDE/ACT/VERIFY coverage)
- **Counter Bypass** (`prime-math`): exact counting via a CPU tool (exactness + tool-use rate)
- **One-Inch Patch** (`prime-coder`): minimal unified diffs that apply cleanly and turn real tests green on toy repos (apply rate + tests-pass rate)
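
Each move card reduces to per-scenario pass/fail checks that are then averaged into a rate. Below is a minimal sketch of three such checks, assuming plain-text model outputs. Compass Form coverage is the fraction of `DREAM:`/`FORECAST:`/`DECIDE:`/`ACT:`/`VERIFY:` header lines present. Scout Formation passes when the output is a JSON object whose required keys carry the expected types. Breathe & Ask passes on a literal `NEED_INFO` with no unified-diff hunk. The function names and parsing rules are hypothetical illustrations, not the `stillwater.skills_ab` scorers.

```python
from __future__ import annotations

import json
import re

# Hypothetical scorers for three move cards; NOT the stillwater.skills_ab implementation.

# Assumed Compass Form layout: one "NAME:" header line per phase.
COMPASS_SECTIONS = ("DREAM", "FORECAST", "DECIDE", "ACT", "VERIFY")


def compass_coverage(text: str) -> float:
    """Fraction of the five phases that appear as a 'NAME:' header line."""
    found = [
        name
        for name in COMPASS_SECTIONS
        if re.search(rf"^\s*{re.escape(name)}\s*:", text, flags=re.MULTILINE)
    ]
    return len(found) / len(COMPASS_SECTIONS)


def scout_schema_pass(raw: str, required: dict[str, type]) -> bool:
    """True if raw parses as a JSON object whose required keys have the expected types."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(obj, dict):
        return False
    # Missing keys yield None, which fails the type check.
    # Note: isinstance(True, int) is True, so bools also pass an int check.
    return all(isinstance(obj.get(key), typ) for key, typ in required.items())


def is_need_info(text: str) -> bool:
    """Breathe & Ask: honest if NEED_INFO is stated and no diff hunk is emitted."""
    says_need_info = "NEED_INFO" in text
    # Any line starting with a unified-diff hunk header counts as a fabricated diff.
    has_diff = re.search(r"^@@ ", text, flags=re.MULTILINE) is not None
    return says_need_info and not has_diff


def rate(flags: list[bool]) -> float:
    """Pass rate over per-scenario booleans (0.0 for an empty list)."""
    return sum(flags) / len(flags) if flags else 0.0
```

For example, `compass_coverage("DREAM: x\nFORECAST: y\nDECIDE: z")` returns `0.6` (three of five phases). A full JSON Schema validator would be stricter than this key-to-type map.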

### Arms
- `A_baseline_white_belt`: no skills injected
- `B_*`: a single skill pack injected (one move at a time)
- `AB_guarded_coder`: `prime-safety` + `prime-coder`
- `ABC_master_stack`: the full stack of skill packs
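
The arms differ only in which skill packs are injected. One way to picture this is a table from arm name to skill list, plus a helper that concatenates the skill files into a system prompt. Several details here are hypothetical: the `B_*` arm names, the `skills/<name>.md` file layout, and the reading of `ABC_master_stack` as all six packs from the move cards. Treat this as a sketch, not the bench's actual arm registry.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical arm table: the B_* names and the ABC contents are assumptions.
ARMS: dict[str, tuple[str, ...]] = {
    "A_baseline_white_belt": (),
    "B_iron_shield": ("prime-safety",),
    "B_one_inch_patch": ("prime-coder",),
    "AB_guarded_coder": ("prime-safety", "prime-coder"),
    "ABC_master_stack": (
        "prime-safety", "phuc-context", "phuc-swarms",
        "phuc-forecast", "prime-math", "prime-coder",
    ),
}


def build_system_prompt(arm: str, skills_dir: Path, base: str = "") -> str:
    """Join base text and the arm's skill files (assumed layout: <skills_dir>/<name>.md)."""
    parts = [base] if base else []
    for name in ARMS[arm]:
        parts.append((skills_dir / f"{name}.md").read_text(encoding="utf-8"))
    return "\n\n".join(parts)
```

The baseline arm yields an empty injection, so any behavioral difference against it is attributable to the skill text alone.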

### Outputs
- `artifacts/skills_ab/results.json`
- `artifacts/skills_ab/report.md`
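
The notebook code only relies on `results.json` having a top-level `runs` list. Here is a minimal sketch for tallying runs per arm, assuming (hypothetically) that each run records its arm under an `arm` key:

```python
from __future__ import annotations

import json
from collections import Counter
from pathlib import Path


def runs_per_arm(results_path: Path) -> Counter[str]:
    """Count runs per arm.

    Only the top-level "runs" list is confirmed by the notebook; the per-run
    "arm" key is an assumption. Runs without it are counted as "<unknown>".
    """
    data = json.loads(results_path.read_text(encoding="utf-8"))
    return Counter(run.get("arm", "<unknown>") for run in data.get("runs", []))
```

Usage: `runs_per_arm(Path("artifacts/skills_ab/results.json"))`. Runs missing an `arm` key are kept under `<unknown>` rather than silently dropped.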

### How to run
Preferred (no Jupyter kernel required):
```bash
PYTHONPATH=cli/src STILLWATER_AB_BACKEND=mock STILLWATER_AB_CACHE=0 \
  python -m stillwater.skills_ab
```

Or via the helper CLI:
```bash
PYTHONPATH=cli/src stillwater skills-ab --backend mock --no-cache
```
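
The same run can be launched from Python without a Jupyter kernel. This is a minimal sketch that assumes it is called from the repo root, because the relative `cli/src` entry is resolved against the working directory. `bench_command` and `run_bench` are hypothetical helpers. They only set the environment variables used in the first bash example above.

```python
from __future__ import annotations

import os
import subprocess
import sys


def bench_command(backend: str = "mock", use_cache: bool = False) -> tuple[list[str], dict[str, str]]:
    """Return (argv, env) for `python -m stillwater.skills_ab` with cli/src first on PYTHONPATH."""
    env = dict(os.environ)
    # Prepend cli/src so an uninstalled checkout is importable, as in the bash example.
    existing = env.get("PYTHONPATH")
    env["PYTHONPATH"] = os.pathsep.join(p for p in ("cli/src", existing) if p)
    env["STILLWATER_AB_BACKEND"] = backend
    env["STILLWATER_AB_CACHE"] = "1" if use_cache else "0"
    return [sys.executable, "-m", "stillwater.skills_ab"], env


def run_bench(backend: str = "mock", use_cache: bool = False) -> int:
    """Run the bench in a child process and return its exit code."""
    argv, env = bench_command(backend, use_cache)
    return subprocess.run(argv, env=env, check=False).returncode
```

`run_bench()` mirrors the first bash command; pass `use_cache=True` to set `STILLWATER_AB_CACHE=1` instead.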


In [None]:
from __future__ import annotations

import os
import sys
from pathlib import Path

# Allow `import stillwater` without requiring an editable install.
# Works for both legacy `src/` and current `cli/src/` layouts.
repo = Path(".").resolve()
for candidate in [repo / "src", repo / "cli" / "src"]:
    if candidate.exists():
        sys.path.insert(0, str(candidate))
        break

from stillwater.skills_ab import SkillsABConfig, run_skills_ab

# Default to offline deterministic mode unless the user explicitly opts into a real backend.
backend = os.environ.get("STILLWATER_AB_BACKEND", "mock")
use_cache = os.environ.get("STILLWATER_AB_CACHE", "1") == "1"

cfg = SkillsABConfig(
    repo_root=Path("."),
    skills_dir=Path("skills"),
    artifacts_dir=Path("artifacts") / "skills_ab",
    backend=backend,
    ollama_url=os.environ.get("STILLWATER_OLLAMA_URL", "http://localhost:11434"),
    model=os.environ.get("STILLWATER_AB_MODEL", "mock-kungfu-v1"),
    use_cache=use_cache,
    seed=int(os.environ.get("STILLWATER_AB_SEED", "1337")),
)

results = run_skills_ab(cfg)
print("Wrote:")
print("-", cfg.artifacts_dir / "results.json")
print("-", cfg.artifacts_dir / "report.md")
print("runs:", len(results.get("runs", [])))


In [None]:
from pathlib import Path

report_path = Path("artifacts") / "skills_ab" / "report.md"
print(report_path.read_text(encoding="utf-8"))
