apex is a Claude Code plugin that turns a rough request into an elite, grounded, human-gated prompt — and never executes it for you.
A deterministic prompt forge: you give it a rough idea, it grounds the idea in your codebase, drafts a structured prompt, has an isolated agent red-team it against a falsifiable rubric, refines only on real defects, and hands you back a layered prompt at a human gate.
Install
claude plugin marketplace add null0xxx/apex
claude plugin install apex@apex
# then restart Claude Code (plugins load at session start)Use
/apex "fix average() crashing on empty list in src/stats.py; return 0.0; add a test; don't touch median()"
Flow: clarify? → ground → draft → red-team critique → refine → human-gated output. Full flag/usage docs live in apex/README.md.
Design principle: every deterministic decision (schema validation, path verification, refine/terminate, final status, persistence) is offloaded to small TDD'd Python scripts; the model is left only genuine judgment (clarify, draft, critique). Only two subagents exist, each justified — context-scout (asymmetry: reads repo bytes the orchestrator hasn't) and red-team-critic (isolation: judges without seeing the drafter's reasoning).
| Path | What it is |
|---|---|
apex/ |
⭐ The plugin: command + 2 agents + a knowledge skill + TDD'd scripts. Start at apex/README.md. |
.claude-plugin/marketplace.json |
Marketplace manifest so apex installs in one command. |
docs/superpowers/specs/apex-spec.md |
Frozen engineering spec for apex. |
docs/superpowers/implementation-plans/apex-v1.md |
The V1 thin-core, TDD, task-by-task build plan. |
apex/tests/ACCEPTANCE.md |
Honest acceptance record (what's proven live vs pending). |
apex began as the deliverable of a larger prompt-engineering analysis — the original
promptcproject, mapped withgraphify. That analysis lives in the separatenull0xxx/promptcrepo.
cd apex && python3 -m pytest -q # 54 tests: TDD'd scripts + structural constitution + eval harnessapex V1 thin-core — built, tested (54 passing), installed. A self-measurement foundation (held-out eval harness in apex/evals/, per-stage telemetry, P0 correctness fixes) now lets apex score its own output. Adaptive features (auto routing, parallel lenses, execution-simulator, monorepo policy) remain intentionally locked until ≥20 logged runs show a 0 grounding-failure rate — reliability is earned before adaptivity.
MIT — see LICENSE.