Read the terrain. Then route.
Compass is a spec-driven development framework that refuses to treat a typo fix and a payments rewrite the same way. It reads the context of each task — how risky, how big, new code or old, who's asking — and computes the right amount of process for that change. Heavy when it needs to be. Out of the way when it doesn't.
Built for Claude Code. The methodology and kit layers are tool-agnostic, so it ports — only the thin adapter layer is Claude-Code-specific.
Working name. "Compass" and "the Needle" are placeholders — rename freely.
Every spec-driven framework eventually picks a ceremony and applies it to everything. Then one of two things happens:
- The process costs more than the change, so people route around it. It gets used for the demo and abandoned for the real work.
- The same flat process that's too heavy for a typo is too light for a schema migration. You can't calibrate one fixed pipeline for both ends.
The usual fix is levels — a fixed ladder of five or six tiers. Better, but a ladder is a one-dimensional answer to a multi-dimensional question. "How risky" and "how big" and "is this greenfield" and "who's invoking" are different axes. A migration that touches one file is small but not safe. A prototype is large but low-risk.
Compass computes process intensity per task instead of selecting it from a menu. Every task starts with Frame — a triage step where the Needle reads four dimensions:
| Dimension | Question |
|---|---|
| Blast radius | If this goes wrong, how bad and how wide? |
| Terrain | New code or existing code — and how well is it mapped? |
| Magnitude | How much work is this actually? |
| Intent & role | Who's invoking, and what outcome are they really after? |
The Needle writes route.md: what it assessed, the route it chose, the gates
that apply, and exactly what it's skipping and why that's safe. De-scoping
is a written, auditable decision — never an accident.
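As a rough illustration of what such a record could carry, the sketch below models the four readings and a route decision in Python. The class names, field names, and markdown layout are assumptions made for this example; the actual route.md format is defined by Compass's templates and may differ.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Readings:
    """The four Frame dimensions. The values are illustrative labels."""
    blast_radius: str  # e.g. "contained", "cross-cutting", "critical"
    terrain: str       # e.g. "mapped", "brownfield", "greenfield"
    magnitude: str     # e.g. "atomic", "standard", "large"
    intent: str        # e.g. "deliver", "explore"


@dataclass
class RouteRecord:
    """Hypothetical in-memory form of what route.md records."""
    readings: Readings
    route: str
    gates: list[str]
    skipped: dict[str, str] = field(default_factory=dict)  # phase -> why skipping is safe

    def __post_init__(self) -> None:
        # De-scoping must be a written decision: reject a skip with no reason.
        unexplained = [phase for phase, why in self.skipped.items() if not why.strip()]
        if unexplained:
            raise ValueError(f"skipped without a stated reason: {unexplained}")

    def to_markdown(self) -> str:
        """Render the record as plain markdown-style text."""
        lines = [f"Route: {self.route}", "", "Readings:"]
        lines += [f"- {name}: {value}" for name, value in vars(self.readings).items()]
        lines += ["", "Gates:"] + [f"- {gate}" for gate in self.gates]
        lines += ["", "Skipped:"] + [f"- {p}: {why}" for p, why in self.skipped.items()]
        return "\n".join(lines)
```

The point of the `__post_init__` check is that a skipped phase with no stated reason cannot be constructed at all, which is one way to make de-scoping auditable by design.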
Every route runs the same eight phases. What changes is how much each one costs.
Frame → Specify → Clarify → Plan → Distribute → Build → Verify → Land
On the Express route, Clarify/Plan/Distribute collapse to almost nothing. On Expedition, Plan produces a distribution map and Distribute spins up a swarm of agents across git worktrees. Same vocabulary, different weight — so anyone who's run one Compass task can read the artifacts of any other.
Compass is governed by two kinds of thing, kept deliberately separate:
- Guardrails — few, hard, checkable, blocking. The things that must never happen. The Needle adapts ceremony around them; it never crosses one.
- Strategies — many, soft, directional, assessed. How the team tends to work. A strategy biases a decision; it doesn't block one.
The framework ships five default guardrails: (G1) tested before it lands, (G2) acceptance defined before it's built, (G3) traceability holds, (G4) evidence not assertion, (G5) a human signs off on the irreversible.
The move that keeps Compass from being a sledgehammer: BDD and TDD are default strategies, not guardrails. The hard line is the outcome — code is tested, acceptance is checkable. Given/When/Then and red-green-refactor are the strong, shipped-on way to get there — and the Spike route can suspend them. A one-line typo fix still has to be tested before it lands; it doesn't have to perform the full ritual to do so.
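The split between blocking guardrails and biasing strategies can be pictured with a small Python sketch. Everything here is hypothetical: the task dictionary keys, the two sample guardrail checks, and the `suspended_on` field are assumptions for illustration, not Compass's real governance schema.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Guardrail:
    id: str
    holds: Callable[[dict], bool]  # True when the task satisfies the guardrail


@dataclass(frozen=True)
class Strategy:
    name: str
    suspended_on: frozenset = frozenset()  # routes that may suspend this strategy


# Illustrative only: two of the five default guardrails, written as outcome checks.
GUARDRAILS = [
    Guardrail("G1", lambda task: task.get("tests_passed") is True),
    Guardrail("G2", lambda task: bool(task.get("acceptance_criteria"))),
]
STRATEGIES = [
    Strategy("BDD", suspended_on=frozenset({"Spike"})),
    Strategy("TDD", suspended_on=frozenset({"Spike"})),
]


def blocking_failures(task: dict) -> list[str]:
    """Guardrails block: any id returned here stops the task from landing."""
    return [g.id for g in GUARDRAILS if not g.holds(task)]


def active_strategies(route: str) -> list[str]:
    """Strategies only bias: they are listed for a route, never enforced here."""
    return [s.name for s in STRATEGIES if route not in s.suspended_on]
```

In this sketch a typo fix with a passing test and one acceptance criterion produces no blocking failures, whichever ritual produced the test, while a Spike simply has no active strategies.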
And governance is a gradient, not a threshold: the defaults ship active,
so /compass:init is optional and /compass:frame works on day one. A team
accretes its own strategies as it forms opinions. See governance/.
Two related v1.1.0 capabilities extend the governance story without adding to the hard guardrail count:
- Fitness functions as project guardrails. A project declares a fitness function as a guardrail in governance/guardrails.yml with check: command-passes plus the command to run; compass check runs the command at Verify and refuses to clear the gate if it does not exit 0. This lets a team encode "the build is under N MB", "the API never returns 500 in the smoke suite", "performance does not regress past P95 = X ms" as guardrails (checkable, blocking, evidence-backed) without having to invent new check types in the framework. See architecture/decisions/ADR-009.
- Intermittent-test integrity. Tests that rerun to green are the classic way a guardrail becomes silently advisory. The no-trusted-rerun rule on G4 (evidence, not assertion) refuses to clear a test-run when a rerun was needed unless either the root cause is fixed or the test is explicitly quarantined in governance/quarantine.yml with a tracking task. The detail lives in governance/strategies.md §6; the discipline is part of how Compass takes G4 seriously.
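Both ideas reduce to small mechanical checks. The Python sketch below is an assumption-laden illustration, not the CLI's implementation: the function names, the 60-second timeout, and the shape of the quarantine mapping (test id to tracking task id) are all invented for the example.

```python
import subprocess
import sys


def command_passes(command: list[str], timeout: float = 60.0) -> bool:
    """Return True only if the command runs to completion and exits 0."""
    try:
        result = subprocess.run(command, capture_output=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        return False  # could not run, or ran too long: the gate stays closed
    return result.returncode == 0


def run_is_trusted(test_id: str, attempts: int, root_cause_fixed: bool,
                   quarantine: dict[str, str]) -> bool:
    """A green that needed a rerun is not evidence unless the root cause is
    fixed or the test is quarantined with a tracking task.

    quarantine maps test id -> tracking task id (a hypothetical shape).
    """
    if attempts <= 1:
        return True  # passed first time, nothing to distrust
    return root_cause_fixed or bool(quarantine.get(test_id, "").strip())


if __name__ == "__main__":
    # Stand-in fitness function: a child Python process that exits 0.
    print(command_passes([sys.executable, "-c", "raise SystemExit(0)"]))
```

Treating "could not run" the same as "failed" keeps the check fail-closed, which matches the blocking character of a guardrail.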
Compass isn't an engineering framework with bolted-on hooks for everyone else. The non-engineering roles have their own entry points and artifacts that plug into the same pipeline. The shared BDD scenario file is what makes it work — every role reads it through their own lens:
| Role | Entry point | Reads the spec for… |
|---|---|---|
| Product owner / manager | /compass:intent | intent fidelity: do these scenarios deliver the brief? |
| Product marketer | /compass:position | claims: every line of launch copy points at a backing scenario |
| Designer | /compass:design | UI contracts, written as scenarios that flow into Specify |
| Engineer | /compass:frame → pipeline | tests: scenarios become the acceptance suite |
| QA | /compass:verify | coverage: which scenarios are exercised, which edges aren't |
The product owner enters upstream of the spec. The marketer works parallel to it. The designer feeds into it. Nobody is just a downstream consumer of a finished engineering process.
A lens does not always have an entry point. The table above lists five
entry-point roles — each starts a Compass task with its own /compass:…
command. The framework ships ten agents though, not five, because some lenses
apply during the pipeline rather than starting it. The 10th — the
architect-lens — reads the project's architecture/ artifacts at Frame
and annotates plan.md via architecture-notes.md at Plan; it is consulted
by spec-author and planner, not invoked as its own entry point. See
docs/roles-guide.md for the full lens treatment.
Routes are composed from the dimension readings — these five are starting shapes the Needle tunes, not a fixed ladder.
| Route | Typical reading | Shape |
|---|---|---|
| Express | atomic · contained · mapped | Frame → Specify (1 scenario) → Build → Verify. Still tested before it lands. One gate. |
| Standard | standard · contained | Full pipeline, solo or pair. Two gates. |
| Expedition | large · cross-cutting · greenfield | Full weight. Governance check, BDD discovery, distribution map, agent swarm across worktrees. All gates. |
| Hotfix | critical · small · brownfield | Reproduce-first — a failing regression test is the spec. Expedited Build, mandatory post-incident backfill. All Verify gates. |
| Spike | intent is exploration | Explore freely — the TDD strategy is suspended, the hook doesn't block. Then graduate (re-frame into a real route) or discard. Nothing lands from a Spike. |
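The red-then-green discipline that the Hotfix route leans on (a failing regression test first) and that the Spike route suspends can be sketched in Python as below. This is a guess at the general shape only: the function names, the JSON marker file, and the rule that a green needs a prior recorded red are assumptions for illustration.

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path


def record_red(command: list[str], marker: Path) -> bool:
    """Run the test command; accept it only if it FAILS, and record the red."""
    code = subprocess.run(command, capture_output=True).returncode
    if code == 0:
        return False  # the test already passes, so it reproduces nothing
    marker.write_text(json.dumps({"command": command, "exit_code": code}))
    return True


def record_green(command: list[str], marker: Path) -> bool:
    """Run the test command; accept it only if it PASSES after a recorded red."""
    if not marker.exists():
        return False  # assumption: a green with no prior red is not accepted
    if subprocess.run(command, capture_output=True).returncode != 0:
        return False
    marker.unlink()  # clear the red marker
    return True


if __name__ == "__main__":
    failing = [sys.executable, "-c", "assert 1 + 1 == 3"]
    passing = [sys.executable, "-c", "assert 1 + 1 == 2"]
    with tempfile.TemporaryDirectory() as tmp:
        marker = Path(tmp) / "red.json"
        print(record_red(failing, marker), record_green(passing, marker))
```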
An adaptive framework owes an answer to the obvious objection: if the process can flex, what stops it flexing to nothing? Part of that answer is the routing policy (above). The other part is now mechanical.
There is a line through Compass. On one side is judgement — the Needle
reading the four dimensions. That cannot be mechanized; that judgement is the
adaptivity. On the other side is mechanism — everything that happens once
the readings exist: composing the route, applying the floors and caps, running
the guardrail checks. Same readings + same policy => the same route, every
time. Compass puts that mechanism in a CLI, cli/compass, so it is actually
deterministic, not deterministic-in-principle. See docs/methodology.md §6.
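The determinism claim is easiest to see when route composition is written as a pure function of readings and policy. The sketch below is hypothetical: the rule format, the reading labels, and the first-match-wins ordering are assumptions loosely based on the route table, and the real routing-policy.yml schema (including floors and caps, which are omitted here) may look quite different.

```python
# Hypothetical policy: an ordered list of rules; the first rule whose
# conditions all match the readings decides the route.
POLICY = [
    {"route": "Spike", "when": {"intent": "explore"}},
    {"route": "Hotfix", "when": {"blast_radius": "critical",
                                 "magnitude": "small",
                                 "terrain": "brownfield"}},
    {"route": "Expedition", "when": {"magnitude": "large",
                                     "blast_radius": "cross-cutting"}},
    {"route": "Express", "when": {"magnitude": "atomic",
                                  "blast_radius": "contained",
                                  "terrain": "mapped"}},
]


def evaluate_route(readings: dict, policy: list = POLICY,
                   default: str = "Standard") -> str:
    """Pure function: no clock, no randomness, no I/O.

    The same readings and the same policy always yield the same route.
    """
    for rule in policy:
        if all(readings.get(key) == value for key, value in rule["when"].items()):
            return rule["route"]
    return default  # assumption: anything unmatched falls back to Standard
```

Because the function reads nothing but its arguments, rerunning it in CI against the stored readings is enough to detect a route that was changed by hand.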
compass route evaluate    apply routing-policy.yml to a task's readings → the route
compass check             run the guardrails.yml checks against task.yml + evidence/
compass tdd-red -- CMD    run a test, assert it FAILS, record the red
compass tdd-green -- CMD  run a test, assert it PASSES, clear the red marker
compass policy lint       structurally validate the governance YAML
compass task lint         structurally validate a task.yml
compass task receipt      render a one-screen receipt for a landed task:
                          readings → route → typed evidence → gate verdicts
compass analyze           cross-artifact coherence check: orphaned scenarios,
                          route disagreements, orphan claims (advisory or
                          gate-clearing if verify.analyze is in the route)
compass adr new           create a new numbered ADR in architecture/decisions/
compass rework-scan       scan tasks for rework patterns (window from signals.yml)
compass flow [--digest]   cross-task flow view; --digest writes a dated digest
                          with the rework-scan section and calibration signal
compass next              surface the next action on the current task
compass backfill pay      mark a backfill as paid in a task's task.yml
compass calibration       aggregate the re-frame log: is routing well-sized?
compass ci                the full mechanical gate suite, for CI; honour the exit code
Its only hard dependency is PyYAML (jsonschema is optional — it turns on
full JSON Schema validation in the lint commands). The slash commands call the
CLI under the hood — /compass:frame runs compass route evaluate,
/compass:verify runs compass check — so you rarely invoke it directly, but
it is there, and it is the part that makes the framework's checks real rather
than aspirational. Gate evidence in task.yml is typed — a {type, path}
record, not a bare path — so a mechanical gate cannot be cleared with a written
note. And compass calibration is the framework's own feedback loop: it reads
the re-frame log across every task and reports whether the Needle is
systematically over- or under-sizing routes.
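To make the typed-evidence point concrete, here is a minimal Python sketch of a gate that accepts only {type, path} records. The function name and the evidence type labels used in the example ("test-run", "note") are assumptions; the real type vocabulary lives in Compass's schemas.

```python
import tempfile
from pathlib import Path


def gate_cleared(evidence: list, required_type: str, root: Path) -> bool:
    """A gate clears only on a {type, path} record of the required type
    whose path is an existing file under root."""
    for item in evidence:
        if not isinstance(item, dict):
            continue  # a bare path or a written note is not typed evidence
        if item.get("type") != required_type:
            continue  # evidence of the wrong type cannot clear this gate
        path = item.get("path")
        if isinstance(path, str) and (root / path).is_file():
            return True
    return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "run.json").write_text("{}")
        typed = [{"type": "test-run", "path": "run.json"}]
        print(gate_cleared(typed, "test-run", root))
        print(gate_cleared(["run.json"], "test-run", root))
```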
Compass CI does not replace your normal project CI. It does not re-run your full test suite unless you explicitly configure your pipeline to do so. Compass checks whether required evidence exists, is valid, and is traceable to the task route. Your application pipeline should still run tests, linting, type checks, security scans, build validation, and deployment checks.
The two are complementary: project CI proves the code is correct;
Compass CI proves the process — that the route was framed, scenarios
have tests, changed files trace to scenarios, gates carry evidence of the
right type, approvals are recorded where they must be. Run them as
separate jobs in the same workflow, with compass-ci gated on
project-ci so a failing test suite stops the pipeline before Compass
even runs:
# .github/workflows/ci.yml
name: CI
on:
  pull_request:
  push:
    branches: [main]
jobs:
  project-ci:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # your normal pipeline: tests, lint, type-check, security scans,
      # build validation, deploy checks: whatever your project requires.
      - run: make test
      - run: make lint
  compass-ci:
    runs-on: ubuntu-latest
    needs: project-ci          # only run Compass once the code is green
    env:
      COMPASS_CLI: cli/compass # adjust to wherever Compass lives in your repo
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install pyyaml jsonschema
      - run: python3 "$COMPASS_CLI" ci

ci/github-actions.yml is the reference workflow for just the
compass-ci job; ci/README.md is the full contract. Pin
COMPASS_CLI to a specific commit SHA — see docs/security.md for the
supply-chain stance. When you are first piloting Compass and do not
want it blocking PRs yet, set mode: advisory in .compass/config.yml
and compass ci will report failures without exiting non-zero. Flip to
mode: enforced when the team is ready.
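The difference between the two modes comes down to how failures map to the process exit code. The sketch below is an illustration under assumptions: the function name and the specific exit code 1 are invented here, and only the two mode values come from the text above.

```python
def ci_exit_code(failures: list[str], mode: str) -> int:
    """Map gate failures to an exit code for the given mode.

    advisory: failures are reported, the exit code stays 0.
    enforced: any failure yields a non-zero exit code (1 in this sketch).
    """
    if mode not in ("advisory", "enforced"):
        raise ValueError(f"unknown mode: {mode!r}")
    for failure in failures:
        print(f"FAIL {failure}")  # failures are always reported, in both modes
    if failures and mode == "enforced":
        return 1
    return 0
```

Reporting in both modes means a pilot team sees exactly the output it will later be held to; only the exit code changes when the mode is flipped.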
The fastest path is the plugin marketplace — no clone, no install script:
# In Claude Code:
/plugin marketplace add jed72/compass
/plugin install compass@compass
pip install pyyaml # the CLI's one dependency

Enabling the plugin namespaces the commands as /compass:…, registers the
hooks, and puts the compass CLI on your PATH (Claude Code adds the plugin's
bin/ to PATH automatically — that is the plugin-path-only convenience).
Or install from source — install.sh wires the slash commands, agents,
skills, and hooks in by symlink, so edits to your clone are picked up live:
git clone https://github.com/jed72/compass.git
cd compass && bash scripts/install.sh --global
pip install pyyaml

install.sh does not modify your PATH, because the source install does not have
the plugin runtime to do it for you. To make compass invokable from your
shell, add the clone's bin/ directory to your PATH ($PWD/bin while you are
still inside the clone), or invoke the CLI as
python3 $COMPASS_HOME/cli/compass. The slash commands run the CLI on your
behalf, so this only matters when you call it directly.
Either way, start a task right away — the default guardrails ship active, so the Needle frames it and picks the route with zero setup:
/compass:frame "Add rate limiting to the public API"
# Walk the pipeline (or let the route auto-advance). The slash commands
# call the `compass` CLI under the hood for the deterministic parts —
# route evaluation, the TDD red/green records, the guardrail checks.
/compass:specify
/compass:plan
/compass:build
/compass:verify
/compass:land
# Optional, whenever you have opinions to encode — not a prerequisite:
/compass:init # adopt project guardrails and strategies into governance/

A product owner would instead start with /compass:intent, a marketer with
/compass:position. To see across every task in flight — triage, blockers,
the periodic digest — run /compass:flow. See docs/quickstart.md.
Compass is built in three layers — see docs/methodology.md §9. The
methodology layer is the framework (plain markdown). The kit layer is the
deterministic mechanism (a plain CLI, PyYAML its one dependency, not
Claude-Code-specific). The adapter layer wires both into Claude Code.
compass/
├── governance/ Guardrails + strategies + routing policy — .md (prose)
│ AND .yml (the machine-readable governance the CLI runs;
│ including signals.yml and quarantine.yml)
├── architecture/ The project's cross-task architectural artifacts:
│ system-context.md, relations.md, ownership.md, and
│ ADRs in decisions/. Compass ships its own (the
│ framework's founding ADRs) as a worked example for
│ adopters; another project drops its own here
├── routes/ The Needle (router.md) + the 5 reference routes
├── schemas/ Executable JSON Schema (.schema.json) for the .yml +
│ task.yml, with human-readable .reference.yml companions
├── cli/ compass — the deterministic CLI (route evaluate, check,
│ tdd-red/green, lint, calibration, ci); the kit's mechanism
├── bin/ compass — plugin CLI shim that execs cli/compass.
│ Claude Code adds the plugin's bin/ to PATH when the
│ plugin is enabled, so `compass <subcommand>` resolves
│ without a manual symlink or alias
├── ci/ CI integration: the reference workflow + the contract
│ ("run compass ci, honour the exit code")
├── commands/ Slash commands: the pipeline + role entry points
├── agents/ Subagent definitions, including the swarm orchestrator
├── skills/ Procedural knowledge: routing, BDD, TDD, worktrees…
├── hooks/ Mechanical enforcement of the guardrails + TDD strategy
├── templates/ Artifact templates for every phase and role —
│ including task.yml, the machine-readable task spine
├── scripts/ install, swarm, integrate, validate
├── .claude-plugin/ Claude Code plugin manifest (plugin.json) +
│ marketplace manifest — the install path used by
│ `/plugin install`, parallel to scripts/install.sh
└── docs/ methodology.md is the canonical design doc — start there
The methodology layer is docs/, governance/*.md, routes/, templates/.
The kit layer is cli/, governance/*.yml, schemas/, and the task.yml
spine. The Claude Code adapter layer is commands/, agents/, skills/,
hooks/, CLAUDE.md — and the commands and agents call the kit.
- docs/five-minutes.md: the shortest path from "what is this" to "I've shipped a task with it." Start here.
- docs/safety-contract.md: the seven things Compass 1.0 guarantees, and what it explicitly does not claim.
- docs/methodology.md: the canonical design doc. Everything else is downstream of it.
- docs/quickstart.md: your first task, per role.
- docs/install-smoke-test.md: manual install verification checklist.
- docs/security.md: hook surface, dependencies, supply-chain stance.
- docs/routing-deep-dive.md: how the Needle actually decides.
- docs/roles-guide.md: one scenario, seen four ways.
- docs/portability.md: the three layers, and what porting Compass to another runtime involves (rewrite the adapter; keep methodology and kit).
- schemas/README.md: the shape of the machine-readable files the CLI reads.
Apache 2.0. See LICENSE.