johnpatrickwarren-oss/sprag

sprag

A sprag is a one-way clutch — it lets a mechanism turn forward but locks against reverse motion. That's the invariant: architecture only moves forward, never rots back.

Enforce human-authored architectural invariants as a gate on every change, ratcheted against a baseline, so AI-built codebases don't silently rot (the k10s failure mode: https://blog.k10s.dev/im-going-back-to-writing-code-by-hand/). Mechanical + deterministic (no model), so no oracle-quality ceiling and no "who-verifies-the-verifier" problem.

And — the part nothing else does — when an AI agent is told "make the gate pass," it can't cheat by silencing the gate. node demo-threat-model.mjs watches it try, and fail:

  told "make the gate pass", the agent tries to silence the gate instead of fixing it:

  🛡  BLOCKED  raise the limit (maxLines 20 → 50)
  🛡  BLOCKED  re-baseline (0 → 1, grandfather the debt)
  🛡  BLOCKED  delete the rule
  🛡  BLOCKED  downgrade severity (block → warn)
  🛡  BLOCKED  stage-then-revert the relaxed config
  🛡  BLOCKED  kill the analysis engine

  Every bypass blocked; the gate can only be passed honestly. ✅

The only two ways to green: fix the code, or loosen it on the record (ARCH_ALLOW_RELAX=1, which still prints what it loosened). Full bypass catalogue + honest limits: THREAT-MODEL.md.

The part nothing else does: a gate that can't be silently weakened

Every quality gate has two silent-failure modes — and an AI agent told "make the gate pass" reaches for both. sprag is the only gate built to close them:

  • It can't die quietly. If the analysis engine can't load (not installed, wrong-platform binary, ABI mismatch), sprag fails closed — errors, exit 2 — instead of scoring 0 and passing everything. A no-op gate is the worst possible failure for a gate, so a dead engine must be loud, not invisible.
  • It can't be relaxed to pass. The meta-ratchet gates sprag's own config + baseline against a git ref: raising a max, dropping a rule, downgrading severity, or re-baselining upward blocks. Loosening is forward-only and visible — a deliberate, reviewed ARCH_ALLOW_RELAX=1 that still prints every relaxation. A Betterer snapshot can be --updated, an ESLint rule disabled, a baseline rewritten; sprag's ruleset can only move forward, or the change shows up blocked.

The ratchet itself is well-trodden (ArchUnit, Betterer, Sonar "Clean as You Code", Semgrep --baseline-commit) — sprag stands on that. What's uncontested, and what matters most when the author of the code is also trying to get past the gate, is gating the gate against the two ways it goes no-op: by accident (dead engine) and on purpose (relaxed config).
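The fail-closed half can be sketched in a few lines. This is a minimal illustration, not sprag's code: `runGate`, `loadEngine` and `runChecks` are hypothetical names, and only the exit codes (0, 2, 3) come from this README.

```javascript
// Hypothetical sketch: function names and shapes are illustrative, not sprag's API.
const EXIT_PASS = 0;
const EXIT_ENGINE_ERROR = 2; // fail closed: a dead engine is an error, not a pass
const EXIT_BLOCKED = 3;

function runGate(loadEngine, runChecks) {
  let engine;
  try {
    engine = loadEngine();
  } catch (err) {
    // Never fall through to "0 violations found": that would be a no-op gate.
    return { exitCode: EXIT_ENGINE_ERROR, message: `engine failed to load: ${err.message}` };
  }
  const violations = runChecks(engine);
  return violations.length === 0
    ? { exitCode: EXIT_PASS, message: 'pass' }
    : { exitCode: EXIT_BLOCKED, message: `${violations.length} violation(s)` };
}

const dead = runGate(() => { throw new Error('ABI mismatch'); }, () => []);
console.log(dead.exitCode); // 2
```

The design choice worth noticing is that the load failure is handled before any check runs, so an empty violation list can only ever come from a working engine.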

See THREAT-MODEL.md for the full bypass catalogue and a runnable, self-verifying proof — node demo-threat-model.mjs watches an agent try every shortcut and sprag block each one. The idea, in essay form: The gate that can't be weakened.

Install

npm i -g @johnpwarren.dev/sprag    # provides the `sprag` command
npx @johnpwarren.dev/sprag init <src-dir>

# Or straight from GitHub (no build step either way):
#   npm i -g github:johnpatrickwarren-oss/sprag

Or clone and run directly: git clone https://github.com/johnpatrickwarren-oss/sprag && cd sprag && npm install && node arch.mjs <cmd>.

Adopt on your repo (ratchet from where you are)

One unified CLI (arch, or node arch.mjs <cmd>):

npm install                                              # ast-grep engines
node arch.mjs init <src-dir>                             # scaffold generic invariants + baseline (lang auto-detected)
node arch.mjs check <src-dir> --invariants arch-invariants.json --baseline-in arch-invariants.baseline.json
node arch.mjs install-hook <repo-dir> <src-rel> arch-invariants.json   # pre-commit gate (blocks new debt vs HEAD)
node arch.mjs loop  <src-dir> --fixer "<your builder cmd>"             # AI-loop feedback gate
node arch.mjs trend <repo> <src-rel> --invariants arch-invariants.json # debt trend over history

init scaffolds the generic, no-tuning checks (god-files, god-functions, coupling fan-in); add project-specific tenets from library/tenets.json or any raw ast-grep rule via the ast_grep_rule check kind. The ratchet means you don't need perfect thresholds up front: start from your current state, and the gate refuses to make it worse.

Run

node arch-gate.mjs sample --baseline   # record the clean sample's metrics as the accepted baseline
node arch-gate.mjs sample              # check -> PASS (exit 0)
node test-arch-gate.mjs                # self-contained proof: passes clean, blocks 3 rot diffs (exit 0)

Exit codes: 0 pass · 2 engine failed to load (fail closed) · 3 blocked (rot) · 64 usage.

What it does

invariants.json declares human-authored invariants; arch-gate.mjs computes each metric over the codebase and blocks on (a) an absolute max breach or (b) a ratchet regression vs baseline.json (never-get-worse). Three invariants here map to the k10s tenets:

| Invariant | Check | k10s tenet |
| --- | --- | --- |
| model-not-god-object | Model struct field count (max 8, ratchet) | god object (1 & 2) |
| no-positional-rows | magic integer-index access (ra[3]) count (max 0) | positional fragility (4) |
| bounded-dispatch | switch m.view case count (ratchet) | central per-view dispatch (2) |

The ratchet is the key idea: it blocks the 13th Model field, not the 30th — catching the trend as it's introduced, which is what would have flagged k10s at commit ~20 instead of at collapse (commit 234). Demo: adding one Model field is blocked at 6→7, before the absolute max.
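The two blocking conditions, absolute max and ratchet regression, can be sketched as one function. This is a sketch under assumptions: the field names `max` and `ratchet` and the function `evaluateInvariant` are illustrative and may not match the real invariants.json schema.

```javascript
// Illustrative only: `max` and `ratchet` are assumed field names, not the documented schema.
function evaluateInvariant(invariant, metric, baseline) {
  // (a) absolute max breach
  if (typeof invariant.max === 'number' && metric > invariant.max) {
    return { ok: false, reason: `absolute max breached: ${metric} > ${invariant.max}` };
  }
  // (b) ratchet regression: the metric may never be worse than the recorded baseline
  if (invariant.ratchet && typeof baseline === 'number' && metric > baseline) {
    return { ok: false, reason: `ratchet regression: ${baseline} -> ${metric}` };
  }
  return { ok: true, reason: 'ok' };
}

const modelFields = { id: 'model-not-god-object', max: 8, ratchet: true };
console.log(evaluateInvariant(modelFields, 7, 6).ok); // false: blocked at 6 -> 7, below max 8
console.log(evaluateInvariant(modelFields, 6, 6).ok); // true
```

The first call mirrors the demo above: the seventh field is still under the max of 8, yet the ratchet blocks it because the baseline was 6.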

Wiring it in (the gate stops rot before it lands)

Two integration points turn the checker into an enforcement gate:

Pre-commit hook — blocks a commit that makes architecture worse than HEAD (ratchet vs the last commit), so rot can't land:

./install-hook.sh <repo-dir> <src-rel-path>     # writes <repo>/.git/hooks/pre-commit
node test-precommit.mjs                          # proof: rot commit blocked + doesn't land; clean allowed

The hook computes the baseline from HEAD on each commit (no manual baseline upkeep). Bypass (discouraged) is git commit --no-verify.

AI-loop feedback gate — runs the gate; on a block, feeds the violation back to a pluggable fixer and re-checks, until it passes or escalates:

node arch-loop.mjs <dir> --fixer "<cmd>" [--max-iters 3]
node test-arch-loop.mjs                          # proof: converges on a good fixer, escalates on a stuck one

The fixer is a stub in tests; in real use it's a claude session (cwd = the code dir, reads ARCH_GATE_FEEDBACK / .arch-feedback.md) — reusing prototypes/verification-harness's watchdog-wrapped session runner. On escalation the loop does NOT relax the invariant — a fixer that can't satisfy the gate means the change is genuinely incompatible with the architecture, which is a human call (the inverse of the behavioral harness's phantom problem: here a stuck loop means the code is wrong, not the invariant).
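The loop's control flow, reduced to its skeleton, looks roughly like this. Everything here is a hypothetical stand-in: `runLoop`, the `{ ok, feedback }` result shape and the in-memory fixer are assumptions for illustration, whereas the real loop shells out to the fixer command.

```javascript
// Hypothetical sketch: `check` returns { ok, feedback }; `fixer` receives the feedback text.
function runLoop(check, fixer, maxIters = 3) {
  let result = check();
  let iters = 0;
  while (!result.ok && iters < maxIters) {
    fixer(result.feedback); // feed the violation back to the builder
    iters += 1;
    result = check();       // re-check after every fix attempt
  }
  // On escalation the invariant is left untouched; a human makes the call.
  return result.ok
    ? { status: 'pass', iters }
    : { status: 'escalate', iters, feedback: result.feedback };
}

let fields = 7; // one field over a baseline of 6
const gateCheck = () => (fields <= 6
  ? { ok: true }
  : { ok: false, feedback: `Model has ${fields} fields, baseline is 6` });

console.log(runLoop(gateCheck, () => { fields -= 1; }).status); // pass
fields = 7;
console.log(runLoop(gateCheck, () => {}).status);               // escalate
```

Note that nothing in the loop can edit the invariant or the baseline: the only two exits are a passing check or an escalation.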

Auditable suppressions (escape hatch that stays visible)

A per-occurrence check is suppressed on a line carrying // anchor:allow <invariant-id>: <reason>. The instance is not counted as a violation but is reported in a "Suppressions" section — so the escape hatch is visible and auditable, never silent. (A gate with no escape hatch gets disabled wholesale; one with untracked suppression rots silently — this is the middle path.) Metric/ratchet invariants are instead "suppressed" by deliberately re-recording the baseline.

n := legacyRow[2] // anchor:allow no-positional-rows: legacy CSV import, tracked in #123
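A sketch of how such a comment could be recognised and routed to the "Suppressions" section. The regex is an assumption inferred from the single example line above, and `parseSuppression` / `partitionFindings` are hypothetical helpers, not sprag's parser.

```javascript
// Sketch only: the comment grammar is inferred from the example line, not from a spec.
const ALLOW_RE = /\/\/\s*anchor:allow\s+([\w-]+):\s*(.+)$/;

function parseSuppression(line) {
  const m = ALLOW_RE.exec(line);
  // A suppression without a reason does not match, so it cannot be used silently.
  return m ? { id: m[1], reason: m[2].trim() } : null;
}

// Split raw findings into counted violations and reported (but not counted) suppressions.
function partitionFindings(findings) {
  const violations = [];
  const suppressions = [];
  for (const finding of findings) {
    const s = parseSuppression(finding.line);
    if (s && s.id === finding.invariantId) suppressions.push({ ...finding, reason: s.reason });
    else violations.push(finding);
  }
  return { violations, suppressions };
}

const line = 'n := legacyRow[2] // anchor:allow no-positional-rows: legacy CSV import, tracked in #123';
console.log(parseSuppression(line).id); // no-positional-rows
```

Matching on the invariant id means an allow-comment written for one rule does not hide a different rule's finding on the same line.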

Debt-trend report (the early warning k10s never had)

node arch-trend.mjs <repo> <src-rel> [--last N=20] [--json]
node test-arch-trend.mjs    # proof: surfaces a Model growing 6->10 over commits + flags the max breach

Walks git history, computes each invariant's metric at every commit, prints the trend, and flags where each invariant first breaches its max. This makes accumulating rot visible early — the article's author only discovered it at collapse because velocity hid the trend.
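The "first breach" part of the report is a simple scan over the per-commit metrics. A minimal sketch, assuming the history has already been collected as an oldest-first list; the commit labels and the helper names `firstBreach` and `formatTrend` are made up for illustration.

```javascript
// Illustrative: `history` is an oldest-first list of { commit, value } for one invariant.
function firstBreach(history, max) {
  const hit = history.find((point) => point.value > max);
  return hit ? hit.commit : null;
}

function formatTrend(history) {
  return history.map((point) => point.value).join(' -> ');
}

// Hypothetical data: a Model struct growing from 6 to 10 fields against max 8.
const modelHistory = [
  { commit: 'c1', value: 6 },
  { commit: 'c2', value: 7 },
  { commit: 'c3', value: 8 },
  { commit: 'c4', value: 9 },
  { commit: 'c5', value: 10 },
];
console.log(formatTrend(modelHistory));    // 6 -> 7 -> 8 -> 9 -> 10
console.log(firstBreach(modelHistory, 8)); // c4
```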

Engines, languages & extensibility

Checks are computed by a per-invariant engine (engine + lang fields):

  • ast-grep: real AST via @ast-grep/napi (+ @ast-grep/lang-go, @ast-grep/lang-python) for Go, TypeScript/JS, and Python. Adding a language is a small parser adapter, not new gate logic.
  • heuristic (default): lightweight text/brace parsing of Go-flavored source, no deps (fallback).

Built-in check kinds: struct_field_count, switch_case_count, magic_index_count, forbid_pattern, oversized_files, max_function_lines, max_complexity, module_fanin, scope_diff, forbid_path, time_bomb_tests, require_tests, dependency_count, unlocked_dependencies, secret_scan, config_relaxations (see Beyond structure below). For anything bespoke, ast_grep_rule takes a raw ast-grep rule object (matched on the top-level dir) and ast_grep_tree matches the same rule recursively, per file, over the whole tree — so a project can encode its own architectural rules in JSON, on a nested src/, with no code changes.

One check is behavioral, not structural: golden_outputs runs human-declared commands and diffs their output against committed golden files — catching the "AI refactor silently changed behavior" failure with a model-free oracle (the approved output). It executes the code, so unlike every check above it is opt-in / out-of-band (CI / pre-merge, like arch mutate), not the per-commit hot path, and the commands must be deterministic. Record goldens with ARCH_RECORD_GOLDEN=1; the committed golden diff is the auditable approval. It's the first behavioral rung — deeper behavior (property-based, metamorphic) is future work, and a model-based oracle is deliberately out of scope to keep sprag deterministic and unweakenable.

On line counts vs. complexity

Raw line count (max_function_lines, oversized_files) is a cheap proxy, and the specific number is a convention, not a law — a long-but-flat function is fine; a short, deeply-branched one is not. max_complexity is the less-arbitrary signal: it approximates cyclomatic complexity (1 + decision points + short-circuit &&/||) per function from the same AST parse — flagging branchy functions (the ones that are genuinely hard to follow and test; >~10 is the McCabe/NIST anchor), not merely long ones. Same zero-token, deterministic cost as max_function_lines.
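To make the formula concrete, here is a rough text-based stand-in for it. It is deliberately cruder than the AST-based check described above: it does not strip strings or comments, and the set of keywords treated as decision points is an assumption. `approxComplexity` is a hypothetical name.

```javascript
// Rough, text-based stand-in for the AST metric: 1 + decision points + short-circuit operators.
// Assumption: the keyword set is a guess; strings and comments are not stripped.
function approxComplexity(functionSource) {
  const decisionKeywords = functionSource.match(/\b(if|for|while|case|catch)\b/g) || [];
  const shortCircuits = functionSource.match(/&&|\|\|/g) || [];
  return 1 + decisionKeywords.length + shortCircuits.length;
}

const flat = 'function build() { const a = 1; const b = 2; return a + b; }';
const branchy = 'function classify(x) { if (x > 0 && x < 10) { return 1; } for (;;) { if (x || !x) break; } return 0; }';
console.log(approxComplexity(flat));    // 1
console.log(approxComplexity(branchy)); // 6
```

The two samples are about the same length, which is the point: a line-count rule cannot tell them apart, while the branch count can.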

Recommended pairing (what arch init scaffolds): max_complexity is the PRIMARY function gate (~12), max_function_lines is a coarse BACKSTOP set high (~150). They are different axes and neither subsumes the other: a short, branchy function (a 40-line, complexity-46 classifier) is invisible to any length rule, while a long, flat function (a 200-line data table or sequential builder with few branches) is invisible to complexity. Run both, but set the length bound high so it only catches genuinely-huge-but-flat functions and doesn't fight the complexity gate by flagging clean ~100-line functions. (Length alone is a trap: decomposing by length crushes the worst-complexity tail but tends to redistribute branches into more moderate functions rather than remove them — complexity is what actually tracks test/maintenance cost.)

What keeps any threshold from being tyrannical is the design, not the number: you author the limit for your codebase, the ratchet enforces "never worse" rather than a magic absolute, and a legitimate overrun is recorded with an auditable suppression (// anchor:allow <id>: <reason>) — visible, not silent. The gate surfaces candidates for judgment, not verdicts.

The checks below go beyond size — layering / dependency-direction and test discipline — classes the metric checks are blind to (learned by refactoring real repos where the rot lived there):

  • require_tests { dirs:[...] }: the deterministic shadow of TDD. Flags source modules under dirs with no corresponding test (base-name match, layout-agnostic: foo.ts → foo.test.ts / foo_test.go / test_foo.py). It can't prove test-first, but it enforces TDD's durable outcome, "no untested code ships", as a ratchet (grandfather today's untested, block NEW). Excludes barrel index.* (override via exclude). Suppression-aware.

  • forbid_path { dirs:[...], path:'<regex>' } — flags files under dirs that reference a forbidden path in code (imports / fs reads, not comment citations). Encodes a dependency-direction invariant, e.g. "product (test/, src/) must not read process state (coordination/)". Catches the product-depends-on-process smell.

  • time_bomb_tests { dirs:['test'] } — flags tests that invoke git against a frozen reference (a pinned commit SHA, git diff <ref>..HEAD, --name-only anti-scope diffs, git show <sha> byte-identity). These can only rot — once HEAD moves past the round they fail regardless of product correctness — so the discipline belongs in a round-aware gate (see anti-scope-gate.sh), not the permanent suite. The signal requires both a git invocation and a frozen-ref marker, so a product SHA-256 hash test that never touches git is not falsely flagged.
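The base-name matching behind require_tests can be sketched as follows. The three naming patterns come from the bullet above; the helper names (`baseName`, `testedStem`, `untestedModules`) and the flat file list are assumptions for illustration, and the `dirs` / `exclude` options are left out.

```javascript
// Sketch under assumptions: only the three naming patterns come from the text.
function baseName(path) {
  const file = path.split('/').pop();
  return file.replace(/\.[^.]+$/, ''); // strip the last extension
}

// Returns the module name a test file covers, or null if the file is not a test.
function testedStem(path) {
  const base = baseName(path);
  let m;
  if ((m = /^(.+)\.test$/.exec(base))) return m[1]; // foo.test.ts
  if ((m = /^(.+)_test$/.exec(base))) return m[1];  // foo_test.go
  if ((m = /^test_(.+)$/.exec(base))) return m[1];  // test_foo.py
  return null;
}

function untestedModules(files) {
  const covered = new Set(files.map(testedStem).filter(Boolean));
  return files.filter((f) => testedStem(f) === null
    && baseName(f) !== 'index' // barrel files are excluded by default
    && !covered.has(baseName(f)));
}

console.log(untestedModules([
  'src/foo.ts', 'test/foo.test.ts', 'src/bar.go',
  'src/index.ts', 'pkg/baz.py', 'tests/test_baz.py',
])); // [ 'src/bar.go' ]
```

Because only base names are compared, the test may live in any directory, which is what makes the check layout-agnostic.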

Beyond structure: supply chain, types, secrets — and the gate's own integrity

Architectural rot is one failure mode of AI-built code, not the only one. These deterministic, model-free, offline checks cover the rest — each as a ratchet or an absolute max: 0, all suppression-aware:

  • Supply chain. dependency_count { manifest, include } ratchets the declared dependency surface (npm / go.mod / requirements.txt) so it can't grow without a deliberate re-baseline. unlocked_dependencies { manifest, lockfile, allow } flags a dep declared but absent from the lockfile — the offline fingerprint of a hallucinated / slopsquatted package, caught with no registry call.
  • Type integrity (ast_grep_tree, recursive). no-new-any (any type), no-non-null-assertion (the x! operator), no-ts-ignore (@ts-ignore / @ts-nocheck directives — not @ts-expect-error, which self-removes). The #1 ways AI silences the type system instead of fixing it.
  • Secrets. secret_scan { dirs } flags an inlined credential (provider key shapes + private-key blocks + a guarded generic secret="…" rule that excludes env refs / placeholders / low-entropy). Tracked files only — a gitignored .env is correctly invisible. High-precision by design (a max:0 gate can't cry wolf), so it's the always-on floor, not a replacement for a dedicated scanner.
  • The gate's own integrity. Two failure modes a gate has that nothing else watches:
    • Fail closed. If the ast-grep engine can't load (not installed, wrong-platform binary, ABI mismatch) the gate errors (exit 2) instead of silently scoring 0 and passing everything — a no-op gate is the worst possible failure for a gate.
    • No silent relaxation (config_relaxations { invariants, baseline, against, from }, the meta-ratchet). The config + baseline may only move forward (stricter) vs a git ref. It blocks every way an agent can make a violation vanish without fixing the code: a raised max, dropped rule, downgraded severity, raised baseline, a removed baseline floor for a still-active rule, or a grown exemption list (adding the hallucinated dep to allow, the new dir to allowed, the untested module to exclude). Because the rule lives in the set it guards, deleting it counts too. In a pre-commit hook, from: "index" checks what's actually being committed (the staged config) — closing the stage-a-relaxation-then-revert-the-working-file trick. A deliberate, reviewed loosening goes through ARCH_ALLOW_RELAX=1 — still printed, never silent. Together with fail-closed these stop the gate becoming a no-op either by accident (dead engine) or on purpose (relaxed config).
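The meta-ratchet's core comparison can be sketched as a diff between the config at the git ref and the config being committed. The config shape below (`rules`, `baseline`, `max`, `severity`) is an assumption, not the real invariants.json schema, and the sketch skips exemption-list growth and reading from the index; `findRelaxations` and `metaRatchet` are hypothetical names.

```javascript
// Hypothetical shape: { rules: { [id]: { max, severity } }, baseline: { [id]: number } }.
const SEVERITY_RANK = { warn: 0, block: 1 };

function findRelaxations(previous, next) {
  const found = [];
  for (const [id, oldRule] of Object.entries(previous.rules)) {
    const newRule = next.rules[id];
    if (!newRule) {
      found.push(`${id}: rule deleted`);
      continue;
    }
    if (oldRule.max !== undefined && (newRule.max === undefined || newRule.max > oldRule.max)) {
      found.push(`${id}: max loosened ${oldRule.max} -> ${newRule.max}`);
    }
    if (SEVERITY_RANK[newRule.severity] < SEVERITY_RANK[oldRule.severity]) {
      found.push(`${id}: severity ${oldRule.severity} -> ${newRule.severity}`);
    }
    const oldFloor = previous.baseline[id];
    const newFloor = next.baseline[id];
    if (oldFloor !== undefined && (newFloor === undefined || newFloor > oldFloor)) {
      found.push(`${id}: baseline ${oldFloor} -> ${newFloor}`);
    }
  }
  return found;
}

// Relaxations are always reported; ARCH_ALLOW_RELAX=1 only decides whether they block.
function metaRatchet(previous, next, env = {}) {
  const relaxations = findRelaxations(previous, next);
  return { relaxations, blocked: relaxations.length > 0 && env.ARCH_ALLOW_RELAX !== '1' };
}

const before = { rules: { 'max-function-lines': { max: 20, severity: 'block' } }, baseline: { 'max-function-lines': 0 } };
const after = { rules: { 'max-function-lines': { max: 50, severity: 'warn' } }, baseline: { 'max-function-lines': 1 } };
console.log(metaRatchet(before, after).blocked);                            // true
console.log(metaRatchet(before, after, { ARCH_ALLOW_RELAX: '1' }).blocked); // false
```

The loop iterates over the previous rules rather than the new ones, which is why deleting a rule is itself detected as a relaxation.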

sprag enforces all of the above on itself (invariants.harness.json, run over the whole repo by the dogfood test on every npm test).

Starter tenet library

library/tenets.json ships the k10s 5 tenets (T1–T5), 2 layering/test-rot invariants (L1–L2), the complexity gate (Q1), the supply-chain pair (S1–S2), type-strictness (TS1–TS3), secrets (SEC1) and the meta-ratchet (M1) as ready-to-enable invariants. Copy the ones you want into your invariants.json and tune. See library/README.md.

Working with the gate (and the behavioral half)

sprag enforces the deterministic half of quality. The behavioral half — the disciplines that fight a strong model's defaults (cold-eye review, spec-first contract, …) — lives in its companion Anchor, whose DISCIPLINES.md is the pairing for this gate. sprag deliberately carries no behavioral-methodology doc of its own — that would just be a second copy that drifts.

What sprag does ship is library/working-with-the-gate.md: the gate-coupled habits — author the invariants first, write the test with the code (require_tests enforces it; arch property validates a behavioral one), and run the gate before done (don't --no-verify; loosen only on the record). arch init drops it as arch-gate-usage.md; reference it from your CLAUDE.md (@arch-gate-usage.md).

Test efficacy, not test count (arch mutate)

Requiring tests can devolve into test theater: more tests that don't catch more bugs (the classic "2× tests, no better results"). The count-independent answer is mutation testing: flip an operator (&& → ||, >= → >, true → false…), re-run your suite, and see if a test fails. A mutant that survives is a bug your tests can't catch, a real gap that line-count and even line-coverage are blind to.

arch mutate <dir> --test "npm test" --since main      # mutate only files changed vs main (incremental)
arch mutate <dir> --test "node --test test/*.test.mjs" --all --threshold 70   # full baseline run
arch mutate . --all --test "npm test" --exclude 'corpus/**,test-*.mjs'   # skip fixtures + non-.test. suites

It mutates changed source files only by default (git diff), runs your test command per mutant, and gates on the kill rate. It is deterministic — zero model tokens — but heavy (mutants × suite runtime, not offset by having fewer tests). So it's opt-in and out-of-band: run it in CI / nightly / pre-merge, not as the per-commit gate. The cheap AST checks (complexity, require_tests, god-files) stay on the hot path; mutate is the periodic audit that the tests you do have are worth keeping.
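A toy, in-process version of the idea shows why kill rate measures something test count does not. This is a sketch under heavy assumptions: the code under test is a one-line arrow function held as a string, the "suite" is a plain predicate, and the mutation is naive text replacement. The real command re-runs your test command per mutant instead.

```javascript
// Toy mutation run. Assumptions: text-level replacement, and a suite modelled as a predicate.
const MUTATIONS = [['>=', '>'], ['&&', '||'], ['true', 'false']];

// One mutant per occurrence of each operator.
function mutants(source) {
  const out = [];
  for (const [from, to] of MUTATIONS) {
    let at = source.indexOf(from);
    while (at !== -1) {
      out.push(source.slice(0, at) + to + source.slice(at + from.length));
      at = source.indexOf(from, at + from.length);
    }
  }
  return out;
}

function killRate(source, suitePasses) {
  const all = mutants(source);
  const killed = all.filter((mutant) => {
    try {
      return !suitePasses(new Function(`return (${mutant});`)());
    } catch {
      return true; // a mutant that crashes counts as killed
    }
  });
  return { total: all.length, killed: killed.length, rate: all.length ? killed.length / all.length : 1 };
}

const source = '(age, hasId) => age >= 18 && hasId';
const weakSuite = (isAdult) => isAdult(30, true) === true && isAdult(5, false) === false;
const strongSuite = (isAdult) => weakSuite(isAdult)
  && isAdult(18, true) === true    // boundary case kills the >= mutant
  && isAdult(30, false) === false; // mixed case kills the && mutant

console.log(killRate(source, weakSuite).rate);   // 0
console.log(killRate(source, strongSuite).rate); // 1
```

Both suites pass on the original code; only the kill rate reveals that the weak one would not notice either bug.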

Test files are auto-skipped when they use the .test./.spec. convention; use --exclude <globs> for anything the heuristic misses — in-repo fixtures (deliberately-broken code used as test inputs) and tests named another way (test-*.mjs, etc.). Globs are matched against the repo-relative path (* = within a segment, ** = across). Mutating a fixture or a test file measures nothing, so excluding them keeps the score honest.
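The stated glob semantics (* within a segment, ** across segments) can be sketched as a small glob-to-regex translation. This is not sprag's matcher: `globToRegExp` and `isExcluded` are hypothetical, and anchoring the pattern to the whole repo-relative path is an assumption.

```javascript
// Sketch of the stated semantics only; anchoring to the full path is an assumption.
function globToRegExp(glob) {
  let pattern = '';
  for (let i = 0; i < glob.length; i += 1) {
    const ch = glob[i];
    if (ch === '*' && glob[i + 1] === '*') {
      pattern += '.*';    // ** crosses path segments
      i += 1;
    } else if (ch === '*') {
      pattern += '[^/]*'; // * stays within one segment
    } else {
      pattern += ch.replace(/[.+?^${}()|[\]\\]/g, '\\$&'); // escape regex metacharacters
    }
  }
  return new RegExp(`^${pattern}$`);
}

function isExcluded(path, excludeList) {
  return excludeList.split(',').some((glob) => globToRegExp(glob.trim()).test(path));
}

const exclude = 'corpus/**,test-*.mjs';
console.log(isExcluded('corpus/broken/sample.go', exclude)); // true
console.log(isExcluded('test-arch-gate.mjs', exclude));      // true
console.log(isExcluded('src/gate.mjs', exclude));            // false
```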

Wired into CI (.github/workflows/mutate.yml): every PR runs an incremental mutate over just the source it changed (gating — new code must ship tests that kill its mutants), and a weekly schedule runs the full baseline as a report. This repo dogfoods it.

Rightsizing tests: don't gate on count or a coverage-% target (both reward theater). The amount of testing a function needs is bounded by its cyclomatic complexity (max_complexity caps it → caps the tests needed); require_tests ensures presence; arch mutate confirms the tests that exist actually catch bugs. More tests is never the goal — bug-catching tests are.

Honest scope

  • Mechanical + deterministic (no model) → no "who-verifies-the-verifier" problem; invariants are human-authored (the article's Tenet 1).
  • Real AST on Go, TypeScript/JS, and Python via ast-grep (@ast-grep/napi + @ast-grep/lang-go + @ast-grep/lang-python); the heuristic Go engine remains a no-dep fallback. More languages = a parser adapter, not new gate logic.
  • Generic, no-tuning checks (work on any repo): oversized_files (God file), max_function_lines (God function), module_fanin (a module imported by too many files — the k10s "everything depends on the God object" coupling smell).
  • Remaining (design §12): more generic metrics; richer scope_diff; broader real-repo trials.

Tests

npm test → 14 self-contained suites, covering: gate+ratchet+scope, pre-commit hook, AI-loop (converge/escalate), debt-trend, the generic God-file/God-function/fan-in checks, the custom ast_grep_rule DSL, init scaffolding, real-AST on TypeScript / Go (incl. goroutine-mutation) / Python, scope-dirs, and a dogfood suite that runs the gate on its own source (the tool has no God files/functions/hubs by its own checks).

About

Enforce human-authored architectural invariants as a ratcheted gate, so AI-built codebases don't silently rot. Real-AST (Go/TS/JS/Python) + heuristic; pre-commit + AI-loop + debt-trend; god-object/coupling + layering + no-rotting-tests checks.
