release(v1.1.0): fitness functions + intermittent-test integrity by jed72 · Pull Request #10 · jed72/compass

jed72 · 2026-05-26T06:16:28Z

Summary

Cuts Compass 1.1.0: a semver minor release (additive mechanism only; backward compatibility preserved).

Lands five candidate suggestions:

- A1 Intermittent-test integrity: new strategy S5 "Intermittency is failure"; new G4 check no-trusted-rerun; new governance/quarantine.yml; cmd_tdd_green now records attempts + rerun_without_change (source-tree SHA-256 detection)
- A2 Architectural fitness functions: new G4 check command-passes; new gate verify.fitness (route-promoted via RG-FLOOR-006/RG-FLOOR-007); ADR-009 "Architectural fitness functions are project guardrails, not framework guardrails"
- B1 TDD-as-design rebalance: S2 amended to name both governance and design-feedback roles; tdd-discipline skill gains a "Listen to your tests" section + "test behaviour, not implementation" anti-pattern; signals.yml gains a design_smell advisory category; coverage-as-floor caveat in evidence-gates skill
- C1 Example-first refinement in bdd-specification skill (user-stories refusal kept per ADR-004)
- C2 Commit-vs-acceptance vocabulary in evidence-gates skill (Release/Production stages stay out of scope per safety-contract guarantee 6)

Surface added stays within the USP-5 legibility budget: 1 strategy, 1 gate, 2 checks, 1 evidence-record field pair, 1 signal category, 1 ADR. No sixth guardrail, no fifth routing dimension — ADR-002 honoured.

Version bumped on all five published surfaces (five version fields across four files): VERSION, .claude-plugin/plugin.json, .claude-plugin/marketplace.json ($.metadata.version and $.plugins[0].version), and cli/compass COMPASS_VERSION.
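
Because the five surfaces can drift independently, a consistency check is the natural guard. A minimal sketch, assuming plugin.json stores its version under a top-level "version" key and cli/compass declares `COMPASS_VERSION = "x.y.z"` on its own line (neither layout is confirmed by this PR); `read_version_surfaces` and `mismatched_surfaces` are hypothetical helpers, not part of Compass:

```python
import json
import re
from pathlib import Path


def read_version_surfaces(root: Path) -> dict[str, str]:
    """Collect the version string from each published surface under `root`.

    Assumed layouts (hypothetical): plugin.json has a top-level "version"
    key, and cli/compass contains a line like COMPASS_VERSION = "x.y.z".
    """
    plugin = json.loads((root / ".claude-plugin" / "plugin.json").read_text())
    market = json.loads((root / ".claude-plugin" / "marketplace.json").read_text())
    cli_src = (root / "cli" / "compass").read_text()
    match = re.search(r'^COMPASS_VERSION\s*=\s*["\']([^"\']+)["\']', cli_src, re.MULTILINE)
    return {
        "VERSION": (root / "VERSION").read_text().strip(),
        "plugin.json $.version": plugin["version"],
        "marketplace.json $.metadata.version": market["metadata"]["version"],
        "marketplace.json $.plugins[0].version": market["plugins"][0]["version"],
        "cli/compass COMPASS_VERSION": match.group(1) if match else "<missing>",
    }


def mismatched_surfaces(root: Path, expected: str) -> list[str]:
    """Names of the surfaces whose version differs from `expected`."""
    return [name for name, value in read_version_surfaces(root).items() if value != expected]
```

A parametrised test can then assert `mismatched_surfaces(repo_root, "1.1.0") == []`.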

Test plan

- CI self-check passes
- Full pytest suite green: 372 passed, 2 skipped locally
- compass check --task farley-guidebook passes: all 13 checks, including the new no-trusted-rerun and command-passes checks plus dod-evidence-typed
- compass check --task version-bump-1-1-0 passes: all 13 checks
- compass policy lint passes: governance YAML structurally valid
- After merge: tag v1.1.0 pushed; GitHub release created with dist/compass-1.1.0.tar.gz (6.7 MB, 262 files)

🤖 Generated with Claude Code

Delivers TRC-C1..C10, TRC-F3, TRC-F6 (12 scenarios) for task comparison-requirements:

- cli/compass: new `next` subcommand (+160 lines) that reads task.yml + route.md and prints a one-line phase/gate status; read-only; <200ms p95.
- CLAUDE.md: additive paragraph under "Never skip Frame": trigger Frame on intent, not just the literal command. Includes the no-re-frame clause (TRC-F3) so already-framed tasks proceed without re-Frame.
- agents/{spec-author,planner,builder,orchestrator}.md: one-sentence intent-trigger reinforcement on each agent's description.

26 tests added (tests/next/, tests/invisible/), all green. No new hooks and no new code surface for triggering; Q4's resolution honoured.

Co-Authored-By: Claude (compass:builder) <noreply@anthropic.com>
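
`compass next` is described only as reading task.yml and route.md and printing a one-line phase/gate status. The sketch below illustrates the idea under assumptions: `TaskState` and its fields are hypothetical stand-ins for whatever the parsed files contain, and the output wording is invented. Keeping the renderer a pure function of already-parsed state is one way to stay read-only and cheap enough for a <200ms p95 budget.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskState:
    """Hypothetical, already-parsed view of task.yml plus the route's gates."""
    task_id: str
    phase: str
    route_gates: tuple[str, ...]   # gates the route requires, in route order
    closed_gates: frozenset[str]   # gates whose evidence is already recorded


def next_status_line(state: TaskState) -> str:
    """Render a one-line phase/gate summary; performs no file I/O."""
    closed = [g for g in state.route_gates if g in state.closed_gates]
    pending = [g for g in state.route_gates if g not in state.closed_gates]
    head = f"{state.task_id}: phase={state.phase}, gates {len(closed)}/{len(state.route_gates)} closed"
    return f"{head}, next gate: {pending[0]}" if pending else f"{head}, ready to Land"
```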

Implements `compass analyze` (cross-artifact coherence check), the `add_gate` floor property in routing-policy (RG-FLOOR-004/005), the `coherence-check` evidence type + `coherence-check-passes` CHECK_FN under G4, and the full 16-scenario test suite for stream A. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Implements all 13 stream-B scenarios (TRC-B1 through TRC-B11 and TRC-F2):

- derive_system_spec() in cli/compass: walks .compass/work/*/task.yml for status==landed, applies supersession (same intent_id → latest-landed wins, older → archive appendix), writes docs/system-spec.md with a DERIVED FILE header; idempotent and reconstructible (ADR-008).
- compass _derive-system-spec --internal: private subparser hidden from compass --help via _choices_actions filter + metavar override (DD-4); the --internal flag is mandatory to prevent accidental shell invocation.
- scripts/integrate.sh: post-combined-regression block writes status: landed + land_timestamp to task.yml via a Python heredoc, then invokes the private subparser (DD-4, TRC-B11).
- schemas/task.schema.json: added optional status (active|landed) and land_timestamp fields; schema_version 1.0 files still valid (Inv-8, DD-3).
- templates/task.yml: bumped schema_version to "1.1", added status: active.
- ADR-008: records the cross-task derived artifacts decision; added to the architecture/decisions/README.md index.
- tests/test_self_architecture.py: repaired test_adrs_cover_p1_to_p8 to accept >= 6 ADRs (founding six remain required; expansion ADRs allowed).

35 tests in tests/derive/ all green; compass policy lint PASS.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
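
The supersession rule in derive_system_spec() (same intent_id: latest-landed wins, older tasks go to the archive appendix) can be sketched as a pure function. `LandedTask` and its fields are hypothetical, and comparing land_timestamp as a string assumes every timestamp uses the same ISO-8601 UTC format. Deterministic sorting is one way to get the idempotence the commit claims: the same inputs always yield the same split.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LandedTask:
    """Hypothetical subset of a task.yml."""
    task_id: str
    intent_id: str
    land_timestamp: str  # ISO-8601 UTC, e.g. "2026-05-26T06:16:28Z"
    status: str = "landed"


def apply_supersession(tasks: list[LandedTask]) -> tuple[list[LandedTask], list[LandedTask]]:
    """Split tasks into (current, archived).

    Only status == "landed" tasks take part. Per intent_id the most recently
    landed task is current; earlier ones go to the archive appendix.
    """
    landed = sorted((t for t in tasks if t.status == "landed"),
                    key=lambda t: (t.intent_id, t.land_timestamp, t.task_id))
    latest: dict[str, LandedTask] = {}
    for task in landed:
        latest[task.intent_id] = task  # later timestamps overwrite earlier ones
    current = sorted(latest.values(), key=lambda t: t.intent_id)
    archived = [t for t in landed if latest[t.intent_id] is not t]
    return current, archived
```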

…task derived artifacts) + comparison-requirements analysis

Planning artifacts for the `comparison-requirements` task, captured before Build started:

- architecture/decisions/ADR-007: gates may be conditionally promoted from advisory to blocking via routing-policy floors; advisory gates write evidence but do not block Land
- architecture/decisions/ADR-008: cross-task derived artifacts are generated from landed task scenarios at Land time; the derivation is reconstructible, idempotent, and never a source of truth
- docs/analysis/comparison-requirements.md: business-analysis pass over docs/proposals/comparison.md; converts the three shortlist candidates (compass analyze, living system spec, compass next + invisible triggering) into buildable BRs, scenarios, and NFRs

Both ADRs are in 'proposed' status: they document architecturally novel patterns (the third gate-lifecycle class, for verify.analyze; the framework's first cross-task derived artifact). They are to be promoted to 'accepted' once the build streams ship and the patterns prove out in practice.
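
ADR-007's conditional promotion (advisory gates write evidence but do not block Land unless a routing-policy floor promotes them) reduces to a small rule evaluation. This sketch assumes floors are predicates over a task's routing attributes; the attribute names `scope`, `criticality` and `reversibility` are hypothetical, and the two floors encode the RG-FLOOR-006 and RG-FLOOR-007 conditions described elsewhere in this PR, not the actual routing-policy.yml syntax.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Floor:
    rule_id: str
    gate: str
    applies: Callable[[dict], bool]  # predicate over the task's routing attributes


# Hypothetical encodings; the real routing-policy.yml fields may differ.
FLOORS = [
    Floor("RG-FLOOR-006", "verify.fitness",
          lambda r: r.get("scope") == "cross-cutting" or r.get("criticality") == "critical"),
    Floor("RG-FLOOR-007", "verify.fitness",
          lambda r: r.get("reversibility") == "irreversible"),
]


def gate_modes(route: dict, advisory: set[str], floors: list[Floor] = FLOORS) -> dict[str, str]:
    """Advisory gates stay advisory unless a matching floor promotes them to blocking."""
    promoted = {f.gate for f in floors if f.applies(route)}
    return {g: "blocking" if g in promoted else "advisory" for g in sorted(advisory)}
```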

# Conflicts: # architecture/decisions/ADR-008-cross-task-derived-artifacts.md

# Conflicts: # cli/compass

…ext.md headings Stream-B's test rewrite at Build introduced "## Boundaries" + "## Principles" expectations that don't match the file (which has "## Boundary conditions" and no Principles section). Aligned at Land integration of comparison-requirements (TRC-D5 honoured).

… ADR index

Closes the comparison-requirements task with everything green:

- tests/cross_cutting/test_stream_d_invariants.py: 10 cross-cutting invariant tests (TRC-D1..D10) asserting on the integrated state of streams A/B/C:
  - D1: five-point mental model unchanged
  - D2: only `analyze` and `next` added publicly (`_derive-system-spec` private)
  - D3: no tier ladder
  - D4: no new agent personas, role enum unchanged
  - D5: pipeline phases still flex by route
  - D6: immovable gates remain immovable
  - D7: TDD remains a strategy that Spike suspends
  - D8: bare-repo zero-setup preserved
  - D9: route evaluate deterministic
  - D10: no LLM SDK in cli/compass
- docs/system-spec.md: first run of the new living system spec derivation (ADR-008). Includes the 51 scenarios from this task plus a DERIVED FILE header.
- architecture/decisions/README.md: added ADR-007 to the index alongside ADR-008 (Stream-B added ADR-008 via its branch merge).

Combined regression: 260 passed, 2 skipped. `compass check`: PASS (all 11 checks pass). All 7 gates closed (verify.correctness/governance/traceability/regression/security/clarity/claims). BF-1 paid (latency measurements confirm provisional targets). task.yml.status: landed.
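
Invariant D2 (and DD-4 in stream B) depends on hiding `_derive-system-spec` from `compass --help` while keeping it callable. A minimal argparse sketch of the "_choices_actions filter + metavar override" approach, assuming cli/compass uses argparse subparsers. `_choices_actions` is a private, undocumented argparse attribute, so this can break across Python versions; `build_parser` and the help strings are illustrative.

```python
import argparse

PRIVATE_COMMAND = "_derive-system-spec"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="compass")
    # metavar override: the usage line lists only the public commands.
    sub = parser.add_subparsers(dest="command", metavar="{analyze,next}")
    sub.add_parser("analyze", help="cross-artifact coherence check")
    sub.add_parser("next", help="one-line phase/gate status")
    private = sub.add_parser(PRIVATE_COMMAND, help="internal use only")
    # Mandatory flag so a bare shell invocation fails instead of writing files.
    private.add_argument("--internal", action="store_true", required=True)
    # _choices_actions filter: drop the private command's row from --help.
    sub._choices_actions = [a for a in sub._choices_actions if a.dest != PRIVATE_COMMAND]
    return parser
```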

… failure.' Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…k roles Amend S2 in governance/strategies.md to declare both purposes of TDD: governance role (satisfies G1) and design-feedback loop. Quotes Farley: "TDD is less about testing and more about good design." Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…n evidence Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…i-pattern Add "Listen to your tests" section to tdd-discipline explaining that a hard-to-write test is a design smell — change the design, not the test. Add "Test behaviour, not implementation" anti-pattern with the "swap the implementation — does the test survive?" check. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
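
The "swap the implementation: does the test survive?" check can be shown with two interchangeable classes. The basket domain, class names and check functions are illustrative only and do not come from the tdd-discipline skill: one check asserts on observable behaviour, the other peeks at private state.

```python
class Basket:
    """List-backed implementation."""

    def __init__(self) -> None:
        self._items: list[tuple[str, int]] = []

    def add(self, sku: str, pence: int) -> None:
        self._items.append((sku, pence))

    def total(self) -> int:
        return sum(p for _, p in self._items)


class DictBasket:
    """Same behaviour, different internals: a dict keyed by SKU."""

    def __init__(self) -> None:
        self._by_sku: dict[str, int] = {}

    def add(self, sku: str, pence: int) -> None:
        self._by_sku[sku] = self._by_sku.get(sku, 0) + pence

    def total(self) -> int:
        return sum(self._by_sku.values())


def behaviour_check(basket_cls) -> bool:
    """Asserts only on observable behaviour, so it survives the swap."""
    b = basket_cls()
    b.add("apple", 30)
    b.add("apple", 30)
    b.add("pear", 45)
    return b.total() == 105


def implementation_check(basket_cls) -> bool:
    """The anti-pattern: inspects private state, so it breaks on the swap."""
    b = basket_cls()
    b.add("apple", 30)
    return getattr(b, "_items", None) == [("apple", 30)]
```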

… pass Add _source_tree_hash, _load_tdd_state, _save_tdd_state helpers and DD-7 rerun detection to cmd_tdd_green. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
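
A sketch of the idea behind _source_tree_hash and the DD-7 rerun detection, not the Compass implementation: fold every file's relative path and content digest into one SHA-256, then compare with the digest stored from the previous tdd-green run. The excluded directories, the state-dict keys and `record_green_attempt` are assumptions; excluding .compass keeps the stored state itself from changing the hash. Following BF-1, rerun_without_change is emitted whenever attempts > 1.

```python
import hashlib
from pathlib import Path

# Top-level directories assumed not to count as "source" (hypothetical choice).
EXCLUDED_TOP_LEVEL = (".git", ".compass")


def source_tree_hash(root: Path) -> str:
    """One SHA-256 over each file's relative path and content digest.

    Files are visited in sorted path order so the result is deterministic;
    any edit, addition, deletion or rename changes the hash.
    """
    files = [p for p in root.rglob("*") if p.is_file()]
    digest = hashlib.sha256()
    for path in sorted(files, key=lambda p: p.relative_to(root).as_posix()):
        rel = path.relative_to(root)
        if rel.parts[0] in EXCLUDED_TOP_LEVEL:
            continue
        digest.update(rel.as_posix().encode("utf-8") + b"\0")
        digest.update(hashlib.sha256(path.read_bytes()).digest())
    return digest.hexdigest()


def record_green_attempt(state: dict, tree_hash: str) -> dict:
    """Update per-scenario state in place and return the evidence fields.

    `state` stands in for whatever _load_tdd_state/_save_tdd_state persist.
    """
    attempts = state.get("attempts", 0) + 1
    previous_hash = state.get("last_tree_hash")
    state["attempts"] = attempts
    state["last_tree_hash"] = tree_hash
    fields = {"attempts": attempts}
    if attempts > 1:
        # True means the source tree did not change since the previous run.
        fields["rerun_without_change"] = previous_hash == tree_hash
    return fields
```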

…1 implementation) TRC-A2 test was already passing because DD-7 rerun detection added in TRC-A1 handles both scenarios. Red step showed test passing (no separate red recorded for TRC-A2). Evidence: evidence/green-TRC-A2.json. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ls.yml Add design_smell top-level key to governance/signals.yml with three default patterns: tests dominated by mocking/setup, assertions on internal method calls, single test asserts more than one behaviour. Update schemas/signals.schema.json to accept the new optional property. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…aveat Add inline comment to guardrails.yml Q1 coverage-floor example and add a "Coverage as evidence" section to evidence-gates/SKILL.md explaining that coverage is a floor, never a target — a side effect of test discipline. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…registry

- governance/quarantine.yml: new file, shipped empty (DD-1)
- governance/guardrails.yml: no-trusted-rerun check added to checks: and G4.checks; evidence_types.test-run.description updated per DD-9
- cli/compass: _check_no_trusted_rerun + _load_quarantine_registry + quarantine policy lint (_lint_errors_quarantine) added to CHECK_FNS and cmd_policy_lint

Also covers TRC-A4, TRC-A5, TRC-FM3 (all exercised by TestNoTrustedRerunCheck).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
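
One plausible reading of the no-trusted-rerun check, sketched under assumptions: a test-run record whose green only appeared after rerunning unchanged code (rerun_without_change: true) is not trusted evidence, and attempts must be at least 1, as TRC-FM2's task lint requires. How the quarantine registry interacts with the check is not described here; exempting quarantined scenarios is a hypothetical policy choice, as are the `RunEvidence` fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunEvidence:
    """Hypothetical subset of a test-run evidence record."""
    scenario: str
    attempts: int
    rerun_without_change: bool = False


def check_no_trusted_rerun(records: list[RunEvidence], quarantined: set[str]) -> list[str]:
    """Return failure messages; an empty list means the check passes."""
    failures = []
    for record in records:
        if record.attempts <= 0:
            failures.append(f"{record.scenario}: attempts must be >= 1, got {record.attempts}")
        elif record.rerun_without_change and record.scenario not in quarantined:
            # A green obtained by rerunning unchanged code proves nothing new.
            failures.append(f"{record.scenario}: passed only after a rerun with no source change")
    return failures
```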

…r gates for B1 Add negative assertion tests verifying stream-C (B1) adds no new checks or gates to guardrails.yml. Add provenance comment to guardrails.yml confirming the constraint. B1 is judgement-side only — no new mechanism. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
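
A negative assertion like this can be written as a set difference over check and gate names. The sketch assumes guardrails.yml parses into a dict with top-level `checks` and `gates` mappings; the `gates` key and both helpers are hypothetical.

```python
def added_mechanism(before: dict, after: dict) -> dict[str, set[str]]:
    """Names of checks and gates present in `after` but not in `before`."""
    return {
        key: set(after.get(key, {})) - set(before.get(key, {}))
        for key in ("checks", "gates")
    }


def assert_adds_no_mechanism(before: dict, after: dict) -> None:
    """Negative assertion: a judgement-side change must not add checks or gates."""
    added = added_mechanism(before, after)
    assert added == {"checks": set(), "gates": set()}, f"unexpected new mechanism: {added}"
```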

…d-specification Add "Example-first refinement chain" section to bdd-specification skill: vague idea → concrete examples → acceptance criteria → executable spec. Includes ubiquitous language discipline and "should" naming prefix. Explicitly refuses user stories as a per-role spec format (ADR-004). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
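
A small illustration of the refinement chain with a made-up domain: the vague idea "discounts for bulk orders" becomes a table of concrete examples, the acceptance criterion "orders of 10 or more units get 10% off" follows from them, and the examples become the executable spec. The `test_should_` name is an assumption about how the "should" prefix might combine with pytest's `test_` collection rule.

```python
# Vague idea: "discounts for bulk orders" (hypothetical domain).
# Concrete examples agreed before any code was written:
#   (quantity, unit_price_pence, expected_total_pence)
BULK_DISCOUNT_EXAMPLES = [
    (1, 500, 500),
    (9, 500, 4500),
    (10, 500, 4500),   # boundary: the discount starts at 10 units
    (20, 500, 9000),
]


def order_total(quantity: int, unit_price_pence: int) -> int:
    """Acceptance criterion: orders of 10 or more units get 10% off."""
    subtotal = quantity * unit_price_pence
    return subtotal * 9 // 10 if quantity >= 10 else subtotal


def test_should_apply_ten_percent_discount_from_ten_units() -> None:
    # Executable spec: every agreed example must hold.
    for quantity, price, expected in BULK_DISCOUNT_EXAMPLES:
        assert order_total(quantity, price) == expected, (quantity, price)
```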

Stream A, A1: Intermittent-test integrity

- Scenarios: TRC-A1, TRC-A2, TRC-A3, TRC-A4, TRC-A5, TRC-A6, TRC-A7, TRC-FM2, TRC-FM3
- Tests added: 21 (in tests/test_intermittent_tests.py)
- Files changed: cli/compass, governance/guardrails.yml, governance/quarantine.yml (new), governance/strategies.md, tests/test_intermittent_tests.py
- Suite: 291 passed, 2 skipped; policy lint: PASS; task lint: PASS

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…vidence-gates

Add a "Pipeline stage vocabulary" section to evidence-gates/SKILL.md:

- verify.correctness as the acceptance/releasability gate ("anything that defines releasable")
- tdd-red/tdd-green loop as the commit stage ("anything that can fail fast")
- explicitly state that the Release and Production stages are out of scope (safety-contract guarantee 6)
- cite G4 as the standing falsification principle

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…s functions (TRC-B1..B8, TRC-FM1)

All 9 stream-B scenarios implemented and green (298 tests pass, 2 pre-existing skips). Changes:

- cli/compass: add _check_command_passes (command-passes CHECK_FN); policy lint validates command-passes params on project guardrails only (G4 exempt)
- governance/guardrails.yml: register the command-passes check; add it to G4.checks; add verify.fitness gate evidence requirements
- governance/routing-policy.yml: add RG-FLOOR-006 (cross-cutting/critical) and RG-FLOOR-007 (irreversible domains), promoting verify.fitness from advisory to blocking
- architecture/decisions/ADR-009-...: new ADR, status proposed: fitness functions are project guardrails, not framework guardrails
- architecture/decisions/README.md, architecture/system-context.md, skills/evidence-gates/SKILL.md: reference ADR-009 and the verify.fitness pattern
- tests/test_fitness_functions.py: 12 tests (TRC-B1, B2, B3, B6, B7, FM1)
- tests/test_verify_fitness_route_promotion.py: 16 tests (TRC-B4, B5, B8)
- tests/fixtures/route-baseline.yml: expedition gains verify.fitness (RG-FLOOR-006 fires)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
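
A minimal sketch of what a command-passes check might do: run a project-declared command and pass only on exit status 0, treating a missing executable or a timeout as failure. The `command` parameter name, the argv-list shape and the 300-second default timeout are assumptions; taking an argv list rather than a shell string avoids `shell=True` and its quoting pitfalls.

```python
import subprocess


def check_command_passes(argv: list[str], timeout_s: float = 300.0) -> tuple[bool, str]:
    """Run a project-declared fitness command; pass only on exit status 0."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout_s)
    except FileNotFoundError:
        return False, f"command not found: {argv[0]}"
    except subprocess.TimeoutExpired:
        return False, f"timed out after {timeout_s}s"
    return proc.returncode == 0, (proc.stdout + proc.stderr).strip()


def lint_command_params(params: dict) -> list[str]:
    """Policy-lint sketch: `command` must be a non-empty list of strings."""
    command = params.get("command")
    if not isinstance(command, list) or not command or not all(isinstance(a, str) for a in command):
        return ["command-passes: 'command' must be a non-empty list of strings"]
    return []
```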

# Conflicts: # cli/compass # governance/guardrails.yml

# Conflicts: # skills/evidence-gates/SKILL.md

…ation'

…ks PASS

Brings the Farley proposal's full shortlist into main:

- A1 intermittent-test integrity (S5 strategy + no-trusted-rerun check on G4 + governance/quarantine.yml + tdd-green attempts/rerun_without_change)
- A2 architectural fitness functions (command-passes check + verify.fitness gate + RG-FLOOR-006/007 + ADR-009)
- B1 TDD-as-design rebalance (S2 dual-role + tdd-discipline Listen-to-tests + design_smell advisory signal + coverage-as-floor caveat)
- C1 example-first refinement in bdd-specification skill
- C2 commit-vs-acceptance vocabulary in evidence-gates skill

Surface added stays within the USP-5 budget: 1 strategy, 1 gate, 2 checks, 1 evidence field, 1 signal category, 1 ADR. No sixth guardrail, no fifth routing dimension (ADR-002 honoured).

Integration: A → B → C → D merged; combined regression 371 passed / 2 skipped; final compass check PASS: all 13 checks (including the new no-trusted-rerun and command-passes checks that this task ships).

Backfills paid:

- BF-1: cmd_tdd_green now always records rerun_without_change when attempts > 1 (true/false per source hash); stream-A TRC-A1 evidence regenerated.
- BF-INTEGRATION: stream-D's 8 cross-cutting invariant tests added.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Lands the Farley guidebook shortlist (farley-guidebook task); additive mechanism only:

- new strategy S5: Intermittency is failure
- new gate verify.fitness (route-promoted via RG-FLOOR-006/007)
- new G4 checks no-trusted-rerun and command-passes
- new ADR-009: fitness functions as project guardrails
- new evidence-record fields attempts + rerun_without_change
- new signal category design_smell

Semver minor: no breaking change; ADR-002 / ADR-006 honoured. Bumps the version on five published surfaces:

- VERSION
- .claude-plugin/plugin.json
- .claude-plugin/marketplace.json ($.metadata.version and $.plugins[0].version)
- cli/compass COMPASS_VERSION

Replaces the obsolete tests/test_version_bump_1_0_0.py with the parametrised 1.1.0 equivalent (TRC-1).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The .compass/work/farley-guidebook/ files for the framework's own task got pulled into main via the stream-B merge, but the framework repo's own .gitignore deliberately root-anchors /.compass/work/. Comparison-requirements (the prior expedition) has zero tracked task files for the same reason: the framework dogfoods Compass on its own work, but the work-state itself is a local audit trail only, not part of the public framework artifact that ships to adopters.

The partially tracked state was the actual bug: task.yml referenced evidence paths (verification-report.md and stream-A/C/D green-*.json) that .gitignore keeps off-disk on a fresh clone, so CI's `compass ci` then failed to resolve the evidence pointers.

Removes from the index (keeps local copies):

- task.yml
- devlog.md
- stream-B's bled-through evidence/*.json + *.log

No content change; restores the gitignore's intent.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
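
Two quick diagnostics for this class of bug, sketched in Python. `git ls-files --cached --ignored --exclude-standard` is a standard git invocation that lists tracked files matching an ignore rule, which is exactly the partially tracked state described here; `tracked_but_ignored` and `missing_evidence` are hypothetical helpers, the latter reporting evidence pointers that would not resolve on a fresh clone.

```python
import subprocess
from pathlib import Path


def tracked_but_ignored(repo: Path) -> list[str]:
    """Files in the index even though an ignore rule (e.g. /.compass/work/) excludes them."""
    out = subprocess.run(
        ["git", "ls-files", "--cached", "--ignored", "--exclude-standard"],
        cwd=repo, capture_output=True, text=True, check=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def missing_evidence(repo: Path, evidence_paths: list[str]) -> list[str]:
    """Evidence pointers (e.g. taken from task.yml) that do not resolve to a file."""
    return [p for p in evidence_paths if not (repo / p).is_file()]
```

Running `git rm --cached` on whatever `tracked_but_ignored` reports is the index-only removal this commit performs by hand.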

Rephrases the substantive ideas in Compass's own voice; renames test_farley_invariants.py to test_release_invariants.py; updates the test that asserted the literal external quote to assert the framing instead. No behavioural change — all 372 tests still pass. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

jed72 and others added 30 commits May 25, 2026 14:02

compass(comparison-requirements): integrate stream-A-analyze

97f73ce

Merge branch 'compass/comparison-requirements/stream-B-living-spec'

9b9d348


Merge stream-C-next-invisible into main

1638049


build(stream-A): TRC-A6 — strategies.md declares S5 'Intermittency is…

1bb3b62


build(stream-A): TRC-FM2 — task lint rejects attempts <= 0 on test-ru…

e7e81a9


build(stream-A): TRC-A1 — tdd-green records attempts:1 on clean first…

ae9426f


Merge branch 'compass/farley-guidebook/stream-A-intermittent-tests'

9d38633

Merge branch 'compass/farley-guidebook/stream-B-fitness-functions'

8141331


Merge branch 'compass/farley-guidebook/stream-C-tdd-design-rebalance'

68a7662


build(stream-D): cross-cutting integration tests — TRC-F1..F7, FM4

36a871d

Merge branch 'compass/farley-guidebook/stream-D-cross-cutting-verific…

164cce9


jed72 and others added 2 commits May 26, 2026 08:58

jed72 changed the title ~~release(v1.1.0): Farley guidebook shortlist~~ release(v1.1.0): fitness functions + intermittent-test integrity May 26, 2026

jed72 merged commit f9769b4 into main May 26, 2026
1 check passed

jed72 deleted the release/v1.1.0 branch May 26, 2026 08:48

jed72 merged 32 commits into main from release/v1.1.0
