release(v1.1.0): fitness functions + intermittent-test integrity #10
Merged
Delivers TRC-C1..C10, TRC-F3, TRC-F6 (12 scenarios) for task
comparison-requirements:
- cli/compass: new `next` subcommand (+160 lines) reading task.yml +
route.md, prints one-line phase/gate status; read-only; <200ms p95.
- CLAUDE.md: additive paragraph under "Never skip Frame" — Trigger Frame
on intent, not just the literal command. Includes the no-re-frame
clause (TRC-F3) so already-framed tasks proceed without re-Frame.
- agents/{spec-author,planner,builder,orchestrator}.md: one-sentence
intent-trigger reinforcement on each agent's description.
26 tests added (tests/next/, tests/invisible/), all green. No new
hooks, no new code surface for triggering — Q4's resolution honoured.
Co-Authored-By: Claude (compass:builder) <noreply@anthropic.com>
Implements `compass analyze` (cross-artifact coherence check), the `add_gate` floor property in routing-policy (RG-FLOOR-004/005), the `coherence-check` evidence type + `coherence-check-passes` CHECK_FN under G4, and the full 16-scenario test suite for stream A.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements all 13 stream-B scenarios (TRC-B1 through TRC-B11 and TRC-F2):
- derive_system_spec() in cli/compass: walks .compass/work/*/task.yml for status==landed, applies supersession (same intent_id → latest-landed wins, older → archive appendix), writes docs/system-spec.md with DERIVED FILE header; idempotent and reconstructible (ADR-008).
- compass _derive-system-spec --internal: private subparser hidden from compass --help via _choices_actions filter + metavar override (DD-4); --internal flag mandatory to prevent accidental shell invocation.
- scripts/integrate.sh: post-combined-regression block writes status: landed + land_timestamp to task.yml via Python heredoc, then invokes the private subparser (DD-4, TRC-B11).
- schemas/task.schema.json: added optional status (active|landed) and land_timestamp fields; schema_version 1.0 files still valid (Inv-8, DD-3).
- templates/task.yml: bumped schema_version to "1.1", added status: active.
- ADR-008: records the cross-task derived artifacts decision; added to architecture/decisions/README.md index.
- tests/test_self_architecture.py: repaired test_adrs_cover_p1_to_p8 to accept >= 6 ADRs (founding six remain required; expansion ADRs allowed).
35 tests in tests/derive/ all green; compass policy lint PASS.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
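The supersession rule described in this commit (same intent_id → latest-landed wins, older → archive appendix) can be illustrated with a minimal sketch. This is not the code in cli/compass: the `LandedTask` record, the `apply_supersession` name, and the assumption that `land_timestamp` is an ISO-8601 UTC string in one fixed format (so string order equals time order) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class LandedTask:
    """Hypothetical stand-in for the fields read from a landed task.yml."""
    task_id: str
    intent_id: str
    land_timestamp: str  # assumed ISO-8601 UTC, e.g. "2025-01-01T00:00:00Z"


def apply_supersession(
    tasks: Iterable[LandedTask],
) -> Tuple[List[LandedTask], List[LandedTask]]:
    """Split landed tasks into (current, archived).

    For each intent_id the task with the latest land_timestamp is current;
    every older task with the same intent_id goes to the archive list.
    """
    latest: Dict[str, LandedTask] = {}
    archived: List[LandedTask] = []
    # Sort first so the result does not depend on directory-walk order.
    for task in sorted(tasks, key=lambda t: (t.land_timestamp, t.task_id)):
        previous = latest.get(task.intent_id)
        if previous is not None:
            archived.append(previous)
        latest[task.intent_id] = task
    current = sorted(latest.values(), key=lambda t: t.task_id)
    return current, archived
```

Sorting the input before folding is one way to keep the derivation idempotent: running it twice over the same landed tasks, in any order, yields the same two lists.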
…task derived artifacts) + comparison-requirements analysis
Planning artifacts for the `comparison-requirements` task, captured before Build started:
- architecture/decisions/ADR-007: gates may be conditionally promoted from advisory to blocking via routing-policy floors; advisory gates write evidence but do not block Land
- architecture/decisions/ADR-008: cross-task derived artifacts are generated from landed task scenarios at Land time; the derivation is reconstructible, idempotent, and never a source-of-truth
- docs/analysis/comparison-requirements.md: business-analysis pass over docs/proposals/comparison.md; converts the three shortlist candidates (compass analyze, living system spec, compass next + invisible triggering) into buildable BRs, scenarios, and NFRs
Both ADRs are in 'proposed' status: they document architecturally-novel patterns (the third gate-lifecycle class for verify.analyze; the framework's first cross-task derived artifact). They are to be promoted to 'accepted' after the build streams ship and the patterns prove out in practice.
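The ADR-007 idea above (a gate is advisory by default, a routing-policy floor can promote it to blocking, and only blocking gates stop Land) can be sketched as follows. The `Floor` record, the `scope` route key, and the `EXAMPLE-FLOOR` id are hypothetical illustrations; the real routing-policy schema is not shown in this PR.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Mapping


@dataclass(frozen=True)
class Floor:
    """Hypothetical floor: promotes one gate when its predicate matches the route."""
    floor_id: str
    gate: str
    applies: Callable[[Mapping[str, str]], bool]


def gate_modes(
    route: Mapping[str, str], gates: Iterable[str], floors: Iterable[Floor]
) -> Dict[str, str]:
    """Every gate starts advisory; a matching floor promotes it to blocking."""
    modes = {gate: "advisory" for gate in gates}
    for floor in floors:
        if floor.gate in modes and floor.applies(route):
            modes[floor.gate] = "blocking"
    return modes


def can_land(modes: Mapping[str, str], results: Mapping[str, bool]) -> bool:
    """Advisory gates still record results, but only blocking gates stop Land."""
    return all(
        results.get(gate, False)
        for gate, mode in modes.items()
        if mode == "blocking"
    )


# Hypothetical floor: promote verify.analyze for cross-cutting work.
EXAMPLE_FLOORS = [
    Floor("EXAMPLE-FLOOR", "verify.analyze", lambda r: r.get("scope") == "cross-cutting"),
]
```

With these definitions a failing `verify.analyze` blocks Land only on a route where the floor fires; elsewhere the failure is recorded as evidence and Land proceeds.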
# Conflicts: # architecture/decisions/ADR-008-cross-task-derived-artifacts.md
# Conflicts: # cli/compass
…ext.md headings
Stream-B's test rewrite at Build introduced "## Boundaries" + "## Principles" expectations that don't match the file (which has "## Boundary conditions" and no Principles section). Aligned at Land integration of comparison-requirements (TRC-D5 honoured).
… ADR index
Closes the comparison-requirements task with everything green:
- tests/cross_cutting/test_stream_d_invariants.py — 10 cross-cutting
invariant tests (TRC-D1..D10) asserting on the integrated state of
streams A/B/C:
D1: five-point mental model unchanged
D2: only `analyze` and `next` added publicly (`_derive-system-spec` private)
D3: no tier ladder
D4: no new agent personas, role enum unchanged
D5: pipeline phases still flex by route
D6: immovable gates remain immovable
D7: TDD remains a strategy that Spike suspends
D8: bare-repo zero-setup preserved
D9: route evaluate deterministic
D10: no LLM SDK in cli/compass
- docs/system-spec.md — first run of the new living system spec
derivation (ADR-008). Includes the 51 scenarios from this task plus
a DERIVED FILE header.
- architecture/decisions/README.md — added ADR-007 to the index
alongside ADR-008 (Stream-B added ADR-008 via its branch merge).
Combined regression: 260 passed, 2 skipped.
`compass check`: PASS — all 11 checks pass.
All 7 gates closed (verify.correctness/governance/traceability/regression/
security/clarity/claims).
BF-1 paid (latency measurements confirm provisional targets).
task.yml.status: landed.
… failure.' Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…k roles
Amend S2 in governance/strategies.md to declare both purposes of TDD: governance role (satisfies G1) and design-feedback loop. Quotes Farley: "TDD is less about testing and more about good design."
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…n evidence Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…i-pattern Add "Listen to your tests" section to tdd-discipline explaining that a hard-to-write test is a design smell — change the design, not the test. Add "Test behaviour, not implementation" anti-pattern with the "swap the implementation — does the test survive?" check. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… pass
Add _source_tree_hash, _load_tdd_state, _save_tdd_state helpers and DD-7 rerun detection to cmd_tdd_green.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
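A minimal sketch of how source-tree hashing could drive rerun detection, assuming a SHA-256 digest over file paths and contents and a per-scenario state dict. The function names, the `.py`-only filter, and the state layout are hypothetical; they are not the `_source_tree_hash` / `_load_tdd_state` helpers themselves.

```python
import hashlib
from pathlib import Path
from typing import Dict, Tuple


def source_tree_hash(root: Path, suffixes: Tuple[str, ...] = (".py",)) -> str:
    """SHA-256 over relative paths and contents of matching files, in sorted order."""
    digest = hashlib.sha256()
    files = sorted(p for p in root.rglob("*") if p.is_file() and p.suffix in suffixes)
    for path in files:
        digest.update(path.relative_to(root).as_posix().encode("utf-8"))
        digest.update(b"\0")  # separator so path and content cannot run together
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()


def record_green(state: Dict[str, dict], scenario: str, tree_hash: str) -> dict:
    """Record a green run; from the second attempt on, flag unchanged-source reruns."""
    previous = state.get(scenario)
    attempts = previous["attempts"] + 1 if previous else 1
    record = {"attempts": attempts, "source_hash": tree_hash}
    if attempts > 1:
        # True means the test was simply rerun: nothing in the tree changed.
        record["rerun_without_change"] = previous["source_hash"] == tree_hash
    state[scenario] = record
    return record
```

Hashing the relative path alongside the content means a rename counts as a change, which errs on the side of not flagging a legitimate fix as a bare rerun.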
…1 implementation)
TRC-A2 test was already passing because DD-7 rerun detection added in TRC-A1 handles both scenarios. Red step showed test passing (no separate red recorded for TRC-A2). Evidence: evidence/green-TRC-A2.json.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ls.yml
Add design_smell top-level key to governance/signals.yml with three default patterns: tests dominated by mocking/setup, assertions on internal method calls, single test asserts more than one behaviour. Update schemas/signals.schema.json to accept the new optional property.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…aveat
Add inline comment to guardrails.yml Q1 coverage-floor example and add a "Coverage as evidence" section to evidence-gates/SKILL.md explaining that coverage is a floor, never a target: it is a side effect of test discipline.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…registry
- governance/quarantine.yml: new file, shipped empty (DD-1)
- governance/guardrails.yml: no-trusted-rerun check added to checks: and G4.checks; evidence_types.test-run.description updated per DD-9
- cli/compass: _check_no_trusted_rerun + _load_quarantine_registry + quarantine policy lint (_lint_errors_quarantine) added to CHECK_FNS and cmd_policy_lint
Also covers TRC-A4, TRC-A5, TRC-FM3 (all exercised by TestNoTrustedRerunCheck).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
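One plausible shape for a no-trusted-rerun check is sketched below. The semantics are an assumption, not taken from cli/compass: here a green record fails the check when it took more than one attempt with no source change, unless its scenario is listed in the quarantine registry. The record keys `scenario`, `attempts`, and `rerun_without_change` mirror names used elsewhere in this PR, but the exact evidence schema is not shown.

```python
from typing import FrozenSet, Iterable, List, Mapping, Tuple


def check_no_trusted_rerun(
    evidence_records: Iterable[Mapping[str, object]],
    quarantined: FrozenSet[str] = frozenset(),
) -> Tuple[bool, List[str]]:
    """Return (passed, offenders).

    An offender is a scenario that went green only by being rerun against an
    unchanged source tree and is not listed in the quarantine registry
    (assumed semantics).
    """
    offenders = [
        str(record["scenario"])
        for record in evidence_records
        if int(record.get("attempts", 1)) > 1
        and bool(record.get("rerun_without_change", False))
        and record["scenario"] not in quarantined
    ]
    return (not offenders, offenders)
```

An empty quarantine set, matching a registry that ships empty, means every unchanged-source rerun is reported.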
…r gates for B1
Add negative assertion tests verifying stream-C (B1) adds no new checks or gates to guardrails.yml. Add provenance comment to guardrails.yml confirming the constraint. B1 is judgement-side only: no new mechanism.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…d-specification
Add "Example-first refinement chain" section to bdd-specification skill: vague idea → concrete examples → acceptance criteria → executable spec. Includes ubiquitous language discipline and "should" naming prefix. Explicitly refuses user stories as a per-role spec format (ADR-004).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Stream A — A1: Intermittent-test integrity
Scenarios: TRC-A1, TRC-A2, TRC-A3, TRC-A4, TRC-A5, TRC-A6, TRC-A7, TRC-FM2, TRC-FM3
Tests added: 21 (in tests/test_intermittent_tests.py)
Files changed: cli/compass, governance/guardrails.yml, governance/quarantine.yml (new),
governance/strategies.md, tests/test_intermittent_tests.py
Suite: 291 passed, 2 skipped; policy lint: PASS; task lint: PASS
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…vidence-gates
Add "Pipeline stage vocabulary" section to evidence-gates/SKILL.md:
- verify.correctness as the acceptance/releasability gate ("anything that defines releasable")
- tdd-red/tdd-green loop as the commit stage ("anything that can fail fast")
- explicitly state Release and Production stages are out of scope (safety-contract guarantee 6)
- cite G4 as the standing falsification principle
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…s functions (TRC-B1..B8, TRC-FM1)
All 9 stream-B scenarios implemented and green (298 tests pass, 2 pre-existing skips).
Changes:
- cli/compass: add _check_command_passes (command-passes CHECK_FN); policy lint validates command-passes params on project guardrails only (G4 exempt)
- governance/guardrails.yml: register command-passes check; add to G4.checks; add verify.fitness gate evidence requirements
- governance/routing-policy.yml: add RG-FLOOR-006 (cross-cutting/critical) and RG-FLOOR-007 (irreversible domains) promoting verify.fitness from advisory to blocking
- architecture/decisions/ADR-009-...: new ADR, status proposed; fitness functions are project guardrails, not framework guardrails
- architecture/decisions/README.md, architecture/system-context.md, skills/evidence-gates/SKILL.md: reference ADR-009 and verify.fitness pattern
- tests/test_fitness_functions.py: 12 tests (TRC-B1, B2, B3, B6, B7, FM1)
- tests/test_verify_fitness_route_promotion.py: 16 tests (TRC-B4, B5, B8)
- tests/fixtures/route-baseline.yml: expedition gains verify.fitness (RG-FLOOR-006 fires)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
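A command-passes style check, where a project-defined fitness function is "run this command and require exit code 0", might look like the sketch below. The `command` and `timeout_seconds` parameter names are hypothetical, since the real params schema for `_check_command_passes` is not shown in this PR.

```python
import subprocess
import sys
from typing import Mapping, Tuple


def check_command_passes(params: Mapping[str, object]) -> Tuple[bool, str]:
    """Run params['command'] (a list of argv strings); pass only on exit code 0."""
    command = params.get("command")
    if not command:
        return False, "command-passes requires a non-empty 'command' param"
    timeout = params.get("timeout_seconds", 60)
    try:
        completed = subprocess.run(
            list(command), capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        return False, f"timed out after {timeout}s"
    except OSError as exc:
        return False, f"could not start command: {exc}"
    if completed.returncode == 0:
        return True, "exit 0"
    return False, f"exit {completed.returncode}"


if __name__ == "__main__":
    # Example fitness function: the current interpreter must start and exit cleanly.
    print(check_command_passes({"command": [sys.executable, "-c", "pass"]}))
```

Taking the command as an argv list rather than a shell string avoids shell quoting issues and keeps the check's behaviour the same across platforms.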
# Conflicts: # cli/compass # governance/guardrails.yml
# Conflicts: # skills/evidence-gates/SKILL.md
…ks PASS
Brings the Farley proposal's full shortlist into main:
- A1 intermittent-test integrity (S5 strategy + no-trusted-rerun check on G4 + governance/quarantine.yml + tdd-green attempts/rerun_without_change)
- A2 architectural fitness functions (command-passes check + verify.fitness gate + RG-FLOOR-006/007 + ADR-009)
- B1 TDD-as-design rebalance (S2 dual-role + tdd-discipline Listen-to-tests + design_smell advisory signal + coverage-as-floor caveat)
- C1 example-first refinement in bdd-specification skill
- C2 commit-vs-acceptance vocabulary in evidence-gates skill
Surface added stays within the USP-5 budget: 1 strategy, 1 gate, 2 checks, 1 evidence field, 1 signal category, 1 ADR. No sixth guardrail, no fifth routing dimension (ADR-002 honoured).
Integration: A → B → C → D merged; combined regression 371 passed / 2 skipped; final compass check PASS, all 13 checks (including the new no-trusted-rerun and command-passes that this task ships).
Backfills paid:
- BF-1: cmd_tdd_green now always records rerun_without_change when attempts > 1 (true/false per source-hash); stream-A TRC-A1 evidence regenerated.
- BF-INTEGRATION: stream-D's 8 cross-cutting invariant tests added.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lands the Farley guidebook shortlist (farley-guidebook task), additive mechanism only:
- new strategy S5: Intermittency is failure
- new gate verify.fitness (route-promoted via RG-FLOOR-006/007)
- new G4 checks no-trusted-rerun and command-passes
- new ADR-009: fitness functions as project guardrails
- new evidence-record fields attempts + rerun_without_change
- new signal category design_smell
Semver minor: no breaking change; ADR-002 / ADR-006 honoured.
Bumps version on five published surfaces:
- VERSION
- .claude-plugin/plugin.json
- .claude-plugin/marketplace.json ($.metadata.version and $.plugins[0].version)
- cli/compass COMPASS_VERSION
Replaces the obsolete tests/test_version_bump_1_0_0.py with the parametrised 1.1.0 equivalent (TRC-1).
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
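A version-consistency check across the five surfaces listed above could be sketched like this. It is not the parametrised test from the repository: the top-level `version` key in plugin.json and the `COMPASS_VERSION = "..."` assignment form in cli/compass are assumptions, and `collect_versions` / `mismatches` are hypothetical helper names.

```python
import json
import re
from pathlib import Path
from typing import Dict, List, Optional


def collect_versions(root: Path) -> Dict[str, Optional[str]]:
    """Read the version string from each published surface under root."""
    plugin = json.loads((root / ".claude-plugin/plugin.json").read_text())
    market = json.loads((root / ".claude-plugin/marketplace.json").read_text())
    cli_source = (root / "cli/compass").read_text()
    # Assumed form: COMPASS_VERSION = "x.y.z" at the start of a line.
    match = re.search(
        r"^COMPASS_VERSION\s*=\s*[\"']([^\"']+)[\"']", cli_source, re.MULTILINE
    )
    return {
        "VERSION": (root / "VERSION").read_text().strip(),
        "plugin.json": plugin["version"],  # assumed top-level key
        "marketplace.json $.metadata.version": market["metadata"]["version"],
        "marketplace.json $.plugins[0].version": market["plugins"][0]["version"],
        "cli/compass COMPASS_VERSION": match.group(1) if match else None,
    }


def mismatches(versions: Dict[str, Optional[str]], expected: str) -> List[str]:
    """Names of surfaces whose version differs from the expected release."""
    return sorted(name for name, found in versions.items() if found != expected)
```

Returning the names of the stale surfaces, rather than a single boolean, makes a failed bump point directly at the file that was missed.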
The .compass/work/farley-guidebook/ files for the framework's own task got pulled into main via the stream-B merge, but the framework repo's own .gitignore has /.compass/work/ root-anchored, deliberately. Comparison-requirements (the prior expedition) has zero tracked task files for the same reason: the framework dogfoods Compass on its own work, but the work-state itself is local audit-trail only, not part of the public framework artifact that ships to adopters.
The partial-tracked state was the actual bug: task.yml referenced evidence paths (verification-report.md and stream-A/C/D green-*.json) that .gitignore keeps off-disk on a fresh clone. CI's `compass ci` then fails resolving the evidence pointers.
Removes from index (keeps local copies):
- task.yml
- devlog.md
- stream-B's bled-through evidence/*.json + *.log
No content change; restores the gitignore's intent.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Rephrases the substantive ideas in Compass's own voice; renames test_farley_invariants.py to test_release_invariants.py; updates the test that asserted the literal external quote to assert the framing instead. No behavioural change — all 372 tests still pass. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Cuts Compass 1.1.0 — semver minor (additive mechanism, backward-compat preserved).
Lands five candidate suggestions:
- `no-trusted-rerun`, new `governance/quarantine.yml`, `cmd_tdd_green` now records `attempts` + `rerun_without_change` (source-tree SHA-256 detection)
- `command-passes`, new gate `verify.fitness` (route-promoted via `RG-FLOOR-006`/`RG-FLOOR-007`), ADR-009 "Architectural fitness functions are project guardrails, not framework guardrails"
- `tdd-discipline` skill gains "Listen to your tests" section + "test behaviour, not implementation" anti-pattern, `signals.yml` gains `design_smell` advisory category, coverage-as-floor caveat in `evidence-gates` skill
- `bdd-specification` skill (user-stories refusal kept per ADR-004)
- `evidence-gates` skill (Release/Production stages stay out of scope per safety-contract guarantee 6)

Surface added stays within the USP-5 legibility budget: 1 strategy, 1 gate, 2 checks, 1 evidence-record field pair, 1 signal category, 1 ADR. No sixth guardrail, no fifth routing dimension; ADR-002 honoured.
Version bumped on all five published surfaces:
`VERSION`, `.claude-plugin/plugin.json`, `.claude-plugin/marketplace.json` (`$.metadata.version` and `$.plugins[0].version`), and `cli/compass` `COMPASS_VERSION`.
Test plan
- self-check passes
- `compass check --task farley-guidebook` passes: all 13 checks (including the new `no-trusted-rerun`, `command-passes`, `dod-evidence-typed`)
- `compass check --task version-bump-1-1-0` passes: all 13 checks
- `compass policy lint` passes: governance YAML structurally valid
- `v1.1.0` pushed; GitHub release created with `dist/compass-1.1.0.tar.gz` (6.7 MB, 262 files)

🤖 Generated with Claude Code