v1.57.0.0 feat: carve-guard system + carve cso/document-release/design-consultation #1907
Merged
…om it Single source of truth for the carved-skill set + per-skill invariants (EQ1). parity-harness.ts sectioned entries and skill-size-budget.ts SECTIONS_EXTRACTED now derive from it instead of hand-maintained lists. Closes a pre-existing drift: plan-devex-review was in SECTIONS_EXTRACTED but had no sectioned parity invariant; now generated. carve-guards.ts is a pure leaf data module (import type only) to avoid an import cycle. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
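A minimal sketch of the single-source-of-truth idea in this commit, assuming the registry is a plain record keyed by skill name. Only `CARVE_GUARDS` and `SECTIONS_EXTRACTED` are names from the commit message; the `CarveGuard` shape, the `sections` field, and `sectionedParityEntries` are hypothetical, and the entries are illustrative (skill and section names borrowed from the carve commits in this PR).

```typescript
// Hypothetical registry shape; the real carve-guards.ts may differ.
interface CarveGuard {
  sections: string[]; // section files the skeleton reads on demand
}

// Illustrative entries only.
const CARVE_GUARDS: Record<string, CarveGuard> = {
  "document-release": { sections: ["release-body"] },
  "design-consultation": { sections: ["proposal-and-preview"] },
  cso: { sections: ["audit-phases"] },
};

// Derived list: adding a registry entry extends it with no second edit.
const SECTIONS_EXTRACTED: string[] = Object.keys(CARVE_GUARDS).sort();

// Derived parity entries: one path per (skill, section) pair.
const sectionedParityEntries: string[] = [];
for (const skill of SECTIONS_EXTRACTED) {
  for (const section of CARVE_GUARDS[skill].sections) {
    sectionedParityEntries.push(`${skill}/sections/${section}.md`);
  }
}
```

Because both lists are computed from the registry, a skill cannot appear in one and be missing from the other, which is the kind of drift the commit describes for plan-devex-review.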
discoverCarvedSkills/checkOrdering/checkCompleteness take a root param so the negative tests can point the real guards at a fixture dir. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Per-PR backstop for every carved skill, one test() per skill, driven by CARVE_GUARDS staticInvariants. Generalizes + retires the ceo-specific ordering test. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Asserts filesystem carved set == CARVE_GUARDS set both directions, so a future carve without a registry entry fails CI. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
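The both-directions comparison can be sketched as a symmetric set difference. This is an illustration under assumptions, not the repository's code: `diffCarvedSets` is a hypothetical name, and the real guard discovers the on-disk set from a root directory instead of taking arrays.

```typescript
// Hypothetical helper: returns human-readable problems; an empty array means
// the on-disk carved set and the registry set are identical.
function diffCarvedSets(onDisk: string[], registered: string[]): string[] {
  const disk = new Set(onDisk);
  const registry = new Set(registered);
  const problems: string[] = [];
  for (const skill of onDisk) {
    if (!registry.has(skill)) problems.push(`carved on disk but not registered: ${skill}`);
  }
  for (const skill of registered) {
    if (!disk.has(skill)) problems.push(`registered but not carved on disk: ${skill}`);
  }
  return problems;
}
```

Checking only one direction would let a new carve slip in without guards, or leave a stale registry entry behind; checking both makes either case a CI failure.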
Temp fixture broken 3 ways proves E1/E2 actually throw, via the injectable root. Kills the silent-pass-guard failure class. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
One file iterating CARVE_GUARDS, one test() per skill with GSTACK_CARVE_SKILL cost-scoping (D-CODEX A). external carves (ship, plan-ceo) keep bespoke tests; testNames aligned to their touchfile keys. Registered in touchfiles. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
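One way the cost-scoping could work, sketched under a loud assumption: that `GSTACK_CARVE_SKILL`, when set, names a single skill and restricts the run to it. The commit does not spell out the semantics, so `skillsToRun` and its behavior are hypothetical; the `scope` parameter stands in for the environment variable's value.

```typescript
// Hypothetical helper: `scope` stands in for the value of GSTACK_CARVE_SKILL.
function skillsToRun(allSkills: string[], scope: string | undefined): string[] {
  if (scope === undefined || scope === "") {
    return allSkills; // unset: run every registered skill
  }
  return allSkills.filter((skill) => skill === scope); // set: run only the named skill
}
```

Under this assumption, a scoped run pays for one skill's behavioral test instead of all of them.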
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Steps 2-9 (per-file audit, auto-updates, risky-change asks, CHANGELOG voice polish, cross-doc consistency, TODOS cleanup, VERSION bump, commit + PR body) move to sections/release-body.md, read on demand after the Step 1.5 coverage map. Skeleton 59,256 -> 45,797 B (-23%); union preserved. Adds the CARVE_GUARDS entry (auto-extends parity + size-budget via EQ1). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Phases 3-6 (complete proposal, drill-downs, design preview, writing DESIGN.md) move to sections/proposal-and-preview.md, read on demand after product context + research. Skeleton 80,719 -> 59,229 B (-27%); union preserved. Adds the CARVE_GUARDS entry. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
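The percentages quoted in the two carve commits above can be reproduced from the byte counts. `shrinkPercent` is a hypothetical helper written for this check, not something from the repository.

```typescript
// Hypothetical helper: signed percentage change, rounded to the nearest integer.
function shrinkPercent(beforeBytes: number, afterBytes: number): number {
  return Math.round(((afterBytes - beforeBytes) / beforeBytes) * 100);
}

const documentRelease = shrinkPercent(59256, 45797);    // -13,459 B of 59,256
const designConsultation = shrinkPercent(80719, 59229); // -21,490 B of 80,719
console.log(documentRelease, designConsultation);       // prints "-23 -27"
```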
Scope-dependent audit Phases 2-11 move to sections/audit-phases.md. Mode dispatch (## Arguments, ## Mode Resolution), always-run Phases 0/1, and the Phase 12 false-positive-filtering exceptions stay ALWAYS-LOADED in the skeleton. Skeleton 79,383 -> 65,117 B (-18%); union preserved. Adds a cso CARVE_GUARDS entry with an earliest-use invariant (mustPrecedeStop): mode dispatch must appear before any STOP-Read, so a directive that decides which sections to read can't be stranded behind the STOP that reads them (codex outside-voice #6). carve-guard-checks gains the mustPrecedeStop check. parity moves cso monolith -> generated carved entry. cso-preserved.test.ts strengthened: phrases checked against the union, plus an always-loaded contract on the skeleton (dispatch + FP-filtering, codex #5). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
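The earliest-use invariant can be sketched as a position check on the skeleton text. This is an illustration, not the code in carve-guard-checks: `checkMustPrecedeStop` is a hypothetical name, and the literal form of the STOP marker is an assumption passed in as a parameter.

```typescript
// Hypothetical check: every phrase must exist in the skeleton and appear
// before the first STOP marker. Returns problems; empty means the invariant holds.
function checkMustPrecedeStop(
  skeleton: string,
  mustPrecede: string[],
  stopMarker: string, // assumed marker text; the real STOP-Read directive may differ
): string[] {
  const firstStop = skeleton.indexOf(stopMarker);
  const problems: string[] = [];
  for (const phrase of mustPrecede) {
    const at = skeleton.indexOf(phrase);
    if (at === -1) {
      problems.push(`missing from skeleton: ${phrase}`);
    } else if (firstStop !== -1 && at > firstStop) {
      problems.push(`appears after the first STOP: ${phrase}`);
    }
  }
  return problems;
}
```

For cso, the phrases would be the mode-dispatch headings (`## Arguments`, `## Mode Resolution`), so a directive that decides which sections to read is always seen before the STOP that reads them.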
…lease carves The cso carve moved Secrets Archaeology (prefixes, lib/redact-patterns.ts pointer, git-history scan) into sections/audit-phases.md, and the document-release carve moved the Step 9 PR-body redaction scan into sections/release-body.md. Three content-presence tests asserted that content in the skeleton SKILL.md/.md.tmpl; they now read the skeleton+sections union (same fix as cso-preserved + parity). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
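The fix described here, reading the skeleton+sections union instead of the skeleton alone, can be sketched as follows. `missingFromUnion` is a hypothetical helper and the inputs are plain strings; the real tests read SKILL.md and the sections files from disk.

```typescript
// Hypothetical helper: returns the phrases found in neither the skeleton
// nor any section. An empty result means the content was preserved by the carve.
function missingFromUnion(skeleton: string, sections: string[], phrases: string[]): string[] {
  const union = [skeleton, ...sections].join("\n");
  return phrases.filter((phrase) => union.indexOf(phrase) === -1);
}
```

A test that checks only the skeleton fails as soon as content moves into a section, even though nothing was lost; checking the union keeps the assertion about preservation and drops the accidental assertion about location.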
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- cso section: add a scope-gate header so '--owasp' (and other scoped modes)
run only their selected phases, not every phase bundled in the section
('execute in full' no longer overrides Mode Resolution).
- carve-guard-checks: gateAfterStop now compares against the LAST STOP, not the
first, so a gate stranded between two STOPs in a multi-STOP skeleton fails.
- TODOS: behavioral section-loading hermeticity (verifier matches global-install
path, not the fixture) — pre-existing in auq-sdk-capture.ts, deferred.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
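The gateAfterStop fix can be illustrated with a skeleton that has two STOPs. This sketch assumes the check means "the gate phrase must appear after every STOP"; the function names and the marker strings are hypothetical.

```typescript
// Buggy variant: compares the gate against the FIRST STOP only.
function gateAfterFirstStop(skeleton: string, gate: string, stopMarker: string): boolean {
  const gateAt = skeleton.indexOf(gate);
  const firstStop = skeleton.indexOf(stopMarker);
  return gateAt !== -1 && firstStop !== -1 && gateAt > firstStop;
}

// Fixed variant: compares the gate against the LAST STOP.
function gateAfterLastStop(skeleton: string, gate: string, stopMarker: string): boolean {
  const gateAt = skeleton.indexOf(gate);
  const lastStop = skeleton.lastIndexOf(stopMarker);
  return gateAt !== -1 && lastStop !== -1 && gateAt > lastStop;
}

// A gate stranded between two STOPs: the buggy check passes, the fixed one fails.
const multiStop = "intro\nSTOP\nGATE\nSTOP\noutro";
console.log(gateAfterFirstStop(multiStop, "GATE", "STOP")); // prints true
console.log(gateAfterLastStop(multiStop, "GATE", "STOP"));  // prints false
```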
E2E Evals: ❌ FAIL
71/73 tests passed | $11.98 total cost | 12 parallel runners
12x ubicloud-standard-8 (Docker: pre-baked toolchain + deps) | wall clock ≈ slowest suite
Failures
garrytan added a commit that referenced this pull request Jun 8, 2026
…gistry main shipped a generalized carve-guard system (PR #1907) that is now the single source of truth for carved-skill skeleton invariants. Register the PR-title rule there instead of a standalone test: ship's mustStayInSkeleton asserts v$NEW_VERSION + the rewrite helper stay always-loaded, and mustMoveToSection asserts both the create and update PR paths stay carved into pr-body.md (present in the union, out of the skeleton). Delete the standalone ship-pr-title-version-always-loaded test it replaces. The CI-workflow safety tripwire stays standalone (not a carve concern). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
garrytan added a commit that referenced this pull request Jun 8, 2026
…tle-sync backstop (#1909)

* fix(ship): restore always-loaded PR-title-version invariant to skeleton

The v1.54.0.0 carve moved the 'PR title MUST start with v$NEW_VERSION' rule out of the always-loaded ship skeleton and entirely into the lazily-loaded pr-body.md section. The agent only set the version prefix if it happened to read that section before creating the PR, so PRs landed with bare titles. Restore a one-line invariant (+ helper reference) to ship/SKILL.md.tmpl right before the {{SECTION:pr-body}} pointer, mirroring the AUQ always-loaded precedent. Full procedure stays sectioned. Regenerated all hosts.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(ship): guard PR-title-version rule + pull_request_target safety

Two free gate tests so a future carve or workflow refactor can't silently regress:
- ship-pr-title-version-always-loaded: asserts the invariant lives in the always-loaded ship/SKILL.md skeleton (not only sections/), and that the skeleton+sections union keeps BOTH the create and the existing-PR update title paths. Modeled on test/auq-format-always-loaded.test.ts.
- pr-title-sync-workflow-safety: static tripwire that fails CI if pr-title-sync.yml checks out PR-head code or inlines an attacker-controlled ${{ github.event.pull_request.* }} field inside a run: block (the two pull_request_target footguns actionlint cannot catch).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(ci): pr-title-sync covers fork PRs via hardened pull_request_target

Under plain pull_request the GITHUB_TOKEN is read-only on fork PRs, so the title-sync backstop could never edit a fork/agent PR title. Switch to pull_request_target (write token in base context) and make it safe:
- Check out the base repo only (no ref:); execute trusted infra, never fork-head code.
- All attacker-controlled PR fields (title, head repo, head sha) pass via env: and are referenced as shell-quoted "$VAR", never inlined into run:.
- Read the PR-head VERSION as data (raw media type) from the head repo at the head sha; guard the assignment under set -e.
- Same-repo read failure fails loudly; fork miss warns and skips (the backstop stays green without going silently optional).
- Never echo the raw fork title (Actions parses ::workflow-command:: from stdout).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(ship): expand binDir path in pr-body Linked Spec block

ship/sections/pr-body.md.tmpl:98-99 used ${ctx.paths.binDir}, but the gen-skill-docs generator only resolves {{TOKEN}} syntax in .tmpl files; the ${...} JS-template-literal form is substituted only inside .ts resolver files. So the token passed through literally into the generated pr-body.md, leaving the agent with an unexpandable ${ctx.paths.binDir}/gstack-paths command in the Linked Spec auto-detect block. Use the hardcoded helper path, consistent with every other path reference in this section.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(test): fold ship PR-title skeleton guard into carve-guard registry

main shipped a generalized carve-guard system (PR #1907) that is now the single source of truth for carved-skill skeleton invariants. Register the PR-title rule there instead of a standalone test: ship's mustStayInSkeleton asserts v$NEW_VERSION + the rewrite helper stay always-loaded, and mustMoveToSection asserts both the create and update PR paths stay carved into pr-body.md (present in the union, out of the skeleton). Delete the standalone ship-pr-title-version-always-loaded test it replaces. The CI-workflow safety tripwire stays standalone (not a carve concern).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v1.57.3.0)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
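A rough sketch of the static tripwire idea from the squashed commit above: scan workflow text and flag any `${{ github.event.pull_request.* }}` expression that sits inside a `run:` block, while allowing the same expression under `env:`. This is not the repository's pr-title-sync-workflow-safety test; `findInlinedPrFields` is a hypothetical name, and the indentation-based block tracking is a simplification that assumes conventionally indented YAML.

```typescript
// Matches the start of an attacker-controlled PR field expression.
const PR_FIELD = /\$\{\{\s*github\.event\.pull_request\./;

// Hypothetical scanner: returns 1-based line numbers where a PR field is
// inlined on a `run:` key or inside its indented body.
function findInlinedPrFields(workflow: string): number[] {
  const offending: number[] = [];
  const lines = workflow.split("\n");
  let runIndent = -1; // indent of the current `run:` key, or -1 when outside one
  for (let i = 0; i < lines.length; i++) {
    const line = lines[i];
    if (line.trim() === "") continue;
    const indent = line.search(/\S/);
    if (runIndent !== -1 && indent <= runIndent) runIndent = -1; // run body ended
    const isRunKey = /^\s*(- )?run:/.test(line);
    if (isRunKey) runIndent = indent;
    if (runIndent !== -1 && PR_FIELD.test(line)) offending.push(i + 1);
  }
  return offending;
}

// Safe pattern from the commit: pass the field via env:, reference "$VAR" in run:.
const safeWorkflow = [
  "      - name: sync",
  "        env:",
  "          PR_TITLE: ${{ github.event.pull_request.title }}",
  "        run: |",
  '          echo "length=${#PR_TITLE}"',
].join("\n");

// Unsafe pattern: the expression is expanded into the shell script itself.
const unsafeWorkflow = [
  "      - name: sync",
  "        run: |",
  '          echo "${{ github.event.pull_request.title }}"',
].join("\n");

console.log(findInlinedPrFields(safeWorkflow));   // prints []
console.log(findInlinedPrFields(unsafeWorkflow)); // prints [ 3 ]
```

The distinction matters because an expression inside `run:` is substituted into the script before the shell parses it, so a crafted PR title becomes shell code; a value passed through `env:` reaches the shell only as data.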
Summary
Slims three more heavyweight skills and builds the safety net that makes carving safe to keep doing.
Carve-guard system (PR1, guard machinery):
- test/helpers/carve-guards.ts registry: one source of truth for which skills are carved + each skill's invariants, required reads, and skeleton-size floor.
- parity-harness.ts and skill-size-budget.ts now derive their carved-skill lists from it (closed a pre-existing plan-devex-review parity gap).
- … GSTACK_CARVE_SKILL cost-scoping), E1 completeness meta-guard that fails CI if a carved skill lacks its guards, and ET1 guard-of-guards negative tests proving the guards actually fire.

Three new carves (PR2):
- document-release (skeleton 59,256 → 45,797 B, −23%), design-consultation (80,719 → 59,229 B, −27%), cso (79,383 → 65,117 B, −18%). All content preserved in on-demand sections.
- cso carved security-safe: mode dispatch, always-run phases, and false-positive-filtering exceptions stay always-loaded; an earliest-use invariant + a section scope-gate keep scoped modes (e.g. --owasp) from running unselected phases.

Test Coverage
Test + skill-template diff; no application code paths. The carve guards ARE the coverage for the template changes: 6 → 9 carved skills now fully guarded (static gate + behavioral periodic + completeness meta-guard + parity union floor + size-budget). Full deterministic suite green (3,373 pass); the only failure is the unrelated setup-emoji-font environmental flake (284s font-resolution hang, untouched by this branch).

Pre-Landing Review
Plan-stage: CEO + Eng + Codex (×2) all CLEARED before implementation. Diff-stage adversarial codex pass found 3 issues:
- cso --owasp could run non-OWASP phases ("execute in full" over a section bundling all phases) → added a scope-gate header to the section.
- gateAfterStop check compared against the first STOP, not the last → now uses the last STOP.
- … auq-sdk-capture.ts (affects existing ship/ceo section-loading tests too).

Plan Completion
All plan items DONE. PR1 (T1–T6: registry + refactor, E2, E1, ET1, T2, E3-TODO) and PR2 (T7–T9 carves + T10 CHANGELOG/VERSION) shipped. Plan: ~/.claude/plans/system-instruction-you-are-working-jaunty-goose.md.

TODOS
Added two deferred items: real-session section-read canary (E3), and behavioral-test hermeticity hardening (from the codex review).
Test plan
🤖 Generated with Claude Code