Skip to content

v1.57.0.0 feat: carve-guard system + carve cso/document-release/design-consultation#1907

Merged
garrytan merged 14 commits into
mainfrom
garrytan/token-reduction-next-stage
Jun 8, 2026
Merged

v1.57.0.0 feat: carve-guard system + carve cso/document-release/design-consultation#1907
garrytan merged 14 commits into
mainfrom
garrytan/token-reduction-next-stage

Conversation

@garrytan

@garrytan garrytan commented Jun 8, 2026

Copy link
Copy Markdown
Owner

Summary

Slims three more heavyweight skills and builds the safety net that makes carving safe to keep doing.

Carve-guard system (PR1 — guard machinery):

  • Canonical test/helpers/carve-guards.ts registry: one source of truth for which skills are carved + each skill's invariants, required reads, and skeleton-size floor. parity-harness.ts and skill-size-budget.ts now derive their carved-skill lists from it (closed a pre-existing plan-devex-review parity gap).
  • E2 data-driven static ordering test (gate, free), T2 data-driven behavioral section-loading test (periodic, GSTACK_CARVE_SKILL cost-scoping), E1 completeness meta-guard that fails CI if a carved skill lacks its guards, and ET1 guard-of-guards negative tests proving the guards actually fire.

Three new carves (PR2):

  • document-release (skeleton 59,256 → 45,797 B, −23%), design-consultation (80,719 → 59,229 B, −27%), cso (79,383 → 65,117 B, −18%). All content preserved in on-demand sections.
  • cso carved security-safe: mode dispatch, always-run phases, and false-positive-filtering exceptions stay always-loaded; an earliest-use invariant + a section scope-gate keep scoped modes (e.g. --owasp) from running unselected phases.

Test Coverage

Test + skill-template diff — no application code paths. The carve guards ARE the coverage for the template changes: 6 → 9 carved skills now fully guarded (static gate + behavioral periodic + completeness meta-guard + parity union floor + size-budget). Full deterministic suite green (3,373 pass); the only failure is the unrelated setup-emoji-font environmental flake (284s font-resolution hang, untouched by this branch).

Pre-Landing Review

Plan-stage: CEO + Eng + Codex (×2) all CLEARED before implementation. Diff-stage adversarial codex pass found 3 issues:

  • [FIXED] cso --owasp could run non-OWASP phases ("execute in full" over a section bundling all phases) → added a scope-gate header to the section.
  • [FIXED] gateAfterStop check compared against the first STOP, not the last → now uses the last STOP.
  • [DEFERRED → TODOS] behavioral section-loading verifier can match the global-install path, not just the fixture — pre-existing in auq-sdk-capture.ts (affects existing ship/ceo section-loading tests too).

Plan Completion

All plan items DONE. PR1 (T1–T6: registry + refactor, E2, E1, ET1, T2, E3-TODO) and PR2 (T7–T9 carves + T10 CHANGELOG/VERSION) shipped. Plan: ~/.claude/plans/system-instruction-you-are-working-jaunty-goose.md.

TODOS

Added two deferred items: real-session section-read canary (E3), and behavioral-test hermeticity hardening (from the codex review).

Test plan

  • Carve guard suite (E1/E2/T5) + cso-preserved + redaction/taxonomy + parity + size-budget + section-manifest: 144 pass, 0 fail
  • Full deterministic suite: 3,373 pass (only unrelated setup-emoji-font flake fails)
  • Periodic behavioral section-loading evals (run separately — paid)

🤖 Generated with Claude Code

garrytan and others added 14 commits June 7, 2026 17:52
…om it

Single source of truth for the carved-skill set + per-skill invariants
(EQ1). parity-harness.ts sectioned entries and skill-size-budget.ts
SECTIONS_EXTRACTED now derive from it instead of hand-maintained lists.
Closes a pre-existing drift: plan-devex-review was in SECTIONS_EXTRACTED
but had no sectioned parity invariant; now generated. carve-guards.ts is
a pure leaf data module (import type only) to avoid an import cycle.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
discoverCarvedSkills/checkOrdering/checkCompleteness take a root param so
the negative tests can point the real guards at a fixture dir.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Per-PR backstop for every carved skill, one test() per skill, driven by
CARVE_GUARDS staticInvariants. Generalizes + retires the ceo-specific
ordering test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Asserts filesystem carved set == CARVE_GUARDS set both directions, so a
future carve without a registry entry fails CI.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Temp fixture broken 3 ways proves E1/E2 actually throw, via the injectable
root. Kills the silent-pass-guard failure class.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
One file iterating CARVE_GUARDS, one test() per skill with GSTACK_CARVE_SKILL
cost-scoping (D-CODEX A). external carves (ship, plan-ceo) keep bespoke
tests; testNames aligned to their touchfile keys. Registered in touchfiles.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Steps 2-9 (per-file audit, auto-updates, risky-change asks, CHANGELOG
voice polish, cross-doc consistency, TODOS cleanup, VERSION bump, commit +
PR body) move to sections/release-body.md, read on demand after the Step
1.5 coverage map. Skeleton 59,256 -> 45,797 B (-23%); union preserved.
Adds the CARVE_GUARDS entry (auto-extends parity + size-budget via EQ1).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Phases 3-6 (complete proposal, drill-downs, design preview, writing
DESIGN.md) move to sections/proposal-and-preview.md, read on demand after
product context + research. Skeleton 80,719 -> 59,229 B (-27%); union
preserved. Adds the CARVE_GUARDS entry.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Scope-dependent audit Phases 2-11 move to sections/audit-phases.md. Mode
dispatch (## Arguments, ## Mode Resolution), always-run Phases 0/1, and the
Phase 12 false-positive-filtering exceptions stay ALWAYS-LOADED in the
skeleton. Skeleton 79,383 -> 65,117 B (-18%); union preserved.

Adds a cso CARVE_GUARDS entry with an earliest-use invariant (mustPrecedeStop):
mode dispatch must appear before any STOP-Read, so a directive that decides
which sections to read can't be stranded behind the STOP that reads them
(codex outside-voice #6). carve-guard-checks gains the mustPrecedeStop check.
parity moves cso monolith -> generated carved entry. cso-preserved.test.ts
strengthened: phrases checked against the union, plus an always-loaded
contract on the skeleton (dispatch + FP-filtering, codex #5).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…lease carves

The cso carve moved Secrets Archaeology (prefixes, lib/redact-patterns.ts
pointer, git-history scan) into sections/audit-phases.md, and the
document-release carve moved the Step 9 PR-body redaction scan into
sections/release-body.md. Three content-presence tests asserted that content
in the skeleton SKILL.md/.md.tmpl; they now read the skeleton+sections union
(same fix as cso-preserved + parity).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- cso section: add a scope-gate header so '--owasp' (and other scoped modes)
  run only their selected phases, not every phase bundled in the section
  ('execute in full' no longer overrides Mode Resolution).
- carve-guard-checks: gateAfterStop now compares against the LAST STOP, not the
  first, so a gate stranded between two STOPs in a multi-STOP skeleton fails.
- TODOS: behavioral section-loading hermeticity (verifier matches global-install
  path, not the fixture) — pre-existing in auq-sdk-capture.ts, deferred.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@trunk-io

trunk-io Bot commented Jun 8, 2026

Copy link
Copy Markdown

Merging to main in this repository is managed by Trunk.

  • To merge this pull request, check the box to the left or comment /trunk merge below.

After your PR is submitted to the merge queue, this comment will be automatically updated with its status. If the PR fails, failure details will also be posted here

@github-actions

github-actions Bot commented Jun 8, 2026

Copy link
Copy Markdown

E2E Evals: ❌ FAIL

71/73 tests passed | $11.98 total cost | 12 parallel runners

Suite Result Status Cost
e2e-browse 7/7 $0.39
e2e-deploy 6/6 $1.33
e2e-design 4/4 $0.88
e2e-plan 8/8 $2.67
e2e-qa-workflow 3/3 $1.12
e2e-review 6/6 $1.55
e2e-workflow 4/4 $0.83
llm-judge 25/27 $0.54
e2e-plan 8/8 $2.67

12x ubicloud-standard-8 (Docker: pre-baked toolchain + deps) | wall clock ≈ slowest suite

Failures

  • ❌ document-release/SKILL.md workflow: unknown
  • ❌ document-release/SKILL.md workflow: unknown

@garrytan garrytan merged commit e722c5b into main Jun 8, 2026
24 checks passed
garrytan added a commit that referenced this pull request Jun 8, 2026
…gistry

main shipped a generalized carve-guard system (PR #1907) that is now the single
source of truth for carved-skill skeleton invariants. Register the PR-title rule
there instead of a standalone test: ship's mustStayInSkeleton asserts v$NEW_VERSION
+ the rewrite helper stay always-loaded, and mustMoveToSection asserts both the
create and update PR paths stay carved into pr-body.md (present in the union, out of
the skeleton). Delete the standalone ship-pr-title-version-always-loaded test it
replaces. The CI-workflow safety tripwire stays standalone (not a carve concern).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
garrytan added a commit that referenced this pull request Jun 8, 2026
…tle-sync backstop (#1909)

* fix(ship): restore always-loaded PR-title-version invariant to skeleton

The v1.54.0.0 carve moved the 'PR title MUST start with v$NEW_VERSION' rule
out of the always-loaded ship skeleton and entirely into the lazily-loaded
pr-body.md section. The agent only set the version prefix if it happened to
read that section before creating the PR, so PRs landed with bare titles.

Restore a one-line invariant (+ helper reference) to ship/SKILL.md.tmpl right
before the {{SECTION:pr-body}} pointer, mirroring the AUQ always-loaded
precedent. Full procedure stays sectioned. Regenerated all hosts.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(ship): guard PR-title-version rule + pull_request_target safety

Two free gate tests so a future carve or workflow refactor can't silently
regress:

- ship-pr-title-version-always-loaded: asserts the invariant lives in the
  always-loaded ship/SKILL.md skeleton (not only sections/), and that the
  skeleton+sections union keeps BOTH the create and the existing-PR update
  title paths. Modeled on test/auq-format-always-loaded.test.ts.
- pr-title-sync-workflow-safety: static tripwire that fails CI if
  pr-title-sync.yml checks out PR-head code or inlines an attacker-controlled
  ${{ github.event.pull_request.* }} field inside a run: block (the two
  pull_request_target footguns actionlint cannot catch).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(ci): pr-title-sync covers fork PRs via hardened pull_request_target

Under plain pull_request the GITHUB_TOKEN is read-only on fork PRs, so the
title-sync backstop could never edit a fork/agent PR title. Switch to
pull_request_target (write token in base context) and make it safe:

- Check out the base repo only (no ref:) — execute trusted infra, never
  fork-head code.
- All attacker-controlled PR fields (title, head repo, head sha) pass via
  env: and are referenced as shell-quoted "$VAR", never inlined into run:.
- Read the PR-head VERSION as data (raw media type) from the head repo at the
  head sha; guard the assignment under set -e.
- Same-repo read failure fails loudly; fork miss warns and skips (the backstop
  stays green without going silently optional).
- Never echo the raw fork title (Actions parses ::workflow-command:: from stdout).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(ship): expand binDir path in pr-body Linked Spec block

ship/sections/pr-body.md.tmpl:98-99 used ${ctx.paths.binDir}, but the
gen-skill-docs generator only resolves {{TOKEN}} syntax in .tmpl files — the
${...} JS-template-literal form is substituted only inside .ts resolver files.
So the token passed through literally into the generated pr-body.md, leaving the
agent with an unexpandable ${ctx.paths.binDir}/gstack-paths command in the
Linked Spec auto-detect block. Use the hardcoded helper path, consistent with
every other path reference in this section.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(test): fold ship PR-title skeleton guard into carve-guard registry

main shipped a generalized carve-guard system (PR #1907) that is now the single
source of truth for carved-skill skeleton invariants. Register the PR-title rule
there instead of a standalone test: ship's mustStayInSkeleton asserts v$NEW_VERSION
+ the rewrite helper stay always-loaded, and mustMoveToSection asserts both the
create and update PR paths stay carved into pr-body.md (present in the union, out of
the skeleton). Delete the standalone ship-pr-title-version-always-loaded test it
replaces. The CI-workflow safety tripwire stays standalone (not a carve concern).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v1.57.3.0)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant