v1.57.3.0 fix(ship): always-loaded PR-title-version rule + fork-PR title-sync backstop#1909
Merged
Conversation
The v1.54.0.0 carve moved the 'PR title MUST start with v$NEW_VERSION' rule
out of the always-loaded ship skeleton and entirely into the lazily-loaded
pr-body.md section. The agent only set the version prefix if it happened to
read that section before creating the PR, so PRs landed with bare titles.
Restore a one-line invariant (+ helper reference) to ship/SKILL.md.tmpl right
before the {{SECTION:pr-body}} pointer, mirroring the AUQ always-loaded
precedent. Full procedure stays sectioned. Regenerated all hosts.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Two free gate tests so a future carve or workflow refactor can't silently
regress:
- ship-pr-title-version-always-loaded: asserts the invariant lives in the
always-loaded ship/SKILL.md skeleton (not only sections/), and that the
skeleton+sections union keeps BOTH the create and the existing-PR update
title paths. Modeled on test/auq-format-always-loaded.test.ts.
- pr-title-sync-workflow-safety: static tripwire that fails CI if
pr-title-sync.yml checks out PR-head code or inlines an attacker-controlled
${{ github.event.pull_request.* }} field inside a run: block (the two
pull_request_target footguns actionlint cannot catch).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Under plain pull_request the GITHUB_TOKEN is read-only on fork PRs, so the title-sync backstop could never edit a fork/agent PR title. Switch to pull_request_target (write token in base context) and make it safe: - Check out the base repo only (no ref:) — execute trusted infra, never fork-head code. - All attacker-controlled PR fields (title, head repo, head sha) pass via env: and are referenced as shell-quoted "$VAR", never inlined into run:. - Read the PR-head VERSION as data (raw media type) from the head repo at the head sha; guard the assignment under set -e. - Same-repo read failure fails loudly; fork miss warns and skips (the backstop stays green without going silently optional). - Never echo the raw fork title (Actions parses ::workflow-command:: from stdout). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
ship/sections/pr-body.md.tmpl:98-99 used ${ctx.paths.binDir}, but the
gen-skill-docs generator only resolves {{TOKEN}} syntax in .tmpl files — the
${...} JS-template-literal form is substituted only inside .ts resolver files.
So the token passed through literally into the generated pr-body.md, leaving the
agent with an unexpandable ${ctx.paths.binDir}/gstack-paths command in the
Linked Spec auto-detect block. Use the hardcoded helper path, consistent with
every other path reference in this section.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…gistry main shipped a generalized carve-guard system (PR #1907) that is now the single source of truth for carved-skill skeleton invariants. Register the PR-title rule there instead of a standalone test: ship's mustStayInSkeleton asserts v$NEW_VERSION + the rewrite helper stay always-loaded, and mustMoveToSection asserts both the create and update PR paths stay carved into pr-body.md (present in the union, out of the skeleton). Delete the standalone ship-pr-title-version-always-loaded test it replaces. The CI-workflow safety tripwire stays standalone (not a carve concern). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Merging to
After your PR is submitted to the merge queue, this comment will be automatically updated with its status. If the PR fails, failure details will also be posted here |
E2E Evals: ✅ PASS9/9 tests passed | $.94 total cost | 12 parallel runners
12x ubicloud-standard-8 (Docker: pre-baked toolchain + deps) | wall clock ≈ slowest suite |
…sion-fix # Conflicts: # CHANGELOG.md # VERSION # package.json
garrytan
added a commit
that referenced
this pull request
Jun 8, 2026
…rule) #1909 added the always-loaded PR/MR-title-version invariant to the ship skill and regenerated ship/SKILL.md, but did not regenerate the golden fixtures, so the golden-file regression test was red on main. Refresh the three ship baselines (claude/codex/factory) from the current generated output. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
3 tasks
garrytan
added a commit
that referenced
this pull request
Jun 9, 2026
…ions (#1916) * fix(plan-devex-review): add missing gstack-review-log step plan-devex-review carried the EXIT PLAN MODE GATE but never wrote a review-log entry, so the gate's 'review log was called' check was structurally unsatisfiable and the Review Readiness Dashboard / GSTACK REVIEW REPORT had no plan-devex-review data to read. Add a Review Log section before the dashboard read, logging the devex fields the report parser already expects (status, scores, product_type, tthw, persona, competitive_tier, unresolved, commit). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(review): make unresolved-decisions status mandatory in GSTACK REVIEW REPORT The report's UNRESOLVED line was optional ('omit if empty') and the EXIT PLAN MODE GATE only checked it 'if applicable', so a plan could ship with no statement about open decisions at all — a missed ambiguity read identically to a clean plan. Now every report ends with a mandatory unresolved-decisions status as its final line: either the exact unbolded sentinel 'NO UNRESOLVED DECISIONS', or a '**UNRESOLVED DECISIONS:**' block of bullets. The gate blocks ExitPlanMode unless that final line is present. generatePlanFileReviewReport: current-review items are listed from context; prior reviews contribute an aggregate count computed as latest-fresh-row- per-skill minus the current run (no double-count, dashboard 7-day window). generateExitPlanModeGate: check #3 is now blocking with no 'if applicable' escape; bolded sentinel does not satisfy it. Tests: static guard in gen-skill-docs.test.ts asserts the mandatory status across all six report consumers and the gate across gate-bearing skills; skill-e2e-plan.test.ts asserts the written report's final line is the status (and fixes a stale 'four review rows' -> five-row prompt). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * refactor(review): compress unresolved-status prose to fit parity budget After merging origin/main (v1.57.3.0), plan-devex-review exceeded the 1.05x parity ratio vs the v1.53.0.0 baseline. Rather than rebase the baseline, compressed the new prose to stay under the cap honestly: the report's unresolved-status block (~32 -> ~9 lines) and the EXIT PLAN MODE GATE's final-line check (~7 -> ~5 lines), plus the plan-devex-review review-log step. All load-bearing rules and the exact gate-checkable tokens are preserved; the static guards in gen-skill-docs.test.ts still pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test: regenerate stale ship golden fixtures (#1909 follow-up) #1909 (v1.57.3.0) added the always-loaded PR-title-version rule to ship's template and committed the regenerated ship/SKILL.md, but did not refresh the three ship golden fixtures, leaving the golden-file regression test red on main. Regenerate them from current output. The diff is purely #1909 content: the PR-title invariant line plus a previously-unresolved ${ctx.paths.binDir} placeholder that current generation correctly resolves. No feature content from this branch leaks into ship (ship does not consume the review report resolvers). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(plan-devex-review): restore TIMESTAMP fill instruction in review-log Adversarial review caught that compressing the devex review-log block dropped the TIMESTAMP substitution guidance the three sibling plan-review skills carry. A literal "timestamp":"TIMESTAMP" parses as JSON but is an unparseable date, so the Review Readiness Dashboard's 7-day freshness window silently drops the plan-devex-review row (and the report's prior-review aggregation loses it). Restore the one-line instruction. Also: the plan-review-report E2E now derives its last-line check from the report slice, not the whole file, so a mis-placed report surfaces the real trailing content in the failure message. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(parity): rebase parity baseline v1.53.0.0 -> v1.57.7.0 The v1.53 anchor is four minor versions stale. v1.54-v1.57 (ship/plan carving, carve-guards, AUQ prose fallback, the cross-session decision-log preamble) plus this branch's mandatory unresolved-decisions status line pushed the three plan-review skills past the 5% ratchet even after exhaustive compression. The new baseline captures current UNION sizes (skeleton + sections/*.md, matching what parity-harness measures) so the per-skill 1.05 ratio keeps catching future bloat. The frozen v1.44.1 integrity anchor and the v1.47 size-budget baseline are untouched. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v1.57.7.0) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Restores a regression and hardens the title-sync backstop. The v1.54.0.0 carve moved the "PR title must start with
v$NEW_VERSION" rule out of /ship's always-loaded skeleton into the lazily-loadedpr-body.mdsection, so the agent only applied it when it happened to read that section — recent open PRs landed with bare titles.Ship skeleton (agent-side fix)
fix(ship): restore always-loaded PR-title-version invariant to skeleton— adds a one-line invariant + helper reference toship/SKILL.md.tmplright before the{{SECTION:pr-body}}pointer (mirrors the AUQ always-loaded precedent); full procedure stays sectioned. Regenerated all hosts.fix(ship): expand binDir path in pr-body Linked Spec block—${ctx.paths.binDir}is only substituted in.tsresolvers, not.tmplfiles, so it shipped literally into the generatedpr-body.mdLinked Spec block. Now uses the hardcoded helper path.CI backstop (fork coverage)
fix(ci): pr-title-sync covers fork PRs via hardened pull_request_target— plainpull_requestgives a read-only token on fork PRs, so the title-sync workflow could never edit a fork/agent PR. Switched topull_request_target(write token in base context) and made it safe: base-repo checkout only (no PR-head code execution), all PR fields viaenv:(never inlined intorun:), VERSION read as data from the PR head, same-repo-loud / fork-soft on failure, no raw-title echo.Guards (keep it from drifting again)
test(ship): guard PR-title-version rule + pull_request_target safety— addstest/pr-title-sync-workflow-safety.test.ts, a static tripwire that fails CI if the workflow checks out PR-head code or inlines an attacker-controlled${{ github.event.pull_request.* }}field into arun:block (the injection class actionlint can't catch).refactor(test): fold ship PR-title skeleton guard into carve-guard registry— main shipped a generalized carve-guard system (v1.57.0.0 feat: carve-guard system + carve cso/document-release/design-consultation #1907) that is now the single source of truth for carved-skill skeleton invariants. Registered the rule there (mustStayInSkeletonassertsv$NEW_VERSION+ the helper stay always-loaded;mustMoveToSectionasserts both the create and update PR paths stay carved into the section) and deleted the now-redundant standalone test.(The merge commit pulls in main's v1.57.0.0 carve-guard system; it is not part of this PR's substantive change.)
Test Coverage
All new behavior is covered by deterministic free tests:
test/pr-title-sync-workflow-safety.test.ts(new) — 5 assertions: no PR-head checkout, no${{ pull_request.* }}inrun:, usespull_request_target, title viaenv:.test/helpers/carve-guards.tsship entry —mustStayInSkeleton+mustMoveToSectionnow assert the PR-title invariant via the carve-guard ordering test (test/carve-section-ordering.test.ts).test/pr-title-rewrite.test.ts(9 cases).The only inherent gap is
pull_request_targetruntime behavior, which is not locally unit-testable; covered byactionlint+ the static safety tripwire + a live fork-PR smoke after merge.Pre-Landing Review
/plan-eng-review): CLEAR — 5 findings, 0 critical gaps, all folded in (including a prior-learning catch that the original commit split would fail CIcheck-freshness; corrected to atomic tmpl+regen)./codex review, outside voice): PASS (no P1) — 8 findings, all folded in. Codex caught 4 things eng-review missed:gen:skill-docsneeds--host all, the workflow needs a mechanical injection guard, therefs/pull/N/headread was an unsafe assumption (now uses head-repo metadata), and a path-token issue. One Codex finding (browse skill: default to sonnet to save tokens #8, use${ctx.paths.binDir}in the skeleton) was REVERSED after verifying that token doesn't substitute in.tmplfiles.Eval Results
Diff-selected suite (
bun run test:evals), full tier: routing 10/10 PASS, $2.39, 563s (includesjourney-ship). Free gate (bun test) green.Two
ios-qaagent-flow simulation tests failed under full-suite concurrency (tailnet allowlist gate,capability-tier enforcement). Proven unrelated to this branch (blame protocol):test/skill-e2e-ios.test.tsonorigin/mainin isolation: 7 pass, 0 fail.test:evalsload with(attempt 3)retry markers — timing/resource flakiness, not a deterministic regression. Pre-existing test-infra flakiness, not introduced here.Plan Completion
Plan implemented in full (skeleton invariant, two guard tests, CI hardening, binDir fix, carve-guard integration). All eng + codex findings folded in. 0 unresolved.
Test plan
bun test(free gate) — passedbun testtargeted: carve-guard 20/0, workflow-safety 8/0, 786 skill-doc testsbun run test:evalsrouting 10/10; ios-qa flakiness proven pre-existingactionlint .github/workflows/pr-title-sync.yml— cleangstack-pr-title-rewrite.sh3-case sanity🤖 Generated with Claude Code