feat(plan-rollout): /plan-rollout MVP — decomposition-as-artifact#1424

Open
mastermanas805 wants to merge 12 commits into garrytan:main from mastermanas805:feat/plan-rollout-mvp

mastermanas805 commented May 11, 2026

Third pass at #1192. Foundation-only doc-PR (#1417) was correctly closed as "literature without a consumer." This PR lands the consumer.

Summary

  • New skill /plan-rollout (plan-rollout/SKILL.md.tmpl, 179 lines). Reads the working diff (committed + staged + unstaged + untracked) plus SYSTEM.md if present, writes decomposition.md with per-slice file lists, reader-time estimates, dependency edges, and reconciliation flags.
  • Optional schema spec at docs/SYSTEM_MD.md (129 lines). Optional input — the skill falls back to path heuristics + import-graph discovery when absent.
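
The "committed + staged + unstaged + untracked" input can be gathered with the two git commands named in the skill's commit message: `git diff <base>` (rather than `<base>...HEAD`) and `git ls-files --others --exclude-standard`. The sketch below is only an illustration of how their outputs might be merged. The function names are hypothetical, and `--name-only` is added here as an assumption so that `git diff` lists paths instead of a patch.

```typescript
// Hypothetical helpers; names are illustrative, not taken from the skill template.

/** Git invocations that together cover committed, staged, unstaged, and untracked changes. */
function workingDiffCommands(base: string): string[][] {
  return [
    // Working tree vs. base: picks up branch commits plus staged and unstaged edits.
    ["git", "diff", "--name-only", base],
    // Untracked files never appear in `git diff`, so they are listed separately.
    ["git", "ls-files", "--others", "--exclude-standard"],
  ];
}

/** Merge the two command outputs into one sorted, de-duplicated path list. */
function mergeChangedFiles(diffOutput: string, untrackedOutput: string): string[] {
  const paths = (diffOutput + "\n" + untrackedOutput)
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
  // Keep only the first occurrence of each path, then sort.
  return paths.filter((p, i) => paths.indexOf(p) === i).sort();
}

const files = mergeChangedFiles(
  "plan-rollout/SKILL.md.tmpl\ndocs/skills.md\n",
  "docs/SYSTEM_MD.md\n",
);
console.log(files);
```

Keeping the command list separate from the merge step means the merge can be exercised on canned output without a repository.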

Size discipline

After the compression pass (latest 2 commits):

| PR | Substantive lines | Raw diff |
|---|---|---|
| #1192 (rejected) | 2164 | 2164 |
| #1424 (this PR) | 310 | 1206 |
| Reduction | −86% (7x smaller) | −44% |

The 896-line gap between substantive and raw is the generated plan-rollout/SKILL.md: deterministic output of the template, which reviewers can skim.
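
The reduction figures and the 896-line gap follow directly from the four line counts. As a quick check (variable and function names are illustrative only):

```typescript
// Line counts copied from the size comparison above.
const rejected = { substantive: 2164, raw: 2164 }; // #1192
const current = { substantive: 310, raw: 1206 }; // #1424

/** Percentage change from `before` to `after`, rounded to the nearest integer. */
function percentChange(before: number, after: number): number {
  return Math.round(((after - before) / before) * 100);
}

const substantiveChange = percentChange(rejected.substantive, current.substantive);
const rawChange = percentChange(rejected.raw, current.raw);
const generatedGap = current.raw - current.substantive;
const shrinkFactor = Math.round(rejected.substantive / current.substantive);

console.log(substantiveChange, rawChange, generatedGap, shrinkFactor);
// prints: -86 -44 896 7
```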

Per OSS PR-size research (SmartBear/Cisco, Google internal): review effectiveness drops sharply beyond 400 changed lines. Substantive content here (310 lines) is well inside the healthy band.

Dogfood: PR #1241, "is this one PR or three?"

Manually walked /plan-rollout's logic against garrytan/gstack#1241 (fix(ask-user): keep question payloads compact, 41 files, +661 / −282).

Verdict the skill should emit: one PR. 39 of 41 files are deterministic regenerations of one source change in scripts/resolvers/preamble/generate-ask-user-format.ts. Not independently shippable — splitting them off leaves dependent fragments. Reader time: ~21 min, under the 30-min cap.

Bucketing (no SYSTEM.md): */SKILL.md regenerations (36 files, +576/−252) · scripts/resolvers/preamble/ (1 file, the fix, +16/−7) · test/fixtures/golden/ regenerations (3, +54/−27) · test/ tests (2, +31/−3).
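
A minimal sketch of how path-heuristic bucketing like the above could work when no SYSTEM.md is present. The rule patterns, the first-match-wins order, and the example paths marked hypothetical are assumptions made for illustration; the skill's actual heuristics may differ.

```typescript
// Hypothetical bucketing rules modeled on the four buckets listed above.
interface BucketRule {
  name: string;
  pattern: RegExp;
}

const rules: BucketRule[] = [
  { name: "SKILL.md regenerations", pattern: /(^|\/)SKILL\.md$/ },
  { name: "scripts/resolvers/preamble/", pattern: /^scripts\/resolvers\/preamble\// },
  // The more specific test path comes first so golden fixtures are not counted as tests.
  { name: "test/fixtures/golden/ regenerations", pattern: /^test\/fixtures\/golden\// },
  { name: "test/ tests", pattern: /^test\// },
];

/** Group paths by the first rule that matches; unmatched paths go to "other". */
function bucketFiles(paths: string[]): Record<string, string[]> {
  const buckets: Record<string, string[]> = {};
  for (const path of paths) {
    let key = "other";
    for (const rule of rules) {
      if (rule.pattern.test(path)) {
        key = rule.name;
        break;
      }
    }
    (buckets[key] = buckets[key] || []).push(path);
  }
  return buckets;
}

const buckets = bucketFiles([
  "ship/SKILL.md", // hypothetical path
  "scripts/resolvers/preamble/generate-ask-user-format.ts",
  "test/fixtures/golden/example.md", // hypothetical path
  "test/example.test.ts", // hypothetical path
]);
console.log(buckets);
```

Rule order matters here: with `test/` listed before `test/fixtures/golden/`, the golden regenerations would be miscounted as hand-written tests.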

v1.1 backlog surfaced by this dogfood (none required for v0):

  1. Regeneration detector. Path heuristics can't tell that */SKILL.md is the mechanical output of */SKILL.md.tmpl. Without the detector, a naive operator could still ship "source + regenerations" as two PRs.
  2. Regen-multiplier on reader-time. Lines in regenerated files should count ~0.1x (skim speed), not 1x.
  3. --explain mode. When the verdict is "one PR," print the rejected slicing alternatives and the signal that rejected each.
  4. Calibration loop. Log predicted vs actual reader-time on the first ~10 real invocations. Ground v2 in data, not guesses.
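
Backlog items 1 and 2 could be sketched roughly as follows. This is not the planned implementation: the sibling-template check, all names, and the minutes-per-line rate are assumptions. Only the ~0.1x multiplier comes from the list above.

```typescript
// Sketch of backlog items 1 and 2; every name and the reading rate are hypothetical.
interface ChangedFile {
  path: string;
  changedLines: number; // added + removed
}

/** Treat X/SKILL.md as generated only when its template X/SKILL.md.tmpl exists in the repo. */
function isRegenerated(path: string, repoPaths: string[]): boolean {
  if (!/(^|\/)SKILL\.md$/.test(path)) return false;
  return repoPaths.indexOf(path + ".tmpl") !== -1;
}

/** Reader-time estimate in minutes; regenerated lines are weighted by `regenMultiplier`. */
function readerMinutes(
  files: ChangedFile[],
  repoPaths: string[],
  minutesPerLine: number,
  regenMultiplier = 0.1,
): number {
  let weightedLines = 0;
  for (const file of files) {
    const weight = isRegenerated(file.path, repoPaths) ? regenMultiplier : 1;
    weightedLines += file.changedLines * weight;
  }
  return weightedLines * minutesPerLine;
}

const repoPaths = ["ship/SKILL.md", "ship/SKILL.md.tmpl", "scripts/gen.ts"]; // hypothetical repo
const changed: ChangedFile[] = [
  { path: "ship/SKILL.md", changedLines: 100 },
  { path: "scripts/gen.ts", changedLines: 20 },
];
const naive = readerMinutes(changed, repoPaths, 0.05, 1); // every line counted at 1x
const weighted = readerMinutes(changed, repoPaths, 0.05); // regenerated lines at 0.1x
console.log({ naive, weighted });
```

In this made-up example the estimate is driven by 30 weighted lines instead of 120, which is the over-counting the multiplier is meant to remove.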

Honest limit surfaced: SYSTEM.md is not the right primitive for build-output coupling — that's a Makefile-style dependency, not a contract graph. The schema doc says so.

Self-dogfood

Ran /plan-rollout against this branch. Output: a 50-line "one PR" verdict written to ~/.gstack/projects/. The skill correctly did not manufacture a multi-slice stack for a tightly-coupled set of files.

Explicit MVP boundaries

Out of scope (deferred — will only land with consumers):

  • rollout.md (rollout/rollback strategy + inverse-rollback auto-gen)
  • spill-check skill (in-progress diff vs declared slice)
  • /ship and /review integrations
  • SYSTEM.md scaffolder

Reverting v0 is `git rm -r plan-rollout/ docs/SYSTEM_MD.md`. Schema and registry entries revert independently if either piece doesn't fit.

Relationship to other plan-* skills

| Skill | When | Mode | Output |
|---|---|---|---|
| /plan-ceo-review / /plan-eng-review | Pre-code | Conversation | Decisions on plan |
| /plan-rollout (this PR) | Post-code | Analysis | decomposition.md artifact |

/plan-rollout complements the plan-* family without duplicating them — they review the plan, this analyzes the diff.

Test plan

  • bun test test/skill-validation.test.ts test/gen-skill-docs.test.ts — 704/704 pass
  • bun run gen:skill-docs --host all generates clean for claude + 7 external hosts
  • Dogfood: manually walked the skill against #1241 (fix(ask-user): keep question payloads compact); see above
  • Self-dogfood: ran /plan-rollout against this branch, verdict correctly "one PR"
  • Maintainer eyeball on skill voice + scope
  • If accepted: follow-up PR for the four v1.1 items above

Pre-existing test failures in test/*gbrain-sync* are present on main (verified by stashing the change and re-running on main).

Commits (bisectable)

12 commits in four logical waves:

  1. Build (4): schema, skill, registry, dogfood report
  2. Anti-pattern fixes (3): drop external skill cross-references, SCREAMING_SNAKE rename, V0 design-doc shape
  3. Compression (3): size discipline pass — SYSTEM_MD.md, design doc, SKILL.md.tmpl
  4. Final cuts (2): drop V0 design doc (moved here), further SKILL.md.tmpl compression

Refs #1192. Supersedes #1417.

🤖 Generated with Claude Code

mastermanas805 and others added 7 commits May 11, 2026 09:58
The semantic-contract-graph schema. Optional input to /plan-rollout —
declares role-level contracts (auth mints session tokens that middleware
enforces; breaks-if: format change without coordinated deploy). Distinct
from the import graph (discovered at runtime). Repo-wide, long-lived,
hand-authored.

This commit lands the spec only. The consuming skill ships in the next
commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Decomposition-as-artifact. Reads the working diff (committed + staged +
unstaged + untracked) plus SYSTEM.md if present, writes decomposition.md
with per-slice file lists, reader-time estimates, dependency edges, and
contract-graph reconciliation flags.

Positioned as the post-decision consumer to /plan-pull-request:
- /plan-pull-request decides shape in conversation (pre-code).
- /plan-rollout analyzes a real diff and writes the artifact (post-code).

Triggers narrowed to "decompose the diff" / "write a decomposition" /
"plan-rollout" to avoid collision with /plan-pull-request's pre-decision
triggers.

MVP boundaries (explicit):
- No rollout.md, no /spill-check, no /ship or /review integrations.
- No SYSTEM.md scaffolder — humans write the schema by hand or copy the
  example.
- Reconciliation is informational, never blocking.
- Step 2 explicitly handles uncommitted working-tree state via
  `git diff <base>` (not `<base>...HEAD`) plus
  `git ls-files --others --exclude-standard` for untracked.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…s.md

Add the one-line registry entries the skill-validation tests
("every skill is documented") expect. Positions /plan-rollout in the
plan-mode review group (alongside /plan-eng-review, /plan-tune) with
its specialist label "Decomposition Analyst" in docs/skills.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Worked example: manually walked /plan-rollout against
garrytan#1241 (fix(ask-user): keep question payloads compact)
— 41 files, +661 / -282, regeneration-heavy.

Verdict the skill should emit: "one PR." 39 of 41 files are
deterministic regenerations of one source change in
scripts/resolvers/preamble/generate-ask-user-format.ts; slicing them
off would leave dependent fragments. Reader time: ~21 min (under cap).

Findings (v1.1 todos) surfaced by the dogfood:
1. Deterministic-regeneration detector — merge build-output slices
   into their source slice automatically.
2. Regen-multiplier on reader-time so skim-only output isn't
   over-counted.
3. --explain mode — when verdict is "one PR," print the rejected
   slicing alternatives and the signals that rejected them.
4. Calibration loop — predicted vs actual reader-time on first ~10
   real invocations to ground v2 heuristics in data.
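
One possible shape for the calibration loop in item 4, as a sketch only. The sample record, the names, and the mean-ratio statistic are assumptions, and the numbers are made up.

```typescript
// Hypothetical calibration record: one entry per real /plan-rollout invocation.
interface ReaderTimeSample {
  predictedMinutes: number;
  actualMinutes: number;
}

/** Mean of actual/predicted; a value above 1 means the estimator under-predicts. */
function calibrationFactor(samples: ReaderTimeSample[]): number {
  // Skip samples with a non-positive prediction to avoid dividing by zero.
  const usable = samples.filter((s) => s.predictedMinutes > 0);
  if (usable.length === 0) return 1; // no data yet: leave estimates unchanged
  let sum = 0;
  for (const s of usable) sum += s.actualMinutes / s.predictedMinutes;
  return sum / usable.length;
}

const factor = calibrationFactor([
  { predictedMinutes: 20, actualMinutes: 30 }, // made-up numbers
  { predictedMinutes: 10, actualMinutes: 10 },
]);
console.log(factor); // prints: 1.25
```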

SYSTEM.md is explicitly called out as the WRONG primitive for catching
build-output coupling — that needs a Makefile-style dependency, not a
contract graph.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
/plan-pull-request is not a gstack-shipped skill; it's a separately
installed global skill. Referencing it as a sibling in our SKILL.md
and docs/skills.md created a dangling dependency from the maintainer's
perspective. The skill stands on its own: it reads a real diff and
writes decomposition.md. Whatever pre-decision workflow the user runs
beforehand is the user's setup, not this skill's documented contract.

- Stripped "Relationship to /plan-pull-request" section from
  plan-rollout/SKILL.md.tmpl
- Removed the "If invoked before code exists, point at /plan-pull-request"
  redirect — now a simple "nothing to decompose, write a slice first" exit
- Reworded the docs/skills.md table row to describe what the skill does
  on its own, no external pairing claims
- Tightened the description frontmatter accordingly

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Match the SCREAMING_SNAKE_CASE convention used by other topic docs in
docs/ (ADDING_A_HOST.md, OPENCLAW.md, REMOTE_BROWSER_ACCESS.md,
ON_THE_LOC_CONTROVERSY.md). The hyphenated form was carried over from
closed PR garrytan#1417 and matched no existing convention in this directory.

No content changes — pure rename.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Match the versioned design-doc convention in docs/designs/
(PLAN_TUNING_V0.md, PLAN_TUNING_V1.md, SELF_LEARNING_V0.md,
PACING_UPDATES_V0.md). The original PLAN_ROLLOUT_DOGFOOD.md filename
introduced a new "_DOGFOOD" suffix that didn't match any existing
pattern and read like an evidence appendix rather than a design doc.

Restructure:
- New "Design" section at the top describing what /plan-rollout is,
  what v0 ships, and what's deferred to v1.1+
- "Dogfood: PR garrytan#1241" section retains the worked example (file
  breakdown, reader-time estimate, verdict, findings)
- New "v1.1 roadmap" section consolidates the four follow-up todos

All original dogfood content preserved verbatim under its new section
heading.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mastermanas805 and others added 5 commits May 11, 2026 11:33
Trim ceremonial prose. Keep load-bearing content: intro, schema, field
reference (now a table), example, "how /plan-rollout uses it", out-of-scope.
Drop redundant "what it is / what it isn't" expansion, "relationship to
other declarative files" table, separate scaffolding section.

No semantic change to the schema or the skill's contract with it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drop "What v0 got right" (self-congratulation), "What v0 proves"
(redundant), and the long prose framing on each finding. Collapse
findings into one-sentence-each v1.1 backlog.

All concrete content preserved: problem, design, what ships, dogfood
table, verdict, four findings, limit-surfaced.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Tighten step prose. All 8 steps + self-check + limits preserved
semantically. Behavior unchanged — same bash commands, same priority
order in slice ranking, same verdict-first design.

Combined with the docs/ compressions, total substantive diff drops
701 → 414 lines (-41%).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… tree

Per OSS convention, design rationale and dogfood evidence live in the
PR description, not as checked-in cruft. The V0.md content has been
folded into PR garrytan#1424's description, where reviewers actually look. 81
lines off the reviewable diff.

If long-form design rationale is wanted in-repo later, it can land as
a follow-up — but only if a real consumer (a feature that depends on
the rationale) ships at the same time.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Apply Anthropic skill-authoring guidance from their public docs:
"every line is a recurring token cost; if a competent reader wouldn't
miss it, remove it." State what to do, drop the why/narration. Trust
that the reader is Claude.

Cuts:
- "What this skill does/doesn't do" prose framing (replaced by terse
  bullets)
- Per-step rationale paragraphs ("These edges are how you order slices
  because...") → kept the rule, dropped the explanation
- Repeated "no slicing" hedging across multiple sections → one source
  of truth in the When-to-invoke section

Behavior unchanged. Generated SKILL.md drops 1011 → 897 lines (~10%
fewer tokens at every skill invocation).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mastermanas805
Author

@garrytan I have recreated the PR with the changes requested in #1192 (comment).
