docs(workflow): STACKED_PRS.md + session cost log + 2026-04-18 post-mortem#112
Conversation
Tier 3.1 + 3.3 alignment from the 2026-04-18 autonomous session. docs/STACKED_PRS.md (new) - Strategic-view companion to the existing GIT-WORKFLOW.md Graphite tutorial. Covers WHEN to stack, manual retarget fallback when gt isn't available, and the concrete 2026-04-18 incident post-mortem (PRs #64/#65 on mojwang.tech merged into feature branches instead of main because auto-delete-head-branches was off) - Documents repo hygiene defaults: - GitHub "Automatically delete head branches" ON - Vercel-Neon "Delete Neon database branch when Vercel preview is deleted" ON - Cross-referenced from GIT-WORKFLOW.md's Graphite section docs/CLAUDE_AGENTS.md — Session cost log format - Adds the `scripts/.session-cost.log` append-only format to the Cost Awareness section. Fields: date | topic | dispatches | models | outcome - Captures directional cost awareness for weekly-review pattern detection (research-heavy vs implementation, stacks-needing- Graphite-threshold, etc.) - Cross-references STACKED_PRS.md for the "manual stacking indicates missing tooling" pattern
There was a problem hiding this comment.
Pull request overview
Adds strategic documentation to complement the existing Graphite tutorial, capturing stacked-PR workflow guidance and recent incident learnings, and introduces a lightweight session cost log format for agentic workflow review.
Changes:
- Add
docs/STACKED_PRS.mdcovering when/why to stack PRs, manual fallback retargeting, repo hygiene defaults, and a 2026-04-18 incident post‑mortem. - Cross-reference the new stacked-PR doc from
docs/GIT-WORKFLOW.md. - Document an append-only session cost log line format in
docs/CLAUDE_AGENTS.md.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| docs/STACKED_PRS.md | New strategic doc for stacked PRs, manual fallback guidance, and incident post‑mortem. |
| docs/GIT-WORKFLOW.md | Adds a “See also” link pointing readers to the new strategic stacked-PR doc. |
| docs/CLAUDE_AGENTS.md | Defines a session cost log line format and links it to stacked-PR/tooling gap patterns. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| gt auth | ||
| ``` | ||
|
|
||
| Bootstrap script (`setup-macos.sh`) adds this alongside `gh`, `delta`, `fzf`, etc. |
There was a problem hiding this comment.
setup-macos.sh doesn’t appear to be responsible for installing Homebrew packages (including Graphite); Graphite is installed from homebrew/Brewfile via scripts/install-packages.sh. Consider updating this line to point at the Brewfile/install-packages step (or setup.sh) so the install source is accurate.
| Bootstrap script (`setup-macos.sh`) adds this alongside `gh`, `delta`, `fzf`, etc. | |
| In this repo, Graphite is installed alongside `gh`, `delta`, `fzf`, and other Homebrew packages from `homebrew/Brewfile` via `scripts/install-packages.sh`. |
| If you're mid-stack without `gt`, **always retarget each downstream PR's base to `main` before the parent merges**. GitHub doesn't do this automatically unless "Auto-delete head branches" is on AND the parent's head gets deleted via the merge. | ||
|
|
||
| ```bash | ||
| gh pr edit <child-pr-number> --base main | ||
| ``` |
There was a problem hiding this comment.
The manual retarget guidance here is likely backwards: retargeting the child PR to main before the parent merges removes the stacked-diff benefit (the child diff will include the parent). The usual mitigation for the failure mode described is to retarget (or delete the parent branch to trigger GitHub retargeting) immediately after the parent merges, and before merging the child PR.
| # Submit the whole stack as coordinated PRs | ||
| gt stack submit |
There was a problem hiding this comment.
This doc uses gt branch create / gt stack submit, but docs/GIT-WORKFLOW.md’s Graphite section uses gt create and gt submit --stack (and calls out gt ss). To avoid confusing readers, it’d help to align on one command style or explicitly mention the equivalent commands/aliases.
| At session end, append a single line to `scripts/.session-cost.log` (append-only, gitignored, reviewed at weekly review): | ||
|
|
||
| ``` | ||
| YYYY-MM-DD | <session_topic> | <dispatches> | <models_used> | <outcome> | ||
| ``` |
There was a problem hiding this comment.
This says scripts/.session-cost.log is gitignored, but there’s no ignore entry for it in the repo’s .gitignore right now. Either add scripts/.session-cost.log to .gitignore (preferred if you want it append-only/local), or adjust the wording so it doesn’t imply it’s already ignored.
Roadmap P0.2 from the 2026-04-19 overnight research. Unblocks P0.3 (/grade-session retroactive grading) and P5.1 (meta-agent) — both need a real log with real rows. PR #112 already documented a 5-field cost-log format but never shipped a writer. This PR ships the writer and extends the format to 6 fields: adds `agent_sha` so later analysis can correlate outcomes with specific agent-prompt versions (P5.1 requirement). Format (pipe-separated): YYYY-MM-DDTHH:MM:SS | topic | dispatches | models | outcome | agent_sha Outcome enum: shipped | partial | reverted | blocked | plan-only | (empty) Empty = grade later via /grade-session (P0.3). Design decisions: - Slash command + shell script, NOT a Stop hook. Claude Code's Stop fires every turn, not session end, so a hook-driven autolog would misfire on partial turns. One explicit /log-session per session is cleaner and gives the user control over the topic/dispatches/models summary (fields no hook could auto-capture). - Timestamp instead of just date — multiple sessions per day are common. - Auto-captured fields: timestamp (`date`), agent_sha (`git log` on .claude/agents/). - --outcome sentinel: `--outcome ""` explicitly marks ungraded, skipping the prompt; missing --outcome prompts interactively. - Pipe chars in user fields get sanitized to `;` so a malformed topic doesn't break column parsing. - Log file is gitignored (personal state, not shared artifact). Files: - scripts/log-session.sh — new, executable, shellcheck clean - .claude/commands/log-session.md — new slash command wrapper - docs/CLAUDE_AGENTS.md — updated format doc (6 fields, broader enum) - .gitignore — adds scripts/.session-cost.log Verified: - shellcheck clean - Happy path: `--topic … --outcome shipped` writes a row - Empty outcome: `--outcome ""` writes a row with blank 5th field - Invalid outcome: `--outcome bogus` exits 2 with clear error - Pipe sanitization: topic containing `|` rewrites as `;` - Gitignored: log file doesn't appear in git status
## Context Roadmap **P0.2** from the [2026-04-19 overnight research implementation roadmap](https://github.com/mojwang/mojwang.tech/blob/main/../../../../../ai/workspace/claude/resources/agentic-research/2026-04-19-implementation-roadmap.md). Unblocks **P0.3** (\`/grade-session\` retroactive grading) and **P5.1** (meta-agent) — both need a real log with real rows. PR #112 shipped documentation for a 5-field cost-log format but never shipped a writer. This PR ships the writer and extends the format to 6 fields: adds \`agent_sha\` so later analysis can correlate outcomes with specific agent-prompt versions (P5.1 requirement). ## Format (pipe-separated) \`\`\` YYYY-MM-DDTHH:MM:SS | topic | dispatches | models | outcome | agent_sha \`\`\` Outcome enum: \`shipped | partial | reverted | blocked | plan-only | (empty)\`. Empty = "grade later," filled retroactively by \`/grade-session\` (forthcoming in P0.3). ## Design decisions - **Slash command + shell script, NOT a Stop hook.** Claude Code's \`Stop\` hook fires every turn, not session end, so a hook-driven autolog would misfire on partial turns. One explicit \`/log-session\` per session is cleaner and gives the user control over topic/dispatches/models (fields no hook could auto-capture). - **Timestamp instead of just date** — multiple sessions per day are common. - **Auto-captured fields:** timestamp (\`date\`), agent_sha (\`git log\` on \`.claude/agents/\`). - **\`--outcome\` sentinel:** \`--outcome \"\"\` explicitly marks ungraded, skipping the prompt; missing \`--outcome\` prompts interactively. - **Pipe-char sanitization:** user fields replace \`|\` with \`;\` so malformed input doesn't break column parsing. - **Log file gitignored** — personal state, not a shared artifact. ## Files - \`scripts/log-session.sh\` — new, executable, \`shellcheck\` clean, follows \`#!/usr/bin/env bash\` + \`set -euo pipefail\` convention - \`.claude/commands/log-session.md\` — new slash-command wrapper - \`docs/CLAUDE_AGENTS.md\` — updated format doc (6 fields, broader outcome enum, usage example) - \`.gitignore\` — adds \`scripts/.session-cost.log\` ## Usage \`\`\` /log-session --topic \"P0.2 session-logger infra\" --dispatches \"—\" --models \"sonnet\" --outcome \"shipped\" \`\`\` Missing flags prompt interactively. ## Verification - [x] \`shellcheck scripts/log-session.sh\` — clean - [x] Happy path: \`--topic … --outcome shipped\` writes a row - [x] Empty outcome: \`--outcome \"\"\` writes a row with blank 5th field - [x] Invalid outcome: \`--outcome bogus\` exits 2 with clear error - [x] Pipe sanitization: topic containing \`|\` is rewritten as \`;\` - [x] Gitignored: log file doesn't appear in \`git status\` ## Out of scope (deferred to P0.3) - \`/grade-session\` retroactive grading (walks ungraded rows) - \`weekly-review\` command update to surface ungraded sessions - SessionStart hook nudge (\"N ungraded sessions\") - Auto-capture of dispatches + models via transcript parsing (Claude Code doesn't expose transcripts to hooks)
Tier 3.1 + 3.3 from the next-session plan. Flows today's workflow learnings back into the canonical agentic infra.
Summary
`docs/STACKED_PRS.md` (new)
`docs/CLAUDE_AGENTS.md` — adds session-cost-log format
Test plan
Pre-existing findings (no changes this PR)