Skip to content

feat: entire review command#993

Open
peyton-alt wants to merge 22 commits intomainfrom
feat/entire-review
Open

feat: entire review command#993
peyton-alt wants to merge 22 commits intomainfrom
feat/entire-review

Conversation

@peyton-alt
Copy link
Copy Markdown
Contributor

@peyton-alt peyton-alt commented Apr 21, 2026

Summary

Adds entire review — a command that runs configured review skills (e.g. /pr-review-toolkit:review-pr, /codex:adversarial-review) inside an agent session, tags that session as a review, and permanently attaches review metadata to the next checkpoint on commit. Tracking-first: no custom review prompts, just wraps existing skills so reviewers don't redundantly re-run them on the same commit.

Three workflows, one command:

  • Author self-review: entire review → agent runs skills → commit → next reviewer sees HasReview=true and can skip.
  • Designated reviewer: same flow, different seat.
  • One-shot feedback: entire review → stop after the review, no commit needed.

No new trailers, no finalize step, no empty commits. The review session rides the normal checkpoint condensation pipeline; kind: "agent_review" on CommittedMetadata and has_review: true on CheckpointSummary are the only new persisted data.

What's in this PR

Command surface (review.go)

  • entire review — load config, spawn configured agent, show scope summary
  • entire review --edit — interactive huh picker for per-agent skills (pre-checks saved choices, alphabetical agent order, [i/N] counter)
  • entire review --track-only — write the pending marker without spawning (for manual start)

Scoping

  • Reviewing feat/X vs main: 3 commits, 7 files changed, 2 uncommitted printed before launch (falls back gracefully on detached HEAD / clean default branch)
  • detectBaseBranch: origin/HEADorigin/mainorigin/master → local main/master
  • Re-run guard: reads HasReview from the checkpoint summary via ResolveCommittedReaderForCheckpoint (v1 + v2 aware), prompts before re-reviewing HEAD

Pending-marker handshake (lifecycle.go)

  • .git/entire-sessions/review-pending.json written by entire review, adopted by the spawned agent's first UserPromptSubmit hook, cleared on adoption
  • Worktree-scoped: a session in another worktree of the same repo won't race to claim the marker
  • Deferred best-effort cleanup so an agent quitting before UserPromptSubmit doesn't leave a stale marker

Checkpoint metadata (checkpoint/, strategy/manual_commit_condensation.go)

  • WriteCommittedOptions.Kind + ReviewSkills + HasReview populated by condensation
  • CheckpointSummary.HasReview is an umbrella flag — future review kinds (e.g. manual review) should also set it, no new boolean needed
  • session.Kind.IsReview() is the single source of truth for the umbrella; checkpoint package uses opts.HasReview directly (no string coupling)

Agent launching (agent/agent.go, per-agent claude.go / codex.go / gemini.go)

  • New Launcher interface with LaunchCmd(ctx, prompt) (*exec.Cmd, error)
  • Subprocess-spawn implementations for Claude Code, Codex, Gemini CLI
  • agent.LauncherFor(name) returns the launcher or ok=false (falls back to --track-only with a manual instruction)

Settings (settings/settings.go)

  • EntireSettings.Review is map[string][]string (agent → skill list)
  • Load errors are surfaced to the user (malformed JSON no longer wipes the file when the picker runs)

Status (status.go)

  • Review · reviewed (checkpoint abc123…) line when HEAD's checkpoint has HasReview

Test plan

  • mise run check — fmt + lint + unit + integration + Vogon E2E canary, all green
  • Local golangci-lint run (CI-mode, no --fix): 0 issues
  • Manual smoke on ~/demo-repository: entire review → Claude opens with composed prompt → session tagged kind: "agent_review" with full review_skills list → worktree scoping verified (my own session in .worktrees/entire-review did NOT steal the marker)
  • entire review --track-only verified to write marker with worktree_path field populated
  • Real agent E2E exercising the full loop (marker → adoption → commit → HasReview=true on summary) — deferred to a follow-up

Review feedback addressed on this PR

The following findings from Cursor Bugbot / Copilot / multi-agent PR review pass were fixed during review iteration:

  • Review checkpoint lookup for v2 stores (was v1-only)
  • detectBaseBranch fallback chain (doc said one thing, code did another — twice, both fixed + pinned by test)
  • Pending marker could survive a clean agent exit without UserPromptSubmit — now has deferred cleanup
  • saveReviewConfig + runReview no longer wipe user settings on parse error
  • HasReview derivation no longer string-coupled to session.KindAgentReview across a package boundary
  • Context threaded through all marker helpers
  • headHasReviewCheckpoint logs each failure step at Debug so the re-run guard isn't silent

Follow-up (not blocking)

  • Integration/E2E exercising the full review-session → condensation → HasReview loop
  • runReview error branches (launcher-missing fallback, empty-config picker trigger) have limited coverage
  • Surface ReviewSkills in entire explain output
  • Docs for agents outside the curated list (claude-code, codex) — edit .entire/settings.json directly under review.<agent-name>

Copilot AI review requested due to automatic review settings April 21, 2026 03:28
Comment thread cmd/entire/cli/review.go
Comment thread cmd/entire/cli/trailers/trailers.go Outdated
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new entire review workflow to run team-configured review skills, track the review lifecycle in session state, and mark completion on the current branch via a trailer-rich empty “Review” commit. This integrates with agent hooks (pending-marker adoption + Stop-hook fallback) and extends supported agents with a subprocess launch contract.

Changes:

  • Introduces review commit trailers (Entire-Review-*) plus parse/append helpers and unit tests.
  • Adds entire review command (config picker, pending-marker handshake, finalize flows) and surfaces latest review in entire status.
  • Extends session state and agent implementations to support review lifecycle + subprocess launching; updates setup scaffolding and docs.

Reviewed changes

Copilot reviewed 26 out of 26 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
cmd/entire/cli/trailers/trailers.go Adds review trailer keys/metadata + append/parse helpers.
cmd/entire/cli/trailers/trailers_test.go Tests for review trailer formatting/parsing.
cmd/entire/cli/status.go Displays most recent review info in entire status.
cmd/entire/cli/setup_subagents.go Scaffolds /entire-review:finish skill templates for supported agents.
cmd/entire/cli/setup_subagents_test.go Validates finish-skill scaffolding creates expected files.
cmd/entire/cli/setup.go Hooks finish-skill scaffolding into enable/setup flow.
cmd/entire/cli/settings/settings.go Adds EntireSettings.Review + helper accessor.
cmd/entire/cli/settings/settings_test.go Tests settings JSON round-trip for review config.
cmd/entire/cli/session/state.go Adds session Kind, ReviewStatus, ReviewSkills, and exports GetGitCommonDir.
cmd/entire/cli/session/state_test.go Tests JSON round-trip for new session state fields.
cmd/entire/cli/root.go Registers the new review command.
cmd/entire/cli/review.go Implements entire review (marker, picker, launcher spawn, finalize, review commit, scanner).
cmd/entire/cli/review_test.go Unit tests for marker, command behavior, finalize flows, scanner, detached-HEAD guard.
cmd/entire/cli/lifecycle.go Adopts pending review marker on turn start; triggers fallback on unfinalized review at session end.
cmd/entire/cli/lifecycle_test.go Tests marker adoption + unfinalized-review detection/fallback wiring.
cmd/entire/cli/agent/agent.go Introduces agent.Launcher interface contract.
cmd/entire/cli/agent/registry.go Adds LauncherFor lookup helper.
cmd/entire/cli/agent/registry_test.go Tests LauncherFor behavior with a mock launcher agent.
cmd/entire/cli/agent/generate_external_test.go Ensures at least one agent implements Launcher.
cmd/entire/cli/agent/claudecode/claude.go Implements LaunchCmd for Claude Code.
cmd/entire/cli/agent/claudecode/claude_test.go Tests Claude launcher command shape.
cmd/entire/cli/agent/codex/codex.go Implements LaunchCmd for Codex.
cmd/entire/cli/agent/codex/codex_test.go Tests Codex launcher command shape.
cmd/entire/cli/agent/geminicli/gemini.go Implements LaunchCmd for Gemini CLI.
cmd/entire/cli/agent/geminicli/gemini_test.go Tests Gemini launcher command shape.
CLAUDE.md Documents entire review command, settings, session kind/status, and trailers.

Comment thread cmd/entire/cli/review.go Outdated
Comment thread cmd/entire/cli/review.go Outdated
Comment thread cmd/entire/cli/review.go Outdated
Comment thread cmd/entire/cli/trailers/trailers.go Outdated
Comment thread cmd/entire/cli/review.go
Comment thread cmd/entire/cli/review.go Outdated
Comment thread cmd/entire/cli/review.go Outdated
@peyton-alt peyton-alt changed the title feat: entire review — team-wide PR review command feat: entire review command Apr 21, 2026
Soph and others added 3 commits April 21, 2026 10:52
* Add Kind and ReviewSkills fields to checkpoint metadata

Wire session Kind and ReviewSkills through the condensation pipeline into
CommittedMetadata (per-session) and add HasReview to CheckpointSummary
for O(1) review detection. Both v1 and v2 stores are updated.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 55279dbd22e0

* Remove review trailers, ReviewStatus, and associated types

Review metadata now lives in checkpoint metadata (Kind/ReviewSkills on
CommittedMetadata), not in commit trailers. Drop all Entire-Review-*
trailer constants, ReviewMetadata struct, parse/append helpers, regex
patterns, and the ReviewStatus state machine from session.State.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 707c1b5e97b6

* Simplify review command to immutable checkpoint model

Replace the finalize lifecycle (--postreview, --finalize, fix/close/skip
TUI, empty review commits, stop-hook fallback) with a simpler model:
review is an immutable session attribute that flows through the existing
condensation pipeline into checkpoint metadata.

The re-run guard now reads CheckpointSummary.HasReview via the
Entire-Checkpoint trailer on HEAD — O(1) instead of unbounded history
walk. Status display uses the same path.

Removes ~800 lines of finalize flow, commit creation, history scanning,
and fallback TUI code plus their tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 52a75e6fb33a

* Remove /entire-review:finish skill scaffolding

The simplified review model no longer needs an in-session finish skill —
the session ends naturally and gets condensed on the next commit. Remove
all three agent templates (Claude/Codex/Gemini), the scaffold/write/report
functions, and the setup.go call site.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 0a7e96bba740

* Update CLAUDE.md for checkpoint-based review architecture

Document the simplified review model: immutable checkpoint attachment,
Kind/ReviewSkills in CommittedMetadata, HasReview on CheckpointSummary,
and O(1) re-run guard. Also restore accidentally dropped nolint directive
on deprecated paths.CheckpointPath call.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: b2c068e52ed3

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…, rename Kind

- Picker (`--edit`): iterate agents in deterministic alphabetical order, print
  an up-front header listing all agents, show a [i/N] counter + explicit
  "Agent: <name>" description in each form, and pre-check previously-saved
  skills via huh.Option.Selected(true). Users no longer lose selections on
  re-edit or get confused about which agent a form belongs to.

- Scope output: before spawning the agent, print a one-line summary like
  "Reviewing feat/X vs main: 3 commits, 7 files changed, 2 uncommitted"
  so the user can see what's about to be reviewed. Falls back gracefully
  for detached HEAD / unknown base / clean default branch.

- Worktree-scoped marker: add PendingReviewMarker.WorktreePath and make
  adoptPendingReviewMarkerInto skip claiming when the worktree doesn't
  match. Without this, a Claude session in any worktree of the same repo
  could race to claim a marker written by `entire review` in a different
  worktree (they share .git/entire-sessions/). Backwards compatible: a
  blank WorktreePath falls back to legacy unscoped adoption.

- Kind rename: session.KindReview -> session.KindAgentReview, wire value
  "review" -> "agent_review". Future review kinds (e.g. manual review) can
  be added as distinct Kind values without renaming. CheckpointSummary
  .HasReview stays as the umbrella "any review happened" flag so callers
  don't have to disjunction a growing list of type-specific booleans.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 64a48508b9bf
@peyton-alt
Copy link
Copy Markdown
Contributor Author

@BugBot review

Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Comment @cursor review or bugbot run to trigger another review on this PR

Reviewed by Cursor Bugbot for commit 2e320fd. Configure here.

Comment thread cmd/entire/cli/review.go Outdated
Comment thread cmd/entire/cli/review.go
peyton-alt and others added 5 commits April 21, 2026 14:53
…tection

- headHasReviewCheckpoint now resolves checkpoint summaries via
  ResolveCommittedReaderForCheckpoint so repos with CheckpointsVersion=2
  enabled also find HasReview. Previously it only consulted the v1 store,
  which silently returned false for v2 users (the re-run guard and
  `entire status` "reviewed" line both quietly failed).

- detectBaseBranch now actually checks refs/remotes/origin/<name> before
  falling back to refs/heads/<name>. Matches what the doc comment always
  claimed but only partially implemented: forks without origin/HEAD
  configured and without a local main branch were returning "" even when
  origin/main clearly existed.

- New regression test TestDetectBaseBranch_UsesOriginMainWhenNoLocalMain
  reproduces the fork scenario so future drift is caught.

Flagged by Cursor Bugbot on PR #993.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 21c5332394e1
…rivation

Covers issues #1-7 from the multi-agent PR review pass:

1-2. Settings-load errors no longer wipe user config. Both runReview and
     saveReviewConfig now distinguish "file missing" (settings.Load returns
     defaults) from "file malformed" (returns err). Malformed-file errors
     surface to the user with a friendly "fix .entire/settings.json" hint
     instead of silently opening the picker and overwriting unrelated
     settings with an empty EntireSettings{}.

3.  Pending marker is now cleaned up on every spawn-path failure via a
     deferred best-effort ClearPendingReviewMarker. Previously it only
     fired on execCmd.Run() non-zero exit, so a clean agent exit without
     any UserPromptSubmit (e.g. /quit) would leave the marker behind to
     mis-tag the next unrelated session.

4.  detectBaseBranch loop order now matches its docstring: all remote-
     tracking branches (origin/main, origin/master) are tried before any
     local branch. The doc-vs-code drift the comment-analyzer caught is
     fixed; a regression test pins the order.

5.  HasReview derivation no longer lives as a hardcoded string literal in
     the checkpoint package. session.Kind.IsReview() is the single source
     of truth; manual_commit_condensation sets WriteCommittedOptions
     .HasReview via that helper. Future review-kind Kind values just add
     to the IsReview disjunction and HasReview keeps covering them —
     previously a typo of "agent_review" in committed.go would have
     silently broken the feature.

6.  Context now threads through pendingMarkerPath, WritePendingReviewMarker,
     ReadPendingReviewMarker, ClearPendingReviewMarker. Previously they
     all used context.Background() internally, dropping cancellation and
     tracing from callers.

7.  headHasReviewCheckpoint now logs at Debug on each failure step
     (locate worktree root, read HEAD, parse trailer, open repo, resolve
     summary, HasReview false) so users debugging "why didn't the review
     badge show up?" have breadcrumbs instead of a single silent false.

Plus tests:
- TestKindIsReview pins the IsReview invariant and forces future review
  kinds to be added to the disjunction.
- TestSaveReviewConfig_ReturnsErrorOnMalformedSettings pins the data-loss
  regression fix.
- TestDetectBaseBranch_PrefersAllRemotesOverLocals pins the #4 fix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 410590b32b74
CI lint flagged cmd/entire/cli/git_operations.go:131 for 3 "master"
occurrences in one file (plus 2 more across review.go and resume.go).
Previously-passing runs were cached; my PR hit the threshold.

Fix: add masterBaseBranch = "master" alongside defaultBaseBranch = "main"
in trail_cmd.go (where the constants already live) and route all 5
non-test call sites through the two constants.

Also tag the pre-existing //nolint:ireturn on buildSummaryGenerator with
nolintlint so golangci-lint --fix doesn't oscillate between "strip the
directive, now ireturn fires" and "restore the directive, now nolintlint
says unused."

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 461a00d78058
Main branch commit b1e7691 expanded the ireturn allowlist to cover
github.com/entireio/cli/cmd/entire/cli/agent/.+, so LauncherFor's
return of the Launcher interface no longer triggers ireturn. With the
directive still in place, nolintlint flags it as unused. Removing it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 568e1f8dad04
@peyton-alt peyton-alt marked this pull request as ready for review April 21, 2026 23:08
@peyton-alt peyton-alt requested a review from a team as a code owner April 21, 2026 23:08
Soph and others added 3 commits April 22, 2026 23:43
* Preserve pending marker on non-launchable agent fallback

When entire review selected an agent whose registry entry has no
Launcher implementation (cursor, opencode, copilot-cli, factoryai-droid,
vogon), the command printed "falling back to --track-only" and
instructed the user to start the agent manually — but the cleanup defer
registered just above the LauncherFor check wiped the pending marker on
return. The manually-started agent's UserPromptSubmit hook then had
nothing to adopt, silently discarding the review.

Resolve the launcher before installing the defer so the !ok branch
mirrors the structure of the --track-only branch above it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 1f0985c2ce15

* Gate entire review on hooks-installed for selected agent

runReview previously read the configured agent from settings and wrote
the pending marker without checking whether the agent's hooks were
actually installed. Two failure paths hit this:

  1. Stale config: user ran entire disable for an agent but left its
     entry in review settings. selectReviewAgent still picks it.
  2. Hand-edited settings.json: user added a review entry for an agent
     they never ran entire configure --agent for.

Both cases silently dropped the review — the marker was written, but no
UserPromptSubmit hook ever fired to adopt it. Check
GetAgentsWithHooksInstalled after agent selection and surface a
concrete "run entire configure --agent <name>" error instead.

Also update the picker's empty-installed-agents error text so it
points at the relevant command (entire configure --agent) for users
who have already entire enable'd the repo but disabled their agent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: d3dbdf6d996e

* Add --agent flag to entire review

selectReviewAgent always returned the alphabetically-first configured
agent, so a user with both {"claude-code": [...], "codex": [...]} in
.entire/settings.json could never reach codex without deleting the
claude-code entry. The first-run picker happily writes multiple agents,
so this footgun is reachable through the normal UX.

Add --agent NAME to select a specific configured agent. Default
behavior (no flag) remains alphabetically-first for backwards
compatibility. Unknown values surface a concrete error listing the
configured agents instead of silently falling back.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 5e3f408cc54b

* Simplify hooks-gate and override lookups in entire review

Three code-review agents flagged the same two inefficiencies:

  - The hooks-gate built a map just to check a single membership against
    a slice that's typically 1-3 elements. Replace with slices.Contains.
  - selectReviewAgent's override handling did a linear scan of the
    sorted names slice when a direct map lookup against `review` would
    give O(1) membership + the skills value in one step.

Both sites now express intent more clearly and drop a handful of
allocations.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: f95d0b73bd6e

* Record the composed review prompt on session and checkpoint metadata

Add ReviewPrompt alongside the existing ReviewSkills list. ReviewPrompt
is the exact text the agent received — ground truth for what the review
actually asked. ReviewSkills remains as the structured, queryable form
populated from config on spawn.

The spawn path now composes the prompt once and stores it on the
pending marker as Prompt. Adoption writes both fields into session
state; condensation propagates both into CommittedMetadata. No user-
facing behavior change yet — the new field will be the primary source
for the post-hoc attach flow in a following commit, where skills aren't
known up front.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: a15bd1461855

* Add review-attach surfaces for post-hoc review tagging

Running a review manually (outside 'entire review') previously had no
way to record it as a review — the session was just a regular session
with no Kind or ReviewSkills set. This adds two equivalent surfaces
that both funnel through runAttach:

  entire attach <session-id> --review [--skills "/a,/b"]
  entire review attach <session-id> [--skills "/a,/b"]

Both resolve skills from --skills if provided, otherwise from the
agent's configured review skills, and error with a pointer at
'entire review --edit' when neither is available. The attach uses the
transcript's first user prompt as ReviewPrompt — the ground truth for
"what the reviewer actually asked" when skills weren't declared up
front.

runAttach now takes an attachOptions struct so the force/review flags
don't bloat the positional parameter list. The new struct carries
ReviewSkills; ReviewPrompt is sourced from the transcript metadata.

Review-upgrading a session that already has a checkpoint is refused
with a concrete error — rewriting existing entire/checkpoints/v1
metadata is a larger change than this first cut.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 6f62ebf70232

* Bind review marker adoption to agent name

adoptPendingReviewMarkerInto checked Kind and WorktreePath, but not
AgentName. Whichever agent's UserPromptSubmit hook fired first in the
worktree would claim the marker — a cursor session could silently steal
a marker written for claude-code, tagging itself as a review with the
wrong agent's skills.

The marker already records AgentName. Thread the session's agent name
through the call site (ag.Name() is already in scope) and refuse to
adopt when they don't match. The marker is left in place, not cleared,
so the correct-agent session can still claim it when its own hook fires.

Empty AgentName falls back to unscoped adoption for backwards compat
with pre-fix markers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 300b4e1d8152

* Preserve uncurated review config entries on --edit

runReviewConfigPicker narrows the picker to agents that both have hooks
installed AND appear in curatedReviewSkills. saveReviewConfig then
replaced s.Review wholesale with the picker's output. Any entries the
user had added manually for external agents, uncurated agents, or
temporarily-uninstalled agents were silently deleted the first time
they ran `entire review --edit`.

Extract mergePickerResults: preserve existing entries for agents the
picker did not surface; offered agents are fully controlled by the
picker (set if they appear in selected, removed otherwise). Saving now
uses the merged map.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 5e2acf066b8b

* Bind review marker adoption to starting commit SHA

PendingReviewMarker records StartingSHA at marker-write time, but
adoptPendingReviewMarkerInto never consulted it. The failure mode:

  1. User runs 'entire review --track-only' at SHA A, intending to
     review SHA A's code.
  2. User commits (HEAD moves to SHA B) without starting the review.
  3. Later, an unrelated agent session starts in the same worktree.
  4. The session's UserPromptSubmit hook claims the stale marker and
     tags an unrelated session as a review of code that already moved on.

On adoption, compare current HEAD to StartingSHA. Mismatched SHA
indicates the user's original intent no longer applies: clear the
stale marker (so it can't mis-tag future sessions either) and do not
adopt. Unlike worktree/agent mismatches — which leave the marker for
the correct session to claim — a SHA mismatch can never resolve, so
clearing is the safe default.

Failure to resolve HEAD is non-fatal; log at Debug and skip the check.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: e0c5a4497f26

* Let --edit save when picker output is empty but external config remains

runReviewConfigPicker guarded against an empty save with
"no review skills selected" — but it checked the picker's output, not
the merged final config. That blocked a valid flow introduced alongside
mergePickerResults: a user with {claude-code, my-external} in settings
could not use --edit to deselect the claude-code entry while keeping
my-external, since the deselect made selected empty and the guard
fired before the merge preserved the external entry.

Move the guard onto `merged`: refuse to save only if the final config
would actually be empty (no picks and no pre-existing non-offered
entries). This makes mergePickerResults complete in the "manual
settings.json entry" case it was introduced to support.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 73d759a5d40e

* Simplify review_test.go: extract setup helper, fix merge assertion

Three review agents identified the same cleanups:

  - 5 tests repeated the same 6-line "temp repo + single commit + chdir"
    boilerplate. Extract setupReviewTestRepoWithCommit.
  - TestMergePickerResults' assertion loop only compared the first
    element of each skill slice, missing multi-element divergences.
    Replace with reflect.DeepEqual + single Errorf.
  - TestReviewMarker_RoundTrip's "sanity check" glob discarded its
    result via `_ = entries`, asserting nothing. Move the glob to
    before the clear so the marker file is actually present, and
    assert len > 0 so a path regression fails loudly.

Also promote "codex", "my-external", and "/external-skill" to package-
level constants, and use testCodexAgent in the override test that had
shadowed it with a local const.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 21d16660c5bf

* Resolve review-attach skills after agent detection

Both entry points (entire attach --review, entire review attach)
resolved review skills in the cobra RunE using the --agent flag value.
That flag defaults to DefaultAgentName (claude-code), so a Gemini
session attached without --agent failed upfront with "no review.
claude-code configured" — even though runAttach's auto-detection would
otherwise have correctly identified the session as Gemini from its
transcript.

Move skills resolution into runAttach, after resolveAgentAndTranscript
has returned the real agent. The attachOptions struct now exposes
Review (bool) + ReviewSkillsOverride ([]string); resolveReviewSkills
is called post-detection with ag.Name(). User-facing errors in the
review path are surfaced to stderr via a runAttachSurfaceReviewErrors
wrapper so the "no skills configured" message remains visible under
root's SilenceErrors=true.

Regression test TestAttachCmd_ReviewAutoDetectsAgentForSkills attaches
a Gemini session via --review without --agent and asserts the Gemini-
configured skills land in the session state, not claude-code's.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 0a36cbb7b218

* Discover external agents in 'entire review attach'

'entire attach' calls external.DiscoverAndRegister so --agent
<external-name> is recognized and transcript auto-detection covers
external agents. The 'entire review attach' alias skipped this call,
so attaching an external agent session as a review failed with "agent
not available" even though the plain 'entire attach --review' path
would have worked.

Add the call to match 'entire attach' exactly — the two surfaces
should be behaviorally equivalent up to the implicit --review flag.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 773d46c02c44

* Add review integration tests

Entire-Checkpoint: bf36113ed444

* Allow review-attach without configured skills

Attach should not require `entire review --edit` to have been run.
The review is tagged via Kind + ReviewPrompt (the session's first user
prompt); the skills list is a queryable convenience, not the source of
truth for "was this a review." Blocking attach with "no review skills
configured" forced users to set up the spawn flow's config just to
record a post-hoc review — unrelated setup for an unrelated flow.

resolveReviewSkills now returns (nil, nil) when neither --skills nor
settings.Review[agent] have entries. runAttach records ReviewSkills as
empty (json omitempty) in that case, and the attach proceeds.

TestAttachCmd_ReviewWithoutSkillsOrConfigSucceeds replaces the
previous "must error" test: asserts the attach succeeds, Kind is
set to agent_review, ReviewPrompt carries the first user prompt, and
ReviewSkills is empty.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: b27f87214f45

* Stop pulling attach review skills from spawn-path config

Attach was falling back to settings.Review[agent] when --skills wasn't
provided, silently recording skills the user may not have actually run
in that session. settings.Review is the spawn-path default — what the
user would run if they used `entire review` — not a claim about what
happened in a given manual session.

Only --skills is a user assertion about this session. resolveReviewSkills
now just returns opts.ReviewSkillsOverride verbatim; when the flag is
absent, ReviewSkills stays empty (which is fine — Kind + ReviewPrompt
already carry the review fact). TestAttachCmd_ReviewDoesNotInferSkills
FromConfig pins this as a regression guard.

Simplified TestAttachCmd_ReviewAutoDetectsAgent accordingly — it still
asserts auto-detect works for a Gemini session attached via --review,
just without the now-irrelevant skills-from-config assertion.

Help text updated to reflect that --skills is purely optional.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 9cf19731539a

* Add first-run setup banner for entire review

`entire review` with no config silently dropped into the skills
picker. Users running the command to kick off a review misread the
picker as the review itself — the header just said "Configuring
review skills..." mid-flow, with no signposting that this was
one-time setup or that the review would still run afterward.

Frame the setup phase explicitly:

  1. Banner stating no config found, where it's saved, and that the
     review will run after setup.
  2. huh.Confirm gating the picker — user acknowledges before the
     picker takes over the terminal. Default is Yes, so hitting enter
     proceeds.
  3. "Setup complete — running review now." after the picker returns,
     to close the loop on what happens next.

Only fires when s.Review is empty; --edit and subsequent runs are
unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 4be85e13e298

* Remove --track-only from entire review

--track-only existed to let users write a pending marker without
spawning the configured agent, typically because they wanted to start
the agent manually. With `entire review attach <session-id>` landing
as a post-hoc review-tagging surface, --track-only loses its reason to
exist:

  - Start the agent manually, then run `entire review attach <id>` to
    tag the session as a review.
  - Non-launchable agents (cursor, opencode, etc.) already fall back
    to "marker written; start the agent yourself" — that code path
    still exists, triggered by the LauncherFor !ok branch, not by a
    user-facing flag.

Drop the flag, the flag's RunE branch, and its unit test. Reword the
non-launchable-agent message so it no longer references a flag that
doesn't exist.

Integration tests that used `entire review --track-only` to write a
marker for adoption-pipeline testing now call a new TestEnv helper
(WritePendingReviewMarker) that writes the JSON directly. The tests'
subject is adoption behavior, not CLI wiring — removing the CLI
round-trip sharpens focus and removes the flag dependency.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: dacb4641f5ec

* Allow custom review prompts per agent in the config/picker

The spawn path synthesized the review prompt from the skills list via
a hardcoded template ("Please run these review skills in order: ...").
Users couldn't include context ("focus on security this week") or use
a prompt that wasn't just a list of slash commands without editing
settings.json by hand.

Schema change: `review` in settings goes from map[string][]string to
map[string]ReviewConfig, where ReviewConfig carries both Skills and
Prompt. No back-compat unmarshaller — the feature is unreleased, so a
clean break is safe.

Compose precedence:
  - If cfg.Prompt is non-empty, it is used verbatim.
  - Otherwise cfg.Skills is composed into the default template.

Skills are always recorded on the checkpoint metadata regardless of
which path composed the prompt — they're the structured queryable tag
alongside ReviewPrompt (which is the ground truth).

Picker flow gains a second huh.Group per agent: a Text input for
"Additional instructions (optional)", seeded with any previously-saved
Prompt so users can tweak it in --edit. Empty input = no prompt
configured, skills-only behavior.

selectReviewAgent, mergePickerResults, saveReviewConfig, and
composeReviewPrompt all updated to carry ReviewConfig. Tests updated
accordingly; added TestComposeReviewPrompt_CustomPromptWinsOverSkills,
TestComposeReviewPrompt_EmptyConfigReturnsEmpty, a prompt-only case in
TestMergePickerResults, and TestReviewConfig_IsZero.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 49f3045012fa

* Fix entire review attach overwriting existing checkpoint session

The existing-checkpoint guard in runAttach only checked
existingState.LastCheckpointID — the session state file's record of
which checkpoint it belongs to. If a session was already recorded in
the HEAD checkpoint but its state file didn't track that (file deleted,
never written, condensation path that didn't update LastCheckpointID),
the guard passed. resolveCheckpointID then returned the HEAD trailer's
existing checkpoint ID, and findSessionIndex — which dedups by
SessionID — matched the existing session and silently OVERWROTE its
metadata with review-flavored metadata.

Add a defense-in-depth check: when review-attaching to an existing
checkpoint, read that checkpoint's sessions by ID and refuse if the
target session is already recorded there. Matches the existing guard's
error shape ("rewriting an existing checkpoint as a review is not
supported yet") so users get a consistent message across both detection
paths.

TestAttach_ReviewWithExistingCheckpointErrorsEvenWithoutSessionState
pins the regression: first attach writes state + checkpoint, state file
is deleted to simulate the gap, second review-attach must error rather
than overwrite.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 2977971a30ce

* Refresh checkpoint metadata before attach, mirroring entire resume

The previous fix only guarded against local state. When HEAD's
Entire-Checkpoint trailer references a checkpoint whose data lives on
the remote (e.g., after pulling a co-worker's branch without updating
the local entire/checkpoints/v1 branch), two things go wrong:

  1. The conflict guard (session already in checkpoint) reads the
     stale local tree and misses remote-only sessions.
  2. findSessionIndex — which picks the next-available session index
     when appending — returns a too-low index based on the stale tree,
     colliding with a real remote-only session on push.

Both cases need the SAME fix: bring the local metadata branch up to
date before reading or writing. `entire resume` solved this in
getMetadataTree with a fallback chain (checkpoint-remote → treeless
fetch → local → full fetch → remote-tracking ref), and the helper
returns a freshly-opened repo handle so go-git's packfile cache sees
new objects.

Reuse getMetadataTree for its side effects whenever isExistingCheckpoint
is true. Factored the call into refreshForExistingCheckpoint to keep
runAttach under the maintidx threshold. Non-fatal on fetch failure —
proceed with local state and let the downstream guard catch what it
can; better than blocking attach on a transient network issue.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 1047ada4f974

* Add SkillDiscoverer interface and skilldiscovery registry

Entire-Checkpoint: 2ede3c09af1b

* Implement SkillDiscoverer for Claude Code; stub Codex and Gemini

Entire-Checkpoint: 363a81678918

* Render review picker as built-in + discovered + install-hint sections

Entire-Checkpoint: b5a6a3541d11

* Verify configured review skills are installed before spawn

Entire-Checkpoint: d47afadbdc69

* Tighten review-skill discovery UX

- Match keywords on skill name only. Description matching pulled in
  unrelated skills whose descriptions happened to mention "review"
  or "inspect" in non-review contexts ("review the plan", "inspect
  recent sessions") and crowded the picker.
- Extend Claude Code discoverer to walk commands/*.md and agents/*.md
  in addition to skills/*/SKILL.md. Plugins like pr-review-toolkit
  ship their review tools as commands + agents, not skills, and were
  silently missed by a skills-only walker.
- Drop descriptions from picker labels. Agent descriptions contain
  pages of embedded usage examples and made the picker unreadable;
  skill names are self-explanatory.

Entire-Checkpoint: 07e9a4c71c8f

* Prompt for review agent when multiple are configured

Entire-Checkpoint: 10580fcd83a5

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Peyton Montei <peyton@entire.io>
Three issues in the picker/spawn flow, each independently broken:

1. Spawn path missed external agents. entire review calls
   GetAgentsWithHooksInstalled + agent.Get directly, but unlike
   `review attach` it never ran external.DiscoverAndRegister. A review
   config targeting an external agent would fail even though hooks and
   `attach --review` worked. Add the call to the spawn entry point.

2. Picker never actually preselected saved skills. The header promised
   "Previously-saved skills are pre-checked" but buildReviewPickerFields
   built every huh.Option without .Selected(true), and builtinPicks /
   discoveredPicks started nil. Accepting defaults in --edit would
   silently wipe the agent's saved skills — destructive regression.

   Fix: split the saved flat skill list into the two picker buckets via
   splitSavedPicks(saved, curated, discovered), pre-populate both as
   huh.MultiSelect Value bindings, and mark matching options
   Selected(true) in buildReviewPickerFields. preselectedSet turns the
   caller's output slice into a lookup set.

3. Per-agent header never rendered. The counter title used a type
   assertion `fields[0].(interface{ Title(string) huh.Field })` that
   never matches — huh's concrete Title methods return
   *MultiSelect[T] / *Text / *Note, not huh.Field. So the user saw
   only "Built-in commands" with no signal which agent they were
   configuring. Replace with a huh.NewNote() header (skip:true, so
   non-blocking) showing agent name + "Agent N of M" progress.

Tests: TestSplitSavedPicks covers the 5 partition cases; TestPreselectedSet
covers nil/empty/populated. External-discovery fix has no direct test
(the call isn't injectable) but mirrors newReviewAttachCmd's contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 1977cb63d398
Keep the command invokable but drop it from `entire help` while the
feature is still maturing. Users can still run `entire review`, `entire
review --help`, `entire review --edit`, and `entire review attach <id>`;
the command just doesn't advertise itself in the top-level Available
Commands list.

Comment notes the flag should come off once we're ready to promote
the feature.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: b7e0bfda13b2
Soph and others added 2 commits April 23, 2026 16:09
Three small safety/cleanup changes around the review-attach overwrite
investigation:

1. Tripwire in writeStandardCheckpointEntries: when the write lands at
   sessionIndex=0 AND entries already has a session-0 metadata.json
   with a DIFFERENT sessionID, log a loud WARN. That's the exact
   shape of the unreproduced production regression — a session was
   silently replaced with a different sessionID's data. Cheap: one
   map lookup + one blob read on that narrow case, no-op on all other
   writes. If the bug ever recurs we get a log trace pinpointing
   which checkpoint, which sessionID got overwritten, and whether
   existingSummary was nil at decision time.

2. defer logging.Close() in runAttach. Without it the 8KB buffered log
   writer never flushed on process exit, so any Warn/Info during
   attach (including the tripwire above) got silently dropped. Matches
   the pattern already used by resume/clean/reset/rewind/migrate/
   explain.

3. flatten findSessionIndex's inner if/continue layout — no behavior
   change, just readability after removing the temporary verbose debug
   logging that wrapped the blob read.

Also keeps TestAttach_ReviewAppendsAsAdditionalSessionWhenIDDiffers
(added while investigating): pins the happy-path append behavior so if
the overwrite shape ever regresses in the same unit-testable path, the
test fires before the tripwire does.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: df1a8329a6d8
Scenario: a colleague creates a branch with commits that carry
Entire-Checkpoint trailers and pushes. You pull and check out the
branch — git pull doesn't bring down entire/checkpoints/v1 (it's a
separate orphan branch). You run `entire review attach <session>`.
Pre-fix: resolveCheckpointID reads the trailer and finds the ID, but
the local metadata branch doesn't have the data. WriteCommitted
silently creates a FRESH checkpoint with that ID and this session as
session 0. On push, your local (single-session) checkpoint overwrites
the colleague's multi-session original on the remote.

Two defenses required, both now in place:

1. Fetch entire/checkpoints/v1 before attach. Already done via
   refreshForExistingCheckpoint -> getMetadataTree, which runs the
   same fallback chain as `entire resume`: checkpoint-remote ->
   treeless -> local -> full -> remote-tracking.

2. If the checkpoint is STILL not present locally after refresh,
   refuse with a concrete error pointing at the fetch command. Pull
   this check into verifyCheckpointLocallyAvailable so runAttach
   doesn't balloon the maintidx lint.

Error message tells the user what happened and what to do:

  checkpoint Xxxxxx referenced by HEAD is missing from the local
  entire/checkpoints/v1 branch (the remote fetch didn't bring it in
  either). Creating a fresh checkpoint here would overwrite the
  original session data on push. Run:

      git fetch origin entire/checkpoints/v1:entire/checkpoints/v1

  then re-run attach. If the colleague who made this commit hasn't
  pushed their checkpoint metadata yet, ask them to do so first.

TestAttach_ReviewRefusesWhenCheckpointMissingFromLocalBranch pins
the behavior: amends HEAD with a trailer pointing at a never-created
checkpoint ID, runs review-attach, asserts the error fires and no
fresh checkpoint is written. Verified the test fails without the
guard (stashed out, ran, re-applied).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: fd021c1bcfad
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

3 participants