subagent-driven-development: phantom completion fix + persistent validator pool (Team Mode)#578
subagent-driven-development: phantom completion fix + persistent validator pool (Team Mode)#578STRML wants to merge 23 commits into
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a two‑phase Persistent Validator Pool: Phase 1 enforces mandatory evidence and per‑requirement citations; Phase 2 introduces Team Mode with persistent validators, combined spec+quality reviews, cross‑task SHA maps, re‑review loops, rotation and recovery rules, and a fallback path if memory retention fails. Documentation and prompt templates updated. Changes
Sequence Diagram(s)sequenceDiagram
actor Implementer
participant Repo as Git/Repo
participant SDK as SDK Memory Test
participant Orchestrator as Orchestrator
participant ValidatorPool as Validator Pool
participant Validator as Validator Agent
Implementer->>Repo: Push changes + Mandatory Evidence (tests, diff stat, files)
Orchestrator->>SDK: Run memory validation
alt SDK passes (Team Mode)
Orchestrator->>ValidatorPool: Activate team (assign validators)
ValidatorPool->>Validator: Dispatch combined spec+quality review (BASE_SHA/HEAD_SHA, SHA map)
Validator->>Repo: Fetch diffs & verify file:line citations
Validator->>Orchestrator: Submit Report (Spec Compliance, Code Quality, Verdict)
alt Fix required
Orchestrator->>Implementer: Request fix (include validator citations)
Implementer->>Repo: Push fix
Orchestrator->>ValidatorPool: Re‑review (bounded cycles)
end
else SDK fails (fallback)
Orchestrator->>Validator: Dispatch combined single‑agent review (fallback path)
Validator->>Repo: Fetch diffs & verify claims
Validator->>Orchestrator: Submit Report
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (9)
skills/subagent-driven-development/validator-prompt.md (1)
5-5: Address static analysis hint: add language specifier.The fenced code block at line 5 (and continuing through the document) lacks a language specifier. While this is a documentation file with a template, consider adding a language hint for better rendering:
📝 Optional formatting improvement
-**Purpose:** Single-pass combined spec compliance and code quality review. Validators are persistent agents that retain context across tasks — use this to detect cross-task gaps by re-reading prior task diffs. +**Purpose:** Single-pass combined spec compliance and code quality review. Validators are persistent agents that retain context across tasks — use this to detect cross-task gaps by re-reading prior task diffs.Change the code block delimiter at line 13 from
```to```markdownor```textfor proper syntax highlighting.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@skills/subagent-driven-development/validator-prompt.md` at line 5, The fenced code block uses a plain triple-backtick delimiter (```); update that delimiter to include a language specifier such as ```markdown or ```text so the block renders with proper syntax highlighting and clearer formatting (replace the existing ``` fence with ```markdown or ```text throughout the document).skills/subagent-driven-development/SKILL.md (1)
122-122: Address markdown static analysis hints (optional).Three fenced code blocks lack language specifiers (lines 122, 185, 327). While these are non-critical formatting issues, adding language hints (e.g.,
```text) would improve rendering in some Markdown viewers.Also applies to: 185-185, 327-327
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@skills/subagent-driven-development/SKILL.md` at line 122, Three fenced code blocks in SKILL.md are missing language specifiers; update each triple-backtick block (the three occurrences of ``` with no language) to include an explicit language hint such as ```text (or ```bash/```json as appropriate for the content) so Markdown linters and renderers correctly highlight and render them.skills/subagent-driven-development/implementer-prompt.md (2)
89-89: Clarify who the "controller" is.The term "controller" appears without prior definition. Based on the PR context, this seems to refer to either the team lead (in team mode) or the main agent (in standard mode). Consider making this explicit: "rejected by the controller (team lead or main agent)" for clarity.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@skills/subagent-driven-development/implementer-prompt.md` at line 89, The phrase "controller" in the sentence "A report submitted without this evidence section will be **rejected by the controller**..." is ambiguous; update that sentence in implementer-prompt.md to explicitly define "controller" by replacing or appending clarification such as "rejected by the controller (team lead in team mode, or the main agent in standard mode)" so reviewers/readers immediately understand who enforces rejection; locate the exact string "rejected by the controller" and change it to the clarified wording.
79-79: Consider more flexible git diff guidance.The instruction
git diff --stat <base-sha>assumes the implementer knows the base SHA. In practice, implementers might be working from various starting points (feature branch, multiple commits, etc.). Consider adding guidance like "or usegit diff --stat $(git merge-base main HEAD)to compare against the branch point" for cases where the base isn't obvious.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@skills/subagent-driven-development/implementer-prompt.md` at line 79, Update the "**2. Diff stat** — run `git diff --stat <base-sha>`" line to offer flexible alternatives when the base SHA isn't known: keep the original suggestion but add guidance to obtain the branch point via the merge-base of main and the current branch (so users can pass that as the base), and also mention using a simple fallback like comparing against the previous commit (HEAD~1); reference the "**2. Diff stat**" heading and the git diff --stat instruction to locate and update the text.docs/plans/2026-02-28-persistent-validator-pool-design.md (1)
95-95: Dependency inference limitation is clearly acknowledged.The design acknowledges that validators can only catch dependencies "where the task spec explicitly mentions prior tasks" and that "implicit dependencies that go unmentioned in the spec may not be caught." This is an important limitation. Consider adding a recommendation to the writing-plans skill to explicitly document all task dependencies in the task spec text itself.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/plans/2026-02-28-persistent-validator-pool-design.md` at line 95, Update the design note to add a concrete recommendation for the writing-plans skill: require authors to include an explicit "Dependencies" section in each task spec that lists prior task IDs, diffs, and any modules/schemas referenced; update references to the writing-plans validator logic (the "Infer dependencies" guidance) to point authors at this required section so implicit dependencies are not missed and validators can reliably re-read prior diffs when listed.docs/plans/2026-02-28-persistent-validator-pool-implementation.md (2)
4-4: SDK memory test already completed - Task 4 is documentation only.Line 4 states the SDK memory validation test was already run and RETAINED (tested 2026-02-28). This means:
- Task 4 (lines 200-246) serves as documentation of the test protocol, not an action item
- Task 4b (lines 248-368) should be skipped entirely
Consider adding a note at Task 4 (line 200) stating: "Note: This test was already completed (see header line 4). This task documents the test protocol for reference."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/plans/2026-02-28-persistent-validator-pool-implementation.md` at line 4, Add a short clarifying note at the start of Task 4 indicating the SDK memory validation was already completed (referencing the header line that shows "SDK memory test result: RETAINED (tested 2026-02-28)") and mark Task 4 as documentation of the protocol only; also remove or explicitly mark Task 4b as "skipped" or "not required" (update the Task 4 and Task 4b section headings/messages in the document to reflect these statuses so readers know Task 4 is reference-only and Task 4b should be skipped).
47-68: Address markdown static analysis hints for embedded code blocks.Multiple fenced code blocks in the plan lack language specifiers (lines 47, 109, and others per static analysis). While this is a documentation file, adding language hints would improve readability. However, given the volume of code blocks and the nature of this file as an implementation plan (not user-facing docs), this is a low-priority nitpick.
Also applies to: 109-124
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/plans/2026-02-28-persistent-validator-pool-implementation.md` around lines 47 - 68, Several fenced code blocks in the plan (the blocks labeled "Test output", "Diff stat", and the command/output examples under "Behavior demonstration"/"Files changed") lack language specifiers; update each triple-backtick fence to include an appropriate language hint (e.g., ```bash or ```sh for commands, ```text or ```console for pasted output, and ```diff for git diff output) so syntax highlighters and linters stop flagging them; locate the blocks by searching for the literal headings "Test output", "Diff stat", and "Behavior demonstration" in the plan and add the language tags to their opening ``` lines.skills/subagent-driven-development/spec-reviewer-prompt.md (2)
77-77: Redundant statement - consider consolidating.This line repeats the message already stated at lines 70-71 ("A ✅ approval without per-requirement citations is invalid"). While repetition for emphasis can be effective, having it immediately after the first statement (only 6 lines apart) may be unnecessary. Consider either consolidating these statements or adding different emphasis if the repetition is intentional.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@skills/subagent-driven-development/spec-reviewer-prompt.md` at line 77, Remove the redundant sentence "A ✅ without the per-requirement citation table is not accepted." (the duplicate statement that repeats the earlier "A ✅ approval without per-requirement citations is invalid") by either deleting this second occurrence or consolidating it into the earlier sentence so the rule appears once with any needed emphasis; update the surrounding text in spec-reviewer-prompt.md to keep a single clear statement about per-requirement citations.
70-71: "Controller" terminology appears here too.Same suggestion as in implementer-prompt.md: consider defining "controller" explicitly as "controller (team lead or main agent)" for clarity across both modes.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@skills/subagent-driven-development/spec-reviewer-prompt.md` around lines 70 - 71, The term "controller" is ambiguous in spec-reviewer-prompt.md; update the first occurrence of "controller" (the bold sentence starting with "A ✅ approval...") to explicitly define it as "controller (team lead or main agent)" and apply the same explicit parenthetical definition in implementer-prompt.md so both modes use the same terminology; ensure the wording is brief and consistent wherever "controller" is introduced.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/plans/2026-02-28-persistent-validator-pool-design.md`:
- Line 121: The Validator rotation tradeoff paragraph ("Validator rotation
(large plans):") is too weak given it undermines cross-task gap detection;
update docs and design to (1) explicitly document the limitation in SKILL.md
under "Validator rotation" and add a clear recommendation against using team
mode for plans where cross-task gap detection is critical, (2) propose and
document two concrete alternatives in the design doc: increase the rotation
threshold (e.g., rotate every 10 tasks instead of 5) and a "handoff protocol"
where the outgoing validator emits a summarized dependency digest (task IDs, key
outputs, unresolved references) that the incoming validator ingests, and (3) add
a short example of the handoff data shape and where it is produced/consumed
(referencing the Validator rotation paragraph and the term "handoff protocol")
so reviewers can evaluate feasibility.
- Around line 148-151: Revise the Phase 2 bullet list to accurately describe the
files: update the `spec-reviewer-prompt.md` bullet to say it remains for Phase 1
citation requirements (standard mode), add a new bullet for
`validator-prompt.md` stating "**create new.** Combined spec+quality prompt for
persistent validators (team mode)", and keep the
`code-quality-reviewer-prompt.md` bullet marked "**delete.**" so the intent and
file roles are unambiguous.
In `@docs/plans/2026-02-28-persistent-validator-pool-implementation.md`:
- Around line 248-368: The document contains an inconsistent fallback (Task 4b)
that describes a "Fresh Subagent" validator-prompt but the repo actually
contains the persistent-team validator-prompt used when SDK memory was retained;
update the plan by either removing Task 4b entirely or clearly marking it as
skipped: find the "Task 4b: (FALLBACK ONLY — skip if SDK retains context)"
section and the referenced validator-prompt.md and SKILL.md entries and replace
the fallback content with a single prominent note like "SKIPPED - SDK memory
retained. See main path Tasks 5-10" (or delete the fallback block), and ensure
any references to code-quality-reviewer-prompt.md in SKILL.md and docs are
updated to point to the actual validator-prompt.md if needed.
In `@skills/subagent-driven-development/SKILL.md`:
- Around line 196-200: The text is inconsistent about how validators signal
availability: update the SKILL.md lines describing validator signaling to match
validator-prompt.md and the implementation plan by specifying that validators
must send both TaskUpdate(status: "completed") and a SendMessage when their
verdict is ready; update the phrasing that currently says the team lead "uses
message receipt as the availability signal" to state the team lead treats either
the TaskUpdate or the SendMessage (preferably requiring TaskUpdate as the
canonical signal and SendMessage as a human-readable notification) as the
availability/ completion signal; ensure references to SendMessage and TaskUpdate
and the implementation plan (Task 6) are aligned so all docs consistently
require TaskUpdate(status: "completed") plus SendMessage.
---
Nitpick comments:
In `@docs/plans/2026-02-28-persistent-validator-pool-design.md`:
- Line 95: Update the design note to add a concrete recommendation for the
writing-plans skill: require authors to include an explicit "Dependencies"
section in each task spec that lists prior task IDs, diffs, and any
modules/schemas referenced; update references to the writing-plans validator
logic (the "Infer dependencies" guidance) to point authors at this required
section so implicit dependencies are not missed and validators can reliably
re-read prior diffs when listed.
In `@docs/plans/2026-02-28-persistent-validator-pool-implementation.md`:
- Line 4: Add a short clarifying note at the start of Task 4 indicating the SDK
memory validation was already completed (referencing the header line that shows
"SDK memory test result: RETAINED (tested 2026-02-28)") and mark Task 4 as
documentation of the protocol only; also remove or explicitly mark Task 4b as
"skipped" or "not required" (update the Task 4 and Task 4b section
headings/messages in the document to reflect these statuses so readers know Task
4 is reference-only and Task 4b should be skipped).
- Around line 47-68: Several fenced code blocks in the plan (the blocks labeled
"Test output", "Diff stat", and the command/output examples under "Behavior
demonstration"/"Files changed") lack language specifiers; update each
triple-backtick fence to include an appropriate language hint (e.g., ```bash or
```sh for commands, ```text or ```console for pasted output, and ```diff for git
diff output) so syntax highlighters and linters stop flagging them; locate the
blocks by searching for the literal headings "Test output", "Diff stat", and
"Behavior demonstration" in the plan and add the language tags to their opening
``` lines.
In `@skills/subagent-driven-development/implementer-prompt.md`:
- Line 89: The phrase "controller" in the sentence "A report submitted without
this evidence section will be **rejected by the controller**..." is ambiguous;
update that sentence in implementer-prompt.md to explicitly define "controller"
by replacing or appending clarification such as "rejected by the controller
(team lead in team mode, or the main agent in standard mode)" so
reviewers/readers immediately understand who enforces rejection; locate the
exact string "rejected by the controller" and change it to the clarified
wording.
- Line 79: Update the "**2. Diff stat** — run `git diff --stat <base-sha>`" line
to offer flexible alternatives when the base SHA isn't known: keep the original
suggestion but add guidance to obtain the branch point via the merge-base of
main and the current branch (so users can pass that as the base), and also
mention using a simple fallback like comparing against the previous commit
(HEAD~1); reference the "**2. Diff stat**" heading and the git diff --stat
instruction to locate and update the text.
In `@skills/subagent-driven-development/SKILL.md`:
- Line 122: Three fenced code blocks in SKILL.md are missing language
specifiers; update each triple-backtick block (the three occurrences of ``` with
no language) to include an explicit language hint such as ```text (or
```bash/```json as appropriate for the content) so Markdown linters and
renderers correctly highlight and render them.
In `@skills/subagent-driven-development/spec-reviewer-prompt.md`:
- Line 77: Remove the redundant sentence "A ✅ without the per-requirement
citation table is not accepted." (the duplicate statement that repeats the
earlier "A ✅ approval without per-requirement citations is invalid") by either
deleting this second occurrence or consolidating it into the earlier sentence so
the rule appears once with any needed emphasis; update the surrounding text in
spec-reviewer-prompt.md to keep a single clear statement about per-requirement
citations.
- Around line 70-71: The term "controller" is ambiguous in
spec-reviewer-prompt.md; update the first occurrence of "controller" (the bold
sentence starting with "A ✅ approval...") to explicitly define it as "controller
(team lead or main agent)" and apply the same explicit parenthetical definition
in implementer-prompt.md so both modes use the same terminology; ensure the
wording is brief and consistent wherever "controller" is introduced.
In `@skills/subagent-driven-development/validator-prompt.md`:
- Line 5: The fenced code block uses a plain triple-backtick delimiter (```);
update that delimiter to include a language specifier such as ```markdown or
```text so the block renders with proper syntax highlighting and clearer
formatting (replace the existing ``` fence with ```markdown or ```text
throughout the document).
ℹ️ Review info
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
docs/plans/2026-02-28-persistent-validator-pool-design.mddocs/plans/2026-02-28-persistent-validator-pool-implementation.mdskills/subagent-driven-development/SKILL.mdskills/subagent-driven-development/code-quality-reviewer-prompt.mdskills/subagent-driven-development/implementer-prompt.mdskills/subagent-driven-development/spec-reviewer-prompt.mdskills/subagent-driven-development/validator-prompt.md
💤 Files with no reviewable changes (1)
- skills/subagent-driven-development/code-quality-reviewer-prompt.md
There was a problem hiding this comment.
♻️ Duplicate comments (2)
skills/subagent-driven-development/SKILL.md (2)
202-206:⚠️ Potential issue | 🟡 MinorInconsistency: validator availability signaling mechanism (duplicate of past review).
Lines 203-205 describe availability tracking solely via
SendMessagereceipt, but line 126 states validators are "Declared idle/busy viaTaskUpdate." According to the past review comment, validator-prompt.md instructs validators to use bothTaskUpdate(status: "completed")ANDSendMessage.Recommendation: Update this section to clarify that:
- Validators send both
TaskUpdate(status: "completed")(for programmatic tracking) andSendMessage(for human-readable notification)- Team lead tracks availability using
TaskUpdateas the canonical signal- Align with validator-prompt.md and the implementation plan (Task 6, lines 597-598 per past review)
This directly addresses the unresolved past review comment.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@skills/subagent-driven-development/SKILL.md` around lines 202 - 206, Update the text to resolve the inconsistency by stating that validators emit both TaskUpdate(status: "completed") for programmatic availability tracking and SendMessage for human-readable notification; specify that the team lead treats TaskUpdate as the canonical availability signal (use TaskUpdate events to decide whether to send a new review) while SendMessage is used only for human notifications; mention this aligns with validator-prompt.md and the implementation plan (Task 6) so the section referencing validator availability via SendMessage is changed to reflect both signals and the canonical role of TaskUpdate.
126-126:⚠️ Potential issue | 🟡 MinorInconsistency: validator availability signaling mechanism persists.
Line 126 states validators are "Declared idle/busy via TaskUpdate," but lines 203-205 state "Validators notify the team lead when their verdict is ready via
SendMessage. The team lead uses message receipt as the availability signal."This is the same inconsistency flagged in the past review comment on lines 202-206. The description needs to be unified across the file.
Recommendation: If validators use both
TaskUpdate(status: "completed")ANDSendMessage(as suggested by validator-prompt.md per the past review), update line 126 and lines 203-205 to reflect both mechanisms. Clarify which signal is canonical (likelyTaskUpdatefor programmatic tracking, withSendMessageas a human-readable notification).🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@skills/subagent-driven-development/SKILL.md` at line 126, Unify the availability signaling description for validator agents: update the "Validator agents (2)" line and the later paragraph that mentions "Validators notify the team lead..." so they consistently state that validators declare status via TaskUpdate(status: "completed") as the canonical, programmatic availability signal and also send a SendMessage as a human-readable notification (per validator-prompt.md); explicitly mark TaskUpdate as authoritative for workflow tracking and SendMessage as optional/notification to the team lead.
🧹 Nitpick comments (1)
skills/subagent-driven-development/SKILL.md (1)
128-128: Optional: Add language specifiers to fenced code blocks.Static analysis flags three fenced code blocks without language specifiers (lines 128, 191, 333). While these are templates/pseudo-code rather than executable code, adding
textas the language identifier would silence the linter warnings:-``` +```textThis is purely a stylistic improvement and does not affect functionality.
Also applies to: 191-191, 333-333
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@skills/subagent-driven-development/SKILL.md` at line 128, Three fenced code blocks use bare triple-backticks (``` ) causing linter warnings; update each opening fence to include the language specifier "text" (i.e., change ``` to ```text) for the three occurrences of empty-language code blocks so the linter stops flagging them and the markdown remains unchanged in content; search the SKILL.md for code fences that are just three backticks and replace their opening fence accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@skills/subagent-driven-development/SKILL.md`:
- Around line 202-206: Update the text to resolve the inconsistency by stating
that validators emit both TaskUpdate(status: "completed") for programmatic
availability tracking and SendMessage for human-readable notification; specify
that the team lead treats TaskUpdate as the canonical availability signal (use
TaskUpdate events to decide whether to send a new review) while SendMessage is
used only for human notifications; mention this aligns with validator-prompt.md
and the implementation plan (Task 6) so the section referencing validator
availability via SendMessage is changed to reflect both signals and the
canonical role of TaskUpdate.
- Line 126: Unify the availability signaling description for validator agents:
update the "Validator agents (2)" line and the later paragraph that mentions
"Validators notify the team lead..." so they consistently state that validators
declare status via TaskUpdate(status: "completed") as the canonical,
programmatic availability signal and also send a SendMessage as a human-readable
notification (per validator-prompt.md); explicitly mark TaskUpdate as
authoritative for workflow tracking and SendMessage as optional/notification to
the team lead.
---
Nitpick comments:
In `@skills/subagent-driven-development/SKILL.md`:
- Line 128: Three fenced code blocks use bare triple-backticks (``` ) causing
linter warnings; update each opening fence to include the language specifier
"text" (i.e., change ``` to ```text) for the three occurrences of empty-language
code blocks so the linter stops flagging them and the markdown remains unchanged
in content; search the SKILL.md for code fences that are just three backticks
and replace their opening fence accordingly.
|
Addressed all coderabbit review comments in commit 0a0a054: Comment 1 (validator rotation tradeoff, design.md:121): Added a rotation handoff protocol — before the outgoing validator is replaced, the team lead asks it for a dependency summary (cross-task relationships observed across its reviewed tasks). This summary is included in the first review request to the incoming validator, preserving the most critical cross-task knowledge across the rotation boundary. Applied in both design doc and SKILL.md. Comment 2 (Files to Change description, design.md:151): Corrected the Phase 2 entry — it now accurately describes creating a new Comment 3 (Task 4b fallback, impl.md:368): Added a prominent Comment 4 (TaskUpdate inconsistency, SKILL.md:206): Already resolved in commit 582927a (before this review was posted). |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
docs/plans/2026-02-28-persistent-validator-pool-implementation.md (2)
128-144: Consider adding language identifier to code block for clarity.The code block at line 128 showing the "Current" content could benefit from a language identifier (e.g.,
markdownortext) to improve readability and satisfy linters. This is a minor formatting improvement that would align with markdown best practices.📝 Proposed improvement
-Current (lines 58-60 inside template): -``` +Current (lines 58-60 inside template): +```text Report: - ✅ Spec compliant (if everything matches after code inspection) - ❌ Issues found: [list specifically what's missing or extra, with file:line references]🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/plans/2026-02-28-persistent-validator-pool-implementation.md` around lines 128 - 144, Update the triple-backtick code block that shows the "Current (lines 58-60 inside template):" snippet so it includes a language identifier (e.g., change ``` to ```text or ```markdown) to satisfy linters and improve readability; specifically edit the fenced block containing the "Report:" lines in the docs/plans template so the opening fence is annotated (look for the exact "Report:" snippet shown) and leave the block contents unchanged.
186-191: Consider adding language identifier to bash code block.The bash code block at line 187 could include the
bashlanguage identifier for syntax highlighting and linter compliance. This is a minor formatting improvement.📝 Proposed improvement
**Step 3: Check git log** -```bash +```bash git log --oneline -5Note: This may already be correct in your editor; if line 187 already shows `bash`, disregard this comment. </details> <details> <summary>🤖 Prompt for AI Agents</summary>Verify each finding against the current code and only fix it if needed.
In
@docs/plans/2026-02-28-persistent-validator-pool-implementation.mdaround
lines 186 - 191, Add the bash language identifier to the fenced code block
containing the git command so the block starting with the triple backticks and
containing "git log --oneline -5" is annotated as bash; locate the code fence in
the document (the block that currently reads ``` followed by "git log --oneline
-5") and change it to include the language token (bash) on the opening fence to
enable proper syntax highlighting and linter compliance.</details> </blockquote></details> </blockquote></details> <details> <summary>🤖 Prompt for all review comments with AI agents</summary>Verify each finding against the current code and only fix it if needed.
Inline comments:
In@skills/subagent-driven-development/SKILL.md:
- Line 126: The doc is inconsistent about how validator availability is tracked:
keep the validators' declaration "via TaskUpdate" in the "Validator agents (2)"
line and update the team-lead description (the paragraph that currently says
"Team lead tracks validator availability via message flow" and "uses message
receipt as the availability signal") to explicitly state that the Team lead
observes message flow for activity but relies on the validators' TaskUpdate
state (TaskUpdate) as the authoritative availability signal; ensure references
to "Validator agents (2)", "Team lead", and "TaskUpdate" are mentioned so the
workflow example remains consistent.
Nitpick comments:
In@docs/plans/2026-02-28-persistent-validator-pool-implementation.md:
- Around line 128-144: Update the triple-backtick code block that shows the
"Current (lines 58-60 inside template):" snippet so it includes a language
identifier (e.g., changetotext or ```markdown) to satisfy linters and
improve readability; specifically edit the fenced block containing the "Report:"
lines in the docs/plans template so the opening fence is annotated (look for the
exact "Report:" snippet shown) and leave the block contents unchanged.- Around line 186-191: Add the bash language identifier to the fenced code block
containing the git command so the block starting with the triple backticks and
containing "git log --oneline -5" is annotated as bash; locate the code fence in
the document (the block that currently reads ``` followed by "git log --oneline
-5") and change it to include the language token (bash) on the opening fence to
enable proper syntax highlighting and linter compliance.</details> --- <details> <summary>ℹ️ Review info</summary> **Configuration used**: Repository UI **Review profile**: CHILL **Plan**: Pro <details> <summary>📥 Commits</summary> Reviewing files that changed from the base of the PR and between fdec277854872b9195d6d8c4c5fc5d4382469796 and 0a0a054c424434d38edcfdf9cf3137e474fb9da6. </details> <details> <summary>📒 Files selected for processing (3)</summary> * `docs/plans/2026-02-28-persistent-validator-pool-design.md` * `docs/plans/2026-02-28-persistent-validator-pool-implementation.md` * `skills/subagent-driven-development/SKILL.md` </details> </details> <!-- This is an auto-generated comment by CodeRabbit for review status -->
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
skills/subagent-driven-development/SKILL.md (1)
203-203: Optional: vary sentence structure for improved readability.Three consecutive sentences begin with "Team lead" which slightly impacts flow. Consider varying the structure for better readability.
♻️ Proposed variation
-Validator agents (2) declare their state via TaskUpdate: `in_progress` when they receive a review request, `completed` when they return a verdict. Team lead relies on this TaskUpdate state as the authoritative availability signal. Team lead also observes message flow as a secondary activity indicator — a validator is treated as busy from the moment they receive a review request until their verdict arrives. Team lead does not send a second review to a validator before receiving their verdict for the first. +Validator agents (2) declare their state via TaskUpdate: `in_progress` when they receive a review request, `completed` when they return a verdict. The team lead relies on this TaskUpdate state as the authoritative availability signal, while also observing message flow as a secondary activity indicator — a validator is treated as busy from the moment they receive a review request until their verdict arrives. A second review is never sent to a validator before receiving their verdict for the first.As per static analysis hint: LanguageTool flagged three successive sentences beginning with "Team lead" (ENGLISH_WORD_REPEAT_BEGINNING_RULE).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@skills/subagent-driven-development/SKILL.md` at line 203, The three successive sentences starting with "Team lead" reduce readability; rewrite the passage that describes validator agents declaring state via TaskUpdate (`in_progress`, `completed`) and how the team lead uses that state and message flow to gate further requests so the sentences don't all begin with "Team lead" — e.g., start one sentence with the action ("The team lead relies on this TaskUpdate state..."), another with the observation ("They also observe message flow..."), or use passive/compound structures to vary rhythm while keeping the same meaning and preserving references to TaskUpdate, `in_progress`, `completed`, validator, and verdict.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@skills/subagent-driven-development/SKILL.md`:
- Line 8: Update the sentence "Execute plan by dispatching fresh subagent per
task, with combined spec+quality review after each. In team mode: persistent
validator agents eliminate per-review startup overhead and enable cross-task gap
detection." to clarify that the "combined spec+quality review after each"
applies only to Team Mode; explicitly state that Standard Mode performs two
separate passes (first a spec compliance review, then a code quality review).
Edit the SKILL.md phrasing around that sentence to mention "Team Mode: combined
spec+quality review..." and "Standard Mode: sequential spec compliance review
followed by code quality review" so readers are not misled.
---
Nitpick comments:
In `@skills/subagent-driven-development/SKILL.md`:
- Line 203: The three successive sentences starting with "Team lead" reduce
readability; rewrite the passage that describes validator agents declaring state
via TaskUpdate (`in_progress`, `completed`) and how the team lead uses that
state and message flow to gate further requests so the sentences don't all begin
with "Team lead" — e.g., start one sentence with the action ("The team lead
relies on this TaskUpdate state..."), another with the observation ("They also
observe message flow..."), or use passive/compound structures to vary rhythm
while keeping the same meaning and preserving references to TaskUpdate,
`in_progress`, `completed`, validator, and verdict.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (5)
skills/subagent-driven-development/SKILL.md (5)
48-48: Consider clarifying that this section describes Standard Mode.The section header "The Process" could be misinterpreted as applying to both Standard and Team modes. Since Team Mode has its own dedicated section (line 106), explicitly labeling this as "The Standard Mode Process" would improve clarity.
📝 Suggested clarification
-## The Process +## The Process (Standard Mode)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@skills/subagent-driven-development/SKILL.md` at line 48, Rename the ambiguous section header "The Process" to explicitly "The Standard Mode Process" in SKILL.md so readers understand it refers only to Standard Mode (contrast with the existing "Team Mode" section); update any nearby references or links that mention "The Process" to match the new header text to avoid broken anchors and ensure the Team Mode section remains distinct.
335-399: Consider adding language specifiers to fenced code blocks.The fenced code block at line 335 (Team Mode example workflow) doesn't have a language specifier. Consider adding
textto follow markdown best practices.📝 Suggested fix
-``` +```text You: I'm using Subagent-Driven Development (team mode) for this plan. [Pre-flight: "How much context do you have left? If below ~50%, run /compact before I proceed."](Apply to the opening fence tag of the code block)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@skills/subagent-driven-development/SKILL.md` around lines 335 - 399, The fenced code block containing the "You: I'm using Subagent-Driven Development (team mode) for this plan." Team Mode example should include a language specifier; change the opening fence from ``` to ```text (leave the contents and closing fence unchanged) so the block is marked as plain text in SKILL.md and improves markdown rendering.
203-203: Consider varying sentence structure for better readability.Three consecutive sentences begin with "Team lead," which slightly impacts readability. Consider rewording to vary the sentence structure.
✍️ Suggested rewording
-Validator agents (2) declare their state via TaskUpdate: `in_progress` when they receive a review request, `completed` when they return a verdict. Team lead relies on this TaskUpdate state as the authoritative availability signal. Team lead also observes message flow as a secondary activity indicator — a validator is treated as busy from the moment they receive a review request until their verdict arrives. Team lead does not send a second review to a validator before receiving their verdict for the first. +Validator agents (2) declare their state via TaskUpdate: `in_progress` when they receive a review request, `completed` when they return a verdict. The team lead relies on this TaskUpdate state as the authoritative availability signal, while also observing message flow as a secondary activity indicator. A validator is treated as busy from the moment they receive a review request until their verdict arrives, and the team lead does not send a second review to a validator before receiving their verdict for the first.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@skills/subagent-driven-development/SKILL.md` at line 203, The three consecutive sentences starting with "Team lead" in the paragraph describing Validator agents and TaskUpdate create a repetitive rhythm; rewrite them to vary sentence structure while preserving meaning: keep the phrases "Validator agents (2)", "TaskUpdate: `in_progress`", and "completed" and that the Team lead relies on TaskUpdate as the authoritative availability signal, but rephrase so one sentence explains TaskUpdate states, another explains the Team lead's reliance on that state, and a third describes observing message flow as a secondary indicator and the rule that a validator won’t receive a second review before their verdict—all while avoiding starting each sentence with "Team lead".
128-133: Consider adding language specifiers to fenced code blocks.The fenced code block at line 128 (team composition diagram) doesn't have a language specifier. While this doesn't affect functionality, adding
textas a language identifier follows markdown best practices and may improve rendering in some viewers.📝 Suggested fix
-``` +```text Team: - team-lead (you) - implementer-1, implementer-2, implementer-3 ← parallel, one per task group - validator-1, validator-2 ← persistent, combined spec+quality</details> <details> <summary>🤖 Prompt for AI Agents</summary>Verify each finding against the current code and only fix it if needed.
In
@skills/subagent-driven-development/SKILL.mdaround lines 128 - 133, Add a
language specifier to the fenced code block that contains the "Team:"
composition diagram in SKILL.md by changing the opening fence from ``` toentries like implementer-1, validator-1) is marked as plain text for better rendering and markdown tooling.
191-197: Consider adding language specifiers to fenced code blocks.The fenced code block at line 191 (review request template) doesn't have a language specifier. Consider adding
textormarkdownto improve rendering consistency.📝 Suggested fix
-``` +```text Task spec: [FULL TEXT of task requirements] Implementer report: [full report including test output, diff stat, files changed] BASE_SHA: [commit before this task] HEAD_SHA: [current HEAD after implementer's commit] SHA map (for dependency re-reads): [dict of prior task → their HEAD_SHA, e.g. {"task-1": "abc123", "task-2": "def456"}]</details> <details> <summary>🤖 Prompt for AI Agents</summary>Verify each finding against the current code and only fix it if needed.
In
@skills/subagent-driven-development/SKILL.mdaround lines 191 - 197, The
fenced code block in the review-request template (the Task spec / Implementer
report / BASE_SHA block) lacks a language tag; update the triple-backtick fence
to include a language specifier such astextormarkdown(e.g., changetotext) so the block renders consistently—edit the fenced block in SKILL.md
where the Task spec/Implementer report/BASE_SHA/HEAD_SHA/SHA map snippet appears
to add the specifier.</details> </blockquote></details> </blockquote></details> <details> <summary>🤖 Prompt for all review comments with AI agents</summary>Verify each finding against the current code and only fix it if needed.
Inline comments:
In@skills/subagent-driven-development/SKILL.md:
- Around line 203-206: The text incorrectly claims validators update TaskUpdate
to signal availability; remove references to TaskUpdate and instead state that
validators use SendMessage to notify the team lead that their verdict is ready
and that the team lead treats SendMessage as the authoritative availability
signal. Update the paragraph to say validators are treated as busy from receipt
of a review request until their SendMessage verdict arrives, and the team lead
will not send a second review to a validator before receiving that SendMessage;
ensure references to TaskUpdate are deleted and replace with SendMessage-based
signaling (referencing SendMessage and the team lead's routing behavior).
Nitpick comments:
In@skills/subagent-driven-development/SKILL.md:
- Line 48: Rename the ambiguous section header "The Process" to explicitly "The
Standard Mode Process" in SKILL.md so readers understand it refers only to
Standard Mode (contrast with the existing "Team Mode" section); update any
nearby references or links that mention "The Process" to match the new header
text to avoid broken anchors and ensure the Team Mode section remains distinct.- Around line 335-399: The fenced code block containing the "You: I'm using
Subagent-Driven Development (team mode) for this plan." Team Mode example should
include a language specifier; change the opening fence fromtotext
(leave the contents and closing fence unchanged) so the block is marked as plain
text in SKILL.md and improves markdown rendering.- Line 203: The three consecutive sentences starting with "Team lead" in the
paragraph describing Validator agents and TaskUpdate create a repetitive rhythm;
rewrite them to vary sentence structure while preserving meaning: keep the
phrases "Validator agents (2)", "TaskUpdate:in_progress", and "completed" and
that the Team lead relies on TaskUpdate as the authoritative availability
signal, but rephrase so one sentence explains TaskUpdate states, another
explains the Team lead's reliance on that state, and a third describes observing
message flow as a secondary indicator and the rule that a validator won’t
receive a second review before their verdict—all while avoiding starting each
sentence with "Team lead".- Around line 128-133: Add a language specifier to the fenced code block that
contains the "Team:" composition diagram in SKILL.md by changing the opening
fence fromtotext so the block (the lines starting with "Team:" and the
subsequent list entries like implementer-1, validator-1) is marked as plain text
for better rendering and markdown tooling.- Around line 191-197: The fenced code block in the review-request template (the
Task spec / Implementer report / BASE_SHA block) lacks a language tag; update
the triple-backtick fence to include a language specifier such astextor
markdown(e.g., changetotext) so the block renders consistently—edit
the fenced block in SKILL.md where the Task spec/Implementer
report/BASE_SHA/HEAD_SHA/SHA map snippet appears to add the specifier.</details> --- <details> <summary>ℹ️ Review info</summary> **Configuration used**: Repository UI **Review profile**: CHILL **Plan**: Pro <details> <summary>📥 Commits</summary> Reviewing files that changed from the base of the PR and between 141b990b8eb1aab769e38d6a2ae489fab6d8bd48 and 4ec9a5080784d9d23f3ec7a39c5d68d23121221f. </details> <details> <summary>📒 Files selected for processing (1)</summary> * `skills/subagent-driven-development/SKILL.md` </details> </details> <!-- This is an auto-generated comment by CodeRabbit for review status -->
Define three evidence types (command, citation, diff) that all other skills reference. Add rejection rule for missing evidence. Add re-review loop bound (max 3 cycles). Add state.yml integration for verification result persistence. Generalize "24 failure memories" reference. Update Common Failures and Agent delegation patterns to reference specific evidence types. Addresses Findings V, Y from skills audit. Inspired by PR obra#578 (@author) evidence-based review. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…rsistence Require command+diff evidence for task completion. Add TDD gate (RED+GREEN evidence before marking complete). Add controller anti-rationalization (never skip review because task seems simple). Add re-review loop bound (max 3 cycles, then escalate). Add pre-flight context check before each subagent dispatch. Add state.yml progress tracking for cross-session resume. Fix final reviewer scope to whole-feature (not just last task). Add token optimization note for directory-based plans. Add review tiering (light/standard/critical). Update example workflow to show evidence collection and state.yml updates. Addresses obra#485, obra#463, obra#291, obra#147, obra#87, PR obra#578, Findings D, K, V, W. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ates Support mixed QA+implementation waves (Wave 0 = QA writes tests, Wave N = implementation + QA writes tests for N+1). Add evidence requirements (command + diff) for all completion reports. Add state.yml wave tracking (plan.completed_tasks, plan.current_wave, worktree.implementers.*). Add model tiering (opus for lead/review, sonnet for implementers). Trim implementer prompt for project agents. Add worktree lifecycle documentation, re-review loop bound (max 3 cycles), pre-flight context check, and citation evidence requirement in reviewer prompts. Addresses obra#429, obra#464, PR obra#578, Findings D, P, Q, T, U, V. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Reject prose-only verdicts — require file:line citations per requirement. Add anti-skip enforcement in Red Flags. Add re-review loop bound (max 3 cycles, then escalate). Add security review tier for auth/payment/data tasks. Add per-implementer SHA guidance for team mode. Add uncommitted changes check (git status before dispatch). References canonical evidence format from verification-before-completion. Addresses obra#528, obra#463, obra#479, PR obra#560, PR obra#578, Findings T, V. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Define three evidence types (command, citation, diff) that all other skills reference. Add rejection rule for missing evidence. Add re-review loop bound (max 3 cycles). Add state.yml integration for verification result persistence. Generalize "24 failure memories" reference. Update Common Failures and Agent delegation patterns to reference specific evidence types. Addresses Findings V, Y from skills audit. Inspired by PR obra#578 (@author) evidence-based review. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…rsistence Require command+diff evidence for task completion. Add TDD gate (RED+GREEN evidence before marking complete). Add controller anti-rationalization (never skip review because task seems simple). Add re-review loop bound (max 3 cycles, then escalate). Add pre-flight context check before each subagent dispatch. Add state.yml progress tracking for cross-session resume. Fix final reviewer scope to whole-feature (not just last task). Add token optimization note for directory-based plans. Add review tiering (light/standard/critical). Update example workflow to show evidence collection and state.yml updates. Addresses obra#485, obra#463, obra#291, obra#147, obra#87, PR obra#578, Findings D, K, V, W. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ates Support mixed QA+implementation waves (Wave 0 = QA writes tests, Wave N = implementation + QA writes tests for N+1). Add evidence requirements (command + diff) for all completion reports. Add state.yml wave tracking (plan.completed_tasks, plan.current_wave, worktree.implementers.*). Add model tiering (opus for lead/review, sonnet for implementers). Trim implementer prompt for project agents. Add worktree lifecycle documentation, re-review loop bound (max 3 cycles), pre-flight context check, and citation evidence requirement in reviewer prompts. Addresses obra#429, obra#464, PR obra#578, Findings D, P, Q, T, U, V. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Reject prose-only verdicts — require file:line citations per requirement. Add anti-skip enforcement in Red Flags. Add re-review loop bound (max 3 cycles, then escalate). Add security review tier for auth/payment/data tasks. Add per-implementer SHA guidance for team mode. Add uncommitted changes check (git status before dispatch). References canonical evidence format from verification-before-completion. Addresses obra#528, obra#463, obra#479, PR obra#560, PR obra#578, Findings T, V. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… PR obra#578) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… PR obra#578) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
9bec260 to
72d572f
Compare
Rebase note + design decision: surfacing Standard vs Team ModeJust rebased on upstream main. One conflict in Upstream change: Restructured the execution handoff to be capability-based — if the harness has subagents, use This branch's change: Added Team Mode as an explicit option alongside Standard mode (the original fix that motivated this PR). Resolution: Kept upstream's capability-based structure as the outer shell, but gated the Standard vs Team Mode choice on Team Mode availability — since it's a beta feature not present in most clients, presenting the choice unconditionally would be misleading: The design rationale: upstream's capability-based gating is correct, but within the subagent-capable path we want to surface Team Mode when it's available — otherwise users never discover it. Gating on availability avoids presenting a choice that most clients can't act on. |
Define three evidence types (command, citation, diff) that all other skills reference. Add rejection rule for missing evidence. Add re-review loop bound (max 3 cycles). Add state.yml integration for verification result persistence. Generalize "24 failure memories" reference. Update Common Failures and Agent delegation patterns to reference specific evidence types. Addresses Findings V, Y from skills audit. Inspired by PR obra#578 (@author) evidence-based review. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ates Support mixed QA+implementation waves (Wave 0 = QA writes tests, Wave N = implementation + QA writes tests for N+1). Add evidence requirements (command + diff) for all completion reports. Add state.yml wave tracking (plan.completed_tasks, plan.current_wave, worktree.implementers.*). Add model tiering (opus for lead/review, sonnet for implementers). Trim implementer prompt for project agents. Add worktree lifecycle documentation, re-review loop bound (max 3 cycles), pre-flight context check, and citation evidence requirement in reviewer prompts. Addresses obra#429, obra#464, PR obra#578, Findings D, P, Q, T, U, V. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… PR obra#578) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…rsistence Require command+diff evidence for task completion. Add TDD gate (RED+GREEN evidence before marking complete). Add controller anti-rationalization (never skip review because task seems simple). Add re-review loop bound (max 3 cycles, then escalate). Add pre-flight context check before each subagent dispatch. Add state.yml progress tracking for cross-session resume. Fix final reviewer scope to whole-feature (not just last task). Add token optimization note for directory-based plans. Add review tiering (light/standard/critical). Update example workflow to show evidence collection and state.yml updates. Addresses obra#485, obra#463, obra#291, obra#147, obra#87, PR obra#578, Findings D, K, V, W. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Reject prose-only verdicts — require file:line citations per requirement. Add anti-skip enforcement in Red Flags. Add re-review loop bound (max 3 cycles, then escalate). Add security review tier for auth/payment/data tasks. Add per-implementer SHA guidance for team mode. Add uncommitted changes check (git status before dispatch). References canonical evidence format from verification-before-completion. Addresses obra#528, obra#463, obra#479, PR obra#560, PR obra#578, Findings T, V. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…e management
Comprehensive audit of all 16 skills with fixes and enhancements:
Skills improvements:
- verification-before-completion: canonical evidence format (command/citation/diff)
- using-git-worktrees: rework to layer on native EnterWorktree
- test-driven-development: add pipelined TDD mode and evidence gates
- agent-team-driven-development: add pipelined TDD and evidence gates
- dispatching-parallel-agents: add worktree isolation and autonomy
- composing-teams: add agent hierarchy and test-writer role
- receiving-code-review: add evidence demand and pedantry filter
- systematic-debugging: improve CSO, add fork and worktree support
- writing-skills: align with Anthropic guide, add versioning
- finishing-a-development-branch: fix worktree CWD and add state.yml
- requesting-code-review: require citation evidence from reviewers
- writing-plans: add directory-based plans and test expectations
- brainstorming: add verification gates, research step, assumption challenging
- executing-plans: add evidence gates and cold session resume
- subagent-driven-development: add evidence gates and progress persistence
- using-superpowers: trim red flags, fix subagent context gap
Cross-cutting fixes:
- Generalize "your human partner" to "the user" across all skills
- Restore Cursor/Codex/multi-platform support across skills
- Correct spec-reviewer agent type and remove bias language (CX-4, CX-18, CX-19)
- Align code-reviewer placeholder to {PLAN_OR_REQUIREMENTS} (CX-7)
- Add worktree env file guidance and fix hook bash compatibility (obra superpowers obraGH-521, obraGH-572, obraGH-584)
- Add integration section and sequential fallback to dispatching (CX-17, markelz0r)
- Add announce and cross-task dependency check to agent-team (CX-15, obra superpowers obraGH-578)
- Update stale skill references and verification links (CX-9)
- Address CodeRabbit review findings
Documentation:
- Skills audit brainstorm, design document, and implementation plan
- Skills audit v2 design, research, and implementation plan
- Upstream research: open issues/PRs, closed issues, forks survey, unmerged PRs
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…e management
Comprehensive audit of all 16 skills with fixes and enhancements:
Skills improvements:
- verification-before-completion: canonical evidence format (command/citation/diff)
- using-git-worktrees: rework to layer on native EnterWorktree
- test-driven-development: add pipelined TDD mode and evidence gates
- agent-team-driven-development: add pipelined TDD and evidence gates
- dispatching-parallel-agents: add worktree isolation and autonomy
- composing-teams: add agent hierarchy and test-writer role
- receiving-code-review: add evidence demand and pedantry filter
- systematic-debugging: improve CSO, add fork and worktree support
- writing-skills: align with Anthropic guide, add versioning
- finishing-a-development-branch: fix worktree CWD and add state.yml
- requesting-code-review: require citation evidence from reviewers
- writing-plans: add directory-based plans and test expectations
- brainstorming: add verification gates, research step, assumption challenging
- executing-plans: add evidence gates and cold session resume
- subagent-driven-development: add evidence gates and progress persistence
- using-superpowers: trim red flags, fix subagent context gap
Cross-cutting fixes:
- Generalize "your human partner" to "the user" across all skills
- Restore Cursor/Codex/multi-platform support across skills
- Correct spec-reviewer agent type and remove bias language (CX-4, CX-18, CX-19)
- Align code-reviewer placeholder to {PLAN_OR_REQUIREMENTS} (CX-7)
- Add worktree env file guidance and fix hook bash compatibility (obra superpowers obraGH-521, obraGH-572, obraGH-584)
- Add integration section and sequential fallback to dispatching (CX-17, markelz0r)
- Add announce and cross-task dependency check to agent-team (CX-15, obra superpowers obraGH-578)
- Update stale skill references and verification links (CX-9)
- Address CodeRabbit review findings
Documentation:
- Skills audit brainstorm, design document, and implementation plan
- Skills audit v2 design, research, and implementation plan
- Upstream research: open issues/PRs, closed issues, forks survey, unmerged PRs
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
9928621 to
b049f7e
Compare
Code reviewFound 3 issues:
superpowers/skills/subagent-driven-development/SKILL.md Lines 240 to 244 in b049f7e superpowers/skills/subagent-driven-development/SKILL.md Lines 276 to 278 in b049f7e superpowers/skills/subagent-driven-development/validator-prompt.md Lines 77 to 79 in b049f7e
superpowers/skills/subagent-driven-development/SKILL.md Lines 468 to 470 in b049f7e
Lines 444 to 446 in b049f7e |
13cd1ca to
4ba475c
Compare
|
This is ready to merge after using on a large number of internal projects. It is by my estimation 2-5x faster than subagent mode and appears to be easier on tokens as well, as the team members persist and do not need to re-read files for context. |
…olution Merges Phase 1 (mandatory evidence + file:line citations) and Phase 2 (persistent validator pool, team mode) from STRML's PR obra#578. Conflict in writing-plans/SKILL.md resolved: kept both execution options (Subagent-Driven + Inline Execution) while adding Team Mode detection within the subagent path. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Implementers must now include full test output, git diff --stat, and file list before reporting complete. Reports without evidence will be rejected by the controller.
Spec reviewer must now produce per-requirement file:line citations with code excerpts. Prose-only approvals are rejected. This makes phantom completion immediately detectable from the evidence trail.
Add TeamCreate branch to When to Use diagram. Update description and core principle to reflect team mode with persistent validators.
Combined spec+quality review in a single SendMessage payload. Includes SHA map for cross-task dependency re-reads, mandatory file:line citations per requirement, and self-reporting via TaskUpdate.
Add Team Mode section to subagent-driven-development: - Pre-flight context compaction recommendation before TeamCreate - 2 persistent validator agents (combined spec+quality review) - SHA map in review requests for cross-task dependency re-reads - 3-cycle review cap with escalation - Context poisoning mitigations: re-read rule, bounded re-open, cascade handling - Validator rotation for plans >8 tasks with explicit tradeoff disclosure - Validator failure recovery (60s timeout -> fresh subagent fallback)
…prompt Remove code-quality-reviewer-prompt.md (superseded by validator-prompt.md in team mode; standard mode uses requesting-code-review skill). Update Red Flags to cover both modes including team-mode-specific rules. Require file:line citations before accepting any verdict. Add compaction reminder to team mode never list.
Show team mode orchestration with parallel implementers, SHA map maintenance, and validator round-trip. Update cost analysis and add context compaction note to team lead overhead.
Align with revised task instructions: - Prompt Templates: list code-quality-reviewer-prompt.md explicitly for standard mode - Pre-Flight: use concise single-paragraph format - Key Constraints: add 9th constraint (never skip compaction) - Team Lifecycle: simplified numbering (6 steps instead of 7) - Example Workflow: update standard example inline with validator pattern - Cost section: match revised format with per-mode breakdown
…Templates code-quality-reviewer-prompt.md is deleted; standard mode now references superpowers:requesting-code-review directly (which the file called anyway).
Standard mode example workflow was incorrectly updated to show combined validator dispatch. Standard mode still uses separate spec and quality reviewers. Team mode example (separate section) correctly shows validator.
- Quality Gates: distinguish standard (two-stage) vs team (single-pass) review - Validator Idle Tracking: remove TaskUpdate-based availability (task ID never defined); replace with message-based flow (verdict receipt = available) - Red Flags "If reviewer finds issues": add mode qualifiers (standard: same subagent; team: SendMessage to implementer + route to validator) - validator-prompt.md: remove TaskUpdate(completed) call (no validator task exists to update); replace with SendMessage notification instruction - Process digraph and Key Constraints: update to match message-based tracking
Previous guidance was passive ("if 20+ turns, consider compacting") and
easily skipped. A team lead with near-full context will fail mid-plan
with no recovery path.
Now: before TeamCreate, the agent must ask the user how much context
remains and wait for /compact before proceeding if below ~50%. This is
a hard gate, not a suggestion. Updated Pre-Flight, Key Constraints,
Team Lifecycle, Red Flags, example workflow, and Cost section.
- design doc "Files to Change": correct Phase 2 entry to show validator-prompt.md (new file) instead of extending spec-reviewer-prompt.md - design doc rotation: add handoff protocol — outgoing validator sends dependency summary to team lead before being replaced - SKILL.md rotation: same handoff protocol added - implementation plan Task 4b: add prominent SKIPPED notice (SDK memory was retained; persistent topology was implemented instead) The TaskUpdate inconsistency flagged in SKILL.md:206 was already resolved in commit 582927a.
…signal Team lead now explicitly checks TaskUpdate state before routing reviews, with message flow as a secondary activity indicator. Resolves inconsistency between line 126 (validators declare idle/busy via TaskUpdate) and the Validator Idle State Tracking section (which only described message flow).
Standard mode performs sequential two-pass review (spec then quality); team mode runs combined spec+quality in a single validator pass. The opening sentence incorrectly implied combined review for all modes.
After compaction, TaskList persists as external storage — team lead should read it immediately rather than assuming state was lost and pinging all implementers. Blind broadcasts are unnecessary and expensive.
…dator availability signal Validators signal availability via SendMessage verdict only. TaskUpdate is not used for availability tracking. Removes inconsistency introduced in previous review pass.
Mike reported a bug where the coordinator bypassed verification by doing implementation work itself instead of delegating to implementer subagents. Adds an explicit constraint and Red Flag entry to prevent this.
Execution handoff previously offered only 2 options (standard, parallel session), leaving team mode unreachable. subagent-driven-development gates team mode on user opt-in, but the opt-in question was never asked. Now presents 3 explicit choices with descriptions to help users decide.
…xt launch Team mode now synthesizes a complete self-contained execution prompt and presents it via EnterPlanMode. User selects 'clear context and auto-accept edits' to start team execution with clean context — no /compact needed.
…rite pattern SendMessage is the primary signal during normal operation; TaskUpdate provides durability for compaction recovery. Team lead owns idle→busy (in_progress on assign), validator owns busy→idle (completed after verdict). Also fixes stale compaction reference in advantages section and dangling code-quality-reviewer-prompt.md reference in release notes.
4ba475c to
faed54f
Compare
Background
This PR addresses two confirmed failure modes in subagent-driven-development, plus a third discovered during testing.
Problem 1: Phantom completion
The most common complaint: a plan finishes, everything shows green, and the implementation is partially or entirely hallucinated. Tasks are "complete" but the code does not actually do what was asked.
The root cause is structural. The current spec reviewer prompt asks the reviewer to "read the actual code" and "do not trust the report" -- but there is nothing that forces it to produce verifiable output. A reviewer can (and does) write a plausible-sounding prose narrative, tick the boxes, and return passed without ever opening a file.
Why citations specifically? A file:line citation is the smallest piece of evidence that is both cheap to produce and hard to fake convincingly. Writing "requirement X is implemented at src/validators.py:42 -- def validate_email(addr): return bool(EMAIL_RE.match(addr))" requires either (a) having read that line of code, or (b) generating a plausible-looking but fabricated citation. Fabricated citations fail immediately on audit -- the file does not exist, the line number is wrong, or the excerpt does not implement the requirement.
Problem 2: Slow validation on large plans
Team mode parallelizes implementers, but validation is still sequential per task: spec review, then quality review, each as a fresh subagent. Each fresh subagent pays ~30-60 seconds of startup overhead and then re-reads the codebase from scratch. On a 10-task plan, that is 20 fresh review invocations, each re-discovering the same project structure.
There is a second problem: fresh subagents have no memory of prior tasks. If task 5 is supposed to use a function that was supposed to be implemented in task 2 but was not, the task 5 reviewer has no way to know.
Why persistent validators? A validator that stays alive across tasks accumulates codebase context. Combined with the SHA map (team lead passes prior task SHAs in every review request, validator re-reads the relevant diffs from git rather than relying on memory), this gives validators a reliable mechanism to check dependencies without trusting their own recall. The combined spec+quality pass also eliminates the spec->quality sequential bottleneck. One agent call per task instead of two.
What this PR does
Phase 1: Mandatory evidence (both modes, no architectural change)
implementer-prompt.md: New mandatory evidence section. Before reporting complete, implementers must paste full test command output (verbatim), git diff --stat output (verbatim), list of changed files with line counts, and behavior demonstration output if the task requires it. Reports without this section are explicitly rejected by the controller.
spec-reviewer-prompt.md: New mandatory citation section. For each requirement in the task spec, the reviewer must provide the specific file:line and a 2-3 line code excerpt. Requirements with no citation found are marked MISSING. A passed verdict without the per-requirement citation table is invalid and must be rejected by the controller.
Phase 2: Persistent validator pool (Team Mode only, requires beta teams API)
BETA DEPENDENCY: Phase 2 requires Claude Code experimental agent teams feature (TeamCreate, SendMessage, TaskList). Phase 1 ships independently and works without it.
validator-prompt.md (new file): A SendMessage payload template for persistent validator team members. Contains both spec compliance (with file:line citations) and code quality in a single pass. Includes SHA map field for cross-task dependency re-reads, mandatory citation format, verdict structure, and re-review protocol.
SKILL.md: Full Team Mode section covering:
code-quality-reviewer-prompt.md: Deleted. Its role is absorbed by validator-prompt.md in team mode; in standard mode, superpowers:requesting-code-review is referenced directly.
Problem 3: Context overload (found during first use)
When a user invokes team mode after a long brainstorming or planning session, the team lead starts with most of its context already consumed. As the plan executes, each task report and verdict adds more. A team lead that hits the context limit mid-plan stops orchestrating.
The original fix used a mandatory interactive /compact check: before TeamCreate, the agent would ask the user how much context they had left and wait for confirmation before proceeding. This worked but was slow and wasteful -- compaction summarizes accumulated noise rather than eliminating it.
Revised fix: The pre-flight now uses Claude Code's built-in "clear context and auto-accept edits" dialog instead. The agent synthesizes a complete self-contained execution prompt embedding the full plan, all task requirements, architecture context, team composition, and initial assignments -- everything the fresh session needs. It presents this via EnterPlanMode. The user selects option 1 (Shift+Tab) to clear context and begin execution. The team lead starts with zero accumulated context, not summarized context. No /compact wait, no back-and-forth. One keypress.
writing-plans: Team Mode surfacing + beta gating
Rebased on upstream main, which restructured writing-plans execution handoff to be capability-based (subagents vs. no subagents). Within the subagent-capable path, this PR adds Team Mode as a conditional option -- only offered when Team Mode is actually available, since it's a beta feature not present in most clients. When Team Mode is unavailable, Standard mode is used directly with no spurious choice presented.
SDK memory validation
Before shipping Phase 2, we validated that Claude Code team members retain conversation context between SendMessage calls. Test: spawn agent, send "remember the number 42", send separate message "what number did I ask you to remember?", agent recalled correctly. Result: RETAINED (tested 2026-02-28). The persistent validator topology depends on this.
Design doc
docs/plans/2026-02-28-persistent-validator-pool-design.md in this branch. Includes the full problem analysis, both phases, the SDK validation gate, context poisoning mitigations with their explicit tradeoffs, and success criteria. The design went through 4 rounds of iterative review before implementation.
Test plan
Phase 1 (standard and team mode):
Phase 2 (team mode, requires beta teams feature):
Pre-flight clean-context launch: