ai-code-review v2: subagent split, embedded Python CLI, deterministic post-steps#2
Conversation
…istic post-steps
Adopt patterns from the hellohippo reference implementation while keeping
OpenCode + Bedrock:
- Move all deterministic logic (config sanitization, pre-compute, comment
operations, post-steps) from bash into one embedded stdlib-only Python CLI
- Split the monolithic prompt into an orchestrator plus OpenCode subagents:
comment-categorizer and comment-lifecycle run on a cheap configurable model
(default nvidia.nemotron-super-3-120b); review dimensions are capped at 2-3
- Structure findings as {file, line, message, dimension, severity} with a
severity filter, cross-file consolidation, and a max_comments cap (overflow
goes to the summary comment)
- Track ALL thread replies (bot + human) so human "false positive" / "won't
fix" feedback is honored instead of re-litigated
- Resolve review threads for real: deterministic post-step matches bot
"Resolved ..." replies and resolves threads via GraphQL; a second post-step
dedupes summary comments
- Split diffs per file and assemble one review-context.md so agents read a
single pre-built file instead of hunting via API calls
- Optional BORIS infra context: boris_mcp_url input installs the bmcp CLI
(pinned release + SHA256, SigV4 auth reuses the Bedrock OIDC role)
- Expand .github/ai-review.yml repo config: model, max_turns, max_comments,
additional_dimensions (all sanitized against PR-branch injection)
- Drop the auto-approve step and the vestigial review_plugin inputs
Design and decision log in HANDOFF.md.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Ten ranked findings from a multi-agent review of the v2 rewrite (top five marked ship-blockers: NO_ISSUES over-deletion, unanchored RESOLVED_RE, human-unresolve fighting, null-user crash, scalar additional_dimensions), plus verified lower-priority items and one refuted suggestion. Fixes not yet applied. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
- Narrow NO_ISSUES so unrelated bot comments are never deleted (finding 1) - Replace phrase-matching with a hidden <!-- ai-review-resolve --> marker prepended by resolve-reply; resolve-threads matches only the marker and yields when a human commented after it (findings 2, 3) - Tolerate null comment users from deleted/ghost accounts (finding 4) - Accept scalar additional_dimensions instead of iterating chars (finding 5) - Exit nonzero on mutation failure and restore the lifecycle fallback chain reply -> post-inline -> summary observations (finding 6) - Fall back to literal text when an @-prefixed body is not a file (finding 7) - Honor configured max_comments: 0 (finding 8) - Diff from the merge base to match pulls/files API semantics (finding 9) - Cap per-file (20k) and total (250k) diff bytes in review-context.md with pointers to the on-disk diffs (finding 10) - Cleanups: drop dead max_turns plumbing, bash-heredoc note for the write-disabled lifecycle agent, 400-char reply truncation for dismissal detection, generated-only check moved before expensive precompute work Validated: YAML parses, embedded Python byte-compiles, 29 functional checks pass against the extracted CLI with mocked gh/yq/git. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Working notes for the v2 redesign; superseded by the final implementation. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Verified against the new release: run flags used here (--title, -m, --print-logs, --log-level) unchanged; permission defaults still allow bash in-worktree without --auto; agent definitions move to the now documented .opencode/agents/ directory (the singular path still loads but is legacy). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…llers The org-wide secret holding the Bedrock OIDC role is named DEVELOPMENT_ACCOUNT_ROLE_ARN; the reusable workflow keeps its BEDROCK_ROLE_ARN secret interface and callers map the name. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
AI Code Review Summary
Additional observations
This summary was automatically generated by the AI code review workflow. |
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
- Validate inputs.subagent_model against the model regex before it is
substituted into agent frontmatter (defense in depth; callers are
org-controlled)
- Substitute ${SUBAGENT_MODEL} with envsubst restricted to that one
variable, at heredoc write time — replaces the perl -pi pass; the
value is inserted as literal data either way
- read_body: an argument that is a bare path to an existing file under
the work dir is read as a file with a warning — the first live run
showed the lifecycle agent passing the summary file path without '@',
which posted the literal path as the summary comment
- Lifecycle prompt: state explicitly that file contents require the '@'
prefix
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
bmcp install all --scope project appends BORIS usage guidance (with the synced tool list) to AGENTS.md and friends inside markers; OpenCode reads AGENTS.md at session start, so agents discover bmcp without relying on the orchestrator prompt alone. Runs after bmcp sync, soft-fails like the other bmcp steps. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
bmcp v0.3.0 adds OpenCode harness support: `bmcp install opencode --scope project` writes ./BORIS.md (usage + live tool catalog) and a managed block in ./AGENTS.md, which OpenCode loads as project rules for the orchestrator and every subagent. This replaces the hand-written usage blurb in the orchestrator prompt with the real synced catalog. Install runs after precompute, so the managed block never leaks into the assembled review rules. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
`workflows` -> `.github/workflows` is a convenience symlink; AGENTS.md documents it so agents don't mistake it for a duplicate directory. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Thread resolution: all resolveReviewThread mutations failed because
GitHub requires the App to hold Contents: Read and write for that
mutation ("Resource not accessible by integration") — pull_requests
write is not enough. Document the required permission in the README
and include the GraphQL error in the resolve-threads warning so the
next failure is diagnosable from the run log.
Summary bodies: the lifecycle agent passed a whole multi-line summary
as one line of literal \n escapes, which bash does not expand, so the
comment rendered raw. read_body now repairs the unambiguous case
(3+ literal \n, zero real newlines) with a warning, and the agent
rules forbid literal \n escapes explicitly.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Gemma 4 is not available on Bedrock (checked via BORIS in us-east-1, us-west-2, eu-west-1); google.gemma-3-27b-it is the largest Gemma. The review summary now ends with a fixed note pointing AI coding agents at the unresolved inline review threads, so agents acting on the PR don't treat the summary as the full findings list. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Gemma 4 (31B / 26B-A4B / E2B) launched on Bedrock in June 2026 but is served only through the bedrock-mantle endpoint (openai/v1/responses path), which the OpenCode amazon-bedrock provider does not reach; it does not appear in the classic list_foundation_models catalog. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…mand The lifecycle agent was the slowest phase (cheap model, one tool call at a time) and the least reliable. Judgment moves up, execution moves down: the orchestrator — which already has findings, existing comments, and human replies in context — writes one comment-plan.json; the new apply-plan command executes it in code: validates every id against the categorizer output, posts/replies with the reply -> inline -> observations fallback, resolve-replies stale + fixed threads (skipping already-marked ones), deletes no_issues comments, and renders the summary from a fixed template (agent note included, no more literal \n risk). Partial failures are demoted to observations and don't fail the command; only a failed summary upsert does. Also adds timing instrumentation: every CLI invocation logs UTC time + duration to stdout and .ai-review-work/timing.log, and a final "Report timings" step prints the log plus work-file mtimes so the time spent in each phase inside the OpenCode step is visible per run. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
| body = SUMMARY_MARKER + "\n" + body | ||
| summary_file = WORK_DIR / "summary-comment-id.txt" | ||
| summary_id = summary_file.read_text().strip() if summary_file.is_file() else "" | ||
| if summary_id: |
There was a problem hiding this comment.
cmd_upsert_summary has a logic bug: if summary_id exists and the PATCH request fails, the code falls through to CREATE at line 479, creating a duplicate summary comment instead of returning False. The CREATE should only execute when summary_id was empty (first-time summary), not on PATCH failure. Recommend: move the CREATE block inside an 'else' branch after the 'if summary_id:' block, or add 'return False' after logging a PATCH failure that had a non-empty summary_id.
Addresses the reviewer finding on upsert-summary: a failed PATCH fell through to creating a new summary comment, duplicating it on transient errors (dedupe-summaries would mop up, but avoid the churn). Creation now only happens when the PATCH failed with HTTP 404 — the existing comment was deleted — otherwise the command fails cleanly. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
| query($owner: String!, $name: String!, $pr: Int!) { | ||
| repository(owner: $owner, name: $name) { | ||
| pullRequest(number: $pr) { | ||
| reviewThreads(first: 100) { |
There was a problem hiding this comment.
THREADS_QUERY fetches only first 100 threads and 50 comments per thread. PRs with >100 threads or threads with >50 comments will have unresolved threads silently left behind. The warning at line 664 only triggers when exactly 100 threads are returned, but pagination could return 100, 200, etc. Consider implementing proper pagination or documenting this limitation.
| body = SUMMARY_MARKER + "\n" + body | ||
| summary_file = WORK_DIR / "summary-comment-id.txt" | ||
| summary_id = summary_file.read_text().strip() if summary_file.is_file() else "" | ||
| if summary_id: |
There was a problem hiding this comment.
cmd_upsert_summary has a logic bug: when summary_id exists and the PATCH request fails with a non-404 error (e.g., rate limit, network issue), the code falls through to line 488 creating a new comment, resulting in duplicate summaries. The 404 check at line 485 only prevents fallthrough for deleted comments. Should return False on any non-404 failure.
…eads Observed live: the orchestrator's plan posted a new root comment at a file+line that already had an active bot thread, opening a duplicate thread. The same-line dedup is now a code guarantee, not a prompt rule. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Summary
Rewrite of the reusable
ai-code-reviewworkflow (OpenCode + Amazon Bedrock), plus hardening fixes from a follow-up code review of the rewrite.Architecture (v2)
subagent_model, defaultnvidia.nemotron-super-3-120b):comment-categorizer, which classifies existing bot comments.comment-plan.json(what to post/reply/resolve, honoring human "false positive"/"won't fix" feedback), and the CLI'sapply-plancommand executes it deterministically — validates every id against the categorizer output, posts with a reply → new-inline → summary-observations fallback so findings are never dropped, resolve-replies stale/fixed threads (skipping already-marked ones), deletesno_issuesleftovers, and renders the summary comment from a fixed template (including a note pointing AI coding agents at the inline threads). This replaced acomment-lifecycleagent that was the slowest and least reliable phase..ai-review-work/timing.log; a final workflow step prints it with work-file mtimes so per-phase time inside the OpenCode step is visible.gh/yq) replaces all bash pre-compute logic: config sanitization, per-file diff splitting, comment/thread pre-fetch, rules-file discovery (AGENTS.md/CLAUDE.md incl. per-dir), generated-only skip check, and the agent-facing comment operations.<!-- ai-review-resolve -->marker and the post-step resolves marker-tagged threads via GraphQL — skipping any thread where a human commented after the marker, so disputes are never buried.max_commentscap (default 10; overflow goes to the summary).boris_mcp_url(org-controlled caller input, never repo config) installs thebmcpCLI from a pinned, checksum-verified release (v0.3.0, which adds OpenCode harness support); review subagents get read-only live AWS context for IaC changes. Auth reuses the existing Bedrock OIDC role.bmcp install opencode --scope projectwrites./BORIS.mdand a managed block in./AGENTS.mdinto the runner's worktree, so the orchestrator and every subagent pick up the live tool catalog through OpenCode project rules instead of a hand-written prompt blurb. It runs after precompute, so the managed block never leaks into the assembled review rules.Behavior changes
review_plugin,plugin_marketplace_repo,max_turns(was dead plumbing).Hardening (from review of the rewrite)
userfields (deleted/ghost accounts) tolerated everywhere.additional_dimensions,max_comments: 0, injection-shaped values.apply-planfalls back (reply → new inline comment → summary observations) instead of silently dropping a finding.review-context.mdcaps inlined diffs (20k/file, 250k total) with pointers to on-disk per-file diffs.Validation
gh/yq/git(config sanitization, resolve-thread marker/dispute/null-author matrix, null-user pre-compute, context truncation, exit codes).apply-planfunctionally tested against the extracted CLI with a mockedgh: id validation, fallback demotion, marker skipping, no_issues cleanup, template summary, exit codes.resolveReviewThread(documented in README), and the old lifecycle agent posted a literal-\nsummary once (now impossible — the summary is code-rendered).🤖 Generated with Claude Code