Add opt-in agent self-approval gate#166
Conversation
|
@sepo-agent /orchestrate there's a merge conflict to main after merging #162 can you check and fix |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
| claude_oauth_token: ${{ secrets.CLAUDE_CODE_OAUTH_TOKEN }} | ||
| permission_mode: approve-all | ||
| prompt: agent-self-approve | ||
| route: agent-self-approve |
There was a problem hiding this comment.
run.ts validates each run-agent-task envelope against VALID_ROUTES, but this PR adds the prompt mapping without adding agent-self-approve to .agent/src/envelope.ts. With this route, the self-approval workflow will fail envelope validation before the prompt runs. Please add the route to the allowlist and cover it in validateEnvelope tests.
| github_token: ${{ steps.auth.outputs.token }} | ||
| openai_api_key: ${{ secrets.OPENAI_API_KEY }} | ||
| claude_oauth_token: ${{ secrets.CLAUDE_CODE_OAUTH_TOKEN }} | ||
| permission_mode: approve-all |
There was a problem hiding this comment.
The approval decision is supposed to be enforced only by deterministic post-processing, but the inspection agent runs with permission_mode: approve-all while receiving a write-capable GitHub token. That gives a prompt-injected or mistaken agent path to call gh and approve/comment before resolve-self-approve.js checks the verdict and head SHA. Please run this step with read-only tool permissions or otherwise withhold write-capable GitHub auth until the deterministic approval step.
| }; | ||
| } | ||
|
|
||
| if (inspectedHeadSha && inspectedHeadSha !== expectedHeadSha) { |
There was a problem hiding this comment.
inspected_head_sha is optional in practice here because a blank value skips this check and an APPROVE verdict can still set shouldApprove: true. The prompt makes that field part of the structured contract, and the safety guarantee depends on proving which head the agent inspected. Please reject blank inspected head SHA for approvals and add a regression test.
| TARGET_NUMBER: ${{ inputs.pr_number }} | ||
| run: node .agent/dist/cli/resolve-self-approve.js | ||
|
|
||
| - name: Post self-approval status |
There was a problem hiding this comment.
If resolve-self-approve.js fails while submitting the approval, this status step is skipped because the condition lacks always(), so the PR gets no visible stop or failure comment. Please make the status/artifact path run on result-step failure, or catch submission errors and write a failed status body from the resolver.
| name: Agent / Self Approve | ||
|
|
||
| on: | ||
| workflow_dispatch: |
There was a problem hiding this comment.
Because this workflow is directly workflow_dispatch-able, an enabled repo can run self-approval on any open PR without proving it came from the intended review -> SHIP handoff. Please have preflight require trusted orchestration context or deterministically verify the latest trusted review/rubrics signal before approval.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
| id: prepare | ||
| env: | ||
| AGENT_ALLOW_SELF_APPROVE: ${{ vars.AGENT_ALLOW_SELF_APPROVE || 'false' }} | ||
| GH_TOKEN: ${{ github.token }} |
There was a problem hiding this comment.
This preflight uses github.token to determine the trusted actor, but review synthesis and rubrics comments are posted with the resolved App/PAT token in agent-review.yml. In the normal Sepo App setup this compares github-actions[bot] to sepo-agent-app, so self-approval always blocks. Resolve GitHub auth before prepare and use that token for provenance reads, while still passing only the read-only token to the agent step.
| }; | ||
| } | ||
|
|
||
| if (latest.kind === "review synthesis" && latest.conclusion === "ship") { |
There was a problem hiding this comment.
This trusts a SHIP comment without proving it applies to expectedHeadSha. Since issue comments are not commit-scoped, a stale SHIP from an older PR head can authorize approval of new commits. Please pass or persist the reviewed head SHA from the review synthesis/handoff and require it to match the current expected head before approval. A rubrics PASS should not be sufficient by itself to replace a current SHIP review synthesis.
| run: node .agent/dist/cli/post-response.js | ||
|
|
||
| - uses: actions/upload-artifact@v4 | ||
| if: steps.prepare.outputs.should_run == 'true' |
There was a problem hiding this comment.
This artifact upload is also skipped when resolve-self-approve.js fails because Actions applies an implicit success() to steps without always(). That drops the response and session artifacts exactly when they are needed to debug approval API rejection or resolver failure. Please run this on failure too, e.g. always() && steps.prepare.outputs.should_run == 'true'.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
| let reviewedHeadSha = ""; | ||
| if (target === "pr" && repo && targetNumber > 0) { | ||
| try { | ||
| reviewedHeadSha = fetchPrMeta(targetNumber, repo).headOid; |
There was a problem hiding this comment.
This fetches the PR head when the synthesis comment is posted, not when the reviewer lanes actually inspected the diff. If the PR head changes during review or synthesis, a stale SHIP synthesis can be stamped with the new head SHA and later authorize self-approval of unreviewed commits. Please capture the reviewed head before the review run, pass that captured SHA through the review/synthesis path, and avoid emitting a trusted current-head marker if the PR changed during the run.
| body: string, | ||
| ): "created" | "updated" { | ||
| const existing = fetchIssueCommentRecords(prNumber, repo) | ||
| .filter((comment) => comment.id && comment.body.includes(marker)) |
There was a problem hiding this comment.
Please filter marker matches to comments authored by the authenticated Sepo actor before updating. As written, any PR commenter can include <!-- sepo-agent-self-approval -->, and the next self-approval status run will try to PATCH that comment instead of creating or updating Sepo’s own status comment.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
|
Sepo pushed fixes for this PR. Branch:
|
|
Sepo is dispatching follow-up automation.
Reason: agent planner selected review: fix-pr succeeded on PR #166 and pushed targeted fixes for the latest review synthesis findings; the PR is open, CLEAN, and checks are passing, so one more review should validate the new head before stopping. |
Rubrics Review
Notes
Findings
Final Rubric VerdictPASS |
AI Review Synthesis
Summary of PR/Issue
ReviewCodex marked the PR
Progress
Issue DetailsMerge conflict with main after #162Cause: GitHub reports PR #166 as Candidate solutions: Rebase the PR branch onto Comments: Codex treated this as blocking; Claude called it mechanical but still identified it as unresolved. `agent-self-approve` is not a valid runtime routeCause: Candidate solutions: Add Agent inspection step can mutate GitHub before deterministic checksCause: Candidate solutions: Run the agent inspection step with read-only permissions and reserve the write-capable token for Missing `inspected_head_sha` still allows approvalCause: Candidate solutions: Reject blank Comments: Claude considered this lower risk because current-head still matches expected-head; Codex treated it as blocking because it weakens the stated inspected-head contract. Self-approval can bypass the review-to-SHIP handoffCause: Candidate solutions: Require trusted orchestrator context, or deterministically verify a latest trusted Failure handling can hide or misreport approval-state errorsCause: Candidate solutions: Surface metadata read failures explicitly, catch approval submission failures into a failed status body, and make status/artifact steps run on resolver failure with focused tests. Self-approval status comments are not idempotentCause: Candidate solutions: Reuse or update the existing Comments: Already covered clearly in the posted rubrics review. Recommended Next Step
Final Verdict
Action Items
|
|
@sepo-agent /answer I think this PR becomes quite complicated compared to the original version /vision. Can you summarize what's changed in this process -- what are the new concerns addressed here? |
|
Yes. The PR grew because “agent can approve a PR” is a privileged write path, so the implementation had to defend several trust boundaries. Original The new concerns added during the PR process were:
So the short version: the original idea was “add a conservative final approval gate.” The added complexity is mostly about making sure that gate cannot approve the wrong commit, trust stale review output, use the wrong actor token, let the inspecting agent mutate GitHub directly, silently fail, or overwrite untrusted comments. |
|
@sepo-agent /answer Ok let's address the issues one by one
For this, I worry the requirement might be too strict. This might lead to infinite loops? Alternatively the approval agent can decide to start a new implementation with slightly adjusted decisions? But this may pose a bigger issue for how to change the architecture... |
|
I think your concern is valid, but I’d separate two cases:
So I would not loosen the approval condition, but I would make the orchestration state machine more expressive. A good model would be:
That avoids making the self-approval agent start implementation directly. I’d avoid giving self-approval a free-form “start new implementation with adjusted decisions” power, because then the final approval gate becomes a second orchestrator with approval authority. That mixes two roles that should stay separate:
For loop risk: yes,
So my revised stance: strict current-head |
|
@sepo-agent /answer
What is this? |
|
That item is about making self-approval failures visible on the PR, instead of only showing up as a failed GitHub Actions log. Concrete examples:
Without explicit handling, the workflow could fail before posting anything useful. Then the PR would just have a failed Actions run, and the user would need to dig through logs to understand why no approval happened. So the PR added two behaviors:
This is not central to the product idea. It is operational hardening: if self-approval does not approve, the PR should say why, and the workflow should preserve enough evidence to debug it. |
@sepo-agent /answer ok yeah that's a good addition. Can you finally summarize the system architecture and then we should jointly decide if we should (1) start a new PR since there's a lot of patches on the initial version or (2) split this change into smaller PR and then having orchestrator to run them individually? |
|
System architecture as it stands:
On PR strategy: I would choose option 2: split into smaller stacked PRs, not start over from scratch. Starting a new PR gives a cleaner conversation, but it does not reduce the review surface unless we also split the code. The current patch has become a bundle of several separable concerns, and this is a security/trust-boundary feature, so smaller review units are worth it. Suggested split:
I would not ask the orchestrator to independently invent all four. Better: use the current PR as the reference implementation, then have orchestrator/fix-pr produce stacked PRs with explicit ownership for each slice. That preserves the work already done while making review much easier. |
@sepo-agent /answer Can you create a new issue detailing this? |
Summary
agent-self-approveas an internal orchestrator-dispatched PR approval gate gated byAGENT_ALLOW_SELF_APPROVE.Verification
npm run buildnode --test dist/__tests__/self-approval.test.js dist/__tests__/handoff.test.js dist/__tests__/triage.test.js dist/__tests__/orchestrator-preflight-cli.test.js dist/__tests__/orchestrate-handoff-cli.test.jsagent-self-approve.ymlandagent-orchestrator.ymlgit diff --checkCloses #165