fix(triggers): dispatch respond-to-ci on mixed-state SHA from check-suite-success #1241
Merged
zbigniewsobiecki merged 1 commit into dev on Apr 30, 2026
…uite-success

GitHub fires `check_suite.completed` once per workflow. When workflow A's suite fails fast (e.g. the E2B template-rebuild on ucho/PR#176, 38s) and workflow B's suite is still running, the failure handler correctly defers with "not all complete yet". When workflow B's suite eventually completes with `conclusion=success`, only the success handler fires, and it unconditionally dispatches `review`. That review is silently skipped at worker time (`pollWaitForChecks` sees `allPassing=false`), and no later event with `conclusion=failure` ever wakes the failure handler back up. Net: `respond-to-ci` is permanently lost for that SHA.

Fix: in the success handler, after the author + base gates, query the aggregate `getCheckSuiteStatus` and fork. When `allComplete && anyFailed`, dispatch `respond-to-ci` (gated by its own trigger config + attempt limit) instead of `review`. Otherwise, keep the current behavior: dispatch review with `waitForChecks: true` so the worker polls if checks are still in progress.

Single-source the dispatch envelope: extract `dispatchRespondToCi` plus `fixAttempts` / `MAX_ATTEMPTS` / `resetFixAttempts` into a new shared module `respond-to-ci-dispatch.ts`. Both handlers converge on it. The failure handler keeps its early `gateTriggerEnabled` so disabled projects don't burn GitHub API calls; the helper re-checks the same gate so the success-handler fork is also guarded.

Tests: 5 new TDD cases on the success handler: failure conclusion, timed_out conclusion, in-progress checks (defers to worker), all-passing (review), and the "already-reviewed at HEAD with CI failure" case (CI failure trumps prior approval). Adjusted the pre-existing "getCheckSuiteStatus not called" assertions to match the new flow: removed where the call now happens (post-base-gate), kept where the gate short-circuits before it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
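A minimal sketch of what the shared helper is responsible for (trigger gate, attempt limit, warning comment, result envelope), under simplified assumptions. `MAX_ATTEMPTS = 3` comes from the PR description; the map key format, `RespondToCiContext`, `DispatchResult`, and the `triggerEnabled` / `postWarningComment` stand-ins for `gateTriggerEnabled` and the GitHub comment call are hypothetical, not cascade's actual code.

```typescript
// Hypothetical sketch of the shared respond-to-ci helper; shapes are assumptions.

// Attempt cap, per the PR description.
const MAX_ATTEMPTS = 3;

// Assumption: attempts are counted per PR, keyed by an "owner/repo#number" string.
const fixAttempts = new Map<string, number>();

function resetFixAttempts(key: string): void {
  fixAttempts.delete(key);
}

// Hypothetical dispatch result envelope.
type DispatchResult =
  | { dispatched: true; agent: "respond-to-ci"; triggerType: "check-failure"; attempt: number }
  | { dispatched: false; reason: string };

interface RespondToCiContext {
  key: string;
  triggerEnabled: boolean; // stand-in for the gateTriggerEnabled check
  postWarningComment: (body: string) => void; // stand-in for the GitHub comment call
}

function dispatchRespondToCi(ctx: RespondToCiContext): DispatchResult {
  // Re-check the trigger gate so every call site is guarded, not only the failure handler.
  if (!ctx.triggerEnabled) {
    return { dispatched: false, reason: "respond-to-ci trigger disabled" };
  }

  const used = fixAttempts.get(ctx.key) ?? 0;
  if (used >= MAX_ATTEMPTS) {
    // Attempt limit reached: leave a warning on the PR instead of dispatching again.
    ctx.postWarningComment(`respond-to-ci gave up after ${MAX_ATTEMPTS} attempts`);
    return { dispatched: false, reason: "attempt limit reached" };
  }

  // Record the attempt before dispatching so both handlers share one counter.
  const attempt = used + 1;
  fixAttempts.set(ctx.key, attempt);
  return { dispatched: true, agent: "respond-to-ci", triggerType: "check-failure", attempt };
}
```

Because both handlers call the same function, the gate, the counter, and the warning behave identically no matter which `check_suite.completed` event led to the dispatch.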
This was referenced Apr 30, 2026
zbigniewsobiecki added a commit that referenced this pull request on May 2, 2026
ucho/PR #231 had its CI fully green at the latest attempt, but cascade dispatched respond-to-ci anyway, with triggerType=check-failure. This was a wasted agent run: respond-to-ci is the agent that fixes failing CI, so running it on a green PR confuses the loop semantics.

Root cause: each CI workflow on the PR's head_sha ran TWICE on the same SHA (push-then-rerun, or two pushes that resolved to the same SHA). The first attempt of `Rebuild ucho-cli template` FAILED at 19:44:40; the second attempt SUCCEEDED at 19:45:28. GitHub's `listWorkflowRunsForRepo` returns BOTH workflow_runs. cascade's `getCheckSuiteStatus` iterated both and concatenated their jobs, including the stale FAILURE record.

`check-suite-success.handle` (the fork from PR #1241/#1243) computes `anyFailed = checkRuns.some(cr => cr.conclusion === 'failure' || ...)`. With the stale failure in the list, `anyFailed=true` even though the PR is green at the latest attempt, so the handler mistakenly forks to `dispatchRespondToCi(...)`. The triggerType=check-failure in the run record confirms it came through this fork.

GitHub's `listJobsForWorkflowRun` accepts `filter='latest'` (the default), which dedupes job ATTEMPTS within a single workflow_run (the "Re-run failed jobs" case). It does NOT dedupe across multiple workflow_runs of the same workflow on the same SHA, which is what bit us.

Fix: dedupe `workflowRuns` by `workflow_id` BEFORE fetching jobs. GitHub returns runs sorted by `created_at` desc, so the first occurrence per workflow_id is the latest. It is a three-line addition in `getCheckSuiteStatus` at the GitHub-client layer: every caller (`check-suite-success`, `check-suite-failure`, anywhere else that asks "current state of CI?") benefits without changing.

Why at the client layer:
- Source of truth match: GitHub's PR UI uses the latest attempt's status; cascade should match that.
- Centralization: the same dedup bug would otherwise surface separately per caller. Closing it once at the boundary eliminates the class.
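A minimal sketch of the dedup and of why it flips `anyFailed`, under reduced shapes: `WorkflowRun` and `CheckRun` carry only the fields used here, and `dedupeLatestPerWorkflow`, `checkRunsFor`, and `aggregate` are hypothetical helpers, not cascade's `getCheckSuiteStatus`. The failing-conclusion set (`failure`, `timed_out`) is also an assumption, since the commit elides the rest of the predicate. Like the fix, it relies on the input already being sorted by `created_at` descending.

```typescript
// Reduced shapes: only the fields this sketch needs (assumed, not full GitHub payloads).
interface WorkflowRun {
  id: number;
  workflow_id: number;
}

interface CheckRun {
  name: string;
  status: "queued" | "in_progress" | "completed";
  conclusion: string | null;
}

// Keep the first run seen per workflow_id. Assumes runs arrive newest-first,
// so the first occurrence is the latest attempt of that workflow.
function dedupeLatestPerWorkflow(runs: WorkflowRun[]): WorkflowRun[] {
  const latest = new Map<number, WorkflowRun>();
  for (const run of runs) {
    if (!latest.has(run.workflow_id)) latest.set(run.workflow_id, run);
  }
  return Array.from(latest.values());
}

// Concatenate the jobs of the given runs (stands in for the per-run job fetch).
function checkRunsFor(runs: WorkflowRun[], jobsByRunId: Map<number, CheckRun[]>): CheckRun[] {
  const out: CheckRun[] = [];
  for (const run of runs) {
    out.push(...(jobsByRunId.get(run.id) ?? []));
  }
  return out;
}

// Aggregate state over check runs; the failing set here is an assumption.
function aggregate(checkRuns: CheckRun[]) {
  const allComplete = checkRuns.every((cr) => cr.status === "completed");
  const anyFailed = checkRuns.some(
    (cr) => cr.conclusion === "failure" || cr.conclusion === "timed_out",
  );
  return { allComplete, anyFailed, allPassing: allComplete && !anyFailed };
}
```

Deduping the runs before fetching jobs, rather than filtering check runs afterwards, is what lets the real fix skip `listJobsForWorkflowRun` for the stale run entirely.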
Edge cases handled:
- Same `workflow_id` across different events (push vs pull_request): keeps the most recent regardless of event.
- Different `workflow_id`s (e.g. CI + CodeQL): both kept; a new test pins this so over-aggressive dedup can't ship.
- No workflow runs: the Map yields an empty list, and downstream code already handles `checkRuns.length === 0`.

Tests:
- New `dedupes workflow runs by workflow_id, keeping only the latest re-run` test directly pins the PR #231 incident: 2 workflow_runs with the same workflow_id, the first failed and the second succeeded; it asserts allPassing=true and that only 1 check_run is returned. It also asserts `listJobsForWorkflowRun` is NOT called for the older run (saves API quota and proves the dedup actually skipped the stale jobs).
- New `keeps separate workflow_ids distinct (CI vs CodeQL on same SHA)` test guards against over-aggressive dedup.
- Updated the `mockWorkflowRuns` helper to default `workflow_id` to the run id, so existing tests (which don't care about workflow_id) keep matching semantically.
- Updated two pagination tests to include `workflow_id`, so the dedup doesn't collapse multiple runs with an undefined workflow_id into one.

Out of scope (follow-up):
- Audit `getFailedWorkflowRunJobs` (used by the respond-to-ci agent): different semantics (it WANTS to show what failed even if it subsequently succeeded).
- Eventually-consistent GitHub API: the workflow-runs-list endpoint may take a few seconds to reflect a new attempt. Orthogonal to dedup.

Verification: vitest 7687/7687, typecheck clean, lint clean.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
What
When a check_suite completes with `conclusion=success` but another suite on the same SHA failed earlier, dispatch `respond-to-ci` (instead of `review`) so the CI failure actually gets fixed.

Why: the live incident on ucho/PR#176 (2026-04-30)
GitHub fires `check_suite.completed` once per workflow. On `ucho` PR #176 the timeline was:

1. The `E2B Template Rebuild` workflow finishes in ~38s with `conclusion=failure`. The webhook hits `check-suite-failure`, which correctly skips: `Not all checks complete yet (6/7 still running)`.
2. The `CI` workflow finishes with `conclusion=success`. The webhook hits `check-suite-success`, which dispatches `review`.
3. At worker time, `pollWaitForChecks` sees `allPassing=false` (E2B is failing) and silently skips the agent.
4. No later event with `conclusion=failure` ever fires; GitHub already delivered that one. So `check-suite-failure` never re-evaluates.

Net: `respond-to-ci` is permanently lost on that SHA, and the user has to manually nudge or push a fix commit.

Root cause
`check-suite-failure` is event-keyed on `conclusion=failure`, but the failure event arrives before the aggregate state is "all complete." It correctly defers, but the only event that would have re-fired it is gone.

The system needed any check_suite completion to be able to make the right dispatch decision based on aggregate state, not just on the polarity of the event that triggered the matcher.
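To make the root cause concrete, here is a toy replay of the two pre-fix handlers. It assumes a SHA with just two suites and counts completed suites rather than individual check runs (the real skip message reported 6/7 checks); `SuiteEvent` and `replayOldHandlers` are illustrative names, not cascade code.

```typescript
// Toy model of the pre-fix handlers; names and shapes are illustrative only.
type Conclusion = "success" | "failure";

interface SuiteEvent {
  workflow: string;
  conclusion: Conclusion;
}

// Replays check_suite.completed events in delivery order and records what each
// pre-fix handler would have dispatched.
function replayOldHandlers(events: SuiteEvent[], totalSuites: number): string[] {
  const dispatched: string[] = [];
  let completed = 0;
  for (const ev of events) {
    completed += 1;
    if (ev.conclusion === "failure") {
      // check-suite-failure: keyed on this event, but defers while the aggregate is incomplete.
      if (completed < totalSuites) continue; // "Not all checks complete yet"
      dispatched.push("respond-to-ci");
    } else {
      // Old check-suite-success: dispatches review without looking at other suites.
      dispatched.push("review");
    }
  }
  return dispatched;
}
```

Replaying the same two events in the opposite order yields `review` followed by `respond-to-ci`, so under the old logic whether the CI failure got picked up depended on which suite happened to finish first.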
The fix
`check-suite-success` already needed to know aggregate state to decide whether to set `waitForChecks: true`. Pull that aggregate query forward and fork on it:

| Aggregate state on the SHA | Dispatch |
|---|---|
| All complete, all passing | `review` (current behavior) |
| All complete, any failed | `respond-to-ci` (new; closes the gap) |
| Not all complete | `review` with `waitForChecks: true` |

The fork sits after the author + base gates and before the "already-reviewed at HEAD" check, because a SHA that's both already-reviewed and CI-failing still needs the CI fixed (review approval doesn't make tests pass).
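A minimal sketch of the fork's ordering in the success handler, assuming the gates and the aggregate reduce to booleans. `decideOnSuccessEvent`, `SuccessEventContext`, `Aggregate`, and `Decision` are hypothetical names, and treating "already reviewed at HEAD" as a plain skip is an assumption. The aggregate is passed as a callback so the sketch can show it is only queried once the gates pass.

```typescript
// Hypothetical, simplified inputs; the real handler reads these from the webhook
// payload, trigger config, and getCheckSuiteStatus.
interface Aggregate {
  allComplete: boolean;
  anyFailed: boolean;
}

interface SuccessEventContext {
  authoredByImplementer: boolean; // author gate
  baseBranchAllowed: boolean; // base gate
  alreadyReviewedAtHead: boolean;
}

type Decision =
  | { action: "skip"; reason: string }
  | { action: "respond-to-ci" }
  | { action: "review"; waitForChecks: boolean };

function decideOnSuccessEvent(
  ctx: SuccessEventContext,
  getAggregate: () => Aggregate, // queried only after the gates pass
): Decision {
  if (!ctx.authoredByImplementer) return { action: "skip", reason: "author gate" };
  if (!ctx.baseBranchAllowed) return { action: "skip", reason: "base gate" };

  const agg = getAggregate();
  // Fork before the already-reviewed check: a CI failure trumps a prior approval.
  if (agg.allComplete && agg.anyFailed) return { action: "respond-to-ci" };

  if (ctx.alreadyReviewedAtHead) return { action: "skip", reason: "already reviewed at HEAD" };

  // All passing, or still running: review, and let the worker poll only if needed.
  return { action: "review", waitForChecks: !agg.allComplete };
}
```

Putting the `allComplete && anyFailed` branch ahead of the already-reviewed check is the ordering choice that lets an already-approved SHA still get its CI fixed.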
Single-sourced dispatch
Both handlers now converge on `dispatchRespondToCi` in the new `src/triggers/github/respond-to-ci-dispatch.ts`, which owns the `fixAttempts` map, `MAX_ATTEMPTS = 3`, the attempt-limit gate, the warning-comment side effect, and the dispatch result envelope.

- `CheckSuiteFailureTrigger` keeps its early `gateTriggerEnabled` (so disabled projects don't burn GitHub API calls), then delegates the post-aggregate dispatch to the helper.
- `CheckSuiteSuccessTrigger` calls the helper inside the new fork, gated by the helper's own `gateTriggerEnabled`, so a project with `respond-to-ci` disabled won't dispatch.

This is the same shape the spec-017 PM-ack consolidation enforces (see `CLAUDE.md`, "PM-ack dispatch coverage invariant"): one helper, two call sites, no parallel-path drift.

Tests
5 new TDD cases on the success handler:

- `respond-to-ci` when the aggregate has any failure on the SHA
- `respond-to-ci` on a `timed_out` conclusion
- `review` when the aggregate is all-complete and all-passing
- `review` with `waitForChecks` when not all checks are complete yet
- `respond-to-ci` even when the SHA was already reviewed (CI failure still needs fixing)

Adjusted 3 pre-existing assertions for the new flow:
- Gate tests (`PR not authored by implementer persona`, `no personaIdentities`) keep `getCheckSuiteStatus.not.toHaveBeenCalled()`: the gates short-circuit before the aggregate query.
- The `already reviewed at HEAD` test drops that assertion: the aggregate query now happens before that check.
- … `getCheckSuiteStatus` IS called.

Test plan
- `npx vitest run --project unit-triggers`: 968/968 pass
- `npx vitest run --project unit-api`: 1526/1526 pass (router/webhook coverage)
- `npx vitest run --project unit-core`: 5174/5174 pass (conformance)
- `npm run typecheck`: clean
- `npm run lint`: no new warnings on changed files

🤖 Generated with Claude Code