Skip to content

Promote dev to main: redis-back review-dispatch dedup#1249

Merged
zbigniewsobiecki merged 1 commit intomainfrom
dev
May 1, 2026
Merged

Promote dev to main: redis-back review-dispatch dedup#1249
zbigniewsobiecki merged 1 commit intomainfrom
dev

Conversation

@zbigniewsobiecki
Copy link
Copy Markdown
Member

Promotes 1 commit from `dev`:

What

Move the review-dispatch dedup state from a per-process in-memory `Map` to Redis so the dedup holds across all processes (router, IMPL workers running post-completion-hook, and any future router replicas).

Closes the silent-duplicate-dispatch bug where ucho/PR #194 (CI fully green, single SHA) had two reviews run 29 s apart from the same dedup key — the IMPL worker process and the router process each saw an empty in-memory Map and both claimed.

Reviewed and approved on #1248 by @nhopeatall. Full CI green on dev (3/3 workflows expected, mirroring prior promotions).

Risk

Low. Pure dedup-state migration to Redis (`SET NX EX` atomic primitive, fail-closed on Redis errors with Sentry capture). Tests (7678/7678) all pass; cross-process invariant pinned by direct-instance test.

🤖 Generated with Claude Code

…s gap (#1248)

ucho/PR #194 (2026-05-01, CI fully green, single SHA) had two review runs
dispatched 29 s apart — both `triggerType=ci-success`, same engine/model/
workItemId, both completed normally, both burned LLM tokens and posted
reviews. Confirmed verbatim in Loki: identical `reviewDispatchKey`
(`zbigniewsobiecki/ucho:194:9ed484df...`) appeared in claim logs from TWO
processes — one from `agent-execution.ts:279` post-completion-hook running
in the IMPL worker container, one from `check-suite-success.ts:240` running
in the cascade-router. Both `claimReviewDispatch(key, ...)` returned `true`
because the dedup `recentlyDispatched` Map at `review-dispatch-dedup.ts:5`
was module-scoped, in-memory, per-process state. Two processes = two
independent Maps = no dedup. Pure waste.

PR #1246 explicitly called this out as out-of-scope ("With the architectural
shift above, the in-memory dedup is sufficient. Skip."). That was wrong:
the architectural shift in #1246 closes the worker-bail-out path but does
nothing about the post-completion-hook path running in a different process.
The bug class is broader than ucho/PR #194 — anywhere two processes might
independently call `claimReviewDispatch` for the same (project, PR, SHA) is
exposed: post-completion-hook ↔ check-suite-success (live now), review-
requested ↔ post-completion-hook (waiting to bite), and any future
horizontally-scaled router replicas.

Architectural fix — move the dedup state to Redis so it's shared across all
processes. Atomic primitive: `SET key value NX EX <ttl>` returns `'OK'`
exactly once per key within the TTL, regardless of how many processes race
it. No application-level locking, no race window.

`src/triggers/github/review-dispatch-dedup.ts` — rewritten:
- Drops the in-memory `Map<string, number>` and `cleanupExpiredEntries`.
- Lazy IORedis singleton (mirrors `src/queue/cancel.ts:28-37`).
- `claimReviewDispatch` / `releaseReviewDispatch` are now async.
- Keys namespaced under `cascade:review-dedup:`.
- Fail-closed on Redis errors: `claim` returns `false` and Sentry-captures
  under tag `review_dedup_redis_down`. Better to skip a legit dispatch than
  duplicate. Mirrors spec-017 fail-closed pipeline-capacity-gate posture.
- 5-min TTL preserved from #1246. The pre-Redis "30-min TTL clears stale
  entries on router restart" CLAUDE.md note is now obsolete.
- Test-only `__resetForTests()` flushes the namespace.

Callers — minimal mechanical await additions:
- `check-suite-success.ts:240` — `await claimReviewDispatch(...)`.
- `check-suite-success.ts:274` — `onBlocked: () => { void releaseReviewDispatch(...) }`
  (callback signature is sync; fire-and-forget the release).
- `review-requested.ts:101-102` — both calls become `await`.
- `review-requested.ts:131` — same fire-and-forget shape.
- `agent-execution.ts:279` (post-completion-hook) — `await`.

Tests:
- `tests/unit/triggers/github/review-dispatch-dedup.test.ts` rewritten
  around a vi.mock('ioredis', ...) factory whose closure captures a single
  in-memory store. Every `new Redis(...)` instance shares that backend, so
  the cross-process invariant is trivially testable: instantiate two
  IORedis clients and verify the second `SET NX EX` for the same key
  returns `null`. **This is the regression pin for ucho/PR #194.**
- New "fails closed when Redis errors" test covers the Sentry-capture
  path under tag `review_dedup_redis_down`.
- `check-suite-success.test.ts` and `review-requested.test.ts`: replaced
  the per-test `recentlyDispatched.clear()` with `vi.mock` of the dedup
  module. Tests that assert the dedup-skip path now use
  `mockClaimReviewDispatch.mockResolvedValueOnce(true).mockResolvedValueOnce(false)`
  to simulate the SET NX EX race.

Verification:
- npx vitest run --project unit-triggers --project unit-api --project unit-core
  → 7678/7678 pass.
- npm run typecheck + npm run lint clean.
- Direct-instance test demonstrates two IORedis clients against the shared
  backend correctly reject the second claim — the production scenario
  pinned in code.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@codecov
Copy link
Copy Markdown

codecov Bot commented May 1, 2026

Codecov Report

❌ Patch coverage is 82.35294% with 12 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/triggers/github/review-dispatch-dedup.ts 84.48% 9 Missing ⚠️
src/triggers/github/review-requested.ts 40.00% 3 Missing ⚠️

📢 Thoughts on this report? Let us know!

@zbigniewsobiecki zbigniewsobiecki merged commit 47ed959 into main May 1, 2026
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant