feat: re-register repeatable jobs on Redis reconnect by genisd · Pull Request #37 · gynzy/optio

genisd · 2026-06-25T09:51:17Z

Problem

Repeatable BullMQ jobs — the poll ticks for pr-watcher, repo-cleanup (health-check + stall-check), ticket-sync, workflow-trigger-checker, token-validation, and reconcile-resync — were registered only at API boot (inside each startXxxWorker()).

A Redis restart/flush (e.g. a Redis pod OOMKill) wipes the repeat scheduler keys. Nothing re-adds them, so the pollers silently stop until the API itself is restarted. Notably this halts the schedule-trigger poll loop, so timed Tasks stop firing.

(The user-defined schedules themselves live in Postgres workflow_triggers.nextFireAt and are unaffected — only the BullMQ tick that reads them is lost.)

Change

services/repeatable-jobs.ts — REPEATABLE_JOBS, a single source of truth for every repeat job (same OPTIO_* env vars + defaults as before), and ensureRepeatJobs(): idempotent queue.add(..., {repeat}) per entry, in-flight guard, never throws.
services/repeat-job-monitor.ts — a dedicated ioredis connection. The first ready (boot) is ignored; every later ready (reconnect) debounces 2s then calls ensureRepeatJobs(). Wired into index.ts startup and graceful shutdown.
Workers — dropped the inline queue.add(...repeat...) registrations so boot and reconnect share the same definitions (no drift). index.ts now does cleanRepeatJobs → ensureRepeatJobs() → start workers → start monitor.
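
The registration pass described above could look roughly like the following. This is a minimal sketch, not the code from services/repeatable-jobs.ts: `RepeatJobDef`, `QueueLike`, `QueueFactory`, and `createEnsureRepeatJobs` are hypothetical names, and the queue is injected as a small interface (mirroring the `add(name, data, { repeat })` and `close()` calls of a BullMQ queue) so the in-flight guard and the never-throws behaviour can be shown without a Redis instance.

```typescript
// Hypothetical shape of one REPEATABLE_JOBS entry; the real definitions may differ.
interface RepeatJobDef {
  queueName: string;
  jobName: string;
  everyMs: number;
}

// Minimal subset of a BullMQ-style queue used by this sketch (an assumption).
interface QueueLike {
  add(name: string, data: unknown, opts: { repeat: { every: number } }): Promise<unknown>;
  close(): Promise<void>;
}

type QueueFactory = (queueName: string) => QueueLike;

function createEnsureRepeatJobs(
  jobs: readonly RepeatJobDef[],
  makeQueue: QueueFactory,
  logError: (msg: string, err: unknown) => void = () => {},
): () => Promise<void> {
  // In-flight guard: concurrent callers share the pass that is already running.
  let inFlight: Promise<void> | null = null;

  async function runOnce(): Promise<void> {
    for (const job of jobs) {
      let queue: QueueLike | undefined;
      try {
        queue = makeQueue(job.queueName);
        // Re-adding the same name and repeat options is expected to be a no-op
        // when the scheduler key already exists.
        await queue.add(job.jobName, {}, { repeat: { every: job.everyMs } });
      } catch (err) {
        // Never throw: one failed registration must not block the others.
        logError(`failed to register ${job.jobName}`, err);
      } finally {
        if (queue) await queue.close().catch(() => {});
      }
    }
  }

  return function ensureRepeatJobs(): Promise<void> {
    if (!inFlight) {
      inFlight = runOnce().finally(() => {
        inFlight = null;
      });
    }
    return inFlight;
  };
}
```

Because both boot and the reconnect path would call the same returned function over the same job list, there is only one place where repeat options are defined.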

Scope note

Recovery is triggered only by a reconnect. That matches the actual OOMKill/restart failure mode: the TCP socket drops and ioredis emits ready again once it has reconnected. A logical FLUSHALL or key eviction that leaves the socket up emits no ready event and is intentionally out of scope.
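
The reconnect handling can be sketched as follows. This is an illustration under stated assumptions, not the contents of services/repeat-job-monitor.ts: `ReadySource` and `startRepeatJobMonitor` are hypothetical names, and the connection is reduced to the two calls the sketch needs (`on("ready", ...)` and `quit()`, both of which an ioredis client provides). The 2000 ms default mirrors the 2s debounce mentioned in the Change section.

```typescript
// Minimal subset of an ioredis-style connection used by this sketch (an assumption).
interface ReadySource {
  on(event: "ready", listener: () => void): unknown;
  quit(): Promise<unknown>;
}

function startRepeatJobMonitor(
  connection: ReadySource,
  ensureRepeatJobs: () => Promise<void>,
  debounceMs = 2000,
): { stop: () => Promise<void> } {
  let sawFirstReady = false;
  let stopped = false;
  let timer: ReturnType<typeof setTimeout> | undefined;

  connection.on("ready", () => {
    if (stopped) return;
    // The first ready is the boot-time connect; boot registers the jobs itself.
    if (!sawFirstReady) {
      sawFirstReady = true;
      return;
    }
    // Every later ready is a reconnect. Restart the timer so a burst of
    // reconnects collapses into a single registration pass.
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(() => {
      timer = undefined;
      void ensureRepeatJobs();
    }, debounceMs);
  });

  return {
    async stop(): Promise<void> {
      stopped = true;
      if (timer !== undefined) clearTimeout(timer);
      timer = undefined;
      await connection.quit();
    },
  };
}
```

Since the only trigger is the ready event, a flush that keeps the socket open never reaches this handler, which is the limitation stated in the scope note.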

Tests

repeatable-jobs.test.ts: registers exactly the REPEATABLE_JOBS entries with the correct repeat options, closes each queue, and verifies that the in-flight guard shares one pass across concurrent calls.
repeat-job-monitor.test.ts — first ready ignored, reconnect re-registers after debounce, burst of reconnects debounced into one, stop quits the connection.

Verification

pnpm turbo typecheck ✓
pnpm turbo test ✓ (2022 passing, incl. 7 new)
prettier --check ✓ / eslint 0 errors

Repeatable BullMQ jobs (poll ticks for pr-watcher, repo-cleanup, ticket-sync, workflow-trigger-checker, token-validation, reconcile-resync) were registered only at API boot. A Redis restart/flush wipes the repeat schedulers, so the pollers silently stop until the API is restarted. Add a single source of truth (REPEATABLE_JOBS) plus idempotent ensureRepeatJobs(), and a dedicated monitor connection that re-registers them on every reconnect (debounced). Workers no longer self-register inline, so boot and reconnect share the same definitions.

gynzy-virko temporarily deployed to production June 25, 2026 10:03 Inactive

genisd merged commit e50b34b into gynzy Jun 29, 2026
16 checks passed

gynzy-virko deployed to production June 29, 2026 10:02 Active

genisd merged 1 commit into gynzy from feat/re-register-crons-on-redis-restart
