Skip to content

fix(redis): drop cached client and restart PING loop after forced reconnect#4501

Merged
TheodoreSpeaks merged 1 commit intostagingfrom
fix/redis-client-cache-after-ping-failure
May 7, 2026
Merged

fix(redis): drop cached client and restart PING loop after forced reconnect#4501
TheodoreSpeaks merged 1 commit intostagingfrom
fix/redis-client-cache-after-ping-failure

Conversation

@TheodoreSpeaks
Copy link
Copy Markdown
Collaborator

Summary

  • After 2 consecutive PING failures, startPingHealthCheck was disconnecting the dead client but leaving globalRedisClient cached, so the next getRedisClient() returned the dead reference and every command sat in its offline queue until the 5s commandTimeout. Now the global is cleared and the PING interval is reset before disconnect, so the next call builds a fresh client with a fresh health check.
  • Added a 200ms deadline race to releaseDistributedLease in isolated-vm.ts (matching tryAcquireDistributedLease) so a hung Redis client doesn't make releases wait the full 5s.
  • Added two tests covering the cache-drop and PING-restart behavior.

Type of Change

  • Bug fix

Testing

  • bun run test lib/core/config/redis.test.ts (13/13)
  • bun run test lib/execution/isolated-vm.test.ts (9/9)
  • bunx tsc --noEmit -p tsconfig.json clean
  • bun run lint clean
  • bun run check:api-validation:strict passed

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

@vercel
Copy link
Copy Markdown

vercel Bot commented May 7, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
docs Skipped Skipped May 7, 2026 10:32pm

Request Review

@cursor
Copy link
Copy Markdown

cursor Bot commented May 7, 2026

PR Summary

Medium Risk
Touches Redis connectivity and distributed lease bookkeeping; incorrect behavior could cause unnecessary reconnects or missed lease releases, impacting execution throttling and stability.

Overview
Fixes the forced-reconnect path in startPingHealthCheck to clear globalRedisClient and stop the existing PING interval before disconnecting, ensuring the next getRedisClient() creates a fresh client and health-check loop (avoiding offline-queued commands on a dead connection).

Adds a 200ms deadline to releaseDistributedLease in isolated-vm.ts (via Promise.race) so lease releases don’t block on a hung Redis client, and extends redis.test.ts with coverage for the cache-drop and PING-loop restart behavior.

Reviewed by Cursor Bugbot for commit 331d735. Bugbot is set up for automated code reviews on this repo. Configure here.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 7, 2026

Greptile Summary

This PR fixes a stale-client bug in the Redis reconnection path and adds a release timeout to releaseDistributedLease. When consecutive PING failures triggered a forced reconnect, the global client reference was left pointing at the dead connection; the fix clears it (and stops the health-check interval) before invoking reconnect listeners, so the next getRedisClient() call creates a fresh client with its own PING loop.

  • redis.ts: On MAX_PING_FAILURES, globalRedisClient and pingInterval are now nulled out before listeners run and before disconnect(true) is called, ensuring any listener that calls getRedisClient() sees a fresh client.
  • isolated-vm.ts: releaseDistributedLease gains the same 200 ms Promise.race deadline already used in tryAcquireDistributedLease, preventing a hung Redis client from blocking the release call for the full 5 s commandTimeout.
  • redis.test.ts: Two new test cases verify the cache-drop and PING-restart behaviour end-to-end.

Confidence Score: 5/5

Safe to merge — the fix correctly breaks the stale-client cycle and the release timeout is a faithful copy of the existing acquire pattern.

Both changed code paths are tightly scoped: the redis module clears its own state in the right order (global null → interval cleared → listeners → disconnect → pingInFlight reset in finally), and the isolated-vm change is a mechanical copy of an already-working pattern. New tests directly exercise the cache-drop and PING-restart scenarios that were missing before.

No files require special attention.

Important Files Changed

Filename Overview
apps/sim/lib/core/config/redis.ts Core fix: nulls out globalRedisClient and clears pingInterval before disconnect so listeners and the next getRedisClient() always see a clean state; logic is correct and well-ordered.
apps/sim/lib/execution/isolated-vm.ts Adds 200 ms deadline race to releaseDistributedLease, matching the existing pattern in tryAcquireDistributedLease; deadlineTimer is always cleared in finally.
apps/sim/lib/core/config/redis.test.ts Two new tests exercise cache-drop and PING-restart; they reuse the shared mockRedisInstance, which is appropriate for this test structure.

Sequence Diagram

sequenceDiagram
    participant Interval as PING Interval
    participant HC as Health Check
    participant Global as Module State
    participant Listener as Reconnect Listeners
    participant OldClient as Old Redis Client
    participant NewClient as New Redis Client

    Interval->>HC: "tick (failure 2)"
    HC->>Global: "globalRedisClient = null"
    HC->>Global: "clearInterval / pingInterval = null"
    HC->>Listener: "invoke callbacks"
    Listener->>Global: "getRedisClient() called"
    Global->>NewClient: "new Redis() + startPingHealthCheck"
    HC->>OldClient: "disconnect(true)"
    HC->>Global: "pingInFlight = false (finally)"
    note over NewClient: "Fresh PING loop active against new client"
Loading

Reviews (1): Last reviewed commit: "fix(redis): drop cached client and resta..." | Re-trigger Greptile

@TheodoreSpeaks TheodoreSpeaks merged commit 00b5b74 into staging May 7, 2026
14 checks passed
@TheodoreSpeaks TheodoreSpeaks deleted the fix/redis-client-cache-after-ping-failure branch May 7, 2026 22:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant