fix: Handle Redis connection errors gracefully instead of panicking #111

Merged
sbernauer merged 4 commits into main from fix/redis-connection
Apr 20, 2026

Conversation

@sbernauer
Member

Maybe fixes #109

When Redis goes down (e.g. during a master failover), a broken pipe error would cause get_queued_query_count to panic because of an .unwrap(). That panic poisoned the RwLock used in the metrics callbacks, which cascaded into further panics and left pods unresponsive until the liveness probe killed them.

  • Replace .unwrap() with proper error propagation in get_queued_query_count
  • Handle poisoned locks, closed channels, and panicked threads gracefully in both metrics callbacks
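The failure mode described above can be reproduced with the standard library alone. The following is a minimal sketch, not the code from this PR: `PersistenceError`, `fetch_count`, `queued_query_count`, `read_tolerating_poison`, and `poison` are hypothetical names, and the `redis_up` flag merely simulates whether the Redis round trip succeeds. It shows an error being propagated with `?` instead of `.unwrap()`, and a reader that recovers the value from a poisoned `RwLock` instead of panicking in turn.

```rust
use std::fmt;
use std::sync::{Arc, RwLock};
use std::thread;

/// Hypothetical error type standing in for whatever the persistence layer returns.
#[derive(Debug, PartialEq)]
pub enum PersistenceError {
    Connection(String),
}

impl fmt::Display for PersistenceError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            PersistenceError::Connection(msg) => write!(f, "connection error: {msg}"),
        }
    }
}

impl std::error::Error for PersistenceError {}

/// Hypothetical stand-in for a Redis round trip; `redis_up` simulates connectivity.
fn fetch_count(redis_up: bool) -> Result<u64, PersistenceError> {
    if redis_up {
        Ok(3)
    } else {
        Err(PersistenceError::Connection("broken pipe".to_string()))
    }
}

/// Propagates the connection error to the caller with `?` instead of unwrapping it.
pub fn queued_query_count(redis_up: bool) -> Result<u64, PersistenceError> {
    let count = fetch_count(redis_up)?;
    Ok(count)
}

/// Reads the value even if a writer panicked while holding the lock.
pub fn read_tolerating_poison(lock: &RwLock<u64>) -> u64 {
    let guard = lock.read().unwrap_or_else(|poisoned| poisoned.into_inner());
    *guard
}

/// Poisons the lock by panicking in a thread that holds the write guard.
pub fn poison(lock: &Arc<RwLock<u64>>) {
    let writer_handle = Arc::clone(lock);
    let _ = thread::spawn(move || {
        let _guard = writer_handle.write().unwrap();
        panic!("simulated panic while holding the write lock");
    })
    .join();
}
```

Recovering the inner guard with `into_inner()` is only one possible policy; a callback could equally skip the observation and log the error when the lock is poisoned.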

sbernauer and others added 4 commits April 17, 2026 15:57
When Redis goes down (e.g. master failover), a broken pipe error would
cause `get_queued_query_count` to panic due to an `.unwrap()`, which
then poisoned the RwLock used in metrics callbacks, cascading into
further panics and leaving pods unresponsive until the liveness probe
killed them.

- Replace `.unwrap()` with proper error propagation in
  `get_queued_query_count`
- Handle poisoned locks, closed channels, and panicked threads
  gracefully in both metrics callbacks

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
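For the closed-channel and panicked-thread cases mentioned in the commit message, a rough sketch follows. It assumes a callback hands work to a worker thread and receives the result over a `std::sync::mpsc` channel; whether the real metrics callbacks are structured this way is an assumption, and `compute_on_worker` is a hypothetical helper.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical helper: runs `work` on a worker thread and returns `None`
/// (instead of panicking) if the worker panics or the channel is closed.
pub fn compute_on_worker<F>(work: F) -> Option<u64>
where
    F: FnOnce() -> u64 + Send + 'static,
{
    let (tx, rx) = mpsc::channel();

    let handle = thread::spawn(move || {
        let value = work();
        // `send` fails only if the receiver is gone; ignore that instead of unwrapping.
        let _ = tx.send(value);
    });

    // `recv` returns an error if the sender was dropped without sending,
    // which is what happens when the worker panics before `send`.
    let received = rx.recv().ok();

    // `join` returns an error if the worker thread panicked.
    if handle.join().is_err() {
        return None;
    }

    received
}
```

With this shape, `compute_on_worker(|| 42)` yields `Some(42)`, while a closure that panics yields `None`, so the caller can skip the metric observation rather than unwind.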
@sbernauer sbernauer self-assigned this Apr 17, 2026
@sbernauer sbernauer moved this to Development: In Progress in Stackable Engineering Apr 17, 2026
@sbernauer sbernauer moved this from Development: In Progress to Development: Waiting for Review in Stackable Engineering Apr 17, 2026
@NickLarsenNZ NickLarsenNZ self-requested a review April 17, 2026 14:57
Member

@NickLarsenNZ NickLarsenNZ left a comment

LGTM

Comment thread trino-lb-persistence/src/redis/mod.rs
Comment thread trino-lb/src/metrics.rs
@sbernauer sbernauer moved this from Development: Waiting for Review to Development: In Review in Stackable Engineering Apr 20, 2026
@sbernauer
Member Author

Kuttl tests passed 🚀

@sbernauer sbernauer added this pull request to the merge queue Apr 20, 2026
Merged via the queue into main with commit b1319ed Apr 20, 2026
9 checks passed
@sbernauer sbernauer deleted the fix/redis-connection branch April 20, 2026 07:06
@sbernauer sbernauer moved this from Development: In Review to Development: Done in Stackable Engineering Apr 20, 2026
Labels

None yet

Projects

Status: Development: Done

Development

Successfully merging this pull request may close these issues.

trino-lb is stuck after redis master shuts down

2 participants