Expected Behavior
Under high concurrency, we expected go-redis v9.17 to behave like v9.16 in terms of request stability, connection pooling, and timeout behavior, unless a documented breaking or behavioral change was introduced.
Both versions should ideally show consistent performance once warmed up, with no significant increase in timeout frequency simply due to the library upgrade.
Current Behavior
When running controlled high-load experiments, we consistently observed that v9.17 produces significantly more timeouts than v9.16, even though the environment, Redis instance, traffic profile, and application code remained identical across tests.
Some highlights:
- v9.17 showed recurring context deadline exceeded errors during sustained high RPS (up to ~8000).
- v9.16 only showed a small burst of timeouts during ramp-up, which then dropped to nearly zero.
- Repeated experiments showed stable behavior with v9.16, but sustained timeout activity with v9.17.
- After simulating password failures and then redeploying with the correct password, v9.16 returned to healthy behavior quickly, while v9.17 continued showing higher timeout rates.
Possible Solution
We’re not sure yet whether this is due to a regression, an unintended side effect, or an expected change in v9.17.
Since the biggest difference appears to relate to connection pooling behavior under high concurrency, we’re wondering whether any recent changes might:
- Increase queueing or blocking on connection creation
- Serialize connection acquisition
- Slow down pool warm-up
- Propagate timeouts differently than before
We’d appreciate any insight into whether this is expected or something worth digging into.
Steps to Reproduce
Here is the high-level setup we used:
- Create a small HTTP service that performs one Redis GET and one SET on each request using go-redis.
- Deploy it using v9.17.
- Generate load using Locust (or similar) ramping to ~8000 RPS.
- Observe Redis client errors, latencies, and timeouts.
- Repeat the exact same test with v9.16.
Observed outcome:
- v9.17 shows noticeably more timeouts under the same load.
- v9.16 stabilizes quickly and remains almost error-free.
Context (Environment)
- go-redis versions tested: v9.16.0 and v9.17.0
- Redis: hosted Redis instance (same instance used for all runs)
- Load generator: Locust, peak ~8000 RPS
- Service workload: simple GET + SET per request
- All experiments reused:
- Same infrastructure
- Same Redis cluster
- Same traffic pattern
- Same code except the library version
We’re trying to understand whether the v9.17 upgrade could contribute to increased timeouts at high concurrency.
Detailed Description
Because connection pool behavior changed between these versions, we’re wondering whether something in the new pooling or queueing logic could cause increased blocking or longer waits under load.
We’d be grateful for any guidance on:
- Whether this is expected behavior
- Whether additional tuning is recommended for v9.17
- Whether this may indicate a regression
Possible Implementation
We’re not proposing a specific fix yet, but here are a few ideas:
- Review recent pooling/queueing changes for potential bottlenecks under high concurrency
- Provide guidance on optimal pool settings for users upgrading from v9.16
- If applicable, adjust pool behavior to avoid serialized or slow connection creation under load