
go-redis v9.17 shows noticeably more timeouts than v9.16 under high concurrency (8000 RPS) in controlled experiments #3623

@schmittjoaopedro

Description


Expected Behavior

Under high concurrency, we expected go-redis v9.17 to behave similarly to v9.16 in terms of request stability, connection pooling, and timeout behavior, unless a documented breaking or behavioral change was introduced.

Both versions should ideally show consistent performance once warmed up, with no significant increase in timeout frequency simply due to the library upgrade.

Current Behavior

When running controlled high-load experiments, we consistently observed that v9.17 produces significantly more timeouts than v9.16, even though the environment, Redis instance, traffic profile, and application code remained identical across tests.

Some highlights:

  • v9.17 showed recurring context deadline exceeded errors during sustained high RPS (up to ~8000).
  • v9.16 only showed a small burst of timeouts during ramp-up, which then dropped to nearly zero.
  • Repeated experiments showed stable behavior with v9.16, but sustained timeout activity with v9.17.
  • After simulating password failures and then redeploying with the correct password, v9.16 returned to healthy behavior quickly, while v9.17 continued showing higher timeout rates.

Possible Solution

We’re not sure yet whether this is due to a regression, an unintended side effect, or an expected change in v9.17.
Since the biggest difference appears to relate to connection pooling behavior under high concurrency, we’re wondering whether any recent changes might:

  • Increase queueing or blocking on connection creation
  • Serialize connection acquisition
  • Slow down pool warm-up
  • Propagate timeouts differently than before

We’d appreciate any insight into whether this is expected or something worth digging into.

Steps to Reproduce

Here is the high-level setup we used:

  1. Create a small HTTP service that performs one Redis GET and one SET on each request using go-redis.
  2. Deploy it using v9.17.
  3. Generate load using Locust (or similar) ramping to ~8000 RPS.
  4. Observe Redis client errors, latencies, and timeouts.
  5. Repeat the exact same test with v9.16.

Observed outcome:

  • v9.17 shows noticeably more timeouts under the same load.
  • v9.16 stabilizes quickly and remains almost error-free.

Context (Environment)

go-redis versions tested: v9.16.0 and v9.17.0

  • Redis: hosted Redis instance (same instance used for all runs)
  • Load generator: Locust, peak ~8000 RPS
  • Service workload: simple GET + SET per request
  • All experiments reused:
    • Same infrastructure
    • Same Redis cluster
    • Same traffic pattern
    • Same code except the library version

We’re trying to understand whether the v9.17 upgrade could contribute to increased timeouts at high concurrency.

Detailed Description

Because connection pool behavior changed between these versions, we’re wondering whether something in the new pooling or queueing logic could cause increased blocking or longer waits under load.

We’d be grateful for any guidance on:

  • Whether this is expected behavior
  • Whether additional tuning is recommended for v9.17
  • Whether this may indicate a regression

Possible Implementation

Not proposing a specific fix yet, but a few ideas:

  • Review recent pooling/queueing changes for potential bottlenecks under high concurrency
  • Provide guidance on optimal pool settings for users upgrading from v9.16
  • If applicable, adjust pool behavior to avoid serialized or slow connection creation under load
