High Latencies for Redis Operation #2378

Closed
jakshay21 opened this issue Apr 6, 2023 · 4 comments


jakshay21 commented Apr 6, 2023

Bug Report

We are seeing high latencies for the Redis EXPIRE operation when creating and fetching sessions.

Current Behavior

Our project is based on Reactor, Kotlin Coroutines and Netty.
We're using Spring Session Data Redis with the Lettuce client for Redis-related operations.
Our APM tool DataDog shows that while creating/fetching a session, EXPIRE operations have very high latencies, ranging into minutes. The 99th percentile of the other operations (HGETALL, DELETE, EXISTS and HMSET) is within 100ms.

Our Redis infrastructure latencies look good, so we suspect the high latencies DataDog shows in spans could be due to thread waiting.

Only one NIO thread is created: lettuce-nioEventLoop-5-1.

Redis operation logs:
[logLevel=DEBUG] -- 2023-04-06 17:54:06,868 +0530 -- lettuce-nioEventLoop-5-1              io.lettuce.core.protocol.RedisStateMachine -- Decode done, empty stack: true
[logLevel=INFO ] -- 2023-04-06 17:54:06,868 +0530 -- lettuce-nioEventLoop-5-1              com.apple.carrierservices.preauth.utils.LoggingCommandLatencyRecorder -- appServer=/AppServerIP, redisServer=redisServer, redisCommandType=EXPIRE, firstResponseLatency=277454292, completionLatency=277478875, TimeTaken=24583 nanoseconds
[logLevel=DEBUG] -- 2023-04-06 17:54:06,868 +0530 -- lettuce-nioEventLoop-5-1              io.lettuce.core.protocol.CommandHandler -- [channel=0x15b747b9, /AppServerIP -> redisServer, epid=0x1, chid=0x1] Completing command LatencyMeteredCommand [type=EXPIRE, output=BooleanOutput [output=true, error='null'], commandType=io.lettuce.core.RedisPublisher$SubscriptionCommand]
[logLevel=DEBUG] -- 2023-04-06 17:54:06,873 +0530 -- lettuce-nioEventLoop-5-1              io.lettuce.core.protocol.CommandHandler -- [channel=0x15b747b9, /AppServerIP -> redisServer, epid=0x1, chid=0x1] Received: 5 bytes, 6 commands in the stack
[logLevel=DEBUG] -- 2023-04-06 17:54:06,873 +0530 -- lettuce-nioEventLoop-5-1              io.lettuce.core.protocol.CommandHandler -- [channel=0x15b747b9, /AppServerIP -> redisServer, epid=0x1, chid=0x1] Stack contains: 6 commands
[logLevel=DEBUG] -- 2023-04-06 17:54:06,873 +0530 -- lettuce-nioEventLoop-5-1              io.lettuce.core.protocol.RedisStateMachine -- Decode done, empty stack: true
[logLevel=INFO ] -- 2023-04-06 17:54:06,873 +0530 -- lettuce-nioEventLoop-5-1              com.apple.carrierservices.preauth.utils.LoggingCommandLatencyRecorder -- appServer=/AppServerIP, redisServer=redisServer, redisCommandType=HMSET, firstResponseLatency=278740375, completionLatency=278768500, TimeTaken=28125 nanoseconds
[logLevel=DEBUG] -- 2023-04-06 17:54:06,873 +0530 -- lettuce-nioEventLoop-5-1              io.lettuce.core.protocol.CommandHandler -- [channel=0x15b747b9, /AppServerIP -> redisServer, epid=0x1, chid=0x1] Completing command LatencyMeteredCommand [type=HMSET, output=StatusOutput [output=OK, error='null'], commandType=io.lettuce.core.RedisPublisher$SubscriptionCommand]
[logLevel=DEBUG] -- 2023-04-06 17:54:06,873 +0530 -- lettuce-nioEventLoop-5-1              io.lettuce.core.protocol.CommandHandler -- [channel=0x15b747b9, /AppServerIP -> redisServer, epid=0x1, chid=0x1] Received: 4 bytes, 5 commands in the stack
[logLevel=DEBUG] -- 2023-04-06 17:54:06,873 +0530 -- lettuce-nioEventLoop-5-1              io.lettuce.core.protocol.CommandHandler -- [channel=0x15b747b9, /AppServerIP -> redisServer, epid=0x1, chid=0x1] Stack contains: 5 commands
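
For context, the LoggingCommandLatencyRecorder entries above presumably come from a custom implementation of Lettuce's io.lettuce.core.metrics.CommandLatencyRecorder registered on the client resources. A minimal sketch of such a recorder (the class name and log format are illustrative, not our actual code):

import java.net.SocketAddress;

import io.lettuce.core.metrics.CommandLatencyRecorder;
import io.lettuce.core.protocol.ProtocolKeyword;
import io.lettuce.core.resource.DefaultClientResources;

// Illustrative recorder, similar in spirit to the LoggingCommandLatencyRecorder in the logs above.
class LatencyLoggingRecorder implements CommandLatencyRecorder {

    @Override
    public void recordCommandLatency(SocketAddress local, SocketAddress remote,
            ProtocolKeyword commandType, long firstResponseLatency, long completionLatency) {
        // Latencies are supplied in nanoseconds.
        System.out.printf("redisCommandType=%s, firstResponseLatency=%d, completionLatency=%d%n",
                commandType, firstResponseLatency, completionLatency);
    }
}

// Registration sketch:
// DefaultClientResources.builder()
//         .commandLatencyRecorder(new LatencyLoggingRecorder())
//         .build();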

Session Flow

Any session fetch is associated with one of the Redis operation sequences below:

HGETALL -> New Session -> DELETE -> save -> HMSET -> EXPIRE
HGETALL -> Existing Session -> Expired -> DELETE -> save -> HMSET -> EXPIRE
HGETALL -> Existing Session -> Non Expired -> EXISTS -> save -> HMSET -> EXPIRE

Expected behavior/code

Redis operation latencies should be within an acceptable range of a few milliseconds.

Environment

  • Redis: Standalone, version 6.2.6
  • Lettuce version(s): 6.1.10.RELEASE
  • spring-session-data-redis:jar: 2.7.0
  • reactor-core:jar: 3.4.27
  • kotlinx-coroutines-reactor:jar: 1.6.4
  • reactor-netty-http:jar: 1.0.28
  • 1 AWS instance with 3 cores and 10 GB of memory

Spring Redis config in use

spring.redis.lettuce.pool.enabled=true
spring.redis.lettuce.pool.max-active=16
spring.redis.lettuce.pool.min-idle=8
lettuce.computation.threads.count=32
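
(For reference, lettuce.computation.threads.count is a custom property in our setup, not a standard Spring Boot one; assuming it maps to Lettuce's ClientResources computation thread pool, the equivalent programmatic configuration would look roughly like the sketch below, with the value mirroring the property above.)

import io.lettuce.core.resource.ClientResources;
import io.lettuce.core.resource.DefaultClientResources;

// Sketch only: assumes the custom property controls the computation thread pool size.
ClientResources resources = DefaultClientResources.builder()
        .computationThreadPoolSize(32)
        .build();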

The issue is intermittent and has been observed under very low load as well.
The application usually recovers after some time and is then able to serve high traffic.

mp911de commented Apr 12, 2023

When using the reactive API with a single connection, the I/O event loop is used as the single thread for all subsequent operations (e.g. Jackson deserialization). You could try enabling ClientOptions.publishOnScheduler to check whether spreading the load across multiple worker threads helps.
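
A rough sketch of enabling that option when the client is created (the URI is a placeholder; with Spring, the same options can be passed through the Lettuce client configuration):

import io.lettuce.core.ClientOptions;
import io.lettuce.core.RedisClient;
import io.lettuce.core.RedisURI;

// Sketch: emit data signals on the computation thread pool instead of the netty I/O thread.
RedisClient client = RedisClient.create(RedisURI.create("redis://localhost"));
client.setOptions(ClientOptions.builder()
        .publishOnScheduler(true)
        .build());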

@mp911de mp911de added the status: waiting-for-feedback We need additional information before we can continue label Apr 12, 2023
jakshay21 (Author) commented

We've had ClientOptions.publishOnScheduler enabled.

We're using Snappy for compression/decompression, which could count as a heavy workload.
org.xerial.snappy:snappy-java:jar:1.1.9.1

One thing we noticed in the connection pooling documentation is that we have an Apache commons-pool2 dependency, which might not be relevant for the reactive APIs.
So we are trying to simulate our workload again with it removed, using Lettuce's built-in implementation instead.

mp911de commented Apr 18, 2023

Please run your workloads through a profiler to identify what is causing the slowdowns. Alternatively, use multiple connections through AsyncPool, which provides non-blocking connection pooling.
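
A minimal sketch of the non-blocking pool (the URI, key, and pool sizes below are placeholders):

import io.lettuce.core.RedisClient;
import io.lettuce.core.RedisURI;
import io.lettuce.core.api.StatefulRedisConnection;
import io.lettuce.core.codec.StringCodec;
import io.lettuce.core.support.AsyncConnectionPoolSupport;
import io.lettuce.core.support.AsyncPool;
import io.lettuce.core.support.BoundedPoolConfig;

// Sketch: bounded, non-blocking connection pool.
RedisClient client = RedisClient.create();
AsyncPool<StatefulRedisConnection<String, String>> pool =
        AsyncConnectionPoolSupport.createBoundedObjectPool(
                () -> client.connectAsync(StringCodec.UTF8, RedisURI.create("redis://localhost")),
                BoundedPoolConfig.builder().maxTotal(16).minIdle(8).build());

// Acquire a connection, run a command, and release the connection, all without blocking the caller.
pool.acquire().thenCompose(connection ->
        connection.async().expire("spring:session:sessions:some-id", 1800)
                .whenComplete((result, error) -> pool.release(connection)));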


If you would like us to look at this issue, please provide the requested information. If the information is not provided within the next 30 days this issue will be closed.

@github-actions github-actions bot added the status: feedback-reminder We've sent a reminder that we need additional information before we can continue label Apr 26, 2024