Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Request org.redisson.client.RedisTimeoutException during high load #1402

Closed
k3vinwan opened this issue Apr 20, 2018 · 50 comments
Closed

Request org.redisson.client.RedisTimeoutException during high load #1402

k3vinwan opened this issue Apr 20, 2018 · 50 comments

Comments

@k3vinwan
Copy link

Hello,

I'm using redisson v3.6.5 with Redis 4. I have 3 masters and 3 slaves nodes for 5 tomcat clusters. However during high load, my team experienced a lot of org.redisson.client.RedisTimeoutException exceptions such as below:

20-Apr-2018 11:21:16.641 SEVERE [http-nio-18080-exec-1790] org.apache.catalina.core.StandardWrapperValve.invoke Servlet.service() for servlet [jsp] in context with path [] threw exception [org.redisson.client.RedisTimeoutException: Unable to send command! Node source: NodeSource [slot=null, addr=null, redisClient=null, redirect=null, entry=org.redisson.connection.MasterSlaveEntry@6a5e3f8], connection: [id: 0x8ccbc494, L:/184.170.243.238:38841 - R:/104.200.148.138:6380], command: (HGET), command params: [redisson_tomcat_session:7AB9E70459E2041FED447330E28EF802, PooledUnsafeDirectByteBuf(ridx: 0, widx: 12, cap: 256)] after 3 retry attempts] with root cause
org.redisson.client.RedisTimeoutException: Unable to send command! Node source: NodeSource [slot=null, addr=null, redisClient=null, redirect=null, entry=org.redisson.connection.MasterSlaveEntry@6a5e3f8], connection: [id: 0x8ccbc494, L:/184.170.243.238:38841 - R:/104.200.148.138:6380], command: (HGET), command params: [redisson_tomcat_session:7AB9E70459E2041FED447330E28EF802, PooledUnsafeDirectByteBuf(ridx: 0, widx: 12, cap: 256)] after 3 retry attempts
at org.redisson.command.CommandAsyncService$8.run(CommandAsyncService.java:544)
at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:668)
at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:743)
at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:471)
at java.lang.Thread.run(Thread.java:748)

org.redisson.client.RedisTimeoutException: Unable to send command! Node source: NodeSource [slot=null, addr=null, redisClient=null, redirect=null, entry=org.redisson.connection.MasterSlaveEntry@6a5e3f8], connection: [id: 0x354aeaed, L:/184.170.243.238:57085 - R:/104.200.148.138:6379], command: (HSET), command params: [redisson_tomcat_session:D5C32289A6DD5483FBEABA4A98C5B32A, PooledUnsafeDirectByteBuf(ridx: 0, widx: 12, cap: 256), PooledUnsafeDirectByteBuf(ridx: 0, widx: 178, cap: 256)] after 3 retry attempts
at org.redisson.command.CommandAsyncService$8.run(CommandAsyncService.java:544)
at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:668)
at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:743)
at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:471)
at java.lang.Thread.run(Thread.java:748)

We have increased the timeout param but there seemed to make no difference. Are the parameters mentioned in https://github.com/redisson/redisson/wiki/16.-FAQ#q-i-saw-a-redistimeoutexception-what-does-it-mean-what-shall-i-do-can-redisson-team-fix-it the only variables to tune?
Could you please point me to the direction, as I'm lost with these issues here?

Thank you.
Kevin

@luryson
Copy link

luryson commented Apr 20, 2018

same situation

@k3vinwan
Copy link
Author

Hello, can we get some advice on this please?

@mrniko
Copy link
Member

mrniko commented Apr 23, 2018

Can network be a bottleneck here?

@k3vinwan
Copy link
Author

Hi Nikita,

Thanks for your response. Unfortunately it's not caused by the network.
This issue took a toll on our system today after we provisioned 6 new nodes. There were too many errors of this kind and we had to rollback to tomcat sticky session.

Any help would be greatly appreciated.
Thanks.

@mrniko
Copy link
Member

mrniko commented Apr 23, 2018

Is there any other error preceded this error? Like it was in #1377

@k3vinwan
Copy link
Author

No, not on our side. It's just different command type with same error pattern:
org.redisson.client.RedisTimeoutException: Redis server response timeout (3000 ms) occured for command: (HGETALL) with params: [redisson_tomcat_session:76C583597C60A8EBF47939F5E506AFC2] channel: [id: 0x08412271, L:/199.192.79.34:40715 - R:/104.200.148.138:6382]
at org.redisson.command.CommandAsyncService$11.run(CommandAsyncService.java:704)

We are using org.redisson.codec.FstCodec for serialization (but I supposed it didn't matter?).

@anjia0532
Copy link
Contributor

+1

1 similar comment
@gmatsu-sc
Copy link

+1

@mrniko
Copy link
Member

mrniko commented May 2, 2018

@k3vinwan

It's just different command type with same error pattern:
org.redisson.client.RedisTimeoutException: Redis server response timeout

This error means that Redisson unable to get response from Redis in time

@jackygurui
Copy link
Member

@mrniko
Copy link
Member

mrniko commented May 2, 2018

@jackygurui I think they already did it that as mentioned in description.

@jackygurui
Copy link
Member

I think there is a bit of misunderstanding about the types of timeouts.

We need to clarify that RedisTimeoutException: Unable to send command seen in the description is different to RedisTimeoutException: Redis server response timeout seen later in this thread. The differences are explained in the chart embedded in the FAQ. A different strategy should have taken to solve each types timeout.

@hypercube1024
Copy link

+1
We also encounter the same exception. We use the redisson v3.6.5.

org.redisson.client.RedisTimeoutException: Unable to send command! 

@mrniko
Copy link
Member

mrniko commented May 7, 2018

@hypercube1024 what is the CPU load in your case?

@weixiang1862
Copy link

+1

@sosoeo
Copy link

sosoeo commented May 19, 2018

try reduce connectionMinimumIdleSize to small value ? especially on test environment, because the redission application will create at least (connectionMinimumIdleSize + 1) TCP connection at the same time when RedissonClient instance created. (I noticed this by wireshark)

@KavenSu
Copy link

KavenSu commented May 22, 2018

+1

@lying1234
Copy link

lying1234 commented May 24, 2018

+1
We got the same exception with redisson v3.6.3. We increased ConnectionPoolSize/retryAttempts/retryInterval, and we can see server side connection is way below the pool size we set on client side.

org.redisson.client.RedisTimeoutException: Unable to send command!

@anjia0532
Copy link
Contributor

@lying1234 use redis-benchmark on the redisson client server to connect redis server .
ref https://redis.io/topics/benchmarks

@lying1234
Copy link

@anjia0532 I tried redis-benchmark on client server to connect to redis server, it runs fast.

@anjia0532
Copy link
Contributor

anjia0532 commented May 25, 2018

@lying1234 paste your redisson config plz.

@lying1234
Copy link

Here it is:
{
"clusterServersConfig":{
"idleConnectionTimeout":10000,
"pingTimeout":1000,
"connectTimeout":10000,
"timeout":6000,
"retryAttempts":6,
"retryInterval":3000,
"failedSlaveReconnectionInterval":3000,
"failedSlaveCheckInterval":3,
"password":null,
"subscriptionsPerConnection":5,
"clientName":null,
"loadBalancer":{
"class":"org.redisson.connection.balancer.RoundRobinLoadBalancer"
},
"subscriptionConnectionMinimumIdleSize":1,
"subscriptionConnectionPoolSize":50,
"slaveConnectionMinimumIdleSize":10,
"slaveConnectionPoolSize":610,
"masterConnectionMinimumIdleSize":10,
"masterConnectionPoolSize":610,
"readMode":"SLAVE",
"subscriptionMode":"SLAVE",
"nodeAddresses":[
"redis://RedisRemoteServer:6379"
],
"scanInterval":1000
},
"threads":0,
"nettyThreads": 0,
"codec":!<org.redisson.codec.SerializationCodec> {},
"useLinuxNativeEpoll":false
}

@anjia0532
Copy link
Contributor

anjia0532 commented May 25, 2018

u config is clusterServerConfig, but node addr only one(and readMode is slave). is this RedisRemoteServer:6379 was lb? or redis server +port?

@anjia0532
Copy link
Contributor

anjia0532 commented May 25, 2018

@lying1234 paste your redis server config.
redis-cli -h(ost) -p(ort) -a(uth)

config get timeout
config get save
config get appendonly

@lying1234
Copy link

We have two environments: AWS and on-prem. For AWS, it will be configuration endpoint; on-prem, we list all master/salve redis servers actually.

From one of the server:
LQASRDSBRDB001V:0>config get timeout

  1. "timeout"
  2. "0"
    LQASRDSBRDB001V:0>config get save
  3. "save"
  4. ""
    LQASRDSBRDB001V:0>config get appendonly
  5. "appendonly"
  6. "no"

@anjia0532
Copy link
Contributor

anjia0532 commented May 28, 2018

@lying1234 first , exec config set timeout 60 (set redis server timeout 60s to release tcp connection when client pool leaks). note: CONFIG REWRITE will save this change to redis.conf.

exec INFO CLIENTS to redis-cli and paste output. exec netstat -nap|grep redis |wc -l to redis server and paste output.

maybe u should use Replicated mode for AWS redis server , rather than Cluster mode. both of aws and on-premises are timeout?

accidental or un-accidental?

the max value length (get showlog ref SLOWLOG subcommand [argument])?

the dbsize?

ulimit -a

redis server's io , cpu , mem when redisson throws timeout exception?

is the redis startup log print WARNING e.g. overcommit_memory The TCP backlog setting of 511 .... ?

@lying1234
Copy link

@anjia0532 Thanks for checking. The timeout issue was happened mostly on on-prem, we don't have many users on AWS now. When the timeout issue happened, we didn't see a lot of load on Redis server, connected_clients was lower, and never over 132 on each redis server, which make me to think this is not server side issue.

@drluorose
Copy link

the same ,+1
org.redisson.client.RedisTimeoutException: Unable to send command! Node source: NodeSource [slot=null, addr=null, redisClient=null, redirect=null, entry=org.redisson.connection.MasterSlaveEntry@7c7b4927], connection: [id: 0x095f0dc7, L:/10.1.41.137:26712 - R:/10.1.104.11:6007], command: (HGETALL), command params: [redisson_spring_session:ec3ffff2-02c5-4b74-8681-382340ad4d0d] after 3 retry attempts
at org.redisson.command.CommandAsyncService$8.run(CommandAsyncService.java:544)
at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:663)
at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:738)
at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:466)
at java.lang.Thread.run(Thread.java:748)

@anjia0532
Copy link
Contributor

@drluorose command:(HGETALL) ? exec HLEN key by redis-cli plz.

@drluorose
Copy link

hlen key = 4,not the key size or redis problem,is redisson network problem!I guess.

@drluorose
Copy link

change redisson to jedis.

@jinshubao
Copy link

jinshubao commented Jun 7, 2018

+1

compile("org.redisson:redisson:3.7.0")

Redis server response timeout (3000 ms) occured for command: (EVAL) with params: [if (redis.call('exists', KEYS[1]) == 0) then redis.call('publish', KEYS[2], ARGV[1]); return 1; end;..., 2, lock.reserve.1045, redisson_lock__channel:{lock.reserve.1045}, 0, 30000, cdf1c068-2f21-4003-b835-8a82bf22e1df:88] channel: [id: 0x4a5cfe9d, L:/172.16.16.89:55572 - R:172.16.16.95/172.16.16.95:6379]

java.io.IOException: Connection timed out at sun.nio.ch.FileDispatcherImpl.read0(Native Method) at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) at sun.nio.ch.IOUtil.read(IOUtil.java:192) at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380) at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:288) at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1108) at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:345) at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:148) at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645) at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580) at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497) at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459) at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:884) at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) at java.lang.Thread.run(Thread.java:745)

@jackygurui
Copy link
Member

jackygurui commented Jun 7, 2018

@drluorose Your issue is related to the network environment, as you might have guessed

@jackygurui
Copy link
Member

@jinshubao Your problem is totally different to others here. The problem you have described is caused by a java.io.IOException: Connection timed out. It seems the connections itself has been closed, somehow.

@jinshubao
Copy link

@jackygurui 问题解决了,连接池没有空闲连接导致的

@jackygurui
Copy link
Member

@jinshubao java.io.IOException: Connection timed out at sun.nio.ch.FileDispatcherImpl.read0 没有空闲连接是不会导致这个错误的。

@weixiang1862
Copy link

@jinshubao 问题解决了,使用的jackson的序列化方式,entity的包名改了,反序列化出现错误,紧接着就出现了很多超时。把redis清空就好了。

@lying1234
Copy link

We still have this RedisTimeoutException happened sporadically. It happened after we got decode error during deserialization for some reason. We already adjusted ConnectionPoolSize/retryAttempts/retryInterval, but looks it didn't help, and we didn't see a lot of connections from server side when the error thrown. It makes me think this may relate to redisson code since others have the similar issue. Also flush redis data is not an option for us.

@jackygurui
Copy link
Member

@lying1234 The timeout exception caused by decode error during deserialisation is inevitable, since the server response timer covers deserialisation process. These type errors are considered unrecoverable and they would cause results to be discarded and the waiting threads would never able to receive the results they are waiting for until the timer is up. Because those commands had already dispatched and executed, Redisson would not going to repeat them. Please consider the timeout exception happens under this circumstance as an indication of a programming error.

@mrniko
Copy link
Member

mrniko commented Jul 19, 2018

@lying1234

We still have this RedisTimeoutException happened sporadically. It happened after we got decode error during deserialization for some reason

Also please update to 3.6.5 or later. This release contains bug fix for your case. #1377

@lying1234
Copy link

I understand timeout exception happened on the request/response with deserialisation failure, but just wonder why different requests from different uses will get the same RedisTimeoutException at that time? Is it because they share the same connection pool?

Also, here is part of the error I got for decode error, not sure #1377 will handle this kind of error or not:
[redisson-netty-1-4] WARN io.netty.channel.DefaultChannelPipeline - An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.
io.netty.handler.codec.DecoderException: java.io.IOException: java.lang.reflect.InvocationTargetException
at io.netty.handler.codec.ReplayingDecoder.callDecode(ReplayingDecoder.java:422)
....22 more
Caused by: java.io.IOException: java.lang.reflect.InvocationTargetException
at org.nustaq.serialization.FSTObjectInput.readObject(FSTObjectInput.java:247)
... 28 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.GeneratedMethodAccessor319.invoke(Unknown Source)
... 35 more
Caused by: java.io.StreamCorruptedException
at org.apache.ibatis.executor.loader.AbstractSerialStateHolder.readResolve(AbstractSerialStateHolder.java:99)
... 55 more
Caused by: java.io.EOFException
at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2679)
... 55 more

@soockie
Copy link

soockie commented Sep 11, 2018

The same issue with v3.7.5

@jackygurui
Copy link
Member

@soockie Please open a new issue ticket and describe your case according to the template requirement. In most cases, your cause is different from those above.

@rishisha19
Copy link

We are using client v3.7.5 and server is v4 .
We are facing RedisTimeoutException a lot during the load while there are no retries for the request. I suppose this is an existing issue. Has someone figured out a way to made it working and can share the information. It would e much appreciated.

@mrniko
Copy link
Member

mrniko commented Oct 12, 2018

@rishisha19

Could you share the stack trace?

@rishisha19
Copy link

Here it is: i currently have this much.

exception Redis server response timeout (200 ms) occured for command: (EVAL) with params: [local value = redis.call('hget', KEYS[1], ARGV[2]); if value == false then return nil; end; local t,..., 5, mya.contracts, redisson__timeout__set:{mya.contracts}, redisson__idle__set:{mya.contracts}, redisson__map_cache__last_access__set:{mya.contracts}, {mya.contracts}:redisson_options, 1539290794706, PooledUnsafeDirectByteBuf(ridx: 0, widx: 114, cap: 256)] channel: [id: 0xadcc4c88, L:/10.110.1.219:45992 - R:/10.110.151.90:6379],org.redisson.client.RedisTimeoutException: Redis server response timeout (200 ms) occured for command: (EVAL) with params: [local value = redis.call('hget', KEYS[1], ARGV[2]); if value == false then return nil; end; local t,..., 5, mya.contracts, redisson__timeout__set:{mya.contracts}, redisson__idle__set:{mya.contracts}, redisson__map_cache__last_access__set:{mya.contracts}, {mya.contracts}:redisson_options, 1539290794706, PooledUnsafeDirectByteBuf(ridx: 0, widx: 114, cap: 256)] channel: [id: 0xadcc4c88, L:/10.110.1.219:45992 - R:/10.110.151.90:6379]

@satishcode
Copy link

+1 We too face the same issue. We tried various options on client configurations, and reduced keys. Sometimes this is random more frequent, and sometimes the issue happens 1% of times.

@jackygurui
Copy link
Member

@satishcode Can you create a separate issue ticket for your problem, please? The RedisTimeoutException is Redisson reporting to you that it has encountered an issue that's related to the environment it is in, so what you are seeing might not be the same as others.

@mrniko
Copy link
Member

mrniko commented Jan 21, 2019

@satishcode

I did some improvements in retry logic 1a08db7#diff-ed3e437a905b4c04c09be926b2d0c93dR886. So now command will be retried if response couldn't be get. Released in 2.15/3.10 version

@mrniko mrniko closed this as completed Jan 21, 2019
@zpczpczpc
Copy link

@jinshubao @jackygurui
请问java.io.IOException: Connection timed out at sun.nio.ch.FileDispatcherImpl.read0 这个问题是什么原因导致的?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests