
org.redisson.client.RedisTimeoutException: Unable to acquire connection! RedissonPromise #4299

Closed
muchengyang opened this issue May 16, 2022 · 30 comments


@muchengyang

Is your feature request related to a problem? Please describe.
Version 3.16.1; we see this exception:
org.redisson.client.RedisTimeoutException: Unable to acquire connection! RedissonPromise [promise=ImmediateEventExecutor$ImmediatePromise@4bc618ba(failure: java.util.concurrent.CancellationException)]Increase connection pool size. Node source: NodeSource [slot=null, addr=null, redisClient=null, redirect=null, entry=MasterSlaveEntry [masterEntry=[freeSubscribeConnectionsAmount=0, freeSubscribeConnectionsCounter=value:49:queue:0, freeConnectionsAmount=22, freeConnectionsCounter=value:-2144903244:queue:26923, freezeReason=null, client=[addr=redis://], nodeType=MASTER, firstFail=0]]], command: null, params: null after 0 retryattempts
at org.redisson.command.RedisExecutor$2.run(RedisExecutor.java:187)
at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:672)
at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:747)
at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:472)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)

Why is freeConnectionsCounter so far outside its expected range?
freeConnectionsAmount=22, freeConnectionsCounter=value:-2144903244:queue:26923
The above error can be reproduced locally by simulating a disconnection, but in that test freeConnectionsCounter always stays equal to 0.

In the production environment we don't know why freeConnectionsCounter goes hugely negative every time the problem occurs. The semaphore's range should be bounded by the maximum number of connections, so why can freeConnectionsCounter decrease without limit? What causes it to drift so far out of range?
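For illustration only — this is not Redisson's actual AsyncSemaphore, and the class and method names below are invented — a counter-based semaphore can drift far negative over time if a failed or timed-out acquire still consumes a permit that is never restored:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of an unbalanced-permit bug, not Redisson's code.
public class LeakySemaphoreSketch {
    private final AtomicInteger permits;

    public LeakySemaphoreSketch(int permits) {
        this.permits = new AtomicInteger(permits);
    }

    public boolean tryAcquire() {
        // Decrement unconditionally, then check. If the failure path does not
        // restore the permit, every failed acquire pushes the counter lower.
        int remaining = permits.decrementAndGet();
        if (remaining < 0) {
            // BUG: a correct implementation must increment back here.
            return false;
        }
        return true;
    }

    public void release() {
        permits.incrementAndGet();
    }

    public int availablePermits() {
        return permits.get();
    }

    public static void main(String[] args) {
        LeakySemaphoreSketch s = new LeakySemaphoreSketch(2);
        for (int i = 0; i < 5; i++) {
            s.tryAcquire(); // three of these fail but still consume a permit
        }
        System.out.println(s.availablePermits()); // prints -3
    }
}
```

Under heavy connection churn (millions of timed-out acquires), such a leak would produce exactly the kind of large negative counter quoted above.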

@muchengyang
Author

The solution given at the time was never a final answer, and it did not explain the causes of the problems above:
https://github.com/redisson/redisson/wiki/16.-FAQ#q-what-is-the-cause-of-redistimeoutexception

@mrniko
Member

mrniko commented May 16, 2022

Could you try 3.17.1 version?

@muchengyang
Author

Do you mean the above problems exist in 3.16.1?

@mrniko
Member

mrniko commented May 17, 2022

I'm unable to reproduce it with the latest version.

@muchengyang
Author

(screenshots of the behavior on 3.16.1)

@muchengyang
Author

With 3.17.1 there is indeed a change.

@mrniko
Member

mrniko commented May 17, 2022

Yes, I improved AsyncSemaphore a bit.

@muchengyang
Author

After upgrading to the latest version, the problems above have not reappeared, but the following problem occasionally occurs:
(screenshot of the error)
Code block used:
RAtomicLong atomicLong = redissonClient.getAtomicLong(CacheUtil.getCommonKey(RedisCacheKeys.LOG_ID_AUTO_INCREMENT_KEY));
long incId = atomicLong.incrementAndGet();
QPS is about 2000. Analyzing the logs before and after the problem, no abnormal traffic was found.
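As a stopgap while the timeouts are being investigated, the increment above could be wrapped in a small retry helper. This is a hedged sketch, not a Redisson API — `RetryOnTimeout` and its parameters are invented for illustration:

```java
import java.util.concurrent.Callable;

// Illustrative retry helper (not part of Redisson): retries an operation a
// few times on RuntimeException (e.g. org.redisson.client.RedisTimeoutException)
// with a short fixed backoff between attempts.
public final class RetryOnTimeout {
    public static <T> T call(Callable<T> op, int attempts, long backoffMillis) throws Exception {
        if (attempts <= 0) {
            throw new IllegalArgumentException("attempts must be > 0");
        }
        RuntimeException last = null;
        for (int i = 0; i < attempts; i++) {
            try {
                return op.call();
            } catch (RuntimeException e) {
                last = e;                  // remember the failure and back off
                Thread.sleep(backoffMillis);
            }
        }
        throw last;                        // all attempts exhausted
    }
}

// Usage with the snippet from the comment above (assumes a RedissonClient):
// long incId = RetryOnTimeout.call(
//         () -> redissonClient.getAtomicLong(key).incrementAndGet(), 3, 50L);
```

Note that retrying only masks the symptom; at 2000 QPS the retries add latency, so this is best paired with root-cause analysis of the timeouts.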

@otbe

otbe commented Jun 3, 2022

I really don't want to hijack this issue, but since upgrading to 3.17.0/3.17.3 we also see a lot of these errors in our logs, and other calls have started to sporadically throw exceptions like this one, for example:

org.redisson.client.RedisException: Unexpected exception while processing command
	at org.redisson.command.CommandAsyncService.convertException(CommandAsyncService.java:276)
	at org.redisson.command.CommandAsyncService.get(CommandAsyncService.java:115)
	at org.redisson.RedissonObject.get(RedissonObject.java:82)
	at org.redisson.RedissonLock.tryAcquire(RedissonLock.java:144)
	at org.redisson.RedissonLock.lock(RedissonLock.java:113)
	at org.redisson.RedissonLock.lock(RedissonLock.java:68)
	at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218)
	at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:783)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
	at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:753)
	at org.springframework.aop.aspectj.MethodInvocationProceedingJoinPoint.proceed(MethodInvocationProceedingJoinPoint.java:89)
	at myclass
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethodWithGivenArgs(AbstractAspectJAdvice.java:634)
	at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethod(AbstractAspectJAdvice.java:624)
	at org.springframework.aop.aspectj.AspectJAroundAdvice.invoke(AspectJAroundAdvice.java:72)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:175)
	at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:753)
	at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:97)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
	at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:753)
	at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:698)
Caused by: java.lang.ClassCastException: null

or other ClassCastExceptions like this one:

Caused by: java.lang.ClassCastException: class java.lang.String cannot be cast to class java.lang.Long (java.lang.String and java.lang.Long are in module java.base of loader 'bootstrap')
	at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:642)
	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
	at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073)
	at org.redisson.command.CommandBatchService.lambda$executeAsync$7(CommandBatchService.java:322)
	at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
	at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
	at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073)
	at org.redisson.command.RedisCommonBatchExecutor.handleResult(RedisCommonBatchExecutor.java:162)
	at org.redisson.command.RedisExecutor.checkAttemptPromise(RedisExecutor.java:478)
	at org.redisson.command.RedisExecutor.lambda$execute$3(RedisExecutor.java:172)
	at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
	at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
	at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073)
	at org.redisson.client.handler.CommandDecoder.decodeCommandBatch(CommandDecoder.java:318)
	at org.redisson.client.handler.CommandDecoder.decodeCommand(CommandDecoder.java:210)
	at org.redisson.client.handler.CommandDecoder.decode(CommandDecoder.java:137)
	at org.redisson.client.handler.CommandDecoder.decode(CommandDecoder.java:113)
	at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:510)
	at io.netty.handler.codec.ReplayingDecoder.callDecode(ReplayingDecoder.java:366)
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:279)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:722)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:658)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:584)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)

(both are RLock related)

Interestingly, it only happens on one node, so it does not seem to be a Redis cluster issue (we use ElastiCache). Sometimes it recovers after a couple of minutes; sometimes it takes 30 minutes for this single node to become operational again (almost all calls through Redisson fail during this time).

I think we will try to downgrade to 3.16.7 and see how things evolve.

We use Java 11 and Redisson 3.17.0/3.17.3 (the same version on all nodes at the same time).

@mrniko
Member

mrniko commented Jun 3, 2022

@otbe

Could you set the trace logging level for the org.redisson package on that node and share the output?

@mrniko
Member

mrniko commented Jun 3, 2022

@ycy1991

Can you share your Redisson config?

@otbe

otbe commented Jun 3, 2022

@mrniko
I'll configure one of our test systems to trace-log org.redisson. It basically happens everywhere, sporadically. I can't say for sure what triggers it, except that it's not a load issue. I'll keep you posted if something happens over the weekend. Thanks!

Edit:
Another kind of exception during some RMap-related calls:

org.redisson.client.RedisResponseTimeoutException: Redis server response timeout (3000 ms) occured after 3 retry attempts, is non-idempotent command: false. Increase nettyThreads and/or timeout settings. Try to define pingConnectionInterval setting. Command: (HGETALL), params: ...

The only thing I noticed is that whenever this happens we also see a spike in JVM heap usage, sometimes 100% more heap allocated during these incidents. Not all connections seem to be affected; some calls go through without any problem. Maybe a dead/stale Netty connection?
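The settings named in that exception message (`nettyThreads`, `timeout`, `pingConnectionInterval`) map to Redisson `Config` options. A hedged sketch, assuming a single-server setup — the address and all numbers below are placeholders to adapt, not recommended values:

```java
import org.redisson.Redisson;
import org.redisson.api.RedissonClient;
import org.redisson.config.Config;

public class RedissonTimeoutTuning {
    public static RedissonClient create() {
        Config config = new Config();
        // More Netty threads can help if reply decoding backs up under load.
        config.setNettyThreads(64);
        config.useSingleServer()
              .setAddress("redis://localhost:6379")   // placeholder address
              .setTimeout(10000)                      // response timeout, ms
              .setRetryAttempts(3)
              .setPingConnectionInterval(30000);      // detect dead channels
        return Redisson.create(config);
    }
}
```

`pingConnectionInterval` periodically PINGs pooled connections, which helps evict exactly the kind of dead/stale channel suspected above.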

@mrniko
Member

mrniko commented Jun 6, 2022

@otbe

Which Redisson objects do you use?

@muchengyang
Author

@mrniko
Through the following load test, we found that with 5000 threads (the default parameter) no exception occurs. The problem starts to appear once the number of threads keeps increasing. After adjusting the timeout parameter several times, the problem no longer occurs. However, the throughput is much lower than with Spring Boot Data Redis — about 2000 — and I am now worried that adjusting the timeout will cause other problems.
(screenshots of the load-test configuration and results)

@mrniko
Member

mrniko commented Jun 7, 2022

However, the throughput is much lower than with Spring Boot Data Redis — about 2000

Which value is about 2000?

@otbe

otbe commented Jun 7, 2022

@otbe

Which Redisson objects do you use?

RLock/RMap/RSet, nothing fancy :D

Yesterday something happened in our test stage with trace logging enabled.

These are the only related logs I found for a simple RLock.tryAcquire:

reply: +PONG , channel: [id: 0x0b482b89, L:/10.1.72.184:40220 - R:cachecluster-at-amazon.ng.0001.euc1.cache.amazonaws.com/10.1.77.36:6379], command: CommandsData [promise=java.util.concurrent.CompletableFuture@7217d73[Not completed, 1 dependents], commands=[CommandData [promise=java.util.concurrent.CompletableFuture@7d78e303[Not completed], command=(EVAL), params=[if (redis.call('exists', KEYS[1]) == 0) then redis.call('hincrby', KEYS[1], ARGV[2], 1); redis.call(..., 1, my_very_long_lock_name.., 30000, 4709e19d-d4b6-46d9-8ddd-2e32ef27f39e:85], codec=org.redisson.client.codec.LongCodec]]]

channel: [id: 0x0b482b89, L:/10.1.72.184:40220 - R:cachecluster-at-amazon.ng.0001.euc1.cache.amazonaws.com/10.1.77.36:6379] message: *6
$4
EVAL
$339
if (redis.call('exists', KEYS[1]) == 0) then redis.call('hincrby', KEYS[1], ARGV[2], 1); redis.call('pexpire', KEYS[1], ARGV[1]); return nil; end; if (redis.call('hexists', KEYS[1], ARGV[2]) == 1) then redis.call('hincrby', KEYS[1], ARGV[2], 1); redis.call('pexpire', KEYS[1], ARGV[1]); return nil; end; return redis.call('pttl', KEYS[1]);
$1
1
$118
my_very_long_lock_name
$5
30000
$39
4709e19d-d4b6-46d9-8ddd-2e32ef27f39e:85

At the end it results in this exception:

Caused by: java.lang.ClassCastException: class java.lang.String cannot be cast to class java.lang.Long (java.lang.String and java.lang.Long are in module java.base of loader 'bootstrap')
	at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:642)
	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
	at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073)
	at org.redisson.command.CommandBatchService.lambda$executeAsync$7(CommandBatchService.java:322)
	at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
	at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
	at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073)
	at org.redisson.command.RedisCommonBatchExecutor.handleResult(RedisCommonBatchExecutor.java:162)
	at org.redisson.command.RedisExecutor.checkAttemptPromise(RedisExecutor.java:478)
	at org.redisson.command.RedisExecutor.lambda$execute$3(RedisExecutor.java:172)
	at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
	at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
	at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073)
	at org.redisson.client.handler.CommandDecoder.decodeCommandBatch(CommandDecoder.java:318)
	at org.redisson.client.handler.CommandDecoder.decodeCommand(CommandDecoder.java:210)
	at org.redisson.client.handler.CommandDecoder.decode(CommandDecoder.java:137)
	at org.redisson.client.handler.CommandDecoder.decode(CommandDecoder.java:113)
	at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:510)
	at io.netty.handler.codec.ReplayingDecoder.callDecode(ReplayingDecoder.java:366)
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:279)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:722)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:658)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:584)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:829)

When it works, the response looks different from my point of view:

reply: :1 , channel: [id: 0xe78a95e4, L:/10.1.72.184:57452 - R:cachecluster-at-amazon.0001.euc1.cache.amazonaws.com/10.1.77.36:6379], command: CommandsData [promise=java.util.concurrent.CompletableFuture@4dfbf328[Not completed, 1 dependents], commands=[CommandData [promise=java.util.concurrent.CompletableFuture@667d8561[Not completed], command=(EVAL), params=[if (redis.call('hexists', KEYS[1], ARGV[3]) == 0) then return nil;end; local counter = redis.call('h..., 2, my_very_long_lock_name, redisson_lock__channel:{my_very_long_lock_name, 0, 30000, 4709e19d-d4b6-46d9-8ddd-2e32ef27f39e:85], codec=org.redisson.client.codec.LongCodec]]]

Fun fact:
Our app triggered an OOM alarm, and at exactly that point in time we see the CPU/memory utilisation below.
(screenshots of CPU and memory utilisation)

CPU is fine again, but memory consumption stays high after this happens.

3.16.7 seems to be stable for our staging systems so far 🤞

@mrniko
Member

mrniko commented Jun 7, 2022

@otbe

Thank you for the details you provided. Did you get a heap dump at the OOM moment?

@mrniko
Member

mrniko commented Jun 7, 2022

@otbe

Can you also share the Redisson settings that were set at that moment?

@otbe

otbe commented Jun 7, 2022

@otbe

Thank you for the details you provided. Did you get a heap dump at the OOM moment?

Unfortunately not. Something on my long list of things to do :D

@otbe

Can you also share the Redisson settings that were set at that moment?

The default Spring Boot starter config, except for:

(screenshot of the config override)

(without this we had issues with thread locals, e.g. from Spring Data...)

@mrniko
Member

mrniko commented Jun 7, 2022

(without this we had issues with thread locals, e.g. from Spring Data...)

Can you clarify here?

@otbe

otbe commented Jun 7, 2022

(without this we had issues with thread locals, e.g. from Spring Data...)

Can you clarify here?

Sorry, I mixed things up a little. At some point we introduced parallelStream() in different places, which led to exceptions like this while deserialising our own classes from some RMaps, for example:

Caused by: java.lang.ClassNotFoundException: my.package.MyClass
	at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)
	at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
	at java.base/java.lang.Class.forName0(Native Method)
	at java.base/java.lang.Class.forName(Class.java:398)
	at org.jboss.marshalling.AbstractClassResolver.loadClass(AbstractClassResolver.java:129)
	at org.jboss.marshalling.AbstractClassResolver.resolveClass(AbstractClassResolver.java:110)
	at org.jboss.marshalling.river.RiverUnmarshaller.doReadClassDescriptor(RiverUnmarshaller.java:1048)
	at org.jboss.marshalling.river.RiverUnmarshaller.doReadNewObject(RiverUnmarshaller.java:1381)
	at org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:298)
	at org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:231)
	at org.jboss.marshalling.AbstractObjectInput.readObject(AbstractObjectInput.java:41)
	at org.redisson.codec.MarshallingCodec.lambda$new$0(MarshallingCodec.java:146)
	... 23 common frames omitted

I think this is an issue with the underlying Tomcat in Spring Boot if you use the standard pool for parallelStream().
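One plausible explanation for the ClassNotFoundException above (an assumption, not confirmed in this thread): since Java 9, common-ForkJoinPool workers can carry the system class loader as their context class loader rather than the web app's, so codecs that resolve classes via the TCCL (such as a marshalling codec) fail on application classes. A common workaround is to run the parallel stream in a dedicated pool created on an application thread; a minimal runnable sketch, with invented names:

```java
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.Collectors;

public final class DedicatedPoolStream {
    // Runs the parallel stream inside a private ForkJoinPool. Worker threads
    // created here inherit the submitting thread's context class loader,
    // unlike common-pool workers under some Java 9+ setups.
    public static List<Integer> doubled(List<Integer> input) throws Exception {
        ForkJoinPool pool = new ForkJoinPool(4);
        try {
            return pool.submit(() ->
                    input.parallelStream()
                         .map(i -> i * 2)
                         .collect(Collectors.toList())
            ).get();
        } finally {
            pool.shutdown();
        }
    }
}
```

In a real application the deserialisation work reading from the RMap would go inside the submitted task; the arithmetic here just stands in for it.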

@mrniko mrniko added this to the 3.17.4 milestone Jun 7, 2022
@mrniko
Member

mrniko commented Jun 7, 2022

@otbe

3.16.7 seems to be stable for our staging systems so far

Can you try 3.16.8 version as well?

@otbe

otbe commented Jun 7, 2022

@mrniko
Yes, I'll deploy it soon, but it might take a while to verify whether it works (basically waiting for no errors :D)

@mrniko
Member

mrniko commented Jun 10, 2022

@otbe

I think I have fixed the issue.

Please try the attached version.

redisson-3.17.4-SNAPSHOT.jar.zip

@otbe

otbe commented Jun 10, 2022

@mrniko
No issues so far with 3.16.8. It seems stable for us.

Could you publish this snapshot to a Maven repository? It's quite hard to get it into our applications otherwise.

@otbe

otbe commented Jun 10, 2022

Just FYI:
3.16.7 (the version we run in production right now) seems to also be affected somehow.

Exceptions this time:

Unable to send PING command over channel: [id: 0x095a8536, L:/10.1.9.25:56856 - R:my-cluster..euc1.cache.amazonaws.com/10.1.12.209:6379]org.redisson.client.RedisTimeoutException: Command execution timeout for command: (PING), params: [], Redis client: [addr=redis://my-cluster.euc1.cache.amazonaws.com:6379]
Caused by: org.redisson.client.RedisResponseTimeoutException: Redis server response timeout (3000 ms) occured after 0 retry attempts. Increase nettyThreads and/or timeout settings. Try to define pingConnectionInterval setting. Command: (SET), params: [my_name., UnpooledHeapByteBuf(ridx: 0, widx: 16, cap: 16/16)], channel: [id: 0x095a8536, L:/10.1.9.25:56856 - R:my-cluster.euc1.cache.amazonaws.com/10.1.12.209:6379]
Caused by: org.redisson.client.RedisTimeoutException: Command still hasn't been written into connection! Try to increase nettyThreads setting. Payload size in bytes: 0. Node source: NodeSource [slot=0, addr=null, redisClient=null, redirect=null, entry=null], connection: RedisConnection@1762614537 [redisClient=[addr=redis://my-cluster.euc1.cache.amazonaws.com:6379], channel=[id: 0xa04990e8, L:/10.1.9.25:56854 - R:my-cluster.euc1.cache.amazonaws.com/10.1.12.209:6379], currentCommand=null, usage=1], command: (HGETALL), params: [my_name] after 3 retry attempts
Caused by: java.lang.IllegalMonitorStateException: attempt to unlock lock, not locked by current thread by node id: f4699fae-225c-464c-b67b-a083bddfb136 thread-id: 78
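The final `IllegalMonitorStateException` above is typically thrown when `unlock()` runs on a lock the current thread no longer holds, e.g. because the lease expired during a connection outage. One common defensive pattern is to guard the unlock; the sketch below uses `java.util.concurrent.locks.ReentrantLock` so it runs standalone, but Redisson's `RLock` exposes the same `isHeldByCurrentThread()`/`unlock()` pair, so the pattern carries over (the helper name is invented):

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

public final class GuardedUnlock {
    public static <T> T withLock(ReentrantLock lock, Supplier<T> body) {
        lock.lock();
        try {
            return body.get();
        } finally {
            // Guard the unlock: with a distributed lock the hold can silently
            // vanish mid-section when the lease expires during an outage, and
            // a bare unlock() then throws IllegalMonitorStateException,
            // masking the original failure.
            if (lock.isHeldByCurrentThread()) {
                lock.unlock();
            }
        }
    }
}
```

The guard does not make the critical section safe after a lost lease — it only keeps the unlock path from throwing on top of the real problem.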

@mrniko
Member

mrniko commented Jun 10, 2022

@otbe

I'll apply the changes and make a release.

Exceptions this time

It seems like a connection issue.

@mrniko
Member

mrniko commented Jun 16, 2022

@otbe

Version 3.17.4 has been released. Please try it.

@mrniko mrniko modified the milestones: 3.17.4, 3.17.5 Jun 16, 2022
@otbe

otbe commented Jun 24, 2022

@mrniko

We tested 3.17.4 for several days and it seems to be fine for us.
The root cause of the weird behaviour was indeed connection related. To be more precise, it's a limitation of ElastiCache that we were not aware of. We have some larger maps/map entries (around 18 MB), and if you use very small instances (like we do in test systems) there is a "bandwidth allowance"; when you exceed it, ElastiCache essentially drops TCP packets, which led to all of these strange errors. After reducing the map size / switching instance sizes we can no longer reproduce the errors.

@mrniko mrniko modified the milestones: 3.17.5, 3.17.4 Jun 24, 2022
@mrniko
Member

mrniko commented Jun 24, 2022

Great, thanks for the feedback. I'm closing this.

@mrniko mrniko closed this as completed Jun 24, 2022
@mrniko mrniko removed this from the 3.17.4 milestone Jun 24, 2022