Redisson Client not ignoring defunct nodes in Redis Cluster #5754

Closed
Yazwa opened this issue Apr 4, 2024 · 13 comments

@Yazwa

Yazwa commented Apr 4, 2024

The ElastiCache best practices (https://docs.aws.amazon.com/AmazonElastiCache/latest/red-ug/BestPractices.Clients-lettuce.html) advise using a nodeFilter that ignores failed/defunct nodes when connecting to a Redis Cluster, as shown below.

```
final ClusterClientOptions clusterClientOptions =
    ClusterClientOptions.builder()
    ... // other options
    .nodeFilter(it ->
        ! (it.is(RedisClusterNode.NodeFlag.FAIL)
        || it.is(RedisClusterNode.NodeFlag.EVENTUAL_FAIL)
        || it.is(RedisClusterNode.NodeFlag.HANDSHAKE)
        || it.is(RedisClusterNode.NodeFlag.NOADDR)))
    .validateClusterNodeMembership(false)
    .build();
redisClusterClient.setOptions(clusterClientOptions);
```


We would like to know whether RedissonClient has a similar option. We hit an issue where our client tried to connect to failed/defunct nodes, which impacted our availability.

Redisson version we are using: 3.23.x
@Yazwa Yazwa changed the title from "Does Redisson Client has option to ignore defunct nodes from Redis Cluster." to "Redisson Client not ignoring defunct nodes in Redis Cluster" Apr 4, 2024
@mrniko
Member

mrniko commented Apr 4, 2024

Handling of the EVENTUAL_FAIL state is missing. I'll add it.

@mrniko mrniko added this to the 3.27.3 milestone Apr 4, 2024
@mrniko mrniko added the bug label Apr 4, 2024
@mrniko
Member

mrniko commented Apr 4, 2024

Can you share Redisson logs?

@Yazwa
Author

Yazwa commented Apr 4, 2024

Are the other states, such as FAIL and HANDSHAKE, handled?

@mrniko
Member

mrniko commented Apr 4, 2024

yes

@mrniko mrniko removed the bug label Apr 4, 2024
@mrniko
Member

mrniko commented Apr 4, 2024

I need the logs to make sure that the issue is caused by the node state.

@Yazwa
Author

Yazwa commented Apr 4, 2024

{ "timestamp": "2024-04-02T16:29:59,997", "logger": "org.redisson.connection.ServiceManager", "level": "ERROR", "threadID": "104", "threadName": "redisson-netty-2-16", "message": "Unable to resolve rediss://prri0mzh6nsauw6-0015-002.prri0mzh6nsauw6.wcbpsb.euw1.cache.amazonaws.com:6379", "exception": " io.netty.resolver.dns.DnsResolveContext$SearchDomainUnknownHostException: Failed to resolve 'prri0mzh6nsauw6-0015-002.prri0mzh6nsauw6.wcbpsb.euw1.cache.amazonaws.com' [A(1)] and search domain query for configured domains failed as well: [eu-west-1.compute.internal]\n\tat io.netty.resolver.dns.DnsResolveContext.finishResolve(DnsResolveContext.java:1097)\n\tat io.netty.resolver.dns.DnsResolveContext.tryToFinishResolve(DnsResolveContext.java:1044)\n\tat io.netty.resolver.dns.DnsResolveContext.query(DnsResolveContext.java:432)\n\tat io.netty.resolver.dns.DnsResolveContext.onResponse(DnsResolveContext.java:662)\n\tat io.netty.resolver.dns.DnsResolveContext.access$500(DnsResolveContext.java:66)\n\tat io.netty.resolver.dns.DnsResolveContext$2.operationComplete(DnsResolveContext.java:489)\n\tat io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:590)\n\tat io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:583)\n\tat io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:559)\n\tat io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:492)\n\tat io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:636)\n\tat io.netty.util.concurrent.DefaultPromise.setSuccess0(DefaultPromise.java:625)\n\tat io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:105)\n\tat io.netty.resolver.dns.DnsQueryContext.trySuccess(DnsQueryContext.java:317)\n\tat io.netty.resolver.dns.DnsQueryContext.finishSuccess(DnsQueryContext.java:309)\n\tat io.netty.resolver.dns.DnsNameResolver$DnsResponseHandler.channelRead(DnsNameResolver.java:1400)\n\tat 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)\n\tat io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)\n\tat io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)\n\tat io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)\n\tat io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)\n\tat io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)\n\tat io.netty.channel.nio.AbstractNioMessageChannel$NioMessageUnsafe.read(AbstractNioMessageChannel.java:97)\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)\n\tat io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)\n\tat io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)\n\tat io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)\n\tat io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)\n\tat java.base/java.lang.Thread.run(Thread.java:829)\n" }

@Yazwa
Author

Yazwa commented Apr 4, 2024

Can you also help us with how to configure handling for the other failed states?

@mrniko
Member

mrniko commented Apr 4, 2024

It's related to DNS, not to node status. Here is the solution: #5726 (comment)

@Yazwa
Author

Yazwa commented Apr 4, 2024

The Redis team suggested using the node filter, which ignores failed nodes, and mentioned that the error above can happen because of node status. Can you help us with the config for it? We are concerned that the config change above could cause other performance issues on our end.

@Yazwa
Author

Yazwa commented Apr 4, 2024

We also have logs with the message: Unable to parse cluster nodes state got from: 10.0.150.252/10.0.150.252:6379

@asafpamzn

Our current theory is that the parsePartitions function failed because the failed nodes do not exist in the DNS server. Is there a way to avoid resolving the addresses of failed nodes?
See
https://github.com/redisson/redisson/blob/92ac6c5a1709f5dedd11227033ed42b8a342f0b0/redisson/src/main/java/org/redisson/cluster/ClusterConnectionManager.java#L868C68-L868C96
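To illustrate the idea in the comment above, here is a minimal sketch that drops defunct nodes from raw `CLUSTER NODES` output before any address is resolved. It is not Redisson's actual code or its fix: the class name `ClusterNodesFilter`, the method `usableAddresses`, and the sample node IDs and addresses are all hypothetical. It assumes the standard `CLUSTER NODES` line layout (`<id> <ip:port@cport[,hostname]> <flags> ...`), where `fail?` is the flag that Lettuce calls `EVENTUAL_FAIL`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ClusterNodesFilter {

    // Flags as they appear in CLUSTER NODES output. "fail?" is the
    // "possibly failed" state (EVENTUAL_FAIL in Lettuce terms).
    private static final Set<String> DEFUNCT_FLAGS =
            new HashSet<>(Arrays.asList("fail", "fail?", "handshake", "noaddr"));

    /** Returns ip:port for every node that carries none of the defunct flags. */
    public static List<String> usableAddresses(String clusterNodesOutput) {
        List<String> result = new ArrayList<>();
        for (String line : clusterNodesOutput.split("\\r?\\n")) {
            String[] fields = line.trim().split("\\s+");
            if (fields.length < 3) {
                continue; // blank or malformed line
            }
            boolean defunct = false;
            for (String flag : fields[2].split(",")) {
                if (DEFUNCT_FLAGS.contains(flag)) {
                    defunct = true;
                    break;
                }
            }
            if (defunct) {
                continue; // skip before any DNS lookup would be attempted
            }
            // Address field is ip:port@cport[,hostname]; keep only ip:port.
            String address = fields[1];
            int at = address.indexOf('@');
            result.add(at >= 0 ? address.substring(0, at) : address);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample data, shortened node IDs.
        String sample = String.join("\n",
                "a1 10.0.0.1:6379@16379 myself,master - 0 0 1 connected 0-8191",
                "b2 10.0.0.2:6379@16379 master,fail - 0 0 2 disconnected 8192-16383",
                "c3 10.0.0.3:6379@16379 slave,fail? a1 0 0 1 connected",
                "d4 :0@0 master,noaddr - 0 0 0 disconnected",
                "e5 10.0.0.5:6379@16379 slave b2 0 0 2 connected");
        System.out.println(usableAddresses(sample)); // [10.0.0.1:6379, 10.0.0.5:6379]
    }
}
```

Filtering on the flags field first means a node whose hostname no longer exists in DNS is never handed to the resolver, which is the behavior the Lettuce `nodeFilter` in the original report achieves.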

@mrniko
Member

mrniko commented Apr 6, 2024

Unable to parse cluster nodes state got from

Can you share the full stack trace?

@mrniko mrniko added the bug label Apr 6, 2024
mrniko pushed a commit that referenced this issue Apr 9, 2024
@mrniko
Member

mrniko commented Apr 9, 2024

Fixed. Thanks for the report.

@mrniko mrniko closed this as completed Apr 9, 2024