
IORedis Cluster mode redirections: Not following MOVED redirections correctly? #1582

Closed

ligreman opened this issue May 20, 2022 · 5 comments

@ligreman

ligreman commented May 20, 2022

UPDATE: Not an ioredis problem, but help would be very much appreciated. See the 3rd comment.

Hello. I have a Redis Cluster (6 nodes) and I'm using ioredis to connect to it. I'm on the latest version of ioredis.

The configuration has this information (a rough code sketch follows the list):

  • Startup nodes: localhost:63791 through localhost:63796
  • NatMap: because a proxy on my localhost redirects the requests to "10.20.1.X", I have the natMap configured like:
    • localhost:63791 ---> 10.20.1.220:6379
    • localhost:63792 ---> 10.20.1.221:6379
    • etc
  • maxRedirections: 16
  • scaleReads: all
  • enableAutoPipelining: true
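
For reference, a minimal sketch of how this configuration looks in code (ports and addresses are the ones from this setup; the natMap keys follow the direction shown by the "NAT mapping" lines in the debug log below, i.e. cluster-announced address -> locally reachable proxy address):

  const Redis = require("ioredis");

  // Sketch of the cluster configuration described above.
  const cluster = new Redis.Cluster(
    [
      { host: "localhost", port: 63791 },
      { host: "localhost", port: 63792 },
      { host: "localhost", port: 63793 },
      { host: "localhost", port: 63794 },
      { host: "localhost", port: 63795 },
      { host: "localhost", port: 63796 },
    ],
    {
      // Keys are the addresses the cluster announces (10.20.1.X),
      // values are the local proxy addresses used to reach them.
      natMap: {
        "10.20.1.220:6379": { host: "localhost", port: 63791 },
        "10.20.1.221:6379": { host: "localhost", port: 63792 },
        // ...one entry per node
      },
      maxRedirections: 16,
      scaleReads: "all",
      enableAutoPipelining: true,
    }
  );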

All seems to be working fine: it connects flawlessly to the cluster, the commands are sent, and everything works. But from time to time I see errors: "Error: Too many Cluster redirections". They seem to be random; the error is perhaps more likely when I send multiple commands in a short time span, but sometimes it is raised by the very first command.

I activated debug logging in ioredis and saw that it detects the "MOVED" response from Redis and resolves the correct host through the natMap, but then ioredis retries the command against a random node, not the node indicated in the "MOVED" response. Sometimes it happens to pick the correct node and the command succeeds, but other times it picks another random node and receives another MOVED response... In summary, if it doesn't hit the correct node within the 16 maxRedirections, it fails.

Something like this:

  ioredis:redis write command[127.0.0.1:63796]: 0 -> zrangebyscore([ 'my_key:{hashtag}:one', '1', '+inf' ]) +10ms
  ioredis:cluster command zrangebyscore is moved to 10.20.1.221:6379 +17ms
  ioredis:cluster NAT mapping 10.20.1.221:6379 -> { host: 'localhost', port: 63792 } +1ms
  ioredis:cluster refreshing slot caches... (triggered by MOVED error) +2ms
  ioredis:cluster cluster slots result count: 3 +0ms
  ioredis:cluster NAT mapping 10.20.1.220:6379 -> { host: 'localhost', port: 63791 } +2ms
  ioredis:cluster NAT mapping 10.20.1.224:6379 -> { host: 'localhost', port: 63795 } +0ms
  ioredis:cluster cluster slots result [0]: slots 0~5460 served by [ 'localhost:63791', 'localhost:63795' ] +1ms
  ioredis:cluster NAT mapping 10.20.1.221:6379 -> { host: 'localhost', port: 63792 } +3ms
  ioredis:cluster NAT mapping 10.20.1.225:6379 -> { host: 'localhost', port: 63796 } +3ms
  ioredis:cluster cluster slots result [1]: slots 5461~10922 served by [ 'localhost:63792', 'localhost:63796' ] +1ms
  ioredis:cluster NAT mapping 10.20.1.223:6379 -> { host: 'localhost', port: 63794 } +0ms
  ioredis:cluster NAT mapping 10.20.1.222:6379 -> { host: 'localhost', port: 63793 } +1ms
  ioredis:cluster cluster slots result [2]: slots 10923~16383 served by [ 'localhost:63794', 'localhost:63793' ] +1ms
  ioredis:cluster:connectionPool Reset with [
  ioredis:cluster:connectionPool   { host: 'localhost', port: 63791, readOnly: false },
  ioredis:cluster:connectionPool   { host: 'localhost', port: 63795, readOnly: true },
  ioredis:cluster:connectionPool   { host: 'localhost', port: 63792, readOnly: false },
  ioredis:cluster:connectionPool   { host: 'localhost', port: 63796, readOnly: true },
  ioredis:cluster:connectionPool   { host: 'localhost', port: 63794, readOnly: false },
  ioredis:cluster:connectionPool   { host: 'localhost', port: 63793, readOnly: true }
  ioredis:cluster:connectionPool ] +66ms
  ioredis:redis write command[127.0.0.1:63796]: 0 -> zrangebyscore([ 'my_key:{hashtag}:one', '1', '+inf' ]) +32ms

As you can see, it sends the command to "127.0.0.1:63796", receives the MOVED pointing to "10.20.1.221:6379", and resolves it through the natMap correctly ("NAT mapping 10.20.1.221:6379 -> { host: 'localhost', port: 63792 }"), but then it sends the command again to "127.0.0.1:63796".

Note that I have also tried to use 127.0.0.1 instead of localhost, but got the same result.

So, am I doing something wrong? Or is there perhaps a bug in ioredis where it detects the redirection but does not follow it correctly? Does the hashtag I use in the key have something to do with it (I don't think so)?

I attach the complete log: ioredis.log.txt. Note that in this log I anonymized the data; the real key used in the commands contains a hashtag, like "my_key:{hashtag}:one".

Thank you!

@ligreman
Author

I've got some more very interesting info.

If I configure the ioredis connection property "scaleReads" as "master", it ALWAYS works. If I set it to "slave", it ALWAYS fails. And with "all" it works sometimes, as stated before.

So, I see that my key is assigned to a slot served by:

  • { host: 'localhost', port: 63792 } -> master
  • { host: 'localhost', port: 63796 } -> slave

When scaleReads=master, ioredis sends the command to 63792 correctly, and it works.

When scaleReads=slave, it sends the command to 63796, which is the correct slave, but the response I get is a MOVED redirection. Then it enters a loop and finally fails.

When scaleReads=all, it fails if it sends the command to the slave, and it works if it happens to send it to the master.

So, the problem is: why does my Redis answer with a MOVED response when I send the command to the slave that holds the slot of "mykey"? It is the correct slave and is supposed to have that key.

My slave nodes are configured as read-only; maybe the command "zrangebyscore" is not considered a read command, so the slave cannot process it and answers with a "MOVED" to the master? I have also checked that with "get" commands I don't have any problem.
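
One quick way to check that hypothesis is a sketch like the following (it assumes the redis-commands package is installed; ioredis builds its command table from it, and newer ioredis versions use the equivalent @ioredis/commands fork):

  // Check whether a command is flagged read-only in the Redis command table.
  const commands = require("redis-commands");

  console.log(commands.hasFlag("zrangebyscore", "readonly")); // expected: true
  console.log(commands.hasFlag("get", "readonly"));           // expected: true
  console.log(commands.hasFlag("zadd", "readonly"));          // expected: false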

Can anyone shed some light on my dark problem? I no longer think this is an ioredis bug...

Thanks for your help.

@ligreman
Author

Confirmed: if I send the command directly from the Redis CLI console, I get the MOVED response... so it is not an ioredis problem.

I don't understand it. If I have the MasterA node and its replica SlaveA, and they are synchronized (in the logs the sync process is OK), I assumed that sending a "GET mykey" command to SlaveA would return the value of that key, not a MOVED response pointing to the master... Am I wrong about how Redis works?

Anyway, I would be very grateful if someone could explain why Redis is doing that.

@ligreman
Author

Finally, I have gotten to the bottom of the issue.

It seems that sending read commands to the slaves does not work unless you send a "READONLY" command first to configure the connection. If I connect to the cluster and send "READONLY" to the slave nodes, then everything works nicely.
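
A minimal sketch of that manual check (the port is the replica from this setup; it assumes ioredis exposes readonly() like any other auto-generated command method):

  const Redis = require("ioredis");

  // Connect straight to the replica and put the connection into READONLY mode
  // before issuing reads; without it the replica answers the read with a
  // MOVED redirect to its master.
  const replica = new Redis({ host: "localhost", port: 63796 });

  async function main() {
    await replica.readonly();
    const members = await replica.zrangebyscore("my_key:{hashtag}:one", "1", "+inf");
    console.log(members);
    replica.disconnect();
  }

  main().catch(console.error);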

So, the only thing I'm not sure about is whether ioredis is supposed to work like that. I mean, if I configure ioredis with "scaleReads" set to "all" or "slave", shouldn't ioredis send the READONLY command to the slaves by itself? If it isn't sent, the commands fail.

I think I'll close this issue and open a new one with just this last question, to keep things clean.

Thanks.

@NurimOnsemiro

I ran into the same problem. I have configured 2 shards and 4 nodes using AWS ElastiCache, and ioredis keeps calling the same domain address, resulting in a series of MOVED errors.

2023-06-08T01:47:25.920Z ioredis:cluster command fcall is moved to redacted-0002-001.redacted.uusa8q.apn2.cache.amazonaws.com:6379
2023-06-08T01:47:25.920Z ioredis:redis write command[10.1.2.174:6379]: 0 -> fcall([ 'AtcsMessagePop', '3', 'atcs_message_queue_id:{9998:default:2}', 'redacted:{9998:default:2}:current_request_count', 'redacted:{9998:default:2}:max_rps', '190' ])
2023-06-08T01:47:25.920Z ioredis:cluster refreshing slot caches... (triggered by MOVED error)
2023-06-08T01:47:25.922Z ioredis:cluster command fcall is moved to redacted-0002-001.redacted.uusa8q.apn2.cache.amazonaws.com:6379
2023-06-08T01:47:25.922Z ioredis:redis write command[10.1.2.174:6379]: 0 -> fcall([ 'AtcsMessagePop', '3', 'atcs_message_queue_id:{9998:default:2}', 'redacted:{9998:default:2}:current_request_count', 'redacted:{9998:default:2}:max_rps', '190' ])
2023-06-08T01:47:25.922Z ioredis:cluster refreshing slot caches... (triggered by MOVED error)
2023-06-08T01:47:25.924Z ioredis:cluster command fcall is moved to redacted-0002-001.redacted.uusa8q.apn2.cache.amazonaws.com:6379
2023-06-08T01:47:25.924Z ioredis:redis write command[10.1.2.174:6379]: 0 -> fcall([ 'AtcsMessagePop', '3', 'atcs_message_queue_id:{9998:default:2}', 'redacted:{9998:default:2}:current_request_count', 'redacted:{9998:default:2}:max_rps', '190' ])
2023-06-08T01:47:25.924Z ioredis:cluster refreshing slot caches... (triggered by MOVED error)
2023-06-08T01:47:25.928Z ioredis:cluster command fcall is moved to redacted-0002-001.redacted.uusa8q.apn2.cache.amazonaws.com:6379
2023-06-08T01:47:25.928Z ioredis:redis write command[10.1.2.174:6379]: 0 -> fcall([ 'AtcsMessagePop', '3', 'atcs_message_queue_id:{9998:default:2}', 'redacted:{9998:default:2}:current_request_count', 'redacted:{9998:default:2}:max_rps', '190' ])
2023-06-08T01:47:25.928Z ioredis:cluster refreshing slot caches... (triggered by MOVED error)
2023-06-08T01:47:25.929Z ioredis:cluster command fcall is moved to redacted-0002-001.redacted.uusa8q.apn2.cache.amazonaws.com:6379
2023-06-08T01:47:25.930Z ioredis:redis write command[10.1.2.174:6379]: 0 -> fcall([ 'AtcsMessagePop', '3', 'atcs_message_queue_id:{9998:default:2}', 'redacted:{9998:default:2}:current_request_count', 'redacted:{9998:default:2}:max_rps', '190' ])
2023-06-08T01:47:25.930Z ioredis:cluster refreshing slot caches... (triggered by MOVED error)

@Fernandomr88

having the same issue here
