How to avoid CLUSTERDOWN and UNBLOCKED? #28

thelinuxlich · 2015-05-06T22:09:33Z

Revisiting this problem, I asked on the Redis Google group about it, and here is what @antirez said:

Hello,

CLUSTERDOWN is a transient error that happens when at least one master
node is down. If you want the partial portion of the cluster which is
still up to run regardless of a set of hash slots not covered, there
is an option inside the example "redis.conf" file, doing exactly this.

UNBLOCKED is unavoidable since it is delivered to clients that are
blocked into lists that are moved from a different master because of
resharding. We don't want them to wait forever for something that will
never happen, since those lists are now moved into a different master.

So CLUSTERDOWN should be handled by the application and/or at client
level directly by retrying. UNBLOCKED should be handled by rescanning the
config with CLUSTER SLOTS and connecting to the right node.

Cheers,
Salvatore

Configuration-wise, I've set "cluster-require-full-coverage" to "no" on my cluster, so I just need to know how to handle this on the application side.

luin · 2015-05-07T00:49:15Z

It seems that we can just resend the command when a CLUSTERDOWN error is received. Thank you for the information.

If you want to handle CLUSTERDOWN on the application side, you have to catch all errors returned from Redis and resend the command if the error is CLUSTERDOWN.
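For anyone who wants to try the application-side approach, here is a minimal sketch of such a retry wrapper. The names `isClusterDown` and `retryOnClusterDown`, the retry count, and the delay are all assumptions made up for illustration; none of them are part of ioredis.

```javascript
// Returns true when an error looks like a CLUSTERDOWN reply from Redis.
function isClusterDown(err) {
  return Boolean(err && typeof err.message === 'string' &&
    err.message.startsWith('CLUSTERDOWN'));
}

// Hypothetical helper: calls `send` (a function returning a promise) and
// calls it again when it rejects with CLUSTERDOWN, up to `retries` extra times.
// Any other error is rethrown immediately.
async function retryOnClusterDown(send, { retries = 5, delayMs = 100 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await send();
    } catch (err) {
      if (!isClusterDown(err) || attempt >= retries) throw err;
      // Give the cluster some time to promote a slave before retrying.
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```

It would be used by wrapping each call, e.g. retryOnClusterDown(() => redis.get('foo')). A fixed delay keeps the sketch short; a growing delay is probably a better fit for long outages.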

thelinuxlich · 2015-05-07T02:23:02Z

I'm also getting a lot of these:

ReplyError: EXECABORT Transaction discarded because of previous errors.

luin · 2015-05-07T02:25:18Z

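You can inspect the previous errors via error.previousErrors to see what they are. For instance:

// set('foo') is missing its value, so the transaction is discarded with EXECABORT.
redis.multi().set('foo').get('foo').exec().catch(function (err) {
  // previousErrors lists the errors returned while the commands were queued.
  console.log(err.previousErrors);
});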

thelinuxlich · 2015-05-07T02:26:19Z

And by the way, if the library has the autoResendUnfulfilledCommands option enabled by default, shouldn't it resend automatically after recovering from CLUSTERDOWN?

luin · 2015-05-07T02:36:08Z

autoResendUnfulfilledCommands is used to resend unfulfilled commands after a reconnection. Since CLUSTERDOWN is caused by a master being offline, we may use the clusterRetryStrategy option to retry the node.
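As a rough illustration of what a clusterRetryStrategy could look like: the function receives the number of attempts so far and returns the delay in milliseconds before the next one. The delay values and the cut-off of 20 attempts below are assumptions, not ioredis defaults.

```javascript
// Sketch of a retry strategy: wait 100 ms longer after each failed attempt,
// capped at 2 seconds. The cut-off after 20 attempts is a made-up value.
function clusterRetryStrategy(times) {
  if (times > 20) {
    // Returning something that is not a number means "stop retrying".
    return null;
  }
  return Math.min(times * 100, 2000);
}

// Options object that would be passed as the second argument of
// new Redis.Cluster(startupNodes, clusterOptions).
const clusterOptions = { clusterRetryStrategy: clusterRetryStrategy };
```

Whether giving up after a fixed number of attempts is desirable depends on the application; returning a number every time keeps the client retrying indefinitely.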

thelinuxlich · 2015-05-07T04:04:53Z

Now I have 3 masters and 9 slaves. Let's keep this open for a few days so I can see if those errors persist. I've also configured the cluster with "cluster-slave-validity-factor 0" and "cluster-migration-barrier 1".

thelinuxlich · 2015-05-07T04:38:12Z

good reference: http://redis.io/presentation/Redis_Cluster.pdf

thelinuxlich · 2015-05-07T04:54:54Z

Even after adding slaves and changing the config to favor availability, I'm occasionally receiving EXECABORT with these previousErrors:

previousErrors:
 [ { [ReplyError: MOVED 1684 192.168.0.1:7000]
     name: 'ReplyError',
     message: 'MOVED 1684 192.168.0.1:7000',
     command: [Object] },
   { [ReplyError: MOVED 1684 192.168.0.1:7000]
     name: 'ReplyError',
     message: 'MOVED 1684 192.168.0.1:7000',
     command: [Object] } ] }
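To make these messages easier to read: a MOVED (or ASK) error carries the hash slot and the address of the node that now serves it. A small sketch for pulling those fields out of the message follows; `parseRedirection` is a hypothetical helper name, not something ioredis exposes.

```javascript
// Parses a redirection message such as "MOVED 1684 192.168.0.1:7000"
// or "ASK 1684 192.168.0.1:7000". Returns null for any other message.
function parseRedirection(message) {
  const match = /^(MOVED|ASK) (\d+) (.+):(\d+)$/.exec(message);
  if (!match) return null;
  return {
    type: match[1],          // "MOVED" or "ASK"
    slot: Number(match[2]),  // hash slot the key belongs to
    host: match[3],          // node that serves the slot now
    port: Number(match[4]),
  };
}
```

Applied to the errors above, it would report slot 1684 on host 192.168.0.1, port 7000 for both entries.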

AVVS · 2015-05-07T05:00:04Z

@thelinuxlich I believe you are trying to perform multi-key operations on the cluster, which is why you get these errors.

If you need to do that, make sure the keys share the same hash tag, i.e. {somehash}your-key-name1, {somehash}your-key-name2, etc.
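To show why the {somehash} prefix works, here is a sketch of the slot calculation as described in the Redis Cluster specification: CRC16 of the key modulo 16384, where only the part between the first { and the following } is hashed if it is non-empty. The function names are made up for this example, and this is not the code ioredis itself uses.

```javascript
// CRC16 (XMODEM variant: polynomial 0x1021, initial value 0), computed bit by bit.
function crc16(bytes) {
  let crc = 0;
  for (const byte of bytes) {
    crc ^= byte << 8;
    for (let i = 0; i < 8; i++) {
      crc = (crc & 0x8000) ? ((crc << 1) ^ 0x1021) : (crc << 1);
      crc &= 0xffff;
    }
  }
  return crc;
}

// Returns the cluster hash slot (0..16383) for a key, honoring hash tags.
function keyHashSlot(key) {
  const start = key.indexOf('{');
  if (start !== -1) {
    const end = key.indexOf('}', start + 1);
    // Only a non-empty tag between the braces replaces the full key.
    if (end > start + 1) key = key.slice(start + 1, end);
  }
  return crc16(Buffer.from(key)) % 16384;
}
```

With this, keyHashSlot('{somehash}your-key-name1') and keyHashSlot('{somehash}your-key-name2') give the same slot, so both keys live on the same node.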

thelinuxlich · 2015-05-07T05:02:03Z

You mean, using multi(), right?

thelinuxlich · 2015-05-07T05:03:49Z

My code has one transaction:

redis.multi().setnx(new_key, possible_new_session_id).expire(new_key, 1800).exec()

luin · 2015-05-07T05:04:45Z

Did a resharding or failover happen after the cluster was initialized?

thelinuxlich · 2015-05-07T05:06:06Z

no, probably the multi() is not going to the right node

AVVS · 2015-05-07T05:06:57Z

In that case multi() is fine (same key -> new_key), but the error says that the hash slot new_key resolved to is now on another machine in the cluster. These errors should really be handled by the library. What the error says is: your hash slot cache is stale, so it needs to be updated and the operation needs to be retried.

@luin, please take a look at this, as I believe it needs to be improved, i.e. MOVED replies must be handled.
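Roughly, the handling described above (update the slot cache, then retry against the new node) could be sketched like this. Everything here is hypothetical: `slotCache` is a plain Map from slot to "host:port", `sendTo` is an assumed callback that sends a command to one node and returns a promise, and the redirect limit is arbitrary. It is not how ioredis is implemented.

```javascript
// Sends `command` to the node cached for `slot`. On a MOVED reply, the cache
// entry is replaced with the node named in the error and the command is
// retried there. Other errors are rethrown unchanged.
async function sendWithRedirect(slotCache, slot, command, sendTo, maxRedirects = 16) {
  let node = slotCache.get(slot);
  for (let i = 0; i <= maxRedirects; i++) {
    try {
      return await sendTo(node, command);
    } catch (err) {
      const message = (err && err.message) || '';
      const match = /^MOVED (\d+) (\S+)$/.exec(message);
      if (!match) throw err;
      // The server told us where the slot lives now: fix the cache first.
      node = match[2];
      slotCache.set(Number(match[1]), node);
    }
  }
  throw new Error('Too many redirections');
}
```

A transaction needs more care than a single command, since the whole MULTI ... EXEC block has to be replayed on the new node, which this sketch does not attempt.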

thelinuxlich · 2015-05-07T05:08:23Z

By the way, I didn't have this problem with a similar library: https://github.com/thunks/thunk-redis

Their API forces you to set the key explicitly on the multi() and exec() methods

luin · 2015-05-07T05:09:15Z

Yes, you are right. ioredis should be able to handle these MOVED errors. I'll try to fix these errors tonight.

AVVS · 2015-05-07T05:12:07Z

@thelinuxlich thunk-redis takes the hash from the first key and then applies it to all the operations in the multi, but it doesn't make use of pipelines. Here the first key from the pipeline is taken and then its hash is applied to the whole pipeline.

Besides that, it's pretty much the same, except that ioredis seems cleaner (and has a bug: it doesn't handle MOVED responses 💨)

luin · 2015-05-07T05:12:41Z

@thelinuxlich As stated in the README, ioredis uses the first key in the pipeline queue to calculate the slot. So the problem here isn't that ioredis uses the wrong key; it's that ioredis doesn't handle MOVED errors properly in transactions.

thelinuxlich · 2015-05-07T05:13:19Z

ok, maybe refreshAfterFails = 1 can solve the issue?

luin · 2015-05-07T05:15:56Z

@thelinuxlich No, it doesn't help since it's a bug in ioredis. I'll fix it soon :-)

thelinuxlich · 2015-05-07T17:23:13Z

Don't know if it's luck, but since setting refreshAfterFails to 1, no errors have happened.

luin · 2015-05-07T17:36:38Z

@thelinuxlich That's interesting. However I'm implementing a more stable transaction strategy in cluster mode.

thelinuxlich · 2015-05-08T19:26:03Z

let's test it ;)

thelinuxlich · 2015-05-09T07:01:43Z

still getting MOVED errors :(

luin · 2015-05-09T07:04:17Z

@thelinuxlich Yes, this commit just fixed CLUSTERDOWN errors, and I'm still working on handling MOVED errors :-)

luin · 2015-05-14T17:14:47Z

There is a lot of work to do to implement stable transactions in cluster mode. However, the job is getting done :-)
Pull request #33 should handle MOVED, ASK and CLUSTERDOWN errors properly. I'm writing more tests for it and will ship it in the next version. Feel free to test it if you have time.

thelinuxlich · 2015-05-14T17:25:30Z

Great, I always test every new release :)

luin · 2015-05-15T12:47:32Z

Released in 1.3.0

thelinuxlich · 2015-05-16T02:52:58Z

No errors so far...

thelinuxlich · 2015-05-17T20:41:51Z

Seems fixed!

luin added a commit that referenced this issue May 8, 2015

Handle CLUSTERDOWN error. Related #28

1cbe733

luin added a commit that referenced this issue May 14, 2015

Support pipeline redirecting in Cluster mode. #28

643c4a0

luin added a commit that referenced this issue May 15, 2015

Support pipeline redirecting in Cluster mode. #28

785df7b

luin added a commit that referenced this issue May 15, 2015

Support pipeline redirecting in Cluster mode. #28

deafcb3

thelinuxlich closed this as completed May 17, 2015
