[Urgent!!!!!!] worker will report cross slot error when working with redis cluster #93

Closed
steven-zou opened this issue Apr 12, 2018 · 23 comments

Comments

@steven-zou

Apr 11 22:53:03 172.18.0.1 jobservice[971]: message repeated 439 times: [ ERROR: worker.fetch - CROSSSLOT Keys in request don't hash to the same slot]
Apr 11 22:53:03 172.18.0.1 jobservice[971]: ERROR: requeuer.process - CROSSSLOT Keys in request don't hash to the same slot
Apr 11 22:53:03 172.18.0.1 jobservice[971]: ERROR: requeuer.process - CROSSSLOT Keys in request don't hash to the same slot
Apr 11 22:53:03 172.18.0.1 jobservice[971]: ERROR: worker.fetch - CROSSSLOT Keys in request don't hash to the same slot

https://stackoverflow.com/questions/38042629/redis-cross-slot-error

NOTES: jobservice is our component name.

@steven-zou steven-zou changed the title [Urgent!!!!!!] worker will report cross slot error when work with redis cluster [Urgent!!!!!!] worker will report cross slot error when working with redis cluster Apr 12, 2018
@steven-zou
Author

@shdunning any comments?

@shdunning
Collaborator

I suspect this has something to do with the Lua scripts referencing keys that aren't defined on the node where the script is running. For example, our Lua script for fetching the next job requires 6 keys: (1) job queue, (2) in-progress queue, (3) pause key, (4) lock key, (5) lock info key, (6) set concurrency key.

We'll likely need to investigate how to make these Lua scripts cluster-safe.

@shdunning
Collaborator

@austintaylor
Contributor

@steven-zou What commit are you running? I believe we were constructing keys in a Lua script at one point, which would cause this error, but we fixed this in #52.

@steven-zou
Author

Thanks, @austintaylor. After a rough check, there might have been a mistake when importing the work library into our project with dep; an older version was probably imported. I'll do more verification to confirm whether the above issue still exists.

@shdunning
Collaborator

@steven-zou if it's any help, we've been having better luck with https://github.com/kardianos/govendor instead of dep.

@stefanoschrs

I have the same issue on 0.5.1. When using an ElastiCache instance I get:

ERROR: requeuer.process - CROSSSLOT Keys in request don't hash to the same slot
ERROR: worker.fetch - CROSSSLOT Keys in request don't hash to the same slot

@sebcoetzee

@stefanoschrs any luck fixing this on 0.5.1?

@stefanoschrs

Nope, nothing..

@sebcoetzee

@stefanoschrs I ended up switching out the clustered Redis instance for a normal one to get around this.

@stefanoschrs

That's what I did also, but it's not a solution..

@steven-zou
Author

Yes, I checked with v0.5.1 and the issue is still there. Any advice? @shdunning @austintaylor

@steven-zou
Author

Bumping this:

@shdunning @austintaylor

@shdunning
Collaborator

@steven-zou I would need to do some digging. This (normally) occurs if we are dynamically referencing a key in the Lua scripts instead of explicitly passing it in via the KEYS argument. I'll try to spend some time this week digging into the Lua scripts to see if I can track down where this might be occurring.
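
To illustrate the distinction, here is a minimal Go sketch of an EVAL call in which every key is declared up front. The script text, the `work:jobs:send_email` key names, and the `evalArgs` helper are hypothetical and are not taken from gocraft/work; only the EVAL argument order (script, number of keys, keys, then extra arguments) is standard Redis.

```go
package main

import "fmt"

// dynamicKeyScript builds a key name inside the script. Redis cannot tell
// from the command which key will be touched. (Hypothetical script.)
const dynamicKeyScript = `return redis.call('rpop', ARGV[1] .. ':jobs:' .. ARGV[2])`

// explicitKeysScript only touches keys that the caller declared in KEYS.
// (Hypothetical script.)
const explicitKeysScript = `return redis.call('rpoplpush', KEYS[1], KEYS[2])`

// evalArgs lays out the arguments of an EVAL command:
// script, numkeys, key..., arg...
func evalArgs(script string, keys []string, args ...string) []interface{} {
	out := []interface{}{script, len(keys)}
	for _, k := range keys {
		out = append(out, k)
	}
	for _, a := range args {
		out = append(out, a)
	}
	return out
}

func main() {
	// Hypothetical key names, for illustration only.
	keys := []string{"work:jobs:send_email", "work:jobs:send_email:inprogress"}
	args := evalArgs(explicitKeysScript, keys)
	fmt.Println(len(args), args[1]) // prints "4 2"
}
```

Declaring keys this way lets the server check them, but in a cluster the declared keys still have to hash to the same slot for the script to run.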

@steven-zou
Author

@shdunning
Thanks a lot! Hopefully we can locate the root cause and fix it.

@shdunning
Collaborator

@steven-zou when you're initializing your worker pool, can you try setting your namespace to begin and end with { and } chars? E.g., if your namespace is work, then try setting it to {work} in the call to NewWorkerPool. Our theory is that this will force all of the gocraft/work keys into the same hash slot on one of the nodes in the cluster (see docs).

Hopefully this is easy enough for you to try out and get back to us.
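
The hash-tag theory can be checked offline. Below is a minimal Go sketch of the slot calculation described in the Redis Cluster specification (CRC16/XMODEM of the key, or of the text inside the first non-empty `{...}`, modulo 16384). The key names are hypothetical examples and are not the exact keys gocraft/work generates.

```go
package main

import (
	"fmt"
	"strings"
)

// crc16 computes CRC16/XMODEM: polynomial 0x1021, initial value 0, MSB first.
func crc16(data []byte) uint16 {
	var crc uint16
	for _, b := range data {
		crc ^= uint16(b) << 8
		for i := 0; i < 8; i++ {
			if crc&0x8000 != 0 {
				crc = (crc << 1) ^ 0x1021
			} else {
				crc <<= 1
			}
		}
	}
	return crc
}

// hashSlot returns the cluster slot (0..16383) for key. If the key contains
// "{...}" with at least one character between the first '{' and the next '}',
// only that substring is hashed.
func hashSlot(key string) int {
	if open := strings.IndexByte(key, '{'); open >= 0 {
		if end := strings.IndexByte(key[open+1:], '}'); end > 0 {
			key = key[open+1 : open+1+end]
		}
	}
	return int(crc16([]byte(key))) % 16384
}

func main() {
	// Hypothetical key names built from a "{work}" namespace.
	a := "{work}:jobs:send_email"
	b := "{work}:jobs:send_email:inprogress"
	fmt.Println(hashSlot(a) == hashSlot(b)) // prints "true": both hash only "work"
}
```

Because only the tag is hashed, every key that shares the `{work}` prefix maps to one slot, which is also why the whole namespace ends up on a single node.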

@steven-zou
Author

@shdunning

I'll try the approach you mentioned and will let you know the results later. Thanks.

@shdunning
Collaborator

Cool. If that works, we can update the README for now with this info.

I'll also create a followup issue to see what it would take to make this lib redis-cluster compatible (we use Redis Sentinel and have no issues, but that doesn't distribute keys across the nodes like Redis Cluster does).

@steven-zou
Author

@shdunning

We tried it: putting the namespace in {} works well. Thanks.

I'll close this issue, since the approach you suggested fixes the problem easily.

@shdunning
Collaborator

@steven-zou nice! I'm glad this worked. Note that this isn't an ideal solution because it forces all of the gocraft/work keys onto one node in the cluster. I need to think some more on how we would solve this problem for real; that is, allow gocraft/work Lua scripts to take advantage of multi-node Redis clusters. I'll create a separate issue for that.

@ifraixedes

ifraixedes commented Feb 27, 2019

Hi folks,

We have tried to run work against a Redis cluster hosted by AWS ElastiCache and have run into some issues.

The first issue was the one reported here (ERROR: requeuer.process - CROSSSLOT). It was mostly solved by the workaround described in this thread and in the README. We were aware of the workaround, but we forgot to add the curly braces after we changed our AWS ElastiCache Redis from a single node to a cluster.

The second issue is that we're getting ERROR: requeuer.process - MOVED 3223 10.3.3.127:6379 and ERROR: worker.fetch - MOVED 3223 10.3.3.127:6379 without stopping.

If we understood correctly, it's the client that should do the redirection when it gets MOVED errors.
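
As a rough illustration of the first step of that redirection, here is a minimal Go sketch that parses a MOVED error message into a slot and a target address. The `redirect` type and `parseMoved` function are hypothetical names; a cluster-aware client would additionally reconnect to the returned address and retry the command, which this sketch does not attempt.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// redirect describes where a cluster node told the client to retry.
type redirect struct {
	Slot int
	Addr string
}

// parseMoved extracts the slot and target address from a message such as
// "MOVED 3223 10.3.3.127:6379". ok is false for any other message.
func parseMoved(msg string) (r redirect, ok bool) {
	fields := strings.Fields(msg)
	if len(fields) != 3 || fields[0] != "MOVED" {
		return redirect{}, false
	}
	slot, err := strconv.Atoi(fields[1])
	if err != nil || slot < 0 || slot > 16383 {
		return redirect{}, false
	}
	return redirect{Slot: slot, Addr: fields[2]}, true
}

func main() {
	r, ok := parseMoved("MOVED 3223 10.3.3.127:6379")
	fmt.Println(r.Slot, r.Addr, ok) // prints "3223 10.3.3.127:6379 true"
}
```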

We are wondering how we can configure work to use a Redis cluster, because it's based on github.com/gomodule/redigo, which we found doesn't support Redis Cluster.

Could you help us by telling us what we are missing?

Thank you very much in advance.

EDIT

Our current AWS ElastiCache Redis is a 2-node Redis (v5.0.0) cluster in multi-AZ, with 1 shard (1 master, 1 slave).

I'm adding this in case it helps you give us an answer.

@pranayvarma77

The second issue is that we're getting ERROR: requeuer.process - MOVED 3223 10.3.3.127:6379 and ERROR: worker.fetch - MOVED 3223 10.3.3.127:6379 without stopping.

@ifraixedes Hi, did you find a fix for the above issue? I am facing the same problem.

@ifraixedes

@pranayvarma77 I cannot remember. That was a long time ago, and I stopped working on it a month after I posted.
