Bulk insert in redis running in cluster mode #6294

Open
knowdbtech opened this issue Aug 1, 2019 · 12 comments
Labels
state:help-wanted No member is currently implementing this change

Comments

@knowdbtech

Hi, I tried to do a mass insert into Redis running in cluster mode, following this tutorial: https://redis.io/topics/mass-insert but the methods described there do not work properly.
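
The mass-insert tutorial feeds redis-cli --pipe a stream of commands encoded in the Redis protocol (RESP), where each command is an array of bulk strings. A minimal Python sketch of that encoding follows; `encode_resp` is an illustrative name, not part of any library.

```python
import sys


def encode_resp(*args):
    """Encode one command as a RESP array of bulk strings (the format --pipe reads)."""
    out = [b"*%d\r\n" % len(args)]  # array header: number of arguments
    for arg in args:
        data = arg if isinstance(arg, bytes) else str(arg).encode("utf-8")
        # Bulk string: length in bytes (not characters), then the raw bytes
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)


if __name__ == "__main__":
    # Example usage: python gen.py | redis-cli --pipe
    for i in range(3):
        sys.stdout.buffer.write(encode_resp("SET", f"key{i}", f"value{i}"))
```

Against a standalone instance this stream loads as the tutorial describes; against a single cluster node, every key whose slot is owned by another master is answered with a MOVED error instead, which is the failure reported below.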

The first option, redis-cli --pipe, does not work in cluster mode: I get errors like "MOVED 15045 127.0.0.1:7003" because --pipe does not follow the redirect, even when redis-cli is run with the -c option.
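
An error like the one above has the shape `MOVED <slot> <host>:<port>` (and `ASK <slot> <host>:<port>` while a slot is being migrated), and any loader that wants to follow redirects has to parse it first. A minimal sketch of that parsing step; `Redirect` and `parse_redirect` are hypothetical names used only for illustration.

```python
from typing import NamedTuple, Optional


class Redirect(NamedTuple):
    kind: str  # "MOVED" (slot now owned elsewhere) or "ASK" (slot mid-migration)
    slot: int
    host: str
    port: int


def parse_redirect(error: str) -> Optional[Redirect]:
    """Parse 'MOVED <slot> <host>:<port>' or 'ASK <slot> <host>:<port>'; None otherwise."""
    parts = error.split()
    if len(parts) != 3 or parts[0] not in ("MOVED", "ASK"):
        return None
    host, _, port = parts[2].rpartition(":")  # split on the last ':' only
    return Redirect(parts[0], int(parts[1]), host, int(port))


print(parse_redirect("MOVED 15045 127.0.0.1:7003"))
# Redirect(kind='MOVED', slot=15045, host='127.0.0.1', port=7003)
```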

With the second option, nc, I don't get any feedback about the inserts: I may be losing data or getting errors, and with this method I have no way to detect or handle that.

What is the best (correct and safe) way to load bulk data into Redis running in cluster mode?

@K-Jean

K-Jean commented Aug 23, 2019

Hi.

In our Redis cluster, we ran --pipe with the same file against every node to propagate the data, because --pipe does not follow redirects.

@charlenezheng

Hi, I am facing the same error:
Last reply received from server. errors: 100000, replies: 100000
Is there any solution to this?

@yossigo
Member

yossigo commented Nov 24, 2020

This is indeed a limitation of redis-cli, which does not support cluster mode with --pipe.

@yossigo yossigo added the state:help-wanted No member is currently implementing this change label Nov 24, 2020
@hwware
Collaborator

hwware commented Dec 9, 2020

Just taking a quick glance at the code, we might need to change the approach used in the pipe mode implementation to fix this issue. It is currently implemented in a non-blocking way that only checks and counts the replies; to handle a cluster redirect, however, it needs to know which command failed so that it can resend it rather than just count the reply. Does anyone have comments on this? I'd be glad to hear and discuss any other thoughts. Thanks!

@yossigo
Member

yossigo commented Dec 9, 2020

@hwware the problem is that pipelining is required for bulk operations: if we block each command until the previous reply arrives, throughput becomes bound by latency. I think the problem with redirects is not so much knowing which command a redirect applies to, but maintaining a backlog so we can rewind and retransmit on demand. It can be done, of course, but it's going to take a lot of work.
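
The backlog idea relies on one property of Redis: on a single connection, pipelined commands are answered in the order they were sent. A FIFO of in-flight commands therefore pairs each reply with its command without blocking the sender. A minimal sketch, assuming replies arrive as plain strings whose redirect errors start with `MOVED` or `ASK`; the `Backlog` class is hypothetical, not redis-cli code.

```python
from collections import deque


class Backlog:
    """Commands sent on one connection whose replies have not arrived yet.

    Redis answers pipelined commands on a connection in the order they were
    sent, so the oldest pending command always matches the next reply.
    """

    def __init__(self):
        self._pending = deque()

    def sent(self, command):
        self._pending.append(command)

    def on_reply(self, reply):
        """Pair a reply with its command; return the command if it must be resent."""
        command = self._pending.popleft()
        # Assumption of this sketch: replies are plain strings and redirect
        # errors start with their code, e.g. "MOVED 15045 127.0.0.1:7003".
        if isinstance(reply, str) and reply.startswith(("MOVED ", "ASK ")):
            return command
        return None

    def __len__(self):
        return len(self._pending)


# Three pipelined writes; the second one is redirected.
backlog = Backlog()
for cmd in [("SET", "a", "1"), ("SET", "b", "2"), ("SET", "c", "3")]:
    backlog.sent(cmd)
to_resend = [backlog.on_reply(r) for r in ["OK", "MOVED 3300 127.0.0.1:7001", "OK"]]
print([c for c in to_resend if c is not None])  # [('SET', 'b', '2')]
```

The cost is memory: every command in flight has to be kept until its reply arrives, so the backlog grows with how far the sender runs ahead of the replies.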

A possible compromise is to run CLUSTER SLOTS and set up all connections in advance, without supporting redirects, so that a redirect response is treated as an error that terminates the session. In a way, this is aligned with how redis-cli already doesn't handle errors or try to reconnect/retransmit.
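
The compromise can be sketched as two pieces: a slot table built once from the CLUSTER SLOTS reply, and a reply check that aborts on any redirect instead of following it. This assumes the raw reply shape, where each entry is `[start_slot, end_slot, [master_host, master_port, ...], <replicas>...]`; `build_slot_table`, `check_reply` and `RedirectError` are hypothetical names.

```python
class RedirectError(Exception):
    """Raised when a node answers MOVED/ASK, i.e. the slot map is stale."""


def build_slot_table(cluster_slots_reply, num_slots=16384):
    """Map every hash slot to its master's (host, port)."""
    table = [None] * num_slots
    for entry in cluster_slots_reply:
        start, end, master = entry[0], entry[1], entry[2]
        for slot in range(start, end + 1):
            table[slot] = (master[0], int(master[1]))
    uncovered = [slot for slot, node in enumerate(table) if node is None]
    if uncovered:
        raise ValueError(f"{len(uncovered)} slots have no master, first is {uncovered[0]}")
    return table


def check_reply(reply):
    """Compromise policy: never follow redirects; a MOVED/ASK ends the session."""
    if isinstance(reply, str) and reply.startswith(("MOVED ", "ASK ")):
        raise RedirectError(f"cluster topology changed during load: {reply}")
    return reply


# Example reply for a two-master cluster (values made up for illustration).
reply = [[0, 8191, ["127.0.0.1", 7000, "node-a"]],
         [8192, 16383, ["127.0.0.1", 7001, "node-b"]]]
table = build_slot_table(reply)
print(table[15045])  # ('127.0.0.1', 7001)
```

The loader would open one pipelined connection per distinct entry in the table and route each command by its key's slot; a slot migration during the load then surfaces as a `RedirectError` rather than silently misplaced data.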

@hwware
Copy link
Collaborator

hwware commented Dec 10, 2020

Hello @yossigo, thank you for your reply. Maybe my last comment was a bit confusing: by "we need to know which command failed" I meant which of the previously sent commands failed, since we need to redirect and send that command again. I think we are describing the same thing.

For option 1, using a backlog: since we work in a non-blocking way, when we get a MOVED or ASK error reply we cannot guarantee that we will find the original command in the backlog, unless we wait for all the commands in the backlog to execute, i.e. block until we get all the replies for one batch before sending the next batch. If we wait for each batch to finish, we can think of the backlog as a buffer.

For option 2, I think it may cause issues in this case: if we set up the connections beforehand and Redis performs a slot migration during the transmission, some data may not reach the correct node. IMHO this is not a rare case, since --pipe mode is normally used for mass insertion, which may take a long time. Please let me know if I am missing anything here. Thank you!
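
The batch-at-a-time variant can be sketched as follows: group commands by owner node, pipeline one batch, wait for all of its replies, and resend redirected commands in the next round. On MOVED the slot table is updated; on ASK the command is retried once on the target preceded by ASKING, as the Redis Cluster specification requires, without updating the table. This is a hypothetical sketch, not how redis-cli works: `send_batch` is an assumed caller-supplied transport, nodes are `"host:port"` strings, replies are plain strings, and error replies other than redirects are not inspected.

```python
def _redirect(reply):
    """Return (kind, slot, node) for a 'MOVED'/'ASK' error string, else None."""
    if isinstance(reply, str) and reply.startswith(("MOVED ", "ASK ")):
        kind, slot, node = reply.split()
        return kind, int(slot), node
    return None


def load_in_batches(commands, slot_of, slot_table, send_batch,
                    batch_size=1000, max_rounds=5):
    """Pipeline each batch to its owner node, wait for all of its replies,
    then resend the redirected commands in the next round.

    slot_of(cmd)            -> hash slot of the command's key
    slot_table              -> list mapping slot -> "host:port", updated on MOVED
    send_batch(node, cmds)  -> replies for cmds, in the same order
    """
    pending = list(commands)
    for _ in range(max_rounds):
        if not pending:
            return
        by_node = {}
        for cmd in pending:
            by_node.setdefault(slot_table[slot_of(cmd)], []).append(cmd)
        pending = []
        for node, cmds in by_node.items():
            for i in range(0, len(cmds), batch_size):
                batch = cmds[i:i + batch_size]
                for cmd, reply in zip(batch, send_batch(node, batch)):
                    redirect = _redirect(reply)
                    if redirect is None:
                        continue  # non-redirect errors are not handled here
                    kind, slot, target = redirect
                    if kind == "MOVED":
                        slot_table[slot] = target  # the slot now lives on target
                        pending.append(cmd)
                    else:
                        # ASK: one-off redirect while the slot migrates; the
                        # target accepts the command only right after ASKING.
                        _, retry_reply = send_batch(target, [("ASKING",), cmd])
                        if _redirect(retry_reply) is not None:
                            pending.append(cmd)
    if pending:
        raise RuntimeError(f"{len(pending)} commands still redirected after {max_rounds} rounds")
```

Waiting per batch keeps each round trip carrying up to `batch_size` commands instead of one, which is the middle ground between yossigo's latency concern and the need to match replies to commands.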

@yossigo
Copy link
Member

yossigo commented Dec 13, 2020

@hwware I agree with you that it would be better to be able to handle migrations during redis-cli --pipe.

@DaveLanday
Copy link

Any update on this? I was working in single mode, and having to work in cluster mode has broken many of my simple but extremely important scripts using --pipe.

@Ilan-StartIO
Copy link

Same for me

@Miguelme
Copy link

Is there a workaround to be able to use the mass insert functionality from redis on cluster-mode ?

@ssndhu01
Copy link

ssndhu01 commented Jan 8, 2023

As a workaround, we calculated the cluster slot of each key manually, split the commands into different files based on their slots, and created one file per master.

The slot can be calculated with the formula below. For example, for:
set xyz 123123123
key = xyz
slot = crc16(key) % 16384
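
The workaround above can be sketched in Python. Per the Redis Cluster specification, crc16 here is the CRC16-CCITT (XMODEM) variant, and if a key contains a non-empty hash tag such as {user1000}, only the tag is hashed, so the formula applies as written only to keys without one. The slot ranges per master are assumed to be known (e.g. read from CLUSTER SLOTS) and are passed in by hand; all function names are illustrative.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16-CCITT (XMODEM): polynomial 0x1021, initial value 0, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_hash_slot(key: bytes) -> int:
    """Hash slot of a key; a non-empty {hash tag} is hashed instead of the whole key."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    return crc16_xmodem(key) % 16384


def split_by_master(commands, slot_ranges):
    """Group (key, payload) pairs by master, given [(start, end, master), ...] ranges."""
    groups = {}
    for key, payload in commands:
        slot = key_hash_slot(key)
        for start, end, master in slot_ranges:
            if start <= slot <= end:
                groups.setdefault(master, []).append(payload)
                break
        else:
            raise ValueError(f"slot {slot} is not covered by any range")
    return groups


print(key_hash_slot(b"foo"))  # 12182
```

Each group's payloads (for example RESP-encoded commands) can then be written to its own file and loaded with `redis-cli -h <master host> -p <master port> --pipe`. If a slot migrates while the files are being loaded, the affected commands still come back as MOVED errors.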

@sambhavk
Copy link

sambhavk commented May 8, 2023

  1. We also had a use case for bulk inserting entries into a Redis cluster, but since that is not possible right now, we instead divided our cluster into multiple single-master clusters and did client-level sharding.
  2. This gave us the advantages of a cluster together with the speed of pipe operations.
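
With client-level sharding, the loader itself decides which standalone master owns each key and builds one --pipe stream per master. A minimal sketch, assuming a crc32-modulo-N sharding function (the comment above does not say which function was used); all names are hypothetical.

```python
import zlib


def shard_for(key: str, num_shards: int) -> int:
    """Pick a standalone master for a key. crc32 is an assumed choice: unlike
    Python's built-in hash() for str, it gives the same result in every process."""
    return zlib.crc32(key.encode("utf-8")) % num_shards


def encode_set(key: str, value: str) -> bytes:
    """RESP encoding of SET key value, the input format of redis-cli --pipe."""
    args = [b"SET", key.encode("utf-8"), value.encode("utf-8")]
    return b"*3\r\n" + b"".join(b"$%d\r\n%s\r\n" % (len(a), a) for a in args)


def partition(pairs, num_shards):
    """Build one --pipe payload per shard from (key, value) pairs."""
    payloads = [bytearray() for _ in range(num_shards)]
    for key, value in pairs:
        payloads[shard_for(key, num_shards)] += encode_set(key, value)
    return payloads


# Each payload can be written to its own file and loaded with, e.g.,
#   redis-cli -h <shard host> -p <shard port> --pipe < shard0.resp
payloads = partition([("user:1", "alice"), ("user:2", "bob")], num_shards=3)
```

The trade-off of modulo sharding is that changing the number of masters remaps most keys, so adding a master means redistributing the data, whereas Redis Cluster can move individual slots.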

Projects
Status: Backlog
Development

No branches or pull requests

10 participants