Bulk insert in redis running in cluster mode #6294

knowdbtech · 2019-08-01T21:02:43Z

Hi, I tried to do a mass insert into Redis running in cluster mode, following this tutorial: https://redis.io/topics/mass-insert, but neither method works properly.

The first option, redis-cli --pipe, does not work in cluster mode. I get errors like "MOVED 15045 127.0.0.1:7003"; --pipe does not follow the redirect, even when redis-cli is run with the -c option.

With the second option, nc, I get no response about the inserts. I may lose data or hit errors, and with this method I have no way to detect or handle that.

What is the best, or the correct and safe, way to bulk-load data into Redis in cluster mode?

K-Jean · 2019-08-23T09:30:55Z

Hi.

In our Redis cluster, we ran --pipe with the same file against every node to propagate the data, because --pipe does not follow redirects.

charlenezheng · 2020-11-12T02:26:21Z

Hi, I am facing the same error:
Last reply received from server. errors: 100000, replies: 100000
Is there any solution for this?

yossigo · 2020-11-24T12:20:32Z

This is indeed a limitation of redis-cli, which does not support cluster mode with --pipe.

hwware · 2020-12-09T20:38:11Z

From a quick glance at the code, we might need to change the approach used in the pipe mode implementation to fix this issue. It is currently implemented in a non-blocking way that only checks and counts replies, whereas handling a cluster redirect requires knowing which command failed so that it can be resent. Does anyone have comments on this? I'm glad to hear and discuss any further thoughts. Thanks.

yossigo · 2020-12-09T21:30:27Z

@hwware the problem is that pipelining is required when attempting bulk operations, as if we block the next command until we receive a reply we'll end up binding our throughput to latency. I think the problem with redirects is not so much about knowing which command it applies to, but actually maintaining a backlog so we can rewind and re-transmit on demand. It can be done of course but it's going to take a lot of work.

A possible compromise can be to do CLUSTER SLOTS and set up all connections in advance, without supporting redirects, so a redirect response is treated as an error that terminates the session. In a way it's aligned with how redis-cli already doesn't handle errors or try to re-connect/re-transmit.
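The "CLUSTER SLOTS up front" idea can be sketched as a lookup table from slot to master. This is only an illustration, not how redis-cli works: `build_slot_table` and `node_for_slot` are hypothetical helper names, and the sample reply below is made-up data shaped like a CLUSTER SLOTS reply (start slot, end slot, then the master as host and port).

```python
from bisect import bisect_right

def build_slot_table(cluster_slots_reply):
    """Turn a CLUSTER SLOTS-style reply into a sorted list of
    (start_slot, end_slot, (host, port)) rows, one per slot range."""
    table = []
    for entry in cluster_slots_reply:
        start, end, master = entry[0], entry[1], entry[2]
        table.append((start, end, (master[0], master[1])))
    table.sort()
    return table

def node_for_slot(table, slot):
    """Return the (host, port) of the master that owns `slot`."""
    starts = [row[0] for row in table]
    i = bisect_right(starts, slot) - 1
    if i < 0 or slot > table[i][1]:
        raise KeyError(f"slot {slot} is not covered by any range")
    return table[i][2]

# Hypothetical reply for a three-master cluster (assumed data).
sample_reply = [
    [0, 5460, ["127.0.0.1", 7000]],
    [5461, 10922, ["127.0.0.1", 7001]],
    [10923, 16383, ["127.0.0.1", 7002]],
]
table = build_slot_table(sample_reply)
```

With such a table, each command could be written to the connection returned by `node_for_slot` for its key's slot. As proposed above, a MOVED reply would still be treated as a fatal error rather than followed.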

hwware · 2020-12-10T04:39:52Z

Hello @yossigo, thank you for your reply. My last comment may have been a little confusing. When I said we need to know which command failed, I meant the previously sent commands whose execution failed, since we need to follow the redirect and send those commands again. I think we are talking about the same thing.

For option 1, using a backlog: since we work in a non-blocking way, when we get a MOVED or ASK error reply we cannot guarantee that the original command can still be found in the backlog, unless we wait for all the commands in the backlog to be executed, that is, block until every reply for the current batch has arrived before sending another batch. If we wait for each batch to finish, the backlog can be thought of as a buffer.
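The batch-as-buffer idea could look roughly like the sketch below. It is not redis-cli code: `send_batch` is a hypothetical transport callback that sends one batch to one node, blocks until every reply has arrived, and returns one `(ok, text)` pair per command in order. Only MOVED is followed here; an ASK reply would additionally need an ASKING command sent first, so this sketch reports it as a failure.

```python
import re
from collections import defaultdict

MOVED = re.compile(r"^MOVED (\d+) (\S+):(\d+)$")

def parse_moved(error_text):
    """Return (slot, (host, port)) for a MOVED error, else None."""
    m = MOVED.match(error_text)
    if m is None:
        return None
    slot, host, port = m.groups()
    return int(slot), (host, int(port))

def pipe_with_redirects(send_batch, start_node, commands, max_rounds=5):
    """Send commands in batches and re-send the ones answered with MOVED.

    Because a batch is kept until all of its replies are in, each reply
    can be matched to its command by position. Returns a list of
    (command, error_text) pairs that could not be delivered.
    """
    pending = {start_node: list(commands)}
    failures = []
    for _ in range(max_rounds):
        if not pending:
            break
        next_pending = defaultdict(list)
        for node, batch in pending.items():
            replies = send_batch(node, batch)  # blocks for the whole batch
            for cmd, (ok, text) in zip(batch, replies):
                if ok:
                    continue
                moved = parse_moved(text)
                if moved is None:
                    failures.append((cmd, text))  # not a MOVED redirect
                else:
                    next_pending[moved[1]].append(cmd)  # retry on new owner
        pending = next_pending
    # Anything still pending here was redirected too many times.
    for batch in pending.values():
        failures.extend((cmd, "too many redirects") for cmd in batch)
    return failures
```

The cost is the one described above: throughput is bounded by one round trip per batch rather than being fully pipelined, so the batch size becomes the tuning knob.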

For option 2, I think it may cause a problem in this case: if we set up the connections in advance and Redis performs a slot migration during the transmission, some data may not be delivered to the correct node. IMHO this is not a rare case, since --pipe mode is normally used for mass insertion, which may take a long time. Please let me know if I am missing anything here. Thank you!

yossigo · 2020-12-13T20:09:51Z

@hwware I agree with you that it would be better to be able to handle migrations during redis-cli --pipe.

DaveLanday · 2022-03-29T18:26:52Z

Any update on this? I was working in single mode, and having to work in cluster mode has broken many of my simple but extremely important scripts using --pipe.

Ilan-StartIO · 2022-04-11T08:58:56Z

Same for me

Miguelme · 2022-04-28T14:44:57Z

Is there a workaround for using the Redis mass insert functionality in cluster mode?

ssndhu01 · 2023-01-08T10:17:25Z

As a workaround, we calculated the cluster slot for each command manually and separated the commands into different files by slot, creating one file per master.

The slot can be calculated with the formula below. For example:
set xyz 123123123
key = xyz
slot = crc16(key) % 16384
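A minimal Python sketch of this workaround follows, under a few assumptions. Redis Cluster uses the CRC16 XMODEM variant, which Python exposes as `binascii.crc_hqx` with an initial value of 0, and it hashes only the hash tag when a key contains a non-empty `{...}` section. The helper names, the `slot_ranges` layout, and the master labels are hypothetical; the real ranges would come from your own cluster.

```python
import binascii
from collections import defaultdict

def key_slot(key: bytes) -> int:
    """Compute the cluster hash slot for a key: crc16(key) % 16384."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        # Only a non-empty {tag} replaces the key for hashing.
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    return binascii.crc_hqx(key, 0) % 16384

def split_by_master(commands, slot_ranges):
    """Group (key, command_line) pairs by the master owning the key's slot.

    slot_ranges is a list of (start_slot, end_slot, master_label).
    """
    groups = defaultdict(list)
    for key, line in commands:
        slot = key_slot(key)
        for start, end, label in slot_ranges:
            if start <= slot <= end:
                groups[label].append(line)
                break
        else:
            raise KeyError(f"no master covers slot {slot}")
    return dict(groups)

# Assumed layout for a three-master cluster.
ranges = [(0, 5460, "m1"), (5461, 10922, "m2"), (10923, 16383, "m3")]
files = split_by_master(
    [(b"foo", "SET foo 1"), (b"hello", "SET hello 2")], ranges
)
```

Each list in `files` could then be written to its own file and fed to redis-cli --pipe against the matching master. This only holds while the slot layout stays unchanged during the load.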

sambhavk · 2023-05-08T11:25:52Z

We also had a use case for bulk-inserting entries into a Redis cluster. Since that is not possible right now, we instead divided our cluster into multiple single-master-node clusters and did client-level sharding.
This gave us the advantages of a cluster together with the speed of pipe operations.
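Client-level sharding of this kind could be sketched as below. The hashing scheme is an assumption, since the comment does not say which one was used: this example takes CRC32 of the key modulo the number of shards, and `shard_for_key` and `group_by_shard` are hypothetical names.

```python
import zlib

def shard_for_key(key: bytes, shard_count: int) -> int:
    """Pick a shard index for a key (assumed scheme: CRC32 modulo count)."""
    return zlib.crc32(key) % shard_count

def group_by_shard(items, shard_count):
    """Split (key, value) pairs into one list per shard, keeping order."""
    groups = [[] for _ in range(shard_count)]
    for key, value in items:
        groups[shard_for_key(key, shard_count)].append((key, value))
    return groups
```

Each group could then be piped to its own single-master instance. Note that a plain modulo scheme remaps most keys when the shard count changes.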

yossigo added the state:help-wanted label (no member is currently implementing this change) on Nov 24, 2020
