
LOADING Redis is loading the dataset in memory #4624

Open
Hronom opened this issue Jan 22, 2018 · 23 comments
Labels
class:feature state:needs-design the solution is not obvious and some effort should be made to design it

Comments

@Hronom

Hronom commented Jan 22, 2018

This is a reference issue from Spring Jira https://jira.spring.io/browse/DATAREDIS-757

This problem prevents Redis from integrating gracefully into a microservices environment.

Several servers have the issue that they open a TCP port before they are actually ready. In contrast, Spring Boot opens its port only once all initialization is complete, and I think that would be a better option for Redis too: open the port only once all data is loaded.

This would keep other applications from crashing with the exception:

LOADING Redis is loading the dataset in memory

Please open the TCP port only when Redis is completely ready to serve.

@Hronom
Author

Hronom commented Jan 22, 2018

There is also a discussion on the Lettuce client library: redis/lettuce#625
Handling this is not the client's responsibility.

@scenbuffalo

scenbuffalo commented Jan 23, 2018

I have the same problem.
If you use Sentinel, you can give it a try:
#4561

@Hronom
Author

Hronom commented Jan 23, 2018

I have created a small workaround shell script that waits until Redis has fully started; here is the repo: https://github.com/Hronom/wait-for-redis

This will help until the developers fix this.
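The same wait can be sketched without any external script, using only a raw RESP `PING` over a plain socket (a minimal Python sketch, standard library only; host and port are placeholder defaults). A loaded server answers `+PONG`, while a loading server answers with the `-LOADING` error instead:

```python
import socket
import time

def is_ready(reply: bytes) -> bool:
    # A loaded server answers "+PONG"; during startup it answers
    # "-LOADING Redis is loading the dataset in memory" instead.
    return reply.startswith(b"+PONG")

def wait_for_redis(host="localhost", port=6379, timeout=60.0):
    """Poll Redis with a raw PING until it is ready or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=2) as conn:
                conn.sendall(b"PING\r\n")
                if is_ready(conn.recv(64)):
                    return True
        except OSError:
            pass  # port not open yet, or connection dropped
        time.sleep(1)
    return False
```

This is essentially what the `redis-cli ping | grep PONG` healthcheck pattern does, expressed as a reusable function.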

@ghik

ghik commented Jan 2, 2019

Another solution to this could be a blocking command that waits until Redis is ready to serve data.

@filipecosta90
Contributor

filipecosta90 commented May 3, 2021

@Hronom I believe we can close this issue, given this is by-design behavior and not really a bug.
Requesting the @redis/core-team's opinion on it.

Redis enables certain commands while loading the database (those flagged ok-loading), such as INFO, SELECT, MONITOR, DEBUG, ...

Clients should be responsible for handling LOADING replies (just as they handle MOVED, ASK, etc.).
WDYT?

@Hronom
Author

Hronom commented May 3, 2021

@filipecosta90 this is definitely a feature request; the GitHub issues here are used not only for bug reports but also for feature requests.

So please consider this a feature request.

In the world of Docker this is a much-needed feature, and it is 3 years old. Three years and still no solution that opens the port after loading, or a command that waits?

@oranagra
Member

oranagra commented May 4, 2021

This is indeed by design, and a feature request; let's see how painful it is and what we can do to improve it.

Maybe we can add a config flag that tells Redis not to listen on the socket at startup until after it has loaded the persistence file.
This could take a very long time and would mean that no one can monitor it or get any progress checks, so many users would not want to use such a feature, but I suppose for some it can make life simpler.
Note, however, that it won't solve the -LOADING response returned by a replica while it loads the data it gets from the master.

Another approach could be to postpone most commands (excluding INFO, PING, etc.) while loading, instead of responding with a -LOADING error, similar to what CLIENT PAUSE WRITE does today.
This would be a breaking change, and maybe it also needs to be controlled by a config flag.
@redis/core-team WDYT?
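If the first option were adopted, it could surface as a startup flag roughly like this (a purely hypothetical sketch; `listen-after-loading` is an invented name and does not exist in any Redis release):

```conf
# redis.conf -- HYPOTHETICAL, this directive does not exist:
# keep the TCP port closed until the RDB/AOF has been fully loaded
listen-after-loading yes
```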

@yossigo
Member

yossigo commented May 10, 2021

@oranagra I think this should be handled by the client, it's fairly easy to catch this and retry if that's what the app wishes to do.
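That catch-and-retry can be sketched as a small wrapper (plain Python; the exception type and message format depend on the client library, so a generic `Exception` whose text carries the server reply stands in here):

```python
import time

def call_with_loading_retry(cmd, retries=20, delay=0.5):
    """Call cmd(), retrying while Redis replies -LOADING; re-raise anything else."""
    last = None
    for _ in range(retries):
        try:
            return cmd()
        except Exception as exc:
            if "LOADING" not in str(exc):
                raise  # a different error: do not swallow it
            last = exc
            time.sleep(delay)
    raise TimeoutError("Redis still loading after %d attempts" % retries) from last
```

As noted further down the thread, the catch is that a large dataset can keep the server in the loading state for minutes, so the retry budget has to be sized for the worst-case load time.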

@filipecosta90 filipecosta90 added state:needs-design the solution is not obvious and some effort should be made to design it class:feature and removed state:to-be-closed requesting the core team to close the issue labels May 24, 2021
@collimarco

We use the Ruby Redis client and this error is not handled properly, i.e. it simply raises the exception:

LOADING Redis is loading the dataset in memory

It is a pain.

@eduardobr
Contributor

@yossigo It could be that the retry will take minutes to succeed depending on dataset size. Even 10 seconds can be unacceptable in some applications.

As I commented in a similar thread: #4561 (comment)

"I've been thinking: why can't we have a mode where the replica keeps serving stale data while the full sync happens in the background?
That could cost twice the memory, and maybe disk space, for a short period, but I think it's definitely worth it and would get rid of the LOADING status in most situations.
In my use case, where there's no need for sharding (so no Redis Cluster) and where it's ok to have the master down for about a minute with replicas serving stale data, it would be very useful. Sometimes a standalone Redis master with a few replicas can be a solid setup, if we solve these kinds of issues. That also makes things more Kubernetes-friendly without needing to pay for a Redis Operator."

@yossigo
Member

yossigo commented Aug 2, 2021

@eduardobr You're referring to a specific case here:

  • Replica is already up and has data
  • App is willing to get stale data

In that case, using repl-diskless-load swapdb already yields a very similar behavior, although AFAIR (need to check) it still returns -LOADING and refuses commands. But that's probably a relatively small change.
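For reference, the setting in question is a real redis.conf directive (its other values are `disabled` and `on-empty-db`):

```conf
# redis.conf -- the replica parses the RDB stream straight from the
# socket, keeping the current dataset as an in-memory backup until the
# new one has loaded successfully
repl-diskless-load swapdb
```

The trade-off, discussed below, is that both datasets live in memory side by side during the load.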

@eduardobr
Contributor

eduardobr commented Aug 2, 2021

@yossigo I was coincidentally testing the behavior of repl-diskless-load swapdb today in an attempt to solve the LOADING status.
It returns LOADING for about the same amount of time as a disk load in our setup, but taking a look at the code, it seems doable to change it so we keep serving read commands. That would be an amazing improvement and make our setup solid here.

What does it take to move on with this change, and how could I help?

Thanks

@oranagra
Member

oranagra commented Aug 4, 2021

In theory, if someone is already willing to accept stale reads and is using repl-diskless-load swapdb, I don't see any reason not to allow reads from that backup database, at least as long as modules aren't involved.
However, supporting that would add some extra complications to the code which I'm not sure are worth it.

Also, truth be told, I don't really like the swapdb feature.
I added it following Salvatore's request, in order to have safer diskless loading, in case the master dies after the replica has started loading new data that it can't complete.
But I think in many cases this mode is more dangerous, since the system may not have enough memory to host both databases side by side, and the kernel's OOM killer may intervene.

@eduardobr
Contributor

@oranagra @yossigo
I started a draft of a change to test the concept, it seems to be ok and solve the problem:
unstable...eduardobr:feature/draft-use-tempdb-swapdb

I started by swapping the pointers of the whole db array, but later noticed I could use the already-tested SWAPDB mechanism and run it for each db index (it takes care of things like maintaining the selected-db reference).

But there are small things I still don't understand, and I would need help if we want to move on with this. For example, why does calling "touchAllWatchedKeysInDb" work for the SWAPDB command but crash some tests if I use it after the sync is finished? And do I actually need to call it at all (we don't when restoring the backup in the current implementation)?

Also, the failing test may not be relevant anymore, but I would need help with that too.

@oranagra
Member

oranagra commented Aug 5, 2021

Up until now, touchAllWatchedKeysInDb was only needed once, when we begin loading, since no commands could be executed during loading.
But now:

  1. we no longer want to invalidate watched keys when loading begins (logically, we haven't yet changed anything in the db from the user's perspective);
  2. we do need to invalidate watched keys when loading succeeds (but not when it fails and we recover from the backup).

Regarding the crash and the test, I don't know what you're facing; you'll have to debug it.
If you need assistance, maybe you can submit a PR and ask other expert contributors for help (I'm a bit too busy these days to dive in myself).

@eduardobr
Contributor

PR created: #9323

@eduardobr
Contributor

Revisiting this original issue, I think we can solve it with little code and no breaking changes for scenarios with more than one replica:
simply offer an option to start full syncs with no more than n replicas at a time. That way you ensure you won't put all replicas into the LOADING state simultaneously.

repl-diskless-sync-delay 0 doesn’t seem to guarantee a single replica at a time.

We could have something like:
repl-max-parallel-sync [n]

@yossigo @oranagra
Does this make sense?

@madolson
Contributor

@eduardobr I'm not sure how that solves the original request. Instead of getting "-LOADING" errors, the clients will get stale data for a bit, then get the loading error later.

@eduardobr
Contributor

eduardobr commented Oct 22, 2021

@madolson it solves it in the sense that proxies, container orchestrators (and maybe client libraries) can send the client to a replica that is not in the "-LOADING" state.
In the case of Kubernetes, probes can query for this status and quickly take a replica out of service, marking it unready without killing it.
Because it's optional, the consumer will know the consequences (stale data vs. no data at all), much the same compromise as replica-serve-stale-data, just in a different case.

When configured with 50% of the total replicas, in the worst case the master will finish the syncs in 2 rounds.
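Such a probe could be sketched like this (an assumed setup: `redis-cli` must be on the container image's PATH; it relies on the `loading:` field of INFO persistence, which Redis serves even while loading since INFO is flagged ok-loading):

```yaml
# Kubernetes readinessProbe sketch: mark the replica unready (without
# killing it) while INFO reports loading:1
readinessProbe:
  exec:
    command:
      - sh
      - -c
      - redis-cli -p 6379 info persistence | grep -q '^loading:0'
  initialDelaySeconds: 5
  periodSeconds: 3
```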

@madolson
Contributor

@eduardobr I see, I think that is a minor improvement but I'm not sure it solves the original requester's issue, which is that they weren't able to handle the -Loading error. Presumably they could handle the timeout.

@eduardobr
Contributor

@madolson right, that is not precisely what the author wants, just an alternative to mitigate the problems that the -LOADING error brings. The original solution of keeping the port closed during a full sync could confuse other systems, because they wouldn't be able to tell whether Redis is even alive.

@fiedl

fiedl commented Mar 16, 2022

As a workaround, I'm using a docker-compose healthcheck:

# docker-compose.yml

services:
  app:
    depends_on:
      redis:
        condition: service_healthy
  redis:
    healthcheck:
      test: ["CMD-SHELL", "redis-cli ping | grep PONG"]
      interval: 1s
      timeout: 3s
      retries: 5

@bradjones1

^ This is the way. TIL that the compose spec now allows you to actually wait on other services with healthchecks. dockerize is still great if services open ports only when ready, but this is awesome, and the "Docker way to do it" IMO.

https://docs.docker.com/compose/compose-file/#depends_on

(Also, apparently version is now unnecessary, if you're wondering "what version" this was released on.)
