[BUG] Disk Based Replication Causing OOM in Master (v6.0.5) #7717
@ganeshkumarganesan I assume that by OOM you mean that the kernel killed the process. What happens in diskless replication mode is a side effect: unlike disk-based replication, where the produced RDB can still be used for persistence purposes, in diskless mode the fork child has no further purpose, so we kill it.
@oranagra The RDB save in the master was initiated for the slave replication. Since the output buffer limit was already breached, the in-progress RDB cannot be used for that slave. Anyway, the master will save the RDB again and transfer it to the slave (a full resync will happen).
In my case, the VM got rebooted a few times due to the OOM, while the redis server load remained at its peak.
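For reference, the limit being breached here is the replica output buffer limit in redis.conf. A sketch of the stock 6.0 defaults (tune to your workload; not a recommendation):

```
# Default replica output buffer limit (Redis 6.0 redis.conf):
# disconnect a replica if its output buffer exceeds 256mb (hard limit),
# or stays above 64mb for 60 seconds (soft limit).
client-output-buffer-limit replica 256mb 64mb 60
```

Once this limit trips during a disk-based sync, the replica connection is dropped but the BGSAVE child keeps running, which is the behavior discussed in this thread.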
@ganeshkumarganesan the point is that the rdb that the fork produces can still be used for persistence snapshot (unlike the one produced by diskless replication). Arguably, if your But anyway, I thnk that in your case there's no escape from adding more RAM, without that you are unable to complete any form of background saving: diskless replication, disk-based replication, BGWAVE, AOFRW. I do agree that in a case of failure you prefer to have the child terminated rather than lose the parent process or VM. Future versions of redis will support the above mentioned |
I'm interested in this issue, and I agree with you @oranagra
Our operation engineers also use this feature, so we cannot remove it directly. @ganeshkumarganesan For your scenario, I also think the valid way is adding more RAM. But your suggestion is also meaningful; otherwise replicas need to wait longer to start syncing. Actually, I'm trying to solve this problem just now, and I also think a config is a good idea. But why would we use the config name 'rdb-del-sync-files'? @oranagra Should it be 'rdb-del-sync-fails'? Or what about 'stop-rdb-sync-fail'? Like the following, just a draft.
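(Illustrative sketch only: the original draft is not in the thread, and `stop-rdb-sync-fail` is just one of the candidate names discussed above, not a shipped redis option.)

```
# Hypothetical draft option (name from the discussion above; not in redis):
# when yes, kill the disk-based replication child if all replicas
# waiting on it have been disconnected (e.g. output buffer breached).
stop-rdb-sync-fail yes
```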
@ShooterIT are you trying to solve the problem of not being able to make another replication attempt until this fork is done? i.e. not a problem of insufficient memory that kills the server or VM? In both cases I don't see why another attempt wouldn't just run into the same problem and fail again. I think we want to make redis behave well by default and avoid adding too many configs for corner cases; too many configs make it harder for users to understand the configuration and more complicated to maintain the code base. In this case I think the problem isn't solvable: there's simply not enough RAM to replicate (or persist) under traffic. The only reason I see to kill the fork is if it doesn't serve any purpose at all (i.e. redis isn't configured to persist to RDB).
@oranagra Yes, there is no difference between these two cases for my changes. The write traffic will not always be heavy; it may last just a relatively short period of time. After the traffic peak has passed, so much memory is no longer needed, and the next replication attempt will succeed. It makes sense for this scenario. I also don't like having many configuration items. Maybe for these two cases, we can simply switch to diskless replication.
@ShooterIT you mean that simply switching to diskless replication would solve your problems without any changes to redis? Anyway, back to the topic at the top: I don't want to introduce any new configs, and I don't feel we should kill the fork when the replicas drop, unless either
@oranagra Yes, I think so. Internet speed is getting faster and faster now, so diskless replication is really a good choice. I'm sorry, I didn't notice we already have a config
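For anyone landing on this thread: switching to diskless replication is a redis.conf change. A minimal sketch, assuming Redis 6.0 option names:

```
# Stream the RDB to replicas over the socket instead of writing it
# to disk first.
repl-diskless-sync yes
# Wait a few seconds before starting the transfer, so that multiple
# replicas can attach to the same fork.
repl-diskless-sync-delay 5
```

Note that diskless replication avoids the disk write on the master but still forks, so copy-on-write memory pressure under heavy write traffic remains.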
@ShooterIT if it doesn't mess up the code a lot, I don't see any reason not to add that kill. Feel free to PR.
Hi @oranagra, another related topic I want to raise. Redis uses too much memory during a full sync, which has many effects: OOM, failed full syncs, even the bug you fixed in #5126. AFAIK, some company engineers persist the output buffer into a file instead of keeping it in memory while waiting for the child RDB to finish. That may not be elegant, but it does solve some problems.
BTW, regarding your PR #5126: the wrong-eviction problem is very serious and I have run into it many times. Is there an easy way to fix it before 5.0?
@ShooterIT we have a long term plan to solve that problem properly by multiplexing these replication buffers together with the RDB so that they are cached on the replica side rather than on the master. This will move the burden of keeping this buffer from the master to the replica, and will also properly solve the problems the "meaningful offset" feature aimed to solve (a feature that was introduced in 6.0 and was later reverted). I don't think that compressing or unifying them is gonna solve it (it'll just reduce the overhead a little bit). It's probably not that complicated to cherry-pick the fix for #5126 into 4.0, but we no longer maintain that version (maybe with the exception of critical issues like newly discovered common crashes, data corruptions or severe security issues). The right thing for people to do is upgrade!
@oranagra Copy that, thanks
That's truly a good idea. One thing I want to share with you: I tried using a separate thread to send the RDB yesterday, just because I recalled what you said in #7096. Maybe I succeeded; it passed all tests.
@oranagra I have another idea: we can write replication buffers to the RDB child process, just like rewriting the AOF. The RDB child process can persist them into the RDB (an aux field, or AOF with RDB preamble, or a tmp AOF). And it will be efficient if we send it in a separate thread. What's more, maybe the replica can also receive the replication buffer while it loads the RDB.
@ShooterIT if we want to store these replica buffers at the end of the RDB (like in preamble AOF), we need to keep them in memory until the RDB writing is done (this is what happens for AOFRW too), so it doesn't solve the memory stress on the machine holding the master.
The Redis master started the BGSAVE on disk. Within a few minutes, the default "client-output-buffer-limit" was reached and the slave client connection (psync) was closed. But the master didn't kill the RDB-saving (child) process, so the copy-on-write memory kept accumulating, which led to the OOM in the master. Attaching the logs below.
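One stopgap (assuming the host has enough RAM headroom for the write backlog accumulated during the BGSAVE) is to raise the replica output buffer limit at runtime so the transfer can finish instead of being cut off:

```
# Raise the replica output buffer limit at runtime (values here are
# illustrative, not a recommendation; Redis 6.0 also accepts "replica"
# in place of "slave"):
redis-cli CONFIG SET client-output-buffer-limit "slave 512mb 128mb 120"
# Verify the new limits:
redis-cli CONFIG GET client-output-buffer-limit
```

This trades buffer memory on the master for a completed full sync; it does not address the fork's copy-on-write growth described above.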
Master Log
Whereas diskless replication is working as expected in 6.0.5. #6866