Make it possible for a master to restart from AOF and still be able to serve PSYNC #9796
Comments
@oranagra Sometimes I think that an AOF can have metadata added just like any other command, as long as this metadata is useful only after the AOF is loaded, and the replication ID is one such piece. You could even write it many times, with only the latest value taken into consideration. As if we had an internal metadata store and you "SETMETA key value" in the AOF.
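To make the "only the latest is taken into consideration" idea concrete, here is a minimal sketch. `SETMETA` is the hypothetical command from the comment above, not an existing Redis command, and the list-of-lists AOF representation and the key names are assumptions for illustration only.

```python
def replay_metadata(aof_entries):
    """Collect metadata from an AOF-like command list.

    SETMETA is hypothetical (taken from the comment above); it is not a
    real Redis command. Later entries overwrite earlier ones.
    """
    meta = {}
    for entry in aof_entries:
        if len(entry) == 3 and entry[0].upper() == "SETMETA":
            _, key, value = entry
            meta[key] = value  # last write wins
    return meta


# Ordinary commands are ignored here; only SETMETA entries are collected.
entries = [
    ["SET", "foo", "bar"],
    ["SETMETA", "repl-id", "aaa"],
    ["SETMETA", "repl-offset", "100"],
    ["INCR", "counter"],
    ["SETMETA", "repl-offset", "250"],
]
meta = replay_metadata(entries)
```

After replay, `meta` holds one value per key, and the earlier `repl-offset` entry has been superseded by the later one.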
Considering that an AOF without a tail/incr part is just like an RDB, we could support this feature by reusing the logic for RDB. This would also solve #10523. Suggestion:
Now, a blocking SAVE/AOFRW on shutdown is really bad as it leaves clients timing out and not really knowing what to do (main reason I added the shutdown-on-sigterm/int) so we would be in a better position if that was improved first. An alternative to all of this: store the replication information in some kind of specialized file. EDIT:
Maybe that's the answer for the blocking shutdown issue, but I don't really understand how it works.
@eduardobr your suggestion of doing an AOFRW on shutdown and then detecting that it has no incremental part in order to do a PSYNC seems valid, but maybe we need to let the user choose whether it should be enabled. Regarding non-blocking shutdown, the idea was to do an AOFRW during termination, keep serving writes, and store them in an incremental file.
I'm afraid that trying PSYNC after a crash could lead to some complications. Better to go for a FULLSYNC in these cases and be safe? About letting the user choose whether AOFRW should be enabled on shutdown. Could it be the shutdown-on-sigint/term
Regarding the manifest: maybe we can add a hint that includes both the replication offset and ID, and also the size of the incremental file it corresponds to. That way, the moment we append anything to the incremental file, that data in the manifest becomes invalid, without even requiring us to trim it from there. Regarding AOFRW on shutdown, I'd rather not change the meaning of existing interfaces, and I'm also not certain users should choose this behavior explicitly.
I think there is a use case, which is faster restart (from RDB), and enabling AOF later.
The idea of invalidating the offset as soon as there's another append sounds good.
Oh, but that's what we also get on the AOFRW. But if this is added to 7.2, isn't it acceptable to change the behavior and always AOFRW by default on shutdown (with NOSAVE as an opt-out)? It's also one step closer to AOF and RDB consolidation, because for RDB we already always save on shutdown. One of them would have to break in order to consolidate.
Regarding the first topic, I think we can add the replication offset notation to the manifest on a normal shutdown with AOF (not only if we do an AOFRW during shutdown), so it'll work either way. Regarding the other comments, I'll try to respond later, currently out of time.
Yes, but maybe that's a breaking change. Maybe someone does SHUTDOWN SAVE, has some script to copy the RDB file to another machine, and starts redis there.
@oranagra Do we need to record the offset and replid for each AOF (base and all incrs)? IIUC, we just need to record it for the base AOF (rdb); if so, I think it's better to do it with
I think we just need to record the replication offset and ID of the tail of the last incr file (and its size).
Then when we generate the manifest file again (inserting a new incr or base file) we delete that record (which is why it doesn't have to mention which incr file it refers to). When loading it, we ignore the replication data if the last incr file is not at the right size.
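The size check described above can be sketched roughly as follows. The `ReplHint` record and its field names are hypothetical; the real manifest format is not specified in this thread, so treat this only as an illustration of the invalidation rule.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReplHint:
    """Hypothetical manifest record; field names are assumptions."""
    replid: str
    offset: int
    incr_size: int  # size of the last incr file when the hint was written


def usable_hint(hint: Optional[ReplHint], actual_incr_size: int) -> Optional[ReplHint]:
    """Return the hint only if the last incr file still has the recorded size."""
    if hint is None:
        return None  # manifest was regenerated and the record dropped
    if hint.incr_size != actual_incr_size:
        return None  # something was appended after the hint was written
    return hint


hint = ReplHint(replid="abc", offset=5000, incr_size=1024)
fresh = usable_hint(hint, 1024)   # file unchanged, hint is trusted
stale = usable_hint(hint, 1100)   # file grew, hint is ignored
```

Because any append changes the file size, a stale hint invalidates itself without the writer having to rewrite the manifest.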
@oranagra, hi years later =) I would like to know how welcome a change like this is currently, and if the project of AOF and RDB consolidation has moved since then as a better approach.
@eduardobr I'm currently focused elsewhere, so a big project of promoting consolidation of AOF and RDB isn't on my table, and AFAIR we didn't push this forward since the multi-part AOF work in 7.0. I suppose trying to PSYNC from the offset we have at the base (and then, if it fails, consider loading the AOF tail) is not such a big change.
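A rough sketch of that ordering, assuming hypothetical callbacks `try_psync` and `load_aof_tail`. The `accepts_psync` helper is a simplified model of a master's acceptance check (matching replication ID and an offset inside the backlog window), not the actual Redis code.

```python
def accepts_psync(replid, offset, master_replid, backlog_start, backlog_len):
    """Simplified model (assumption): accept if the ID matches and the
    requested offset falls inside the master's backlog window."""
    return (replid == master_replid
            and backlog_start <= offset <= backlog_start + backlog_len)


def restart_sync(base_replid, base_offset, try_psync, load_aof_tail):
    """Try PSYNC from the base file's offset first; fall back to the tail."""
    if try_psync(base_replid, base_offset):
        return "psync-from-base"  # the AOF tail can be skipped
    load_aof_tail()
    return "loaded-aof-tail"


loaded = []
master = dict(master_replid="abc", backlog_start=1000, backlog_len=500)


def attempt(replid, offset):
    return accepts_psync(replid, offset, **master)


ok = restart_sync("abc", 1200, attempt, lambda: loaded.append("tail"))
miss = restart_sync("abc", 200, attempt, lambda: loaded.append("tail"))
```

In the second call the offset is older than the backlog window, so the sketch falls back to loading the tail; what should happen after that fallback is left open, as it is in the comment.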
We want to make it possible for a master to restart from AOF and still be able to serve PSYNC.
see #8015
In theory, we could either check whether the offset in the preamble RDB header has a chance to PSYNC before loading the RDB and AOF tail (see #10523), and then skip the AOF tail and PSYNC instead.
Or we can save (maybe periodically) replication offsets in the AOF file.