
Make it possible for a master to restart from AOF and still be able to serve PSYNC #9796

Open · oranagra opened this issue Nov 17, 2021 · 13 comments

@oranagra (Member) commented Nov 17, 2021

We want to make it possible for a master to restart from AOF and still be able to serve PSYNC.
See #8015.

In theory, we could either check whether the offset in the preamble RDB header has a chance to PSYNC before loading the RDB and AOF tail (see #10523), and then skip the AOF tail and PSYNC instead.
Or we could save (maybe periodically) replication offsets in the AOF file.
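
To illustrate the first option: a rough Python sketch (not Redis internals; every helper here is a hypothetical placeholder), relying on the `repl-id` / `repl-offset` aux fields that the RDB preamble already carries:

```python
def restart_from_aof(base_rdb, aof_tail, master):
    # Peek at the replication id/offset stored in the RDB preamble's
    # aux fields before deciding how to bring the dataset up to date.
    replid, offset = read_rdb_aux_fields(base_rdb, "repl-id", "repl-offset")
    load_rdb(base_rdb)
    if replid is not None and master.psync(replid, offset) == "+CONTINUE":
        # The offset is still within the master's backlog: stream the
        # delta over the wire and skip replaying the AOF tail entirely.
        return
    # No chance to partially sync: replay the tail locally, as today.
    load_aof_tail(aof_tail)
```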

@eduardobr (Contributor) commented:

@oranagra
Is there a vision of what the implementation might look like?
Is the new manifest file only there to help manage files?

Sometimes I think that an AOF could carry metadata just like any other command, as long as this metadata is only useful after the AOF is loaded; the replication id is one example. You could even write it many times, with only the latest taken into consideration, as if we had an internal metadata store and you could "SETMETA key value" in the AOF.

@oranagra (Member Author) commented:

No concrete design yet, but what you suggest could someday be done using annotations:
#9325
#9282
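
For illustration: annotations are AOF lines starting with `#` that the loader skips. A replication-metadata annotation could hypothetically ride on that same mechanism; in the fragment below, the `#TS:` timestamp is the existing form, while the `#REPL:` line is invented for this example:

```
#TS:1662300000
#REPL:repl-id 3fd9a7b2... repl-offs 123456
*3
$3
SET
$3
foo
$3
bar
```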

@eduardobr (Contributor) commented Sep 4, 2022

Considering that an AOF without a tail/incr is just like an RDB, we could support this feature by reusing the logic for RDB.

This also solves #10523.

Suggestion:

  • When AOF is on and RDB is off, do a blocking AOFRW on shutdown, respecting the shutdown flags. It's the same as a SAVE, but ends up as a base AOF (RDB) instead of dump.rdb.
  • On startup, if AOF is on and there is no AOF tail/incr (meaning a graceful shutdown), restore the same way as from an RDB, and get a possible partial sync (see the sketch after this list).
  • If there's an AOF tail, then it wasn't a graceful shutdown, so do nothing different for now, but improve it at a later point.
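
A minimal sketch of that startup branch, assuming the multi-part AOF manifest tells us whether an incr tail exists (all function names are invented for illustration):

```python
def load_on_startup(manifest):
    load_rdb(manifest.base_file)  # the base AOF is RDB-encoded
    if not manifest.incr_files:
        # No tail: the previous shutdown ended with a rewrite, so the
        # replid/offset stored in the base are current. Restore them
        # exactly as an RDB restart would, enabling partial resync.
        restore_replication_info(manifest.base_file)
    else:
        # A tail exists (non-graceful shutdown): replay it as today,
        # without PSYNC support for now.
        replay_incr_files(manifest.incr_files)
```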

Now, a blocking SAVE/AOFRW on shutdown is really bad, as it leaves clients timing out and not knowing what to do (the main reason I added shutdown-on-sigterm/int), so we would be in a better position if that were improved first.
It would be better if Redis at least returned a proper status while shutdown is in progress, instead of blocking.
A blocking shutdown can cause any platform probing the Redis instance to think it's dead or unhealthy and try to force-kill it.

An alternative to all of this: store the replication information in some kind of specialized file.

EDIT:
@oranagra, in #11076 you mention:

We could even make that SHUTDOWN SAVE a non-blocking one (in the spirit of #9872, we can initiate an AOFRW, keep processing traffic, even write commands, and then terminate when the rewrite is completed).

Maybe that's the answer to the blocking-shutdown issue, but I don't really understand how it works.

@oranagra (Member Author) commented Sep 5, 2022

@eduardobr your suggestion of doing an AOFRW on shutdown and then detecting that it has no incremental part in order to do a PSYNC seems valid, but maybe we need to let the user choose whether it should be enabled.
In some cases the incremental part of the AOF file can be huge, and it makes sense to spend a little time on shutdown in order to load faster on startup; in other cases it doesn't. It depends on how long the tail is; maybe we can add a configurable threshold that by default makes the right decision.

Regarding non-blocking shutdown, the idea was to do an AOFRW during termination, keep serving writes, and store them into an incremental file, so at the end we have a relatively small incremental part.
If we want that to also serve PSYNC, we need to record the replication offset somewhere.
This can be done either periodically in the incremental part of the AOF (to also help after crashes), or maybe as just a single offset in the manifest file in the case of a graceful shutdown.
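
To make that concrete, here is a hedged sketch of such a shutdown path; the `shutdown-aofrw-min-tail-size` config name and all helpers are hypothetical, not existing Redis names:

```python
def on_graceful_shutdown(server):
    tail_size = total_incr_size(server.aof_manifest)
    if tail_size >= server.config["shutdown-aofrw-min-tail-size"]:
        # Long tail: spend a little time now so the next startup is fast.
        do_aof_rewrite(server)
    else:
        # Short tail: just flush it; replaying it on startup is cheap.
        flush_aof(server)
    # Either way, record where replication stood, so a restarted master
    # can answer PSYNC with +CONTINUE instead of forcing a full sync.
    record_repl_hint(server.aof_manifest,
                     repl_id=server.replid,
                     repl_ofs=server.master_repl_offset)
```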

@eduardobr (Contributor) commented:

I'm afraid that trying PSYNC after a crash could lead to complications; better to go for a full sync in those cases and be safe?
I just looked at the manifest file format, and the fact that it declares the type of information as the first "column" is great. We could store the offset and replication id there.
What's the consequence in case of a crash, where we would reload some old offset info?
If using the manifest, should the replid and offset be removed on startup and saved on shutdown?

About letting the user choose whether AOFRW should be enabled on shutdown: could it be the shutdown-on-sigint/term save option? So if only AOF is enabled, do a BGREWRITEAOF on shutdown; if only RDB is enabled, do what it does today.
That's breaking, but what's the use case for AOF plus creating an RDB on shutdown?

@oranagra (Member Author) commented Sep 5, 2022

Regarding the manifest: maybe we can add a hint that includes both the replication offset / id and the size of the incremental file it corresponds to, so the moment we append anything to the incremental file, that data in the manifest becomes invalid, without even requiring us to trim it from there.
I'm not certain whether this should be in a column, or maybe some other special notation (e.g. a line that starts with # or something).
@chenyang8094 please advise.

Regarding AOFRW on shutdown, I'd rather not change the meaning of existing interfaces, and I'm also not certain users should choose this behavior explicitly.
I think we can attempt to make Redis do the right thing by default: concluding on shutdown that the AOF file is too long, and implicitly deciding to rewrite it (unless maybe the NOSAVE or FORCE arguments are present).

That's breaking, but what's the use case for AOF + creating RDB on shutdown?

I think there is a use case, which is a faster restart (from RDB), enabling AOF later.
It depends on whether we want to do this as a small incremental improvement, or tackle the bigger thing, which is to completely unify RDB snapshots and AOF (#9795); that requires a bit more planning.

@eduardobr (Contributor) commented Sep 5, 2022

The idea of invalidating the offset as soon as there's another append sounds good.
But doing an AOFRW only if the file is too long seems to me an option that will almost never enable PSYNC in a lot of setups; a conscious user will already have set the percentage threshold to keep the AOF compact.
Maybe an AOFSAVE flag, to be totally non-breaking and opt-in? (While NOSAVE continues to be the option to disable any kind of save on shutdown.)

I think there is a use case, which is faster restart (from RDB), and enabling AOF later.

Oh, but that's what we also get with the AOFRW.

But if this is added to 7.2, isn't it acceptable to change the behavior and always AOFRW by default on shutdown (with NOSAVE as the opt-out)? It's also one step closer to AOF and RDB consolidation, because for RDB we already always save on shutdown; one of them would have to break in order to consolidate.

@oranagra (Member Author) commented Sep 5, 2022

Regarding the first topic, I think we can add the replication offset notation to the manifest on any normal shutdown with AOF (not only if we do an AOFRW during shutdown), so it'll work either way.

Regarding the other comments, I'll try to respond later; I'm currently out of time.

@oranagra (Member Author) commented Sep 6, 2022

Oh, but that's what we also get on the AOFRW

Yes, but maybe that's a breaking change: maybe someone does SHUTDOWN SAVE, has some script that copies the RDB file to another machine, and starts Redis there.
I'd rather not change the meaning of the SAVE flag, unless it comes together with the much bigger topic of consolidating snapshots and AOF into one.

@chenyang8094 (Collaborator) commented Sep 9, 2022

@oranagra Do we need to record the offset and replid for each AOF file (base and all incrs)? IIUC, we just need to record them for the base AOF (RDB). If so, I think it's better to do it with # (recorded at the very beginning of the manifest). Otherwise, it would be better to do it in columns if it's required for every AOF file.

@oranagra (Member Author) commented:

I think we just need to record the replication offset and id of the tail of the last incr file (and its size).
So it could be:

# last-size <last incr file size> repl-id <repl id> repl-ofs <repl offset>

Then, when we generate the manifest file again (inserting a new incr or base file), we delete that record (which is why it doesn't have to mention which incr file it refers to).

When loading it, we ignore the replication data if the last incr file is not at the right size.
This mechanism can work both with an AOFRW at shutdown and with just an AOF flush at shutdown.
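
As an illustration, a minimal parser/validator for such a hint line could look like the sketch below; the function and its arguments are made up for this example, and real manifest handling would be more involved:

```python
import os

def read_repl_hint(hint_line, last_incr_path):
    """Parse '# last-size <n> repl-id <id> repl-ofs <ofs>' and return
    (repl_id, repl_ofs), or None if the hint is stale."""
    fields = hint_line.lstrip("# ").split()
    rec = dict(zip(fields[0::2], fields[1::2]))
    if os.path.getsize(last_incr_path) != int(rec["last-size"]):
        # Something was appended to the last incr file after the hint
        # was written; the recorded offset no longer matches, ignore it.
        return None
    return rec["repl-id"], int(rec["repl-ofs"])
```

The size check is what makes explicit trimming unnecessary: any append to the last incr file silently invalidates the hint.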

@eduardobr (Contributor) commented:

@oranagra, hi years later =)
I remember I drafted your last comment and made it work (it just needed proper validation when loading the manifest, and tests to be written); it would be a waste to never bring it back.

I would like to know how welcome a change like this currently is, and whether the AOF and RDB consolidation project has since moved forward as a better approach.
Cheers

@oranagra (Member Author) commented:

@eduardobr I'm currently focused elsewhere, so a big project of promoting the consolidation of AOF and RDB isn't on my table, and AFAIR we haven't pushed this forward since the multi-part AOF work in 7.0.
If it is a small incremental change that's isolated from other things, we can consider it for 8.0.

I suppose trying to PSYNC from the offset we have at the base (and then, if it fails, considering loading the AOF tail) is not such a change.
But maybe writing an offset at graceful termination could be one, i.e. either to the manifest, or as a comment at the end of the last incr file.
I can't afford to deep-dive into it right now, but if it makes sense to you and you conclude it's straightforward and doesn't have any downsides compared to what we have today, you can proceed.
