-
Notifications
You must be signed in to change notification settings - Fork 23.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adapt redis-check-aof tool for Multi Part Aof #10061
Adapt redis-check-aof tool for Multi Part Aof #10061
Conversation
PAT @yoav-steinberg |
@chenyang8094 i didn't review the code yet, but i'd like to respond to the top comment.
|
Before I review here are some questions:
I don't think the tool needs to be atomic. If we have BASE+INCR1+INCR2 and the specified timestamp is in the middle of INCR1 then we need to delete to truncate INCR1, delete INCR2 and update the manifest. I think it's OK not to be atomic here. The user should always backup the directory before running this tool.
|
Ahe AOF file produced by rewrite won't have timestamps. And note that corruptions are only expected at the last INCR file (due to fsync or bad mount options) |
Unless we add special parameters, I can't think of any effective way to identify whether this file is an old AOF or a manifest. It is not reliable to rely on the file name alone.
I thought, in this case, the user will see the following message:
|
regarding the detection of an old AOF file, or a manifest i'm not sure it's that bad to rely on the file name (we do control the name of the manifest file, and it's unlikely that users will name their old AOF files with a "manifest" suffix). regarding the truncation message, that's great.
|
We only recommend that users back up but it is not mandatory. Without the guarantee of atomicity, I don't want to destroy the original data. In addition, if it crashes due to some reason in the middle, and we have no chance to print out enough log, may cause the manifest and AOF to be incompatible, the user will not know whether the problem is caused by the
Of course we can (if BASE is the last file).
|
The manifest must start with So, we can do like this:
Yes, only modifying the manifest is enough, but a better suggestion is to delete the files together, otherwise these files will never be seen by redis and will not be recycled by the GC mechanism. |
ok, so i think we can agree on:
|
For the record I think a recovery tool shouldn't worry about atomicity. For two main reasons:
I'm also fine with not editing the manifest file at this point to keep the tool simple (in 99% of the cases we won't need to truncate a non last INCR file), but I don't think atomicity has anything to do with it. |
@yoav-steinberg Aside from atomicity, but it does bring a lot of complexity. Thinking from another perspective, if the user really wants to truncate the non-last INCR file, we print a friendly message and ask him to manually delete these files (this time he may realize that these files should be backed up first). I think this is a safe practice, and deleting files is a very easy thing for user, and the probability of this happening is very small. |
@oranagra 1,2,3,4 all done. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there's quite a lot of refactoring here, which is good since the old code was out of shape.
maybe though some of this refactoring could have been left aside for another chance (like the process
function which i'm not sure why you had to refactor).
additionally, if we already do so much work here, let's try to further improve.
specifically i feel some doc comments are still missing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Modifications of this PR: 1. Support the verification of `Multi Part AOF`, while still maintaining support for the old-style `AOF/RDB-preamble`. `redis-check-aof` will automatically choose which mode to use according to the incoming file format. `Usage: redis-check-aof [--fix|--truncate-to-timestamp $timestamp] <AOF/manifest>` 2. Refactor part of the code to make it easier to understand 3. Currently only supports truncate (`--fix` or `--truncate-to-timestamp`) the last AOF file (may be `BASE` or `INCR`) The reasons for 3 above: - for `--fix`: Only the last AOF may be truncated, this is guaranteed by redis - for `--truncate-to-timestamp`: Normally, we only have `BASE` + `INCR` files at most, and `BASE` cannot be truncated(It only contains a timestamp annotation at the beginning of the file), so only `INCR` can be truncated. If we have a `BASE+INCR1+INCR2` file (meaning we have an interrupted AOFRW), Only `INCR2` files can be truncated at this time. If we still insist on truncate `INCR1`, we need to manually delete `INCR2` and update the manifest file, then re-run `redis-check-aof` - If we want to support truncate any file, we need to add very complicated code to support the atomic modification of multiple file deletion and update manifest, I think this is unnecessary
@chenyang8094 please take a look at this failure
|
another similar error:
i don't understand how we could get "aborting...", if we didn't pass the |
@oranagra GOT, I have added more logs (printf the |
finally reappeared. It looks like we are getting a wrong path, is there a bug in the dirname function?
|
I add this log: So it's Copy a glibc implementation:
|
finally found the bug:
Since |
issue link : #9788 (comment)
Modifications of this PR:
Support the verification of
Multi Part AOF
, while still maintaining support for the old-styleAOF/RDB-preamble
.redis-check-aof
will automatically choose which mode to use according to the incoming file format.Usage: redis-check-aof [--fix|--truncate-to-timestamp $timestamp] <AOF/manifest>
Refactor part of the code to make it easier to understand
Currently only supports truncate (
--fix
or--truncate-to-timestamp
) the last AOF file (may beBASE
orINCR
)The reasons for 3 above:
--fix
: Only the last AOF may be truncated, this is guaranteed by redis--truncate-to-timestamp
: Normally, we only haveBASE
+INCR
files at most, andBASE
cannot be truncated(It only contains a timestamp annotation at the beginning of the file), so onlyINCR
can be truncated. If we have aBASE+INCR1+INCR2
file (meaning we have an interrupted AOFRW), OnlyINCR2
files can be truncated at this time. If we still insist on truncateINCR1
, we need to manually deleteINCR2
and update the manifest file, then re-runredis-check-aof