-
Notifications
You must be signed in to change notification settings - Fork 23.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
aof rewrite and rdb save counters in info #10178
aof rewrite and rdb save counters in info #10178
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yossigo my bash is not as fluent as my tcl 8-)
please review too.
utils/aof_backup.sh
Outdated
while true; do | ||
aof_rewrites=$(get_info_field aof_rewrites) | ||
if [ $(get_info_field "aof_rewrite_in_progress") -eq 1 ]; then | ||
echo "Redis is performing an AOF rewrite, waiting..." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this could be very verbose, maybe we should print just the first one? or add a silent mode?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer not to complicate the script with "first time checks". Users can easily remove this or change this. I don't think it's that bad to have your terminal filled with these if you have 1800 seconds of a rewrite.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i do, but we can leave it for Yossi to rule
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @oranagra, no need for a silent mode just a flag to say we've printed this message so let's not print it again as nothing changed.
Co-authored-by: Oran Agra <oran@redislabs.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some questions:
- Do we need to support AOF backups when
redis
not running (When we don't specify-h -p
option) - The current implementation is to directly establish a soft link to
appenddir
, do we need to deal with the case where multipleredis
share the same dir (they have the sameappenddir
), I mean we should strictly follow themanifest
file instructions to backup AOF - Do we need to be compatible with the
macos
? for example, this code (readlink /proc/$pid/cwd
) does not work inmacos
, but AFAIK probably many people will test this script onmacos
btw, can someone check if all this Ohh @chenyang8094 i missed your above edit. |
I would not rely on procfs for one does not exist on macOs and is optional on FreeBSD there is the procstat command line e.g. |
@chenyang8094 thanks for the comments,
I think the aim should be to provide a relatively simple script the user can modify or just read to understand how to safely backup their AOF. The goal I had in mind wasn't to provide a full blown backup solution (we don't even do this for rdb). In that respect I think we need to show how to backup a running server, and if the user needs to backup a downed server they can figure it out from the script.
Right, I'll try to fix this. Thanks.
I think this isn't critical, again, this is only an example. But if there's a good way to support both (trying to avoid adding os checks), I'm for it. |
i'm not sure about treating this script just as an example. @yossigo please share your opinion. regarding multiple servers using the same folder, one option is to parse the manifest file, the other option is to use the prefix / pattern matching. the difference is that it'll include history files, but i suppose that's ok. |
In any case I'll need the prefix in order to find the correct manifest file. I also don't want the complexity of parsing the file. I'll just backup based on |
@yoav-steinberg If we distribute the script with the |
- use `lsof` instead or procfs to find server's working dir. - use no support for `-p` relative dir in `mktemp` - no support for file change warning suppression in `tar`.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.
utils/aof_backup.sh
Outdated
exit 1 | ||
fi | ||
|
||
if [ $(get_config appendonly) != yes ]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If redis
once enabled AOF (and created many AOF files) but disable it later, i think we need to support this scenario.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a few specific comments.
I think there are bigger issues around this script, as @chenyang8094 mentioned: we expect the server to be up, but what if it's down? or restarting? or changing configuration? Not sure we can cover all cases.
I think we should also consider other options where Redis is instructed to hold back a rewrite because a backup is in progress.
utils/aof_backup.sh
Outdated
pid=$(get_info_field process_id) | ||
appenddirname=$(get_config appenddirname) | ||
appendfilename=$(get_config appendfilename). | ||
working_dir=$(lsof -a -p $pid -d cwd -Fn | grep "^n" | sed "s/n\//\//") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not get_config dir
? I wouldn't assume lsof
is always available.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
redis dir
is relative to the the working directory. I need the working directory to know how to treat all other locations.
Couldn't find any good way to find the working directory on bsd/macos other than lsof
. I can check the os and use procf fs on linux. Or verify lsof
exists.
utils/aof_backup.sh
Outdated
appendfilename=$(get_config appendfilename). | ||
working_dir=$(lsof -a -p $pid -d cwd -Fn | grep "^n" | sed "s/n\//\//") | ||
appenddir_path=$working_dir/$appenddirname | ||
if [ ! -d $appenddir_path ]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if [ ! -d $appenddir_path ]; then | |
if [ ! -d "$appenddir_path" ]; then |
Better practice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This repeats in many places, I did not comment everywhere.
utils/aof_backup.sh
Outdated
working_dir=$(lsof -a -p $pid -d cwd -Fn | grep "^n" | sed "s/n\//\//") | ||
appenddir_path=$working_dir/$appenddirname | ||
if [ ! -d $appenddir_path ]; then | ||
echo "Couldn't find $appenddir_path. redis-server must be run locally. Aborting." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could also fail due to permissions.
utils/aof_backup.sh
Outdated
appenddirname=$(get_config appenddirname) | ||
appendfilename=$(get_config appendfilename). | ||
working_dir=$(lsof -a -p $pid -d cwd -Fn | grep "^n" | sed "s/n\//\//") | ||
appenddir_path=$working_dir/$appenddirname |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
appenddir_path=$working_dir/$appenddirname | |
appenddir_path="$working_dir/$appenddirname" |
utils/aof_backup.sh
Outdated
while true; do | ||
aof_rewrites=$(get_info_field aof_rewrites) | ||
if [ $(get_info_field "aof_rewrite_in_progress") -eq 1 ]; then | ||
echo "Redis is performing an AOF rewrite, waiting..." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @oranagra, no need for a silent mode just a flag to say we've printed this message so let's not print it again as nothing changed.
utils/aof_backup.sh
Outdated
backup_path=$tmp_dir/$appenddirname | ||
mkdir $backup_path | ||
|
||
ln $appenddir_path/$appendfilename* $backup_path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably add an earlier check to make sure we're not crossing filesystem boundaries here, because links won't work in that case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC we made sure to create the temp folder inside dir
, so shouldn't that be covered?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's covered now except if someone mounts another fs under dir
which is a case I'm willing to ignore.
@oranagra @yossigo I'm getting the feeling I'm slowly being nudged to create a full blow backup solution for AOF in bash. I originally proposed we drop official support for AOF backups because it's pretty sketchy whether this makes sense given Redis's RDB feature. I'd like us to have a talk about what's the aim of this script... Once we pass a certain level of complexity I think we should start thinking of some export feature in Redis and not a script. |
Had a talk with @yossigo & @oranagra relating to above concerns:
@yossigo Suggested we create a mechanism for disabling/aborting the rewrite process and persisting this setting between restarts: create an After the server is protected against doing rewrites we can safely create hard links/copy the dir or whatever we want. Then we can delete the lock file or call the command with We'll document this mechanism and it's logic so users can create their own backup scripts/logic instead of having to maintain a script that handles all edge cases. |
I don't think it's a good suggestion to create
|
maybe the lock file should be in |
I'm afraid we need it in
We wanted to make sure we have a solution for these cases. So I'm not sure we want to back down from that now. What we can do is print a big warning if Redis starts with this in the directory, or even fail to start in such a case. |
I think keeping the flag in memory might be enough. A server that restarts and immediately rewrites and completes the rewrite before creating links of all files is possible, but very unlikely.. |
I think we can keep the changes in Also, if we don't plan to maintain and improve this backup script, we shouldn't release it in the redis source code, instead, we can put it in the redis documentation as a suggestion or an example for users to refer to (the only uncertainty is that how many people will actually see it). Another suggestion is that we abandon the use of scripts and instead directly implement the C code tool of p.s. If we want to temporarily disable AOFRW, we can set |
👍 I'll change this PR to only include the stats and update the top comment.
👍 I agree. So given the intention to make the mechanism simpler by adding a block rewrite mechanism, we'll just document this and forget about the script.
I think this is too complex and will hide the internals of the mechanism which we want (advanced) users to be able to understand and customize for their own needs.
This is a decent option, but bear in mind that changing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@redis/core-team please approve two new counters in INFO stats.
Add aof_rewrites and rdb_snapshots counters to info. This is useful to figure our if a rewrite or snapshot happened since last check. This was part of the (ongoing) effort to provide a safe backup solution for multipart-aof backups.
@@ -1471,6 +1471,7 @@ int rdbSaveBackground(int req, char *filename, rdbSaveInfo *rsi) { | |||
pid_t childpid; | |||
|
|||
if (hasActiveChildProcess()) return C_ERR; | |||
server.stat_rdb_saves++; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i notice we only do server.stat_rdb_saves++
in rdbSaveBackground
and did not do it in SAVE command, (or in FLUSHALL it may trigger rdbsave) (or other)
I think it should be counted in the save command?
or maybe we should put it near server.lastsave
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i agree that we should increment it in the SAVE command too, but i'm not sure about the others.
- i think there's value (or at least it's very different) setting it before we fork, rather than at completion (in which case we need to decide if we count failures too).
- i don't think the implicit save of FLUSHALL should count, arguably it's the same as deleting the old persistence file, rather than saving an empty one.
- i don't think we should count the empty rdb we save when creating a an empty base for a multipart AOF when starting up empty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, the one in SAVE command is my first thinking, and then I looked at the others (mention in case we forget).
as long as we agree with the stat field meaning, just a random look
i will make a PR with the SAVE one
Currently, we only increment stat_rdb_saves in rdbSaveBackground, we should also increment it in the SAVE command. The stat counter was introduced in redis#10178
Currently, we only increment stat_rdb_saves in rdbSaveBackground, we should also increment it in the SAVE command. We concluded there's no need to increment when: 1. saving a base file for an AOF 2. when saving an empty rdb file to delete an old one 3. when saving to sockets (not creating a persistence / snapshot file) The stat counter was introduced in #10178 * fix a wrong comment in startSaving
Currently, we only increment stat_rdb_saves in rdbSaveBackground, we should also increment it in the SAVE command. We concluded there's no need to increment when: 1. saving a base file for an AOF 2. when saving an empty rdb file to delete an old one 3. when saving to sockets (not creating a persistence / snapshot file) The stat counter was introduced in redis#10178 * fix a wrong comment in startSaving
Currently, we only increment stat_rdb_saves in rdbSaveBackground, we should also increment it in the SAVE command. We concluded there's no need to increment when: 1. saving a base file for an AOF 2. when saving an empty rdb file to delete an old one 3. when saving to sockets (not creating a persistence / snapshot file) The stat counter was introduced in redis#10178 * fix a wrong comment in startSaving
Add aof_rewrites and rdb_snapshots counters to info. This is useful to figure out if a rewrite or snapshot happened since last check. This was part of the (ongoing) effort to provide a safe backup solution for multipart-aof backups.
Note, it was eventually decided not to include a backup script in the redis repository and instead document how to safely backup AOF files, see: redis/redis-doc#1794
Original top-comment before backup script was removed from this PR:
AOF backup script defined in #10063.
In Redis 7 we introduced multi-part AOFs (#9788) based on a manifest file that indicates which files are part of Redis's AOF persistence. Our updated documentation about backup tells the user to simply copy the directory where all the AOF files (and manifest) reside.
This is wrong and bad advice. The reason is that during the copy files in the directory might change. During a rewrite we might end up copying an old manifest file and new base and increment files. We'll end up with a useless backup.
So this PR adds a backups script that performs the following:
Other changes:
Include stat counters for aof rewrites and bgsaves:
Useful in general but also practical if we want to make sure a certain operation was performed without any aof rewrite in the middle.
auto-aof-rewrite-percentage
(and fail backup if rewrite is in progress) config to disable rewrites.auto-aof-rewrite-percentage
or a specific command for rewrites.