A new AOF persistence mechanism #9539
@chenyang8094 did you happen to look at #6584 (i haven't), i wonder if maybe they had a solution to the rename problem or something else we can learn from. |
Sorry for the late reply. I have gone through the LESS paper and the implementation of #6584, and I found that the two solutions have a lot in common. For example, the RDB used in LESS is equivalent to the BASE AOF in my PR, and the AOF in LESS is equivalent to the PING/PONG AOF. I looked carefully at the crash and recovery parts of the LESS paper. LESS faces the same problem I encountered: it needs to rename the RDB file first and then rename the AOF file. I did not find in the paper how the atomicity of these two renames is ensured. I think the paper gives neither a method for recovering from a crash between the RDB rename and the AOF rename, nor a description of how to deal with the temp AOF when a rewrite is retried after the previous rewrite failed. |
@chenyang8094 from reading this text (didn't look at the code) i think they did intend to handle this case.
So if on startup we find a temp "base" (RDB) file but no "pong" file, we can conclude that "pong" was renamed, and we can rename the temp rdb to "base" and then load these two. |
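The startup rule described above can be sketched as follows. This is only an illustration, not code from either PR: the file names are hypothetical, and the branch for the case where "pong" still exists is an assumption that the comment above does not cover.

```python
import os
import tempfile

# Hypothetical file names, used only for this illustration.
BASE, TEMP_BASE, PING, PONG = "base.rdb", "temp-base.rdb", "ping.aof", "pong.aof"

def recover(directory):
    """Finish an interrupted rename sequence and return the files to load, in order."""
    def path(name):
        return os.path.join(directory, name)

    if os.path.exists(path(TEMP_BASE)):
        if os.path.exists(path(PONG)):
            # Assumption: "pong" still present means no rename happened yet, so the
            # temp base may be incomplete. Drop it and keep the old file set.
            os.remove(path(TEMP_BASE))
        else:
            # "pong" was already renamed, so complete the second rename.
            os.replace(path(TEMP_BASE), path(BASE))

    return [n for n in (BASE, PING, PONG) if os.path.exists(path(n))]

# Simulate a crash between the two renames: temp base and "ping" exist, "pong" is gone.
with tempfile.TemporaryDirectory() as d:
    for name in (TEMP_BASE, PING):
        open(os.path.join(d, name), "w").close()
    print(recover(d))
```

The sketch relies on the renames always happening in the same order ("pong" first, then the temp base), which is what lets the set of files on disk identify the step at which the crash occurred.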
Hello. I was checking the progress of a pull request I posted previously (#6584), saw that it was mentioned here, and decided to leave a comment. The paper read by @chenyang8094 does not explain the intent behind the LESS module I designed in detail (due to the size limit of a conference paper). Also, since the code has no separate comments, I think it will be difficult to understand. If you have any questions about LESS, please leave a comment. I hope I will be able to help with the module you are developing now. |
Yes, LESS does rely on the order of the file renames to solve the rename atomicity problem. Following this idea, I sorted out a possible solution, but it brings a lot of complexity when loading data (such as determining which files are in the current directory and then deciding which files to load). I wonder whether this complexity (including the dependencies between multiple files) will make things hard for users to understand. |
@chenyang8094 it looks complicated when you paint it like this, but i think that in fact it's just:
Then if one of the base files was loaded, proceed to load the other parts:
we can add an additional action in [2] to set a
I haven't reviewed the code yet (will do so soon), but i assume that when you say "rewrite success" that's another rename operation that renames the
I understand that a meta file can maybe make this a bit clearer, but it's also one more thing to take care of, so if the 1-4 plan above is right, i don't think it's that complicated.
as you pointed out, the other thing to worry about is for users to know which files to copy / keep.
another thing that feels odd to me is that in all these examples the
as a side note, there have been some discussions in other issues in the repo lately about complications that can occur if the user configured his server to persist to both AOF and RDB files (not knowing what to load and where to take the last repl_offset from), and it was mentioned that once we implement this PR, there's no reason to ever do that anymore (since the AOFRW overhead vanishes). so considering that, it could be a nice idea to call it
P.s. there's another state that's not mentioned here, which is a server that starts up with an initial configuration to write to an AOF that didn't exist in the past. in this case, there's no RDB portion, and i suppose that in the current code of the PR, that's the only case where there's only
so bottom line, i'd suggest this scheme:
|
Indeed, it may be easier and more intuitive to use serial numbers for file naming, but that is not the core issue at present. I think AOF loading is more complicated than you think. For the convenience of discussion, I express the core ideas directly as pseudo-code. Because we also have a double write mechanism, we also need to deal with
|
@chenyang8094 can you save me the effort and explicitly specify what problem your pseudo code comes to demonstrate? |
First of all, we can see that the above pseudo-code is very complicated. When redis starts, it must check which files currently exist and which of them should be loaded. I think this is not a small burden for users (who face so many unfamiliar files). At the same time, double writing (temp-dw.aof) may be hard to understand yet sometimes necessary, which adds hidden complexity. The complexity of the above scheme is largely due to not having a meta file to record and manage all our AOF files. In the absence of meta, we need to solve two problems:
In order to solve the above two problems, we can have the following corresponding means:
Here is the pseudo code:
I think the redo-based non-meta scheme is theoretically feasible, but it has two disadvantages:
Based on the above, I will explain the scheme with meta below (although my current code is written without meta). I think introducing a meta file will make things a lot easier. Let me define a naming convention first:
In the meta file, we will record the names of all AOF files opened by redis together with their types. There are three main types: b (base AOF), h (history AOF), and n (new incr AOF). Below I will use a few simple pictures to illustrate the idea of the revised plan.
Suppose the current state is as follows:
Next, when a rewrite starts, there are two steps at the beginning:
Note: if either of the two steps above fails, the redis process will exit. Even if redis crashes between steps 1 and 2 (for example, because it is accidentally killed), the data is still correct when redis restarts; the only side effect is an invalid AOF file left in the directory (appendonly.aof_2). During the rewrite, the executed commands are written to appendonly.aof_i_2. After the rewrite is over, the child process will have generated a temp-rewriteaof-bg-xxx.aof file. In backgroundRewriteDoneHandler, we need to perform the following operations:
Here is the pseudo code:
In summary, when there is a meta file, we get two benefits:
Comparing the two solutions I just described (with meta and without meta), I prefer the design with meta (the code I submitted before was based on the design without meta, but I think it is still very imperfect). It will also make redis look more like a modern database in terms of persistence. @oranagra @redis/core-team What do you think? I think we need to quickly decide which solution to adopt, and then I can write the code accordingly. |
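To make the meta-file idea above more concrete, here is a minimal sketch of reading and updating such a file. The on-disk format (one "type filename" pair per line), the file name appendonly.meta, and the helper names are all assumptions made for illustration; the thread only states that the meta file records file names and the types b, h and n. The write-to-temp-then-rename update is a common general technique, not something confirmed by the PR.

```python
import os
import tempfile

# Types named in the discussion: b (base AOF), h (history AOF), n (new incr AOF).
KNOWN_TYPES = {"b", "h", "n"}

def parse_meta(text):
    """Parse a hypothetical meta format: one '<type> <filename>' pair per line."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        kind, name = line.split(maxsplit=1)
        if kind not in KNOWN_TYPES:
            raise ValueError(f"unknown AOF type: {kind!r}")
        entries.append((kind, name))
    return entries

def write_meta_atomically(path, entries):
    """Write a temp file, fsync it, then rename it over the old meta file."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        f.writelines(f"{kind} {name}\n" for kind, name in entries)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)  # the single rename is the commit point

with tempfile.TemporaryDirectory() as d:
    meta = os.path.join(d, "appendonly.meta")  # hypothetical meta file name
    write_meta_atomically(meta, [("b", "appendonly.aof_1"), ("n", "appendonly.aof_i_2")])
    with open(meta) as f:
        print(parse_meta(f.read()))
```

With this approach a state change touches only one file name, so a crash leaves either the old meta file or the new one on disk. The sketch deliberately does not decide which types are loaded at startup or in what order.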
correct me if i'm wrong, but this advantage or disadvantage is not directly related to meta or not. IIRC we created the double write mechanism in order to make sure that if AOFRW repeatedly fails, we don't create more and more files on the disk (file count remains capped to some 4). It seems to me that the meta file design is not really that much simpler than the other one, involving cron and 'h' files. i think that with the naming scheme i described above ( if we agree that both designs are valid, the pseudo code you wrote for either of them doesn't seem too complicated for redis IMHO, and my main concern is how users will be able to easily manage these files. let's wait for more feedback and make a decision. |
seems we have two methods to eliminate the double I/O-write and double rewrite-buffer:
IMHO a meta file is easy to understand, since it gives us a global information file and lots of databases work this way. ping @redis/core-team (and maybe other users if you want): we need more feedback. Our goal is to eliminate the double I/O-write and the double rewrite-buffer, while keeping it easy to implement (from a redis developer's POV) and easy to maintain the AOF files (from a redis user's POV). |
I think the redo-based meta-free solution and the meta-based solution can ultimately achieve our goals, but as far as I know, many companies will back up @oranagra I think it is worthwhile to bring a clearer state at the cost of adding a meta file. Of course, this is just my idea. I am eager to get feedback from core-team and other interested parties. |
I had a lengthy discussion about the alternatives and future plans with @yossigo and @yoav-steinberg. We concluded that it would indeed be better to take the meta-file approach, mainly since it'll allow more flexibility in the future for features like keeping historical AOF files for longer so a user can restore the db to an older state (before the last rewrite); this is in line with #9326. I still think it would be a bad idea to create an infinite number of files on the disk if AOFRW is repeatedly failing, so i still think we should set the threshold for switching to the double write solution at just one failure (don't see a reason to set that threshold to 2 or 5). Other random notes from the discussion:
for this PR:
for a followup PR:
For some future version:
@redis/core-team please respond if you wanna argue about any of the above. |
I very much agree with the conclusions you reached about this PR (1. it can upgrade from old redis versions; 2. config changes are kept to a minimum; 3. the old way is completely forgotten; etc.). These were also the key points of my earlier thinking. I think another point is that the content of the AOF meta file needs to be well designed (even with some redundant fields reserved), so that we have more flexibility when implementing other features (such as PITR) in the future. I will implement a meta-based version as soon as possible so that we can discuss more detailed issues. |
Closing in favor of #9788 |
Hi, I implemented a new AOF persistence mechanism (currently still in the draft stage). Its main purpose is to remove the AOF rewrite buffer, which saves memory during a rewrite and shortens the rewrite time.
I call it the ping-pong model for the time being, because it is similar to a ping-pong buffer: when one buffer is full, I can continue writing to the other one and then swap the two buffers without interrupting the processing flow.
I divide AOF files into four types (BASE, PING, PONG, TEMP). The AOF generated by a rewrite is called the BASE type AOF (named by the appendfilename configuration item). It is similar to an RDB file and represents a snapshot of the redis data at a certain moment. That is, no incremental commands are written to a BASE type AOF.
Next is the PING type AOF (named by the aof-ping-filename configuration item), which is basically the same as the current AOF concept and mainly stores incremental commands: while AOF persistence is turned on, executed commands are written to this type of AOF one by one.
When bgrewriteaof is triggered, a new AOF of the PONG type is created, and it takes over storing the user's incremental commands. From then on, no more data is written to the original PING type AOF.
When the child process completes the rewrite task (temp-rewriteaof-bg.aof has been generated), in backgroundRewriteDoneHandler I rename the temp-rewriteaof-bg.aof file to become the new BASE type AOF and, at the same time, rename the PONG type AOF to the PING type. At this point, the BASE AOF and the PING AOF together constitute the full current data set. Throughout the rewrite, no rewrite buffer is used, and no data is exchanged between the parent process and the child process.
The above describes a successful rewrite. If the rewrite fails, we end up with three AOF files at the same time, namely BASE, PING and PONG, which together constitute the full current data set. This means these AOF files need to be loaded in turn when redis restarts. Even worse, if another rewrite is triggered at this point, we cannot keep generating PONG2, PONG3 and so on without limit, otherwise we will not know which files we need or in what order.
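The successful-rewrite path described above amounts to two renames. The sketch below simulates them on ordinary files; the PING and PONG file names are hypothetical, and the order of the two renames is an assumption, since the description only says they happen "at the same time".

```python
import os
import tempfile

# Hypothetical names; in the PR, BASE and PING are named by the appendfilename
# and aof-ping-filename configuration items.
BASE, PING, PONG = "appendonly.aof", "appendonly-ping.aof", "appendonly-pong.aof"
TEMP_REWRITE = "temp-rewriteaof-bg.aof"

def finish_successful_rewrite(directory):
    """Promote the rewrite output to BASE and the PONG file to PING."""
    def path(name):
        return os.path.join(directory, name)

    os.replace(path(PONG), path(PING))
    # A crash at this point leaves the old BASE next to the new PING, which is
    # the multi-rename atomicity gap raised in this PR.
    os.replace(path(TEMP_REWRITE), path(BASE))

with tempfile.TemporaryDirectory() as d:
    contents = {BASE: "old snapshot", PING: "old commands",
                PONG: "commands during rewrite", TEMP_REWRITE: "new snapshot"}
    for name, text in contents.items():
        with open(os.path.join(d, name), "w") as f:
            f.write(text)
    finish_successful_rewrite(d)
    print(sorted(os.listdir(d)))
```

Each individual rename is atomic on a POSIX filesystem, but the pair is not, which is why the discussion turns to either a fixed rename order with recovery at startup or a meta file.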
Therefore, for the low-probability event of a rewrite failure, we generate a new AOF file, which we temporarily call temp.aof. Subsequent incremental commands are "simultaneously" double-written to the PONG type AOF and temp.aof. If this rewrite succeeds, temp.aof is renamed to the PING type AOF, and both the original PING and PONG type AOFs can be deleted. If this rewrite also fails, we just delete temp.aof; the original BASE/PING/PONG still constitute the complete data.
Currently, this solution seems to have many problems (although the tests already pass). For example, how do we ensure the atomicity of multiple file renames? Since we do not record any meta information for the AOF files, we must use double writing and multiple renames to limit the number of AOF files. In fact, in our internal practice, we designed a meta file for the AOF file(s) that records this information (including file names, sequence, etc.), so that we do not need to rename the newly generated AOF file. But this scheme is obviously inappropriate for the community, because relying on a meta file means it is no longer compatible with previous redis versions.
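The double-write fallback described above can be sketched roughly as follows. The function names and the PING/PONG file names are hypothetical, and the rename of the new BASE file on success is left out to keep the example short.

```python
import os
import tempfile

PING, PONG, TEMP = "ping.aof", "pong.aof", "temp.aof"  # hypothetical names

def append(directory, name, command):
    with open(os.path.join(directory, name), "a") as f:
        f.write(command + "\n")

def start_retry(directory):
    # Create an empty temp.aof when a rewrite is retried after a failure.
    open(os.path.join(directory, TEMP), "w").close()

def feed_command(directory, command, retry_in_progress):
    # PONG always receives the command, so BASE/PING/PONG stay complete.
    append(directory, PONG, command)
    if retry_in_progress:
        append(directory, TEMP, command)  # the double write

def finish_retry(directory, success):
    def path(name):
        return os.path.join(directory, name)

    if success:
        os.replace(path(TEMP), path(PING))  # replaces (deletes) the old PING
        os.remove(path(PONG))
    else:
        os.remove(path(TEMP))  # BASE/PING/PONG remain the full data set

with tempfile.TemporaryDirectory() as d:
    append(d, PING, "SET a 1")
    feed_command(d, "SET b 2", retry_in_progress=False)  # first rewrite, which fails
    start_retry(d)
    feed_command(d, "SET c 3", retry_in_progress=True)
    finish_retry(d, success=True)
    print(sorted(os.listdir(d)))
```

The point of the double write is that the number of incremental files stays bounded no matter how many times the rewrite fails, at the cost of writing each command twice while a retry is running.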
So, this is my current thinking; I don’t know if I have expressed it clearly. I think this idea is okay, but we still have some problems to solve (such as the atomicity of multi-file modifications). @redis/core-team Do you have any better suggestions?
TODO: