Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Multi Part AOF mechanism to avoid AOFRW overheads. #9788

Merged
merged 65 commits into from
Jan 3, 2022
Merged

Implement Multi Part AOF mechanism to avoid AOFRW overheads. #9788

merged 65 commits into from
Jan 3, 2022

Conversation

chenyang8094
Copy link
Collaborator

@chenyang8094 chenyang8094 commented Nov 16, 2021

Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.

This PR implements the Meta-File Based Multi-Part AOF mechanism described in 9539.

The main issues with the the original AOFRW mechanism are:

  • buffering of commands that are processed during rewrite (consuming a lot of RAM)
  • freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
  • double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)

The main modifications of this PR:

  1. Remove the AOF rewrite buffer and related code.
  2. Divide the AOF into multiple files, they are classified as two types, one is the the BASE type, it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only one BASE file at most. The second is INCR type, may have more than one. They represent the incremental commands since the last AOFRW.
  3. Use a AOF manifest file to record and manage these AOF files mentioned above.
  4. The original configuration of appendfilename will be the base part of the new file name, for example: appendonly.aof.1.base.rdb and appendonly.aof.2.incr.aof
  5. Add manifest-related TCL tests, and modified some existing tests that depend on the appendfilename
  6. Remove the aof_rewrite_buffer_length field in info.
  7. Add aof-disable-auto-gc configuration. By default we're automatically deleting HISTORY type AOFs. It also gives users the opportunity to preserve the history AOFs. just for testing use now.
  8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now), we will delay the execution
    of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
  9. Support upgrade (load) data from old version redis.
  10. Add appenddirname configuration, as the directory name of the append only files. All AOF files and manifest file will be placed in this directory.
  11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if aof-load-truncated is enabled.

To Do:

  • On upgrades from old redis versions, we need to detect the old AOF file and rename it / create a meta file.
  • At present, if AOFRW fails, redis will automatically retry. If it continues to fail, we may get a lot of very small INCR files. So we need an AOFRW limiting measure.

@eduardobr
Copy link
Contributor

@chenyang8094 Just out of curiosity, does this create a foundation to extend #8015 to AOF-only and not only RDB-only setups? Thank you

@oranagra oranagra added this to Backlog in 7.0 via automation Nov 16, 2021
@oranagra oranagra moved this from Backlog to In Review in 7.0 Nov 16, 2021
@oranagra oranagra mentioned this pull request Nov 16, 2021
6 tasks
Copy link
Member

@oranagra oranagra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chenyang8094 Thank you.
I started reviewing the PR, and also edited the PR top comment to make it clearer (please check if it looks right).
So far i reviewed just parts of it (namely all the files excluding aof.c and the tests).
I.d like to post my comments for discussion before i proceed with the other parts, to see if we agree on things, or maybe i have a misunderstanding.

redis.conf Outdated Show resolved Hide resolved
redis.conf Outdated Show resolved Hide resolved
redis.conf Show resolved Hide resolved
redis.conf Outdated Show resolved Hide resolved
redis.conf Outdated Show resolved Hide resolved
src/config.c Outdated Show resolved Hide resolved
src/config.c Outdated Show resolved Hide resolved
src/server.c Outdated Show resolved Hide resolved
src/server.c Show resolved Hide resolved
src/server.h Outdated Show resolved Hide resolved
@chenyang8094
Copy link
Collaborator Author

@chenyang8094 Just out of curiosity, does this create a foundation to extend #8015 to AOF-only and not only RDB-only setups? Thank you

Sure, you can see some followup PR plans mentioned by oranagra herer.

@oranagra
Copy link
Member

I've just created a few new issues to track these ideas (don't like them sitting around in a comment in a closed PR)
I moved one note of that post to a todo bullet in the top comment of this PR.
The first one of the other issues should be done for 7.0 IMHO:

The other ones might not make it for this release, but we can at least start discussion and mark them for next release:

@chenyang8094
Copy link
Collaborator Author

chenyang8094 commented Nov 17, 2021

@oranagra About upgrades from old redis versions, I think we can't directly rename the old AOF to the new name, because we still can't solve the atomic problem of rename and update manifest.

We can write the old AOF name directly into the manifest, so that we can start loading normally. Even if the process crashes after writing the manifest, that is no problem, because we can find the AOF from the manifest. Once AOFRW is executed once, all files will have new file names. (current code is implemented according to this idea)

@oranagra
Copy link
Member

ok, sounds good to me. so the AOF file (possibly with preamble content) will have it's old name, and will be listed in the manifest.
once a rewrite happens it'll be tagged as history and later deleted.
let's be sure to add a test for it (storing an old AOF preamble file in the assets folder)

Copy link
Member

@oranagra oranagra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done reviewing aof.c, didn't review the tests yet.

redis.conf Outdated Show resolved Hide resolved
src/aof.c Outdated Show resolved Hide resolved
src/aof.c Outdated Show resolved Hide resolved
src/aof.c Outdated Show resolved Hide resolved
src/aof.c Outdated Show resolved Hide resolved
src/aof.c Outdated Show resolved Hide resolved
src/server.h Outdated Show resolved Hide resolved
src/aof.c Outdated Show resolved Hide resolved
src/aof.c Show resolved Hide resolved
src/aof.c Outdated Show resolved Hide resolved
src/aof.c Show resolved Hide resolved
src/aof.c Outdated Show resolved Hide resolved
src/aof.c Outdated Show resolved Hide resolved
src/aof.c Outdated Show resolved Hide resolved
src/aof.c Outdated Show resolved Hide resolved
src/aof.c Outdated Show resolved Hide resolved
src/aof.c Outdated Show resolved Hide resolved
src/aof.c Outdated Show resolved Hide resolved
src/aof.c Outdated Show resolved Hide resolved
src/aof.c Outdated Show resolved Hide resolved
src/aof.c Outdated Show resolved Hide resolved
src/aof.c Show resolved Hide resolved
src/aof.c Outdated Show resolved Hide resolved
src/aof.c Show resolved Hide resolved
src/aof.c Show resolved Hide resolved
src/config.c Outdated Show resolved Hide resolved
src/aof.c Show resolved Hide resolved
src/aof.c Outdated Show resolved Hide resolved
src/aof.c Show resolved Hide resolved
src/aof.c Outdated Show resolved Hide resolved
src/aof.c Show resolved Hide resolved
@oranagra
Copy link
Member

oranagra commented Jan 3, 2022

🎉 more than 400 comments 2500 LOC, and some 3 months (counting form #9539)
@chenyang8094 thank you for all the time you put in.

@chenyang8094
Copy link
Collaborator Author

🎉 more than 400 comments 2500 LOC, and some 3 months (counting form #9539) @chenyang8094 thank you for all the time you put in.

Thank you, now I am going to deal with the next PR.

@oranagra
Copy link
Member

oranagra commented Jan 23, 2022

@chenyang8094 can you please look into this failure:
https://github.com/redis/redis/runs/4890129519?check_suite_focus=true

*** [err]: AOF will trigger limit when AOFRW fails many times in tests/integration/aof-multi-part.tcl
aof rewrite did trigger limit

another one:
https://github.com/redis/redis/runs/4875669104?check_suite_focus=true

@enjoy-binbin
Copy link
Collaborator

i also saw it in my daily. DB 9: 13640 keys
i guess r config set auto-aof-rewrite-min-size 1mb, in FreeBSD, sometimes it was so slow, and there are not enought key to tigger the rewrite.

so maybe change it to 1KB, or do a for loop and insert some keys (maybe at least 20000 keys)

@oranagra
Copy link
Member

maybe..
maybe we just need a longer timeout for this check.

maybe instead of using start_write_load we rather just add just enough data manually, and only then start the wait_for_condition.
i suppose someone needs to add more prints and reproduce this so that whatever fix we add we can have some certainty it's gonna solve the problem.

@enjoy-binbin
Copy link
Collaborator

enjoy-binbin commented Jan 23, 2022

it is easy to reproduce if we delete all the start_write_load (in case they generate enough keys), and replace it with some for loop set k v, but not let it reach 20000 keys.

like in this commit (7d4f50f), if we change the 20000 to be smaller, it will fail because not able to touch 1MB. (i can trigger a CI in FreeBSD and see the result, but i think this is the right fix, the start_write_load seems a bit fragile). it also speed up the test (total: 32s -> 25s, in my machine. Centos7.

but the time it should be similar in FreeBSD, because we need to generate that many keys all the time (successful use case too))

ci: https://github.com/enjoy-binbin/redis/runs/4912825635?check_suite_focus=true#step:4:3028

maybe we just need a longer timeout for this check.

btw, i think it is not enought, the start_write_load will last 10s, in this case, i think in thoes case of failures none of them produce enough keys.

@chenyang8094
Copy link
Collaborator Author

chenyang8094 commented Jan 24, 2022

I think we can change 1MB to 1KB and Then manually construct a 1KB more (which is easy) data to get a definite test status. @oranagra @enjoy-binbin WDYT?

enjoy-binbin added a commit to enjoy-binbin/redis that referenced this pull request Feb 1, 2022
In redis#9788, now we stores all persistent append-only files in
a dedicated directory. The name of the directory is determined
by the appenddirname configuration parameter in redis.conf

Update create-cluster clean to clean this default directory.
Fixes redis#10222
panjf2000 pushed a commit to panjf2000/redis that referenced this pull request Feb 3, 2022
)

Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.

The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)

The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
  it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
  one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
  incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
  `appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
  It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
  we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
  delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
  period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
  manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
  `aof-load-truncated` is enabled.

Co-authored-by: Oran Agra <oran@redislabs.com>
oranagra pushed a commit that referenced this pull request Feb 7, 2022
In #9788, now we stores all persistent append-only files in
a dedicated directory. The name of the directory is determined
by the appenddirname configuration parameter in redis.conf.
Now each node have a separate folder.
Update create-cluster clean to clean this default directory.
@@ -16,7 +16,8 @@ doc-tools
release
misc/*
src/release.h
appendonly.aof
appendonly.aof.*
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be appendonly.aof* instead of appendonly.aof.*
now, appendonly.aof will not be ignored.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you're right, but the current implementation doesn't create a file named appendonly.aof unless you manually copy an old-style AOF for upgrade testing. So I think we just need to ignore the file we're going to create. @oranagra WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think it's a good idea to ignore both.
i.e. both the ones we create, and the ones we created in the past, that are still left in the folder.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only developers pay attention to .gitignore, and when I updated the unstable brance code, appendonly.aof caught my attention, so I started this discussion.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@oranagra @yangbodong22011 Sounds reasonable, I'll make a PR to add it, thanks.

enjoy-binbin added a commit to enjoy-binbin/redis that referenced this pull request Mar 26, 2023
…YNC_ALWAYS

This PR fix two bugs:

1. About the MP-AOF one, that code assumes that we started a new AOF file just
now (when we have a new base into which we're gonna incrementally write), but the
fact is that with multipart-aof, the fork done handler doesn't really affect the
incremental file being maintained by the parent process, there's no reason to re-issue
SELECT, and i suppose there's no reason to update any of the fsync metrics.
This should have been deleted with MP-AOF (introduced in redis#9788, 7.0).
The aof_fsync_offset it updates will cause us to miss an fsync in flushAppendOnlyFile,
if we stop write commands in AOF_FSYNC_EVERYSEC

2. About the AOF_FSYNC_ALWAYS one, it is follow redis#6053 (the details is in redis#5985),
we also check AOF_FSYNC_ALWAYS to prevent users from switching from everysec to always.

We detected these problems with WAITAOF test (redis 7.2), introduced in redis#11713.
oranagra pushed a commit that referenced this pull request Mar 29, 2023
…SYNC_ALWAYS (#11973)

This PR fix several unrelated bugs that were discovered by the same set of tests
(WAITAOF tests in #11713), could make the `WAITAOF` test hang. 

The change in `backgroundRewriteDoneHandler` is about MP-AOF.
That leftover / old code assumes that we started a new AOF file just now
(when we have a new base into which we're gonna incrementally write), but
the fact is that with MP-AOF, the fork done handler doesn't really affect the
incremental file being maintained by the parent process, there's no reason to
re-issue `SELECT`, and no reason to update any of the fsync variables in that flow.
This should have been deleted with MP-AOF (introduced in #9788, 7.0).
The damage is that the update to `aof_fsync_offset` will cause us to miss an fsync
in `flushAppendOnlyFile`, that happens if we stop write commands in `AOF_FSYNC_EVERYSEC`
while an AOFRW is in progress. This caused a new `WAITAOF` test to sometime hang forever.

Also because of MP-AOF, we needed to change `aof_fsync_offset` to `aof_last_incr_fsync_offset`
and match it to `aof_last_incr_size` in `flushAppendOnlyFile`. This is because in the past we compared
`aof_fsync_offset` and `aof_current_size`, but with MP-AOF it could be the total AOF file will be
smaller after AOFRW, and the (already existing) incr file still has data that needs to be fsynced.

The change in `flushAppendOnlyFile`, about the `AOF_FSYNC_ALWAYS`, it is follow #6053
(the details is in #5985), we also check `AOF_FSYNC_ALWAYS` to handle a case where
appendfsync is changed from everysec to always while there is data that's written but not yet fsynced.
oranagra pushed a commit to oranagra/redis that referenced this pull request Apr 17, 2023
…SYNC_ALWAYS (redis#11973)

This PR fix several unrelated bugs that were discovered by the same set of tests
(WAITAOF tests in redis#11713), could make the `WAITAOF` test hang.

The change in `backgroundRewriteDoneHandler` is about MP-AOF.
That leftover / old code assumes that we started a new AOF file just now
(when we have a new base into which we're gonna incrementally write), but
the fact is that with MP-AOF, the fork done handler doesn't really affect the
incremental file being maintained by the parent process, there's no reason to
re-issue `SELECT`, and no reason to update any of the fsync variables in that flow.
This should have been deleted with MP-AOF (introduced in redis#9788, 7.0).
The damage is that the update to `aof_fsync_offset` will cause us to miss an fsync
in `flushAppendOnlyFile`, that happens if we stop write commands in `AOF_FSYNC_EVERYSEC`
while an AOFRW is in progress. This caused a new `WAITAOF` test to sometime hang forever.

Also because of MP-AOF, we needed to change `aof_fsync_offset` to `aof_last_incr_fsync_offset`
and match it to `aof_last_incr_size` in `flushAppendOnlyFile`. This is because in the past we compared
`aof_fsync_offset` and `aof_current_size`, but with MP-AOF it could be the total AOF file will be
smaller after AOFRW, and the (already existing) incr file still has data that needs to be fsynced.

The change in `flushAppendOnlyFile`, about the `AOF_FSYNC_ALWAYS`, it is follow redis#6053
(the details is in redis#5985), we also check `AOF_FSYNC_ALWAYS` to handle a case where
appendfsync is changed from everysec to always while there is data that's written but not yet fsynced.

(cherry picked from commit cb17178)
oranagra pushed a commit that referenced this pull request Apr 17, 2023
…SYNC_ALWAYS (#11973)

This PR fix several unrelated bugs that were discovered by the same set of tests
(WAITAOF tests in #11713), could make the `WAITAOF` test hang.

The change in `backgroundRewriteDoneHandler` is about MP-AOF.
That leftover / old code assumes that we started a new AOF file just now
(when we have a new base into which we're gonna incrementally write), but
the fact is that with MP-AOF, the fork done handler doesn't really affect the
incremental file being maintained by the parent process, there's no reason to
re-issue `SELECT`, and no reason to update any of the fsync variables in that flow.
This should have been deleted with MP-AOF (introduced in #9788, 7.0).
The damage is that the update to `aof_fsync_offset` will cause us to miss an fsync
in `flushAppendOnlyFile`, that happens if we stop write commands in `AOF_FSYNC_EVERYSEC`
while an AOFRW is in progress. This caused a new `WAITAOF` test to sometime hang forever.

Also because of MP-AOF, we needed to change `aof_fsync_offset` to `aof_last_incr_fsync_offset`
and match it to `aof_last_incr_size` in `flushAppendOnlyFile`. This is because in the past we compared
`aof_fsync_offset` and `aof_current_size`, but with MP-AOF it could be the total AOF file will be
smaller after AOFRW, and the (already existing) incr file still has data that needs to be fsynced.

The change in `flushAppendOnlyFile`, about the `AOF_FSYNC_ALWAYS`, it is follow #6053
(the details is in #5985), we also check `AOF_FSYNC_ALWAYS` to handle a case where
appendfsync is changed from everysec to always while there is data that's written but not yet fsynced.

(cherry picked from commit cb17178)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approval-needed Waiting for core team approval to be merged release-notes indication that this issue needs to be mentioned in the release notes state:major-decision Requires core team consensus
Projects
Archived in project
7.0
Done
Development

Successfully merging this pull request may close these issues.

None yet

10 participants