-
Notifications
You must be signed in to change notification settings - Fork 602
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[v22.3.x] storage: do not compact below log start offset #9612
Merged
VladLazar
merged 6 commits into
redpanda-data:v22.3.x
from
VladLazar:backport-9483-to-v22.3.x
Apr 3, 2023
Merged
[v22.3.x] storage: do not compact below log start offset #9612
VladLazar
merged 6 commits into
redpanda-data:v22.3.x
from
VladLazar:backport-9483-to-v22.3.x
Apr 3, 2023
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
b8f32bd
to
744d0b7
Compare
3e21df6
to
41f2790
Compare
CI failure is likely an instance of #9293, but there's no logs so I can't tell for certain |
Without knowing the `is_collectible()` condition and the "unlikely" condition are mutually exclusive, it looks as though we're abandoning a futurized `gc()`, which is quite alarming. From John: "Then 95320bc fixed the bug by adding the is_collectable check, and the unlikely block has been dead code ever since." Note: I had to tweak this commit in order to backport to v22.3.x since the name of the gate involved changed from `_compaction_gate` to `_compaction_housekeeping_gate` in dev. I've used the old name in this commit.
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
The abort check in disk_log::gc effectively ran inline with the housekeeping loop in log_manager. The log manager should have be returning early on shutdown in any case, so we can remove the check in disk_log::gc. In either case, strategic checks at lower levels of storage remain in place. Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Compaction and prefix truncation (retention) of the local log are async operations that can race, despite the fact that their futures are ordered in `disk_log_impl::compact`. This is because `disk_log_impl::gc` does not remove any segments inline, but simply notifies the log eviction STM of the offset recommended by the retention policy. Segments are removed when Raft calls into `disk_log_impl::prefix_truncate`. Hence, compaction may be operating on segments that are being removed, which leads to strange races. The fix is to skip compaction of segments that are below the new start offset which the retention policy determined. Note: I've had to slightly modify `disk_log_impl::gc()` when applying this cherry-picked commit in order to avoid applying both retention policies at once.
Previously `disk_log_impl::garbage_collect_segments` had a code path that actually removed segments from disk. Turns out that this code path was only used by our unit tests because the normal deletion flow is via the Raft layer after the eviction notification generated in the same function is processed. I have cross referenced this with all the logs available to us, and that code path is indeed dead. This was particularly bad because our unit testing were using an entirely different code path for retention than what happens in production.
41f2790
to
452b1b5
Compare
mmaslankaprv
approved these changes
Apr 3, 2023
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Backport of PRs #9483 and #9487 (and e7ee3c6) to v22.3.x.
In v22.3.x time based retention runs only when size based retention does not run. #9487 changes this to apply both retention policies on each housekeeping run, however this is too
big of a behavioural change to backport. Hence, this backport preserves the v22.3.x behaviour of exclusive application for retention policies, while still including the bug fix from #9483.
Release Notes
Bug Fixes
happened below the start offset of the log. It did not have a noticeable impact upon user workloads.