Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[v23.1.x] storage: do not compact below log start offset #9611

Merged
merged 6 commits into from
Mar 30, 2023

Conversation

VladLazar
Copy link
Contributor

@VladLazar VladLazar commented Mar 22, 2023

Backport of PRs #9483 and #9487 (and e7ee3c6) to v23.1.x.

In v23.1.x time based retention runs only when size based retention does not run. #9487 changes this to apply both retention policies on each housekeeping run, however this is too
big of a behavioural change to backport. Hence, this backport preserves the v23.1.x behaviour of exclusive application for retention policies, while still including the bug fix from #9483.

Without knowing the `is_collectible()` condition and the "unlikely"
condition are mutually exclusive, it looks as though we're abandoning a
futurized `gc()`, which is quite alarming.

From John:
"Then 95320bc fixed the bug by adding the is_collectable check, and
the unlikely block has been dead code ever since."
dotnwat and others added 4 commits March 29, 2023 16:11
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
The abort check in disk_log::gc effectively ran inline with
the housekeeping loop in log_manager. The log manager should
have be returning early on shutdown in any case, so we can
remove the check in disk_log::gc. In either case, strategic
checks at lower levels of storage remain in place.

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Compaction and prefix truncation (retention) of the local log are async
operations that can race, despite the fact that their futures are
ordered in `disk_log_impl::compact`. This is because `disk_log_impl::gc`
does not remove any segments inline, but simply notifies the log
eviction STM of the offset recommended by the retention policy. Segments
are removed when Raft calls into `disk_log_impl::prefix_truncate`.

Hence, compaction may be operating on segments that are being removed,
which leads to strange races.

The fix is to skip compaction of segments that are below the new start
offset which the retention policy determined.

Note: I've had to slightly modify `disk_log_impl::gc()` when applying
this cherry-picked commit in order to avoid applying both retention
policies at once.
Previously `disk_log_impl::garbage_collect_segments` had a code path
that actually removed segments from disk. Turns out that this code path
was only used by our unit tests because the normal deletion flow
is via the Raft layer after the eviction notification generated in the
same function is processed. I have cross referenced this with all the
logs available to us, and that code path is indeed dead.

This was particularly bad because our unit testing were using an
entirely different code path for retention than what happens in
production.
@jcsp jcsp merged commit 6832c35 into redpanda-data:v23.1.x Mar 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants