[v24.1.x] cloud_storage: Implement "carryover" cache trimming mechanism #18134


vbotbuildovich
Collaborator

Backport of PR #18056

Lazin added 10 commits April 29, 2024 10:39
Add new parameter that controls cache carryover behavior.

(cherry picked from commit 940fcd4)
The "carryover" behavior allows the cache to use information from the
previous trim to quickly trim the cache without scanning the whole
directory. This allows the cache to avoid blocking readers.

When the cache contains a very large number of files, the recursive
directory walk can take a few minutes. We don't allow the number of
objects stored in the cache to overshoot, so all readers are blocked
until the walk is finished.

This commit adds a new "carryover" trim mechanism which runs before
the normal trim and uses information obtained during the previous fast
or full trim to delete some objects without walking the directory tree.

(cherry picked from commit 5ece9d4)
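
The idea in the commit message above can be sketched roughly as follows. This is a minimal, in-memory model and not the code from this PR: `toy_cache`, its `files` map, and the `carryover` deque are hypothetical names chosen for illustration, and the real cache operates on files on disk.

```cpp
#include <cstdint>
#include <deque>
#include <string>
#include <unordered_map>
#include <utility>

// Toy model of the cache (hypothetical, for illustration only).
struct toy_cache {
    // path -> size in bytes; stands in for the on-disk cache directory.
    std::unordered_map<std::string, uint64_t> files;
    // Deletion candidates remembered from the previous trim, oldest first.
    std::deque<std::string> carryover;

    // Delete remembered candidates until `target_bytes` are freed or the
    // list is exhausted. No directory walk happens here.
    uint64_t carryover_trim(uint64_t target_bytes) {
        uint64_t freed = 0;
        while (freed < target_bytes && !carryover.empty()) {
            std::string path = std::move(carryover.front());
            carryover.pop_front();
            auto it = files.find(path);
            if (it == files.end()) {
                continue; // file disappeared since the previous trim
            }
            freed += it->second;
            files.erase(it);
        }
        return freed;
    }
};
```

The caller can compare the returned byte count with the amount it needed and fall back to the normal trim when the carryover list did not free enough.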
Change the configuration parameter and treat its value as the number
of bytes that we can use to store carryover data.

(cherry picked from commit e1c30bc)
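
One way to read "number of bytes that we can use to store carryover data" is as a memory budget for the remembered candidate list. The sketch below works under that assumption; `candidate`, `approx_footprint`, and `keep_within_budget` are hypothetical names, and the footprint estimate is a rough illustration rather than how the PR accounts for memory.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record of a file that a later trim could delete.
struct candidate {
    std::string path;
};

// Rough memory cost of one remembered candidate (an assumption made
// for illustration: the struct itself plus the path characters).
inline std::size_t approx_footprint(const candidate& c) {
    return sizeof(candidate) + c.path.size();
}

// Keep candidates, in order, until the byte budget would be exceeded.
std::vector<candidate> keep_within_budget(
  const std::vector<candidate>& ordered, std::size_t budget_bytes) {
    std::vector<candidate> kept;
    std::size_t used = 0;
    for (const auto& c : ordered) {
        const std::size_t cost = approx_footprint(c);
        if (used + cost > budget_bytes) {
            break; // remembering this one would exceed the budget
        }
        used += cost;
        kept.push_back(c);
    }
    return kept;
}
```

With a budget of zero nothing is remembered, which under this reading would leave the carryover list empty.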
Reserve memory units for the carryover mechanism in
materialized_resources.

(cherry picked from commit 0f84fdb)
(cherry picked from commit 61a09b4)
If the carryover trim was able to release enough space, start the
normal trim in the background and return early. This unblocks the
hydration that reserved space and triggered the trim. We need to run
the normal trim anyway to avoid the corner case in which the carryover
list becomes empty and we have to block readers for the duration of
the full trim.

(cherry picked from commit 9f1b51b)
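
The control flow described in the last commit message can be sketched as below. This is an illustration only: the two callables stand in for the real trim routines, and the `background` vector is a hypothetical stand-in for however the real code schedules deferred work.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

enum class trim_outcome { returned_early, blocked_on_full_trim };

// Hypothetical sketch: `carryover_trim` returns the number of bytes it
// freed, `full_trim` performs the directory-walking trim, and
// `background` collects work to be run later without blocking the caller.
trim_outcome trim_for_reservation(
  uint64_t bytes_needed,
  const std::function<uint64_t(uint64_t)>& carryover_trim,
  const std::function<void()>& full_trim,
  std::vector<std::function<void()>>& background) {
    const uint64_t freed = carryover_trim(bytes_needed);
    if (freed >= bytes_needed) {
        // Enough space was released: defer the normal trim and unblock
        // the caller. The normal trim still runs, so the carryover list
        // is refreshed for the next reservation.
        background.push_back(full_trim);
        return trim_outcome::returned_early;
    }
    // Not enough space: the caller has to wait for the normal trim.
    full_trim();
    return trim_outcome::blocked_on_full_trim;
}
```

Deferring the normal trim rather than skipping it matches the stated goal: the next reservation should again find a non-empty carryover list.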
@vbotbuildovich vbotbuildovich requested a review from a team as a code owner April 29, 2024 10:39
@vbotbuildovich vbotbuildovich added this to the v24.1.x-next milestone Apr 29, 2024
@vbotbuildovich vbotbuildovich added the kind/backport PRs targeting a stable branch label Apr 29, 2024
sm::make_counter(
"carryover_trims",
[this] { return _carryover_trims; },
sm::description("Number of times we invoked carryover trim."))


Suggested change
sm::description("Number of times we invoked carryover trim."))
sm::description("Number of times carryover trim has been invoked."))

vlog(
cst_log.info,
"Failed to reserve {} units for the cache carryover "
"mechanism because tiered-storage is likely under memory "


Suggested change
"mechanism because tiered-storage is likely under memory "
"mechanism because Tiered Storage is likely under memory "

@@ -2359,6 +2359,18 @@ configuration::configuration()
// Enough for a >1TiB cache of 16MiB objects. Decrease this in case
// of issues with trim performance.
100000)
, cloud_storage_cache_trim_carryover_bytes(
*this,
"cloud_storage_cache_trim_carryover_bytes",


Nice!

@piyushredpanda
Contributor

#13841 and #15312 are the known failures

@piyushredpanda piyushredpanda merged commit f2e6efb into redpanda-data:v24.1.x May 3, 2024
14 of 18 checks passed
@piyushredpanda piyushredpanda modified the milestones: v24.1.x-next, v24.1.3 May 23, 2024
Labels
area/redpanda kind/backport PRs targeting a stable branch

4 participants