
[v23.3.x] cloud_storage: Implement "carryover" cache trimming mechanism #18138

Merged

Conversation

@Lazin Lazin commented Apr 29, 2024

Backport of #18056
Fixes #18133

The carryover is a collection of deletion candidates found during the previous trim. The trim collects the full list of objects in the directory and sorts them in LRU order, then deletes the first N objects. We save the first M objects that remain in the list after the trim; these objects are the deletion candidates.
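
A minimal sketch of how the candidates could be picked at the end of a full trim. The types and function names (`cache_file`, `trim_result`, `full_trim`) are illustrative assumptions, not the actual Redpanda cache internals:

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <string>
#include <vector>

// Illustrative stand-in for a cached object; not the real cache entry type.
struct cache_file {
    std::string path;
    std::chrono::system_clock::time_point atime; // last recorded access time
    size_t size_bytes;
};

struct trim_result {
    std::vector<cache_file> carryover; // deletion candidates for the next trim
};

trim_result full_trim(std::vector<cache_file> files, size_t n_to_delete, size_t carryover_limit) {
    // Sort the full directory listing in LRU order (least recently used first).
    std::sort(files.begin(), files.end(), [](const cache_file& a, const cache_file& b) {
        return a.atime < b.atime;
    });

    // Delete the first N objects (the actual file removal is elided here).
    size_t deleted = std::min(n_to_delete, files.size());

    // Save the first M objects that survive the trim; they become the
    // carryover list of deletion candidates for the next trim.
    trim_result result;
    for (size_t i = deleted; i < files.size() && result.carryover.size() < carryover_limit; ++i) {
        result.carryover.push_back(files[i]);
    }
    return result;
}
```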

During the next trim, the cache service first uses the carryover list to perform a quick cache trim. This trim doesn't need a directory scan and can quickly decrease the bytes/objects counts so that other readers can reserve space successfully. The trim doesn't delete objects from the carryover list blindly: it compares the access time recorded in the carryover list to the access time stored in the access time tracker. If the times differ, the object was accessed since the last trim and is not deleted during this phase.
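
A sketch of the fast carryover pass, again with hypothetical names; `lookup_atime` stands in for the cache's access time tracker and is an assumption, not the real interface:

```cpp
#include <chrono>
#include <cstddef>
#include <filesystem>
#include <functional>
#include <optional>
#include <string>
#include <system_error>
#include <vector>

// One saved deletion candidate from the previous trim (illustrative type).
struct carryover_entry {
    std::string path;
    std::chrono::system_clock::time_point atime; // access time at the previous trim
    size_t size_bytes;
};

using atime_lookup = std::function<std::optional<std::chrono::system_clock::time_point>(const std::string&)>;

// Returns the number of bytes released without scanning the cache directory.
size_t carryover_trim(const std::vector<carryover_entry>& candidates, const atime_lookup& lookup_atime) {
    size_t released = 0;
    for (const auto& c : candidates) {
        auto current = lookup_atime(c.path);
        // If the tracked access time differs from the one recorded in the
        // carryover list, the object was touched since the last trim and is
        // skipped during this fast phase.
        if (current.has_value() && *current != c.atime) {
            continue;
        }
        std::error_code ec;
        if (std::filesystem::remove(c.path, ec) && !ec) {
            released += c.size_bytes;
        }
    }
    return released;
}
```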

A new configuration option, cloud_storage_cache_trim_carryover, is added. It sets the limit on the size of the carryover list, which is stored on shard 0. The default value is 512. We store a file path for every object, so the list shouldn't be too big; even a relatively small carryover list can make a difference and prevent readers from being blocked.
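
Assuming the option is exposed like any other cluster configuration property, it can be inspected or adjusted with rpk (the value shown is illustrative):

```sh
rpk cluster config get cloud_storage_cache_trim_carryover
rpk cluster config set cloud_storage_cache_trim_carryover 1024
```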

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.1.x
  • v23.3.x
  • v23.2.x

Release Notes

Improvements

  • Improve cloud storage cache to prevent readers from being blocked during cache eviction.

michael-redpanda and others added 10 commits April 29, 2024 13:08
Signed-off-by: Michael Boquard <michael@redpanda.com>
(cherry picked from commit e088ac3)
Add new parameter that controls cache carryover behavior.

(cherry picked from commit 940fcd4)
The "carryover" behavior allows cache to use information from the
previous trim to quickly trim the cache without scanning the whole
directory. This allows cache to avoid blocking readers.

In a situation when the cache cntains very large number of files the
recursive directory walk could take few minutes. We're not allowing
number of objects stored in the cache to overshoot so all the readers
are blocked until the walk is finished.

This commit adds new "carryover" trim mechanism which is running before
the normal trim and uses information obtained during the previous fast
or full trim to delete some objects wihtout walking the directory tree.

(cherry picked from commit 5ece9d4)
Change the configuration parameter to treat the value as the number of
bytes that we can use to store carryover data.

(cherry picked from commit e1c30bc)
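
A sketch of how a byte-based limit like the one described in this commit could bound the carryover list; the entry type and the per-entry accounting are assumptions for illustration:

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct carryover_entry {
    std::string path;
    size_t size_bytes;
};

// Keep accepting candidates until storing their paths would exceed the
// configured carryover budget (bytes of carryover data, not of cached data).
std::vector<carryover_entry> take_carryover(const std::vector<carryover_entry>& remaining, size_t budget_bytes) {
    std::vector<carryover_entry> out;
    size_t used = 0;
    for (const auto& e : remaining) {
        size_t entry_bytes = e.path.size() + sizeof(carryover_entry);
        if (used + entry_bytes > budget_bytes) {
            break;
        }
        used += entry_bytes;
        out.push_back(e);
    }
    return out;
}
```
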
Reserve memory units for the carryover mechanism in
materialized_resources.

(cherry picked from commit 0f84fdb)
(cherry picked from commit 61a09b4)
If the carryover trim was able to release enough space, start the trim in
the background and return early. This unblocks the hydration that
reserved space and triggered the trim. We still need to run the normal
trim to avoid the corner case where the carryover list becomes empty and
we have to block readers for the duration of the full trim.

(cherry picked from commit 9f1b51b)
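
A rough sketch of the early-return flow described in the commit above. The placeholder passes and the use of std::thread are illustrative assumptions; the real code operates on the cache directory and is driven by Seastar futures:

```cpp
#include <cstddef>
#include <thread>

// Placeholder passes so the sketch is self-contained.
size_t carryover_pass() { return 0; }   // fast pass using the carryover list
void   full_trim_pass() {}              // full directory walk

// Illustrative control flow: if the fast carryover pass freed enough space,
// run the normal trim in the background and return early so the hydration
// that reserved space is unblocked; otherwise do the blocking full trim.
void trim(size_t bytes_needed) {
    size_t released = carryover_pass();
    if (released >= bytes_needed) {
        std::thread(full_trim_pass).detach();
        return;
    }
    full_trim_pass();
}
```
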
@Lazin Lazin force-pushed the backport-pr-18056-v23.3.x-62 branch from ada7062 to bc434c6 on April 29, 2024 13:08
@piyushredpanda piyushredpanda added this to the v23.3.14 milestone Apr 29, 2024
@piyushredpanda piyushredpanda merged commit 36f688c into redpanda-data:v23.3.x Apr 29, 2024
16 checks passed