disk usage based eviction: consider relative LRU order #5304
Comments
Perhaps no floats are needed, the EDIT: except that it will not be invariant to tenants having different numbers of layers, so we will still need to normalize it to f32.
Discussed this with Heikki. He suggested reviewing what kinds of algorithms exist in the literature.
Adds a new disk usage based eviction option, EvictionOrder, which selects whether to use the current `AbsoluteAccessed` or the newly proposed, not yet tested `RelativeAccessed`. Additionally, a fudge factor was noticed while implementing this, which might help spare smaller tenants at the expense of targeting larger tenants. Cc: #5304 Co-authored-by: Arpad Müller <arpad@neon.tech>
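A minimal sketch of how the two orders could differ, not the code from the PR. The `Candidate` struct, the `order_candidates` function, and the choice to score each layer as its rank within the tenant divided by that tenant's layer count are all assumptions made for illustration; only the names `EvictionOrder`, `AbsoluteAccessed` and `RelativeAccessed` come from the commit message. The fudge factor is left out.

```rust
use std::collections::HashMap;
use std::time::Duration;

// Variant names are from the commit message; everything else is hypothetical.
#[derive(Clone, Copy, Debug, PartialEq)]
enum EvictionOrder {
    AbsoluteAccessed,
    RelativeAccessed,
}

// Hypothetical eviction candidate: one layer of one tenant.
#[derive(Clone, Debug, PartialEq)]
struct Candidate {
    tenant: u32,
    // Time since the layer was last accessed; larger means colder.
    age: Duration,
}

// Returns the candidates in eviction order (first element is evicted first).
fn order_candidates(mut candidates: Vec<Candidate>, order: EvictionOrder) -> Vec<Candidate> {
    match order {
        EvictionOrder::AbsoluteAccessed => {
            // Globally coldest layer first, regardless of owner.
            candidates.sort_by(|a, b| b.age.cmp(&a.age));
            candidates
        }
        EvictionOrder::RelativeAccessed => {
            // Group layers per tenant.
            let mut per_tenant: HashMap<u32, Vec<Candidate>> = HashMap::new();
            for c in candidates {
                per_tenant.entry(c.tenant).or_default().push(c);
            }
            // Assumed scoring: rank within the tenant (coldest = 0),
            // normalized by the tenant's layer count to an f32 in 0.0..1.0.
            let mut scored: Vec<(f32, Candidate)> = Vec::new();
            for (_, mut layers) in per_tenant {
                layers.sort_by(|a, b| b.age.cmp(&a.age));
                let total = layers.len() as f32;
                for (rank, c) in layers.into_iter().enumerate() {
                    scored.push((rank as f32 / total, c));
                }
            }
            // Lowest score first; ties are broken by tenant id for determinism.
            scored.sort_by(|a, b| a.0.total_cmp(&b.0).then(a.1.tenant.cmp(&b.1.tenant)));
            scored.into_iter().map(|(_, c)| c).collect()
        }
    }
}
```

With an idle tenant holding two old layers and a busier tenant holding four newer ones, the absolute order in this sketch takes both of the idle tenant's layers first, while the relative order interleaves the two tenants.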
Next steps:
Discussion from the planning meeting:
I just failed to see this earlier on #6136. Layer counts are used as an abstraction, and each of the two tenants loses proportionally about the same number of layers. Sadly there is no difference between `relative_spare` and `relative_equal`, as both end up evicting the exact same number of layers, but I'll try to add another test for those later. Cc: #5304
…bsolute (#6384) When testing the new eviction order, the problem is that the (currently rare) disk usage based evictions are each unique; this PR adds a human readable summary of what the absolute order would have done and what the relative order does. The assumption is that this logging will make the few eviction runs in staging more useful. Cc: #5304 for allowing testing in staging
This has been tested a bit on staging. For the most part the results are encouraging via
It was interesting that in some cases absolute ordering did a somewhat better job by evicting only from a single fast growing tenant and not from any idle ones. I suspect this is because of imitation and the per timeline eviction task: in our staging there is very little activity and most tenants on
Full logs can be found via this search.
Refactor out test_disk_usage_eviction tenant creation and add a custom case with 4 tenants, 3 made with pgbench scale=1 and 1 made with pgbench scale=4. Because the tenants are created in order of scales [1, 1, 1, 4] this is simple enough to demonstrate the problem with using absolute access times, because on a disk usage based eviction run we will disproportionally target the *first* scale=1 tenant(s), and the later larger tenant does not lose anything. This test is not enough to show the difference between `relative_equal` and `relative_spare` (the fudge factor); much larger scale will be needed for "the large tenant", but that will make debug mode tests slower. Cc: #5304
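The effect the test targets can be illustrated with a toy model; this is a sketch under stated assumptions, not the test itself. It assumes layer counts proportional to the pgbench scale (for example `[4, 4, 4, 16]` for scales `[1, 1, 1, 4]`), assumes every layer of an earlier-created tenant is older than every layer of a later one, and assumes the relative order scores a layer as its rank within the tenant divided by the tenant's layer count. The function names `evicted_absolute` and `evicted_relative` are hypothetical.

```rust
// Evicts the `to_evict` globally oldest layers. Because tenants are assumed
// to be created in order, this drains tenants front to back.
// Returns the number of layers each tenant loses.
fn evicted_absolute(layer_counts: &[usize], to_evict: usize) -> Vec<usize> {
    let mut remaining = to_evict;
    layer_counts
        .iter()
        .map(|&count| {
            let taken = count.min(remaining);
            remaining -= taken;
            taken
        })
        .collect()
}

// Evicts the `to_evict` layers with the lowest relative score, where the
// (assumed) score of a layer is rank / layer_count within its own tenant.
// Returns the number of layers each tenant loses.
fn evicted_relative(layer_counts: &[usize], to_evict: usize) -> Vec<usize> {
    let mut scored: Vec<(f32, usize)> = Vec::new();
    for (tenant, &count) in layer_counts.iter().enumerate() {
        for rank in 0..count {
            scored.push((rank as f32 / count as f32, tenant));
        }
    }
    // Lowest score first; ties are broken by tenant index for determinism.
    scored.sort_by(|a, b| a.0.total_cmp(&b.0).then(a.1.cmp(&b.1)));

    let mut evicted = vec![0; layer_counts.len()];
    for &(_, tenant) in scored.iter().take(to_evict) {
        evicted[tenant] += 1;
    }
    evicted
}
```

In this model, evicting 7 layers from `[4, 4, 4, 16]` costs the first two tenants 4 and 3 layers under the absolute order and leaves the large tenant untouched, whereas the relative order spreads the loss in proportion to tenant size.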
The biggest problem was the 10min "layer collection", which happened together with layer deletions hanging. Only #6634 so far as a partial solution. I am using quotes around "layer collection" because it may have been the "layer collection" and the absolute order reporting. However, for this case the reporting was easy since there was just one tenant (above log message). Layer deletions hanging can be explained with
Consideration complete, didn't find anything else needing to be updated here.
Currently our disk usage based eviction evicts layers in the absolute order of recent accesses. This means that, for example, a single new timeline will first see all other timelines' layers get evicted before one of its own layers gets evicted.
What if, instead of using the absolute timestamps in the code referenced below, we came up with a 0..1 f32 based on how relatively recently the layer has been accessed?
"Relatively recently": for layer `x`, its `relative_recency` is `1.0 - (x.last_activity_ts.as_secs_f32() / oldest_layer_access.as_secs_f32())`, so that 1.0 would be for the most recently accessed layer and 0.0 for the oldest access.

This relative measure would put all timelines on an equal footing, and would allow handling the fast growing timeline, at worst giving it some slower performance because of trashed layers but for the overall health of.
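The `relative_recency` formula above can be sketched roughly as follows. This is an illustration only: it assumes both values in the formula are "time since last access" durations and that `oldest_layer_access` is the largest such duration within one timeline. The helper names `relative_recency` and `relative_recencies` are hypothetical and do not exist in `disk_usage_eviction_task.rs`.

```rust
use std::time::Duration;

// Maps a layer's time since last access to a score in 0.0..=1.0, where 1.0
// is the most recently accessed layer and 0.0 is the oldest access.
// Assumption: `oldest_access` is the largest age among the timeline's layers.
fn relative_recency(since_last_access: Duration, oldest_access: Duration) -> f32 {
    let oldest = oldest_access.as_secs_f32();
    if oldest <= 0.0 {
        // Every layer was accessed "just now"; treat them all as hot.
        return 1.0;
    }
    (1.0 - since_last_access.as_secs_f32() / oldest).clamp(0.0, 1.0)
}

// Scores all layers of one timeline, given each layer's time since last access.
fn relative_recencies(ages: &[Duration]) -> Vec<f32> {
    let oldest = ages.iter().copied().max().unwrap_or(Duration::ZERO);
    ages.iter().map(|&age| relative_recency(age, oldest)).collect()
}
```

Because each timeline is normalized against its own oldest access, a timeline created minutes ago and one created months ago both span the same 0..1 range, which is what puts them on an equal footing when candidates are compared across timelines.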
Relevant parts:
neon/pageserver/src/disk_usage_eviction_task.rs
Lines 570 to 571 in ffd146c
neon/pageserver/src/disk_usage_eviction_task.rs
Lines 592 to 593 in ffd146c