pageserver: limit total ephemeral layer bytes #7218

jcsp · 2024-03-23T19:13:30Z

Problem

Follows: #7182

With enough concurrent writes, the combined size of the indices held by all the InMemoryLayer instances could OOM a pageserver.
checkpoint_period was only enforced while writes were arriving, so it was never applied to a timeline that had gone idle.

Closes: #6916

Summary of changes

Add ephemeral_bytes_per_memory_kb config property. This controls the ratio of ephemeral layer capacity to memory capacity. The unusual unit (bytes of ephemeral layer per KB of memory) makes it possible to express a ratio below 1:1: set this property to 1024 to use 1MB of ephemeral layers for every 1MB of RAM, or set it lower to use a fraction of that.
Implement background layer rolling checks in Timeline::compaction_iteration -- this ensures we apply layer rolling policy in the absence of writes.
During background checks, if the total ephemeral layer size has exceeded the limit, then roll layers whose size is greater than the mean size of all ephemeral layers.
Remove the tick() path from walreceiver: it isn't needed any more now that we do equivalent checks from compaction_iteration.
Add tests for the above.
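
As an illustration of the ratio and the "larger than the mean" rolling rule described above, here is a minimal sketch. It is not the code from this PR: the function names `ephemeral_budget_bytes` and `layers_to_roll`, and the representation of layers as a slice of sizes, are assumptions made for the example.

```rust
/// Hypothetical helper (not from the PR): total ephemeral-layer budget in bytes,
/// given the machine's memory and the `ephemeral_bytes_per_memory_kb` setting.
fn ephemeral_budget_bytes(total_memory_bytes: u64, ephemeral_bytes_per_memory_kb: u64) -> u64 {
    // The setting is "bytes of ephemeral layer per KB of memory",
    // so 1024 means a 1:1 ratio and smaller values mean a fraction.
    (total_memory_bytes / 1024).saturating_mul(ephemeral_bytes_per_memory_kb)
}

/// Hypothetical helper (not from the PR): indices of the layers that would be
/// rolled during a background check. Nothing is rolled while the total is
/// within budget; otherwise only layers larger than the mean size are chosen.
fn layers_to_roll(layer_sizes: &[u64], budget_bytes: u64) -> Vec<usize> {
    if layer_sizes.is_empty() {
        return Vec::new();
    }
    let total: u64 = layer_sizes.iter().sum();
    if total <= budget_bytes {
        return Vec::new();
    }
    // Integer division is enough here: for integer sizes,
    // `size > floor(mean)` is equivalent to `size > mean`.
    let mean = total / layer_sizes.len() as u64;
    let mut selected = Vec::new();
    for (index, &size) in layer_sizes.iter().enumerate() {
        if size > mean {
            selected.push(index);
        }
    }
    selected
}

fn main() {
    // 1 MiB of RAM with the property set to 512 gives a 1:2 ratio.
    let budget = ephemeral_budget_bytes(1024 * 1024, 512);
    println!("budget: {} bytes", budget);

    // Three layers totalling 120 bytes against a 100-byte budget.
    let to_roll = layers_to_roll(&[10, 20, 90], 100);
    println!("layers to roll: {:?}", to_roll);
}
```

Rolling only the above-average layers frees the most memory per freeze while leaving small layers, which would produce tiny files on disk, to keep accumulating writes.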

Checklist before requesting a review

I have performed a self-review of my code.
If it is a core feature, I have added thorough tests.
Do we need to implement analytics? If so, did you add the relevant metrics to the dashboard?
If this PR requires public announcement, mark it with the /release-notes label and add several sentences in this section.

Checklist before merging

Do not forget to reformat commit message to not include the above checklist

github-actions · 2024-03-23T19:54:51Z

2730 tests run: 2590 passed, 0 failed, 140 skipped (full report)

Flaky tests (1)

Postgres 16

test_deletion_queue_recovery[no-validate-lose]: debug

Code coverage* (full report)

functions: 28.2% (6299 of 22348 functions)
lines: 47.0% (44254 of 94189 lines)

* collected from Rust tests only

The comment gets automatically updated with the latest test results.
Last update: 23ca0b3 at 2024-03-26T15:56:44.437Z

pageserver/src/config.rs

pageserver/src/tenant/timeline.rs

Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>

…7594)

## Problem

In testing of the earlier fix for OOMs under heavy write load (#7218), we saw that the limit on ephemeral layer size wasn't being reliably enforced. That was diagnosed as being due to overwhelmed compaction loops: most tenants were waiting on the semaphore for background tasks, and thereby not running the function that proactively rolls layers frequently enough.

Related: #6939

## Summary of changes

- Create a new per-tenant background loop for "ingest housekeeping", which invokes maybe_freeze_ephemeral_layer() without taking the background task semaphore.
- Downgrade to DEBUG a log line in maybe_freeze_ephemeral_layer that had been INFO, but turns out to be pretty common in the field.

There's some discussion on the issue (#6939 (comment)) about alternatives for calling maybe_freeze_ephemeral_layer periodically without it getting stuck behind compaction. A whole task just for this feels like kind of a big hammer, but we may in future find that there are other pieces of lightweight housekeeping that we want to do here too.

Why is it okay to call maybe_freeze_ephemeral_layer outside of the background tasks semaphore?

- This is the same work we would do anyway if we receive writes from the safekeeper, just done a bit sooner.
- The period of the new task is generously jittered (+/- 5%), so when the ephemeral layer size tips over the threshold, we shouldn't see an excessively aggressive thundering herd of layer freezes (and only layers larger than the mean layer size will be frozen).
- All that said, this is an imperfect approach that relies on having a generous amount of RAM to dip into when we need to freeze somewhat urgently. It would be nice in future to also block compaction/GC when we recognize resource stress and need to do other work (like layer freezing) to reduce memory footprint.
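
The +/- 5% jitter mentioned in this commit message can be sketched as follows. This is an assumption-laden illustration, not the pageserver's implementation: `jittered_period` is a hypothetical name, and the random input is passed in as a plain `f64` in the range 0.0 to 1.0 so that the example needs no external crate.

```rust
use std::time::Duration;

/// Hypothetical helper (not from the pageserver code): scale `base` by a factor
/// in [1 - jitter_fraction, 1 + jitter_fraction]. `unit_random` is expected to
/// be a uniformly distributed value in [0.0, 1.0] supplied by the caller.
fn jittered_period(base: Duration, jitter_fraction: f64, unit_random: f64) -> Duration {
    // Map [0, 1] onto [-jitter_fraction, +jitter_fraction].
    let offset = (unit_random.clamp(0.0, 1.0) * 2.0 - 1.0) * jitter_fraction;
    base.mul_f64(1.0 + offset)
}

fn main() {
    let base = Duration::from_secs(100);
    // The extremes and midpoint of the random input show the range of periods.
    for unit_random in [0.0, 0.5, 1.0] {
        let period = jittered_period(base, 0.05, unit_random);
        println!("unit_random = {unit_random}: period = {period:?}");
    }
}
```

Because each tenant's loop draws its own period, loops that start together drift apart over time instead of all checking, and possibly freezing layers, at the same instant.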

jcsp added t/bug Issue Type: Bug c/storage/pageserver Component: storage: pageserver labels Mar 23, 2024

jcsp changed the title ~~Jcsp/issue 6916 pt2~~ pageserver: limit total ephemeral layer bytes Mar 23, 2024

jcsp added 7 commits March 25, 2024 14:29

Implement background layer rolls

b1a7d9f

tests: add test_idle_checkpoints

675a9b7

pageserver: add ephemeral_bytes_per_memory_bytes config

1cf6ae2

Limit total ephemeral layer size

46816ce

tests: add test_total_size_limit

b478994

Rename test

61230af

pageserver: remove ticking from walreceiver

04f20d8

jcsp force-pushed the jcsp/issue-6916-pt2 branch from d49ccd1 to 04f20d8 Compare March 25, 2024 16:31

jcsp marked this pull request as ready for review March 26, 2024 09:27

jcsp requested a review from a team as a code owner March 26, 2024 09:27

jcsp requested review from arpad-m and VladLazar March 26, 2024 09:27

arpad-m reviewed Mar 26, 2024

View reviewed changes

pageserver/src/config.rs Outdated Show resolved Hide resolved

VladLazar approved these changes Mar 26, 2024

View reviewed changes

pageserver/src/tenant/timeline.rs Show resolved Hide resolved

Update pageserver/src/config.rs

23ca0b3

Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>

jcsp merged commit 47d2b3a into main Mar 26, 2024
46 of 50 checks passed

jcsp deleted the jcsp/issue-6916-pt2 branch March 26, 2024 15:45

jcsp mentioned this pull request Mar 26, 2024

OOMs in staging triggered by production-like pageserver workload benchmark #6939

Closed

jcsp mentioned this pull request May 2, 2024

pageserver: call maybe_freeze_ephemeral_layer from a dedicated task #7594

Merged

5 tasks
