refactor: remove eviction batching #6060

koivunej · 2023-12-07T10:13:19Z

We no longer have layer_removal_cs since #5108, so we no longer need batching.

pageserver/src/disk_usage_eviction_task.rs

github-actions · 2023-12-07T10:52:03Z

2148 tests run: 2064 passed, 0 failed, 84 skipped (full report)

Flaky tests (1)

Postgres 14

test_pageserver_restarts_under_worload: debug

Code coverage (full report)

functions: 54.8% (9352 of 17067 functions)
lines: 82.0% (54280 of 66178 lines)

The comment gets automatically updated with the latest test results.
0b4109b at 2023-12-11T18:17:04.756Z

pageserver/src/disk_usage_eviction_task.rs

problame

The disk-usage-based eviction batched by timeline.
IIRC one goal behind that was to only update the IndexPart once, instead of on each individual eviction.
I guess we already lost that property when we introduced the struct Layer?

Also, I feel like the per-timeline eviction code could also benefit from the slightly advanced joinset acrobatics you're doing in the disk-usage-based eviction. Not sure if it's easy to extract that into a common function.

If you can't extract it into a common function, please add a comment explaining the acrobatics. My understanding is that it's to limit the number of pending evict_and_wait tasks?
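A minimal sketch of the idea of limiting pending evictions, assuming that is indeed what the JoinSet handling is for. It is not the pageserver code: it swaps tokio's JoinSet for std threads and a channel so that it runs without dependencies, and `evict_bounded`, `EvictionOutcome` and the `evict` closure are hypothetical names.

```rust
use std::sync::mpsc;
use std::thread;

/// Outcome of one hypothetical eviction attempt.
#[derive(Debug, PartialEq, Eq)]
pub enum EvictionOutcome {
    /// The layer was evicted; carries the number of bytes freed.
    Evicted(u64),
    /// The eviction did not succeed.
    Failed,
}

/// Runs `evict` once per layer size, never keeping more than
/// `max_in_flight` attempts pending at the same time.
///
/// Returns the outcomes in completion order together with the peak
/// number of pending attempts seen by the coordinator.
pub fn evict_bounded<F>(
    layer_sizes: Vec<u64>,
    max_in_flight: usize,
    evict: F,
) -> (Vec<EvictionOutcome>, usize)
where
    F: Fn(u64) -> EvictionOutcome + Send + Clone + 'static,
{
    assert!(max_in_flight > 0, "need room for at least one pending eviction");

    let (tx, rx) = mpsc::channel();
    let mut outcomes = Vec::with_capacity(layer_sizes.len());
    let mut in_flight = 0usize;
    let mut peak = 0usize;

    for size in layer_sizes {
        // At the limit: wait for any one pending attempt to finish before
        // starting the next (the role `join_next` plays for a JoinSet).
        while in_flight >= max_in_flight {
            outcomes.push(rx.recv().expect("coordinator still holds a sender"));
            in_flight -= 1;
        }

        let tx = tx.clone();
        let evict = evict.clone();
        thread::spawn(move || {
            // The receiver outlives all workers here; ignore a send error anyway.
            let _ = tx.send(evict(size));
        });
        in_flight += 1;
        peak = peak.max(in_flight);
    }

    // Drain whatever is still pending.
    while in_flight > 0 {
        outcomes.push(rx.recv().expect("coordinator still holds a sender"));
        in_flight -= 1;
    }

    (outcomes, peak)
}
```

Calling `evict_bounded(vec![10, 20, 30], 2, |size| EvictionOutcome::Evicted(size))` yields three outcomes with a peak of at most two pending attempts. Note that this sketch blocks forever if a worker panics before sending its result.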

pageserver/src/disk_usage_eviction_task.rs

koivunej · 2023-12-11T13:07:38Z

> The disk-usage-based eviction batched by timeline.
> IIRC one goal behind that was to only update the IndexPart once, instead of on each individual eviction.
> I guess we already lost that property when we introduced the struct Layer?

No. The unlinking is already done at the end of compaction or GC; that was perhaps introduced in #5645.

> one goal behind that was to only update the IndexPart once, instead of on each individual eviction.

I did not remember that, but compaction still schedules 1-2 updates, even with the inverted l0=>l1 vs. image layer ordering going to prod soon (#5950), and GC schedules one. My motivation for this PR is the recently gained absence of layer_removal_cs.

One could say we should have at least one test asserting how many index part updates we do, just in case.

koivunej · 2023-12-11T16:14:41Z

One thing to note: Layer::evict_and_wait can hang if there is a bug (I am quite sure we've seen zero such bugs), but there should be a timeout anyway, like I envisioned in #4745. I'll add it in a follow-up; I only added bonus logging here... I could actually do that in the follow-up as well, so that this is almost purely a refactoring PR.

Failures on Postgres 14

test_peer_recovery: debug

This failure looks weird, but it is not caused by this PR.

…rics (#6131)

Because of bugs, evictions could hang and pause the disk usage eviction task. One such bug is known and fixed in #6928. Guard each layer eviction with a modest timeout, deeming timed-out evictions as failures, to be conservative.

In addition, add logging and metrics recording on each eviction iteration:

- log collection completed with duration and amount of layers
- per tenant collection time is observed in a new histogram
- per tenant layer count is observed in a new histogram
- record metric for collected, selected and evicted layer counts
- log if eviction takes more than 10s
- log eviction completion with eviction duration

Additionally, remove dead code for which no dead code warnings appeared in the earlier PR.

Follow-up to: #6060.
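A minimal sketch of guarding one eviction with a timeout and counting a timed-out attempt as a failure, as described above. It is an illustration under assumptions, not the code from #6131: it uses std threads and `recv_timeout` instead of an async timeout, and `evict_with_timeout`, `EvictionResult` and the `evict` closure are hypothetical names.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// Result of one hypothetical, timeout-guarded eviction attempt.
#[derive(Debug, PartialEq, Eq)]
pub enum EvictionResult {
    Evicted,
    Failed,
    TimedOut,
}

impl EvictionResult {
    /// To be conservative, everything except a confirmed eviction counts
    /// as a failure, including a timeout.
    pub fn is_failure(&self) -> bool {
        !matches!(self, EvictionResult::Evicted)
    }
}

/// Runs `evict` on a worker thread and waits at most `timeout` for it.
///
/// `evict` returns `true` when the layer was evicted. Unlike dropping a
/// future, the worker thread is not cancelled on timeout; it is only
/// no longer waited for.
pub fn evict_with_timeout<F>(evict: F, timeout: Duration) -> EvictionResult
where
    F: FnOnce() -> bool + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may already be gone after a timeout; ignore that.
        let _ = tx.send(evict());
    });

    match rx.recv_timeout(timeout) {
        Ok(true) => EvictionResult::Evicted,
        Ok(false) => EvictionResult::Failed,
        Err(RecvTimeoutError::Timeout) => EvictionResult::TimedOut,
        // The worker dropped its sender without a result, e.g. it panicked.
        Err(RecvTimeoutError::Disconnected) => EvictionResult::Failed,
    }
}
```

With this shape, the eviction task can keep iterating over the remaining layers after a hung attempt, and the caller can add `TimedOut` results to the same failure count it already tracks.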

koivunej requested a review from a team as a code owner December 7, 2023 10:13

koivunej requested review from problame and removed request for a team December 7, 2023 10:13

koivunej marked this pull request as draft December 7, 2023 10:13

koivunej commented Dec 7, 2023

pageserver/src/disk_usage_eviction_task.rs (outdated, resolved)

koivunej commented Dec 7, 2023

pageserver/src/disk_usage_eviction_task.rs (outdated, resolved)

problame removed their request for review December 7, 2023 12:26

koivunej added 5 commits December 8, 2023 12:37

refactor: remove Timeline::evict_batch

0826aea

add back witness, cleanup the http api

ea407e7

chore: remove unused hashmap import

a039617

chore: clippy

6f2810e

doc: fix bad link

0793a76

koivunej force-pushed the remove_eviction_batching branch from f8ea50a to 7769901 Compare December 8, 2023 12:43

refactor: stop using indices if not necessary

835ee7a

koivunej force-pushed the remove_eviction_batching branch from 7769901 to 835ee7a Compare December 8, 2023 13:42

koivunej marked this pull request as ready for review December 8, 2023 16:40

koivunej requested review from problame and jcsp December 8, 2023 16:41

jcsp reviewed Dec 11, 2023

pageserver/src/disk_usage_eviction_task.rs (outdated, resolved)

pageserver/src/disk_usage_eviction_task.rs (outdated, resolved)

pageserver/src/disk_usage_eviction_task.rs (outdated, resolved)

problame reviewed Dec 11, 2023

pageserver/src/disk_usage_eviction_task.rs (resolved)

pageserver/src/disk_usage_eviction_task.rs (outdated, resolved)

koivunej added 3 commits December 11, 2023 14:29

fix: opportunistically consume every round

0a8b882

refactor: rename join_all => evict_layers

d8690dd

doc: cleanup

0b4109b

koivunej force-pushed the remove_eviction_batching branch from 931511a to 0b4109b Compare December 11, 2023 16:49

koivunej changed the title ~~fix: remove eviction batching~~ refactor: remove eviction batching Dec 13, 2023

koivunej enabled auto-merge (squash) December 13, 2023 11:17

jcsp approved these changes Dec 13, 2023

koivunej merged commit a919b86 into main Dec 13, 2023
41 checks passed

koivunej deleted the remove_eviction_batching branch December 13, 2023 16:05

This was referenced Dec 13, 2023

dube: timeout individual layer evictions, log progress and record metrics #6131

Merged

Compaction and eviction are mutually exclusive #4745

Closed
