Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: remove eviction batching #6060

Merged
merged 9 commits into from
Dec 13, 2023
Merged

refactor: remove eviction batching #6060

merged 9 commits into from
Dec 13, 2023

Conversation

koivunej
Copy link
Member

@koivunej koivunej commented Dec 7, 2023

We no longer have layer_removal_cs since #5108, we no longer need batching.

@koivunej koivunej requested a review from a team as a code owner December 7, 2023 10:13
@koivunej koivunej requested review from problame and removed request for a team December 7, 2023 10:13
@koivunej koivunej marked this pull request as draft December 7, 2023 10:13
Copy link

github-actions bot commented Dec 7, 2023

2148 tests run: 2064 passed, 0 failed, 84 skipped (full report)


Flaky tests (1)

Postgres 14

  • test_pageserver_restarts_under_worload: debug

Code coverage (full report)

  • functions: 54.8% (9352 of 17067 functions)
  • lines: 82.0% (54280 of 66178 lines)

The comment gets automatically updated with the latest test results
0b4109b at 2023-12-11T18:17:04.756Z :recycle:

@problame problame removed their request for review December 7, 2023 12:26
@koivunej koivunej marked this pull request as ready for review December 8, 2023 16:40
pageserver/src/disk_usage_eviction_task.rs Outdated Show resolved Hide resolved
pageserver/src/disk_usage_eviction_task.rs Outdated Show resolved Hide resolved
pageserver/src/disk_usage_eviction_task.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@problame problame left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The disk-usage-based eviction batched by timeline.
IIRC one goal behind that was to only update the IndexPart once, instead of on each individual eviction.
I guess we already lost that property when we introduced the struct Layer?

Also, I feel like the per-timeline eviction code could also benefit from the slightly advanced joinset acrobatics you're doing in the disk-usage-based eviction. Not sure if it's easy to extract that into a common function.

If you can't extract it into a common function, please add a comment explaining the acrobatics. My understanding is that it's to limit the number of pending evict_and_wait tasks?

pageserver/src/disk_usage_eviction_task.rs Show resolved Hide resolved
pageserver/src/disk_usage_eviction_task.rs Outdated Show resolved Hide resolved
@koivunej
Copy link
Member Author

koivunej commented Dec 11, 2023

The disk-usage-based eviction batched by timeline.
IIRC one goal behind that was to only update the IndexPart once, instead of on each individual eviction.
I guess we already lost that property when we introduced the struct Layer?

No. The unlinking has already done at the end of compaction or GC, it was introduced on #5645 perhaps.

one goal behind that was to only update the IndexPart once, instead of on each individual eviction.

I did not remember that, but compaction is still scheduling 1-2 updates even with the inverted l0=>l1 vs. image layer ordering going to prod soon (#5950) and gc schedules one. My motivation for this PR is recently gained absence of layer_removal_cs.

One could say we should have at least one test asserting how many index part updates we do, just in case.

@koivunej
Copy link
Member Author

koivunej commented Dec 11, 2023

Noted thing: Layer::evict_and_wait can hang by bug (I am quite sure we've seen zero of such bugs) but anyways there should be a timeout like I envisioned in #4745. I'll add it in a follow-up, added only bonus logging here.... Could actually that in the follow-up as well, so this is almost like a refactoring PR.

Failures on Postgres 14

  • test_peer_recovery: debug

This failure looks weird, but not caused by this PR.

@koivunej koivunej changed the title fix: remove eviction batching refactor: remove eviction batching Dec 13, 2023
@koivunej koivunej enabled auto-merge (squash) December 13, 2023 11:17
@koivunej koivunej merged commit a919b86 into main Dec 13, 2023
41 checks passed
@koivunej koivunej deleted the remove_eviction_batching branch December 13, 2023 16:05
koivunej added a commit that referenced this pull request Feb 29, 2024
…rics (#6131)

Because of bugs evictions could hang and pause disk usage eviction task.
One such bug is known and fixed #6928. Guard each layer eviction with a
modest timeout deeming timeouted evictions as failures, to be
conservative.

In addition, add logging and metrics recording on each eviction
iteration:
- log collection completed with duration and amount of layers
    - per tenant collection time is observed in a new histogram
    - per tenant layer count is observed in a new histogram
- record metric for collected, selected and evicted layer counts
- log if eviction takes more than 10s
- log eviction completion with eviction duration

Additionally remove dead code for which no dead code warnings appeared
in earlier PR.

Follow-up to: #6060.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants