Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(Layer): metric regression with too many canceled evictions #7363

Merged
merged 10 commits into from Apr 18, 2024

Conversation

koivunej
Copy link
Contributor

@koivunej koivunej commented Apr 11, 2024

#7030 introduced an annoying papercut, deeming a failure to acquire a strong reference to LayerInner from DownloadedLayer::drop as a canceled eviction. Most of the time, it wasn't that, but just timeline deletion or tenant detach with the layer not wanting to be deleted or evicted.

When a Layer is dropped as part of a normal shutdown, the Layer is dropped first, and the DownloadedLayer the second. Because of this, we cannot detect eviction being canceled from the DownloadedLayer::drop. We can detect it from LayerInner::drop, which this PR adds.

Test case is added which before had 1 started eviction, 2 canceled. Now it accurately finds 1 started, 1 canceled.

@koivunej koivunej requested a review from a team as a code owner April 11, 2024 19:05
@koivunej koivunej requested a review from arpad-m April 11, 2024 19:05
@koivunej koivunej enabled auto-merge (squash) April 11, 2024 19:05
Copy link

github-actions bot commented Apr 11, 2024

2766 tests run: 2648 passed, 0 failed, 118 skipped (full report)


Code coverage* (full report)

  • functions: 28.0% (6446 of 23012 functions)
  • lines: 46.7% (45355 of 97151 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
b837401 at 2024-04-18T15:35:23.297Z :recycle:

libs/utils/src/sync/heavier_once_cell.rs Show resolved Hide resolved
@koivunej koivunej disabled auto-merge April 16, 2024 08:48
@koivunej koivunej enabled auto-merge (squash) April 16, 2024 08:48
the api is not perfect, but assume that this will not be used for
anything except the `Drop` time inspection.
otherwise in the case where the value was uninitialized, we would had
lost the permit as shown by test case in the next commit. of course, it
is unnecessary to reinit the semaphore at all in this case.
this revealed a bug with missing InitPermit wrapping which I've fixed in
previous commit.
intention is clearer, and we avoid the unnecessary reinit of a new
semaphore.
@koivunej koivunej dismissed arpad-m’s stale review April 18, 2024 07:42

stale and not available right now

@koivunej koivunej requested a review from VladLazar April 18, 2024 12:15
pageserver/src/tenant/storage_layer/layer.rs Outdated Show resolved Hide resolved
@koivunej koivunej merged commit 3df67bf into main Apr 18, 2024
48 checks passed
@koivunej koivunej deleted the joonas/layer_gone_gone branch April 18, 2024 15:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants