fix(layer): remove the need to repair internal state #7030
Merged
koivunej force-pushed the `joonas/layer-cancellation-safety` branch from d95f733 to 42b37c4 on March 6, 2024 11:06
koivunej commented Mar 6, 2024
2706 tests run: 2576 passed, 0 failed, 130 skipped (full report)
Flaky tests (3)
Postgres 15
Code coverage* (full report)
* collected from Rust tests only
The comment gets automatically updated with the latest test results
d265126 at 2024-03-20T23:57:57.268Z
koivunej force-pushed the `joonas/layer-cancellation-safety` branch from 42b37c4 to 77ca39d on March 11, 2024 09:22
koivunej force-pushed the `joonas/layer-cancellation-safety` branch from 77ca39d to 91c9e69 on March 14, 2024 08:36
Rebased away the conflict, handled the |
This was referenced Mar 15, 2024
koivunej added a commit that referenced this pull request on Mar 15, 2024
Aiming for the design where `heavier_once_cell::OnceCell` is initialized by a future factory led to awkwardness with how `LayerInner::get_or_maybe_download` looks right now with the `loop`. The loop helps with two situations:

- an eviction has been scheduled but has not yet happened, and a read access should cancel the eviction
- a previous `LayerInner::get_or_maybe_download` that canceled a pending eviction was itself canceled, leaving the `heavier_once_cell::OnceCell` uninitialized but needing repair by the next `LayerInner::get_or_maybe_download`

By instead supporting detached initialization in `heavier_once_cell::OnceCell` via an `OnceCell::get_or_detached_init`, we can fix what the monolithic #7030 does:

- the spawned-off download task initializes the `heavier_once_cell::OnceCell` regardless of the download starter being canceled
- a canceled `LayerInner::get_or_maybe_download` no longer stops eviction but can win it if not canceled

Split off from #7030. Cc: #5331
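To illustrate what "detached initialization" buys here, the following is a minimal sketch using only the standard library. It is not the `heavier_once_cell` API: `DetachedCell`, `Slot`, `get_or_spawn_init`, and `demo` are hypothetical names, a thread stands in for the spawned download task, and a wait timeout stands in for the caller being canceled. The point it shows is that the initializer runs to completion even when the caller that started it gives up.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

// Hypothetical states of the cell; not taken from heavier_once_cell.
enum Slot<T> {
    Empty,
    Initializing,
    Ready(T),
}

// Hypothetical cell whose value is always set by a detached thread.
struct DetachedCell<T> {
    slot: Mutex<Slot<T>>,
    ready: Condvar,
}

impl<T: Clone + Send + 'static> DetachedCell<T> {
    fn new() -> Arc<Self> {
        Arc::new(Self {
            slot: Mutex::new(Slot::Empty),
            ready: Condvar::new(),
        })
    }

    // Starts `init` on a detached thread if nobody has started it yet, then
    // waits at most `patience` for the value. Returning `None` models a
    // canceled caller; the detached initializer still finishes its work.
    fn get_or_spawn_init<F>(self: &Arc<Self>, patience: Duration, init: F) -> Option<T>
    where
        F: FnOnce() -> T + Send + 'static,
    {
        let mut slot = self.slot.lock().unwrap();
        if matches!(*slot, Slot::Empty) {
            *slot = Slot::Initializing;
            let cell = Arc::clone(self);
            thread::spawn(move || {
                let value = init();
                *cell.slot.lock().unwrap() = Slot::Ready(value);
                cell.ready.notify_all();
            });
        }
        let (slot, _timed_out) = self
            .ready
            .wait_timeout_while(slot, patience, |s| !matches!(s, Slot::Ready(_)))
            .unwrap();
        let value = match &*slot {
            Slot::Ready(v) => Some(v.clone()),
            _ => None,
        };
        value
    }
}

// The first caller gives up immediately; the second caller still finds the
// value produced by the initializer that the first caller started.
fn demo() -> (Option<u32>, Option<u32>) {
    let cell = DetachedCell::new();
    let impatient = cell.get_or_spawn_init(Duration::from_millis(0), || {
        thread::sleep(Duration::from_millis(50));
        7
    });
    // This initializer is ignored because initialization is already underway.
    let patient = cell.get_or_spawn_init(Duration::from_secs(5), || 0);
    (impatient, patient)
}
```

In this sketch no later caller has to repair anything: the cell is either empty, being initialized by the detached thread, or ready.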
This was referenced Mar 15, 2024
koivunej force-pushed the `joonas/layer-cancellation-safety` branch 3 times, most recently from 16cdd9c to f2aec12 on March 19, 2024 10:56
This was referenced Mar 19, 2024
koivunej force-pushed the `joonas/layer-cancellation-safety` branch from f2aec12 to c1c92b0 on March 19, 2024 13:45
koivunej changed the base branch from `main` to `joonas/layer-always-init-on-download` on March 19, 2024 15:03
koivunej force-pushed the `joonas/layer-cancellation-safety` branch from c1c92b0 to 23a9aae on March 20, 2024 11:40
koivunej changed the title from "draft: layer enhancements" to "fix(layer): make internal state need no repairs" on Mar 20, 2024
koivunej changed the title from "fix(layer): make internal state need no repairs" to "feat(layer): make internal state need no repairs" on Mar 20, 2024
koivunej changed the base branch from `joonas/layer-always-init-on-download` to `joonas/heavier-once-cell-small-fix` on March 20, 2024 13:21
koivunej
commented
Mar 20, 2024
koivunej
commented
Mar 20, 2024
koivunej added a commit that referenced this pull request on Mar 20, 2024
Since #6115, with `get_value_reconstruct_data` and friends being used more often, we should not have needless INFO level span creation near hot paths. In our prod configuration, INFO spans are always created, but in practice very rarely is anything logged at INFO level underneath them.

`ResidentLayer::load_keys` is only used during compaction so it is not that hot, but this aligns the access paths and their span usage.

This PR changes the span level to debug to align with the others, and adds the layer name to the error, which was missing.

Split off from #7030.
At best, it was always the same as the internal state in `LayerInner::inner`. At worst, it added at least one possible bad state we should care about: what if we get to the drop with a strong `LayerInner` reference but `wanted_evicted` was false? Luckily, we don't seem to have ever hit this case.
It was added very early, but was never used.
Revert it while keeping the log message -- do not mention the metrics. Because there is now a re-check against the status being Evicted, this is no longer expected.
Introduced in #7175, and we focused on other things in the review.
koivunej force-pushed the `joonas/layer-cancellation-safety` branch from befd60e to 76fee8c on March 20, 2024 22:46
koivunej changed the title from "feat(layer): make internal state need no repair" to "fix(layer): make internal state need no repair" on Mar 20, 2024
koivunej changed the title from "fix(layer): make internal state need no repair" to "fix(layer): remove the need to repair internal state" on Mar 20, 2024
koivunej added a commit that referenced this pull request on Apr 18, 2024
#7030 introduced an annoying papercut, deeming a failure to acquire a strong reference to `LayerInner` from `DownloadedLayer::drop` as a canceled eviction. Most of the time, it wasn't that, but just timeline deletion or tenant detach with the layer not wanting to be deleted or evicted.

When a Layer is dropped as part of a normal shutdown, the `Layer` is dropped first and the `DownloadedLayer` second. Because of this, we cannot detect eviction being canceled from `DownloadedLayer::drop`. We can detect it from `LayerInner::drop`, which this PR adds.

A test case is added, which previously counted 1 started eviction and 2 canceled. Now it accurately finds 1 started and 1 canceled.
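The drop-order argument above can be sketched with plain `Arc`/`Weak` and `Drop`. This is a simplified model under stated assumptions, not the pageserver code: `Inner` stands in for `LayerInner`, `Downloaded` for `DownloadedLayer`, and `Metrics`, `eviction_pending`, `start_eviction`, and `shutdown_with_pending_eviction` are hypothetical names. It shows why the owner's `drop` is the place that can tell a canceled eviction apart from an ordinary shutdown.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::{Arc, Weak};

// Hypothetical counters standing in for eviction metrics.
#[derive(Default)]
struct Metrics {
    started: AtomicUsize,
    canceled: AtomicUsize,
    completed: AtomicUsize,
}

// Stand-in for the owning side (LayerInner in the PR).
struct Inner {
    metrics: Arc<Metrics>,
    eviction_pending: AtomicBool,
}

impl Inner {
    fn start_eviction(&self) {
        self.eviction_pending.store(true, Ordering::SeqCst);
        self.metrics.started.fetch_add(1, Ordering::SeqCst);
    }
}

impl Drop for Inner {
    fn drop(&mut self) {
        // Only the owner knows whether an eviction was started but never
        // finished, so the cancellation is counted here.
        if *self.eviction_pending.get_mut() {
            self.metrics.canceled.fetch_add(1, Ordering::SeqCst);
        }
    }
}

// Stand-in for the resident file handle (DownloadedLayer in the PR).
struct Downloaded {
    owner: Weak<Inner>,
}

impl Drop for Downloaded {
    fn drop(&mut self) {
        match self.owner.upgrade() {
            // Owner still alive: a pending eviction can complete now.
            Some(inner) => {
                if inner.eviction_pending.swap(false, Ordering::SeqCst) {
                    inner.metrics.completed.fetch_add(1, Ordering::SeqCst);
                }
            }
            // Owner already gone: this may be a plain shutdown, so nothing
            // is counted as canceled from here.
            None => {}
        }
    }
}

// Shutdown order from the commit message: owner first, resident handle second.
// Returns (started, canceled).
fn shutdown_with_pending_eviction() -> (usize, usize) {
    let metrics = Arc::new(Metrics::default());
    let inner = Arc::new(Inner {
        metrics: Arc::clone(&metrics),
        eviction_pending: AtomicBool::new(false),
    });
    let downloaded = Downloaded {
        owner: Arc::downgrade(&inner),
    };
    inner.start_eviction();
    drop(inner);
    drop(downloaded);
    (
        metrics.started.load(Ordering::SeqCst),
        metrics.canceled.load(Ordering::SeqCst),
    )
}
```

With this split, a failed `upgrade` in the handle's `drop` never inflates the canceled count, which mirrors the double counting the commit message describes fixing.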
Problem
The current implementation of `struct Layer` supports canceled read requests, but those will leave the internal state such that a following `Layer::keep_resident` call will need to repair the state. In pathological cases seen during generation numbers resetting in staging or with too many in-progress on-demand downloads, this repair activity will need to wait for the download to complete, which stalls disk usage-based eviction. Similar stalls have been observed in staging near disk-full situations, where downloads failed because the disk was full.

Fixes #6028 or the "layer is present on filesystem but not evictable" problems by:

1. not canceling pending evictions from a canceled `LayerInner::get_or_maybe_download`
2. always initializing `LayerInner::inner` from the download task

Not canceling evictions in case (1) above and always initializing in case (2) lead to plain `LayerInner::inner` always having the up-to-date information, which leads to the old `Layer::keep_resident` never having to wait for downloads to complete. Finally, `Layer::keep_resident` is replaced with `Layer::is_likely_resident`. These fix #7145.

Summary of changes
- use a `watch` internally rather than a `broadcast` to avoid hanging eviction while a download is ongoing
- simplify `Layer::keep_resident` to use just `self.0.inner.get()` as truth, as `Layer::is_likely_resident`
- remove the `LayerInner::wanted_evicted` boolean as no longer needed

Builds upon: #7185. Cc: #5331.
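As a rough illustration of why a residency check can become a plain read once the inner state is always up to date, here is a small sketch. It is an assumption-laden model, not the actual `Layer` code: `LayerSketch`, `Resident`, `finish_download`, and `evict` are hypothetical, and a `RwLock<Option<_>>` stands in for the real cell type.

```rust
use std::sync::RwLock;

// Hypothetical resident state; the real type holds much more.
struct Resident {
    file_size: u64,
}

// Hypothetical layer whose `inner` is the single source of truth.
struct LayerSketch {
    inner: RwLock<Option<Resident>>,
}

impl LayerSketch {
    fn new() -> Self {
        Self {
            inner: RwLock::new(None),
        }
    }

    // Called by the download task itself, so the state never lags behind
    // a caller that gave up waiting.
    fn finish_download(&self, file_size: u64) {
        *self.inner.write().unwrap() = Some(Resident { file_size });
    }

    // Eviction clears the state and reports how many bytes were freed.
    fn evict(&self) -> Option<u64> {
        self.inner.write().unwrap().take().map(|r| r.file_size)
    }

    // A cheap read with nothing to repair and no download to wait for.
    // "Likely" because the answer can change right after it is returned.
    fn is_likely_resident(&self) -> bool {
        self.inner.read().unwrap().is_some()
    }
}
```

A caller such as disk usage-based eviction could use a check like this to pick candidates without blocking, then treat a failed `evict` as a lost race rather than an error.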