Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

layer: evict downloaded-but-not-initialized #6028

Closed
Tracked by #5331
koivunej opened this issue Dec 4, 2023 · 4 comments · Fixed by #7030
Closed
Tracked by #5331

layer: evict downloaded-but-not-initialized #6028

koivunej opened this issue Dec 4, 2023 · 4 comments · Fixed by #7030
Assignees
Labels
c/storage/pageserver Component: storage: pageserver t/bug Issue Type: Bug

Comments

@koivunej
Copy link
Contributor

koivunej commented Dec 4, 2023

This issue is a catch-all for struct Layer situations where the internal state (LayerInner::inner) does not match filesystem (file present, not present).

"downloaded-but-not-initialized" happens when a call to LayerInner::get_or_maybe_download changes state such that an eviction is cancelled, but the call is cancelled before it reinitializes or downloads and initializes the internal state. Currently the workaround is to have Layer::keep_resident be async, to do initialization (repair) but not download.

As recently noticed, the Layer::keep_resident will need to wait for an ongoing download. This can mean 10min wait during an overloaded situation. Slow Layer::keep_resident slows down collecting resident layers (LayerManager::resident_layers) called from most importantly disk usage based eviction. This need to wait for a download is very much against original design goals, which only succeeded for Layer::evict_and_wait.

Solution alternatives

Keep only the fastest path spawn-less

Currently only downloading the layer involves spawning a new task. A spawn would make the make the "file is present locally" case initialization cancellation safe. Spawning however is difficult due to heavier_once_cell::OnceCell using a std::sync::Mutex internally, but also because we would have to do the stat call while holding the mutex.

Alternative to doing stat while holding the mutex is to have the spawn_blocking'd eviction wait for the semaphore with Handle::block_on, exposing it to wait on a download. The Handle::block_on wait on semaphore permit could be short-circuited by signalling when we've decided to download and not just reinitialize.

Additional state to track the last known filesystem status

During timeline layermap initialization, we know if the layer is evicted or present. When new layers are created locally, we know they are present because a blocking std::fs::rename from the temporary path succeeds. Seems obvious we could cache this knowledge, only later updated when eviction's std::fs::remove_file succeeds.

Caching the value will expose another problem I was hoping to avoid originally. If we end up in a state where a previous spawn_blocking eviction was "cancelled", we should fire up a new one from Layer::evict_and_wait. In the meantime the Layer::keep_resident should report status based on this cached value.

Race Layer::keep_resident to when download path is entered

Would a watch channel change the size of the Layer?


"downloaded-but-not-initialized" happens if there is a single cancelled request which ends up firing up the download, which is not cancellation safe. Could make it, or could make evictions work even if we don't know if we are initialized or not. Filesystem is the ground truth for this.

Originally posted by @koivunej in #6000 (comment)

@koivunej koivunej added t/bug Issue Type: Bug c/storage/pageserver Component: storage: pageserver labels Jan 29, 2024
@koivunej
Copy link
Contributor Author

Found a good abstraction for this, fixing it will not be difficult.

@jcsp
Copy link
Contributor

jcsp commented Feb 14, 2024

(noting for my benefit):

  • Currently LayerManager::resident_layers is an async function that calls keep_resident on all the layers, rather than just yielding the layers directly.
  • The purpose of that keep_resident is to handle a case where a layer might have been downloaded but then a future was cancelled during download, leaving the layer in the "downloaded but not initialized" state described in this ticket.
  • Getting the layer out of that bad state is important during eviction (which uses resident_layers()), in order to avoid leaking disk space.

@koivunej
Copy link
Contributor Author

koivunej commented Mar 4, 2024

Next steps:

  • just do this; it has just been deprioritized by urgent work

@koivunej
Copy link
Contributor Author

koivunej commented Mar 4, 2024

Other effects this bug causes:

  • Layer::keep_resident called by LayerMap::resident_layers will join any wait to download a layer
  • normally layer downloads are fast, but in an overloaded situation they are not because we are also waiting on concurrency limiter
  • on a broken by generation numbers resetting tenant for which remote layers are removed but listed on index_part.json, this causes very long collection times because there is an infinite loop retrying the initial logical size and compaction tries to access the layer contents, so there are many queued download attempts
    • struct Layer has an internal backoff mechanism which also plays a role

koivunej added a commit that referenced this issue Mar 21, 2024
## Problem

The current implementation of struct Layer supports canceled read
requests, but those will leave the internal state such that a following
`Layer::keep_resident` call will need to repair the state. In
pathological cases seen during generation numbers resetting in staging
or with too many in-progress on-demand downloads, this repair activity
will need to wait for the download to complete, which stalls disk
usage-based eviction. Similar stalls have been observed in staging near
disk-full situations, where downloads failed because the disk was full.

Fixes #6028 or the "layer is present on filesystem but not evictable"
problems by:
1. not canceling pending evictions by a canceled
`LayerInner::get_or_maybe_download`
2. completing post-download initialization of the `LayerInner::inner`
from the download task

Not canceling evictions above case (1) and always initializing (2) lead
to plain `LayerInner::inner` always having the up-to-date information,
which leads to the old `Layer::keep_resident` never having to wait for
downloads to complete. Finally, the `Layer::keep_resident` is replaced
with `Layer::is_likely_resident`. These fix #7145.

## Summary of changes

- add a new test showing that a canceled get_or_maybe_download should
not cancel the eviction
- switch to using a `watch` internally rather than a `broadcast` to
avoid hanging eviction while a download is ongoing
- doc changes for new semantics and cleanup
- fix `Layer::keep_resident` to use just `self.0.inner.get()` as truth
as `Layer::is_likely_resident`
- remove `LayerInner::wanted_evicted` boolean as no longer needed

Builds upon: #7185. Cc: #5331.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c/storage/pageserver Component: storage: pageserver t/bug Issue Type: Bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants