Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
fix(layer): remove the need to repair internal state (#7030)
## Problem The current implementation of struct Layer supports canceled read requests, but those will leave the internal state such that a following `Layer::keep_resident` call will need to repair the state. In pathological cases seen during generation numbers resetting in staging or with too many in-progress on-demand downloads, this repair activity will need to wait for the download to complete, which stalls disk usage-based eviction. Similar stalls have been observed in staging near disk-full situations, where downloads failed because the disk was full. Fixes #6028 or the "layer is present on filesystem but not evictable" problems by: 1. not canceling pending evictions by a canceled `LayerInner::get_or_maybe_download` 2. completing post-download initialization of the `LayerInner::inner` from the download task Not canceling evictions above case (1) and always initializing (2) lead to plain `LayerInner::inner` always having the up-to-date information, which leads to the old `Layer::keep_resident` never having to wait for downloads to complete. Finally, the `Layer::keep_resident` is replaced with `Layer::is_likely_resident`. These fix #7145. ## Summary of changes - add a new test showing that a canceled get_or_maybe_download should not cancel the eviction - switch to using a `watch` internally rather than a `broadcast` to avoid hanging eviction while a download is ongoing - doc changes for new semantics and cleanup - fix `Layer::keep_resident` to use just `self.0.inner.get()` as truth as `Layer::is_likely_resident` - remove `LayerInner::wanted_evicted` boolean as no longer needed Builds upon: #7185. Cc: #5331.
- Loading branch information
2206e14
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2786 tests run: 2641 passed, 2 failed, 143 skipped (full report)
Failures on Postgres 14
test_heavy_write_workload[neon_off-github-actions-selfhosted-10-5-5]
: releasetest_bulk_insert[neon-github-actions-selfhosted]
: releaseCode coverage* (full report)
functions
:28.4% (7181 of 25277 functions)
lines
:47.1% (44155 of 93761 lines)
* collected from Rust tests only
2206e14 at 2024-03-21T02:28:03.374Z :recycle: