Layer::keep_resident will wait for layer download blocking disk usage based eviction #7145

Closed
koivunej opened this issue Mar 15, 2024 · 0 comments · Fixed by #7030
Labels
c/storage/pageserver Component: storage: pageserver t/bug Issue Type: Bug

This issue tracks the 10-minute hang on #5331, first observed in staging. It is related to #6028, which tracks the need for Layer::keep_resident to repair the state.

A fix is in #7030: it uses a separate watch channel to signal the intent to begin a download, which cancels the semaphore wait that a Layer::keep_resident call started later would otherwise be doing.
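
To make the idea concrete, here is a minimal sketch of "a separate signal cancels the permit wait". It is not the code from #7030: it models the watch channel and the semaphore with a plain `Mutex` and `Condvar` from the standard library, and every name in it (`LayerGate`, `hold_permit`, `announce_download`, `wait_without_download`, `Outcome`) is hypothetical.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// What a keep_resident-like waiter observed (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Outcome {
    /// The permit was free, so the waiter took it.
    AcquiredPermit,
    /// A download was announced, so the waiter stopped waiting for the permit.
    DownloadStarted,
}

#[derive(Default)]
struct State {
    permit_held: bool,     // stand-in for the single semaphore permit
    download_intent: bool, // stand-in for the value carried by the watch channel
}

/// Hypothetical stand-in for the per-layer synchronization state.
#[derive(Default)]
struct LayerGate {
    state: Mutex<State>,
    changed: Condvar,
}

impl LayerGate {
    /// Take the permit, blocking until it is free.
    fn hold_permit(&self) {
        let mut s = self.state.lock().unwrap();
        while s.permit_held {
            s = self.changed.wait(s).unwrap();
        }
        s.permit_held = true;
    }

    /// Permit holder: signal the intent to begin a download and wake all waiters.
    fn announce_download(&self) {
        self.state.lock().unwrap().download_intent = true;
        self.changed.notify_all();
    }

    /// Waiter that must not block behind a download: it returns as soon as
    /// either the permit is free or a download has been announced.
    fn wait_without_download(&self) -> Outcome {
        let mut s = self.state.lock().unwrap();
        loop {
            if s.download_intent {
                return Outcome::DownloadStarted;
            }
            if !s.permit_held {
                s.permit_held = true;
                return Outcome::AcquiredPermit;
            }
            s = self.changed.wait(s).unwrap();
        }
    }
}

/// Scenario from the issue: the permit is already held when the waiter
/// starts, and the holder then decides to download.
fn waiter_sees_download() -> Outcome {
    let gate = Arc::new(LayerGate::default());
    gate.hold_permit();
    let waiter = {
        let gate = Arc::clone(&gate);
        thread::spawn(move || gate.wait_without_download())
    };
    gate.announce_download();
    waiter.join().unwrap()
}
```

In this model the waiter never sleeps for the duration of the download: once the intent is announced it returns `Outcome::DownloadStarted` instead of queueing for the permit. Without the `download_intent` check it would stay blocked until the permit holder finished or gave up, which is the behavior the issue describes.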

The need to wait for downloads caused a disk-full situation on 2024-03-15 00:32:00 on pageserver-0.eu-west-1.aws.neon.build, because Layer::keep_resident awaited a download that could not complete. The disk was already full and we currently do not preallocate space for downloads, so the download always failed, not at the beginning but somewhere along the way. A failed download, whether caused by a full disk or by a "not found" response (footnote 1), also makes Layer::keep_resident wait out the exponential backoff.

Footnotes

  1. In past staging incidents where generation numbers had rolled back, "not found" responses from S3 were retried. They should no longer be retried after (PR, which moved timeouts to remote_storage).

koivunej added the t/bug (Issue Type: Bug) and c/storage/pageserver (Component: storage: pageserver) labels on Mar 15, 2024
koivunej self-assigned this on Mar 15, 2024
koivunej linked a pull request on Mar 20, 2024 that will close this issue