Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce layer map locking on the read path #6833

Closed
VladLazar opened this issue Feb 20, 2024 · 3 comments
Closed

Reduce layer map locking on the read path #6833

VladLazar opened this issue Feb 20, 2024 · 3 comments
Assignees
Labels
c/storage/pageserver Component: storage: pageserver t/bug Issue Type: Bug triaged bugs that were already triaged

Comments

@VladLazar
Copy link
Contributor

Currently, the read path (Timeline::get and Timeline::get_vectored when it merges) hold the layer manager read lock while accumulating the reconstruct data (includes on-demand downloads and disk IO). The non-vectored read path grabs the lock here.

Locking for this long is not required. Instead, the read path could hold on to a storage_layer::Layer (introduced in #4938) and only hold the lock while querying the layer map. The fix for the vectored read path will be similar, but slightly more involved. The search fringe must be updated to hold storage_layer::Layer instead of layer descriptions and the in-memory handles will need some thought as well.

@VladLazar VladLazar added t/bug Issue Type: Bug c/storage/pageserver Component: storage: pageserver labels Feb 20, 2024
@jcsp jcsp added the triaged bugs that were already triaged label Feb 29, 2024
@koivunej
Copy link
Member

Just realized while looking at some logs (slack thread), that compaction now needs to wait for on-demand downloads. This is a regression introduced with the struct Layer work. Before that, we released the read lock for layer downloading and switching.

Since no one has noticed this in quite some time, I don't think this is critical, but we should get it eventually fixed, so a slight boost to this issue.

What needs to be done:

  • always drop the read lock before doing anything with the Layer

VladLazar added a commit that referenced this issue Mar 26, 2024
## Problem
We currently hold the layer map read lock while doing IO on the read
path. This is not required for correctness.

## Summary of changes
Drop the layer map lock after figuring out which layer we wish to read
from.
Why is this correct:
* `Layer` models the lifecycle of an on disk layer. In the event the
layer is removed from local disk, it will be on demand downloaded
* `InMemoryLayer` holds the `EphemeralFile` which wraps the on disk
file. As long as the `InMemoryLayer` is in scope, it's safe to read from it.

Related #6833
@VladLazar
Copy link
Contributor Author

Update:

  • Merged PR for the normal read path
  • PR for vectored read is ready to merge, but I'm holding it back for the next release

VladLazar added a commit that referenced this issue Apr 2, 2024
## Problem
The vectored read path holds the layer map lock while visiting a
timeline.

## Summary of changes
* Rework the fringe order to hold `Layer` on `Arc<InMemoryLayer>`
handles instead of descriptions that are resolved by the layer map at
the time of read. Note that previously `get_values_reconstruct_data` was
implemented for the layer description which already knew the lsn range
for the read. Now it is implemented on the new `ReadableLayer` handle
and needs to get the lsn range as an argument.
* Drop the layer map lock after updating the fringe.

Related #6833
@VladLazar
Copy link
Contributor Author

Both PRs merged. Releasing on the week of 2024-04-08

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c/storage/pageserver Component: storage: pageserver t/bug Issue Type: Bug triaged bugs that were already triaged
Projects
None yet
Development

No branches or pull requests

3 participants