pageserver: don't store tenant/timeline ID on each layer, synchronize with Timeline shutdown #5967
2148 tests run: 2064 passed, 0 failed, 84 skipped (full report)
Flaky tests (4): Postgres 16, Postgres 15, Postgres 14
Code coverage (full report)
The comment gets automatically updated with the latest test results
93e069c at 2023-12-11T18:50:38.500Z
We can build it on-demand via Timeline.
}

/// Use this instead of `local_path` if you already have a Timeline to hand.
pub(crate) fn build_local_path(
Who uses this from outside? Or why does it need to be exposed at all?
Used by:
- Timeline, constructing paths for layer files in e.g. create_delta_layer
- RemoteTimelineClient, constructing path to download to.
// We will only do I/O during drop if our Timeline's layer_gate is open: this avoids
// the risk that we would race with Timeline::shutdown and end up doing I/O to a timeline
// path for which the Timeline object has been torn down already.
let _gate_guard = match timeline.layer_gate.enter() {
    Ok(g) => g,
    Err(GateError::GateClosed) => return,
};
Now that I am looking at this, wouldn't it just be better to acquire a gate for all layers before creating them?
For the simplest reasoning about correctness, we could hold a gate around the lifetime of all layer objects.
However, that's relatively expensive (layers are very numerous) and isn't necessary: we already block timeline shutdown on all related tasks completing, so layers shouldn't be doing any I/O after shutdown completes. The only exception is here in drop(), where we need to check the gate in case some other code kept a Layer alive longer than it held its GateGuard (or task_mgr task lifetime).
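A minimal sketch of the drop-time gate check described above, using only the standard library. The `Gate`, `GateGuard`, `Timeline` and `Layer` types here are simplified stand-ins, not the pageserver's real ones: this `Gate` only records whether it has been closed, and the `deletions` counter is a hypothetical placeholder for the file I/O a real layer would do on drop.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::{Arc, Weak};

/// Simplified stand-in for a gate: hands out guards until it is closed.
/// (Assumption: unlike this sketch, a real gate would also wait for
/// outstanding guards when closing.)
#[derive(Default)]
struct Gate {
    closed: AtomicBool,
}

#[derive(Debug)]
enum GateError {
    GateClosed,
}

struct GateGuard;

impl Gate {
    fn enter(&self) -> Result<GateGuard, GateError> {
        if self.closed.load(Ordering::SeqCst) {
            Err(GateError::GateClosed)
        } else {
            Ok(GateGuard)
        }
    }

    fn close(&self) {
        self.closed.store(true, Ordering::SeqCst);
    }
}

#[derive(Default)]
struct Timeline {
    layer_gate: Gate,
    /// Hypothetical counter standing in for on-disk deletions.
    deletions: AtomicUsize,
}

struct Layer {
    timeline: Weak<Timeline>,
}

impl Drop for Layer {
    fn drop(&mut self) {
        // If the Timeline object is already gone, there is nothing safe to do.
        let Some(timeline) = self.timeline.upgrade() else {
            return;
        };
        // Only do "I/O" while the gate is open, i.e. before shutdown.
        let _gate_guard = match timeline.layer_gate.enter() {
            Ok(g) => g,
            Err(GateError::GateClosed) => return,
        };
        timeline.deletions.fetch_add(1, Ordering::SeqCst);
    }
}

fn main() {
    let timeline = Arc::new(Timeline::default());

    // Dropped while the gate is open: the cleanup runs.
    drop(Layer { timeline: Arc::downgrade(&timeline) });

    // Dropped after the gate is closed: the cleanup is skipped.
    timeline.layer_gate.close();
    drop(Layer { timeline: Arc::downgrade(&timeline) });

    println!("deletions = {}", timeline.deletions.load(Ordering::SeqCst));
}
```

The point of the design is that only `drop()` pays for the gate check; layers do not hold a guard for their whole lifetime.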
let local_path = self
    .local_path()
    .map_err(|_| EvictionCancelled::TimelineGone)?;
But now we would be upgrading the timeline all over? It must already be upgraded to get to here because otherwise we would be running without a span.
It's not upgraded frequently (the paths that need the local path aren't called super often), but yeah, I agree that it's a bit dangerous to make this implicit, since callers might naively use it for logging or something.
I've refactored this to take Timeline as a parameter: as you say, callers generally have a Timeline to hand already at the point they call this - 0f53319
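As a rough illustration of the refactor described above, the sketch below has the layer keep only its file name and take the timeline as an explicit parameter when a path is needed. All types, field names and the directory layout are hypothetical and chosen for the example; they are not taken from the pageserver code.

```rust
use std::path::PathBuf;

/// Hypothetical timeline holding the IDs that layers no longer store.
struct Timeline {
    base_dir: PathBuf,
    tenant_id: String,
    timeline_id: String,
}

/// Hypothetical layer that knows only its own file name.
struct Layer {
    file_name: String,
}

impl Layer {
    /// The caller passes the Timeline explicitly, so there is no hidden
    /// Weak::upgrade() and the dependency is visible at every call site.
    fn local_path(&self, timeline: &Timeline) -> PathBuf {
        timeline
            .base_dir
            .join("tenants")
            .join(&timeline.tenant_id)
            .join("timelines")
            .join(&timeline.timeline_id)
            .join(&self.file_name)
    }
}

fn main() {
    let timeline = Timeline {
        base_dir: PathBuf::from("/data"),
        tenant_id: "tenant-a".to_string(),
        timeline_id: "timeline-1".to_string(),
    };
    let layer = Layer {
        file_name: "layer-0001".to_string(),
    };
    println!("{}", layer.local_path(&timeline).display());
}
```

With this shape, a caller that only wants something to log cannot accidentally trigger a timeline upgrade, which is the concern raised in the review.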
Getting rid of the IDs in PersistentLayerDesc is a good change, but I dislike the ad hoc timeline upgrading which is now spread around layer.
@@ -1004,7 +1038,10 @@ impl LayerInner {
//
// FIXME: this is not true anymore, we can safely evict wanted deleted files.
} else if can_evict && evict {
let span = tracing::info_span!(parent: None, "layer_evict", tenant_id = %self.desc.tenant_shard_id.tenant_id, shard_id = %self.desc.tenant_shard_id.shard_slug(), timeline_id = %self.desc.timeline_id, layer=%self, %version);
// If timeline is alive, we can construct a span with IDs for this function.
let span = self.timeline.upgrade().map(|timeline| {
@koivunej at this point, we could instead return
if the timeline is gone. I didn't make that change because it wasn't obvious if that would violate any other expectations, but it's probably fine since once Timeline is destroyed we don't have any obligation to make progress with eviction -- your thoughts?
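The alternative raised here (bailing out of eviction as soon as the timeline cannot be upgraded) might look roughly like the sketch below. What exactly would be returned is not shown in the comment above, so the sketch uses a stand-in `EvictionCancelled::TimelineGone` error as an assumption; the types are simplified, and the returned string is only a placeholder for the span and the eviction work.

```rust
use std::sync::{Arc, Weak};

/// Simplified stand-in for the timeline that owns the IDs.
struct Timeline {
    timeline_id: String,
}

/// Stand-in error type (assumption: the real return value may differ).
#[derive(Debug, PartialEq)]
enum EvictionCancelled {
    TimelineGone,
}

struct LayerInner {
    timeline: Weak<Timeline>,
    name: String,
}

impl LayerInner {
    /// Returns a description of the span that would be created, or bails out
    /// early when the timeline has already been dropped.
    fn evict(&self) -> Result<String, EvictionCancelled> {
        // One explicit upgrade at the top, instead of optional spans later on.
        let timeline = self
            .timeline
            .upgrade()
            .ok_or(EvictionCancelled::TimelineGone)?;
        Ok(format!(
            "layer_evict timeline_id={} layer={}",
            timeline.timeline_id, self.name
        ))
    }
}

fn main() {
    let timeline = Arc::new(Timeline {
        timeline_id: "timeline-1".to_string(),
    });
    let layer = LayerInner {
        timeline: Arc::downgrade(&timeline),
        name: "layer-0001".to_string(),
    };
    println!("{:?}", layer.evict());

    // Once the last strong reference is gone, eviction is cancelled.
    drop(timeline);
    println!("{:?}", layer.evict());
}
```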
I think we should
#7082 fixed the important part of this (deletions happening after timeline shutdown).