New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pageserver: exclude gc_horizon from synthetic size calculation #6407
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
jcsp
added
t/bug
Issue Type: Bug
c/storage/pageserver
Component: storage: pageserver
labels
Jan 19, 2024
2706 tests run: 2582 passed, 0 failed, 124 skipped (full report)Flaky tests (2)Postgres 16Code coverage* (full report)
* collected from Rust tests only The comment gets automatically updated with the latest test results
2e938fa at 2024-03-15T13:44:38.948Z :recycle: |
arpad-m
approved these changes
Jan 19, 2024
koivunej
reviewed
Jan 25, 2024
jcsp
force-pushed
the
jcsp/synthetic-size-drops
branch
from
March 15, 2024 12:06
bce0c61
to
2e938fa
Compare
koivunej
reviewed
Mar 15, 2024
koivunej
approved these changes
Mar 15, 2024
5 tasks
jcsp
added a commit
that referenced
this pull request
Apr 23, 2024
## Problem We already made a change in #6407 to make pitr_interval authoritative for synthetic size calculations (do not charge users for data retained due to gc_horizon), but that change didn't cover the case where someone entirely disables time-based retention by setting pitr_interval=0 Relates to: #6374 ## Summary of changes When pitr_interval is zero, do not set `pitr_cutoff` based on gc_horizon. gc_horizon is still enforced, but separately (its value is passed separately, there was never a need to claim pitr_cutoff to gc_horizon) ## More detail ### Issue 1 Before this PR, we would skip the update_gc_info for timelines with last_record_lsn() < gc_horizon. Let's call such timelines "tiny". The rationale for that presumably was that we can't GC anything in the tiny timelines, why bother to call update_gc_info(). However, synthetic size calculation relies on up-to-date update_gc_info() data. Before this PR, tiny timelines would never get an updated GcInfo::pitr_horizon (it remained Lsn(0)). Even on projects with pitr_interval=0d. With this PR, update_gc_info is always called, hence GcInfo::pitr_horizon is always updated, thereby providing synthetic size calculation with up-to-data data. ### Issue 2 Before this PR, regardless of whether the timeline is "tiny" or not, GcInfo::pitr_horizon was clamped to at least last_record_lsn - gc_horizon, even if the pitr window in terms of LSN range was shorter (=less than) the gc_horizon. With this PR, that clamping is removed, so, for pitr_interval=0, the pitr_horizon = last_record_lsn.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Problem
See:
gc_horizon
, which retains data based on size, even though users only configure retention by time #6374Summary of changes
Whereas previously we calculated synthetic size from the gc_horizon or the pitr_interval (whichever is the lower LSN), now we ignore gc_horizon and exclusively start from the
pitr_interval
. This is a more generous calculation for billing, where we do not charge users for data retained due to gc_horizon.Checklist before requesting a review
Checklist before merging