Propagate apply_lsn from SK to PS to prevent GC from collecting objects which may be still requested by replica #7368
3090 tests run: 2963 passed, 0 failed, 127 skipped (at f02aeb5, 2024-05-21T12:28:39.906Z).
The Compute-related changes are OK after fixing the style issue.
This PR does not pass tests because, on its own, it is not enough to fix the problem of accessing a too-old page version that GC has already collected. #6718 should be committed first.
Generally LGTM, thanks, but a few minor remarks.
Removing my review request, since it's unclear to me what the state of this PR is. Please re-request review if needed.
Generally LGTM. However, the new test still fails, which is interesting. Note that it was nontrivial to get to the source of the problem because nested errors were masking each other: 1) first So the root cause is what this PR tried to prevent:
To avoid the pageserver GC'ing data needed by a standby, propagate the standby apply LSN through the standby -> safekeeper -> broker -> pageserver flow and hold off GC for it. Each GC iteration resets the value, so the horizon is dropped when the standby goes away; pushes are assumed to happen at least once between GC iterations. As a safety guard, the maximum allowed lag relative to the normal GC horizon is hardcoded to 10GB. Add a test for the feature. Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
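For readers following the GC change, here is a minimal sketch of how a standby horizon could limit the GC cutoff, assuming LSNs are plain `u64` byte positions. The names `StandbyHorizon`, `take_for_gc`, `effective_gc_cutoff` and `MAX_ALLOWED_STANDBY_LAG` are hypothetical and are not taken from the PR; only the 10GB cap and the reset-per-iteration behaviour come from the commit message.

```rust
// Hypothetical names throughout; this is an illustration, not the PR's code.
const GB: u64 = 1 << 30;

/// Safety guard from the commit message: a standby may hold GC back by at most 10GB.
const MAX_ALLOWED_STANDBY_LAG: u64 = 10 * GB;

/// Latest standby apply LSN pushed via safekeeper and broker, if any.
struct StandbyHorizon {
    lsn: Option<u64>,
}

impl StandbyHorizon {
    /// Called whenever a new apply LSN arrives from the broker.
    fn push(&mut self, lsn: u64) {
        self.lsn = Some(lsn);
    }

    /// Called once per GC iteration. Taking the value resets it, so a standby
    /// that has gone away stops holding GC back after one iteration.
    fn take_for_gc(&mut self) -> Option<u64> {
        self.lsn.take()
    }
}

/// Returns the LSN below which GC may remove data.
fn effective_gc_cutoff(normal_cutoff: u64, standby_apply_lsn: Option<u64>) -> u64 {
    match standby_apply_lsn {
        // No standby reported: use the normal GC horizon.
        None => normal_cutoff,
        Some(lsn) => {
            // Never let the standby push the cutoff more than the cap behind normal.
            let floor = normal_cutoff.saturating_sub(MAX_ALLOWED_STANDBY_LAG);
            // Never move the cutoff ahead of the normal horizon either.
            lsn.max(floor).min(normal_cutoff)
        }
    }
}
```

With a normal cutoff at 100GB, a standby at 95GB holds the cutoff at 95GB, while a standby at 50GB is limited by the cap and the cutoff becomes 90GB.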
Hot standby feedback xmins can be greater than next_xid because the pageserver updates nextXid sparsely (to do fewer writes, it advances next_xid in steps of 1024). ProcessStandbyHSFeedback ignores such xids from the future; to fix this, clamp the received xmin so it does not exceed next_xid. Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
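The clamping described above could look roughly like the following sketch. It assumes 32-bit xids compared modulo 2^32, in the style of Postgres's `TransactionIdPrecedes` for normal xids, and it ignores the special xids below 3. The function names `xid_precedes` and `clamp_xmin` are hypothetical and are not the PR's code.

```rust
/// Wraparound-aware "a is older than b" for normal 32-bit xids.
/// Assumption: special xids (0, 1, 2) are not handled here.
fn xid_precedes(a: u32, b: u32) -> bool {
    (a.wrapping_sub(b) as i32) < 0
}

/// If the standby reports an xmin that lies ahead of next_xid,
/// use next_xid instead so the feedback is not ignored as "from the future".
fn clamp_xmin(xmin: u32, next_xid: u32) -> u32 {
    if xid_precedes(next_xid, xmin) {
        next_xid
    } else {
        xmin
    }
}
```

For example, with next_xid at 1500, a reported xmin of 2000 is reduced to 1500, while a reported xmin of 1000 is passed through unchanged.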
Refers to #6211 and #6357.