fix: find gc cutoff points without holding Tenant::gc_cs #7585
2880 tests run: 2759 passed, 0 failed, 121 skipped (full report)
Code coverage* (full report)
* collected from Rust tests only
The comment gets automatically updated with the latest test results
90ece17 at 2024-05-03T12:15:53.035Z
3c979ad to 7d161ca
`refresh_gc_info_internal`: I don't like that we now have a `timelines` variable that only contains the timelines for which we produced `GcInfo`, but later also use `self.timelines`. I think it's correct, but it's a super easy footgun down the line, because the two can be so easily confused. Maybe newtype magic could help.
But this is not a hard ask. A comment warning future developers about this would be nice, though.
The only thing I'm insistent on is to factor out the question-mark business, as asked for in #7585 (comment).
Approving on that condition.
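For illustration, the newtype idea could look roughly like the sketch below. Every name here (`TimelineId`, `GcInfo`, `TimelinesWithGcInfo`, `from_results`) is a hypothetical stand-in, not the pageserver's actual type or API. The point is only that a dedicated wrapper type cannot be confused with the full `self.timelines` map.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for illustration only; not the real pageserver types.
type TimelineId = u32;

#[derive(Debug, PartialEq)]
struct GcInfo {
    cutoff: u64,
}

/// Newtype over the subset of timelines for which a `GcInfo` was produced.
/// Because the inner map is private to this wrapper, code that wants "all
/// timelines" cannot accidentally be handed this subset instead.
struct TimelinesWithGcInfo(HashMap<TimelineId, GcInfo>);

impl TimelinesWithGcInfo {
    /// Keep only the timelines whose lookup succeeded; failures are dropped.
    fn from_results(results: Vec<(TimelineId, Result<GcInfo, String>)>) -> Self {
        let map = results
            .into_iter()
            .filter_map(|(id, res)| res.ok().map(|info| (id, info)))
            .collect();
        TimelinesWithGcInfo(map)
    }

    /// Look up the `GcInfo` for one timeline, if it was produced.
    fn gc_info(&self, id: TimelineId) -> Option<&GcInfo> {
        self.0.get(&id)
    }

    /// Number of timelines that have a `GcInfo`.
    fn len(&self) -> usize {
        self.0.len()
    }
}
```

Whether the extra type is worth it compared to a warning comment is a judgment call; the wrapper mostly pays off if the subset is passed through several functions.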
7d161ca to c4183eb
they are most likely just shutting down errors, and we have no types for such.
there is no reason to keep holding on to it for longer.
d923e01 to 873f4ab
#7585 introduced a test case for deletions while synthetic size is being calculated. The test races against the deletion, but we only accept one outcome. Fix it to accept 404 as well, as we cannot control from outside which outcome happens.
Evidence: https://neon-github-public-dev.s3.amazonaws.com/reports/pr-7456/8970595458/index.html#/testresult/32a5b2f8c4094bdb
Finding the timeline GC cutoff Lsn(s) is currently done while holding `Tenant::gc_cs`. In recent incidents, long branch creation times were caused by holding `Tenant::gc_cs` over an extremely long `Timeline::find_lsn_by_timestamp`. The fix is to find the GC cutoff values before taking the `Tenant::gc_cs` lock. This change is safe because the GC cutoff values and the branch points have no dependencies on each other. With this change, even if `Timeline::find_gc_cutoff` takes a long time, we should no longer see `Tenant::gc_cs` interfering with branch creation.

Additionally, `Tenant::refresh_gc_info` is now tolerant of timeline deletions (or any other failures to find the pitr_cutoff). This helps the synthetic size calculation complete consistently instead of being interrupted by a timeline deletion that happens to coincide with it.

Fixes: #7560
Fixes: #7587
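The shape of the change can be sketched as below. This is a simplified, synchronous illustration under stated assumptions, not the actual pageserver code: the real functions are async, and the `Tenant` struct, the `cutoffs` map, the `TimelineId`/`Lsn` aliases, and the body of `find_gc_cutoff` are all hypothetical stand-ins. Only the names `gc_cs`, `find_gc_cutoff`, and `refresh_gc_info` are borrowed from the description.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

// Hypothetical stand-ins for illustration only.
type TimelineId = u32;
type Lsn = u64;

struct Tenant {
    /// Plays the role of `Tenant::gc_cs`: also taken by branch creation.
    gc_cs: Mutex<()>,
    /// Last recorded GC cutoff per timeline (assumed bookkeeping).
    cutoffs: Mutex<HashMap<TimelineId, Lsn>>,
}

/// Stand-in for the potentially slow cutoff search. In this sketch,
/// timeline 0 pretends to have been deleted concurrently.
fn find_gc_cutoff(timeline: TimelineId) -> Result<Lsn, String> {
    if timeline == 0 {
        Err("timeline deleted".to_string())
    } else {
        Ok(u64::from(timeline) * 100)
    }
}

impl Tenant {
    fn new() -> Self {
        Tenant {
            gc_cs: Mutex::new(()),
            cutoffs: Mutex::new(HashMap::new()),
        }
    }

    /// Returns how many timelines had their cutoff refreshed.
    fn refresh_gc_info(&self, timelines: &[TimelineId]) -> usize {
        // Phase 1: the slow part runs WITHOUT holding `gc_cs`.
        // Failed lookups (e.g. deleted timelines) are skipped, not propagated.
        let found: Vec<(TimelineId, Lsn)> = timelines
            .iter()
            .filter_map(|&id| find_gc_cutoff(id).ok().map(|lsn| (id, lsn)))
            .collect();

        // Phase 2: take the lock only for the short bookkeeping step.
        let _guard = self.gc_cs.lock().unwrap();
        let updated = found.len();
        self.cutoffs.lock().unwrap().extend(found);
        updated
    }
}
```

With this split, a caller that needs `gc_cs` (branch creation in the description above) waits only for the short second phase, regardless of how long the cutoff search takes.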