[Postgres] Improve handling of "lost" replication slots #387
🦋 Changeset detected
Latest commit: 72b12b1
The changes in this PR will be included in the next version bump. This PR includes changesets to release 12 packages.
stevensJourney
left a comment
Looks good to me :)
The base branch was changed.
e1f9e83 to 5b6e088
This adds a check for replication slots that are "lost" because `max_slot_wal_keep_size` was exceeded, and automatically re-creates the slot if needed. Postgres 13+ exposes an explicit `wal_status` field, which we now check, along with some additional error message checks.
The slot health check is now also modified to wait at most 2 minutes for the slot to become healthy, rather than retrying up to 120 times. This is relevant because each individual try can take 2 minutes in some scenarios, which can cause the overall check to fail only after 4 hours.
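For illustration, here is a minimal sketch of the two ideas above: classifying a slot by `wal_status`, and bounding the health check by elapsed time instead of by try count. This is not the code in this PR. `SlotRow`, `isSlotLost`, `waitForHealthySlot`, the 1-second pause, and the injected `now`/`sleep` parameters are all hypothetical names and values chosen for the example.

```typescript
// Values documented for pg_replication_slots.wal_status (Postgres 13+).
// The column is null when the slot has no restart_lsn.
type WalStatus = 'reserved' | 'extended' | 'unreserved' | 'lost';

// Hypothetical row shape for this sketch, not the service's actual type.
// A caller could populate it with something like:
//   SELECT slot_name, wal_status FROM pg_replication_slots WHERE slot_name = $1
// (on Postgres 12 and older the wal_status column does not exist).
interface SlotRow {
  slot_name: string;
  wal_status: WalStatus | null;
}

// A slot reported as 'lost' can no longer be streamed from,
// so it has to be dropped and re-created.
function isSlotLost(row: SlotRow): boolean {
  return row.wal_status === 'lost';
}

// Deadline-based wait: stop starting new tries once the time budget is spent.
// A try that is already in flight is not interrupted, so the total time is
// bounded by roughly timeoutMs plus the duration of one try.
async function waitForHealthySlot(
  checkOnce: () => Promise<boolean>,
  timeoutMs: number = 2 * 60 * 1000, // the 2-minute budget from the description
  pauseMs: number = 1000, // assumed pause between tries
  now: () => number = Date.now,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms))
): Promise<boolean> {
  const deadline = now() + timeoutMs;
  while (true) {
    if (await checkOnce()) {
      return true; // slot became healthy
    }
    if (now() + pauseMs >= deadline) {
      return false; // no time left for another try
    }
    await sleep(pauseMs);
  }
}
```

With a count-based limit, 120 tries at 2 minutes each add up to 4 hours. With the deadline above, a single 2-minute try already exhausts the budget and the check gives up after that one try.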
This does not yet solve the issue of this health check potentially causing high load on the source database. For that we should probably use an exponential back-off mechanism for the overall replication retries (separate PR).
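As a rough sketch of what that back-off could look like (nothing here is implemented in this PR; `backoffDelayMs`, the 1-second base, and the 5-minute cap are hypothetical choices for illustration):

```typescript
// Exponential back-off: the delay doubles with each failed attempt
// and is capped at maxMs. Attempt numbers start at 0.
function backoffDelayMs(
  attempt: number,
  baseMs: number = 1000, // assumed initial delay
  maxMs: number = 5 * 60 * 1000 // assumed upper bound
): number {
  const exponent = Math.max(0, Math.floor(attempt));
  return Math.min(maxMs, baseMs * Math.pow(2, exponent));
}
```

With these defaults the delays run 1s, 2s, 4s, 8s and so on until they reach the 5-minute cap. A real implementation would probably also add random jitter so that several replication streams do not retry in lockstep.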