[Postgres] Improve handling of "lost" replication slots #387

rkistner · 2025-10-30T01:53:24Z

This adds a check for replication slots that are "lost" because max_slot_wal_keep_size was exceeded, and automatically re-creates the slot when needed. Since Postgres 13, pg_replication_slots has an explicit wal_status column, which we now check; we also check for some additional error messages.
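A minimal TypeScript sketch of the two detection paths described above: an upfront `wal_status` check and a fallback match on the streaming error text. All names here (`SlotStatusRow`, `buildSlotStatusQuery`, `isSlotLost`, `isLostSlotError`, `shouldRecreateSlot`) are hypothetical and chosen for illustration; this is not the PR's actual code. Only the `pg_replication_slots` view, its `wal_status` values, and `server_version_num` come from Postgres itself, and the error text is taken from this PR's commit history.

```typescript
// Minimal sketch, not the PR's implementation. Hypothetical subset of the
// columns in the pg_replication_slots view.
interface SlotStatusRow {
  slot_name: string;
  // 'reserved' | 'extended' | 'unreserved' | 'lost' on Postgres 13+;
  // null when the column does not exist on the server (pre-13).
  wal_status: string | null;
}

// Build the status query from `SHOW server_version_num` (e.g. 130011 for 13.11).
// wal_status only exists from Postgres 13, so older servers get a NULL placeholder.
function buildSlotStatusQuery(serverVersionNum: number): string {
  const walStatus =
    serverVersionNum >= 130000 ? "wal_status" : "NULL::text AS wal_status";
  return `SELECT slot_name, ${walStatus} FROM pg_replication_slots WHERE slot_name = $1`;
}

// 'lost' means WAL the slot still needs has already been removed (for example
// after max_slot_wal_keep_size was exceeded), so the slot cannot be resumed.
// 'unreserved' is at risk but can still recover, so it is not treated as lost.
function isSlotLost(row: SlotStatusRow): boolean {
  return row.wal_status === "lost";
}

// Error text from this PR's commit history; other Postgres versions may word
// the invalidation error differently.
const LOST_SLOT_ERROR_TEXT = "can no longer access replication slot";

function isLostSlotError(err: unknown): boolean {
  const message = err instanceof Error ? err.message : String(err);
  return message.includes(LOST_SLOT_ERROR_TEXT);
}

// Re-create when either the upfront status check or the last streaming
// error indicates that the slot is lost.
function shouldRecreateSlot(row: SlotStatusRow, lastError?: unknown): boolean {
  if (isSlotLost(row)) {
    return true;
  }
  return lastError !== undefined && isLostSlotError(lastError);
}
```

Checking `wal_status` first catches the problem before streaming starts; the error-text match covers servers or code paths where the status column is unavailable.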

The slot health check now waits at most 2 minutes in total for the slot to become healthy, instead of making up to 120 attempts. This matters because a single attempt can take 2 minutes in some scenarios, so the overall check could take up to 4 hours (120 × 2 minutes) to fail.
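To illustrate why a time budget behaves differently from an attempt count, here is a sketch of a deadline-based polling loop. `waitForHealthySlot` and its options are hypothetical names, not the PR's code; the clock and sleep functions are injectable only so the loop can be exercised without real waiting.

```typescript
// Hypothetical helper: poll `isHealthy` until it returns true or the overall
// wall-clock budget runs out. Limiting by time rather than by attempt count
// means slow individual attempts cannot multiply the total wait.
async function waitForHealthySlot(
  isHealthy: () => Promise<boolean>,
  options: {
    timeoutMs: number; // total budget, e.g. 2 * 60_000 for 2 minutes
    intervalMs: number; // pause between attempts
    now?: () => number;
    sleep?: (ms: number) => Promise<void>;
  }
): Promise<boolean> {
  const now = options.now ?? (() => Date.now());
  const sleep =
    options.sleep ??
    ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  const deadline = now() + options.timeoutMs;
  while (true) {
    if (await isHealthy()) {
      return true;
    }
    const remaining = deadline - now();
    if (remaining <= 0) {
      return false;
    }
    // Never sleep past the deadline.
    await sleep(Math.min(options.intervalMs, remaining));
  }
}
```

Because the deadline is only checked between attempts, a single slow attempt can still overrun the budget by up to one attempt's duration; racing each attempt against the remaining time would tighten that bound at the cost of more complexity.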

This does not yet address the possibility of this health check causing high load on the source database. For that, we should probably add exponential backoff to the overall replication retries (separate PR).
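The PR leaves backoff for a separate change. As a hedged illustration of what exponential backoff for replication retries could look like, the hypothetical `backoffDelayMs` below doubles the delay on each attempt up to a cap, with optional jitter; the base delay, cap, and jitter strategy are assumptions, not values from the PR.

```typescript
// Hypothetical helper: delay before retry number `attempt` (0-based),
// doubling each time and capped at `maxMs`.
function backoffDelayMs(
  attempt: number,
  opts: { baseMs: number; maxMs: number; random?: () => number }
): number {
  // 2 ** attempt grows without bound (Infinity for huge values); Math.min caps it.
  const capped = Math.min(opts.maxMs, opts.baseMs * 2 ** attempt);
  // Optional "full jitter": pick a delay in [0, capped) so that many
  // replication processes do not retry against the database in lockstep.
  return opts.random === undefined ? capped : Math.floor(opts.random() * capped);
}
```

With `baseMs: 1000` and `maxMs: 60_000`, attempts 0 through 6 wait 1 s, 2 s, 4 s, 8 s, 16 s, 32 s, and then 60 s.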

changeset-bot · 2025-10-30T01:53:28Z

🦋 Changeset detected

Latest commit: 72b12b1

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 12 packages

Name	Type
@powersync/service-module-postgres	Patch
@powersync/lib-service-postgres	Patch
@powersync/service-schema	Patch
@powersync/service-image	Patch
@powersync/service-module-postgres-storage	Patch
@powersync/service-module-mongodb	Patch
@powersync/service-module-mysql	Patch
@powersync/service-core	Patch
@powersync/service-core-tests	Patch
@powersync/service-module-core	Patch
@powersync/service-module-mongodb-storage	Patch
test-client	Patch

stevensJourney

Looks good to me :)

The base branch was changed.

rkistner requested a review from stevensJourney October 30, 2025 02:00

rkistner marked this pull request as ready for review October 30, 2025 02:00

stevensJourney previously approved these changes Oct 30, 2025

Base automatically changed from test-pg-18 to main October 31, 2025 09:38

rkistner added 9 commits October 31, 2025 11:39

Add test to reproduce a "lost" replication slot. (dd6631c)
Detect "can no longer access replication slot" error. (3abf836)
Check replication slot status upfront. (7acb9e6)
Limit slot health check based on time, not iterations. (62d9377)
Fix for invalidation_reason not on pg < 16. (27a3535)
Skip test on older postgres versions. (7c40e39)
Further tweaks to postgres compatibility fallbacks. (b1a056e)
Remove some version checks. (7021928)
Add changeset. (5b6e088)

rkistner force-pushed the lost-replication-slot branch from e1f9e83 to 5b6e088 October 31, 2025 09:39

Resolve todo. (72b12b1)

rkistner requested a review from stevensJourney October 31, 2025 09:46

stevensJourney approved these changes Oct 31, 2025

rkistner merged commit 0e9aa94 into main Oct 31, 2025
22 checks passed

rkistner deleted the lost-replication-slot branch October 31, 2025 10:40

3 participants
