@rkistner (Contributor) commented Jul 21, 2025

Background

When deploying new sync rules, we create a new stream (using, for example, a new Postgres logical replication slot) for the new version, and process it while the current version stays active. When initial replication is complete, clients switch over to sync from the new copy.

For the new sync rules themselves, replication works roughly as follows:

  1. Get a snapshot position.
  2. Do an initial snapshot.
  3. Resume streaming data from the initial snapshot position.
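
The three steps above can be sketched with a toy in-memory source. This is only an illustration: `FakeSource`, `Change`, and `replicate` are hypothetical names, not part of the PowerSync codebase, and a numeric log position stands in for a real replication position such as a Postgres LSN.

```typescript
// Hypothetical in-memory source: a key/value table plus an ordered change log.
type Change = { position: number; key: string; value: string };

class FakeSource {
  private rows = new Map<string, string>();
  private log: Change[] = [];

  write(key: string, value: string): void {
    this.rows.set(key, value);
    this.log.push({ position: this.log.length + 1, key, value });
  }

  // Position of the last change currently in the log.
  currentPosition(): number {
    return this.log.length;
  }

  // A copy of the rows as they are right now.
  snapshot(): Map<string, string> {
    return new Map(this.rows);
  }

  // Every change strictly after the given position, in order.
  changesAfter(position: number): Change[] {
    return this.log.filter((c) => c.position > position);
  }
}

function replicate(
  source: FakeSource,
  duringSnapshot: () => void,
  afterSnapshot: () => void
): Map<string, string> {
  // 1. Get a snapshot position.
  const snapshotPosition = source.currentPosition();
  duringSnapshot(); // writes arriving while the snapshot is being prepared

  // 2. Do an initial snapshot.
  const copy = source.snapshot();
  afterSnapshot(); // writes arriving before streaming has caught up

  // 3. Resume streaming from the snapshot position. Some changes are already
  // in the snapshot; replaying them in order is harmless in this toy model.
  for (const change of source.changesAfter(snapshotPosition)) {
    copy.set(change.key, change.value);
  }
  return copy;
}
```

In this model, everything written after step 1 has to be streamed in step 3, which is why a long snapshot on a busy source leaves a large backlog to catch up on.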

The issue

The main issue is that when the initial snapshot is complete, there could still be a long period before streaming replication has caught up. This is typically not a problem for instances with small data volumes, but it could be significant in cases where replication takes a couple of hours and a lot of new data has come in during that time.

A secondary issue is specific to replicating MongoDB data - until replication has caught up, there could be inconsistent data synced to clients.

The fix

This refactors the "autoActivate" behavior - we now only switch over to the new sync rules version when it has a consistent checkpoint.
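
A minimal sketch of this kind of activation gate, assuming a simplified state model. The `SyncRulesVersion` shape, its field names, and the `canActivate` and `autoActivate` functions below are hypothetical and do not reflect the actual storage schema or code in this PR.

```typescript
type SyncRulesState = 'processing' | 'active' | 'stopped';

// Hypothetical, simplified view of one sync rules version.
interface SyncRulesVersion {
  id: number;
  state: SyncRulesState;
  snapshotDone: boolean;
  // Position of the last consistent checkpoint, or null if none exists yet.
  lastCheckpointLsn: string | null;
}

// A finished snapshot alone is not enough: a consistent checkpoint must exist.
function canActivate(version: SyncRulesVersion): boolean {
  return (
    version.state === 'processing' &&
    version.snapshotDone &&
    version.lastCheckpointLsn !== null
  );
}

// Returns the version clients should sync from after the check.
function autoActivate(
  current: SyncRulesVersion,
  next: SyncRulesVersion
): SyncRulesVersion {
  if (!canActivate(next)) {
    return current; // keep serving the old version
  }
  next.state = 'active';
  current.state = 'stopped';
  return next;
}
```

With this gate, a version whose snapshot has finished but which has no consistent checkpoint yet keeps clients on the previous version.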

Additionally, for MongoDB replication, we update streaming progress during the initial catch-up phase, so that we can resume replication at the same point in the case of restart.

This is not a complete fix yet - at the point of switchover, replication of the new sync rules could still be behind and take a while to fully catch up, but it is already a significant improvement. To support resuming, we're re-purposing "snapshot_lsn" as a more general "resume_from_lsn".
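
The idea of persisting a resume position during catch-up can be sketched as follows. This is an assumption-laden toy: `ProgressStore`, `streamFrom`, and `catchUp` are hypothetical names, the log is a plain array of opaque position strings, and the `crashAfter` parameter exists only to simulate a restart.

```typescript
// Hypothetical persisted state; resumeFromLsn mirrors the "resume_from_lsn" idea.
interface ProgressStore {
  resumeFromLsn: string | null;
}

// Returns the positions that come after resumeFromLsn (or all of them if null).
// If the position is not found, this toy version restarts from the beginning.
function streamFrom(log: string[], resumeFromLsn: string | null): string[] {
  const start = resumeFromLsn === null ? 0 : log.indexOf(resumeFromLsn) + 1;
  return log.slice(start);
}

function catchUp(
  log: string[],
  store: ProgressStore,
  applied: string[],
  crashAfter?: number // simulate a restart after this many changes
): void {
  let count = 0;
  for (const lsn of streamFrom(log, store.resumeFromLsn)) {
    if (crashAfter !== undefined && count === crashAfter) {
      return; // simulated restart: progress so far is already persisted
    }
    applied.push(lsn); // apply the change
    store.resumeFromLsn = lsn; // record progress so a restart resumes here
    count++;
  }
}
```

Calling `catchUp` a second time with the same store continues after the last recorded position instead of starting over from the snapshot position.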

Additional smaller fixes

  1. We now bypass the previous "resnapshot" behavior unless we have TOAST values we need to re-replicate.
  2. This improves stability of some tests.
  3. Add "rate limiting" to avoid touching probes on every change in MongoDB change streams - touching them on every change added significant overhead, causing replication catch-up to take longer than it should.
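
One simple way to rate-limit an action like this is a minimum-interval throttle. The sketch below is generic and not the implementation in this PR: `RateLimitedAction` is a hypothetical helper, and the clock is injectable only so the behavior is easy to test.

```typescript
// Runs the wrapped action at most once per minIntervalMs; extra triggers are skipped.
class RateLimitedAction {
  private lastRun = Number.NEGATIVE_INFINITY;
  private minIntervalMs: number;
  private action: () => void;
  private now: () => number;

  constructor(
    minIntervalMs: number,
    action: () => void,
    now: () => number = () => Date.now()
  ) {
    this.minIntervalMs = minIntervalMs;
    this.action = action;
    this.now = now;
  }

  // Returns true if the action ran, false if it was skipped.
  trigger(): boolean {
    const t = this.now();
    if (t - this.lastRun < this.minIntervalMs) {
      return false;
    }
    this.lastRun = t;
    this.action();
    return true;
  }
}
```

Calling `trigger()` for every change in the stream then costs only a timestamp comparison in the common case, while the expensive action still runs periodically.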

changeset-bot bot commented Jul 21, 2025

🦋 Changeset detected

Latest commit: ff8de76

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 11 packages

| Name | Type |
| --- | --- |
| @powersync/service-module-postgres-storage | Minor |
| @powersync/service-module-mongodb-storage | Minor |
| @powersync/service-core-tests | Minor |
| @powersync/service-module-postgres | Minor |
| @powersync/service-module-mongodb | Minor |
| @powersync/service-core | Minor |
| @powersync/service-module-mysql | Minor |
| @powersync/service-schema | Minor |
| @powersync/service-image | Minor |
| @powersync/service-module-core | Patch |
| test-client | Patch |

@rkistner rkistner marked this pull request as ready for review July 22, 2025 08:31
@rkistner rkistner requested a review from stevensJourney July 22, 2025 09:15
@stevensJourney (Collaborator) left a comment

I went through the logic flow and could not spot any issues. This LGTM.

@rkistner rkistner merged commit d56eeb9 into main Jul 22, 2025
21 checks passed
@rkistner rkistner deleted the fix-replication-switchover branch July 22, 2025 11:14