@rkistner rkistner commented Nov 11, 2025

This primarily fixes memory leaks that occur when restarting the application. The same issue affected the replication jobs for Postgres, MongoDB and MySQL.

The basic issue is that every time a new replication job was started, a new PgManager/MongoManager/MySQLConnectionManager was created and retained on the ConnectionManagerFactory, and was never released. If there is a persistent connection error, a new job is started every couple of seconds, so memory usage grows indefinitely. In some cases, this could lead to a 200MB+ memory increase per hour.

The main fix is to keep track of when these connection managers are closed, and then clean up these instances.
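
As a rough illustration of that idea (not the actual code in this PR), the sketch below shows a factory that tracks the managers it creates and drops each one once it is closed. All names here (`SketchManager`, `SketchConnectionManagerFactory`, `onClose`, `activeCount`) are hypothetical and only loosely mirror the classes mentioned above.

```typescript
// Hypothetical sketch: these names are illustrative, not the PowerSync source.
interface ClosableManager {
  // Registers a callback that runs once, when the manager is closed.
  onClose(listener: () => void): void;
  close(): Promise<void>;
}

class SketchManager implements ClosableManager {
  private listeners: Array<() => void> = [];
  private closed = false;

  onClose(listener: () => void): void {
    this.listeners.push(listener);
  }

  async close(): Promise<void> {
    if (this.closed) return; // closing twice must not notify twice
    this.closed = true;
    for (const listener of this.listeners) listener();
  }
}

class SketchConnectionManagerFactory {
  private readonly managers = new Set<ClosableManager>();

  create(): ClosableManager {
    const manager = new SketchManager();
    this.managers.add(manager);
    // Without this listener, every failed job would leave its manager
    // referenced by the factory forever.
    manager.onClose(() => {
      this.managers.delete(manager);
    });
    return manager;
  }

  get activeCount(): number {
    return this.managers.size;
  }

  async shutdown(): Promise<void> {
    // Copy first: close() removes entries from the set via the listener.
    await Promise.all(Array.from(this.managers).map((m) => m.close()));
  }
}
```

With a structure like this, a job that fails and closes its manager leaves nothing behind on the factory, so repeated retries should not accumulate instances.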

Additional fixes:

  1. Only instantiate ChecksumCache when it is used for the first time. Since the cache pre-allocates memory, this avoids memory usage in cases where it's never used, such as the replication job.
  2. Simplify the replication jobs by removing replicateLoop. Retry logic is now managed purely by the top-level replicator, instead of having multiple levels of retries for different types of errors. This also removes some double-logging of replication errors (e.g. "Replication failed" and "Replication error" for the same error). It also means that if there are multiple replication processes running, it is now feasible for another process to take over if one runs into errors.
  3. On Postgres WalStreamReplicationJob, use the same connection pool for keepalive messages as for the job itself, instead of using two separate pools.
  4. For "Sync rules: <x> have been locked by another process for replication" messages, include the expiration time of the lock. This helps to identify whether the lock is actually active or not.
  5. Some tweaks to connection attempt delays in the ErrorRateLimiter implementations.
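
The lazy instantiation in item 1 can be sketched roughly as follows. This is a minimal illustration under assumed names (`SketchChecksumCache`, `SketchStorage`, `cacheAllocated`), not the actual ChecksumCache API: the cache is only constructed on first access, so code paths that never read checksums never pay for the pre-allocated memory.

```typescript
// Hypothetical stand-in for a cache that pre-allocates memory when constructed.
class SketchChecksumCache {
  readonly buffer: Uint8Array;

  constructor(sizeBytes: number) {
    this.buffer = new Uint8Array(sizeBytes);
  }
}

// Hypothetical owner of the cache; the name and shape are assumptions.
class SketchStorage {
  private cache: SketchChecksumCache | undefined = undefined;
  private readonly cacheSizeBytes: number;

  constructor(cacheSizeBytes: number) {
    this.cacheSizeBytes = cacheSizeBytes;
  }

  // Constructs the cache on first use and reuses it afterwards.
  get checksumCache(): SketchChecksumCache {
    if (this.cache === undefined) {
      this.cache = new SketchChecksumCache(this.cacheSizeBytes);
    }
    return this.cache;
  }

  // Exposed only so the sketch can show whether allocation has happened.
  get cacheAllocated(): boolean {
    return this.cache !== undefined;
  }
}
```

A replication job that holds a `SketchStorage` but never touches `checksumCache` would leave `cacheAllocated` as `false` for its whole lifetime.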

Tested by using an invalid connection string, manually disabling the ErrorRateLimiter delays (variable), manually disabling the AbstractReplicator delay (5s), then measuring memory usage over time with Chrome's inspector.

changeset-bot bot commented Nov 11, 2025

🦋 Changeset detected

Latest commit: c3d8ca3

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 11 packages:

| Name | Type |
| --- | --- |
| @powersync/service-module-postgres-storage | Patch |
| @powersync/service-module-mongodb-storage | Patch |
| @powersync/service-module-postgres | Patch |
| @powersync/service-module-mongodb | Patch |
| @powersync/service-core | Patch |
| @powersync/service-module-mysql | Patch |
| @powersync/service-image | Patch |
| @powersync/service-schema | Patch |
| @powersync/service-core-tests | Patch |
| @powersync/service-module-core | Patch |
| test-client | Patch |

@rkistner rkistner marked this pull request as draft November 11, 2025 13:27
@rkistner rkistner marked this pull request as ready for review November 11, 2025 14:12

@stevensJourney stevensJourney left a comment

This looks good to me :)

@rkistner rkistner merged commit d889219 into main Nov 11, 2025
22 checks passed
@rkistner rkistner deleted the fix-failed-connection-leak branch November 11, 2025 15:03