[MongoDB Storage] Optimize compact job #503

Merged
rkistner merged 5 commits into main from optimize-bucket-compacting on Feb 12, 2026

Conversation

@rkistner (Contributor) commented Feb 12, 2026

#375 introduced a change to the compact job to only compact buckets with > 10 changes, which can significantly reduce the amount of work if the instance is idle, or only a small number of buckets are changing.

The issue arose when large buckets were modified while compacting. For example, with 1000 large buckets each receiving 10 changes every 10 minutes, these buckets would be repeatedly re-compacted during the compact job, with only minimal gains each time.

This changes the bucket scanning logic: we now scan through all buckets (in bucket_state), then filter out the ones that don't need compacting. This can be slower than the previous logic in some cases, but it should significantly improve worst-case performance by avoiding repeatedly re-compacting the same buckets.
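The scan-and-filter selection can be sketched as follows. This is a hypothetical in-memory model of the logic, not the actual PowerSync implementation; the `BucketState` shape, the `changesSinceCompact` field, and the `MIN_CHANGES` threshold name are assumptions based on the description above (the real job reads bucket_state documents from MongoDB).

```typescript
// Illustrative model of the new selection pass: iterate every bucket_state
// entry in _id order, and keep only buckets with enough pending changes.
interface BucketState {
  id: string;
  changesSinceCompact: number;
}

const MIN_CHANGES = 10;

function selectBucketsToCompact(allBuckets: BucketState[]): string[] {
  // Iterating in _id order gives a stable single pass: a bucket that
  // receives more changes mid-scan is not revisited in the same run.
  return allBuckets
    .slice()
    .sort((a, b) => a.id.localeCompare(b.id))
    .filter((b) => b.changesSinceCompact > MIN_CHANGES)
    .map((b) => b.id);
}
```

The key difference from the previous approach is that the scan order is fixed by `_id`, so a hot bucket is compacted at most once per job run.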

This also adds a filter to skip compacting unless at least 10% of the bucket has changed since the last compact, either by count or by size. This avoids extreme cases such as compacting a bucket with, say, 50 000 operations in total but only 10 changes that can actually be compacted. It does not guarantee that a compact will give gains, but it does filter out cases where we know the gains won't be significant.
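A minimal sketch of that 10% filter, assuming hypothetical estimate fields for the bucket's total size and the portion changed since the last compact (the field names are illustrative, not the actual schema):

```typescript
// Illustrative estimate shape: totals for the whole bucket, plus
// counters accumulated since the last compact.
interface CompactEstimate {
  totalOps: number;
  totalBytes: number;
  opsSinceCompact: number;
  bytesSinceCompact: number;
}

const MIN_CHANGE_RATIO = 0.1;

function worthCompacting(e: CompactEstimate): boolean {
  // Skip unless at least 10% of the bucket changed since the last
  // compact, by operation count or by size.
  return (
    e.opsSinceCompact >= e.totalOps * MIN_CHANGE_RATIO ||
    e.bytesSinceCompact >= e.totalBytes * MIN_CHANGE_RATIO
  );
}
```

For the 50 000-operation example above, 10 changes is well under the 5000-operation threshold, so the bucket is skipped.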

Future: Incremental compacting

We may want to add a more incremental compact job in the future, for example running continuously or every couple of minutes, to compact buckets as soon as it could help. In that case, we'd want to avoid scanning through all the buckets on every iteration.

An alternative considered here is to keep using the index on {_id.g: 1, estimate_since_compact.count: 1} to find the buckets, and either keep a cursor open or re-query based on {count > last_count} (iterating in increasing order of count). What makes this approach difficult is that cursors can break when they remain open or idle for a long time, and re-querying based on the count can return duplicates, since counts change while iterating. Iterating based on _id simplifies things here.
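The duplicate-free property of _id-based iteration can be shown with a small in-memory model. `scanPage` is a hypothetical helper; a real implementation would issue a MongoDB query with a filter like `{ _id: { $gt: lastId } }` and a sort on `_id` instead of sorting in memory:

```typescript
// Resumable scan keyed on _id: because _id is immutable and the sort key,
// re-querying with "id > lastId" never returns a bucket twice, unlike
// re-querying on a mutable count field.
function scanPage(
  buckets: { id: string }[],
  lastId: string | null,
  pageSize: number
): { page: { id: string }[]; nextCursor: string | null } {
  const sorted = buckets.slice().sort((a, b) => a.id.localeCompare(b.id));
  const page = sorted
    .filter((b) => lastId === null || b.id > lastId)
    .slice(0, pageSize);
  // A short page means we reached the end of the collection.
  const nextCursor = page.length === pageSize ? page[page.length - 1].id : null;
  return { page, nextCursor };
}
```

Each page picks up exactly where the previous one stopped, so the scan can be resumed across job runs without keeping a server-side cursor open.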

To get incremental compacting in the future, we may combine this with the old approach in a hybrid:

  1. Check for changed buckets first.
  2. If that exceeds a certain threshold in either number of buckets or time to process, switch to a full scan.
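The hybrid decision above can be sketched as a simple mode switch; the threshold names and values are hypothetical, chosen only to illustrate the shape of the check:

```typescript
type ScanMode = 'changed-only' | 'full-scan';

// Illustrative fallback rule: use the cheap "changed buckets" query
// unless it returns too many buckets or has already taken too long,
// in which case switch to a full scan of bucket_state.
function chooseScanMode(opts: {
  changedBucketCount: number;
  elapsedMs: number;
  maxChangedBuckets: number;
  maxElapsedMs: number;
}): ScanMode {
  return opts.changedBucketCount > opts.maxChangedBuckets ||
    opts.elapsedMs > opts.maxElapsedMs
    ? 'full-scan'
    : 'changed-only';
}
```

This keeps the common idle-instance case cheap while bounding the worst case with the full scan introduced in this PR.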

@changeset-bot changeset-bot bot commented Feb 12, 2026

🦋 Changeset detected

Latest commit: 23c5d0e

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 12 packages
Name Type
@powersync/service-module-mongodb-storage Patch
@powersync/service-core-tests Patch
@powersync/service-core Patch
@powersync/service-schema Patch
@powersync/service-module-mongodb Patch
@powersync/service-module-mssql Patch
@powersync/service-module-mysql Patch
@powersync/service-module-postgres Patch
@powersync/service-image Patch
@powersync/service-module-postgres-storage Patch
@powersync/service-module-core Patch
test-client Patch


@rkistner rkistner marked this pull request as ready for review February 12, 2026 08:23
@rkistner rkistner requested a review from simolus3 February 12, 2026 13:46
@rkistner rkistner merged commit d1c2228 into main Feb 12, 2026
26 checks passed
@rkistner rkistner deleted the optimize-bucket-compacting branch February 12, 2026 19:25
