Merged
🦋 Changeset detected. Latest commit: 23c5d0e. The changes in this PR will be included in the next version bump. This PR includes changesets to release 12 packages.
stevensJourney approved these changes on Feb 12, 2026
simolus3 approved these changes on Feb 12, 2026
#375 introduced a change to the compact job to only compact buckets with more than 10 changes, which can significantly reduce the amount of work when the instance is idle or only a small number of buckets are changing.
An issue arises when large buckets are modified while compacting. For example, with 1000 large buckets each receiving 10 changes every 10 minutes, these buckets would be repeatedly re-compacted during the compact job, with only minimal gains each time.
This changes the bucket scanning logic: we now scan through all buckets (in bucket_state), then filter out those that don't need compacting. This can be slower than the previous logic in some cases, but it should significantly improve worst-case performance by avoiding re-compacting the same buckets repeatedly.
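As a rough sketch of the scan-then-filter approach, assuming an in-memory stand-in for the bucket_state collection (the real implementation queries MongoDB, and the field names here are illustrative, not the actual schema):

```typescript
// Hypothetical model of a bucket_state document.
interface BucketState {
  id: string; // stands in for _id
  changesSinceCompact: number;
}

// Scan all bucket states in id order, visiting each bucket at most once per
// run, and keep only those that pass the "needs compacting" filter.
function bucketsToCompact(states: BucketState[], minChanges = 10): string[] {
  return states
    .slice()
    .sort((a, b) => a.id.localeCompare(b.id))
    .filter((s) => s.changesSinceCompact > minChanges)
    .map((s) => s.id);
}

console.log(bucketsToCompact([
  { id: 'b1', changesSinceCompact: 3 },
  { id: 'b2', changesSinceCompact: 500 },
  { id: 'b3', changesSinceCompact: 11 },
]));
// → [ 'b2', 'b3' ]
```

Because the scan advances strictly through the bucket list, a bucket that keeps receiving changes during the job is not selected again within the same run.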
This also adds a filter to skip compacting unless at least 10% of the bucket has changed since the last compact, either by count or by size. This avoids the extreme cases of compacting a bucket with, say, 50 000 operations in total but only 10 changes that can actually be compacted. This is not a guarantee that the compact will give gains, but it does filter out some cases where we know there won't be significant gains.
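The 10% threshold can be sketched as below; the field names are hypothetical, not the actual bucket_state schema:

```typescript
// Hypothetical per-bucket compact estimate.
interface CompactEstimate {
  totalOps: number;          // total operations currently in the bucket
  totalBytes: number;        // total size of the bucket
  opsSinceCompact: number;   // operations added since the last compact
  bytesSinceCompact: number; // bytes added since the last compact
}

// Compact only when at least 10% of the bucket changed, by count or by size.
function worthCompacting(e: CompactEstimate, minFraction = 0.1): boolean {
  return (
    e.opsSinceCompact >= e.totalOps * minFraction ||
    e.bytesSinceCompact >= e.totalBytes * minFraction
  );
}

// 50 000 ops with only 10 new changes: skipped.
console.log(worthCompacting({
  totalOps: 50_000,
  totalBytes: 5_000_000,
  opsSinceCompact: 10,
  bytesSinceCompact: 1_000,
}));
// → false
```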
Future: Incremental compacting
We may want to add a more incremental compact job in the future, for example running continuously or every couple of minutes, to compact buckets as soon as it could help. In that case, we'd want to avoid scanning through all the buckets on every iteration.
An alternative considered here is to keep using the index on `{_id.g: 1, estimate_since_compact.count: 1}` to find the buckets, but either keep a cursor open, or re-query based on `{count > last_count}` (iterating in increasing order of count). What makes this approach difficult is that cursors can break when they remain open or idle for a long time, and re-querying based on the count can give duplicates. Iterating based on `_id` simplifies things here.

To get incremental compacting in the future, we may do some hybrid with the old approach again: