
FGasper commented Nov 4, 2025

Migration Verifier listens for changes on both the source and destination clusters and enqueues rechecks for both in the same collection. Any change that hits the source also hits the destination once the migration applies it, so for every change on the source we expect to see a duplicate change on the destination.

We currently handle this with an insert that tolerates duplicate-key errors. The server’s duplicate-key path, though, is quite slow; it’s much faster just to write both documents and then deduplicate them when reading.

This changeset makes that change. Rechecks triggered by source changes are no longer document-level duplicates of destination-triggered rechecks, because the `_id` now contains a `rand` field, set to a random int32, that distinguishes them. When we convert those rechecks into recheck tasks, we project the `_id.rand` field out so that the duplicates are easy to deduplicate.
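A minimal Go sketch of the idea, under a simplified in-memory model. `recheckID`, `recheckKey`, `newRecheckID`, and `dedupe` are hypothetical names, not Migration Verifier's actual types, and the document ID is modeled as a string rather than an arbitrary BSON value. Each recheck `_id` carries a random `Rand` field so that source- and destination-triggered rechecks for the same document do not collide; dropping `Rand` when building tasks makes those duplicates identical, so they collapse trivially.

```go
package main

import (
	"fmt"
	"math/rand"
)

// recheckID is a hypothetical, simplified shape for a recheck document's _id.
// Rand distinguishes otherwise-identical rechecks (e.g., one enqueued from the
// source change stream and one from the destination change stream).
type recheckID struct {
	DB    string
	Coll  string
	DocID string // assumption: real document IDs can be any BSON value
	Rand  int32
}

// recheckKey is the _id with Rand projected out; it identifies the document
// that needs rechecking.
type recheckKey struct {
	DB, Coll, DocID string
}

// newRecheckID builds an _id with a random int32 so inserts need not hit the
// server's duplicate-key path.
func newRecheckID(db, coll, docID string) recheckID {
	return recheckID{DB: db, Coll: coll, DocID: docID, Rand: rand.Int31()}
}

// dedupe projects out Rand and keeps the first occurrence of each key,
// preserving input order.
func dedupe(ids []recheckID) []recheckKey {
	seen := make(map[recheckKey]struct{}, len(ids))
	out := make([]recheckKey, 0, len(ids))
	for _, id := range ids {
		k := recheckKey{DB: id.DB, Coll: id.Coll, DocID: id.DocID}
		if _, ok := seen[k]; ok {
			continue // duplicate recheck for the same document
		}
		seen[k] = struct{}{}
		out = append(out, k)
	}
	return out
}

func main() {
	// The same change on document "42" observed on both clusters, plus one
	// unrelated change on document "43".
	ids := []recheckID{
		newRecheckID("shop", "orders", "42"),
		newRecheckID("shop", "orders", "42"),
		newRecheckID("shop", "orders", "43"),
	}
	tasks := dedupe(ids)
	fmt.Println(len(tasks)) // 2
}
```

In the actual verifier the deduplication presumably happens while reading from the recheck collection (for example, via a projection that excludes `_id.rand`); the map here only stands in for that step.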

This also avoids duplicate-key errors in the “hot documents” case (documents that change frequently and so enqueue many rechecks for the same key).

FGasper requested a review from tdq45gj November 4, 2025 19:10
FGasper marked this pull request as ready for review November 4, 2025 19:11

tdq45gj left a comment

LGTM. I wonder whether using a default ObjectID as `_id` and creating an index on `{db, coll, docID}` would be a better (or worse) alternative.

FGasper merged commit 5fb4f11 into mongodb-labs:main Nov 5, 2025
99 checks passed
FGasper deleted the felipe_reduce_change_stream_recheck_dupes branch November 5, 2025 15:34