Reduce duplicate keys on recheck insertion. #158

FGasper · 2025-11-04T01:14:02Z

Migration Verifier listens for changes on both the source and destination clusters and enqueues rechecks for both in the same collection. Any change that hits the source also hits the destination, so for every change on the source we expect to see a duplicate change on the destination.

We handle this via an insert with tolerance for duplicate keys. The server’s duplicate-key path, though, is quite slow. It’s much faster just to write both documents and then deduplicate them when reading.
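For reference, the duplicate-tolerant insert described above can be modeled roughly as follows. This is a minimal in-memory sketch, not the verifier's code: `uniqueQueue`, `recheckKey`, `errDuplicateKey`, and `enqueueTolerant` are hypothetical names, and a Go map stands in for a collection with a unique `_id`. It shows only the control flow, not the server-side cost that motivates this change.

```go
package main

import (
	"errors"
	"fmt"
)

// errDuplicateKey stands in for the server's duplicate-key error (assumption).
var errDuplicateKey = errors.New("duplicate key")

// recheckKey is a hypothetical _id with no field that distinguishes
// source-triggered from destination-triggered rechecks.
type recheckKey struct {
	DB, Coll, DocID string
}

// uniqueQueue is an in-memory stand-in for a collection with a unique _id.
type uniqueQueue struct {
	docs map[recheckKey]struct{}
}

func newUniqueQueue() *uniqueQueue {
	return &uniqueQueue{docs: make(map[recheckKey]struct{})}
}

// insert fails with errDuplicateKey if the key is already present.
func (q *uniqueQueue) insert(k recheckKey) error {
	if _, exists := q.docs[k]; exists {
		return errDuplicateKey
	}
	q.docs[k] = struct{}{}
	return nil
}

// enqueueTolerant inserts k and treats a duplicate-key error as success.
func enqueueTolerant(q *uniqueQueue, k recheckKey) error {
	err := q.insert(k)
	if errors.Is(err, errDuplicateKey) {
		return nil
	}
	return err
}

func main() {
	q := newUniqueQueue()
	k := recheckKey{DB: "shop", Coll: "orders", DocID: "42"}
	fmt.Println(enqueueTolerant(q, k)) // source-triggered recheck
	fmt.Println(enqueueTolerant(q, k)) // destination-triggered duplicate, tolerated
	fmt.Println(len(q.docs))           // number of stored rechecks
}
```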

This changeset makes that change. Rechecks triggered by source changes are no longer document-level duplicates of destination-triggered rechecks because the _id now contains a rand field, set to a random int32, that distinguishes them. When we convert those into recheck tasks, we project the _id.rand field out so that it’s easy to deduplicate them.
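The idea can be sketched in plain Go as below. This is an illustration under assumptions, not the code in this PR: the struct and function names are hypothetical, the `DB`/`Coll`/`DocID` fields are borrowed from the review discussion rather than the actual schema, and the deduplication is done in application code here where the verifier projects `_id.rand` out when building recheck tasks.

```go
package main

import (
	"fmt"
	"math/rand"
)

// recheckID models the _id of an enqueued recheck. Only the existence of a
// random int32 "rand" field comes from the PR description; the other field
// names are assumptions.
type recheckID struct {
	DB    string
	Coll  string
	DocID string
	Rand  int32
}

// recheckTarget is a recheckID with Rand projected out.
type recheckTarget struct {
	DB    string
	Coll  string
	DocID string
}

// newRecheckID builds an _id whose Rand field makes two rechecks for the
// same document distinct (barring a rare collision of random values).
func newRecheckID(db, coll, docID string) recheckID {
	return recheckID{DB: db, Coll: coll, DocID: docID, Rand: int32(rand.Uint32())}
}

// dedupeOnRead drops Rand from each _id and keeps the first occurrence of
// each (DB, Coll, DocID) triple, preserving input order.
func dedupeOnRead(ids []recheckID) []recheckTarget {
	seen := make(map[recheckTarget]struct{}, len(ids))
	out := make([]recheckTarget, 0, len(ids))
	for _, id := range ids {
		t := recheckTarget{DB: id.DB, Coll: id.Coll, DocID: id.DocID}
		if _, dup := seen[t]; dup {
			continue
		}
		seen[t] = struct{}{}
		out = append(out, t)
	}
	return out
}

func main() {
	queue := []recheckID{
		newRecheckID("shop", "orders", "42"), // from the source change stream
		newRecheckID("shop", "orders", "42"), // from the destination change stream
		newRecheckID("shop", "orders", "43"),
	}
	fmt.Println(len(queue), "enqueued;", len(dedupeOnRead(queue)), "after dedupe")
}
```

Both writes for document `42` are plain inserts with distinct `_id` values, so neither takes the duplicate-key path; the duplication is resolved once, at read time.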

This also avoids duplicate-key errors in the “hot documents” case.

This reverts commit b203cba.

tdq45gj

LGTM. I wonder if using the default ObjectID as _id and creating an index on {db, coll, docID} is a better (or worse) alternative.

FGasper added 16 commits November 3, 2025 20:01

dedupe

4e5e790

i32

4867147

move

b46272c

comments

d11bb04

fix compile

f5daf23

double rechecks

7ac0bea

fix tests again …

8c0fea6

save

63c6bdf

revert

26bc48f

dedupe

162a74b

dedupe in aggregation

b203cba

note a start time

bef5374

Revert "dedupe in aggregation"

a6a01d2

This reverts commit b203cba.

types

99246c9

rand instead of cause

5c55dfe

revert

0dff2ec

FGasper requested a review from tdq45gj November 4, 2025 19:10

FGasper marked this pull request as ready for review November 4, 2025 19:11

tdq45gj approved these changes Nov 5, 2025

FGasper merged commit 5fb4f11 into mongodb-labs:main Nov 5, 2025
99 checks passed

FGasper deleted the felipe_reduce_change_stream_recheck_dupes branch November 5, 2025 15:34
