-
Notifications
You must be signed in to change notification settings - Fork 15
REP-5283 Report change event document sizes accurately. #43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
REP-5283 Report change event document sizes accurately. #43
Conversation
tdq45gj
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good overall. I wonder if it's necessary to check the fullDocument missing case since.
internal/verifier/change_stream.go
Outdated
| OpType string `bson:"operationType"` | ||
| Ns *Namespace `bson:"ns,omitempty"` | ||
| DocKey DocKey `bson:"documentKey,omitempty"` | ||
| DocSize *int `bson:"fullDocument,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The BSON tag should probably be documentSize.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops! Good catch.
internal/verifier/change_stream.go
Outdated
| bson.D{{"$type", "$fullDocument"}}, | ||
| }}, | ||
| }}, // fullDocument exists | ||
| {"then", bson.D{{"$bsonSize", "$fullDocument"}}}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
$bsonSize is added in v.4.4. Can we fall back to the old slower logic for v4.2 and below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for checking this. I checked as far back as 5.0 but not earlier.
That makes me think we should just do fullDocument. I don’t think this is where any bottlenecks are anyway.
Thoughts?
internal/verifier/change_stream.go
Outdated
| {"$cond", bson.D{ | ||
| {"if", bson.D{ | ||
| {"$ne", bson.A{ | ||
| "missing", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to check the fullDocument missing case? $bsonSize returns 0 on null value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO we should avoid “magic values” when possible. I realize that 0 often serves in that role, but if we can avoid that I think it’s a small code-quality win.
mtrussotto
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updateLookup is fairly expensive (not the network cost, but the query), but I can believe the savings from better partitioning outweigh that.
|
@mtrussotto I agree re the savings. Moreover, mongosync’s embedded verifier uses |
tdq45gj
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I was thinking that we don't need to make the migration-verifier run as fast as on supported versions. This could be simply setting setting the document size field to a large constant for 4.2. But I'll leave it up to you.
|
@tdq45gj Customers who are on 4.2 are likely still there because of the onerousness of upgrading, which suggests that their data sets could be large. So it’s probably better not to retain “pessimism” for 4.2 IMO. |
Previously all change events’ document sizes were recorded “pessimistically” as 16 MiB. This helped to avoid OOMs. It came at a cost, though: when the recheck queue is converted to recheck tasks, those tasks are sized so as to approximate the configured partition size. Thus, if the partition size was 400 MiB (the default), only 25 change events could fit into a recheck task. If there are 250,000 pending rechecks—not unfeasible for a large, busy data set after generation 0—that’s 10,000 tasks to create and perform, which is inefficient.
PR #34 all but eliminates the OOMs, which undercuts that “pessimism”’s benefit. It makes more sense now to allow for the possibility of large recheck tasks in order to minimize the number of tasks. Moreover, we can get pretty good confidence about document sizes from change events anyway:
fullDocument.fullDocument.This changeset, then:
fullDocumentin update events, and