-
Notifications
You must be signed in to change notification settings - Fork 372
REP-5475 - mongodump --internalOnlySourceWritesDoneBarrier #768
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
9b77f70 to
ff0d8d3
Compare
ff0d8d3 to
dfe4cb3
Compare
ajayvijayakumar123
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some suggestions!
mongodump/barrier.go
Outdated
| // test infrastructure. The tests will create the barrier file when they have | ||
| // finshed writes to the source cluster. | ||
| func waitForSourceWritesDoneBarrier(barrierName string) { | ||
| log.Logvf(log.Always, "waitForSourceWritesDoneBarrier: %s", barrierName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| log.Logvf(log.Always, "waitForSourceWritesDoneBarrier: %s", barrierName) | |
| log.Logvf(log.DebugHigh, "waitForSourceWritesDoneBarrier: %s", barrierName) |
I'm not too familiar with how we do logging in the tools. But, should this be at the debug level since we won't use this flag in production?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we update the log message so that it's clear that this log means we are entering the blocking state and the later one means we are still in the blocking state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done and done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Going to disagree with @ajayvijayakumar123: if this should never run in production this it's actually critical that it is as noisy as possible when running in production. Basically, if we hide the logs when this is running in production then we'll not be able to detect that something which should never run in production is running in production until it breaks.
If possible, we should not have this code compile into production binaries so that it's actually impossible to run in production :).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yup good point. I agree
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Undone and undone, and I added a comment to explain the choice of log.Always.
mongodump/barrier.go
Outdated
| @@ -0,0 +1,44 @@ | |||
| // Copyright (C) MongoDB, Inc. 2014-present. | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this barrier system only used to provide a signal between mongodump and the integration testing framework?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. The testing framework has to signal when the source cluster is in its final state. Otherwise there will be a race and mongodump may capture the state too early, when it's still being changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(this may already be addressed with discussions you had with Dave)
My opinion here is that we should have avoid having code that exists just for testing get compiled into the binaries that are released. So, is it possible to have this mechanism only be a part of the integration testing build?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Go build tags would facilitate this, FYI.
I don’t feel terribly strongly about it, though, because it’s clearly marked as “internal-only”, and even if a user did use it, the worst that happens is that their backup stalls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was discussed in the review of the technical design and the consensus was to use a command-line option that is always present, but has an obscure name and isn't visible in the command-line help documentation.
| return | ||
| } | ||
| if os.IsNotExist(err) { | ||
| if time.Since(prevLogTime) >= logInterval { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just to make sure I understand, this setup will not print a log statement until 1 minute after it has actually started waiting on the barrier file. And then this works in tandem with the log statement at the top of the function to indicate we have entered and are still in the blocking state?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. We get a log message after 1 minute, and (roughly) once a minute from then on, to say that we're waiting. Enough to see what's happening, but without flooding the logfile with the same message on every poll.
mongodump/barrier.go
Outdated
| // test infrastructure. The tests will create the barrier file when they have | ||
| // finshed writes to the source cluster. | ||
| func waitForSourceWritesDoneBarrier(barrierName string) { | ||
| log.Logvf(log.Always, "waitForSourceWritesDoneBarrier: %s", barrierName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we update the log message so that it's clear that this log means we are entering the blocking state and the later one means we are still in the blocking state.
FGasper
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This generally looks good. I’ve made some notes; see what you think.
Thanks!
mongodump/barrier.go
Outdated
| // Poll for existence of the barrier file every 500msec | ||
| time.Sleep(500 * time.Millisecond) | ||
| } else { | ||
| panic(errors.Wrapf(err, "failed to open barrier file %s", barrierName)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Panicking seems a bit extreme here. Can we return an error instead?
(And use %#q in the error string to get a nicely-quoted name?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think panic() is appropriate here, since this code is only used in the resmoke test environment and anything strange happening to that file would indicate some level of chaos in the test. I added a comment to make that clear.
Changed %s to %#q throughout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General Go practice is to avoid panic() except for indicating purely-internal errors, usually programmer mistakes.
Maybe ping another reviewer for a third opinion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made the change.
mongodump/mongodump.go
Outdated
| return nil | ||
| } | ||
|
|
||
| if !dump.OutputOptions.Oplog && (dump.InputOptions.SourceWritesDoneBarrier != "") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we're checking collection exists before we're sure that writes are done, it seems that if a collection created after verify collection exists but before the barrier will be ignored. Should the barrier be before verifyCollectionExists?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quibble: verifyCollectionExists is for backups of a specific collection. It’s a no-op otherwise. So either the backup will fail, or that collection will be included.
That said, I agree that the existence check should probably follow the barrier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the non-oplog case, moved the waitForBarrier before the check.
I added a comment about the oplog case.
// A test with the combination of
// 1. --oplog and
// 2. --internalSourceWritesOnly and
// 3. a specified collection
// 4. the collection didn't exist at the time mongodump was started but is
// created later and possibly captured in the oplog
// would do this check too early and thus fail.
//
// That's out of scope for mongodump passthrough testing so we don't try to handle it.
exists, err := dump.verifyCollectionExists()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we note this as a non-goal on the scope? Can we create a ticket to support this case eventually so we don't lose track of it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The medium/long-term goal is to get rid of mongodump and mongorestore, by supporting dump-to-file(s) and restore-from-file(s) in mongosync. So I don't think there's a need to plan for extending the mongodump testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha. I still think we note this as a non-goal on the scope to call out something that is supported but not tested
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added "mongodump --collection" as a non-goal on the scope.
FGasper
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’d still like to see the panic converted to a returned error. @ajayvijayakumar123, @tdq45gj? (Erich, you don’t pop up as easily in @ completion …)
mongodump/barrier.go
Outdated
| // Poll for existence of the barrier file every 500msec | ||
| time.Sleep(500 * time.Millisecond) | ||
| } else { | ||
| panic(errors.Wrapf(err, "failed to open barrier file %s", barrierName)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General Go practice is to avoid panic() except for indicating purely-internal errors, usually programmer mistakes.
Maybe ping another reviewer for a third opinion?
I've kind of intentionally backed off these PRs bc there were getting to be too many people involved and I didn't to create distractions. I'm reading the comments to learn but not asking questions or making comments so that the discussion can stay focused on getting things across the line. My general opinion on panics or faults is that they should be used when you have entered a state that is unrecoverable (that is, the program has entered a state that should be impossible and where it can no longer trust any decision it makes after that). |
ajayvijayakumar123
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM % a suggestion !
tdq45gj
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
mongodump/barrier.go
Outdated
| time.Sleep(500 * time.Millisecond) | ||
| } else { | ||
| // Any other error implies that the resmoke test environment is | ||
| // irretrievably confused, so it is appropriate to panic here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment might need to be updated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
FGasper
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM % @tdq45gj’s request to update the comment about panicking.
REP-5475 - Changes required for the mongodump passthrough testing
New mongodump option --internalOnlySourceWritesDoneBarrier to synchronize with resmoke tests
mongodump/barrier.go implements the wait-until-named-file-exists barrier