Retry upserts that fail with "duplicate key error" #291

Merged
merged 16 commits into from
Jul 14, 2016

Conversation

babbageclunk
Contributor

According to the Mongo docs (link below), getting this error from an upsert is expected
behaviour and the client should retry if it happens.
https://docs.mongodb.com/v3.2/reference/method/db.collection.update/#use-unique-indexes

It doesn't seem to be specific to Mongo 3.2, but I think that the performance characteristics have changed in a way that makes it much more common than it is with 2.x.

Looking through the code revealed that removes had the same problem - I verified this by modifying the repro code to do removes instead.

This fixes https://github.com/go-mgo/mgo/issues/277
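
To illustrate why a retry is enough, here is a minimal sketch of the race using a toy in-memory collection rather than a real server. The `store` type, its `exists` and `write` methods, and the `errDup` value are all hypothetical stand-ins invented for this example; they are not part of mgo or MongoDB. The sketch assumes an upsert behaves like a lookup followed by a write, which is how the linked docs describe the failure mode.

```go
package main

import (
	"errors"
	"fmt"
)

// errDup is a hypothetical stand-in for the server's duplicate key error (code 11000).
var errDup = errors.New("E11000 duplicate key error")

// store is a toy collection with a unique index on its key.
type store struct {
	docs map[string]int
}

// exists is the lookup half of a simulated upsert.
func (s *store) exists(key string) bool {
	_, ok := s.docs[key]
	return ok
}

// write is the second half: it updates if the earlier lookup found a
// document, and inserts otherwise. The insert fails with errDup if another
// writer created the document after the lookup.
func (s *store) write(key string, val int, found bool) error {
	if found {
		s.docs[key] = val
		return nil
	}
	if _, ok := s.docs[key]; ok {
		return errDup
	}
	s.docs[key] = val
	return nil
}

func main() {
	s := &store{docs: map[string]int{}}

	// Two clients both look the key up before either has written.
	foundA := s.exists("k")
	foundB := s.exists("k")

	errA := s.write("k", 1, foundA) // client A inserts
	errB := s.write("k", 2, foundB) // client B loses the race
	fmt.Println("A:", errA, "B:", errB)

	// Retrying B from scratch turns the upsert into a plain update.
	errB = s.write("k", 2, s.exists("k"))
	fmt.Println("B retry:", errB, "value:", s.docs["k"])
}
```

On the retry the document already exists, so the second attempt takes the update path and cannot hit the unique index again for the same reason.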

According to the Mongo docs (link below), getting this error is expected
behaviour and the client should retry if it happens.
https://docs.mongodb.com/v3.2/reference/method/db.collection.update/#use-unique-indexes

This fixes https://github.com/go-mgo/mgo/issues/277
@niemeyer
Contributor

Any chance we can get a test?

Also, I wonder if we should be retrying in mgo itself. Any reason not to?

@babbageclunk
Contributor Author

Good point, the retry should definitely be in mgo, rather than mgo.txn. I'll move it. Are you alright with the infinite loop to do the retry? Or should I be doing something a bit cleverer?

Sorry, I should have added a test - it's fiddly to do, but it looks like there are already some other tests trying to reproduce racy things.

Also, use IsDup to detect the duplicate key errors - it looks like
that's been accreted through painful experience.
@babbageclunk
Contributor Author

Hi, sorry for the delay on this - I've moved the retrying into mgo, in Query.Apply and Collection.Upsert. Unfortunately I've had real trouble reproducing the problem in a test at that level - no matter how I try I can't get the upserts to fail with an 11000 error.

However, when I ran the txn tests, TestTxnQueueStashStressTest was reliably failing before the change and now passes consistently with it. (I don't really understand why the txn tests aren't run by go test ./... from the mgo directory.)

@babbageclunk
Contributor Author

It seems like the -check.v in .travis.yml means the mgo/txn tests aren't being run - is that deliberate? I've added a line to run the txn tests as well.

@babbageclunk
Contributor Author

Hmm - that's unfortunate. It looks like TestTxnQueueStressTest fails intermittently on the parent branch as well. I'm going to pull the change to turn on the mgo/txn tests out of this PR into a new one and add a skip for that test until someone can work out what's going wrong.

@babbageclunk
Contributor Author

Sigh - no, of course, that doesn't work because this PR fixes TestTxnQueueStashStressTest. So I'm including the skip here.

At the moment it fails about 25% of the time - the reason needs to be worked out, but it's probably better to get the txn tests running first.

It fails sometimes in a way that seems like a timing issue - put a sleep in before retrying (up to 3 attempts).

If 10 1-second waits aren't enough then it's likely that the "no reachable servers" error isn't transient.

Make StartServer dial the server first to ensure it's up before it returns.

Error logging showed that something else was already using port 50017.

Picking a random port still gets collisions :(, so try picking an unused
one instead.
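
The two test-harness ideas in these commit messages (pick an unused port rather than a random one, and dial the server before treating it as started) can be sketched with the standard library alone. This is not the dbtest code from the PR; `freePort`, `listenOn` and `waitForPort` are hypothetical helper names, and a plain TCP listener stands in for the mongod process.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// freePort asks the kernel for an unused TCP port by listening on port 0,
// then releases it. Another process could still take the port before it is
// reused, so this narrows the race rather than removing it.
func freePort() (int, error) {
	l, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, err
	}
	defer l.Close()
	return l.Addr().(*net.TCPAddr).Port, nil
}

// listenOn is a stand-in for starting the real server on the chosen port.
func listenOn(port int) (net.Listener, error) {
	return net.Listen("tcp", fmt.Sprintf("127.0.0.1:%d", port))
}

// waitForPort dials the port until a connection succeeds or the attempts
// run out, sleeping for delay between attempts.
func waitForPort(port, attempts int, delay time.Duration) error {
	addr := fmt.Sprintf("127.0.0.1:%d", port)
	var lastErr error
	for i := 0; i < attempts; i++ {
		conn, err := net.DialTimeout("tcp", addr, delay)
		if err == nil {
			conn.Close()
			return nil
		}
		lastErr = err
		time.Sleep(delay)
	}
	return fmt.Errorf("nothing listening on %s after %d attempts: %v", addr, attempts, lastErr)
}

func main() {
	port, err := freePort()
	if err != nil {
		fmt.Println("could not find a free port:", err)
		return
	}
	l, err := listenOn(port)
	if err != nil {
		fmt.Println("could not start the stand-in server:", err)
		return
	}
	defer l.Close()
	if err := waitForPort(port, 10, 100*time.Millisecond); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("server is accepting connections on port", port)
}
```

Bounding the number of dial attempts keeps a genuinely broken server from hanging the test run, which matches the reasoning about transient versus non-transient errors above.
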
@babbageclunk
Contributor Author

To show that this change does fix the duplicate key error, here's a build of a branch that has the txn tests turned on without the upsert retrying - TestTxnQueueStashStressTest fails in the mongo 3.2 test run.

@@ -2484,7 +2484,15 @@ func (c *Collection) Upsert(selector interface{}, update interface{}) (info *Cha
 		Flags:      1,
 		Upsert:     true,
 	}
-	lerr, err := c.writeOp(&op, true)
+	var lerr *LastError
+	for {
Contributor

Let's please start by retrying at most 5 times here and in the other loop. More than that there's a good chance that there's something else wrong.

Contributor Author

Done

@babbageclunk
Contributor Author

Thanks for going through this!

This has better handling of used ports and error reporting. Remove
mgo_test.go, it's not needed now. sim_test.go:simulate was relying on
the mgoaddr global set in mgo_test.go, changed it to get the DBServer
passed in.
This is needed so the txn tests can use dbtest.
@babbageclunk
Contributor Author

babbageclunk commented Jul 10, 2016

This is a spurious failure - the script to start the various mongo processes and set up users failed with "couldn't add user: not master".
The same commit passed on Travis here: https://travis-ci.org/babbageclunk/mgo/builds/143769207


	if err == nil {
		break
	} else if change.Upsert && IsDup(err) {
Contributor

Given that the else is redundant, perhaps leave it out and put the if on a new line?

Contributor Author

Removing those two elses and making the ifs independent reads much better, thanks
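
For readers following the review, the loop shape being discussed (a bounded number of attempts, with independent `if` checks instead of `else if`) might look roughly like the sketch below. It is not the code that was merged: `retryOnDup`, `maxRetries`, `isDup`, `errDup` and `errNotMaster` are hypothetical names for this example, and `isDup` only stands in for a duplicate-key check such as mgo's `IsDup`.

```go
package main

import (
	"errors"
	"fmt"
)

// maxRetries is a hypothetical constant; the review asked for at most 5 attempts.
const maxRetries = 5

var (
	// errDup stands in for the server's duplicate key error.
	errDup = errors.New("E11000 duplicate key error")
	// errNotMaster stands in for any other, non-retryable error.
	errNotMaster = errors.New("not master")
)

// isDup is a stand-in for a duplicate-key check such as mgo's IsDup.
func isDup(err error) bool {
	return errors.Is(err, errDup)
}

// retryOnDup calls op until it succeeds, fails with something other than a
// duplicate key error, or maxRetries attempts have been made. It returns the
// number of attempts and the error from the last one.
func retryOnDup(op func() error) (attempts int, err error) {
	for attempts < maxRetries {
		attempts++
		err = op()
		if err == nil {
			return attempts, nil
		}
		if !isDup(err) {
			// Not a duplicate key error. That does not mean the operation
			// succeeded, so the error is handed back unchanged.
			return attempts, err
		}
	}
	return attempts, err
}

func main() {
	calls := 0
	attempts, err := retryOnDup(func() error {
		calls++
		if calls < 3 {
			return errDup // lose the race twice
		}
		return nil
	})
	fmt.Println("attempts:", attempts, "err:", err)

	attempts, err = retryOnDup(func() error { return errNotMaster })
	fmt.Println("attempts:", attempts, "err:", err)
}
```

Returning the attempt count leaves it to the caller to decide whether and how to log retries.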

@rogpeppe
Contributor

LGTM FWIW. What an unfortunate semantic.

@rogpeppe
Contributor

rogpeppe commented Jul 12, 2016

BTW it would be great if this could land soon, as it fixes a current critical bug in juju.

@@ -2484,7 +2487,15 @@ func (c *Collection) Upsert(selector interface{}, update interface{}) (info *Cha
 		Flags:      1,
 		Upsert:     true,
 	}
-	lerr, err := c.writeOp(&op, true)
+	var lerr *LastError
+	for i := 0; i < maxUpsertRetries; i++ {
Contributor

I believe mgo has the ability to hook in a log function. Can we get retries like this into that mechanism, so we have an idea of how often it might be triggering?
Maybe something like logging only on success if i > 0?

Contributor Author

Done

Clarify the retry logic in Query.Apply.
Clarify what happens when test fails and what it might mean.
	// https://docs.mongodb.com/v3.2/reference/method/db.collection.update/#use-unique-indexes
	if !IsDup(err) {
		if i > 0 {
			debugf("upsert retry succeeded after %d failure(s)", i)
Contributor

The error message is bogus. Not having a dup error does not mean the upsert succeeded.

Please drop the debug message altogether. If we have debug on, we'll see the attempts going through.

@niemeyer
Contributor

Merging, will fix that afterwards.

@niemeyer niemeyer merged commit aee6a64 into go-mgo:v2-unstable Jul 14, 2016
@babbageclunk
Contributor Author

Thanks!

@babbageclunk
Contributor Author

I've got a branch with these same changes against the v2 branch. Do you want me to make another PR for that, or will you just merge this across to there?

babbageclunk added a commit to babbageclunk/juju that referenced this pull request Aug 18, 2016
This version includes the fixes for the duplicate key error on upsert,
which means we don't need the patch any more.

go-mgo/mgo#291
go-mgo/mgo#316
jujubot added a commit to juju/juju that referenced this pull request Aug 22, 2016
Update mgo dependency

This version includes the fixes for the duplicate key error on upsert, which means we don't need the patch any more.

[go-mgo/mgo#291](go-mgo/mgo#291)
[go-mgo/mgo#316](go-mgo/mgo#316)

Add a readme to keep the patches directory alive and explain its use. mgz has made a change to apply_patches.py to ignore any files with a different extension.

(Review request: http://reviews.vapour.ws/r/5477/)