Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposed fix for deadlock in globalsign/mgo#120 #121

Merged
merged 5 commits into from
Mar 22, 2018

Conversation

KJTsanaktsidis
Copy link

As discussed in the issue #120, isMaster() can cause a deadlock with the topology scanner if the connection it makes dies before running the command; mgo automagically attempts to make another socket in acquireSocket, but this can't work without topology.

This PR provides a test that fails ~20% of the time on my machine with the deadlock identified in the linked issue. It also proposes a fix which is to force isMaster to run its commands on the socket it sets on the session.

domodwyer
domodwyer previously approved these changes Mar 12, 2018
@domodwyer
Copy link

Hi @KJTsanaktsidis

Thanks a load! Having a reproducible test case makes this 1000x easier. Would you mind retargeting to development and we'll merge this.

Dom

@domodwyer
Copy link

Sorry it's Monday - fix included!

Amazing work - thanks very much - I thought I was going to spend today digging through the mgo code! I'll get this merged and tested.

Again, thanks!
Dom

We've seen a deadlock happen occasionally where syncServers needs to
acquire a socket to call isMaster, but the socket acquisition needs to
know the server topology which isn't known yet. See globalsign#120
issue for a detailed breakdown.

This replicates the issue by setting up a mongo "server" which closes
sockets as soon as they're opened; about 20% of the time, this will
trigger the deadlock because the acquired socket for ismaster() dies and
needs to be reacquired.
As discussed in the issue globalsign#120, isMaster() can cause a
deadlock with the topology scanner if the connection it makes dies
before running the command; mgo automagically attempts to make another
socket in acquireSocket, but this can't work without topology.

This commit forces isMaster() to actually run on the intended socket.
@KJTsanaktsidis KJTsanaktsidis changed the base branch from master to development March 12, 2018 21:49
@KJTsanaktsidis
Copy link
Author

Sorry, forgot about that. I've rebased - PTAL.

domodwyer
domodwyer previously approved these changes Mar 14, 2018
session.go Outdated
// RunOnSocket does the same as Run, but guarantees that your command will be run
// on the provided socket instance; if it's unhealthy, you will receive the error
// from it.
func (db *Database) RunOnSocket(socket *mongoSocket, cmd interface{}, result interface{}) error {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that accepting a package level parameter in a public method is a good idea.
There is no way this can be used outside of the mgo package.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@KJTsanaktsidis maybe make a private function instead of a public method?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I can definitely do this - don’t think I’ll get to it for a couple of days though, sorry :/

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@KJTsanaktsidis you could merge zendesk#1 if you agree with the change

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @dvic - I merged your change into this PR. Sorry this took so long for me to get to :/

@KJTsanaktsidis
Copy link
Author

@szank PTAL? @dvic fixed it for me :)

@szank szank merged commit 876956d into globalsign:development Mar 22, 2018
libi pushed a commit to libi/mgo that referenced this pull request Dec 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants