Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ca: Don't block server's Stop until handlers finish #1841

Merged
merged 1 commit into from Jan 10, 2017

Conversation

aaronlehmann
Copy link
Collaborator

The ca server's Stop method waits until all running handlers finish.
This may deadlock if quorum is not met at the time. Some of the other
services like dispatcher used to work like this, but it shouldn't be
necessary anymore. Remove the wait group.

cc @cyli @LK4D4

The ca server's Stop method waits until all running handlers finish.
This may deadlock if quorum is not met at the time. Some of the other
services like dispatcher used to work like this, but it shouldn't be
necessary anymore. Remove the wait group.

Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
@cyli
Copy link
Contributor

cyli commented Jan 5, 2017

LGTM - it seems reasonable to me that we don't have to wait for the handlers to finish - if there is an error seems like the client can just retry and contact another CA server (if it's stopping because it's no longer the leader).

Could you explain what causes the deadlock though? The handlers may take a long time to finish because without quorum it cannot read or update data, but does the CA not being able to be stopped prevent quorum from being achieved? Or is it that the manager cannot be stopped because the manager raft server shuts down first, and if quorum is lost then the CA server can't shut down? Or something else entirely?

@aaronlehmann
Copy link
Collaborator Author

The deadlock turned out to be a more general problem, and I'm taking some steps to deal with it in #1829. Think of this one as more of a simplification.

Basically, you'd call Stop and it would block forever because it was waiting for the handler to finish, and the handler was waiting for something to commit to raft.

@codecov-io
Copy link

codecov-io commented Jan 5, 2017

Current coverage is 54.48% (diff: 71.42%)

Merging #1841 into master will decrease coverage by 0.16%

@@             master      #1841   diff @@
==========================================
  Files           102        102          
  Lines         17025      17018     -7   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
- Hits           9304       9272    -32   
- Misses         6585       6603    +18   
- Partials       1136       1143     +7   

Sunburst

Powered by Codecov. Last update f355ca1...65de375

@cyli
Copy link
Contributor

cyli commented Jan 5, 2017

Ah ok, thanks for explaining!

@dperny
Copy link
Collaborator

dperny commented Jan 9, 2017

This looks good to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants