Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

manager: Always run the watch server #2323

Merged
merged 1 commit into from
Jul 20, 2017
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 7 additions & 7 deletions manager/manager.go
Original file line number Diff line number Diff line change
Expand Up @@ -491,6 +491,10 @@ func (m *Manager) Run(parent context.Context) error {
healthServer.SetServingStatus("Raft", api.HealthCheckResponse_NOT_SERVING)
localHealthServer.SetServingStatus("ControlAPI", api.HealthCheckResponse_NOT_SERVING)

if err := m.watchServer.Start(ctx); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-blocking: probably doesn't matter much, but would it make sense to start this after the raft node has started up and joined (when the other watches are set)? Otherwise if something starts watching right away, would they get the events from the raft node starting up and loading all of its state from disk?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, where in the code would you suggest? I don't think the end of this startup process is signaled to higher-level code, but I may be forgetting something.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, there's no guarantee or anything. But we start the other watches at https://github.com/docker/swarmkit/pull/2323/files#diff-8077df928eb040c7c69eea83f15e3c9dL542, after we've finished loading raft state from disk and we've observed a leader and cluster state - possibly starting the watch server can also happen at that time?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the suggestion, but I'm worried it would trigger the error here, which is not recoverable:

https://github.com/moby/moby/blob/8d703b98b5c403743bf17e22395e32a7271b8d3c/daemon/cluster/noderunner.go#L212-L215

I'd prefer to just fix the regression in this PR, and a future change that's coordinated with some added retry/backoff in moby/moby could delay starting the watch server.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok, sounds good.

log.G(ctx).WithError(err).Error("watch server failed to start")
}

go m.serveListener(ctx, m.remoteListener)
go m.serveListener(ctx, m.controlListener)

Expand Down Expand Up @@ -566,8 +570,8 @@ func (m *Manager) Run(parent context.Context) error {
const stopTimeout = 8 * time.Second

// Stop stops the manager. It immediately closes all open connections and
// active RPCs as well as stopping the scheduler. If clearData is set, the
// raft logs, snapshots, and keys will be erased.
// active RPCs as well as stopping the manager's subsystems. If clearData is
// set, the raft logs, snapshots, and keys will be erased.
func (m *Manager) Stop(ctx context.Context, clearData bool) {
log.G(ctx).Info("Stopping manager")
// It's not safe to start shutting down while the manager is still
Expand Down Expand Up @@ -601,6 +605,7 @@ func (m *Manager) Stop(ctx context.Context, clearData bool) {

m.dispatcher.Stop()
m.logbroker.Stop()
m.watchServer.Stop()
m.caserver.Stop()

if m.allocator != nil {
Expand Down Expand Up @@ -1006,10 +1011,6 @@ func (m *Manager) becomeLeader(ctx context.Context) {
log.G(ctx).WithError(err).Error("LogBroker failed to start")
}

if err := m.watchServer.Start(ctx); err != nil {
log.G(ctx).WithError(err).Error("watch server failed to start")
}

go func(server *ca.Server) {
if err := server.Run(ctx); err != nil {
log.G(ctx).WithError(err).Error("CA signer exited with an error")
Expand Down Expand Up @@ -1062,7 +1063,6 @@ func (m *Manager) becomeLeader(ctx context.Context) {
func (m *Manager) becomeFollower() {
m.dispatcher.Stop()
m.logbroker.Stop()
m.watchServer.Stop()
m.caserver.Stop()

if m.allocator != nil {
Expand Down