support cluster events #32421

Merged
merged 1 commit into moby:master on May 17, 2017

@dongluochen
Contributor

dongluochen commented Apr 6, 2017

This PR is ready for review.

===
This is a work in progress. Tests still need to be added. It also needs a change in the Swarmkit Watch API (docker/swarmkit#2099). The initial design discussion is at docker/swarmkit#491. I'd like to get design feedback.

Signed-off-by: Dong Chen <dongluo.chen@docker.com>

- What I did
Add swarm events to docker event stream.

- How I did it
Use the swarm store Watch API to receive change notifications from the Raft store, translate them into Docker event format, and push them into the Docker event stream.
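
A minimal sketch of the translation step, assuming simplified stand-in types. WatchEvent, Message, Node, Service, and translate are hypothetical names for illustration, not swarmkit's or the daemon's actual API. The scope value "global" comes from the description below.

```go
package main

import "fmt"

// Node and Service are hypothetical, trimmed-down stand-ins for the
// swarmkit store objects; the real types live in swarmkit's api package.
type Node struct {
	ID   string
	Name string
}

type Service struct {
	ID   string
	Name string
}

// WatchEvent is a hypothetical, simplified store notification: an action
// plus the current object and, for updates, the previous version.
type WatchEvent struct {
	Action    string // "create", "update" or "remove"
	Object    interface{}
	OldObject interface{}
}

// Message loosely mirrors a Docker event: type, action, actor ID,
// actor attributes and scope.
type Message struct {
	Type       string
	Action     string
	ID         string
	Attributes map[string]string
	Scope      string
}

// translate turns one watch event into a Docker-style event. It reports
// false for object types it does not know how to translate.
func translate(ev WatchEvent) (Message, bool) {
	msg := Message{
		Action:     ev.Action,
		Scope:      "global", // cluster events are global, per the PR description
		Attributes: map[string]string{},
	}
	switch obj := ev.Object.(type) {
	case *Node:
		msg.Type, msg.ID = "node", obj.ID
		msg.Attributes["name"] = obj.Name
	case *Service:
		msg.Type, msg.ID = "service", obj.ID
		msg.Attributes["name"] = obj.Name
	default:
		return Message{}, false
	}
	return msg, true
}

func main() {
	ev := WatchEvent{Action: "create", Object: &Service{ID: "9vvofszhb6iv4k3tmphras96u", Name: "nginx"}}
	if msg, ok := translate(ev); ok {
		// Prints: service create 9vvofszhb6iv4k3tmphras96u (name=nginx)
		fmt.Printf("%s %s %s (name=%s)\n", msg.Type, msg.Action, msg.ID, msg.Attributes["name"])
	}
}
```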

- How to verify it
On a manager node, docker events includes both the node's local events and cluster events. Cluster events have global scope, while the node's own events have local scope. Existing event filters apply to cluster events, and a new scope filter is added.

# run this on a manager node
$ docker events -f scope=global

# a node joins a cluster
2017-04-06T18:03:46.551104594Z node create s0ugk1wi0vgxnspxx0ptmon47 (name=)
2017-04-06T18:03:46.553642227Z node update s0ugk1wi0vgxnspxx0ptmon47 (name=)
2017-04-06T18:03:46.556070184Z node update s0ugk1wi0vgxnspxx0ptmon47 (name=)
2017-04-06T18:03:46.562025609Z node update s0ugk1wi0vgxnspxx0ptmon47 (name=)
2017-04-06T18:03:46.689608127Z node update s0ugk1wi0vgxnspxx0ptmon47 (name=ip-172-19-71-145, state.new=ready, state.old=unknown)
# a node goes down
2017-04-06T18:04:32.705082118Z node update 0urgktyobae3etk5e4n4331es (name=ip-172-19-147-51, state.new=down, state.old=ready)
# a node is back up
2017-04-06T18:05:06.288643169Z node update 0urgktyobae3etk5e4n4331es (name=ip-172-19-147-51, state.new=ready, state.old=down)
# promote a node
2017-04-06T18:05:29.965972204Z node update 0urgktyobae3etk5e4n4331es (desiredrole.new=manager, desiredrole.old=worker, name=ip-172-19-147-51)
2017-04-06T18:05:29.972791974Z node update 0urgktyobae3etk5e4n4331es (name=ip-172-19-147-51)
2017-04-06T18:05:29.992599632Z node update 0urgktyobae3etk5e4n4331es (name=ip-172-19-147-51)
2017-04-06T18:05:30.007889135Z node update 0urgktyobae3etk5e4n4331es (name=ip-172-19-147-51)
2017-04-06T18:05:30.084421029Z node update 0urgktyobae3etk5e4n4331es (name=ip-172-19-147-51)
2017-04-06T18:05:30.088454689Z node update 0urgktyobae3etk5e4n4331es (name=ip-172-19-147-51)
# demote a node
2017-04-06T18:06:15.004097376Z node update 0urgktyobae3etk5e4n4331es (desiredrole.new=worker, desiredrole.old=manager, name=ip-172-19-147-51)
2017-04-06T18:06:20.015726988Z node update 0urgktyobae3etk5e4n4331es (name=ip-172-19-147-51)
2017-04-06T18:06:20.035951960Z node update 0urgktyobae3etk5e4n4331es (name=ip-172-19-147-51)
2017-04-06T18:06:20.048380156Z node update 0urgktyobae3etk5e4n4331es (name=ip-172-19-147-51)
2017-04-06T18:06:20.068198599Z node update 0urgktyobae3etk5e4n4331es (name=ip-172-19-147-51)
2017-04-06T18:06:20.072033661Z node update 0urgktyobae3etk5e4n4331es (name=ip-172-19-147-51)
# change a node's availability to pause
2017-04-06T18:07:01.620569322Z node update 0urgktyobae3etk5e4n4331es (availability.new=pause, availability.old=active, name=ip-172-19-147-51)
# change a node's availability to active
2017-04-06T18:07:35.825859057Z node update 0urgktyobae3etk5e4n4331es (availability.new=active, availability.old=pause, name=ip-172-19-147-51)

# create a service
2017-04-06T18:08:43.303340796Z service create 9vvofszhb6iv4k3tmphras96u (name=nginx)
2017-04-06T18:08:43.307625739Z service update 9vvofszhb6iv4k3tmphras96u (name=nginx)
# scale a service
2017-04-06T18:09:37.018798236Z service update 9vvofszhb6iv4k3tmphras96u (name=nginx, replicas.new=3, replicas.old=2)
# update image of a service
2017-04-06T18:11:19.122732546Z service update 9vvofszhb6iv4k3tmphras96u (image.new=nginx:1.10.3@sha256:6202beb06ea61f44179e02ca965e8e13b961d12640101fca213efbfd145d7575, image.old=nginx:latest@sha256:e6693c20186f837fc393390135d8a598a96a833917917789d63766cab6c59582, name=nginx)
2017-04-06T18:11:19.126619069Z service update 9vvofszhb6iv4k3tmphras96u (name=nginx, updatestate.new=updating, updatestate.old=nil)
2017-04-06T18:11:41.552581741Z service update 9vvofszhb6iv4k3tmphras96u (name=nginx, updatestate.new=completed, updatestate.old=updating)
# remove a service
2017-04-06T18:12:23.466026101Z service remove 9vvofszhb6iv4k3tmphras96u (name=nginx)

# create a network
2017-04-06T18:12:45.047497468Z network create qew8jjh6riv75r8tj4wf0imfw (name=tnet)
2017-04-06T18:12:45.053957928Z network update qew8jjh6riv75r8tj4wf0imfw (name=tnet)
# create a container associated with this network
# create a service associated with this network
2017-04-06T18:14:38.161988206Z service create v6md2fr205au7c6wq5zdo6sz1 (name=nginx)
2017-04-06T18:14:38.166951070Z service update v6md2fr205au7c6wq5zdo6sz1 (name=nginx)
# remove the service
2017-04-06T18:15:29.441350106Z service remove v6md2fr205au7c6wq5zdo6sz1 (name=nginx)
# remove a network
2017-04-06T18:15:43.558832104Z network remove qew8jjh6riv75r8tj4wf0imfw (name=tnet)

# create a secret
2017-04-06T18:16:27.127307662Z secret create p3z8b3f2q40fun94yq3vd2g9y (name=mysecret)
# remove a secret
2017-04-06T18:16:55.393649748Z secret remove p3z8b3f2q40fun94yq3vd2g9y (name=mysecret)
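
The scope filter described above can be sketched as a simple predicate over each event's scope. This is a hypothetical illustration, not the daemon's filter code: Event, matchesScope, and filterEvents are made up here, and only the scope values local and global come from the description.

```go
package main

import "fmt"

// Event is a hypothetical, trimmed-down event carrying only what the
// scope filter needs.
type Event struct {
	Action string
	Scope  string // "local" for the node's own events, "global" for cluster events
}

// matchesScope reports whether ev passes a scope filter. An empty filter
// list matches every event, like an unset filter.
func matchesScope(ev Event, scopes []string) bool {
	if len(scopes) == 0 {
		return true
	}
	for _, s := range scopes {
		if ev.Scope == s {
			return true
		}
	}
	return false
}

// filterEvents keeps, in order, the events that pass the scope filter.
func filterEvents(events []Event, scopes []string) []Event {
	var out []Event
	for _, ev := range events {
		if matchesScope(ev, scopes) {
			out = append(out, ev)
		}
	}
	return out
}

func main() {
	stream := []Event{
		{Action: "container start", Scope: "local"},
		{Action: "node update", Scope: "global"},
		{Action: "service create", Scope: "global"},
	}
	// Similar in spirit to: docker events -f scope=global
	for _, ev := range filterEvents(stream, []string{"global"}) {
		fmt.Println(ev.Action) // node update, then service create
	}
}
```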

- Description for the changelog
Add cluster events to Docker event stream.

dongluochen May 9, 2017

Contributor

Test failure looks unrelated. It's already tracked by #33041.

23:20:09 ----------------------------------------------------------------------
23:20:09 FAIL: check_test.go:355: DockerSwarmSuite.TearDownTest
23:20:09 
23:20:09 check_test.go:360:
23:20:09     d.Stop(c)
23:20:09 daemon/daemon.go:392:
23:20:09     t.Fatalf("Error while stopping the daemon %s : %v", d.id, err)
23:20:09 ... Error: Error while stopping the daemon d95be62fe1bc6 : exit status 2

@dongluochen dongluochen changed the title from [WIP] support cluster events to support cluster events May 9, 2017

dongluochen May 9, 2017

Contributor

Please review.

case *swarmapi.Object_Network:
daemon.logNetworkEvent(event.Action, v.Network, event.OldObject.GetNetwork())
case *swarmapi.Object_Secret:
daemon.logSecretEvent(event.Action, v.Secret, event.OldObject.GetSecret())

AkihiroSuda May 9, 2017

Member

Add a logrus.Warn in the default case?
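
A minimal sketch of what the suggestion could look like for a dispatch switch like the one quoted above. Node, Network, and logClusterEvent are hypothetical, and the sketch uses the standard library's log package in place of logrus to stay self-contained.

```go
package main

import (
	"fmt"
	"log"
)

// Node and Network are hypothetical stand-ins for swarmkit object types.
type Node struct{ ID string }

type Network struct{ ID string }

// logClusterEvent dispatches on the concrete object type and returns a
// short description. The default branch logs a warning rather than
// silently dropping an object type the switch does not handle.
func logClusterEvent(obj interface{}) string {
	switch v := obj.(type) {
	case *Node:
		return "node " + v.ID
	case *Network:
		return "network " + v.ID
	default:
		log.Printf("warning: unhandled cluster event object type %T", v)
		return ""
	}
}

func main() {
	fmt.Println(logClusterEvent(&Node{ID: "n1"})) // node n1
	logClusterEvent("unexpected")                 // logs a warning, returns ""
}
```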

AkihiroSuda commented May 9, 2017

Outdated review threads on daemon/cluster/cluster.go, daemon/cluster/noderunner.go (four threads), daemon/events.go (four threads), and daemon/events/events.go.
}
switch v := event.Object.GetObject().(type) {
case *swarmapi.Object_Node:
daemon.logNodeEvent(event.Action, v.Node, event.OldObject.GetNode())

aaronlehmann May 10, 2017

Contributor

Don't we need to check if OldObject is nil before calling GetNode, etc.?

dongluochen May 10, 2017

Contributor

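GetNode already handles a nil OldObject: it calls GetObject, which returns nil on a nil receiver, so the type assertion fails and GetNode returns nil.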

   func (m *Object) GetNode() *Node {
       if x, ok := m.GetObject().(*Object_Node); ok {
           return x.Node
       }
       return nil
   }

   func (m *Object) GetObject() isObject_Object {
       if m != nil {
           return m.Object
       }
       return nil
   }
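
A self-contained illustration of why the nil check is unnecessary: in Go, a method with a pointer receiver can be called on a nil pointer, and a type assertion on a nil interface just yields ok == false. The types below are a simplified, hypothetical mirror of the generated getters quoted above, not the real swarmkit code.

```go
package main

import "fmt"

// Simplified, hypothetical mirror of a generated oneof wrapper.
type isObjectObject interface{ isObjectObject() }

type Node struct{ ID string }

type ObjectNode struct{ Node *Node }

func (*ObjectNode) isObjectObject() {}

type Object struct{ Object isObjectObject }

// GetObject is safe on a nil *Object: it checks the receiver before
// dereferencing it.
func (m *Object) GetObject() isObjectObject {
	if m != nil {
		return m.Object
	}
	return nil
}

// GetNode returns nil when m is nil or holds a non-node object, because
// the type assertion on a nil interface value reports ok == false.
func (m *Object) GetNode() *Node {
	if x, ok := m.GetObject().(*ObjectNode); ok {
		return x.Node
	}
	return nil
}

func main() {
	var old *Object                   // no previous version of the object
	fmt.Println(old.GetNode() == nil) // true, and no nil-pointer panic
	cur := &Object{Object: &ObjectNode{Node: &Node{ID: "n1"}}}
	fmt.Println(cur.GetNode().ID) // n1
}
```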

aaronlehmann May 10, 2017

Contributor

LGTM

aaronlehmann May 11, 2017

Contributor

Please rebase

thaJeztah May 11, 2017

Member

@dongluochen it looks like CI failed on a TearDown. Could this be related?

21:22:11 
21:22:11 ----------------------------------------------------------------------
21:22:11 FAIL: check_test.go:355: DockerSwarmSuite.TearDownTest
21:22:11 
21:22:11 check_test.go:360:
21:22:11     d.Stop(c)
21:22:11 daemon/daemon.go:392:
21:22:11     t.Fatalf("Error while stopping the daemon %s : %v", d.id, err)
21:22:11 ... Error: Error while stopping the daemon d7e64b157df6f : exit status 2
21:22:11 
21:22:11 
21:22:11 ----------------------------------------------------------------------
21:22:11 PANIC: docker_api_swarm_service_test.go:27: DockerSwarmSuite.TestAPIServiceUpdatePort
21:22:11 
21:22:11 [d7e64b157df6f] waiting for daemon to start
21:22:11 [d7e64b157df6f] daemon started
21:22:11 
21:22:11 [d7e64b157df6f] exiting daemon
21:22:11 ... Panic: Fixture has panicked (see related PANIC)
21:22:11 
21:22:11 ----------------------------------------------------------------------

aaronlehmann May 11, 2017

Contributor

I think it is related.

dongluochen May 11, 2017

Contributor

Looking at it. I saw a similar failure in #33041, so I didn't pay attention to it. I should have looked into it.

dongluochen May 16, 2017

Contributor

Ping @AkihiroSuda @vdemeester @tonistiigi. Please take a look if you are interested.

@thaJeztah thaJeztah added this to the 17.06.0 milestone May 17, 2017

cmd/dockerd/daemon.go (outdated)
@@ -227,13 +228,18 @@ func (cli *DaemonCli) start(opts daemonOptions) (err error) {
name, _ := os.Hostname()
// Use a buffered channel to pass changes from store watch API to daemon
// A buffer allows store watch API and daemon processing to not wait for each other
watchStream := make(chan *swarmapi.WatchMessage, 32)
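
The diff comment above explains the buffered channel. As a rough, self-contained illustration of that decoupling (WatchMessage and pump are hypothetical, not the daemon's actual wiring), a producer goroutine can fill a channel of capacity 32 while a consumer drains it:

```go
package main

import "fmt"

// WatchMessage is a hypothetical placeholder for swarmapi.WatchMessage.
type WatchMessage struct{ Action string }

// pump sends msgs from a producer goroutine through a channel with the
// given buffer size and drains it in the caller, returning the actions in
// the order they were consumed. With a buffer, the producer can run up to
// cap(ch) messages ahead of the consumer before a send blocks.
func pump(msgs []WatchMessage, buffer int) []string {
	ch := make(chan *WatchMessage, buffer)
	go func() {
		defer close(ch) // tells the consumer no more messages will arrive
		for i := range msgs {
			ch <- &msgs[i]
		}
	}()
	var seen []string
	for m := range ch {
		seen = append(seen, m.Action)
	}
	return seen
}

func main() {
	msgs := []WatchMessage{{Action: "create"}, {Action: "update"}, {Action: "remove"}}
	fmt.Println(pump(msgs, 32)) // [create update remove]
}
```

With a single producer and a single consumer, the channel preserves order; the buffer only changes how long either side can proceed without waiting for the other.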

tonistiigi May 17, 2017

Member

No need to change, but I'm wondering why you chose to pass in a channel and handle this in dockerd instead of defining a LogEvent interface in cluster that would be called to process events.

dongluochen May 17, 2017

Contributor

We decided to add a Watch API to expose cluster changes, but not a separate LogEvent interface, for simplicity.
docker/swarmkit#491 (comment)

tonistiigi May 17, 2017

Member

I meant an interface in Docker's cluster package, like cluster.Backend or cluster.NetworkSubnetsProvider, that would be implemented by daemon.Daemon.
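
A minimal sketch of the interface-based alternative being suggested, under stated assumptions: ClusterEventLogger, Daemon, Cluster, and handleWatch are hypothetical names chosen for illustration, not existing Docker types or methods.

```go
package main

import "fmt"

// WatchMessage is a hypothetical stand-in for swarmapi.WatchMessage.
type WatchMessage struct{ Action, Kind, ID string }

// ClusterEventLogger is a hypothetical interface the cluster package could
// declare, in the spirit of cluster.Backend; the daemon would implement it.
type ClusterEventLogger interface {
	LogClusterEvent(msg *WatchMessage)
}

// Daemon is a hypothetical stand-in for daemon.Daemon.
type Daemon struct{ logged []string }

// LogClusterEvent records a one-line description of the event.
func (d *Daemon) LogClusterEvent(msg *WatchMessage) {
	d.logged = append(d.logged, fmt.Sprintf("%s %s %s", msg.Kind, msg.Action, msg.ID))
}

// Cluster holds the interface value instead of a channel, so it calls
// straight into the daemon for each watch message it receives.
type Cluster struct{ events ClusterEventLogger }

func (c *Cluster) handleWatch(msgs []*WatchMessage) {
	for _, m := range msgs {
		c.events.LogClusterEvent(m)
	}
}

func main() {
	d := &Daemon{}
	c := &Cluster{events: d}
	c.handleWatch([]*WatchMessage{{Action: "create", Kind: "service", ID: "v6md2fr205au7c6wq5zdo6sz1"}})
	fmt.Println(d.logged[0]) // service create v6md2fr205au7c6wq5zdo6sz1
}
```

One trade-off worth noting: with direct calls, the watch loop waits for the daemon to process each event unless the implementation queues internally, which is the decoupling the buffered channel in the diff provides.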

dongluochen May 17, 2017

Contributor

I'll look at it in a separate change.

@vdemeester

LGTM 🦁 😍
(one small question, but can't wait to have that !!)

"name": name,
}
eventTime := eventTimestamp(node.Meta, action)
// In an update event, display the changes in attributes

vdemeester May 17, 2017

Member

We don't display label changes. Is that by design?

dongluochen May 17, 2017

Contributor

The attributes are extra information for an event. For example, the node update event 2017-04-06T18:07:01.620569322Z node update 0urgktyobae3etk5e4n4331es (availability.new=pause, availability.old=active, name=ip-172-19-147-51) contains a fixed part (timestamp, node, update, node ID) and an attributes part (availability.new, availability.old, name). We can add label changes to the attributes if they are useful.

Attributes are added to events in an ad hoc way. Since swarm objects carry a lot of information, revealing every change may reduce readability. For example, a node goes through a series of changes when joining a cluster; these are internal steps that users shouldn't need to inspect.

I'm not sure yet how this will be used. We are starting with basic events (docker/swarmkit#491 (comment)), and I expect they'll be extended based on user feedback.
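
A minimal sketch of the attribute scheme described above, assuming an object can be flattened into a map of tracked fields. That flattening, and the updateAttributes and format helpers, are hypothetical and for illustration only. Only fields whose value changed get .new and .old entries, and keys are printed sorted, as in the example output in this PR.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// updateAttributes always includes the name and, for each tracked field
// whose value changed, adds "<field>.new" and "<field>.old". Unchanged
// fields are omitted. Fields present only in oldFields are not reported.
func updateAttributes(name string, oldFields, newFields map[string]string) map[string]string {
	attrs := map[string]string{"name": name}
	for k, nv := range newFields {
		if ov := oldFields[k]; ov != nv {
			attrs[k+".new"] = nv
			attrs[k+".old"] = ov
		}
	}
	return attrs
}

// format renders attributes as "(k1=v1, k2=v2)" with keys sorted.
func format(attrs map[string]string) string {
	keys := make([]string, 0, len(attrs))
	for k := range attrs {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, k+"="+attrs[k])
	}
	return "(" + strings.Join(parts, ", ") + ")"
}

func main() {
	oldNode := map[string]string{"availability": "active", "state": "ready"}
	newNode := map[string]string{"availability": "pause", "state": "ready"}
	// Prints: (availability.new=pause, availability.old=active, name=ip-172-19-147-51)
	fmt.Println(format(updateAttributes("ip-172-19-147-51", oldNode, newNode)))
}
```

Label changes, as raised in this thread, could in principle be covered by flattening labels into the same map under a prefixed key; the sketch does not attempt that.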

thaJeztah May 17, 2017

Member

I've seen many tools use labels to add configuration that's used when listening for events; labels would be a likely candidate to add.

support cluster events
Signed-off-by: Dong Chen <dongluo.chen@docker.com>

@aaronlehmann aaronlehmann merged commit 3d63049 into moby:master May 17, 2017

6 checks passed

dco-signed All commits are signed
experimental Jenkins build Docker-PRs-experimental 34260 has succeeded
janky Jenkins build Docker-PRs 42860 has succeeded
powerpc Jenkins build Docker-PRs-powerpc 3245 has succeeded
windowsRS1 Jenkins build Docker-PRs-WoW-RS1 14099 has succeeded
z Jenkins build Docker-PRs-s390x 2964 has succeeded

@dongluochen dongluochen referenced this pull request May 18, 2017

Closed

API: Events stream #491

@WTFKr0 WTFKr0 referenced this pull request Jun 7, 2017

Open

Support Swarm events #920

@thaJeztah thaJeztah referenced this pull request Sep 20, 2017

Closed

Swarm events #23827
