Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Question: Daemon “Node join event” logs in 17.06 #33962

Closed
kinghuang opened this issue Jul 5, 2017 · 16 comments
Closed

Question: Daemon “Node join event” logs in 17.06 #33962

kinghuang opened this issue Jul 5, 2017 · 16 comments

Comments

@kinghuang
Copy link

Description

Since upgrading my swarms to Docker CE 17.06, the Docker daemons have been endlessly outputting info log lines about “Node join event for (some node here)”. Is this normal? They are annoying in that there's a huge number of log lines being generated, but there hasn't been any noticeable ill-effect.

Steps to reproduce the issue:

  1. Given a 10-node Docker swarm on Docker CE 17.05, upgrade the entire swarm to 17.06.

Describe the results you received:

The Docker daemon logs on all the swarm nodes are endlessly outputting log lines about node join events. Here's a brief sample.

Jul 05 06:48:00 itrmsdev01.ucalgary.ca dockerd[61491]: time="2017-07-05T06:48:00.549832682-06:00" level=info msg="Node join event for itrmsdev06.ucalgary.ca-1765e9fefb75/10.41.149.149"
Jul 05 06:48:02 itrmsdev01.ucalgary.ca dockerd[61491]: time="2017-07-05T06:48:02.849150257-06:00" level=info msg="Node join event for itrmsdev07.ucalgary.ca-0550e75fb99a/10.41.149.147"
Jul 05 06:48:32 itrmsdev01.ucalgary.ca dockerd[61491]: time="2017-07-05T06:48:32.854937108-06:00" level=info msg="Node join event for itrmsdev08.ucalgary.ca-ba7e5b1ad050/10.41.149.87"
Jul 05 06:49:02 itrmsdev01.ucalgary.ca dockerd[61491]: time="2017-07-05T06:49:02.858112455-06:00" level=info msg="Node join event for itrmsdev06.ucalgary.ca-1765e9fefb75/10.41.149.149"
Jul 05 06:49:04 itrmsdev01.ucalgary.ca dockerd[61491]: time="2017-07-05T06:49:04.130278304-06:00" level=info msg="Node join event for itrmsdev02.ucalgary.ca-a87b5ed6e3c6/10.41.149.139"
Jul 05 06:49:32 itrmsdev01.ucalgary.ca dockerd[61491]: time="2017-07-05T06:49:32.863643544-06:00" level=info msg="Node join event for itrmsdev02.ucalgary.ca-a87b5ed6e3c6/10.41.149.139"
Jul 05 06:49:36 itrmsdev01.ucalgary.ca dockerd[61491]: time="2017-07-05T06:49:36.352351798-06:00" level=info msg="Node join event for itrmsdev07.ucalgary.ca-0550e75fb99a/10.41.149.147"
Jul 05 06:49:53 itrmsdev01.ucalgary.ca dockerd[61491]: time="2017-07-05T06:49:53.765021527-06:00" level=info msg="Node join event for itrmsdev04.ucalgary.ca-4eb27ff0412f/10.41.149.148"
Jul 05 06:49:53 itrmsdev01.ucalgary.ca dockerd[61491]: time="2017-07-05T06:49:53.815197302-06:00" level=info msg="Node join event for itrmsdev10.ucalgary.ca-ba90731aa607/10.41.149.84"
Jul 05 06:50:02 itrmsdev01.ucalgary.ca dockerd[61491]: time="2017-07-05T06:50:02.868975750-06:00" level=info msg="Node join event for itrmsdev04.ucalgary.ca-4eb27ff0412f/10.41.149.148"
Jul 05 06:50:06 itrmsdev01.ucalgary.ca dockerd[61491]: time="2017-07-05T06:50:06.359332176-06:00" level=info msg="Node join event for itrmsdev07.ucalgary.ca-0550e75fb99a/10.41.149.147"
Jul 05 06:50:18 itrmsdev01.ucalgary.ca dockerd[61491]: time="2017-07-05T06:50:18.518216484-06:00" level=info msg="Node join event for itrmsdev08.ucalgary.ca-ba7e5b1ad050/10.41.149.87"
Jul 05 06:50:23 itrmsdev01.ucalgary.ca dockerd[61491]: time="2017-07-05T06:50:23.820983401-06:00" level=info msg="Node join event for itrmsdev10.ucalgary.ca-ba90731aa607/10.41.149.84"
Jul 05 06:50:32 itrmsdev01.ucalgary.ca dockerd[61491]: time="2017-07-05T06:50:32.894985130-06:00" level=info msg="Node join event for itrmsdev04.ucalgary.ca-4eb27ff0412f/10.41.149.148"
Jul 05 06:50:36 itrmsdev01.ucalgary.ca dockerd[61491]: time="2017-07-05T06:50:36.364264838-06:00" level=info msg="Node join event for itrmsdev07.ucalgary.ca-0550e75fb99a/10.41.149.147"
Jul 05 06:50:48 itrmsdev01.ucalgary.ca dockerd[61491]: time="2017-07-05T06:50:48.524466576-06:00" level=info msg="Node join event for itrmsdev08.ucalgary.ca-ba7e5b1ad050/10.41.149.87"
Jul 05 06:51:02 itrmsdev01.ucalgary.ca dockerd[61491]: time="2017-07-05T06:51:02.900238466-06:00" level=info msg="Node join event for itrmsdev08.ucalgary.ca-ba7e5b1ad050/10.41.149.87"
Jul 05 06:51:18 itrmsdev01.ucalgary.ca dockerd[61491]: time="2017-07-05T06:51:18.532095242-06:00" level=info msg="Node join event for itrmsdev08.ucalgary.ca-ba7e5b1ad050/10.41.149.87"
Jul 05 06:51:23 itrmsdev01.ucalgary.ca dockerd[61491]: time="2017-07-05T06:51:23.836963853-06:00" level=info msg="Node join event for itrmsdev10.ucalgary.ca-ba90731aa607/10.41.149.84"
Jul 05 06:51:32 itrmsdev01.ucalgary.ca dockerd[61491]: time="2017-07-05T06:51:32.906082694-06:00" level=info msg="Node join event for itrmsdev10.ucalgary.ca-ba90731aa607/10.41.149.84"
Jul 05 06:51:34 itrmsdev01.ucalgary.ca dockerd[61491]: time="2017-07-05T06:51:34.168579959-06:00" level=info msg="Node join event for itrmsdev02.ucalgary.ca-a87b5ed6e3c6/10.41.149.139"
Jul 05 06:51:36 itrmsdev01.ucalgary.ca dockerd[61491]: time="2017-07-05T06:51:36.380119279-06:00" level=info msg="Node join event for itrmsdev07.ucalgary.ca-0550e75fb99a/10.41.149.147"
Jul 05 06:51:47 itrmsdev01.ucalgary.ca dockerd[61491]: time="2017-07-05T06:51:47.406343066-06:00" level=info msg="Node join event for itrmsdev09.ucalgary.ca-491ef8ad6092/10.41.149.86"
Jul 05 06:51:48 itrmsdev01.ucalgary.ca dockerd[61491]: time="2017-07-05T06:51:48.538771147-06:00" level=info msg="Node join event for itrmsdev08.ucalgary.ca-ba7e5b1ad050/10.41.149.87"
Jul 05 06:52:02 itrmsdev01.ucalgary.ca dockerd[61491]: time="2017-07-05T06:52:02.911722699-06:00" level=info msg="Node join event for itrmsdev04.ucalgary.ca-4eb27ff0412f/10.41.149.148"
Jul 05 06:52:32 itrmsdev01.ucalgary.ca dockerd[61491]: time="2017-07-05T06:52:32.918801817-06:00" level=info msg="Node join event for itrmsdev09.ucalgary.ca-491ef8ad6092/10.41.149.86"
Jul 05 06:52:53 itrmsdev01.ucalgary.ca dockerd[61491]: time="2017-07-05T06:52:53.860424186-06:00" level=info msg="Node join event for itrmsdev10.ucalgary.ca-ba90731aa607/10.41.149.84"
Jul 05 06:53:00 itrmsdev01.ucalgary.ca dockerd[61491]: time="2017-07-05T06:53:00.615539332-06:00" level=info msg="Node join event for itrmsdev06.ucalgary.ca-1765e9fefb75/10.41.149.149"
Jul 05 06:53:02 itrmsdev01.ucalgary.ca dockerd[61491]: time="2017-07-05T06:53:02.923647658-06:00" level=info msg="Node join event for itrmsdev02.ucalgary.ca-a87b5ed6e3c6/10.41.149.139"
Jul 05 06:53:04 itrmsdev01.ucalgary.ca dockerd[61491]: time="2017-07-05T06:53:04.187673479-06:00" level=info msg="Node join event for itrmsdev02.ucalgary.ca-a87b5ed6e3c6/10.41.149.139"
Jul 05 06:53:23 itrmsdev01.ucalgary.ca dockerd[61491]: time="2017-07-05T06:53:23.866587231-06:00" level=info msg="Node join event for itrmsdev10.ucalgary.ca-ba90731aa607/10.41.149.84"
Jul 05 06:53:30 itrmsdev01.ucalgary.ca dockerd[61491]: time="2017-07-05T06:53:30.621512944-06:00" level=info msg="Node join event for itrmsdev06.ucalgary.ca-1765e9fefb75/10.41.149.149"
Jul 05 06:53:32 itrmsdev01.ucalgary.ca dockerd[61491]: time="2017-07-05T06:53:32.928307151-06:00" level=info msg="Node join event for itrmsdev03.ucalgary.ca-ce40e4760502/10.41.149.138"
Jul 05 06:53:45 itrmsdev01.ucalgary.ca dockerd[61491]: time="2017-07-05T06:53:45.632933649-06:00" level=info msg="Node join event for itrmsdev03.ucalgary.ca-ce40e4760502/10.41.149.138"

Describe the results you expected:

Previous versions of Docker did not produce these log lines. They are adding very significant noise to the daemon logs. I'm not sure if this indicates a problem, or what node join events mean in this context.

Additional information you deem important (e.g. issue happens only occasionally):

The log line comes from libnetwork in networkdb/delegate.go for NodeEventTypeJoin, and was merged in moby/libnetwork#1775.

Output of docker version:

→  ~ docker version
Client:
 Version:      17.06.0-ce
 API version:  1.30
 Go version:   go1.8.3
 Git commit:   02c1d87
 Built:        Fri Jun 23 21:31:53 2017
 OS/Arch:      darwin/amd64

Server:
 Version:      17.06.0-ce
 API version:  1.30 (minimum version 1.12)
 Go version:   go1.8.3
 Git commit:   02c1d87
 Built:        Fri Jun 23 21:21:56 2017
 OS/Arch:      linux/amd64
 Experimental: true

Output of docker info:

→  ~ docker info
Containers: 88
 Running: 37
 Paused: 0
 Stopped: 51
Images: 681
Server Version: 17.06.0-ce
Storage Driver: overlay
 Backing Filesystem: xfs
 Supports d_type: true
Logging Driver: gelf
Cgroup Driver: cgroupfs
Plugins: 
 Volume: local
 Network: bridge host ipvlan macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
 NodeID: 5i7ghc9agzge2hqfrmrgcucvn
 Is Manager: true
 ClusterID: m0gz05zqgmhx67noqdvko2npr
 Managers: 3
 Nodes: 10
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
  Force Rotate: 0
 Root Rotation In Progress: false
 Node Address: 10.41.149.137
 Manager Addresses:
  10.41.149.137:2377
  10.41.149.138:2377
  10.41.149.139:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: cfb82a876ecc11b5ca0977d1733adbe58599088a
runc version: 2d41c047c83e09a6d61d464906feb2a2f3c52aa4
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 3.10.0-514.21.1.el7.x86_64
Operating System: Red Hat Enterprise Linux Server 7.3 (Maipo)
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 15.51GiB
Name: itrmsdev01.ucalgary.ca
ID: MYIJ:6PFS:CZR5:QGU6:2ZBA:S3T7:3ER5:3QRM:5QYB:ZM22:SH6N:S2JP
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: true
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

Additional environment details (AWS, VirtualBox, physical, etc.):

The data here is from a 10-node Docker CE 17.06.0 swarm on RHEL 7.3. My other Docker CE 17.06.0 swarms also exhibit the same behaviour.

@cpuguy83
Copy link
Member

cpuguy83 commented Jul 5, 2017

Looks like something in libnetwork, which I don't see in master.

Ping @sanimej

@sanimej
Copy link

sanimej commented Jul 5, 2017

@kinghuang This should have been a debug message.

All the cluster nodes exchange a gossip message to indicate what networks are currently active on a node. This message is also synced over TCP every 30 seconds or if a peer queries for it. This doesn't indicate nor should cause any issue other than the chatty logs. I will push a change to fix it.

@thaJeztah
Copy link
Member

Opened moby/libnetwork#1833 to change the logging level from info to debug

@thaJeztah
Copy link
Member

thaJeztah commented Aug 5, 2017

The noisy logs are resolved on master through 68d8198 (#34103), and included in 17.07

@Cas-pian
Copy link

@thaJeztah Will this also be fixed in 17.06.1?

@matzegebbe
Copy link

@Cas-pian For me I still see it in 17.06.1

Aug 22 09:30:24 emea-jas-a13p dockerd[12307]: time="2017-08-22T09:30:24.697638549+02:00" level=info msg="Node join event for emea-jas-a16p-5d424159f458/172.20.26.21"

root@xxxx:~# docker version
Client:
 Version:      17.06.1-ce
 API version:  1.30
 Go version:   go1.8.3
 Git commit:   874a737
 Built:        Thu Aug 17 22:01:59 2017
 OS/Arch:      linux/amd64

@kinghuang
Copy link
Author

I've upgraded to Docker CE 17.07 on one of my swarms, but I'm still getting the “Node join event …” log lines at info level.

@thaJeztah Is the fix for sure in 17.07? Or, is this a new problem?

Aug 31 16:52:16 itrmswebdev01.ucalgary.ca dockerd[27005]: time="2017-08-31T16:52:16.106887576-06:00" level=info msg="Node join event for itrmswebdev02.ucalgary.ca-452ee701da73/10.43.64.120"
Aug 31 16:52:46 itrmswebdev01.ucalgary.ca dockerd[27005]: time="2017-08-31T16:52:46.110052018-06:00" level=info msg="Node join event for itrmswebdev03.ucalgary.ca-76769f98baac/10.43.64.121"
Aug 31 16:53:14 itrmswebdev01.ucalgary.ca dockerd[27005]: time="2017-08-31T16:53:14.323864136-06:00" level=info msg="Node join event for itrmswebdev03.ucalgary.ca-76769f98baac/10.43.64.121"
Aug 31 16:53:16 itrmswebdev01.ucalgary.ca dockerd[27005]: time="2017-08-31T16:53:16.114292669-06:00" level=info msg="Node join event for itrmswebdev02.ucalgary.ca-452ee701da73/10.43.64.120"
Aug 31 16:53:44 itrmswebdev01.ucalgary.ca dockerd[27005]: time="2017-08-31T16:53:44.327674147-06:00" level=info msg="Node join event for itrmswebdev03.ucalgary.ca-76769f98baac/10.43.64.121"
Aug 31 16:53:46 itrmswebdev01.ucalgary.ca dockerd[27005]: time="2017-08-31T16:53:46.117058648-06:00" level=info msg="Node join event for itrmswebdev03.ucalgary.ca-76769f98baac/10.43.64.121"
[kchuang@itrmswebdev01 ~]$ docker info
Containers: 1
 Running: 1
 Paused: 0
 Stopped: 0
Images: 1
Server Version: 17.07.0-ce
Storage Driver: overlay2
 Backing Filesystem: xfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host ipvlan macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
 NodeID: y1xhcbb4im3wrj1e4yd6jwksh
 Is Manager: true
 ClusterID: v35u6m3fg05esczglsvdsoqur
 Managers: 3
 Nodes: 3
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
  Force Rotate: 0
 Root Rotation In Progress: false
 Node Address: 10.43.64.119
 Manager Addresses:
  10.43.64.119:2377
  10.43.64.120:2377
  10.43.64.121:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 3addd840653146c90a254301d6c3a663c7fd6429
runc version: 2d41c047c83e09a6d61d464906feb2a2f3c52aa4
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 3.10.0-693.el7.x86_64
Operating System: Red Hat Enterprise Linux Server 7.4 (Maipo)
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 3.694GiB
Name: itrmswebdev01.ucalgary.ca
ID: ZN6S:FSZV:RPTY:GGAZ:DK2F:UVE3:RDUO:CPF4:LQUJ:TBJW:4HSR:OQ2C
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: true
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

@kinghuang
Copy link
Author

@thaJeztah I've completed upgrading my swarms to 17.07.0-ce. They're all still outputing the “Node join event…” messages.

@thaJeztah
Copy link
Member

I can confirm the logs are still noisy; I dived a bit in the code to understand what everything does, and think I see what's happening (I'm not really familiar with that code, so explaining what I saw below);

First the node is retrieved from the database through getNode(), followed by some logic for single-node managers.

After that, the node is retrieved from the database again, this time using checkAndGetNode()

Looking at checkAndGetNode(), that method checks if the cached status of the node found was set by the current (or later) event and if not (thus from an older event), the node's state is removed.
In practice, this is likely to happen (i.e. a "node join" event adds the node to the nDB.nodes list, and the next event removes it from nDB.nodes).

After that, the code I added checks if the node's status was already known as "joined" (due to the explanation above, it won't be), and after that, adds the node to the list of joined nodes, and if it was a "new" join (again, probably "always"), logs the node join event.

Possible solutions:

  • check if the node exists in nDB.nodes before calling nDB.checkAndGetNode()
  • rethink nDB.checkAndGetNode() - why is the node's always removed? Instead, we could probably;
    • inside the switch nEvent.Type update the node's state depending on the type of event, and remove the node from all other states
    • if an unknown event is received, remove the node from all lists (or something?)

@fcrisciani @abhinandanpb any suggestions?

@fcrisciani
Copy link
Contributor

@thaJeztah the proper solution is in splitting the lamport time between the node events and network events and after that correct the use of the witness.

The witness clock always checks that the node lamport time is max(local, event) and if that is > than the local one it increments it by 1 and update the local one. Every 30s there is a TCP sync that exchange the full series of messages and the node related ones are getting the latest lamport time that is in the node, that got bumped by the previous logic that I was mentioning, creating the illusion to other nodes that the events received are actually new and so printing the log (and incrementing again the lamport time).
The proper solution is to make sure that the lamport time of events remains consistent and old node join events are not getting advertised with higher lamport time.
I hope that the explanation is not too convoluted, it took me some time too to understand why that was happening

@MarvAmBass
Copy link

Hi,

I'm still getting this error with version 'Server Version: 17.09.0-ce'.

@fcrisciani your solution has to do with the source code not with the server setup right? Because I don't know how to make sure that the lamport time remains consistent from the server/node side.

Thanks

@fcrisciani
Copy link
Contributor

@MarvAmBass yep correct, that will require a code change, at the moment not being a breaking issue that creates traffic disruption is not getting much priority.

@agilob
Copy link

agilob commented Dec 8, 2017

The noisy logs are resolved on master through 68d8198 (#34103), and included in 17.07

I'm still getting this on docker -v version 17.11.0-ce, build 1caf76c

Dec 08 21:14:47 rpi1 dockerd[403]: time="2017-12-08T21:14:47.461682063Z" level=info msg="Node join event for b6
Dec 08 21:14:48 rpi1 dockerd[403]: time="2017-12-08T21:14:48.268214535Z" level=info msg="Node join event for b6
Dec 08 21:14:48 rpi1 dockerd[403]: time="2017-12-08T21:14:48.461724884Z" level=info msg="Node join event for b6
Dec 08 21:14:49 rpi1 dockerd[403]: time="2017-12-08T21:14:49.348132454Z" level=info msg="Node join event for b6
Dec 08 21:14:49 rpi1 dockerd[403]: time="2017-12-08T21:14:49.472665401Z" level=info msg="Node join event for b6
Dec 08 21:14:50 rpi1 dockerd[403]: time="2017-12-08T21:14:50.283037118Z" level=info msg="Node join event for b6
Dec 08 21:14:50 rpi1 dockerd[403]: time="2017-12-08T21:14:50.461113483Z" level=info msg="Node join event for b6
Dec 08 21:14:51 rpi1 dockerd[403]: time="2017-12-08T21:14:51.268262994Z" level=info msg="Node join event for b6
Dec 08 21:14:51 rpi1 dockerd[403]: time="2017-12-08T21:14:51.461198307Z" level=info msg="Node join event for b6
Dec 08 21:14:52 rpi1 dockerd[403]: time="2017-12-08T21:14:52.272784101Z" level=info msg="Node join event for b6
Dec 08 21:14:52 rpi1 dockerd[403]: time="2017-12-08T21:14:52.461887169Z" level=info msg="Node join event for b6
Dec 08 21:14:53 rpi1 dockerd[403]: time="2017-12-08T21:14:53.279151326Z" level=info msg="Node join event for b6
Dec 08 21:14:53 rpi1 dockerd[403]: time="2017-12-08T21:14:53.463325080Z" level=info msg="Node join event for b6
Dec 08 21:14:54 rpi1 dockerd[403]: time="2017-12-08T21:14:54.273065756Z" level=info msg="Node join event for b6
Dec 08 21:14:54 rpi1 dockerd[403]: time="2017-12-08T21:14:54.461628790Z" level=info msg="Node join event for b6
Dec 08 21:14:55 rpi1 dockerd[403]: time="2017-12-08T21:14:55.267522221Z" level=info msg="Node join event for b6
Dec 08 21:14:55 rpi1 dockerd[403]: time="2017-12-08T21:14:55.461482600Z" level=info msg="Node join event for b6

@thaJeztah
Copy link
Member

@agilob if you read beyond that you would have seen that this was reopened and my later comment: #33962 (comment)

@fcrisciani
Copy link
Contributor

17.12 should fix that

@thaJeztah
Copy link
Member

Let me close this ticket for now, as it looks like it went stale.

@thaJeztah thaJeztah closed this as not planned Won't fix, can't repro, duplicate, stale Sep 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

10 participants