[17.09] Fix reapTime logic in NetworkDB + handle cleanup DNS for attachable container #2017
Conversation
@thaJeztah thank you for backporting this; would it be possible to make a release with it?
@eduardolundgren this cherry-pick is in preparation of a possible 17.09.1 release; it's still being decided which changes need to be backported and which do not.
Looks like I may have to cherry-pick #1947 into this one to fix the lint-errors |
Hopefully this backport gets into 17.09.1; the current stable release corrupts the managers' quorum when the number of overlay networks gets close to 2000, and in addition some workers go down randomly. It starts to log lots of
Then,
Eventually, it logs a failure
@thaJeztah lint commit is in, can you try to rebase?
- Added a remainingReapTime field in the table event. Without it, a node that did not have state for the element was marking the element for deletion with the maximum reapTime. This made it possible for the entry to keep being resynced between nodes forever, defeating the purpose of the reap time itself. - On broadcast of the table event, the node owner was being rewritten with the local node name; this was not correct because the owner should remain the original sender of the message. Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com> (cherry picked from commit 10cd98c) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
- Changed the loop to be per network. The previous implementation took a ReadLock to update the reapTime, but now, with the residual reap time, the bulkSync also uses that same ReadLock, creating possible issues with concurrent reads and updates of the value. The new logic fetches the list of networks and cleans up network by network, locking the database and releasing it after each network. This should ensure fair locking and avoid keeping the database blocked for too long. Note: the ticker does not guarantee that the reap logic runs precisely every reapTimePeriod; the documentation says that if the routine takes too long, ticks will be skipped. If the process itself slows down, it is possible that the lifetime of the deleted entries increases. This should still not be a big problem because the residual reap time is now propagated among all the nodes: a slower node will let the deleted entry be repropagated multiple times, but the state will remain consistent. Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com> (cherry picked from commit 3feb3aa) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Make sure that the network is garbage collected after its entries. Deleting entries requires that the network is still present. Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com> (cherry picked from commit fbba555) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The comparison was against the wrong constant value. As described in the comment, the check is there to guarantee that events related to stale deleted elements are not propagated. Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com> (cherry picked from commit 6f11d29) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Attachable containers are tasks with no associated service; their cleanup was not done properly, so it was possible to leak their name resolution if that was the last container on the network. cleanupServiceBindings was not able to do the cleanup because there is no service, and the notification of the delete arrives after the network is already being cleaned. Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com> (cherry picked from commit 1c04e19) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Force-pushed from ab3a74b to 42f9e55
rebased 👍
LGTM
Do you know when
No, I don't have a date for 17.09.1. 17.10 and 17.11 are edge releases, so they would only get critical ("P0") patches; with 17.11 being released, Docker 17.10 reached EOL. For your specific issue, the error message indicates it may not be an issue in the networking stack, but due to the raft state becoming too big to sync between managers. This pull request in SwarmKit raises the maximum size to 128 MB: moby/swarmkit#2375, which is being backported to 17.09.1 through docker-archive/docker-ce#323, and should address the direct problem. A bigger change is being worked on in moby/swarmkit#2458, which will use a streaming mechanism to send raft snapshots.
Actually, the size fix was already included in 17.09.0 through docker-archive/docker-ce#242, so I'm not sure if this will address your situation |
@thaJeztah the raft state becoming too big was only happening when approximately 2000 overlay networks were created in a 30-minute window. By contrast, creating 2000 services was all fine; presumably the raft messages for service creation are bigger. Is there a way to know if I just tested on
Yes; the size change was in 17.09.0-ce-rc3: https://github.com/docker/docker-ce/blob/v17.09.0-ce-rc3/components/engine/vendor/github.com/docker/swarmkit/manager/manager.go#L59. Note that rc3 is a release candidate, not the final release (17.09.0-ce was released after that).
Cherry-pick of #1944 and #1960 for 17.09
Also included #1985 and #1991, which look to depend on this