
[17.09] Fix reapTime logic in NetworkDB + handle cleanup DNS for attachable container #2017

Merged
merged 6 commits into moby:bump_17.09 from thaJeztah:17.09-backport-netdb-fix-reap
Nov 20, 2017

Conversation

thaJeztah
Member

@thaJeztah commented Nov 20, 2017

Cherry-pick of #1944 and #1960 for 17.09

git checkout -b 17.09-backport-netdb-fix-reap upstream/bump_17.09
git cherry-pick -s -S -x 10cd98c56fb17737834c1031cad56a9932a266f4
git cherry-pick -s -S -x 3feb3aadf388d64bb8ea7e0a6614933d3d8b885a
git cherry-pick -s -S -x fbba555561bab4af332d6c1ee592976f16c609c3

Also included #1985 and #1991, which look to depend on this

# PR: https://github.com/docker/libnetwork/pull/1985
git cherry-pick -s -S -x 1c04e1980dfd6debab6e43d9db672c0f01528868

# PR: https://github.com/docker/libnetwork/pull/1991
git cherry-pick -s -S -x 52a9ab55ce92a541798d130a87a84ca34f0b6178

@thaJeztah changed the title from [17.09] Fix reapTime logic in NetworkDB to [17.09] Fix reapTime logic in NetworkDB + handle cleanup DNS for attachable container on Nov 20, 2017
@eduardolundgren

@thaJeztah thank you for backporting this; would it be possible to make a release with it?

@thaJeztah
Member Author

@eduardolundgren this cherry-pick is in preparation for a possible 17.09.1 release; it's still being decided which changes need to be backported and which do not.

@thaJeztah
Member Author

Looks like I may have to cherry-pick #1947 into this one to fix the lint errors

@eduardolundgren

eduardolundgren commented Nov 20, 2017

Hopefully this backport gets into 17.09.1; the current stable release corrupts the managers' quorum when the number of overlay networks gets close to 2000, and on top of that some workers go down randomly.

It starts to log lots of netPeers messages

e0cc780) - netID:kt2qn6cpuq7okhcy3d5pzo6ps leaving:false netPeers:10 entries:20 Queue qLen:0 netMsg/s:0"

Then, segmented wal file info appears

dockerd[2069]: segmented wal file /mnt/ebs/var/lib/docker/swarm/raft/wal-v3-encrypted/0000000000000039-00

Eventually, it logs a failure

dockerd[2069]: time="" level=error msg="Failed to commit allocation of network resources for node w58o35tebq6atsq1kgh665cow" error="raft: raft message is too large and can't be sent" module=node node.id=7ih3j4dbd0f3r6zy0uhvicr0o

@fcrisciani

@thaJeztah the lint commit is in; can you try to rebase?

Flavio Crisciani added 6 commits November 20, 2017 20:41
- Added remainingReapTime field in the table event.
  Without it, a node that did not have a state for the element
  was marking the element for deletion and setting the max reapTime.
  This created the possibility of the entry being resynced
  between nodes forever, defeating the purpose of the reap time
  itself.

- On broadcast of the table event, the node owner was rewritten
  with the local node name. This was not correct, because the owner
  should remain the original one of the message.

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
(cherry picked from commit 10cd98c)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
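As an illustration of the first point, here is a minimal, self-contained Go sketch; the names (tableEvent, ResidualReapTime, reapInterval) are illustrative only, not the actual NetworkDB types. It shows how carrying the remaining reap time in the event keeps a receiving node from restarting the countdown at the maximum:

```go
package main

import (
	"fmt"
	"time"
)

const reapInterval = 30 * time.Minute // assumed maximum reap time, for illustration only

// tableEvent is a stand-in for the gossip table event; the real NetworkDB types differ.
type tableEvent struct {
	Owner            string        // original owner of the event; not rewritten on rebroadcast
	Key              string
	Deleted          bool
	ResidualReapTime time.Duration // time left before the deleted entry is purged
}

type entry struct {
	deleting bool
	reapTime time.Duration
}

// applyEvent installs the entry using the sender's residual reap time instead of
// restarting the countdown at reapInterval, so a deleted entry cannot be kept
// alive forever by being resynced between nodes.
func applyEvent(store map[string]*entry, ev tableEvent) {
	reap := ev.ResidualReapTime
	if reap <= 0 || reap > reapInterval {
		reap = reapInterval // fall back only when the sender gave no usable value
	}
	store[ev.Key] = &entry{deleting: ev.Deleted, reapTime: reap}
}

func main() {
	store := map[string]*entry{}
	applyEvent(store, tableEvent{Owner: "node-1", Key: "svc-a", Deleted: true, ResidualReapTime: 5 * time.Minute})
	fmt.Println(store["svc-a"].reapTime) // 5m0s, not the full 30m
}
```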
- Changed the loop to be per network. The previous implementation took a
  ReadLock to update the reapTime, but now, with the residualReapTime,
  the bulkSync also uses the same ReadLock, creating possible
  issues with concurrent reads and updates of the value.
  The new logic fetches the list of networks and proceeds with the
  cleanup network by network, locking the database and releasing it
  after each network. This should ensure fair locking and avoid
  keeping the database blocked for too long.

  Note: the ticker does not guarantee that the reap logic runs
  precisely every reapTimePeriod; the documentation says that
  if the routine takes too long it will skip ticks. In case of a slowdown
  of the process itself, it is possible that the lifetime of the
  deleted entries increases. It still should not be a huge problem,
  because now that the residual reap time is propagated among all the nodes,
  a slower node will let the deleted entry be repropagated multiple
  times, but the state will still remain consistent.

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
(cherry picked from commit 3feb3aa)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
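A minimal sketch of the per-network locking pattern described above, assuming a simplified networkDB type; the field and method names are illustrative, not the real implementation:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

const reapPeriod = 30 * time.Second // hypothetical ticker period

type tableEntry struct{ reapTime time.Duration }

type networkDB struct {
	sync.RWMutex
	entries map[string]map[string]*tableEntry // network ID -> key -> entry
}

// reapTableEntries walks the networks one at a time, taking the write lock only
// for the duration of a single network's cleanup, so readers (e.g. a bulk sync)
// are not blocked for the whole pass.
func (db *networkDB) reapTableEntries() {
	// Snapshot the network IDs under a short read lock.
	db.RLock()
	networks := make([]string, 0, len(db.entries))
	for nid := range db.entries {
		networks = append(networks, nid)
	}
	db.RUnlock()

	for _, nid := range networks {
		db.Lock()
		for key, e := range db.entries[nid] {
			e.reapTime -= reapPeriod
			if e.reapTime <= 0 {
				delete(db.entries[nid], key)
			}
		}
		db.Unlock() // release after every network to keep the locking fair
	}
}

func main() {
	db := &networkDB{entries: map[string]map[string]*tableEntry{
		"net1": {"stale": {reapTime: 10 * time.Second}, "fresh": {reapTime: 5 * time.Minute}},
	}}
	db.reapTableEntries()
	fmt.Println(len(db.entries["net1"])) // 1: only "fresh" survives this pass
}
```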
Make sure that the network is garbage collected after
the entries. Entries to be deleted require that the network
is present.

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
(cherry picked from commit fbba555)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The comparison was against the wrong constant value.
As described in the comment, the check is there to guarantee
that events related to stale deleted elements are not propagated.

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
(cherry picked from commit 6f11d29)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Attachable containers are tasks with no associated service;
their cleanup was not done properly, so it was possible to
leak their name resolution if that was the last container
on the network.
cleanupServiceBindings was not able to do the cleanup because there
is no service, and the notification of the delete arrives
after the network has already been cleaned up.

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
(cherry picked from commit 1c04e19)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Unit test for cleanupServiceDiscovery,
a follow-up of PR moby#1985.

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
(cherry picked from commit 52a9ab5)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
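To illustrate the service-discovery cleanup covered by the last two commits, here is a minimal sketch. It assumes a simplified controller type and that an empty network ID means "all networks", which is roughly what the unit test above exercises; the real libnetwork code is more involved:

```go
package main

import "fmt"

// controller is a stand-in for the libnetwork controller; only the
// service-discovery records relevant to this sketch are modeled.
type controller struct {
	// network ID -> container name -> resolved IP
	svcRecords map[string]map[string]string
}

// cleanupServiceDiscovery drops the DNS records of a single network, or of all
// networks when nID is empty, so the last attachable container leaving a
// network does not leak its name resolution.
func (c *controller) cleanupServiceDiscovery(nID string) {
	if nID == "" {
		c.svcRecords = map[string]map[string]string{}
		return
	}
	delete(c.svcRecords, nID)
}

func main() {
	c := &controller{svcRecords: map[string]map[string]string{
		"net1": {"web": "10.0.0.2"},
		"net2": {"db": "10.0.1.2"},
	}}
	c.cleanupServiceDiscovery("net1")
	fmt.Println(len(c.svcRecords)) // 1: only net2 records remain
}
```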
@thaJeztah force-pushed the 17.09-backport-netdb-fix-reap branch from ab3a74b to 42f9e55 on November 20, 2017 19:42
@thaJeztah
Member Author

rebased 👍

@fcrisciani

LGTM

@fcrisciani merged commit 690b4c0 into moby:bump_17.09 on Nov 20, 2017
@thaJeztah deleted the 17.09-backport-netdb-fix-reap branch on November 20, 2017 22:24
@eduardolundgren

Do you know when the 17.09.1 release is going to happen?
Is this going to also get into 17.10.x and 17.11.x?
Sorry for so many questions :)
Thank you again for the fast response.

@thaJeztah
Member Author

No, I don't have a date for 17.09.1.

17.10 and 17.11 are edge releases, so they would only get critical ("P0") patches; with 17.11 released, Docker 17.10 has reached EOL.

For your specific issue, the error message indicates it may not be an issue in the networking stack, but rather that the raft state is becoming too big to sync between managers; this pull request in SwarmKit, moby/swarmkit#2375, raises the maximum message size to 128 MB. It is being backported to 17.09.1 through docker-archive/docker-ce#323 and should address the direct problem.

A bigger change is being worked on in moby/swarmkit#2458, which will use a streaming mechanism to send raft snapshots.

@thaJeztah
Member Author

Actually, the size fix was already included in 17.09.0 through docker-archive/docker-ce#242, so I'm not sure if this will address your situation

@eduardolundgren

@thaJeztah the raft state becoming too big was only happening when approximately 2000 overlay networks were created in a 30-minute time window. By contrast, if 2000 services were created, everything was fine. Supposedly the raft messages for service creation are bigger.

Is there a way to know if 17.09.0-ce-rc3 contains moby/swarmkit#2375?

I just tested on 17.09.0-ce-rc3 from Thu Sep 21 02:32:26 2017 and it seems to be working well. I only had one manager and one worker in this test, though; I will try again with more managers and see if it's still fine.

@thaJeztah
Member Author

Yes; the size change was in 17.09.0-ce-rc3; https://github.com/docker/docker-ce/blob/v17.09.0-ce-rc3/components/engine/vendor/github.com/docker/swarmkit/manager/manager.go#L59

Note that rc3 is a release candidate, not the final release (17.09.0-ce was released after that).
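For reference, a minimal sketch (not the actual SwarmKit manager code) of how a gRPC server can be capped at the 128 MB message size referenced above; grpc.MaxRecvMsgSize and grpc.MaxSendMsgSize are standard grpc-go server options:

```go
package main

import "google.golang.org/grpc"

const grpcMaxMsgSize = 128 << 20 // 128 MiB, the limit discussed in moby/swarmkit#2375

func main() {
	// Configure both receive and send limits so large raft messages are accepted.
	srv := grpc.NewServer(
		grpc.MaxRecvMsgSize(grpcMaxMsgSize),
		grpc.MaxSendMsgSize(grpcMaxMsgSize),
	)
	_ = srv
}
```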
