control-plane hardening: Avoid nDB stale entries #1727
Merged
With the current design in libnetwork, control-plane events are first sent through gossip. Gossip runs over UDP, which can be lossy. To account for that, there is an anti-entropy phase every 30 seconds that does a full state sync (bulk sync) with a peer.
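To make the two paths concrete, here is a minimal Go sketch of that loop, assuming hypothetical helpers (`broadcastUDP`, `bulkSyncWith`, `randomPeer`) rather than the real networkdb API: the lossy gossip send is the fast path, and the 30-second ticker drives the anti-entropy bulk sync.

```go
package main

import (
	"log"
	"time"
)

type event struct {
	Table, Key string
	Value      []byte
}

// broadcastUDP gossips an event to peers; UDP delivery is best effort.
func broadcastUDP(ev event) {}

// bulkSyncWith exchanges full table state with a single peer.
func bulkSyncWith(peer string) error { return nil }

// randomPeer picks one peer for the anti-entropy pass.
func randomPeer() string { return "peer-1" }

func controlPlaneLoop(events <-chan event, stop <-chan struct{}) {
	// Anti-entropy timer: a full push/pull state sync every 30 seconds
	// repairs whatever the lossy UDP gossip missed.
	ticker := time.NewTicker(30 * time.Second)
	defer ticker.Stop()

	for {
		select {
		case ev := <-events:
			broadcastUDP(ev) // fast path: gossip the event right away
		case <-ticker.C:
			if err := bulkSyncWith(randomPeer()); err != nil {
				// A failed pass is retried on the next cycle; peers stay
				// out of date until one succeeds.
				log.Printf("bulk sync failed: %v", err)
			}
		case <-stop:
			return
		}
	}
}

func main() {
	events := make(chan event)
	stop := make(chan struct{})
	go controlPlaneLoop(events, stop)
	events <- event{Table: "endpoint_table", Key: "ep1", Value: []byte("join")}
	close(stop)
}
```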
Problem: In some deployments we have seen the CPU getting pegged or memory being exhausted (until OOM kicks in) for long periods, causing both gossip and the push/pull state sync to fail. Delete events are currently retained in nDB for 60 seconds, so a deleted entry survives at most two bulk-sync cycles. If a few bulk syncs from this node fail, the peers will be left with stale entries forever.
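The window is easy to see in a sketch of the reaper, assuming illustrative names (`tombstone`, `reapLoop`, `deleteRetention`) rather than the actual networkdb identifiers: with a 60-second retention and a 30-second bulk-sync period, a delete marker gets at most two chances to propagate before it is dropped.

```go
package ndbsketch

import (
	"sync"
	"time"
)

const (
	bulkSyncInterval = 30 * time.Second // anti-entropy period described above
	deleteRetention  = 60 * time.Second // how long a delete event stays in nDB today
	// deleteRetention / bulkSyncInterval == 2: a missed delete has only two
	// bulk-sync opportunities to reach a peer before it is reaped.
)

type tombstone struct {
	key       string
	deletedAt time.Time
}

type store struct {
	mu         sync.Mutex
	tombstones map[string]tombstone
}

// reapLoop drops delete markers once their retention window has passed. After
// that, a peer that missed both the gossip and the two bulk syncs keeps the
// stale entry forever.
func (s *store) reapLoop(stop <-chan struct{}) {
	ticker := time.NewTicker(time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			s.mu.Lock()
			for k, t := range s.tombstones {
				if time.Since(t.deletedAt) > deleteRetention {
					delete(s.tombstones, k)
				}
			}
			s.mu.Unlock()
		case <-stop:
			return
		}
	}
}
```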
This can be fixed by two approaches. The first carries the risk that we could incorrectly delete an event, because there is no owner identification in the event messages and bulk sync happens with only one peer. In bigger clusters it can also take longer for an event to eventually reach all nodes through bulk sync (if it was missed earlier by gossip), so we have to let the entries remain for many bulk-sync cycles anyway. Approach 2, increasing the reap time so that deleted entries remain for many bulk-sync cycles, achieves the same result in a much simpler and more reliable way. This applies to both the network-scoped gossip (endpoint join/leave events) and the global gossip (node leave, network join/leave).
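A rough sketch of what approach 2 amounts to, assuming the fix is essentially a larger retention (reap) time for deleted entries; the constant names and the 30-minute value are illustrative, not necessarily the exact values used in this PR:

```go
package ndbsketch

import "time"

const (
	antiEntropyPeriod = 30 * time.Second // bulk sync runs this often

	// Before: tombstones reaped after 60s, i.e. at most two bulk-sync windows.
	oldDeleteRetention = 60 * time.Second

	// After (illustrative value): keep tombstones long enough to span many
	// bulk-sync cycles, so a node whose gossip and several bulk syncs failed
	// still converges on a later cycle.
	newDeleteRetention = 30 * time.Minute
)

// cyclesCovered reports how many anti-entropy rounds a delete marker survives
// before it is reaped: 2 with the old retention, 60 with the new one.
func cyclesCovered(retention time.Duration) int {
	return int(retention / antiEntropyPeriod)
}
```

The cost of the longer retention is only some extra memory for tombstones, while the stale-entry window now tolerates many consecutive failed bulk syncs instead of just two.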
There is one specific case to consider though: if the gossip for the last few tasks being deleted on a node is lost, we will remove that network from this node, so increasing the reap time for endpoint events won't help there. But the network leave event itself will be retained longer with this change, and that, combined with #1704, will make sure the state is cleaned up on all remote nodes.
Signed-off-by: Santhosh Manohar <santhosh@docker.com>