Import libnetwork fix for rolling updates #36638
- What I did
This patch allows endpoints to finish servicing connections while they are being removed from a service. The fix is mostly within libnetwork. There is a small change to the moby code to remove a now-superfluous call to sandbox.DisableService(). However, moby would function correctly (just sub-optimally) without said removal. (See the changelog of the commits for further details.)
This fix addresses the issue raised in docker/libnetwork#2074 and is related to #30321. The libnetwork PR for this fix is docker/libnetwork#2112. This PR does not include any other libnetwork PRs. The removal of the call to DisableService() from the releaseNetworks() function reverts the change introduced in #35960.
- How I did it
The fix works by initially down-weighting a container endpoint in the load balancer to 0 while keeping the endpoint present in the load balancer. Traffic on existing connections continues to flow to the endpoint, but no new connections are sent to it. The container can therefore complete its in-flight requests during the "stop_grace_period" and then exit when finished, without interruption of service.
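The down-weighting idea can be illustrated with a toy round-robin balancer. This is only a minimal sketch, not libnetwork's load balancer code: the `backend` and `balancer` types and the `pick`, `disable` and `remove` methods are hypothetical names invented for the example. It assumes that a weight of 0 means "schedule no new connections here" while the entry, and its in-flight connections, stay in the table.

```go
package main

import "fmt"

// backend is a hypothetical stand-in for one load balancer destination.
type backend struct {
	addr   string
	weight int // 0 means: schedule no new connections here
	active int // connections currently in flight
}

// balancer is a toy round-robin balancer used only for illustration.
type balancer struct {
	backends []*backend
	next     int
}

// pick returns the next backend with a positive weight, or nil if none exists.
func (b *balancer) pick() *backend {
	for i := 0; i < len(b.backends); i++ {
		be := b.backends[(b.next+i)%len(b.backends)]
		if be.weight > 0 {
			b.next = (b.next + i + 1) % len(b.backends)
			be.active++
			return be
		}
	}
	return nil
}

// disable sets the weight to 0 but keeps the entry, so in-flight
// connections tracked on it are left alone.
func (b *balancer) disable(addr string) {
	for _, be := range b.backends {
		if be.addr == addr {
			be.weight = 0
		}
	}
}

// remove deletes the entry entirely (the step taken once draining is over).
func (b *balancer) remove(addr string) {
	kept := make([]*backend, 0, len(b.backends))
	for _, be := range b.backends {
		if be.addr != addr {
			kept = append(kept, be)
		}
	}
	b.backends = kept
	b.next = 0
}

func main() {
	lb := &balancer{backends: []*backend{
		{addr: "10.0.0.1", weight: 1},
		{addr: "10.0.0.2", weight: 1},
	}}
	lb.pick() // goes to 10.0.0.1
	lb.pick() // goes to 10.0.0.2

	lb.disable("10.0.0.2") // start of the grace period
	for i := 0; i < 3; i++ {
		fmt.Println("new connection ->", lb.pick().addr)
	}
	fmt.Println("in flight on 10.0.0.2:", lb.backends[1].active)

	lb.remove("10.0.0.2") // container has exited
	fmt.Println("backends left:", len(lb.backends))
}
```

In this sketch the disabled backend keeps its one in-flight connection while every new connection lands on the remaining backend, which is the behavior the patch aims for during the grace period.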
This change requires propagating the status of disabled service endpoints via the networkDB. Accordingly, the patch includes both code to generate and handle service update messages. It also augments the service structure with a ServiceDisabled boolean to convey whether an endpoint should ultimately be removed or just disabled. This, naturally, required a rebuild of the protocol buffer code.
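How a receiving node might map networkDB events onto load balancer actions can be sketched as follows. Everything here is an assumption made for illustration: the `eventType`, `endpointRecord` and `decide` names are hypothetical, and only the `ServiceDisabled` field name comes from the description above. The sketch assumes the flag matters only on update events, and that create and delete events keep their existing meaning.

```go
package main

import "fmt"

// eventType mirrors the three kinds of table events discussed above.
type eventType int

const (
	eventCreate eventType = iota
	eventUpdate
	eventDelete
)

// endpointRecord is a hypothetical, trimmed-down view of the record that
// travels through networkDB. Only ServiceDisabled is named in the PR text.
type endpointRecord struct {
	EndpointID      string
	ServiceDisabled bool
}

// lbAction is what the receiving node does to its local load balancer.
type lbAction string

const (
	actionAdd     lbAction = "add"     // add or re-enable with normal weight
	actionDisable lbAction = "disable" // keep the entry, set weight to 0
	actionRemove  lbAction = "remove"  // delete the entry
	actionNone    lbAction = "none"
)

// decide maps an event and record to a load balancer action.
// Assumption: the flag is only consulted on update events.
func decide(ev eventType, rec endpointRecord) lbAction {
	switch ev {
	case eventCreate:
		return actionAdd
	case eventUpdate:
		if rec.ServiceDisabled {
			return actionDisable
		}
		return actionAdd
	case eventDelete:
		return actionRemove
	}
	return actionNone
}

func main() {
	rec := endpointRecord{EndpointID: "ep1"}
	fmt.Println(decide(eventCreate, rec)) // endpoint joins the service

	rec.ServiceDisabled = true
	fmt.Println(decide(eventUpdate, rec)) // container is stopping

	fmt.Println(decide(eventDelete, rec)) // container has exited
}
```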
The protocol buffer encoding is designed to support additions of fields to messages in a backwards-compatible manner. Protocol buffer unmarshalling code automatically skips past any fields that it isn't aware of.
As it turns out, the additional field is simply a bool that is otherwise irrelevant on networkDB create and delete events, so its absence in older moby daemon processing has no impact. However, the fix leverages the "update" networkDB message, which was previously unused in libnetwork. Although older libnetwork implementations parse the message cleanly, they will treat it as unexpected and log its receipt at error level.
Beyond that, there should be no negative impact from using this patch in mixed environments. (Although older mobys won't be able to gracefully downgrade connections on their nodes, of course.)
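The backward-compatibility argument above rests on how the protocol buffer wire format lets a decoder step over fields it does not know. The following hand-rolled decoder is a minimal sketch of that mechanism, not the generated libnetwork code. The field numbers (1 for a name, 2 for the new bool) and the `oldRecord`, `encodeNew` and `decodeOld` names are assumptions chosen for the example, and only the varint and length-delimited wire types are handled.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// Wire types from the protocol buffer encoding (only two are handled here).
const (
	wireVarint = 0
	wireBytes  = 2
)

// oldRecord is what a hypothetical older daemon knows about: field 1 only.
type oldRecord struct {
	Name string
}

// appendUvarint appends v in base-128 varint form.
func appendUvarint(b []byte, v uint64) []byte {
	var tmp [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(tmp[:], v)
	return append(b, tmp[:n]...)
}

// encodeNew writes field 1 (name, length-delimited) and field 2 (the new
// bool, varint) the way a newer sender would. Field numbers are assumed.
func encodeNew(name string, disabled bool) []byte {
	var b []byte
	b = appendUvarint(b, uint64(1<<3|wireBytes))
	b = appendUvarint(b, uint64(len(name)))
	b = append(b, name...)
	var v uint64
	if disabled {
		v = 1
	}
	b = appendUvarint(b, uint64(2<<3|wireVarint))
	b = appendUvarint(b, v)
	return b
}

// decodeOld parses the message as the older daemon would: it keeps field 1
// and skips every field it does not recognize, counting the skips.
func decodeOld(b []byte) (oldRecord, int, error) {
	var rec oldRecord
	skipped := 0
	for len(b) > 0 {
		tag, n := binary.Uvarint(b)
		if n <= 0 {
			return rec, skipped, errors.New("bad tag")
		}
		b = b[n:]
		field, wire := int(tag>>3), int(tag&7)
		switch wire {
		case wireVarint:
			_, n := binary.Uvarint(b)
			if n <= 0 {
				return rec, skipped, errors.New("bad varint")
			}
			b = b[n:]
			skipped++ // the old schema has no varint fields at all
		case wireBytes:
			l, n := binary.Uvarint(b)
			if n <= 0 || uint64(len(b)-n) < l {
				return rec, skipped, errors.New("bad length")
			}
			payload := b[n : n+int(l)]
			b = b[n+int(l):]
			if field == 1 {
				rec.Name = string(payload)
			} else {
				skipped++
			}
		default:
			return rec, skipped, errors.New("unsupported wire type")
		}
	}
	return rec, skipped, nil
}

func main() {
	msg := encodeNew("web", true)
	rec, skipped, err := decodeOld(msg)
	fmt.Println(rec.Name, skipped, err)
}
```

The old decoder recovers the name and reports one skipped field without an error, which is why the extra bool is harmless to older daemons at the parsing level. The error log mentioned above comes from the unexpected "update" event type, not from the unknown field.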
Signed-off-by: Chris Telfer firstname.lastname@example.org
- How to verify it
One can verify this in a swarm environment by the following steps:
- Description for the changelog
Import libnetwork fix for rolling updates
- A picture of a cute animal (not mandatory but encouraged)
Sorry, I should have been more clear. This patch only addresses Linux. I'm not yet familiar enough with HCS / HNS to know whether/how equivalent down-weighting could occur there. Having said that, the libnetwork calls into the service_windows.go functions do provide all the information necessary to perform said operation. The patch, as written, should not change the behavior of Windows nodes either positively or negatively in this regard.
@@            Coverage Diff            @@
##             master   #36638   +/-   ##
=========================================
  Coverage          ?   34.95%
=========================================
  Files             ?      613
  Lines             ?    45581
  Branches          ?        0
=========================================
  Hits              ?    15934
  Misses            ?    27560
  Partials          ?     2087
@ctelfer Tested this build with our use case and I'm happy to report that it works beautifully. Streaming connections are kept connected honouring the grace timeout parameter. This is the last thing that is required to have proper zero downtime deployments with Swarm.