Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disable service on release network #35960

Merged
merged 2 commits into from Jan 18, 2018

Conversation

@abhi
Copy link
Contributor

commented Jan 8, 2018

This PR contains a fix for #30321. There was a #31142
PR intending to fix the issue by adding a delay between disabling the
service in the cluster and the shutdown of the tasks. However
disabling the service was not deleting the service info in the cluster.
Added a fix to delete service info from cluster and verified using siege
to ensure there is zero downtime on rolling update of a service.In order
to support it and ensure consitency of enabling and disable service knob
from the daemon, we need to ensure we disable service when we release
the network from the container. This helps in making the enable and
disable service less racy. The corresponding part of libnetwork fix is
part of docker/libnetwork#1824

Signed-off-by: abhi abhi@docker.com

- What I did
In order to ensure enable and disable a service on an sandbox less racy , explictly disable service on the sandbox during a container exit event on the network side. This entry point is the releaseNetwork action which triggers the tearing of the network resource for a container.
- How I did it
Added DisableService action on releaseNetwork
- How to verify it
This is particularly important during rolling update of a service and when the tasks in a swarm service goes down. Verified using siege and performing zero downtime rolling update.

- Description for the changelog

- A picture of a cute animal (not mandatory but encouraged)

@abhi

This comment has been minimized.

Copy link
Contributor Author

commented Jan 8, 2018

@abhi abhi force-pushed the abhi:service branch 2 times, most recently from f62807c to 33daf5a Jan 8, 2018

@abhi abhi force-pushed the abhi:service branch 2 times, most recently from 4ecaaca to 0ee3ec2 Jan 9, 2018

@abhi abhi force-pushed the abhi:service branch 2 times, most recently from 9168de7 to 227ed57 Jan 10, 2018

@thaJeztah

This comment has been minimized.

Copy link
Member

commented Jan 16, 2018

ping @abhi can you rebase this PR? Also looks like you added a file that shouldn't be there (I noticed theres a stack dump in there);

screen shot 2018-01-16 at 11 19 26

@abhi abhi force-pushed the abhi:service branch 2 times, most recently from 189a44f to f61d536 Jan 16, 2018

@thaJeztah

This comment has been minimized.

Copy link
Member

commented Jan 17, 2018

Looks like you updated the PR; note that GitHub doesn't send out notifications when you update, so its best to comment or "ping" someone after you did so 😄

@abhi

This comment has been minimized.

Copy link
Contributor Author

commented Jan 17, 2018

@thaJeztah the guy I need to ping sits next to me :D Will do next time.
@fcrisciani PTAL

@@ -966,6 +966,9 @@ func (daemon *Daemon) releaseNetwork(container *container.Container) {
logrus.Warnf("error locating sandbox id %s: %v", sid, err)
return
}
if err := sb.DisableService(); err != nil {
logrus.Errorf("Error disabling service for sandbox id %s container %s: %v", sid, container.ID, err)

This comment has been minimized.

Copy link
@cpuguy83

cpuguy83 Jan 17, 2018

Contributor

Nit: logrus.WithField("container", container.ID).WithError(err).WithField("sandbox", sid).Error("Error removing service from sandbox")

This comment has been minimized.

Copy link
@thaJeztah

thaJeztah Jan 17, 2018

Member

Or .WithFields();

logrus.WithFields(logrus.Fields{"container": container.ID, "sandbox": sid}).WithError(err).Error("Error removing service from sandbox")
@fcrisciani

This comment has been minimized.

Copy link
Contributor

commented Jan 17, 2018

LGTM after the log change 👍

abhi added 2 commits Jan 8, 2018
Disable service on release network
This PR contains a fix for #30321. There was a #31142
PR intending to fix the issue by adding a delay between disabling the
service in the cluster and the shutdown of the tasks. However
disabling the service was not deleting the service info in the cluster.
Added a fix to delete service info from cluster and verified using siege
to ensure there is zero downtime on rolling update of a service.In order
to support it and ensure consitency of enabling and disable service knob
from the daemon, we need to ensure we disable service when we release
the network from the container. This helps in making the enable and
disable service less racy. The corresponding part of libnetwork fix is
part of docker/libnetwork#1824

Signed-off-by: abhi <abhi@docker.com>
libnetwork vendor
Signed-off-by: abhi <abhi@docker.com>

@abhi abhi force-pushed the abhi:service branch from f61d536 to dad093c Jan 17, 2018

@abhi

This comment has been minimized.

Copy link
Contributor Author

commented Jan 17, 2018

@thaJeztah
Copy link
Member

left a comment

LGTM

@cpuguy83
Copy link
Contributor

left a comment

LGTM

I'm not sure I like the ambiguity of the "service" term, but it is what it is.

@cpuguy83 cpuguy83 merged commit 6feae06 into moby:master Jan 18, 2018

7 checks passed

dco-signed All commits are signed
experimental Jenkins build Docker-PRs-experimental 38845 has succeeded
Details
janky Jenkins build Docker-PRs 47594 has succeeded
Details
powerpc Jenkins build Docker-PRs-powerpc 7980 has succeeded
Details
vendor Jenkins build Docker-PRs-vendor 4079 has succeeded
Details
windowsRS1 Jenkins build Docker-PRs-WoW-RS1 19077 has succeeded
Details
z Jenkins build Docker-PRs-s390x 7988 has succeeded
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
You can’t perform that action at this time.