New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure containers are stopped on daemon startup #35805

Merged
merged 1 commit into from Dec 19, 2017

Conversation

Projects
None yet
8 participants
@cpuguy83
Contributor

cpuguy83 commented Dec 14, 2017

When the containerd 1.0 runtime changes were made, we inadvertently
removed the functionality where any running containers are killed on
startup when not using live-restore.
This change restores that behavior.

ping @fcrisciani

@fcrisciani

This comment has been minimized.

Show comment
Hide comment
@fcrisciani

fcrisciani Dec 14, 2017

Contributor

LGTM

Contributor

fcrisciani commented Dec 14, 2017

LGTM

@andrewhsu

This comment has been minimized.

Show comment
Hide comment
@andrewhsu

andrewhsu Dec 14, 2017

Contributor

Seeing failures:

FAIL: docker_cli_nat_test.go:62: DockerSuite.TestNetworkLocalhostTCPNat
FAIL: docker_cli_nat_test.go:46: DockerSuite.TestNetworkNat
22:55:14 ----------------------------------------------------------------------
22:55:14 FAIL: docker_cli_nat_test.go:62: DockerSuite.TestNetworkLocalhostTCPNat
22:55:14 
22:55:14 docker_cli_nat_test.go:76:
22:55:14     c.Assert(final, checker.Equals, msg)
22:55:14 ... obtained string = ""
22:55:14 ... expected string = "hi yall"
22:55:14 
22:55:15 ----------------------------------------------------------------------
22:55:15 FAIL: docker_cli_nat_test.go:46: DockerSuite.TestNetworkNat
22:55:15 
22:55:15 docker_cli_nat_test.go:52:
22:55:15     c.Assert(err, check.IsNil)
22:55:15 ... value *net.OpError = &net.OpError{Op:"dial", Net:"tcp", Source:net.Addr(nil), Addr:(*net.TCPAddr)(0xc4203ecb70), Err:(*os.SyscallError)(0xc4204b3900)} ("dial tcp 172.17.0.3:8080: getsockopt: connection refused")
22:55:15 
22:55:16 
Contributor

andrewhsu commented Dec 14, 2017

Seeing failures:

FAIL: docker_cli_nat_test.go:62: DockerSuite.TestNetworkLocalhostTCPNat
FAIL: docker_cli_nat_test.go:46: DockerSuite.TestNetworkNat
22:55:14 ----------------------------------------------------------------------
22:55:14 FAIL: docker_cli_nat_test.go:62: DockerSuite.TestNetworkLocalhostTCPNat
22:55:14 
22:55:14 docker_cli_nat_test.go:76:
22:55:14     c.Assert(final, checker.Equals, msg)
22:55:14 ... obtained string = ""
22:55:14 ... expected string = "hi yall"
22:55:14 
22:55:15 ----------------------------------------------------------------------
22:55:15 FAIL: docker_cli_nat_test.go:46: DockerSuite.TestNetworkNat
22:55:15 
22:55:15 docker_cli_nat_test.go:52:
22:55:15     c.Assert(err, check.IsNil)
22:55:15 ... value *net.OpError = &net.OpError{Op:"dial", Net:"tcp", Source:net.Addr(nil), Addr:(*net.TCPAddr)(0xc4203ecb70), Err:(*os.SyscallError)(0xc4204b3900)} ("dial tcp 172.17.0.3:8080: getsockopt: connection refused")
22:55:15 
22:55:16 
@vdemeester

LGTM 🐯

@anusha-ragunathan

This comment has been minimized.

Show comment
Hide comment
@anusha-ragunathan

anusha-ragunathan Dec 15, 2017

Contributor

Tested this and it doesnt work. After daemon restart, docker ps shows no live containers, but the process is still running. Probably needs a SIGTERM

Contributor

anusha-ragunathan commented Dec 15, 2017

Tested this and it doesnt work. After daemon restart, docker ps shows no live containers, but the process is still running. Probably needs a SIGTERM

@anusha-ragunathan

This comment has been minimized.

Show comment
Hide comment
@anusha-ragunathan

anusha-ragunathan Dec 15, 2017

Contributor

Adding a SignalProcess call terminates the process cleanly. Needs addition of wait and check logic for process to be dead and SIGKILL after wait.

index 96b39e8..c18379b 100644
--- a/daemon/daemon.go
+++ b/daemon/daemon.go
@@ -44,6 +44,7 @@ import (
 	"github.com/docker/docker/pkg/containerfs"
 	"github.com/docker/docker/pkg/idtools"
 	"github.com/docker/docker/pkg/plugingetter"
+	"github.com/docker/docker/pkg/signal"
 	"github.com/docker/docker/pkg/sysinfo"
 	"github.com/docker/docker/pkg/system"
 	"github.com/docker/docker/pkg/truncindex"
@@ -241,12 +242,19 @@ func (daemon *Daemon) restore() error {
 				logrus.Errorf("Failed to restore container %s with containerd: %s", c.ID, err)
 				return
 			}
-			if !alive {
+
+			if !daemon.configStore.LiveRestoreEnabled {
+				err = daemon.containerd.SignalProcess(context.Background(), c.ID, libcontainerd.InitProcessName, int(signal.SignalMap["TERM"]))
+				logrus.WithError(err).Errorf("Failed to kill container %v due to %v", c.ID, err)
+			}
+
+			if !alive || !daemon.configStore.LiveRestoreEnabled {
 				ec, exitedAt, err = daemon.containerd.DeleteTask(context.Background(), c.ID)
 				if err != nil && !errdefs.IsNotFound(err) {
 					logrus.WithError(err).Errorf("Failed to delete container %s from containerd", c.ID)
 					return
 				}
+				alive = false
 			}
 
 			if c.IsRunning() || c.IsPaused() {
Contributor

anusha-ragunathan commented Dec 15, 2017

Adding a SignalProcess call terminates the process cleanly. Needs addition of wait and check logic for process to be dead and SIGKILL after wait.

index 96b39e8..c18379b 100644
--- a/daemon/daemon.go
+++ b/daemon/daemon.go
@@ -44,6 +44,7 @@ import (
 	"github.com/docker/docker/pkg/containerfs"
 	"github.com/docker/docker/pkg/idtools"
 	"github.com/docker/docker/pkg/plugingetter"
+	"github.com/docker/docker/pkg/signal"
 	"github.com/docker/docker/pkg/sysinfo"
 	"github.com/docker/docker/pkg/system"
 	"github.com/docker/docker/pkg/truncindex"
@@ -241,12 +242,19 @@ func (daemon *Daemon) restore() error {
 				logrus.Errorf("Failed to restore container %s with containerd: %s", c.ID, err)
 				return
 			}
-			if !alive {
+
+			if !daemon.configStore.LiveRestoreEnabled {
+				err = daemon.containerd.SignalProcess(context.Background(), c.ID, libcontainerd.InitProcessName, int(signal.SignalMap["TERM"]))
+				logrus.WithError(err).Errorf("Failed to kill container %v due to %v", c.ID, err)
+			}
+
+			if !alive || !daemon.configStore.LiveRestoreEnabled {
 				ec, exitedAt, err = daemon.containerd.DeleteTask(context.Background(), c.ID)
 				if err != nil && !errdefs.IsNotFound(err) {
 					logrus.WithError(err).Errorf("Failed to delete container %s from containerd", c.ID)
 					return
 				}
+				alive = false
 			}
 
 			if c.IsRunning() || c.IsPaused() {
@andrewhsu

This comment has been minimized.

Show comment
Hide comment
@andrewhsu

andrewhsu Dec 15, 2017

Contributor

Just saw failure on experimental build job:

FAIL: docker_cli_network_unix_test.go:1135: DockerNetworkSuite.TestDockerNetworkHostModeUngracefulDaemonRestart

[d271672e27723] waiting for daemon to start
[d271672e27723] daemon started

Daemon d271672e27723 is not started
[d271672e27723] waiting for daemon to start
[d271672e27723] daemon started

docker_cli_network_unix_test.go:1157:
    c.Assert(err, checker.IsNil)
... value *errors.fundamental = condition ""false" == "true"" not true in time (10s) ("condition \"\"false\" == \"true\"\" not true in time (10s)")
Contributor

andrewhsu commented Dec 15, 2017

Just saw failure on experimental build job:

FAIL: docker_cli_network_unix_test.go:1135: DockerNetworkSuite.TestDockerNetworkHostModeUngracefulDaemonRestart

[d271672e27723] waiting for daemon to start
[d271672e27723] daemon started

Daemon d271672e27723 is not started
[d271672e27723] waiting for daemon to start
[d271672e27723] daemon started

docker_cli_network_unix_test.go:1157:
    c.Assert(err, checker.IsNil)
... value *errors.fundamental = condition ""false" == "true"" not true in time (10s) ("condition \"\"false\" == \"true\"\" not true in time (10s)")
@thaJeztah

This comment has been minimized.

Show comment
Hide comment
@thaJeztah

thaJeztah Dec 16, 2017

Member

https://jenkins.dockerproject.org/job/Docker-PRs-experimental/38431/console

23:51:15 ----------------------------------------------------------------------
23:51:15 FAIL: docker_cli_network_unix_test.go:1135: DockerNetworkSuite.TestDockerNetworkHostModeUngracefulDaemonRestart
23:51:15 
23:51:15 [d271672e27723] waiting for daemon to start
23:51:15 [d271672e27723] daemon started
23:51:15 
23:51:15 Daemon d271672e27723 is not started
23:51:15 [d271672e27723] waiting for daemon to start
23:51:15 [d271672e27723] daemon started
23:51:15 
23:51:15 docker_cli_network_unix_test.go:1157:
23:51:15     c.Assert(err, checker.IsNil)
23:51:15 ... value *errors.fundamental = condition ""false" == "true"" not true in time (10s) ("condition \"\"false\" == \"true\"\" not true in time (10s)")
23:51:15 
23:51:15 [d271672e27723] exiting daemon
23:51:30 
Member

thaJeztah commented Dec 16, 2017

https://jenkins.dockerproject.org/job/Docker-PRs-experimental/38431/console

23:51:15 ----------------------------------------------------------------------
23:51:15 FAIL: docker_cli_network_unix_test.go:1135: DockerNetworkSuite.TestDockerNetworkHostModeUngracefulDaemonRestart
23:51:15 
23:51:15 [d271672e27723] waiting for daemon to start
23:51:15 [d271672e27723] daemon started
23:51:15 
23:51:15 Daemon d271672e27723 is not started
23:51:15 [d271672e27723] waiting for daemon to start
23:51:15 [d271672e27723] daemon started
23:51:15 
23:51:15 docker_cli_network_unix_test.go:1157:
23:51:15     c.Assert(err, checker.IsNil)
23:51:15 ... value *errors.fundamental = condition ""false" == "true"" not true in time (10s) ("condition \"\"false\" == \"true\"\" not true in time (10s)")
23:51:15 
23:51:15 [d271672e27723] exiting daemon
23:51:30 
@cpuguy83

This comment has been minimized.

Show comment
Hide comment
@cpuguy83

cpuguy83 Dec 18, 2017

Contributor

experimental CI failure is unrelated and is tracked in #35825
All green otherwise.

Contributor

cpuguy83 commented Dec 18, 2017

experimental CI failure is unrelated and is tracked in #35825
All green otherwise.

Ensure containers are stopped on daemon startup
When the containerd 1.0 runtime changes were made, we inadvertantly
removed the functionality where any running containers are killed on
startup when not using live-restore.
This change restores that behavior.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
@anusha-ragunathan

This comment has been minimized.

Show comment
Hide comment
@anusha-ragunathan

anusha-ragunathan Dec 18, 2017

Contributor

LGTM (Once CI is green)

Contributor

anusha-ragunathan commented Dec 18, 2017

LGTM (Once CI is green)

@seemethere

LGTM on green as well

@anusha-ragunathan anusha-ragunathan merged commit c3a9f5d into moby:master Dec 19, 2017

6 checks passed

dco-signed All commits are signed
experimental Jenkins build Docker-PRs-experimental 38464 has succeeded
Details
janky Jenkins build Docker-PRs 47197 has succeeded
Details
powerpc Jenkins build Docker-PRs-powerpc 7610 has succeeded
Details
windowsRS1 Jenkins build Docker-PRs-WoW-RS1 18714 has succeeded
Details
z Jenkins build Docker-PRs-s390x 7417 has succeeded
Details

@cpuguy83 cpuguy83 deleted the cpuguy83:fix_kill_containers_on_restart branch Dec 19, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment