
Add --live-restore flag #23213

Merged
crosbymichael merged 1 commit into moby:master from crosbymichael:restore-option on Jun 14, 2016

Conversation
@crosbymichael
Contributor

crosbymichael commented Jun 2, 2016

This flag enables full support of daemonless containers in docker. It
ensures that docker does not stop containers on shutdown or restore and
properly reconnects to the container when restarted.

This is not the default because of backwards compat, but it should be the
desired outcome for people running containers in prod.

Signed-off-by: Michael Crosby crosbymichael@gmail.com

@thaJeztah
Member

thaJeztah commented Jun 2, 2016

can we make this option live reloadable (daemon.json)?

@crosbymichael
Contributor

crosbymichael commented Jun 2, 2016

@thaJeztah is there something special that I should be doing for the config file?

@thaJeztah
Member

thaJeztah commented Jun 2, 2016

@crosbymichael yeah, it's still a bit hacky, but you need to explicitly allow it to be set in the Reload() function, and it needs a mention in the docs/reference/commandline/dockerd.md section about reloadable options.

Here's a PR that added an option that's reloadable at runtime: 7368e41
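For context, a minimal sketch of the reload mechanism thaJeztah describes, with invented type and field names (the real code lives in the daemon's Reload path): only options explicitly handled in Reload() take effect without a daemon restart.

    package main

    import (
        "fmt"
        "sync"
    )

    // Config stands in for the configuration parsed from daemon.json.
    // All names here are illustrative, not the actual Moby types.
    type Config struct {
        LiveRestoreEnabled bool
        valuesSet          map[string]bool // which keys actually appeared in daemon.json
    }

    // IsValueSet reports whether the option was explicitly present in the file.
    func (c *Config) IsValueSet(name string) bool { return c.valuesSet[name] }

    // Daemon holds the currently active configuration.
    type Daemon struct {
        mu    sync.Mutex
        store Config
    }

    // Reload applies only the options explicitly whitelisted as
    // live-reloadable; anything not handled here requires a restart.
    func (d *Daemon) Reload(conf *Config) error {
        d.mu.Lock()
        defer d.mu.Unlock()
        if conf.IsValueSet("live-restore") {
            d.store.LiveRestoreEnabled = conf.LiveRestoreEnabled
        }
        return nil
    }

    func main() {
        d := &Daemon{}
        _ = d.Reload(&Config{
            LiveRestoreEnabled: true,
            valuesSet:          map[string]bool{"live-restore": true},
        })
        fmt.Println("live-restore now:", d.store.LiveRestoreEnabled)
    }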

@thaJeztah
Member

thaJeztah commented Jun 2, 2016

Oh, and the validateConfiguration() function should check that the option is not set both on the command line (through a flag) and in daemon.json, because that's invalid.
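A hedged illustration of the validation being requested (the function name and signature here are invented; the real check would live in validateConfiguration()):

    package main

    import (
        "errors"
        "fmt"
    )

    // checkDuplicateOption expresses the rule above: an option may come
    // from the command-line flag or from daemon.json, but setting it in
    // both places is a configuration error.
    func checkDuplicateOption(name string, setOnFlag, setInFile bool) error {
        if setOnFlag && setInFile {
            return errors.New("option '" + name + "' is set both on the command line and in the configuration file")
        }
        return nil
    }

    func main() {
        if err := checkDuplicateOption("live-restore", true, true); err != nil {
            fmt.Println("invalid configuration:", err)
        }
    }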

Outdated review thread on libcontainerd/client_linux.go:

    select {
    case <-time.After(10 * time.Second):
        if err := clnt.Signal(containerID, int(syscall.SIGKILL)); err != nil {
            logrus.Errorf("error sending sigkill to %v: %v", containerID, err)

@AkihiroSuda
Member

AkihiroSuda commented Jun 3, 2016

ditto

Outdated review thread on docs/admin/configuring.md.

In libcontainerd/client_linux.go:

    @@ -445,12 +447,43 @@ func (clnt *client) restore(cont *containerd.Container, options ...CreateOption)
    }

    func (clnt *client) Restore(containerID string, options ...CreateOption) error {
        if clnt.liveRestore {

@cyphar
Contributor

cyphar commented Jun 3, 2016

I'm confused why we kill all container processes if they are still hanging around when we start. I understand that we need --live-restore in order to maintain back-compat when the daemon dies. But it seems odd that we go from the old code (which did "live restore" by default) to the new code that will kill all of the containers if they're still lying around. Does this play nicely when you have more than one consumer of containerd?

@crosbymichael
Contributor

crosbymichael commented Jun 3, 2016

You're reading it wrong; it was always like that. I just added the original code back from my previous PR because of backwards-compat concerns.
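To make the branch under discussion concrete, here is a hedged sketch of the Restore control flow shown in the diff above (everything except the liveRestore check is invented for illustration):

    package main

    import "fmt"

    // client is a stand-in for libcontainerd's client type.
    type client struct{ liveRestore bool }

    // Restore sketches the two paths: reconnect when live restore is
    // enabled, otherwise fall back to the historical behaviour of
    // stopping leftover containers for backwards compat.
    func (clnt *client) Restore(containerID string) error {
        if clnt.liveRestore {
            // hypothetical: re-attach the event stream and IO to the live container
            fmt.Println("reattaching to running container", containerID)
            return nil
        }
        // hypothetical legacy path: signal, wait, and clean up state
        fmt.Println("killing leftover container", containerID)
        return nil
    }

    func main() {
        (&client{liveRestore: true}).Restore("abc123")
        (&client{liveRestore: false}).Restore("abc123")
    }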

@icecrime
Contributor

icecrime commented Jun 4, 2016

    11:21:33 ---> Making bundle: validate-lint (in bundles/1.12.0-dev/validate-lint)
    11:21:33 Errors from golint:
    11:21:33 libcontainerd/remote_solaris.go:31:1: exported function WithLiveRestore should have comment or be unexported
    11:21:33 libcontainerd/remote_windows.go:33:1: exported function WithLiveRestore should have comment or be unexported
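For readers unfamiliar with this lint rule: golint requires every exported identifier to carry a doc comment that begins with its name. A minimal illustration of the kind of fix involved, with an invented signature rather than the actual libcontainerd one:

    package main

    import "fmt"

    // options is a stand-in for the remote client's configuration.
    type options struct{ liveRestore bool }

    // WithLiveRestore configures whether containers are left running when
    // the daemon exits. A doc comment starting with the identifier's name,
    // like this one, is exactly what golint is asking for above.
    func WithLiveRestore(v bool) func(*options) {
        return func(o *options) { o.liveRestore = v }
    }

    func main() {
        o := &options{}
        WithLiveRestore(true)(o)
        fmt.Println("live restore:", o.liveRestore)
    }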
@LK4D4
Contributor

LK4D4 commented Jun 13, 2016

experimental failures might be related. I'll restart CI

Outdated review threads on docs/admin/configuring.md and libcontainerd/client_linux.go.

thaJeztah added this to the 1.12.0 milestone Jun 13, 2016

    f := ctr.fifo(i)
    c := make(chan struct{})
    go func() {
        close(c) // this channel ensures we don't close the writer too early, before the read-only open has been called

@tiborvass
Collaborator

tiborvass commented Jun 13, 2016

is this a hack?

@crosbymichael
Contributor

crosbymichael commented Jun 13, 2016

It signals to the calling code that the goroutine has been scheduled, to get rid of races where the function returns before the goroutine's code has started executing.
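The pattern crosbymichael describes can be illustrated with a self-contained toy (all names invented for the example): the goroutine closes a channel as its very first statement, and the caller blocks on that channel so it cannot return before the goroutine has started running.

    package main

    import "fmt"

    // startWorker launches a goroutine and only returns once that
    // goroutine has begun executing, using the close-a-channel signal
    // discussed above.
    func startWorker() <-chan struct{} {
        started := make(chan struct{})
        done := make(chan struct{})
        go func() {
            close(started) // first statement: proves the goroutine is running
            // ... real work would happen here ...
            close(done)
        }()
        <-started // don't return before the goroutine has been scheduled
        return done
    }

    func main() {
        <-startWorker()
        fmt.Println("worker started and finished")
    }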

@orivej

orivej commented Jun 27, 2016

closeReaderFifo may still run before openReaderFromFifo (even though it is less likely with this synchronization point). Maybe this is better:

    r := openReaderFromFifo(f)
    go io.Copy(ioutil.Discard, r)
    closeReaderFifo(f)

@mlaventure
Contributor

mlaventure commented Jun 27, 2016

@orivej

openReaderFromFifo is a blocking operation: as we're opening it with O_RDONLY, it will block until another process opens the FIFO for writing.
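A small, Linux-only sketch of the FIFO semantics mlaventure refers to (path and timing are arbitrary): an O_RDONLY open of a FIFO blocks until some process opens it for writing.

    package main

    import (
        "fmt"
        "os"
        "syscall"
        "time"
    )

    func main() {
        path := "/tmp/demo.fifo"
        if err := syscall.Mkfifo(path, 0600); err != nil {
            fmt.Println("mkfifo failed:", err)
            return
        }
        defer os.Remove(path)

        // Open the writing end after one second, from another goroutine.
        go func() {
            time.Sleep(time.Second)
            if w, err := os.OpenFile(path, os.O_WRONLY, 0); err == nil {
                w.Close()
            }
        }()

        // This O_RDONLY open blocks until the writer above opens its end.
        start := time.Now()
        r, err := os.OpenFile(path, os.O_RDONLY, 0)
        if err != nil {
            fmt.Println("open failed:", err)
            return
        }
        defer r.Close()
        fmt.Printf("reader unblocked after %v\n", time.Since(start).Round(time.Millisecond))
    }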

@orivej

orivej commented Jun 27, 2016

In fact, openReaderFromFifo spawns a goroutine that opens the pipe, and immediately returns.

This means that my code sample would work, but still would not guarantee that closeReaderFifo runs after openReaderFromFifo opens the file.
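orivej's point can be made concrete with a hedged toy version (openAsync is invented and only mimics the shape of openReaderFromFifo): the channel close proves the goroutine was scheduled, not that the open has completed.

    package main

    import (
        "fmt"
        "sync/atomic"
        "time"
    )

    // openAsync closes c first and performs the (slow) open afterwards,
    // so the caller's <-c proves scheduling, not completion of the open.
    func openAsync(opened *atomic.Bool) <-chan struct{} {
        c := make(chan struct{})
        go func() {
            close(c)                          // caller unblocks here...
            time.Sleep(10 * time.Millisecond) // ...while the open is still pending
            opened.Store(true)
        }()
        return c
    }

    func main() {
        var opened atomic.Bool
        <-openAsync(&opened)
        // The goroutine has started, but the open need not have completed;
        // a close issued right now could still beat it.
        fmt.Println("opened at return:", opened.Load()) // very likely false
    }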

@icecrime
Contributor

icecrime commented Jun 13, 2016

LGTM

    }
    clnt.deleteContainer(containerID)

@mlaventure
Contributor

mlaventure commented Jun 13, 2016

nitpick: this should be within the above if

@mlaventure
Contributor

mlaventure commented Jun 13, 2016

Aside from my nit, LGTM (IANAM)

@icecrime
Contributor

icecrime commented Jun 14, 2016

@crosbymichael I see a few failures on experimental:

    14:03:11 FAIL: docker_cli_daemon_experimental_test.go:64: DockerDaemonSuite.TestCleanupMountsAfterDaemonCrash
    14:03:11
    14:03:11 [d90042189] waiting for daemon to start
    14:03:11 [d90042189] daemon started
    14:03:11 [d90042189] exiting daemon
    14:03:11 [d90042189] waiting for daemon to start
    14:03:11 [d90042189] daemon started
    14:03:11 docker_cli_daemon_experimental_test.go:90:
    14:03:11     c.Fatalf("Container %s expected to stay alive after daemon restart", id)
    14:03:11 ... Error: Container 1f90a3892d1280613d9a7221eefb640ae56e35e5c37fda7d6638e06690bba402 expected to stay alive after daemon restart
    14:03:11
    14:03:11 [d90042189] exiting daemon
Add --live-restore flag

This flag enables full support of daemonless containers in docker.  It
ensures that docker does not stop containers on shutdown or restore and
properly reconnects to the container when restarted.

This is not the default because of backwards compat but should be the
desired outcome for people running containers in prod.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
@tonistiigi
Member

tonistiigi commented Jun 14, 2016

LGTM

@icecrime
Contributor

icecrime commented Jun 14, 2016

Ping @thaJeztah: please give the docs a read, and we'll fix what needs fixing in a follow-up PR.

crosbymichael merged commit 3020081 into moby:master Jun 14, 2016

5 of 7 checks passed:

- experimental: Jenkins build Docker-PRs-experimental 19902 is running
- gccgo: Jenkins build Docker-PRs-gccgo 6769 is running
- docker/dco-signed: All commits signed
- janky: Jenkins build Docker-PRs 28696 has succeeded
- userns: Jenkins build Docker-PRs-userns 10858 has succeeded
- win2lin: Jenkins build Docker-PRs-Win2Lin 27281 has succeeded
- windowsTP5: Jenkins build Docker-PRs-WoW-TP5 3187 has succeeded

crosbymichael deleted the crosbymichael:restore-option branch Jun 14, 2016

@innocentme1

innocentme1 commented Jul 27, 2016

@crosbymichael @thaJeztah

  1. Are there any updated docs regarding this feature? I could not find any; can you point me to the doc?
  2. So this is a daemon flag; is it enabled by default? Do users have the option of changing the default setting?
  3. Is this available at the per-container level?
@thaJeztah
Member

thaJeztah commented Jul 27, 2016

@innocentme1 basic docs are part of this PR; see https://github.com/docker/docker/pull/23213/files#diff-fc5c21bb9d1d48db7d37c003bb62b566R283. It's a daemon-level flag, and it's not enabled by default. It's not something that can be set per container.
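For later readers: per the current Docker documentation, the daemon.json key mirrors the flag name, so enabling the feature through the configuration file looks like this:

    {
      "live-restore": true
    }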

@HackToday
Contributor

HackToday commented Apr 28, 2017

Hi @thaJeztah and @crosbymichael,
Yes, with that the docker daemon can stop while containers keep running.

From my usage, I found that since the move to the containerd daemon, if containerd dies, all containers still become unavailable (in stopped status). It means containerd hits the same issue the old docker daemon had.

Is that right? Thanks

@HackToday
Contributor

HackToday commented Apr 30, 2017 (comment minimized)
