Fix checkpoint's exiting semantics. #37360

Merged
thaJeztah merged 1 commit into moby:master from bjbroder:checkpoint-exit on Jul 26, 2018

10 participants
@bjbroder
Contributor

commented Jun 28, 2018

Previously, dockerd would always ask containerd to pass `--leave-running` to runc/runsc, ignoring the exit boolean value. Hence, even `docker checkpoint create --leave-running=false ...` would not stop the container.

Signed-off-by: Brielle Broder <bbroder@google.com>

@hugelgupf

commented Jun 28, 2018

@hugelgupf

commented Jun 28, 2018

janky seems to be failing for no good reason?

01:07:11 W: Failed to fetch http://cdn-fastly.deb.debian.org/debian/dists/stretch/InRelease  Could not resolve 'cdn-fastly.deb.debian.org'

@hugelgupf

commented Jul 9, 2018

friendly ping

@hugelgupf

commented Jul 9, 2018

cc @cpuguy83 maybe?

@cpuguy83

Contributor

commented Jul 9, 2018

Can we add an integration test for this?

@hugelgupf

commented Jul 9, 2018

@cpuguy83 I'd like to, but we've actually had trouble getting this to work with runc (we're using runsc instead). Are there existing integration tests somewhere that we can build off of?

@cpuguy83 cpuguy83 requested a review from mlaventure Jul 9, 2018

@cpuguy83

Contributor

commented Jul 9, 2018

Is this the error?

docker-runc did not terminate sucessfully: CRIU version check failed: write unixpacket @->@: write: broken pipe path= /var/run/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/56affd8f7a2433caefd9e787c4248f6064df70ea774ad9c30ce6ad16ef4efd23/criu-dump.log: unknown error_type="*errors.errorString" module=api

@cpuguy83

Contributor

commented Jul 9, 2018

ping @kolyshkin

@avagin

Contributor

commented Jul 9, 2018

Actually, there is one more problem. Docker (containerd) has to call "runc checkpoint" with "--empty-ns network" to dump a docker container.

I have a patch which fixes both these issues:

avagin/docker-ce@911c89f#diff-e1f8834158a9117cafe1744dc2c7adb2

But it depends on containerd changes:
containerd/containerd#2425

@kolyshkin

Contributor

commented Jul 10, 2018

I like @avagin's approach better (i.e. appending to options if necessary).

@hugelgupf

commented Jul 10, 2018

Sure, we can append to options.

Having to call with an empty NS is specific to runc. runsc does not need to be called with that. Can this be part of something runc-specific, or can this be done in an API-agnostic way?

@avagin

Contributor

commented Jul 10, 2018

@hugelgupf Here are changes which resolve the issue with an empty NS:
opencontainers/runc#1840

@bjbroder bjbroder force-pushed the bjbroder:checkpoint-exit branch from 41938ff to c9c0ef9 Jul 10, 2018

@codecov

commented Jul 10, 2018

Codecov Report

Merging #37360 into master will decrease coverage by 0.02%.
The diff coverage is 0%.

@@            Coverage Diff             @@
##           master   #37360      +/-   ##
==========================================
- Coverage   34.95%   34.92%   -0.03%     
==========================================
  Files         610      610              
  Lines       44886    44873      -13     
==========================================
- Hits        15690    15673      -17     
- Misses      27077    27081       +4     
  Partials     2119     2119

@bjbroder

Contributor Author

commented Jul 10, 2018

@kolyshkin @avagin cpuguy wanted an integration test, but it seems like there are none for moby checkpoint/restore already? Is there anything y'all have we can contribute to?

@hugelgupf

commented Jul 19, 2018

Ping

@avagin

Contributor

commented Jul 19, 2018

LGTM

@bjbroder

Contributor Author

commented Jul 24, 2018

Now that it's lgtm'd, how do I get this merged?

@cpuguy83
Contributor

left a comment

LGTM

@vdemeester
Member

left a comment

LGTM 🐸
cc @mlaventure @thaJeztah

Fix checkpoint's exiting semantics.
Previously, dockerd would always ask containerd to pass --leave-running
to runc/runsc, ignoring the exit boolean value. Hence, even `docker
checkpoint create --leave-running=false ...` would not stop the
container.

Signed-off-by: Brielle Broder <bbroder@google.com>

@vdemeester vdemeester force-pushed the bjbroder:checkpoint-exit branch from c9c0ef9 to db621eb Jul 25, 2018

@mlaventure
Contributor

left a comment

LGTM

The z (s390x) failure is unrelated.

@thaJeztah thaJeztah merged commit c3a0207 into moby:master Jul 26, 2018

5 of 6 checks passed

z: Jenkins build Docker-PRs-s390x 10489 has failed
dco-signed: All commits are signed
experimental: Jenkins build Docker-PRs-experimental 41414 has succeeded
janky: Jenkins build Docker-PRs 50187 has succeeded
powerpc: Jenkins build Docker-PRs-powerpc 10613 has succeeded
windowsRS1: Jenkins build Docker-PRs-WoW-RS1 21513 has succeeded

@tswift242

Contributor

commented Jul 26, 2018

@bjbroder This restores the behavior of making --leave-running configurable, but is the default still true? The default used to be false.

@bjbroder

Contributor Author

commented Jul 27, 2018

@tswift242 The default for --leave-running was, and still is, false. Previously, both the default (false) and explicitly setting the flag to false left the process incorrectly running after checkpointing, because "exit" had not been implemented properly, even though the leave-running flag itself was set correctly.

The leave-running flag's behavior can be seen in:
docker-ce/components/cli/cli/command/checkpoint/create.go

@tswift242

Contributor

commented Jul 27, 2018

Gotcha. Thanks for the reply and the fix to this issue!
