Skip to content

Fix checkpoint's exiting semantics. #37360

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jul 26, 2018
Merged

Conversation

bjbroder
Copy link
Contributor

Previously, dockerd would always ask containerd to pass --leave-running
to runc/runsc, ignoring the exit boolean value. Hence, even docker checkpoint create --leave-running=false ... would not stop the
container.

Signed-off-by: Brielle Broder bbroder@google.com

@hugelgupf
Copy link

cc @hugelgupf @nlacasse

@hugelgupf
Copy link

hugelgupf commented Jun 28, 2018

janky seems to be failing for no good reason?

01:07:11 W: Failed to fetch http://cdn-fastly.deb.debian.org/debian/dists/stretch/InRelease  Could not resolve 'cdn-fastly.deb.debian.org'

@hugelgupf
Copy link

friendly ping

@hugelgupf
Copy link

cc @cpuguy83 maybe?

@cpuguy83
Copy link
Member

cpuguy83 commented Jul 9, 2018

Can we add an integration test for this?

@hugelgupf
Copy link

@cpuguy83 I'd like to, but we've actually had trouble getting this to work with runc (we're using runsc instead). Are there existing integration tests somewhere that we can build off of?

@cpuguy83
Copy link
Member

cpuguy83 commented Jul 9, 2018

Is this the error?

docker-runc did not terminate sucessfully: CRIU version check failed: write unixpacket @->@: write: broken pipe path= /var/run/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/56affd8f7a2433caefd9e787c4248f6064df70ea774ad9c30ce6ad16ef4efd23/criu-dump.log: unknown error_type="*errors.errorString" module=api

@cpuguy83
Copy link
Member

cpuguy83 commented Jul 9, 2018

ping @kolyshkin

@avagin
Copy link
Contributor

avagin commented Jul 9, 2018

Actually, there is one more problem. Docker (containerd) has to call "runc checkpoint" with "--empty-ns network" to dump a docker container.

I have a patch which fixes both these issues:

avagin/docker-ce@911c89f#diff-e1f8834158a9117cafe1744dc2c7adb2

But it depends on containerd changes:
containerd/containerd#2425

@kolyshkin
Copy link
Contributor

I like @avagin approach better (i.e. appending to options if necessary).

@hugelgupf
Copy link

hugelgupf commented Jul 10, 2018

Sure, we can append to options.

Having to call with an empty NS is specific to runc. runsc does not need to be called with that. Can this be part of something runc-specific, or can this be done in an API-agnostic way?

@avagin
Copy link
Contributor

avagin commented Jul 10, 2018

@hugelgupf Here are changes which resolve the issue with an empty NS:
opencontainers/runc#1840

@codecov
Copy link

codecov bot commented Jul 10, 2018

Codecov Report

Merging #37360 into master will decrease coverage by 0.02%.
The diff coverage is 0%.

@@            Coverage Diff             @@
##           master   #37360      +/-   ##
==========================================
- Coverage   34.95%   34.92%   -0.03%     
==========================================
  Files         610      610              
  Lines       44886    44873      -13     
==========================================
- Hits        15690    15673      -17     
- Misses      27077    27081       +4     
  Partials     2119     2119

@bjbroder
Copy link
Contributor Author

@kolyshkin @avagin cpuguy wanted an integration test, but it seems like there are none for moby checkpoint/restore already? Is there anything y'all have we can contribute to?

@hugelgupf
Copy link

Ping

@avagin
Copy link
Contributor

avagin commented Jul 19, 2018

LGTM

@bjbroder
Copy link
Contributor Author

Now that it's lgtm'd, how do I get this merged?

Copy link
Member

@cpuguy83 cpuguy83 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@vdemeester vdemeester left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🐸
cc @mlaventure @thaJeztah

Previously, dockerd would always ask containerd to pass --leave-running
to runc/runsc, ignoring the exit boolean value. Hence, even `docker
checkpoint create --leave-running=false ...` would not stop the
container.

Signed-off-by: Brielle Broder <bbroder@google.com>
Copy link
Contributor

@mlaventure mlaventure left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

z failure is unrelated

@tswift242
Copy link
Contributor

@bjbroder This restores the behavior of making --leave-running configurable, but is the default still true? The default used to be false.

@bjbroder
Copy link
Contributor Author

bjbroder commented Jul 27, 2018

@tswift242 The default for --leave-running was, and still is, false. Previously, both the default (false) and an explicit set of the flag to false would cause the process to incorrectly continue running after checkpointing because "exit" had not been implemented appropriately, even though the leave-running flag is correctly set.

The leave-running flag's behavior can be seen in:
docker-ce/components/cli/cli/command/checkpoint/create.go

@tswift242
Copy link
Contributor

Gotcha. Thanks for the reply and the fix to this issue!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants