New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

conmon: add support to restore a container #1427

Merged
merged 1 commit into from Mar 12, 2018

Conversation

Projects
None yet
8 participants
@adrianreber
Contributor

adrianreber commented Mar 7, 2018

runc supports checkpointing and restoring containers with the help of
CRIU. To checkpoint a container from podman it is enough to just call
runc to checkpoint the container. To restore a container with podman the
resulting container should again be under the control of conmon.

This extends conmon to be able to also restore a container.

Signed-off-by: Adrian Reber areber@redhat.com

@adrianreber adrianreber requested review from mrunalp and runcom as code owners Mar 7, 2018

@k8s-ci-robot k8s-ci-robot added the size/S label Mar 7, 2018

@k8s-ci-robot

This comment has been minimized.

Contributor

k8s-ci-robot commented Mar 7, 2018

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA.

It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-ci-robot

This comment has been minimized.

Collaborator

openshift-ci-robot commented Mar 7, 2018

Hi @adrianreber. Thanks for your PR.

I'm waiting for a openshift or kubernetes-incubator member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@rhatdan

This comment has been minimized.

Contributor

rhatdan commented Mar 7, 2018

/ok-to-test

@rhatdan

This comment has been minimized.

Contributor

rhatdan commented Mar 7, 2018

@adrianreber you need to fill out the cla/Linuxfoundation stuff in order to contribute.

@rhatdan

This comment has been minimized.

Contributor

rhatdan commented Mar 7, 2018

@adrianreber BTW Thanks for the PR.

@@ -126,6 +127,7 @@ static GOptionEntry opt_entries[] =
{ "cid", 'c', 0, G_OPTION_ARG_STRING, &opt_cid, "Container ID", NULL },
{ "cuuid", 'u', 0, G_OPTION_ARG_STRING, &opt_cuuid, "Container UUID", NULL },
{ "runtime", 'r', 0, G_OPTION_ARG_STRING, &opt_runtime_path, "Runtime path", NULL },
{ "restore", 0, 0, G_OPTION_ARG_NONE, &opt_restore, "Restore a container from a checkpoint", NULL },

This comment has been minimized.

@TomSweeneyRedHat

TomSweeneyRedHat Mar 7, 2018

Contributor

Do you need to have an argument for the checkpoint to restore from?

This comment has been minimized.

@adrianreber

adrianreber Mar 7, 2018

Contributor

The checkpoint to restore from is defined by the container ID and the bundle.

@adrianreber

This comment has been minimized.

Contributor

adrianreber commented Mar 7, 2018

@rhatdan my Linux Foundation ID does not use my Red Hat email address. Who do I need to contact to get my Linux Foundation ID added to Red Hat's organization? Or do I need a new account?

@rhatdan

This comment has been minimized.

Contributor

rhatdan commented Mar 7, 2018

I have no idea. @mrunalp @runcom @vbatts Any ideas?

@vbatts

This comment has been minimized.

Contributor

vbatts commented Mar 7, 2018

can you add that email to your github (you can add more than one), or just sign the CLA with the other email too?

@adrianreber

This comment has been minimized.

Contributor

adrianreber commented Mar 7, 2018

I created a new account and now I am authorized to contribute code to this project.

@rhatdan

This comment has been minimized.

Contributor

rhatdan commented Mar 7, 2018

/test all

@rhatdan

This comment has been minimized.

Contributor

rhatdan commented Mar 8, 2018

/retest

@adrianreber

This comment has been minimized.

Contributor

adrianreber commented Mar 8, 2018

Further testing with podman on my side has shown that shown that @TomSweeneyRedHat was right that an explicit definition of the checkpoint directory makes sense. Especially when looking at further enhancements like pre-copy or post-copy container migration using multiple checkpoints.

I need to update this PR. Please do not merge.

conmon: add support to restore a container
runc supports checkpointing and restoring containers with the help of
CRIU. To checkpoint a container from podman it is enough to just call
runc to checkpoint the container. To restore a container with podman the
resulting container should again be under the control of conmon.

This extends conmon to be able to also restore a container.

Signed-off-by: Adrian Reber <areber@redhat.com>

@adrianreber adrianreber force-pushed the adrianreber:master branch from 59b5b97 to 31894fe Mar 9, 2018

@k8s-ci-robot k8s-ci-robot added size/M and removed size/S labels Mar 9, 2018

@adrianreber

This comment has been minimized.

Contributor

adrianreber commented Mar 9, 2018

These conmon changes in this PR are needed to support chackpointing and restoring in podman: containers/libpod#469

@mrunalp

This comment has been minimized.

Member

mrunalp commented Mar 9, 2018

One question I have is why does this need to go through conmon? Can't podman call runc directly?

@adrianreber

This comment has been minimized.

Contributor

adrianreber commented Mar 9, 2018

@mrunalp: Initially I called runc directly from podman, but the resulting container is then not running under the control of conmon. A newly started container, however, is running under the control of conmon.

I do not know the reasons why the containers are running under conmon, but I tried to replicate the state of a newly created container with a restored container.

@mheon

This comment has been minimized.

Contributor

mheon commented Mar 9, 2018

@adrianreber We need conmon to monitor container state once it is started. conmon creates exit files to indicate to us that the container has exited, and what code it exited with. We are also planning on adding a cleanup callback to it so it can handle unmounting the container and updating the database as well (conmon changes are merged, but we need to make them available in our packages before we take advantage of them).

Given this I definitely agree with the decision to do --restore via conmon. It's running a container, so we need to make sure we have a conmon wrapping that container for monitoring and cleanup

@mrunalp

This comment has been minimized.

Member

mrunalp commented Mar 10, 2018

@adrianreber @mheon okay, sounds good.

@rhatdan

This comment has been minimized.

Contributor

rhatdan commented Mar 12, 2018

/test all

@rhatdan

This comment has been minimized.

Contributor

rhatdan commented Mar 12, 2018

* '--work-path' is the directory CRIU will run in and
* also place its log files.
*/
add_argv(runtime_argv, "--detach",

This comment has been minimized.

@mheon

mheon Mar 12, 2018

Contributor

Will adding --detach prevent us from attaching to the container once it has been restored?

This comment has been minimized.

@adrianreber

adrianreber Mar 12, 2018

Contributor

No, those are different attaches. The podman attach is using the attach socket to talk to the container and that already works. I already tried it with registry.fedoraproject.org/f26/httpd and after a restore I can see the request to the httpd server being logged on podman attach -l

The runc restore --detach is that same detach as runc run --detach which will immediately return to the shell.

This comment has been minimized.

@mheon

mheon Mar 12, 2018

Contributor

Ack, just wanted to make sure

@mheon

This comment has been minimized.

Contributor

mheon commented Mar 12, 2018

LGTM

@rhatdan

This comment has been minimized.

Contributor

rhatdan commented Mar 12, 2018

All green, merging.

@rhatdan rhatdan merged commit 69f77e7 into kubernetes-sigs:master Mar 12, 2018

8 checks passed

ci/openshift-jenkins/critest_fedora Jenkins job succeeded.
Details
ci/openshift-jenkins/critest_rhel Jenkins job succeeded.
Details
ci/openshift-jenkins/e2e_fedora Jenkins job succeeded.
Details
ci/openshift-jenkins/e2e_rhel Jenkins job succeeded.
Details
ci/openshift-jenkins/integration_fedora Jenkins job succeeded.
Details
ci/openshift-jenkins/integration_rhel Jenkins job succeeded.
Details
cla/linuxfoundation adrianreber authorized
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment