
Forensic Container Checkpointing #2008

Open
13 of 20 tasks
adrianreber opened this issue Sep 23, 2020 · 66 comments
Labels
sig/node Categorizes an issue or PR as relevant to SIG Node. stage/alpha Denotes an issue tracking an enhancement targeted for Alpha status
Milestone

Comments

@adrianreber
Contributor

adrianreber commented Sep 23, 2020

Enhancement Description

@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Sep 23, 2020
@adrianreber
Contributor Author

/sig node

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Sep 23, 2020
@kikisdeliveryservice
Member

Discussion Link: N/A (or... at multiple conferences over the last few years, when presenting CRIU and container migration, there was always the question of when we will see container migration in Kubernetes)

Responsible SIGs: maybe node

We recommend actively socializing your KEP with the appropriate SIG to gain visibility and consensus, and also for scheduling. Since you are not sure which SIG will sponsor this, reaching out to the SIGs to get clarity on that will help move your KEP forward.

@kikisdeliveryservice
Member

Hi @adrianreber

Any updates on whether this will be included in 1.20?

Enhancements Freeze is October 6th and by that time we require:

The KEP must be merged in an implementable state
The KEP must have test plans
The KEP must have graduation criteria
The KEP must have an issue in the milestone

Best,
Kirsten

@adrianreber
Contributor Author

Hello @kikisdeliveryservice

Any updates on whether this will be included in 1.20?

Sorry, but how would I decide this? There has not been a lot of feedback on the corresponding KEP, which makes it really difficult for me to answer that question. On the other hand, maybe the missing feedback is a sign that it will take some more time. So this will probably not be included in 1.20.

@kikisdeliveryservice kikisdeliveryservice added tracked/no Denotes an enhancement issue is NOT actively being tracked by the Release Team stage/alpha Denotes an issue tracking an enhancement targeted for Alpha status labels Sep 28, 2020
@kikisdeliveryservice
Member

Normally the SIG would give a clear signal that it will be included, for example by reviewing the KEP and agreeing to the milestone proposals in it. I'd encourage you to keep in touch with them and start the 1.21 conversation early if this does not end up getting reviewed and merged by October 6th.

Best,
Kirsten

@adrianreber
Contributor Author

@kikisdeliveryservice Thanks for the guidance. Will do.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 27, 2020
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 26, 2021
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@adrianreber
Contributor Author

/reopen
/remove-lifecycle rotten

@k8s-ci-robot
Contributor

@adrianreber: Reopened this issue.

In response to this:

/reopen
/remove-lifecycle rotten

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot reopened this Feb 25, 2021
@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Feb 25, 2021
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 26, 2021
@adrianreber
Contributor Author

/remove-lifecycle stale

Still working on it.

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 27, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@adrianreber
Contributor Author

One quick question on the restore process: it seems that in 1.25 it is not directly implemented in Kubernetes.

If you look at https://kubernetes.io/blog/2022/12/05/forensic-container-checkpointing-alpha/ you can see how it is possible to restore containers in Kubernetes by adding the checkpoint archive to an OCI image. This way you can tell Kubernetes to create a container from that checkpoint image, and the resulting container will be a restore of the checkpointed container.
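
For anyone who wants to try this, the image-building step from the blog post looks roughly like the sketch below (buildah is assumed; the archive name, container name and registry are placeholders):

# Wrap the kubelet-created checkpoint archive into a scratch OCI image
newcontainer=$(buildah from scratch)
buildah add $newcontainer /var/lib/kubelet/checkpoints/checkpoint-<pod>_<namespace>-<container>-<timestamp>.tar /
# The annotation tells CRI-O which container this checkpoint belongs to
buildah config --annotation=io.kubernetes.cri-o.annotations.checkpoint.name=<container> $newcontainer
buildah commit $newcontainer checkpoint-image:latest
buildah push localhost/checkpoint-image:latest <registry>/<user>/checkpoint-image:latest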

We hope the pod can be created from the restore, which leverages capabilities from the containerd layer.

Not sure what you mean here.

@Jeffwan
Contributor

Jeffwan commented Feb 15, 2023

We hope the pod can be created from the restore, which leverages capabilities from the containerd layer.
Not sure what you mean here.

I can add more details. The approach in https://kubernetes.io/blog/2022/12/05/forensic-container-checkpointing-alpha/ is an implicit way: Kubernetes does not actually know about the magic and relies on the underlying container runtime to detect the image spec.

The other user journey could be an explicit way. The kubelet could detect the snapshot and eventually invoke some restore path through the CRI. That leaves flexibility at the Kubernetes layer to do lots of things, for example scheduling to a node that already has the original image so that only the diff needs to be applied to get started. Have you evaluated an explicit way in your original design?

apiVersion: v1
kind: Pod
metadata:
  namePrefix: example-
...
  annotations:
    app.kubernetes.io/snapshot-image: xxxxxx
...

@adrianreber
Contributor Author

We hope the pod can be created from the restore, which leverages capabilities from the containerd layer.
Not sure what you mean here.

I can add more details. The approach in https://kubernetes.io/blog/2022/12/05/forensic-container-checkpointing-alpha/ is an implicit way: Kubernetes does not actually know about the magic and relies on the underlying container runtime to detect the image spec.

The other user journey could be an explicit way. The kubelet could detect the snapshot and eventually invoke some restore path through the CRI. That leaves flexibility at the Kubernetes layer to do lots of things, for example scheduling to a node that already has the original image so that only the diff needs to be applied to get started.

I see no difference between the two ways you described. The checkpoint OCI image contains only the checkpoint data and nothing else. The base image, the image the container is based on, is not part of it. As implemented in CRI-O, the base image will be pulled from the registry if it is missing, so there is no difference based on what you are describing. The automatic early pulling of the base image would not be possible, that is correct.

The other reason to do it implicitly is that adding additional interfaces to the CRI takes a lot of time, and as it was possible to solve this without an additional CRI call, that seemed the easier solution.

If we were talking about checkpointing and restoring pods, I think it would be necessary to have an explicit interface in the CRI. For containers I do not think it is necessary.

Have you evaluated an explicit way in your original design?

It feels like I have implemented almost everything to test it out initially 😉
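
To make the implicit restore path concrete, creating the restored container looks roughly like this sketch (pod, container, image and node names are placeholders; CRI-O detects the checkpoint annotation on the image and performs a restore instead of a normal container start):

kubectl create -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: restored-counter
spec:
  nodeName: <destination-node>   # optional: pin the restore to a specific node
  containers:
  - name: counter
    image: <registry>/<user>/checkpoint-image:latest
EOF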

@Qiubabimuniuniu

Qiubabimuniuniu commented Apr 14, 2023

Hi! @adrianreber, I am interested in your checkpoint/restore in Kubernetes project, so I recently attempted to use this feature, but encountered some issues. Could you please help me? It's very important to me.

Description: I followed your demo video https://fosdem.org/2023/schedule/event/container_kubernetes_criu/ and official documentation https://kubernetes.io/blog/2022/12/05/forensic-container-checkpointing-alpha/ to perform checkpoint and restore operations on your container "quay.io/adrianreber/counter". Initially, everything appeared to be working fine, but when I tried to restore the counter container on another node in my k8s cluster using a YAML file, the restored container would enter an error state within 1 second of entering the running state. Like this:
[screenshot: restored pod entering the Error state]

What I did: I attempted to debug the kubelet and discovered that after the Pod was restored, an old Pod's cgroup directory (such as "kubepods-besteffort-pod969bc448_d138_4131_ad8d_344d1cb78b40.slice") was created in the "/sys/fs/cgroup/kubepods.slice/kubepods-besteffort.slice" directory. However, the Pod associated with this cgroup was not running on the destination node, so the kubelet deleted the directory, causing the restored container process to exit and resulting in a Pod Error.

My question is: why did this issue occur, and could it be a version compatibility issue?

My versions:
Ubuntu 22.04
kubelet 1.26.0
cri-o 1.26.3
criu 3.17.1 (https://build.opensuse.org/project/show/devel:tools:criu)

@adrianreber
Contributor Author

@Qiubabimuniuniu are you using cgroup v1 or v2 on your systems? There might still be a bug in CRI-O when using checkpoint/restore on cgroup v2 systems.
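
In case it helps, one quick way to check which cgroup version a node is running:

# cgroup2fs means cgroup v2 (unified hierarchy), tmpfs means cgroup v1
stat -fc %T /sys/fs/cgroup/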

@Qiubabimuniuniu

@adrianreber I'm using cgroup v2. Thank you very much!!

I will try switching to use cgroup v1 and see if the problem can be resolved. Recently, I have been attempting to modify the kubelet and containerd source code to support "checkpoint/restore in Kubernetes". However, I encountered the same issue after completing the development. Additionally, while trying your project, I found that when using cri-o, I also encountered the same problem when performing checkpoint and restore operations. This problem has been bothering me for a long time. Thank you very much for your solution.

@adrianreber
Contributor Author

Thank you very much for your solution.

It is not really a solution. I should fix CRI-O to correctly work with checkpoint and restore on cgroup v2 systems.

@Qiubabimuniuniu

Qiubabimuniuniu commented Apr 14, 2023

Thank you very much for your solution.

It is not really a solution. I should fix CRI-O to correctly work with checkpoint and restore on cgroup v2 systems.

Looking forward to your fix

I also encountered the same issue when I was modifying the kubelet and containerd code to customize 'checkpoint/restore in k8s'. Is it possible that the issue is with CRIU instead of the container runtime (CRI-O or containerd)? (I'm not a professional, just a guess.) Anyway, thank you very much.

@Atharva-Shinde Atharva-Shinde removed the tracked/no Denotes an enhancement issue is NOT actively being tracked by the Release Team label May 14, 2023
@gvhfx

gvhfx commented Sep 13, 2023

@adrianreber Hello, I have carefully read your blog and attempted to perform container checkpoint/restore based on your steps. However, I encountered the following issue: when I tried to recover the Pod, it remained in the CreateContainerError state and the kubelet displayed an error: "no command specified." I have browsed through related issues and tried some solutions, but none of them have been successful. Could you please advise on how to resolve this? I sincerely appreciate your assistance!
[screenshots: pod in CreateContainerError state and the kubelet "no command specified" error]

@adrianreber
Contributor Author

@gvhfx are you using cgroupv2?

@gvhfx

gvhfx commented Sep 13, 2023

@adrianreber I have checked the version of cgroup and it is using v1. Additionally, my environment is CentOS7, and the previous kernel version was 3.10 (which does not support cgroupv2). I initially thought it might be due to the kernel, so I upgraded it to 5.4. However, I am still encountering the same error.

@adrianreber
Contributor Author

@gvhfx CentOS 7 is really old. Can you try something newer?

@Tobeabellwether

Tobeabellwether commented Sep 17, 2023

Hi @adrianreber, over the past few months I've been trying to do some higher-level migrations using the checkpointing technique you developed:

I first tried migrating the Pods, which, as you mentioned in your presentation, is basically checkpointing all the containers and then matching some metadata.

Then I tried to migrate a Pod owned by a ReplicaSet. I first deleted the ReplicaSet but kept the Pods, and then migrated the target Pod. After that, I recreated the ReplicaSet with the same configuration. The label of the migrated Pod should match the selector of the ReplicaSet, so that the recreated ReplicaSet will capture it.

I've also done the migration on a Deployment and a StatefulSet using the same logic, and so far everything is working fine when just testing with your counter example.

However, I have no experience as a Kubernetes developer, so I just did this through Python scripts. Therefore, I would like to ask if my approach really makes sense. If so, how difficult would it be to implement this in Kubernetes?
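
For reference, the ReplicaSet flow described above looks roughly like this sketch (resource names, node address and client certificates are placeholders; the checkpoint is triggered via the kubelet API shown in the blog post):

# 1. Delete the ReplicaSet but keep its pods running
kubectl delete replicaset my-rs --cascade=orphan

# 2. Checkpoint the target container via the kubelet API on the source node
curl -X POST "https://<source-node>:10250/checkpoint/default/my-pod/my-container" \
  --insecure --cert admin.crt --key admin.key

# 3. Build and push a checkpoint image from the resulting archive (see the buildah
#    steps earlier in this thread) and create a pod from it on the destination node,
#    keeping the labels that match the ReplicaSet selector.

# 4. Recreate the ReplicaSet with the same configuration; it adopts the matching pod
kubectl apply -f my-rs.yaml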

@adrianreber
Contributor Author

However, I have no experience as a Kubernetes developer, so I just did this through Python scripts. Therefore, I would like to ask if my approach really makes sense. If so, how difficult would it be to implement this in Kubernetes?

Sounds great. I do not think it would be too difficult. I also had an implementation of pod checkpoint/restore three years ago. I did the pod checkpoint creation in CRI-O, not sure how you have done it.

@Tobeabellwether

However, I have no experience as a Kubernetes developer, so I just did this through Python scripts. Therefore, I would like to ask if my approach really makes sense. If so, how difficult would it be to implement this in Kubernetes?

Sounds great. I do not think it would be too difficult. I also had an implementation of pod checkpoint/restore three years ago. I did the pod checkpoint creation in CRI-O, not sure how you have done it.

I didn't do any low-level stuff. I simply used your forensic checkpointing on all containers of a pod (which only supports CRI-O for now, right?) and then updated the container and node parts in the Pod configuration, leaving the rest of the configuration basically unchanged.

adrianreber added a commit to adrianreber/kubernetes that referenced this issue Sep 26, 2023
Kubernetes 1.25 introduced the possibility to checkpoint a container.

For details please see the KEP 2008 Forensic Container Checkpointing
kubernetes/enhancements#2008

The initial implementation only provided a kubelet API endpoint to
trigger a checkpoint. The main reason for not extending it to the API
server and kubectl was that checkpointing is a completely new concept.

Although the result of the checkpointing, the checkpoint archive, is only
accessible by root it is important to remember that it contains all
memory pages and thus all possible passwords, private keys and random
numbers. With the checkpoint archive being only accessible by root it
does not directly make it easier to access this potentially confidential
information as root would be able to retrieve that information anyway.

Now, at least three Kubernetes releases later, we have not heard any
negative feedback about the checkpoint archive and its data. There were,
however, many questions about being able to create a checkpoint via
kubectl and not just via the kubelet API endpoint.

This commit adds 'checkpoint' support to kubectl. The 'checkpoint'
command is heavily influenced by the code of the 'exec' and 'logs'
commands.

The tests are implemented so that they handle a CRI implementation with
and without an implementation of the CRI RPC call 'ContainerCheckpoint'.

Signed-off-by: Adrian Reber <areber@redhat.com>
@adrianreber
Contributor Author

Pull request to automatically delete older checkpoint archives: kubernetes/kubernetes#115888

adrianreber added a commit to adrianreber/kubernetes that referenced this issue Sep 27, 2023
Kubernetes 1.25 introduced the possibility to checkpoint a container.

For details please see the KEP 2008 Forensic Container Checkpointing
kubernetes/enhancements#2008

The initial implementation only provided a kubelet API endpoint to
trigger a checkpoint. The main reason for not extending it to the API
server and kubectl was that checkpointing is a completely new concept.

Although the result of the checkpointing, the checkpoint archive, is only
accessible by root it is important to remember that it contains all
memory pages and thus all possible passwords, private keys and random
numbers. With the checkpoint archive being only accessible by root it
does not directly make it easier to access this potentially confidential
information as root would be able to retrieve that information anyway.

Now, at least three Kubernetes releases later, we have not heard any
negative feedback about the checkpoint archive and its data. There were,
however, many questions about being able to create a checkpoint via
kubectl and not just via the kubelet API endpoint.

This commit adds 'checkpoint' support to kubectl. The 'checkpoint'
command is heavily influenced by the code of the 'exec' and 'logs'
commands. The checkpoint command is only available behind the 'alpha'
sub-command, as the "Forensic Container Checkpointing" KEP is still
marked as Alpha.

Example output:

 $ kubectl alpha checkpoint test-pod -c container-2
 Node:                  127.0.0.1/127.0.0.1
 Namespace:             default
 Pod:                   test-pod-1
 Container:             container-2
 Checkpoint Archive:    /var/lib/kubelet/checkpoints/checkpoint-archive.tar

The tests are implemented so that they handle a CRI implementation with
and without an implementation of the CRI RPC call 'ContainerCheckpoint'.

Signed-off-by: Adrian Reber <areber@redhat.com>
@adrianreber
Contributor Author

First attempt to provide checkpoint via kubectl: kubernetes/kubernetes#120898

@Tobeabellwether

Hi @adrianreber, I created a simple microservice pod and tried to migrate it. I found that when I had just started it and used a counter-like function, I was able to checkpoint it, but when I used it to connect to the message broker and send messages, checkpointing raised the following error. Is there any way to solve it?

checkpointed: checkpointing of default/order-service-7c69b4d88b-n56xq/order-service failed
(rpc error: code = Unknown desc = failed to checkpoint container b0ae35f30db3124341035041d02ce85bd83be2b38180081ab919a9f89e16c3af:

running "/usr/local/bin/runc"
["checkpoint"
"--image-path" "/var/lib/containers/storage/overlay-containers/b0ae35f30db3124341035041d02ce85bd83be2b38180081ab919a9f89e16c3af/userdata/checkpoint"
"--work-path" "/var/lib/containers/storage/overlay-containers/b0ae35f30db3124341035041d02ce85bd83be2b38180081ab919a9f89e16c3af/userdata"
"--leave-running" "b0ae35f30db3124341035041d02ce85bd83be2b38180081ab919a9f89e16c3af"]

failed: /usr/local/bin/runc --root /run/runc --systemd-cgroup checkpoint --image-path /var/lib/containers/storage/overlay-containers/b0ae35f30db3124341035041d02ce85bd83be2b38180081ab919a9f89e16c3af/userdata/checkpoint --work-path /var/lib/containers/storage/overlay-containers/b0ae35f30db3124341035041d02ce85bd83be2b38180081ab919a9f89e16c3af/userdata --leave-running b0ae35f30db3124341035041d02ce85bd83be2b38180081ab919a9f89e16c3af

failed: time="2023-10-24T16:40:11Z"
level=error
msg="criu failed: type NOTIFY errno 0\nlog file: /var/lib/containers/storage/overlay-containers/b0ae35f30db3124341035041d02ce85bd83be2b38180081ab919a9f89e16c3af/userdata/dump.log"

@adrianreber
Contributor Author

@Tobeabellwether Please open a bug at CRI-O with the dump.log attached.

@Tobeabellwether

@Tobeabellwether Please open a bug at CRI-O with the dump.log attached.

@adrianreber Thanks for the tip, I checked that dump.log file and found:

(00.134562) Error (criu/sk-inet.c:191): inet: Connected TCP socket, consider using --tcp-established option.
(00.134634) ----------------------------------------
(00.134654) Error (criu/cr-dump.c:1669): Dump files (pid: 1533879) failed with -1

So, I tried to forcefully interrupt the TCP connection between the pod and the message broker, and the checkpoint was successfully created. Should this still be considered a bug?

@adrianreber
Contributor Author

Ah, good to know. No, this is not a real bug. We probably at some point need the ability to pass down different parameters from Kubernetes to CRIU. But this is something for the far future.

You can also control CRIU options with a CRIU configuration file. Handling TCP connections that are established could be configured there.
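
A minimal sketch of what that could look like, assuming runc as the OCI runtime and that it picks up /etc/criu/runc.conf (option names follow the CRIU configuration file syntax; check the CRIU documentation for your setup):

# Allow CRIU to checkpoint containers with established TCP connections
cat >> /etc/criu/runc.conf <<EOF
tcp-established
EOF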

@Tobeabellwether

Ah, good to know. No, this is not a real bug. We probably at some point need the ability to pass down different parameters from Kubernetes to CRIU. But this is something for the far future.

You can also control CRIU options with a CRIU configuration file. Handling TCP connections that are established could be configured there.

Hi @adrianreber again. When I try to restore the checkpoint of the pod with a TCP connection into a new pod, I encounter a new problem:

[screenshot of the restore error]

So I tried to check the log file under /var/run/containers/storage/overlay-containers/order-service/userdata/restore.log, but I only found these folders, none with the name of the container to restore:

[screenshot of the directory listing]

and the restoration for pods without TCP connections works fine.

@adrianreber
Contributor Author

@Tobeabellwether You can redirect the CRIU log file to another file using the CRIU configuration file: log-file /tmp/restore.log and have a look at that file.
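
For example, appended to the same CRIU configuration file as above (assuming /etc/criu/runc.conf is the one your runtime picks up):

# Write the CRIU restore log to a predictable location
cat >> /etc/criu/runc.conf <<EOF
log-file /tmp/restore.log
EOF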
