Forensic Container Checkpointing #2008
Comments
/sig node
We recommend actively socializing your KEP with the appropriate SIG to gain visibility and consensus, and also for scheduling. Also, as you are not sure which SIG will sponsor this, reaching out to the SIGs to get clarity on that will be helpful to move your KEP forward.
Hi @adrianreber, any updates on whether this will be included in 1.20? Enhancements Freeze is October 6th, and by that time we require that the KEP be merged in an implementable state. Best,
Hello @kikisdeliveryservice
Sorry, but how would I decide this? There has not been a lot of feedback on the corresponding KEP, which makes it really difficult for me to answer that question. On the other hand, maybe the missing feedback is a sign that it will take some more time. So this will probably not be included in 1.20.
Normally the SIG would give a clear signal that it would be included, for example by reviewing the KEP, agreeing to the milestone proposals in the KEP, etc. I'd encourage you to keep in touch with them and start the 1.21 conversation early if this does not end up getting reviewed and merged by October 6th. Best,
@kikisdeliveryservice Thanks for the guidance. Will do.
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Rotten issues close after 30d of inactivity. Send feedback to sig-contributor-experience at kubernetes/community.
@fejta-bot: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/reopen
@adrianreber: Reopened this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-contributor-experience at kubernetes/community.
/remove-lifecycle stale
Still working on it.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
If you look at https://kubernetes.io/blog/2022/12/05/forensic-container-checkpointing-alpha/ you can see how it is possible to restore containers in Kubernetes by adding the checkpoint archive to an OCI image. This way you can tell Kubernetes to create a container from that checkpoint image and the resulting container will be a restore.
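For reference, this is roughly the flow described in the blog post, sketched as shell commands; everything in angle brackets is a placeholder and the exact archive name depends on your checkpoint:

    # Wrap the checkpoint archive created by the kubelet into an OCI image.
    # CRI-O recognizes the image as a checkpoint through this annotation.
    newcontainer=$(buildah from scratch)
    buildah add $newcontainer /var/lib/kubelet/checkpoints/checkpoint-<pod>_<namespace>-<container>-<timestamp>.tar /
    buildah config --annotation=io.kubernetes.cri-o.annotations.checkpoint.name=<container> $newcontainer
    buildah commit $newcontainer checkpoint-image:latest
    buildah rm $newcontainer
    buildah push localhost/checkpoint-image:latest <registry>/checkpoint-image:latest

A Pod whose container image points at <registry>/checkpoint-image:latest then causes CRI-O to restore from the archive instead of starting a fresh container; the original base image is pulled separately if it is missing on the node.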
Not sure what you mean here.
I can add more details. The approach in https://kubernetes.io/blog/2022/12/05/forensic-container-checkpointing-alpha/ is an implicit one: Kubernetes itself does not know about the magic and relies on the underlying container runtime to detect the checkpoint image spec. The other user journey could be an explicit one: the kubelet could detect the snapshot and invoke a dedicated restore path through the CRI. That leaves flexibility at the Kubernetes layer to do a lot of things, for example scheduling to a node that already has the original image so that only the diff needs to be applied to get started. Have you evaluated the explicit way in your original design?
I see no difference between the two ways you described. The checkpoint OCI image contains only the checkpoint data and nothing else. The base image, the image the container is based on, is not part of it. As implemented in CRI-O, the base image will be pulled from the registry if missing. So, based on what you are describing, I see no difference. The automatic early pulling of the base image would not be possible, that is correct. The other reason to do it implicitly is that adding additional interfaces to the CRI takes a lot of time, and as it was possible to solve it without an additional CRI call, that seemed the easier solution. If we were talking about checkpointing and restoring pods, I think it would be necessary to have an explicit interface in the CRI. For containers I do not think it is necessary.
It feels like I have implemented almost everything to test it out initially 😉
Hi @adrianreber! I am interested in your checkpoint/restore in Kubernetes project, so I recently attempted to use this feature, but encountered some issues. Could you please help me? It's very important to me.

Description: I followed your demo video https://fosdem.org/2023/schedule/event/container_kubernetes_criu/ and the official documentation https://kubernetes.io/blog/2022/12/05/forensic-container-checkpointing-alpha/ to perform checkpoint and restore operations on your container "quay.io/adrianreber/counter". Initially, everything appeared to be working fine, but when I tried to restore the counter container on another node in my k8s cluster using a YAML file, the restored container would enter an error state within one second of entering the running state.

What I did: I attempted to debug the kubelet and discovered that after the Pod was restored, an old Pod's cgroup directory (such as "kubepods-besteffort-pod969bc448_d138_4131_ad8d_344d1cb78b40.slice") was created in the "/sys/fs/cgroup/kubepods.slice/kubepods-besteffort.slice" directory. However, the Pod associated with this cgroup was not running on the destination node, so the kubelet would delete the directory, causing the restored container process to exit and resulting in a Pod Error.

My question is: why did this issue occur, and could it be a version compatibility issue? My versions:
@Qiubabimuniuniu are you using cgroup v1 or v2 on your systems? There might still be a bug in CRI-O when using checkpoint/restore on cgroup v2 systems.
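As a quick aside, one way to check which cgroup version a node is running, assuming the standard /sys/fs/cgroup mount point:

    # Prints "cgroup2fs" on a cgroup v2 (unified) host and "tmpfs" on a cgroup v1 host
    stat -fc %T /sys/fs/cgroup/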
@adrianreber I'm using cgroup v2. Thank you very much!! I will try switching to cgroup v1 and see if the problem can be resolved. Recently, I have been attempting to modify the kubelet and containerd source code to support "checkpoint/restore in Kubernetes", but I encountered the same issue after completing the development. Additionally, while trying your project, I found that when using CRI-O I also encountered the same problem when performing checkpoint and restore operations. This problem has been bothering me for a long time. Thank you very much for your solution.
It is not really a solution. I should fix CRI-O to correctly work with checkpoint and restore on cgroup v2 systems.
Looking forward to your fix. I also encountered the same issue when I was modifying the kubelet and containerd code to customize checkpoint/restore in k8s. Is it possible that the issue is with CRIU instead of the container runtime (CRI-O or containerd)? (I'm not a professional, just a guess.) Anyway, thank you very much.
@adrianreber Hello, I have carefully read your blog and attempted to perform container checkpoint/restore based on your steps. However, I encountered the following issue: when I tried to recover the Pod, it remained in the CreateContainerError state and the kubelet displayed an error: "no command specified." I have browsed through related issues and tried some solutions, but none of them have been successful. Could you please advise on how to resolve this? I sincerely appreciate your assistance!
@gvhfx are you using cgroupv2?
@adrianreber I checked the cgroup version and it is v1. Additionally, my environment is CentOS 7, and the previous kernel version was 3.10 (which does not support cgroup v2). I initially thought it might be due to the kernel, so I upgraded it to 5.4. However, I am still encountering the same error.
@gvhfx CentOS 7 is really old. Can you try something newer?
Hi @adrianreber, over the past few months I've been trying to do some higher-level migrations using the checkpointing technique you developed. I first tried migrating Pods, which, as you mentioned in your presentation, is basically checkpointing all the containers and then matching some metadata. Then I tried to migrate a Pod owned by a ReplicaSet: I first deleted the ReplicaSet but kept the Pods, then migrated the target Pod, and after that recreated the ReplicaSet with the same configuration. The labels of the migrated Pod match the selector of the ReplicaSet, so the recreated ReplicaSet captures it. I've also done the migration on a Deployment and a StatefulSet using the same logic, and so far everything works fine when testing with your counter example. However, I have no experience as a Kubernetes developer, so I just did this through Python scripts. Therefore, I would like to ask whether my approach really makes sense and, if so, how difficult it would be to implement this in Kubernetes?
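To sketch the ReplicaSet part of that workflow with stock kubectl (the resource names and manifest file are made up; the checkpoint and restore of the Pod itself happens in between, as described above):

    # Delete the ReplicaSet but keep (orphan) its Pods
    kubectl delete replicaset my-rs --cascade=orphan

    # ... checkpoint the target Pod's containers on the source node, then recreate the
    # Pod (with the same labels) from the checkpoint images on the destination node ...

    # Recreate the ReplicaSet with the original spec; its selector re-adopts the migrated Pod
    kubectl apply -f my-rs.yaml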
Sounds great. I do not think it would be too difficult. I also had an implementation of pod checkpoint/restore three years ago. I did the pod checkpoint creation in CRI-O, not sure how you have done it.
I didn't do any low-level work; I simply used your forensic checkpointing multiple times on all containers of a pod (which only supports CRI-O for now, right?) and then updated the container and node parts of the Pod configuration, leaving the rest basically unchanged.
Kubernetes 1.25 introduced the possibility to checkpoint a container. For details please see the KEP 2008 Forensic Container Checkpointing kubernetes/enhancements#2008

The initial implementation only provided a kubelet API endpoint to trigger a checkpoint. The main reason for not extending it to the API server and kubectl was that checkpointing is a completely new concept. Although the result of the checkpointing, the checkpoint archive, is only accessible by root, it is important to remember that it contains all memory pages and thus all possible passwords, private keys and random numbers. With the checkpoint archive being only accessible by root, it does not directly make it easier to access this potentially confidential information, as root would be able to retrieve that information anyway.

Now, at least three Kubernetes releases later, we have not heard any negative feedback about the checkpoint archive and its data. There were, however, many questions about being able to create a checkpoint via kubectl and not just via the kubelet API endpoint.

This commit adds 'checkpoint' support to kubectl. The 'checkpoint' command is heavily influenced by the code of the 'exec' and 'logs' commands.

The tests are implemented so that they handle a CRI implementation with and without an implementation of the CRI RPC call 'ContainerCheckpoint'.

Signed-off-by: Adrian Reber <areber@redhat.com>
Pull request to automatically delete older checkpoint archives: kubernetes/kubernetes#115888
Kubernetes 1.25 introduced the possibility to checkpoint a container. For details please see the KEP 2008 Forensic Container Checkpointing kubernetes/enhancements#2008

The initial implementation only provided a kubelet API endpoint to trigger a checkpoint. The main reason for not extending it to the API server and kubectl was that checkpointing is a completely new concept. Although the result of the checkpointing, the checkpoint archive, is only accessible by root, it is important to remember that it contains all memory pages and thus all possible passwords, private keys and random numbers. With the checkpoint archive being only accessible by root, it does not directly make it easier to access this potentially confidential information, as root would be able to retrieve that information anyway.

Now, at least three Kubernetes releases later, we have not heard any negative feedback about the checkpoint archive and its data. There were, however, many questions about being able to create a checkpoint via kubectl and not just via the kubelet API endpoint.

This commit adds 'checkpoint' support to kubectl. The 'checkpoint' command is heavily influenced by the code of the 'exec' and 'logs' commands. The checkpoint command is only available behind the 'alpha' sub-command as the "Forensic Container Checkpointing" KEP is still marked as Alpha.

Example output:

    $ kubectl alpha checkpoint test-pod -c container-2
    Node: 127.0.0.1/127.0.0.1
    Namespace: default
    Pod: test-pod-1
    Container: container-2
    Checkpoint Archive: /var/lib/kubelet/checkpoints/checkpoint-archive.tar

The tests are implemented so that they handle a CRI implementation with and without an implementation of the CRI RPC call 'ContainerCheckpoint'.

Signed-off-by: Adrian Reber <areber@redhat.com>
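For comparison, before this kubectl support the checkpoint had to be triggered directly against the kubelet API endpoint (/checkpoint/{namespace}/{pod}/{container} on the kubelet port), roughly like this; the certificate paths are placeholders for suitable kubelet client credentials:

    # Trigger a checkpoint of container-2 in pod test-pod via the kubelet API on the local node
    curl -X POST "https://localhost:10250/checkpoint/default/test-pod/container-2" \
        --insecure --cert /path/to/admin.crt --key /path/to/admin.key
    # The resulting archive is written below /var/lib/kubelet/checkpoints/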
First attempt to provide
Hi @adrianreber, I created a simple microservice pod and tried to migrate it. I found that when I had just started it and used a counter-like function, I was able to checkpoint it, but when I used it to connect to the message broker and send messages, checkpointing it raised the following error. Is there any way to solve it?

checkpointed: checkpointing of default/order-service-7c69b4d88b-n56xq/order-service failed running "/usr/local/bin/runc" failed: failed: time="2023-10-24T16:40:11Z"
@Tobeabellwether Please open a bug at CRI-O with the
@adrianreber Thanks for the tip. I checked the log and found: (00.134562) Error (criu/sk-inet.c:191): inet: Connected TCP socket, consider using --tcp-established option. So I tried to forcefully interrupt the TCP connection between the pod and the message broker, and the checkpoint was successfully created. Should this still be considered a bug?
Ah, good to know. No, this is not a real bug. We will probably at some point need the ability to pass different parameters down from Kubernetes to CRIU, but that is something for the far future. You can also control CRIU options with a CRIU configuration file; handling established TCP connections could be configured there.
Hi @adrianreber again. When I try to restore the checkpoint of the pod with the TCP connection on a new pod, I encounter a new problem: [screenshot omitted]. So I tried to check the log file under [screenshot omitted], and the restoration for pods without TCP connections works fine.
@Tobeabellwether You can redirect the CRIU log file to another file using the CRIU configuration file:
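A minimal sketch of what such a configuration file could look like, assuming CRIU picks it up from /etc/criu/default.conf (the exact location can differ depending on how the container runtime invokes CRIU); the option names are the usual CRIU command-line options without the leading dashes:

    # /etc/criu/default.conf
    # write the CRIU log to a fixed, easy-to-find location
    log-file /var/log/criu.log
    # allow checkpointing containers with established TCP connections
    tcp-established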
Enhancement Description
- (k/enhancements) update PR(s):
- (k/k) update PR(s):
- (k/website) update(s):
- (k/enhancements) update PR(s):
- (k/k) update PR(s):
- (k/website) update(s):