Author a node e2e test that verifies live-restore functionality in docker #38303
I did some verification on docker 1.12 rc-x a while back, and ran into some issues.
@dchen1107 We recently got live-restore working even when nested within another container, so let us know if you run into any issues. We had an issue where the shim process was in the same cgroup as the docker daemon, so a reload was taking it down. The workaround was to put it in another cgroup. For k8s it would make sense to keep the shim processes in the pod cgroup.
@dchen1107 -- would love to know more about what you found if you are able to share/recall. We plan to enable the feature and do some of our own testing soon, and will report back what we find.
@mrunalp If not the dind case, then we don't need to put the shim in the pod cgroup, right? I'm not sure the shim belongs in the pod cgroup... And which QoS tier do you think the shim belongs to? Always the same as the pod?
@hodovska on my team is going to get some testing in place with this feature enabled to see where things break down. I think an option to node e2e that runs an optional test for the scenario where docker has this configured will help the broader community.
/sig node
FWIW live restore makes a big positive impact on reliability for us. It'd be nice to prioritise making it official for k8s. In the past we had to restrict node sizes and such due to the instability of docker. We've been on live restore for the past 6mo or so, and I can't imagine living without it. We haven't noticed anything majorly bad (apart from kubelet complaining a lot after docker restarts).
/close
Newer versions of docker have the ability to keep containers alive during daemon downtime.
See: https://docs.docker.com/engine/admin/live-restore/
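For reference, live restore is enabled through the daemon configuration — a minimal sketch, assuming the default config path `/etc/docker/daemon.json`:

```json
{
  "live-restore": true
}
```

The daemon picks this up on a configuration reload (e.g. `sudo systemctl reload docker`, which sends SIGHUP) without restarting running containers.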
For kubelets that integrate with the docker runtime, we should author a node e2e test that verifies that enabling this feature works as expected in kubelet, and that pods remain alive and well after a daemon restart.
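As a rough illustration of what such a test would automate, here is a hedged sketch. The container name and exact commands are assumptions, not from this issue; the core assertion is factored into a function so the logic can be exercised without a real daemon:

```shell
# Hypothetical manual version of the check (commands assumed, not verified here):
#
#   docker run -d --name live-restore-test busybox sleep 3600
#   before=$(docker inspect -f '{{.State.Running}}' live-restore-test)
#   sudo systemctl restart docker
#   after=$(docker inspect -f '{{.State.Running}}' live-restore-test)
#
# Core assertion: the container must report Running both before and after
# the daemon restart for live restore to be considered working.
assert_survived_restart() {
  local before="$1" after="$2"
  if [ "$before" = "true" ] && [ "$after" = "true" ]; then
    echo "live restore OK"
  else
    echo "container did not survive daemon restart" >&2
    return 1
  fi
}
```

A real node e2e test would additionally verify that the kubelet reconciles cleanly with the restored containers rather than restarting the pods.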
/cc @kubernetes/sig-node @sjenning @mrunalp