
Author a node e2e test that verifies live-restore functionality in docker #38303

Closed
derekwaynecarr opened this issue Dec 7, 2016 · 9 comments

@derekwaynecarr
Member

Newer versions of docker have the ability to keep containers alive during daemon downtime.

See: https://docs.docker.com/engine/admin/live-restore/

For kubelets that integrate with the docker runtime, we should author a node e2e test that verifies that enabling this feature works as expected with the kubelet and that pods remain alive and well after a daemon restart.
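
For reference, enabling it is just a daemon configuration change; a minimal sketch, assuming the default config file at `/etc/docker/daemon.json`:

```json
{
  "live-restore": true
}
```

The daemon then has to pick up the new configuration (a restart, or a SIGHUP-triggered reload on Linux); after that, running containers should survive subsequent daemon restarts.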

/cc @kubernetes/sig-node @sjenning @mrunalp

@derekwaynecarr derekwaynecarr self-assigned this Dec 7, 2016
@dchen1107
Member

I did some verification on docker 1.12 rc-x a while back and ran into some issues.

@mrunalp
Contributor

mrunalp commented Dec 13, 2016

@dchen1107 We recently got live-restore working even when nested within another container, so let us know if you run into any issues. We had an issue where the shim process was in the same cgroup as the docker daemon, so a daemon reload was taking it down. The workaround was to put the shim in another cgroup. For k8s it would make sense to keep the shim processes in the pod cgroup.

@derekwaynecarr
Member Author

@dchen1107 -- would love to know more about what you found, if you are able to share/recall. We plan to enable the feature and do some of our own testing soon, and will report back what we find.

@resouer
Contributor

resouer commented Dec 19, 2016

@mrunalp If it's not the DinD case, then we don't need to put the shim in the pod cgroup, right? I'm not sure whether the shim belongs in the pod cgroup ...

And what QoS tier do you think the shim belongs to? Always the same as the pod?

@derekwaynecarr
Member Author

@hodovska on my team is going to get some testing in place with this feature enabled to see where things break down. I think adding an optional node e2e test for this scenario, where docker is configured with live-restore, will help the broader community.
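
To make the scenario concrete, here is a rough standalone sketch of the kind of check such a test could perform (this is not the actual node e2e framework code; it assumes a systemd-managed dockerd already running with `"live-restore": true`, and the container name is purely illustrative):

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// run executes a command and returns its trimmed combined output.
func run(name string, args ...string) (string, error) {
	out, err := exec.Command(name, args...).CombinedOutput()
	return strings.TrimSpace(string(out)), err
}

func main() {
	const ctr = "live-restore-probe" // hypothetical container name

	// Start a long-running container.
	if out, err := run("docker", "run", "-d", "--name", ctr, "busybox", "sleep", "3600"); err != nil {
		panic(fmt.Sprintf("docker run failed: %v: %s", err, out))
	}
	defer run("docker", "rm", "-f", ctr)

	// Restart the docker daemon; with live-restore enabled the container should survive.
	if out, err := run("systemctl", "restart", "docker"); err != nil {
		panic(fmt.Sprintf("restarting docker failed: %v: %s", err, out))
	}

	// Verify the container is still running after the daemon restart.
	running, err := run("docker", "inspect", "-f", "{{.State.Running}}", ctr)
	if err != nil || running != "true" {
		panic(fmt.Sprintf("container did not survive the daemon restart: %q, %v", running, err))
	}
	fmt.Println("container survived the docker daemon restart")
}
```

In the real node e2e suite this would presumably be gated behind an opt-in flag or tag, since it restarts the node's container runtime.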

@k8s-github-robot added the `needs-sig` label May 31, 2017
@0xmichalis
Contributor

/sig node

@k8s-ci-robot added the `sig/node` label Jun 25, 2017
@k8s-github-robot removed the `needs-sig` label Jun 25, 2017
@jsravn
Contributor

jsravn commented Aug 29, 2017

FWIW, live restore makes a big positive impact on reliability for us. It'd be nice to prioritise making it official for k8s. In the past we had to restrict node sizes and other things due to the instability of docker. We've been on live restore for the past six months or so, and I can't imagine living without it. We haven't noticed anything majorly bad (apart from the kubelet complaining a lot after docker restarts).

@yguo0905
Contributor

This is done in #50277 as part of #42926. Please close this issue.

@yujuhong
Contributor

/close
