Kubelet restart cause running pod restart with UnexpectedAdmissionError when pods have initContainers and external devices like GPU #124345
Comments
/sig node
/cc
my 2c: completed containers, i.e. containers that terminated successfully and were not deleted, such as init containers or containers belonging to Jobs, should NOT be restarted, or even retry admission, when kubelet is restarted. The reason is that these containers have already completed successfully.
quoting https://kubernetes.io/docs/concepts/workloads/pods/init-containers/#detailed-behavior:
and
which ties to #123980
xref: #117955
Yes! That's why I created this issue. It focuses on init containers: completed init containers cause running pods to be restarted. This fix doesn't help because the pod is still running: a2ca66d. I was wondering whether we can add a check here to skip allocateDevice if the container is expected to be stopped. It looks like my pods were killed here:
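To make the proposed check concrete, here is a minimal sketch in Go. It is not the kubelet's actual code: the `Container` type, the `ContainerState` values, and `needsDeviceAllocation` are all hypothetical names invented for illustration, and the real device manager tracks state differently. The only idea it shows is that an init container which already exited successfully in a running pod could be skipped during re-admission.

```go
package main

import "fmt"

// ContainerState is a hypothetical, simplified view of a container's last known state.
type ContainerState int

const (
	StateUnknown  ContainerState = iota
	StateRunning                 // container is currently running
	StateExitedOK                // container terminated with exit code 0
)

// Container is a hypothetical stand-in, not a real kubelet type.
type Container struct {
	Name    string
	Init    bool           // true for init containers
	State   ContainerState // last known state before the kubelet restart
	Devices int            // number of external devices (e.g. GPUs) requested
}

// needsDeviceAllocation reports whether re-admission after a kubelet restart
// should go through device allocation for c. podRunning means the pod's
// regular containers were already running before the restart.
func needsDeviceAllocation(c Container, podRunning bool) bool {
	if c.Devices == 0 {
		// Nothing to allocate for containers that request no devices.
		return false
	}
	if c.Init && podRunning && c.State == StateExitedOK {
		// A completed init container in a running pod will not run again,
		// so there is no reason to allocate devices for it.
		return false
	}
	// Every other container follows the normal allocation path.
	return true
}

func main() {
	pod := []Container{
		{Name: "init-gpu", Init: true, State: StateExitedOK, Devices: 1},
		{Name: "app-gpu", State: StateRunning, Devices: 1},
	}
	for _, c := range pod {
		fmt.Printf("%s: allocate=%v\n", c.Name, needsDeviceAllocation(c, true))
	}
}
```

In this sketch the completed init container is skipped while the regular container still takes the normal path, where an existing allocation would presumably be reused. Whether the real check belongs in the device manager or earlier in admission is an open question for the maintainers.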
/cc
@zwpaper can you try this on a newer k8s version? 1.25 is out of support. Can you also share a pod spec to reproduce this error? Why is the pod failing with an admission error if the device is still available?
/cc
/triage needs-information
What happened?
When kubelet is restarted, running pods are restarted with UnexpectedAdmissionError if both their initContainers and their containers use external devices such as GPUs.
What did you expect to happen?
Restarting kubelet should not cause running pods to restart.
How can we reproduce it (as minimally and precisely as possible)?
Anything else we need to know?
There was an earlier issue and a fix for running pods with devices, but it looks like initContainers are not counted among the containers that should be skipped.
The fix has been cherry-picked to v1.25.16.
Related Issues:
Kubernetes version
Cloud provider
OS version
Install tools
Container runtime (CRI) and version (if applicable)
Related plugins (CNI, CSI, ...) and versions (if applicable)