
Node reboot leaving existing pod using resources stuck with error UnexpectedAdmissionError #125579

Open
liuyuan10 opened this issue Jun 19, 2024 · 6 comments
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug. sig/node Categorizes an issue or PR as relevant to SIG Node. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@liuyuan10

What happened?

When a node is rebooted, pods using resources allocated by a device plugin encounter an UnexpectedAdmissionError like the one below:

  Warning  UnexpectedAdmissionError  84s                kubelet            Allocate failed due to no healthy devices present; cannot allocate unhealthy devices xxx, which is unexpected

What makes it really bad is that if it's a bare pod (not managed by a controller), it gets stuck in that state and never recovers.

What did you expect to happen?

The pod should be retried until the device plugin is ready.

How can we reproduce it (as minimally and precisely as possible)?

  1. Create a pod requesting any resource allocated via a device plugin.
  2. Reboot the node.
  3. Observe the pod failure.
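For reference, a minimal repro manifest might look like the sketch below. The resource name `example.com/device` is a placeholder; substitute whatever extended resource your device plugin actually advertises (e.g. `nvidia.com/gpu`):

```yaml
# Sketch of step 1: a bare pod (no controller) requesting one
# device-plugin-managed resource. "example.com/device" is a
# placeholder resource name, not a real plugin.
apiVersion: v1
kind: Pod
metadata:
  name: device-repro
spec:
  restartPolicy: Never
  containers:
  - name: app
    image: busybox
    command: ["sleep", "infinity"]
    resources:
      limits:
        example.com/device: "1"
```

Because the pod is bare and `restartPolicy: Never`, nothing recreates it after the admission failure, which is the stuck state described above.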

Anything else we need to know?

The behavior was introduced by #116376.

There are various open issues around kubelet restarts: #118559, #124345.

But this issue is about a node restart. When the node is restarted, the kubelet restarts the existing pods in a random order, so a pod can hit this error before the device plugin pod becomes healthy on the node.

Kubernetes version

kubectl version
Client Version: v1.30.1
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.30.1

Cloud provider

GKE

OS version

# On Linux:
$ cat /etc/os-release
# paste output here
$ uname -a
# paste output here

# On Windows:
C:\> wmic os get Caption, Version, BuildNumber, OSArchitecture
# paste output here

Install tools

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

@liuyuan10 liuyuan10 added the kind/bug Categorizes issue or PR as related to a bug. label Jun 19, 2024
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jun 19, 2024
@liuyuan10
Author

While this issue is easily reproduced with a node reboot, I believe it can also happen if the device plugin suddenly becomes unhealthy after the pod is scheduled and before the kubelet allocates the resources.

The expected behavior is that the failure is retried, just like CNI failures.
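For anyone trying to confirm they are hitting this, the failed admissions show up as events; a sketch of how to list them (flags as shown are standard kubectl, but adjust the namespace scope as needed):

```shell
# List UnexpectedAdmissionError events across all namespaces;
# these are recorded by the kubelet when device allocation fails.
kubectl get events --all-namespaces \
  --field-selector reason=UnexpectedAdmissionError
```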

@tamilselvan1102

/sig node

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jun 19, 2024
@SamDecrock

SamDecrock commented Jun 19, 2024

I can confirm this issue. Luckily for us the pods are managed by Deployments, so they are restarted, but we're still stuck with 'ghost' pods in an UnexpectedAdmissionError state.
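As a manual workaround for the leftover 'ghost' pods, failed pods can be deleted by phase; a sketch, where `default` is a placeholder namespace (controller-managed pods are recreated by their Deployments, so deleting the failed copies is safe):

```shell
# One-off cleanup of pods left in a Failed phase after the reboot.
kubectl delete pods -n default --field-selector status.phase=Failed
```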

@ffromani
Contributor

ffromani commented Jun 20, 2024

What happened?

When a node is rebooted, pods using resources allocated by a device plugin encounter an UnexpectedAdmissionError like the one below:

  Warning  UnexpectedAdmissionError  84s                kubelet            Allocate failed due to no healthy devices present; cannot allocate unhealthy devices xxx, which is unexpected

Hi! This behavior is meant to make evident and recoverable a previously hidden breakage where the pod actually started, but no one had allocated the devices it requested: #109595

I don't know the mechanics of that specific GPU device plugin, but it seems likely to me that this problem can also happen for GPU devices. IOW, from what I gathered it was not device-specific, but rather a flaw in how devices are handled in the kubelet, hence the flag.

What makes it really bad is that if it's a bare pod, it gets stuck in that state and never recovers.

This behavior can indeed be surprising, but when a pod reaches a terminal state the system doesn't try to recover it; this is why the platform introduces and recommends higher-level controllers like Deployments.
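To illustrate that recommendation, the bare pod from the repro could be wrapped in a Deployment like the sketch below, so a pod that fails admission after a reboot is replaced automatically (`example.com/device` is again a placeholder resource name):

```yaml
# Sketch: the same workload under a Deployment. The ReplicaSet
# recreates any pod that ends up in a terminal failed state.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: device-workload
spec:
  replicas: 1
  selector:
    matchLabels:
      app: device-workload
  template:
    metadata:
      labels:
        app: device-workload
    spec:
      containers:
      - name: app
        image: busybox
        command: ["sleep", "infinity"]
        resources:
          limits:
            example.com/device: "1"
```

Note this only mitigates the symptom: as SamDecrock reports above, the failed 'ghost' pods still accumulate until deleted.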

What did you expect to happen?

The pod should be retried until the device plugin is ready.

Yes, this is something we've been discussing for a while, but unfortunately there's nothing concrete yet :\

@ffromani
Contributor

ffromani commented Jun 20, 2024

/triage accepted

kubelet retries have been discussed for a while, and there's general agreement that this is desirable behavior

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jun 20, 2024
@kannon92 kannon92 moved this to Triaged in SIG Node Bugs Jul 22, 2024
@neolit123
Member

The behavior was introduced by #116376.

/assign @swatisehgal
for feedback as the PR author.

Projects
Status: Triaged
Development

No branches or pull requests

7 participants