
Some pods are falsely evicted from the stopped node #9703

Closed
alita1991 opened this issue Mar 8, 2024 · 1 comment
@alita1991

Environmental Info:
K3s Version: v1.27.5+k3s1

Node(s) CPU architecture, OS, and Version: Linux ip-10-190-34-107 6.5.0-1014-aws #14~22.04.1-Ubuntu SMP Thu Feb 15 15:27:06 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Cluster Configuration: 3 servers

Describe the bug:

When a node is stopped using the Hypervisor interface, the otel-collector pod persists in the running state indefinitely. To initiate rescheduling, I must delete the pod. This action transitions the pod to the Terminating state, allowing it to be rescheduled eventually.

Steps To Reproduce:

  • Installed K3s
  • Installed several services that create Kubernetes resources as Deployments
  • Powered off the node from the hypervisor
  • Waited for the Deployment-managed pods to be evicted from the stopped node
  • Verified the status of the Deployment-managed pods to confirm they were all terminated on the stopped node
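
The verification steps above can be sketched with a few kubectl commands (a minimal sketch; the namespace and pod name are taken from the logs below, and the output comments describe what this issue reports, not guaranteed results):

```shell
# Confirm the powered-off node is detected as down
kubectl get nodes                      # stopped node eventually shows NotReady

# Check whether pods on the dead node were actually rescheduled
kubectl get pods -n k3s-loki -o wide   # pod still shows Running on the dead node

# Look for the eviction attempt recorded by the taint manager
kubectl get events -n k3s-loki | grep TaintManagerEviction
```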

Expected behavior:
The pod is expected to transition to the Terminating state, while a new pod should be scheduled on a healthy node.

Actual behavior:
The pod remains in the running state on a stopped node.

Additional context / logs:
From kubectl get events:
14m Normal TaintManagerEviction pod/central-metrics-collector-5f5b6c599f-8gwpn Marking for deletion Pod k3s-loki/central-metrics-collector-5f5b6c599f-8gwpn, but is not happening

From kubectl get pods:
k3s-loki central-metrics-collector-5f5b6c599f-8gwpn 1/1 Running

From kubectl get deployments:
k3s-loki central-metrics-collector 0/1

@brandond (Contributor) commented Mar 8, 2024

Kubernetes cannot reason about pods on nodes that do not have a running kubelet. You may have deleted the pod, but Kubernetes does not actually know whether it has terminated, because no kubelet is running to report status. The pod may still be running on a node suffering a network outage; it may be running while the kubelet is stopped; or it may not be running at all. Kubernetes has no way of knowing.

There will be no updates to the pod status until the node either comes back online, or the node is deleted and the pod is force-deleted as an orphan.
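
The manual recovery path described above can be sketched as follows (a hedged sketch: the node name comes from the environment info in this issue and the pod name from its logs; only do this once you are certain the node is really gone):

```shell
# 1. Delete the dead node object; its pods become orphans
kubectl delete node ip-10-190-34-107

# 2. Force-delete the stuck pod without waiting for kubelet confirmation
kubectl delete pod central-metrics-collector-5f5b6c599f-8gwpn \
  -n k3s-loki --force --grace-period=0
```

Note that --force --grace-period=0 removes the pod object from the API server immediately; if the node was merely partitioned rather than dead, the container may still be running there.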

This is not a K3s issue; this is just how Kubernetes works. There are some discussions about tuning the apiserver and controller-manager to reduce internal node monitor intervals, which may help evict pods from offline nodes faster, at #1264 (comment)
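
As a rough illustration of that kind of tuning, K3s can pass flags through to the embedded apiserver and controller-manager; a minimal sketch, with illustrative values rather than recommendations (these control how quickly a node is marked NotReady and how long pods tolerate an unreachable node before eviction is attempted):

```shell
k3s server \
  --kube-controller-manager-arg=node-monitor-period=5s \
  --kube-controller-manager-arg=node-monitor-grace-period=20s \
  --kube-apiserver-arg=default-not-ready-toleration-seconds=30 \
  --kube-apiserver-arg=default-unreachable-toleration-seconds=30
```

Even with shorter intervals, a pod on a dead node can still hang in Terminating, for the reason given above: no kubelet is alive to confirm the termination.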

@brandond brandond closed this as completed Mar 8, 2024