Environmental Info:
K3s Version: v1.27.5+k3s1
Node(s) CPU architecture, OS, and Version: Linux ip-10-190-34-107 6.5.0-1014-aws #14~22.04.1-Ubuntu SMP Thu Feb 15 15:27:06 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Cluster Configuration: 3 servers
Describe the bug:
When a node is stopped via the hypervisor interface, the otel-collector pod remains in the Running state indefinitely. To trigger rescheduling, I must delete the pod manually; this moves it to the Terminating state, and it is eventually rescheduled on another node.
Steps To Reproduce:
1. Installed K3s
2. Installed several services that create Kubernetes resources in the form of Deployments
3. Powered off the node from the hypervisor
4. Waited for the Deployment-managed pods to be evicted from the stopped node
5. Verified the status of the Deployment-managed pods to confirm they had all terminated on the stopped node
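To confirm that the control plane has actually marked the powered-off node unhealthy, it can help to check the node status and taints while reproducing. A minimal check (the node name below is the one from this report; adjust to your cluster):

```shell
# Check whether the stopped node is reported NotReady
kubectl get nodes

# Inspect its taints; an unreachable node should carry
# node.kubernetes.io/unreachable:NoExecute, which is what
# triggers TaintManagerEviction for its pods
kubectl describe node ip-10-190-34-107 | grep -A2 Taints
```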
Expected behavior:
The pod is expected to transition to the Terminating state, while a new pod should be scheduled on a healthy node.
Actual behavior:
The pod remains in the Running state on the stopped node.
Additional context / logs:

kubectl get events:
  14m  Normal  TaintManagerEviction  pod/central-metrics-collector-5f5b6c599f-8gwpn  Marking for deletion Pod k3s-loki/central-metrics-collector-5f5b6c599f-8gwpn
  (the deletion never actually happens)

kubectl get pods:
  k3s-loki  central-metrics-collector-5f5b6c599f-8gwpn  1/1  Running

kubectl get deployments:
  k3s-loki  central-metrics-collector  0/1
Kubernetes cannot reason about pods on nodes that do not have a running kubelet. You may have deleted the pod, but Kubernetes does not actually know whether it has terminated, because there is no kubelet reporting status. The pod may still be running on a node suffering a network outage; the node may be up with only the kubelet stopped; or it may not be running at all. Kubernetes has no way of knowing which.
There will be no updates to the pod status until the node either comes back online, or the node is deleted and the pod is force-deleted as an orphan.
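As a sketch of the force-delete path described above (the pod name is taken from the logs in this report; use with care, since Kubernetes cannot confirm the containers are actually stopped):

```shell
# Remove the orphaned pod object from the API server so the Deployment
# controller schedules a replacement. This only deletes the API object;
# the containers may still be running if the node is merely partitioned.
kubectl delete pod central-metrics-collector-5f5b6c599f-8gwpn \
  -n k3s-loki --grace-period=0 --force

# Alternatively, deleting the dead node cleans up all of its pods:
kubectl delete node <stopped-node-name>
```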
This is not a K3s issue; this is just how Kubernetes works. There are some discussions about tuning the apiserver and controller-manager to reduce the internal node monitor intervals, which may help evict pods from offline nodes more quickly: see #1264 (comment)
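As a rough sketch of that kind of tuning (the values below are illustrative, not recommendations), the relevant upstream flags can be passed through the K3s server configuration file:

```yaml
# /etc/rancher/k3s/config.yaml (on the servers) -- illustrative values only.
# Shortens how long the controller-manager waits before marking a node
# unhealthy, and how long pods tolerate a NotReady/unreachable node
# before being evicted.
kube-controller-manager-arg:
  - "node-monitor-period=4s"
  - "node-monitor-grace-period=16s"
kube-apiserver-arg:
  - "default-not-ready-toleration-seconds=30"
  - "default-unreachable-toleration-seconds=30"
```

Lowering these values makes eviction faster at the cost of more churn during transient network blips, so they should be tested against realistic failure scenarios.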