
Ignore failed pods for KubePodNotReady #70

Merged

Conversation

gouthamve (Contributor):

https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-phase

Essentially, if something is evicted, or exits with a non-zero code, it gets rescheduled. The failed pod then sticks around until the --terminated-pod-gc-threshold limit is reached.

The only exception to this rule is that Pods with a phase of Succeeded or Failed for more than some duration (determined by terminated-pod-gc-threshold in the master) will expire and be automatically destroyed

--terminated-pod-gc-threshold int32     Default: 12500
Number of terminated pods that can exist before the terminated pod garbage collector starts deleting terminated pods. If <= 0, the terminated pod garbage collector is disabled.

This is causing us some alerts like:
[screenshot: KubePodNotReady alerts firing, 2018-08-28 16:06]

* If a node flaps and comes back, its pods are marked Failed
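To illustrate the kind of change implied here, below is a hedged sketch of a KubePodNotReady-style rule, assuming the alert is built on the kube_pod_status_phase series from kube-state-metrics (the exact rule and thresholds in this repository may differ). The key point is the phase regex: dropping Failed from the matched set keeps terminated pods that are merely waiting for --terminated-pod-gc-threshold cleanup from firing the alert.

```yaml
# Hypothetical sketch, not the exact rule from this PR.
# Before the fix, a regex like phase=~"Failed|Pending|Unknown" would also
# match pods that already terminated and are only awaiting garbage collection.
- alert: KubePodNotReady
  expr: sum by (namespace, pod) (kube_pod_status_phase{phase=~"Pending|Unknown"}) > 0
  for: 1h
  labels:
    severity: critical
  annotations:
    message: Pod {{ $labels.namespace }}/{{ $labels.pod }} has been in a non-ready state for longer than an hour.
```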

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
gouthamve (Contributor, Author):

@brancz @tomwilkie

brancz (Member) commented Aug 28, 2018:

Yeah we actually had a similar case with Jobs that were causing lots of Completed pods.

This lgtm 👍
