
Ignore failed pods for KubePodNotReady #70

Merged

Conversation

gouthamve (Contributor):

https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-phase

Essentially, if something is evicted, or exits with a non-zero code, it gets rescheduled. The failed pod then sticks around until the --terminated-pod-gc-threshold limit is reached.

The only exception to this rule is that Pods with a phase of Succeeded or Failed for more than some duration (determined by terminated-pod-gc-threshold in the master) will expire and be automatically destroyed

--terminated-pod-gc-threshold int32     Default: 12500
Number of terminated pods that can exist before the terminated pod garbage collector starts deleting terminated pods. If <= 0, the terminated pod garbage collector is disabled.

This is causing us some alerts like:
[screenshot: KubePodNotReady alerts firing, 2018-08-28 16:06]

* If a node flaps and comes back, its pods are marked Failed
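To illustrate the kind of change implied here, below is a hedged sketch of a KubePodNotReady-style rule, assuming the alert is built on the kube_pod_status_phase series from kube-state-metrics (the exact rule and thresholds in this repository may differ). The key point is the phase regex: dropping Failed from the matched set keeps terminated pods that are merely waiting for --terminated-pod-gc-threshold cleanup from firing the alert.

```yaml
# Hypothetical sketch, not the exact rule from this PR.
# Before the fix, a regex like phase=~"Failed|Pending|Unknown" would also
# match pods that already terminated and are only awaiting garbage collection.
- alert: KubePodNotReady
  expr: sum by (namespace, pod) (kube_pod_status_phase{phase=~"Pending|Unknown"}) > 0
  for: 1h
  labels:
    severity: critical
  annotations:
    message: Pod {{ $labels.namespace }}/{{ $labels.pod }} has been in a non-ready state for longer than an hour.
```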

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
gouthamve (Contributor, Author):

@brancz @tomwilkie

brancz (Member) commented Aug 28, 2018:

Yeah we actually had a similar case with Jobs that were causing lots of Completed pods.

This lgtm 👍
