delete pdb, if all pods are not in a running state #27

Open
szuecs opened this issue Sep 23, 2019 · 2 comments

Comments

@szuecs
Collaborator

szuecs commented Sep 23, 2019

We observed an issue where a Prometheus StatefulSet with 2 replicas was stuck in a non-running state, with its pods crashing continuously.
In a discussion it turned out that there is probably a 5-minute timeout before the PDB is deleted.
The argument is: if all pods matched by a PDB are crashing, the PDB can safely be deleted to help with faster recovery.

@mikkeloscar
Owner

We have a 5-minute TTL defined here: https://github.com/zalando-incubator/kubernetes-on-aws/blob/89b380939fd34dcbc9af347a55c2f70e36755c70/cluster/manifests/prometheus/statefulset.yaml#L5. However, because of a bug (fixed in #28), this TTL was never actually effective.

With this bug fixed, I suggest we try the 5-minute TTL and see how effective it is. We could also lower it a bit, but the reason we may not want to remove it completely is that we decide whether a PDB should be removed by looking at the pod ready state, and pods with a slow startup may take a while to become ready. We could of course also look at a more specific signal like CrashLoopBackOff, but I would rather stay with the simple, generic signal of PodReady state plus a TTL unless we really need a very specific check.

WDYT?

@szuecs
Collaborator Author

szuecs commented Sep 24, 2019

@mikkeloscar Fine with me.
