PodGC default threshold causes OOMs on small master machines. #28484
Labels
priority/important-soon
Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
Comments
wojtek-t added the priority/important-soon (Must be staffed and worked on either currently, or very soon, ideally in time for the next release.) and team/control-plane labels on Jul 5, 2016
To clarify: the cluster was running the 1.2.4 release.
gmarek changed the title from "Failed pods from a job are not garbage collected." to "PodGC default threshold causes OOMs on small master machines." on Jul 5, 2016
To be clear: the problem is that GC only kicks in after 12,500 pods have accumulated in the system, and small (1- or 2-core) master machines can't really handle that amount of load.
ref #22680
Yeah. There's a flag for this already.
@lavalamp - why did you close that?
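The flag referred to above is `--terminated-pod-gc-threshold` on the kube-controller-manager, whose default of 12500 matches the ~12.5k figure in this issue. On a small master, lowering it bounds how many terminated pods are retained. A hedged sketch (the value 1000 is just an illustration; how the flag is wired in depends on how your control plane is started):

```shell
# Lower the terminated-pod GC threshold on the controller manager.
# The default (12500) lets thousands of failed pods pile up before
# PodGC deletes anything; a smaller value keeps memory usage on
# small masters bounded. 1000 here is an arbitrary example value.
kube-controller-manager \
  --terminated-pod-gc-threshold=1000
```

Note this threshold counts terminated (succeeded or failed) pods cluster-wide, so the right value depends on how many finished pods you actually want to keep around for inspection.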
I've seen a cluster (running 1.2.4) where there is a job whose pods are simply crashing.
The problem is that those failed pods are not being garbage collected.
In the example cluster, we managed to produce ~13,000 failed pods over a week, and it seems none of them were ever garbage collected.
This results in constantly increasing memory usage in the master components.
This seems like a bug to me.
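The behavior described here is threshold-based: PodGC does nothing until the number of terminated pods exceeds a limit, and only then deletes the oldest ones beyond that limit, which is why ~13,000 failed pods can sit below the default 12,500 cutoff's vicinity for a long time before anything is cleaned up. A minimal sketch of that logic (an illustration only, not the actual controller code; `pods_to_delete` and the tuple representation are invented for this example):

```python
def pods_to_delete(terminated_pods, threshold):
    """Return names of terminated pods that a threshold-based GC
    would delete.

    terminated_pods: list of (name, creation_time) tuples for pods
    in a terminal phase (Failed/Succeeded).
    threshold: number of terminated pods allowed to remain.
    """
    excess = len(terminated_pods) - threshold
    if excess <= 0:
        # Below the threshold nothing is collected -- this is why
        # thousands of failed pods can accumulate untouched.
        return []
    # Over the threshold, the oldest pods are deleted first.
    by_age = sorted(terminated_pods, key=lambda p: p[1])
    return [name for name, _ in by_age[:excess]]
```

For example, with a threshold of 1 and three terminated pods, the two oldest are selected for deletion while the newest survives; with the default-sized threshold, a week's worth of crashed job pods may never reach the cutoff at all.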
@soltysh @gmarek @erictune @kubernetes/goog-control-plane @lavalamp @roberthbailey