
PodGC default threshold causes OOMs on small master machines. #28484

Closed
wojtek-t opened this issue Jul 5, 2016 · 6 comments
Labels
priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.

Comments

wojtek-t (Member) commented Jul 5, 2016

I've seen a cluster (running the 1.2.4 release) with a job whose pods are simply crashing.
The problem is that those failed pods are not being garbage collected.

In the cluster I looked at, the job produced ~13,000 failed pods over the course of a week, and it seems none of them were ever garbage collected.

This results in steadily increasing memory usage in the master components.
This seems like a bug to me.

@soltysh @gmarek @erictune @kubernetes/goog-control-plane @lavalamp @roberthbailey
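[Editor's note: for anyone triaging a cluster in this state, a quick way to count the accumulated terminated pods. This is a sketch; `--field-selector` on pod phase requires a much newer kubectl than the 1.2-era clusters discussed here.]

```sh
# Count failed pods across all namespaces. On clusters too old for
# --field-selector, grep the STATUS column of `kubectl get pods` instead.
kubectl get pods --all-namespaces --field-selector=status.phase=Failed --no-headers | wc -l
```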

wojtek-t added the priority/important-soon and team/control-plane labels on Jul 5, 2016
wojtek-t (Member, Author) commented Jul 5, 2016

To clarify: the cluster was running the 1.2.4 release.

gmarek changed the title from "Failed pods from a job are not garbage collected." to "PodGC default threshold causes OOMs on small master machines." on Jul 5, 2016
gmarek (Contributor) commented Jul 5, 2016

To be clear: the problem is that GC only kicks in after 12.5k pods have accumulated in the system, and small (1- or 2-core) master machines can't really handle that much load.

davidopp (Member) commented Jul 5, 2016

ref #22680

lavalamp (Member) commented Jul 6, 2016

Yeah. There's a flag for this already.
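[Editor's note: the flag in question is kube-controller-manager's `--terminated-pod-gc-threshold`, which caps how many terminated pods may exist before PodGC starts deleting the oldest; its default, 12500, matches the ~12.5k figure above. A minimal sketch of lowering it, assuming you can edit the controller-manager's flags on your master:]

```sh
# Collect terminated pods once 100 have accumulated instead of 12500.
# A value <= 0 disables terminated-pod GC entirely.
kube-controller-manager --terminated-pod-gc-threshold=100
```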

lavalamp closed this as completed on Jul 6, 2016
wojtek-t (Member, Author) commented Jul 6, 2016

@lavalamp - why did you close this?
Even though the flag exists, it isn't being used, and this is a real problem.

wojtek-t reopened this on Jul 6, 2016
lavalamp (Member) commented Jul 6, 2016

@wojtek-t because this is a dup: we already have #22680 and #25831 filed about this.
