
Containers on GKE seem to request 100m CPU by default #1892

Closed · derekchiang opened this issue Jun 1, 2017 · 4 comments

Comments

derekchiang (Contributor) commented Jun 1, 2017

Running a Pachyderm cluster on GKE. For some reason, each container seems to be requesting 100m (0.1) CPU by default. Since each worker pod has two containers (one for the user code and another for the storage sidecar), every worker pod ends up requesting 0.2 CPU, which seems a bit much. In my case I have 100+ worker pods, and most of them fail to schedule since my cluster only has 8 CPUs.

We should look into making the default CPU request smaller.
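A minimal way to confirm the behavior, with a test pod and image chosen purely for illustration: submit a container with no resources stanza and read back what the API server stored.

```yaml
# Hypothetical test pod; submitted with no resources: block at all.
apiVersion: v1
kind: Pod
metadata:
  name: default-request-test
spec:
  containers:
  - name: main
    image: busybox
    command: ["sleep", "3600"]
# On an affected GKE cluster, reading the pod back, e.g. with
#   kubectl get pod default-request-test \
#     -o jsonpath='{.spec.containers[0].resources.requests.cpu}'
# prints "100m", even though the submitted spec requested nothing.
```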

derekchiang (Contributor, Author) commented:
Can confirm that on Azure, the default CPU request is 0.

derekchiang (Contributor, Author) commented:

The bottom line: since we don't set resource requests explicitly, the actual requests on worker pods are determined by whatever defaults the cloud provider imposes, which is probably not what we want.

jdoliner (Member) commented Jun 1, 2017

I think @msteffen saw something like this too when he was running tests. Can the answer here be as simple as setting an explicit request of 0 by default, rather than letting the cloud provider's default take over?
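Something like this on the worker pod spec, I'd guess (container and image names here are illustrative; an explicit request should take precedence over any injected default):

```yaml
# Sketch of a worker pod with explicit zero CPU requests on both containers.
apiVersion: v1
kind: Pod
metadata:
  name: pipeline-worker        # illustrative name
spec:
  containers:
  - name: user-code            # runs the user's pipeline code
    image: user/pipeline-image # illustrative image
    resources:
      requests:
        cpu: "0"               # explicit, so no provider default is injected
  - name: storage-sidecar      # the storage sidecar
    image: pachyderm/worker    # illustrative image
    resources:
      requests:
        cpu: "0"
```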

msteffen (Contributor) commented Jun 5, 2017

We can certainly explicitly set the default resource request to 0. Two notes on that solution, though:

  • Explicitly setting the GPU request to 0 causes pods not to schedule if the k8s node doesn't have any GPUs. It turns out that requesting 0 of something the node doesn't have is still asking too much.
  • I saw this with kops. There's a k8s primitive called a LimitRange that can impose default resource requests on pods (see https://kubernetes.io/docs/tasks/configure-pod-container/assign-cpu-ram-container/, at the bottom: "Default limits are applied according to a limit range for the default namespace"); a sketch of one appears below.
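For reference, a sketch of the kind of LimitRange involved (object name and exact values are illustrative; the linked doc describes the real semantics):

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: cpu-defaults     # illustrative name; GKE/kops pick their own
  namespace: default
spec:
  limits:
  - type: Container
    defaultRequest:
      cpu: 100m          # injected as requests.cpu into containers that omit it
```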

Given these, I think explicitly setting the resource request to 0 will probably work, though it means that if any customers actually want to use a limit range, they can't: our explicit requests would take precedence over their defaults.

  • Another thing that would probably work is using a non-default namespace.
  • A third thing that might work is deleting (via kubectl) the limit range that I imagine exists in new GKE clusters; this is what I did with kops. A sketch of both workarounds follows.
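A sketch of both workarounds (namespace and object names illustrative; the actual limit range name should be looked up first):

```yaml
# Non-default-namespace workaround: a fresh namespace has no provider-created
# limit range, so no default requests get injected (name illustrative).
apiVersion: v1
kind: Namespace
metadata:
  name: pachyderm-workers
# Delete-the-limit-range workaround: remove the object from the namespace in
# use, e.g.
#   kubectl get limitrange --namespace default            # find its name
#   kubectl delete limitrange <name> --namespace default
# Existing pods keep their injected requests; only new pods skip the default.
```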

Finally, I think it might be worth asking ourselves (if not now, at some point) whether resource requests of 0 are actually what we want. I imagine the reason kops and GKE started creating these limit ranges is that pods do consume resources, and if you don't account for them you probably get bitten eventually. Better scale-down logic might be safer in the long run.
