Don't set default resource requests#2034
Conversation
- These are pretty big requests for the work our pods do, based on experience at berkeley. Setting high requests means our users have to get more nodes by default, which wastes resources & money. - It becomes difficult to actually unset these! I've ended up doing things like resources.requests.cpu=1 to have similar effects. - Most upstream charts do *not* set requests, so we should do the same.
|
Most charts I've seen does indeed not set the resources requests explicitly, but I think we should provide guiding default values in comments in values.yaml like I've seen many other charts if we opt to do so as well. I recall and perceive it as a good thing if we set the dummy containers for the image-puller to zero or some minimal value though, because that should be independent of the deployment's scale I think and can help without any clear drawback in my mind. |
| resources: | ||
| requests: | ||
| cpu: 0 | ||
| memory: 0 | ||
| resources: {} |
There was a problem hiding this comment.
I think I consider this a bugfix actually, the hook-image-awaiter job run's a go program speaking with k8s api-server, it doesn't make sense for this to not request any resources, but it would make sense for it to request a very very low amount of resources during the brief time it lives I think. It should not need to depend on the deployment either.
I think I prefer that we provide defaults resource requests for situations when it doesn't matter if the deployment have 10 or 5000 users or 1 or 100 nodes.
|
Agree re: providing suggested defaults in comments and docs. I've data in grafana for hubs with lots of users and hubs with very few users! I can dig that up.
Can you expand a little bit more on what you mean by this? |
|
Just about to enter a meeting @yuvipanda but in #1764 we discussed setting explicit resources requests for the pause pods and such I think, and I think setting those explicitly to 0 makes a lot of sense still. I think the general principle of when I think we should opt for setting resource requests can be generalized to a principle. The principle would be: whenever the same resource request is a likely good fit for everyone, we set the resource requests. I think this principle holds for the image puller pods and the image-awaiter-job pod, but fail for all other pods which to some degree need more resource requests based on usage. |
|
I also can't actually unset any limits at all if they're set by default. Neither of the following do anything: This makes it impossible to change the quality of service of these pods. I want them to be BestEffort, but that's not doable now. |
|
Here's a graph of memory usage of many hubs: The smallest is around ~128MB, and largest close to 3.5G. Note that this is RSS - I don't actually think this is the exact thing memory limits give us. Similar graph for hub CPU: Most of them are around 0. CHP RSS: CHP CPU: User Scheduler CPU: User Scheduler RAM: Traefik RSS: Traefik CPU: |
|
I see the problem, zero-to-jupyterhub-k8s/jupyterhub/values.yaml Lines 86 to 89 in c3f8014 gets merged with the empty dict i.e. it's unchanged. |
By setting |
|
Thanks for the find and the fix, @consideRatio / @manics! |
|
I think this gets back to z2jh's role as: opinionated, good-practices default, vs generic chart for jupyterhub. As I understand it, we don't want to require all 'good' JupyterHub deployments to require loads of the same config. So it makes sense to clear these if we think most folks shouldn't set any resource reservations for these pods. It makes sense to keep the subset, if any, where we think most folks will have a better experience with nonzero values. Balanced, of course, with the downsides @yuvipanda mentioned that unsetting these is a bit of a pain. But it does seem that most if not all of our pods require little enough resources that clearing them by default. It might be good to figure out a scale when resource reservations start being a good idea, and add that to the docs. For instance, I think the GKE deployment on mybinder.org can serve as an upper bound for most of these things: https://grafana.mybinder.org/d/GYEYQm7ik/components-resource-metrics . Very few deployments will deal with more traffic than that one, for both the hub and proxy (sustained 10-30 launches/minute). Even in that deployment, JupyterHub rarely exceeds 100m CPU. CHP, on the other hand, is pegged at ~40% which is a lot (and I think probably means there's something for us to offload...)! |
|
Fun fact! I read through the QoS docs and they appear to never mention nor even link to another doc that explains what any of the QoS classes might mean. The closest it gets is:
So I still have no idea how any of the QoS classes are treated differently by the schedulers. |
I've been on this journey as well! 😱 |
jupyterhub/zero-to-jupyterhub-k8s#2034 Merge pull request #2034 from yuvipanda/no-requests
|
Great! I opened #2085 for documenting resource requests and collecting data about what good ranges might be. |
|
This pull request has been mentioned on Jupyter Community Forum. There might be relevant details there: https://discourse.jupyter.org/t/suggested-cpu-memory-resource-requests-limits/9322/1 |








pods do, based on experience at berkeley. Setting
high requests means our users have to get more
nodes by default, which wastes resources & money.
ended up doing things like resources.requests.cpu=1
to have similar effects.
should do the same.