[Bug]: Container size not recognized when CPU limit is not set #1730

Closed
tkolo opened this issue Sep 1, 2023 · 3 comments · Fixed by #1739
Labels

  • feature/ds-projects: Data Science Projects feature (formerly Data Science Groupings - DSG)
  • kind/bug: Something isn't working
  • priority/normal: An issue with the product; fix when possible
  • rhods-1.33

Comments


tkolo commented Sep 1, 2023

Is there an existing issue for this?

  • I have searched the existing issues

Deploy type

Downstream version (e.g. RHODS 1.29)

Version

RHODS 1.31

Current Behavior

When I modify workspace sizes to ones that set both requests but limit only memory usage, the selected size is applied correctly, but it is later not recognized by the ODS dashboard.

Expected Behavior

Workspace sizes should be correctly recognized in the dashboard.

Steps To Reproduce

  1. Install RHODS/OpenDataHub
  2. In the dashboard config, modify the notebook sizes to include sizes that do not limit CPU (see the sketch after this list)
  3. Create a new data science project
  4. Add a workspace with the new size
  5. Notice that on the project dashboard the new workspace is visible with an unknown size
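
To make step 2 concrete, here is a minimal sketch of the kind of size entry involved and of how an "Unknown" size could arise. The field names follow spec.notebookSizes in OdhDashboardConfig as I understand them, and findSize is purely hypothetical, not the dashboard's actual matching code:

```typescript
// Sketch only: a notebook size entry of the kind configured under
// spec.notebookSizes in OdhDashboardConfig, as the dashboard would consume it.
// Treat the exact field names as an assumption, not a verified schema.
type ResourceValues = { cpu?: string; memory?: string };

interface NotebookSize {
  name: string;
  resources: {
    requests?: ResourceValues;
    limits?: ResourceValues;
  };
}

// The kind of size described in step 2: both requests set, memory limited,
// but no CPU limit.
const noCpuLimitSize: NotebookSize = {
  name: 'Medium (no CPU limit)',
  resources: {
    requests: { cpu: '2', memory: '8Gi' },
    limits: { memory: '8Gi' }, // limits.cpu intentionally omitted
  },
};

// Hypothetical lookup (not the dashboard's real code) showing how "Unknown"
// can appear: if the comparison requires limits.cpu to be present on the
// running container, a workbench created from a size without a CPU limit
// never matches any configured entry.
const findSize = (container: NotebookSize['resources'], sizes: NotebookSize[]): string =>
  sizes.find(
    (s) =>
      container.limits?.cpu !== undefined &&
      container.limits?.cpu === s.resources.limits?.cpu &&
      container.limits?.memory === s.resources.limits?.memory,
  )?.name ?? 'Unknown';

// Under such a check, even the size's own resources fail to match it:
const shown = findSize(noCpuLimitSize.resources, [noCpuLimitSize]); // "Unknown"
```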

Workaround (if any)

Set CPU limits

What browsers are you seeing the problem on?

Firefox, Chrome, Microsoft Edge

Anything else

My reasoning behind not setting a CPU limit comes from this blog post: https://home.robusta.dev/blog/stop-using-cpu-limits

tkolo added the kind/bug, priority/normal, and untriaged labels on Sep 1, 2023

tkolo commented Sep 1, 2023

Similar issues: #1513, #826

manaswinidas added the feature/ds-projects label and removed the untriaged label on Sep 6, 2023
shalberd commented

Hmm, well, not setting limits is lazy; you just shift the problem handling to the VM / worker node level and allow containers to grab resources. Maybe you also have to differentiate here between CPU and memory ...
Regarding the argument "This is what happens when you have CPU limits. Resources are available but you aren't allowed to use them": you typically monitor average workloads with Prometheus, set up alerts, and on that basis set your limit and request. Also, resources are then available for other containers / pods to use, which is of a lot of value in dynamic scheduling environments. Think pipeline task containers, for example.
It would be interesting to hear what others think of this.
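
As a purely hypothetical illustration of differentiating CPU and memory while still deriving requests from monitored averages: keep a memory limit with headroom, and treat the CPU limit as the part that is up for debate. The values below are placeholders, not recommendations:

```typescript
// Hypothetical workbench resources; values are placeholders imagined as
// derived from Prometheus observations of average usage.
const workbenchResources = {
  requests: {
    cpu: '500m',   // around the observed average CPU usage
    memory: '2Gi', // around the observed average working set
  },
  limits: {
    memory: '4Gi', // hard cap with headroom; exceeding it means the OOM killer
    // cpu: '2',   // the contested part: omitting it allows bursting, setting it caps noisy neighbours
  },
};
```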
About the dashboard bug: yes, good that it was fixed.


tkolo commented Sep 13, 2023

I agree that memory should have limits set; the mentioned article seems to agree on that point as well. Although this is also a double-edged sword: I've noticed that often, when a process attempts to exceed its limit, it is simply OOM-killed rather than just prevented from growing.

As for CPU, I'd say that whether setting limits is helpful or not largely depends on the cluster's use case. If it's a production cluster where some workloads must be protected from resource starvation and it's advisable to prevent any form of CPU starvation, then sure, limits are probably a good idea, especially since Kubernetes will grant different QoS classes to pods depending on how requests/limits are set.
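
For reference, a short sketch of how the QoS class follows from requests/limits; the objects are illustrative, the classification rules are the standard Kubernetes ones:

```typescript
// Illustrative container resource specs and the QoS class Kubernetes assigns
// to a pod whose containers use them.

// Guaranteed: CPU and memory limits are set and equal to the requests.
const guaranteed = {
  requests: { cpu: '2', memory: '8Gi' },
  limits: { cpu: '2', memory: '8Gi' },
};

// Burstable: some requests/limits are set but the Guaranteed criteria are not
// met, e.g. requests plus a memory limit and no CPU limit (this issue's setup).
const burstable = {
  requests: { cpu: '2', memory: '8Gi' },
  limits: { memory: '8Gi' },
};

// BestEffort: no requests or limits at all; typically first in line for
// eviction when the node comes under resource pressure.
const bestEffort = {};
```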

In my use case, however, we're talking about a bare-metal cluster that is exclusively dedicated to data analytics (hence the usage of RHODS 🙂). In that case, as long as I have some guarantee that the cluster won't starve its core components to death (it's a single-master-node cluster... for now), which CPU and memory requests alone seem to provide, I don't care if my data analyst's notebook takes 6 cores or 126 cores. As long as it doesn't excessively affect other workloads and any excess resources are split evenly between the loads that request them, it's all good.
That of course doesn't mean that proper monitoring isn't needed; however, in practice my main concern right now is resource overbooking rather than over-consumption. For example, the oauth-proxy in the odh-dashboard deployment requests 1 GB of RAM and half a core of CPU, despite needing a fraction of these resources in practice. Why that particular deployment scales itself to 5 instances in my cluster is also beyond me; perhaps I'll fix both in a future PR 🙂.
