
[dashboard][kubernetes] Show container's memory info on K8s, not the physical host's. #14499

Merged: 15 commits merged into ray-project:master from k8s-dashboard-memory on Mar 9, 2021

Conversation


@DmitriGekhtman (Contributor) commented Mar 5, 2021

Why are these changes needed?

Shows container's memory info in the dashboard when using Kubernetes, not the physical host's.
K8s dashboard situation looks a little better:
[Screenshot: Ray dashboard memory view on Kubernetes]

(1 head, 2 min_workers all with 512Mi memory limit)

Related issue number

Addresses the memory subproblem of #11172.

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(


@wuisawesome (Contributor) left a comment:


Why do we need the if statement? Shouldn't ray.utils.get_system_memory() already do all these checks?

If we have a good reason for deviating from how the core measures memory usage, we should carefully document it, especially now that we're auto-reporting memory resources within ray.available_resources().

Btw the issue also mentions CPU usage. Should this PR also include that?
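For context, the kind of check being discussed above — prefer the container's cgroup memory limit over the host total — looks roughly like the minimal sketch below. It assumes the standard cgroup v1 mount point and a made-up helper name; it is illustrative, not the actual `ray.utils.get_system_memory()` implementation.

```python
import psutil  # Ray already depends on psutil; used here only for the host fallback


def container_aware_total_memory() -> int:
    """Return total memory in bytes, preferring the container's cgroup limit."""
    cgroup_v1_limit = "/sys/fs/cgroup/memory/memory.limit_in_bytes"
    try:
        with open(cgroup_v1_limit) as f:
            limit = int(f.read().strip())
        # With no limit configured, the kernel reports a huge sentinel value,
        # so treat anything implausibly large as "unlimited".
        if limit < 2**48:
            return limit
    except OSError:
        pass  # No cgroup v1 memory controller at this path; fall through.
    # Fall back to the physical host's total memory.
    return psutil.virtual_memory().total
```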

@DmitriGekhtman (Contributor, Author)

> If we have a good reason for deviating from how the core measures memory usage, we should carefully document it, especially now that we're auto-reporting memory resources within ray.available_resources().

I think psutil's definition of percent is different (illustrated in the sketch below), but sure, I'll get rid of the if statement.
I think with this change the percent reported for non-Docker setups will be slightly lower.

I decided not to deal with CPU right now, as it seems slightly more subtle -- unless you (or anyone else) have a quick fix idea!
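To make the "different definition of percent" point concrete: psutil derives its percentage from the host's available memory, while a limit-based calculation typically divides bytes in use by the (container) total. A minimal sketch of the two definitions follows; the exact formula used by the dashboard may differ.

```python
import psutil

vm = psutil.virtual_memory()

# psutil's definition: the share of host memory that is not "available",
# i.e. 100 * (vm.total - vm.available) / vm.total.
psutil_percent = vm.percent

# A limit-based definition: bytes in use divided by the (container) total.
limit_based_percent = 100 * vm.used / vm.total

print(f"psutil percent: {psutil_percent:.1f}%  used/total: {limit_based_percent:.1f}%")
```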

@wuisawesome (Contributor)

ray.utils.get_num_cpus()? Or is the issue that you want to measure usage instead of the scheduler allocations?

@DmitriGekhtman (Contributor, Author)

> ray.utils.get_num_cpus()? Or is the issue that you want to measure usage instead of the scheduler allocations?

When I run on minikube with 8 CPUs allocated to minikube, attach to the head pod, and run ray.utils.get_num_cpus(), I get 8 instead of the 1-CPU request/limit I've assigned to the pod.

@DmitriGekhtman (Contributor, Author)

We do want accurate usage and percent for CPU too.

@wuisawesome (Contributor)

Isn't that a pretty big issue since it means Ray will incorrectly set num_cpus? Do we have an issue tracking that?

@DmitriGekhtman (Contributor, Author)

> Isn't that a pretty big issue since it means Ray will incorrectly set num_cpus? Do we have an issue tracking that?

KubernetesNodeProvider reads from the pod spec: https://github.com/ray-project/ray/blob/master/python/ray/autoscaler/_private/kubernetes/config.py#L93
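In other words, on Kubernetes the CPU count comes from the pod spec's container resources rather than from the OS. A hedged sketch of what parsing a CPU value such as "500m" out of such a spec can look like (field names follow the Kubernetes pod spec; the helper and example pod are illustrative, not the actual Ray code at the link above):

```python
import math


def cpus_from_pod_spec(pod_spec: dict) -> int:
    """Derive a CPU count from the first container's resource limits/requests."""
    resources = pod_spec["spec"]["containers"][0].get("resources", {})
    cpu = resources.get("limits", {}).get("cpu") or resources.get("requests", {}).get("cpu")
    if cpu is None:
        return 0
    cpu = str(cpu)
    if cpu.endswith("m"):  # millicpu form, e.g. "500m" means half a CPU
        return math.ceil(int(cpu[:-1]) / 1000)
    return math.ceil(float(cpu))


example_pod = {
    "spec": {
        "containers": [{
            "name": "ray-node",
            "resources": {
                "requests": {"cpu": "1000m", "memory": "512Mi"},
                "limits": {"cpu": "1", "memory": "512Mi"},
            },
        }]
    }
}
print(cpus_from_pod_spec(example_pod))  # -> 1
```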

@DmitriGekhtman (Contributor, Author)

> > ray.utils.get_num_cpus()? Or is the issue that you want to measure usage instead of the scheduler allocations?
>
> When I run on minikube with 8 CPUs allocated to minikube, attach to the head pod, and run ray.utils.get_num_cpus(), I get 8 instead of the 1-CPU request/limit I've assigned to the pod.

After looking into it more carefully: it reads 8 because the container can indeed run on all 8 CPUs available to it, but it is throttled to at most x CPUs' worth of cycles per unit time, where x is the limit specified in the pod spec.
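That throttling is what the CFS quota controls: the container sees every host CPU, but the kernel caps it at quota/period CPUs' worth of time. A container-aware CPU count therefore has to read the cgroup quota rather than count cores. Below is a minimal sketch assuming cgroup v1 paths; it is not the actual Ray code.

```python
import multiprocessing


def container_aware_cpu_count() -> float:
    """Return the CPU limit implied by the cgroup v1 CFS quota, else the host count."""
    try:
        with open("/sys/fs/cgroup/cpu/cpu.cfs_quota_us") as f:
            quota = int(f.read().strip())
        with open("/sys/fs/cgroup/cpu/cpu.cfs_period_us") as f:
            period = int(f.read().strip())
        if quota > 0 and period > 0:
            # e.g. quota=100000, period=100000 -> 1.0 CPU, even on an 8-core host.
            return quota / period
    except OSError:
        pass  # No cgroup v1 CPU controller mounted; fall through.
    return float(multiprocessing.cpu_count())


print(container_aware_cpu_count())
```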

@DmitriGekhtman (Contributor, Author)

I think the change in this PR should be pretty uncontroversial.

I've started looking at CPUs; I'd like to isolate those changes in a different PR.

@DmitriGekhtman dismissed wuisawesome's stale review on March 8, 2021 16:31:

Got rid of the if, deferred CPU to another PR.


@wuisawesome (Contributor) left a comment:


Dealing with CPUs in a separate PR sounds good to me.

@wuisawesome merged commit 4a7d9e7 into ray-project:master on Mar 9, 2021
@DmitriGekhtman deleted the k8s-dashboard-memory branch on March 18, 2021
@AmeerHajAli added this to the Serverless Autoscaling milestone on Apr 4, 2021