[Kubernetes] [Dashboard] Remove disk data from dashboard when running on K8s. #14676

DmitriGekhtman · 2021-03-15T01:43:17Z

Why are these changes needed?

The dashboard currently shows incorrect/confusing information in the Disk column when using Ray on K8s.
For example, when running two Ray pods on one physical K8s node, the limit shown for each is the disk limit of the K8s node
and the limit of the K8s node gets counted twice in the "totals" section.

This PR removes the Disk field from the Dashboard when running on Kubernetes -- the Ray dashboard is not the right place to get info on K8s node disk usage. By default K8s does not limit disk usable by a container.

Related issue number

Checks

I've run scripts/format.sh to lint the changes in this PR.
I've included any doc changes needed for https://docs.ray.io/en/master/.
I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
Testing Strategy
- Unit tests
- Release tests
- This PR is not tested :(

Tested manually, see discussion below.

simon-mo · 2021-03-16T19:15:27Z

dashboard/client/src/pages/dashboard/node-info/NodeInfo.tsx

@@ -209,9 +209,11 @@ const NodeInfo: React.FC<{}> = () => {
  // Show GPU features only if there is at least one GPU in cluster.
  const showGPUs =
    nodes.map((n) => n.gpus).filter((gpus) => gpus.length !== 0).length !== 0;
+  // Don't show disk on K8s. K8s node disk usage should be monitored elsewhere.
+  const showDisk = !("KUBERNETES_SERVICE_HOST" in process.env);


This won't work because process.env is hard-coded at dashboard build time, and this code will run in the user's browser. To do this, you have to expose the variable from the dashboard backend (see the reporter agent; @kathryn-zhou can help as well).

Thanks, that makes sense -- so basically my question is how do I get the variable IN_KUBERNETES_POD in reporter_agent.py to NodeInfo.tsx?
https://github.com/ray-project/ray/blob/master/dashboard/modules/reporter/reporter_agent.py#L29

-- by what code path does stuff exported by reporter_agent land in the frontend?

@edoakes do you know how this works?

@kathryn-zhou you have added metrics to this reporter agent and surfaced them in the frontend before. Can you leave some pointers here?

I think what you want to do is have the environment variable check here and report a special number that you use to not display usage on the dashboard:
https://github.com/ray-project/ray/blob/master/dashboard/modules/reporter/reporter_agent.py#L238

e.g., if we report 0 used and 0 available we don't even render it

sounds reasonable, will try that

I haven't added metrics to the frontend before, but @DmitriGekhtman it might be worthwhile to look at how GPU metrics are shown on the dashboard. I believe they are only shown if there is a GPU in the cluster.
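
For readers following the thread, here is a minimal sketch of the frontend side of the suggestion above: the backend reports 0 used and 0 total as a "no disk data" marker, and the frontend hides the Disk column when no node reports real values, much like the existing showGPUs check. The type and field names (`NodeReport`, `DiskUsage`, `used`, `total`) and the helper `computeShowDisk` are assumptions made for illustration and are not taken from the Ray codebase.

```typescript
// Hypothetical shape of the disk data each node reports to the dashboard.
// Field names are illustrative only.
type DiskUsage = { used: number; total: number };
type NodeReport = { hostname: string; disk: DiskUsage };

// A node reporting 0 used and 0 total is treated as "no disk data".
const hasDiskData = (node: NodeReport): boolean =>
  node.disk.used !== 0 || node.disk.total !== 0;

// Show the Disk column only if at least one node reported real disk data,
// in the same spirit as showGPUs requiring at least one GPU in the cluster.
const computeShowDisk = (nodes: NodeReport[]): boolean =>
  nodes.some(hasDiskData);

// Example inputs: pods that report the marker values, and a VM that does not.
const k8sNodes: NodeReport[] = [
  { hostname: "ray-head", disk: { used: 0, total: 0 } },
  { hostname: "ray-worker", disk: { used: 0, total: 0 } },
];
const vmNodes: NodeReport[] = [
  { hostname: "ip-10-0-0-1", disk: { used: 40, total: 100 } },
];
```

With these inputs, `computeShowDisk(k8sNodes)` evaluates to false and `computeShowDisk(vmNodes)` to true. Because the decision comes from data sent at runtime, it does not depend on process.env in the browser bundle.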

DmitriGekhtman · 2021-04-03T14:54:14Z

This PR is ready for review.

Disk field gone on K8s when running image with these changes:

Disk field still present on AWS when running image with these changes:

edoakes · 2021-04-05T15:00:34Z

@DmitriGekhtman looks good but lint is failing:
https://buildkite.com/ray-project/ray-builders-pr/builds/3760#f3712600-6600-458b-a523-ca3079d7d61e/412-519

Looks like you need to run Prettier to format the frontend code. @kathryn-zhou I think you ran into this recently?

DmitriGekhtman · 2021-04-05T23:04:09Z

Lint fixed, good to merge.

…24416) #14676 disabled the disk usage/total display for Ray nodes on K8s, because Ray nodes on K8s are run as pods, which in general do not use up the entire machine. However, in some situations, it is useful to run one Ray pod per K8s node and report the disk usage. This PR adds a flag to enable displaying disk usage in those situations.
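
The toggle described in this follow-up can be pictured roughly as below. This is only a sketch of the decision logic, written in TypeScript for consistency with the frontend code in this PR; the real check lives in the Python reporter agent, and every name here (`reportDisk`, `DiskReading`, and especially the `EXAMPLE_ENABLE_K8S_DISK_USAGE` variable) is hypothetical rather than the actual flag name.

```typescript
// All names in this block are hypothetical and chosen for illustration.
type Env = Record<string, string | undefined>;
type DiskReading = { used: number; total: number };

// Marker value meaning "do not display disk usage for this node".
const NO_DISK_DATA: DiskReading = { used: 0, total: 0 };

function reportDisk(env: Env, measured: DiskReading): DiskReading {
  // Kubernetes sets KUBERNETES_SERVICE_HOST inside pods.
  const inKubernetes = env["KUBERNETES_SERVICE_HOST"] !== undefined;
  // Hypothetical opt-in flag for the one-Ray-pod-per-K8s-node case.
  const diskEnabledOnK8s = env["EXAMPLE_ENABLE_K8S_DISK_USAGE"] === "1";
  if (inKubernetes && !diskEnabledOnK8s) {
    return NO_DISK_DATA;
  }
  return measured;
}
```

Under these assumptions, a pod without the flag reports the marker value, while a pod with the flag set to "1" or a process outside Kubernetes reports the measured usage unchanged.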

DmitriGekhtman force-pushed the k8s-dashboard-turn-off-disk branch from 299e36f to ddd2b41 Compare March 16, 2021 16:52

DmitriGekhtman assigned rkooo567 and simon-mo Mar 16, 2021

simon-mo reviewed Mar 16, 2021

DmitriGekhtman assigned kathryn-zhou and edoakes Mar 16, 2021

rkooo567 removed their assignment Mar 18, 2021

DmitriGekhtman force-pushed the k8s-dashboard-turn-off-disk branch 2 times, most recently from 2dcccfa to ac91c90 Compare April 3, 2021 14:45

DmitriGekhtman changed the title ~~[Draft] [Kubernetes] [Dashboard] Remove disk data from dashboard when running on K8s.~~ [Kubernetes] [Dashboard] Remove disk data from dashboard when running on K8s. Apr 3, 2021

AmeerHajAli added this to the Serverless Autoscaling milestone Apr 4, 2021

edoakes approved these changes Apr 5, 2021

DmitriGekhtman added 6 commits April 5, 2021 13:00

attempt #2

98f6d31

Fix showDisk logic

f6a361f

K8s disk dummy

a536f1a

Type fix

ca55f74

That works

b7b79d5

lint

448a78d

DmitriGekhtman force-pushed the k8s-dashboard-turn-off-disk branch from ac91c90 to 448a78d Compare April 5, 2021 20:00

simon-mo merged commit 410f768 into ray-project:master Apr 6, 2021

DmitriGekhtman mentioned this pull request May 3, 2022

[Dashboard][K8s] Add toggle to enable showing node disk usage on K8s #24416

Merged

6 tasks

DmitriGekhtman mentioned this pull request May 3, 2022

[Release branch PR][Dashboard][K8s] Add toggle to enable showing node disk usage on K8s … #24440

Merged

6 tasks

kevin85421 mentioned this pull request Jun 9, 2023

[Ray Observability] Disk usage in Dashboard ray-project/kuberay#1152

Merged

4 tasks
