[Ray Observability] Disk usage in Dashboard #1152

Merged · 1 commit merged into ray-project:master on Jun 13, 2023

Conversation

@kevin85421 (Member) commented Jun 9, 2023

Why are these changes needed?

Related slack thread: https://ray-distributed.slack.com/archives/C01DLHZHRBJ/p1686170239143299

  • Without this PR, the values in the Disk(root) column are 0.0000B/1.0000B.
    (Screenshot: Screen Shot 2023-06-08 at 1 30 04 PM)

  • With this PR, the Disk(root) column shows the correct values.
    (Screenshot: Screen Shot 2023-06-08 at 5 43 37 PM)

Related issue number

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

@kevin85421 changed the title from "WIP" to "[Ray Observability] DIsk usage in Dashboard" Jun 9, 2023
@kevin85421 changed the title from "[Ray Observability] DIsk usage in Dashboard" to "[Ray Observability] Disk usage in Dashboard" Jun 9, 2023
@architkulkarni (Contributor) left a comment

Looks good to me. Curious why this is opt-in instead of enabled by default (question for the Ray side I guess)

@kevin85421 (Member, Author)

@architkulkarni

I found ray-project/ray#24416 and ray-project/ray#14676.

For example, when running two Ray pods on one physical K8s node, the limit shown for each pod is the disk limit of the K8s node, and that node's limit gets counted twice in the "totals" section.

CC: @rkooo567 @DmitriGekhtman, could you please provide some insight on whether enabling disk usage for Kubernetes Pods by default would be advisable or not? Thanks!
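
To make the double-counting concrete, here is a minimal sketch. It is not Ray's actual reporter code; it only assumes the per-pod reading is equivalent to calling psutil.disk_usage on the container's root filesystem:

```python
# Minimal illustration (assumption: the per-pod reading behaves like
# psutil.disk_usage("/")). Inside a container without a dedicated volume,
# "/" is typically backed by the K8s node's filesystem, so the numbers
# describe the whole node rather than the individual pod.
import psutil

def pod_disk_root():
    usage = psutil.disk_usage("/")  # node-level used/total, not per-pod
    return usage.used, usage.total

# Two Ray pods scheduled on the same physical node report identical
# node-level stats; naively summing them double-counts the node in the
# dashboard's "totals" section.
readings = [pod_disk_root(), pod_disk_root()]
used_sum = sum(used for used, _ in readings)
total_sum = sum(total for _, total in readings)
print(f"naive totals: {used_sum}/{total_sum} bytes (node counted twice)")
```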

@kevin85421 (Member, Author)

#1152 (comment)

Also cc @scottsun94

@scottsun94

IIUC, the problem with enabling it by default is: when multiple Ray pods share the same K8s node, all of these Ray pods show the disk usage of that node. People might think the values are independent, which might lead to confusion.

I will still vote for enabling it by default:

  • I believe the best practice is to have each ray pod on a separate k8s node? (then this issue won't exist)
  • not showing it makes it not useful at all
  • we can probably add some tooltips or other UI treatments for disk usage column to explain this?

@rkooo567 commented Jun 9, 2023

Yeah, I also think it is better to enable it, but we should have some tooltip or UI improvement to make sure people understand the problem. What @scottsun94 said is correct (disk usage in K8s is not set per pod, so the information could be misleading).

I believe the best practice is to have each ray pod on a separate k8s node? (then this issue won't exist)

I don't think we have such a best practice, IIRC (or does KubeRay use a DaemonSet to achieve this? @kevin85421).

@kevin85421 (Member, Author)

Thanks @scottsun94 @rkooo567 for the replies.

we can probably add some tooltips or other UI treatments for disk usage column to explain this?

I opened ray-project/ray#36362 to track the progress.

I believe the best practice is to have each ray pod on a separate k8s node?

Yes, this is the best practice, but a lot of users do not follow it, so the issue is still there.

I don't think we have such a best practice, IIRC (or does KubeRay use a DaemonSet to achieve this? @kevin85421).

No, KubeRay does not use a DaemonSet.

@kevin85421 merged commit f652d5d into ray-project:master on Jun 13, 2023
17 checks passed
lowang-bh pushed a commit to lowang-bh/kuberay that referenced this pull request Sep 24, 2023
[Ray Observability] Disk usage in Dashboard