[core][autoscaler] Add Pod names to the output of `ray status -v` by kevin85421 · Pull Request #51192 · ray-project/ray

kevin85421 · 2025-03-09T07:55:25Z

Why are these changes needed?

1. Currently, the output of `ray status -v` only includes node types (i.e., group names in KubeRay) and Ray node IDs, which makes it hard to map a Ray node ID to the name of the corresponding Ray Pod (i.e., the instance ID in the autoscaler). This PR adds the Pod name to the output.

2. Refactor.

Related issue number

Checks

I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
I've run scripts/format.sh to lint the changes in this PR.
I've included any doc changes needed for https://docs.ray.io/en/master/.
- I've added any new APIs to the API Reference. For example, if I added a
  method in Tune, I've added it in doc/source/tune/api/ under the
  corresponding .rst file.
I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
Testing Strategy
- Unit tests
- Release tests
- This PR is not tested :(

Signed-off-by: kaihsun <kaihsun@anyscale.com>

kevin85421 · 2025-03-09T08:03:24Z

-        usage_by_node = {}
-        node_type_mapping = {}
-        idle_time_map = {}
+    def _node_usage_report(


cc @ryanaoleary I refactored this function a bit:

- Avoid passing the whole ClusterStatus to the function, to make it more unit-testable.
- It's not necessary to pass verbose into this function.
- Rename dictionaries to ....to.....
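The first two points can be illustrated with a minimal sketch. This is not the PR's code: `NodeUsage`, `node_usage_report`, every field name, and the exact line layout are hypothetical. The idea is only that the report function receives the per-node records it needs instead of a whole cluster-status object and a verbosity flag, so a unit test can call it with a few hand-built values.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class NodeUsage:
    """Hypothetical per-node record for this sketch; not a Ray type."""

    node_id: str
    instance_id: str  # e.g. the Pod name in KubeRay
    node_type: str
    # Resource name -> (used, total).
    usage: Dict[str, Tuple[float, float]] = field(default_factory=dict)


def node_usage_report(nodes: List[NodeUsage]) -> List[str]:
    """Build report lines from plain records only.

    No cluster-wide status object and no verbose flag are passed in;
    the caller decides whether to print the result at all.
    """
    lines: List[str] = []
    for node in nodes:
        lines.append(f"Node: {node.instance_id} ({node.node_type})")
        lines.append(f" Id: {node.node_id}")
        lines.append(" Usage:")
        # Sort resources so the output is deterministic in tests.
        for resource, (used, total) in sorted(node.usage.items()):
            lines.append(f"  {used}/{total} {resource}")
    return lines
```

A test then needs nothing more than `node_usage_report([NodeUsage("abc", "instance1", "head_node", {"CPU": (0.5, 1.0)})])` and a comparison against the expected lines.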

kevin85421 · 2025-03-09T08:06:50Z

cc @ryanaoleary @rueian for review

edoakes · 2025-03-10T12:59:35Z

 {'GPU': 2, 'CPU': 100}: 2+ from request_resources()

-Node: fffffffffffffffffffffffffffffffffffffffffffffffffff00001 (head_node)
+Node: instance1 (head_node)


My understanding is that instance1 here is the name of the node that the autoscaler provides (in the case of KubeRay, it'd be the Pod name).

Am I correct?

It should be the "instance id". In KubeRay, the "instance id" is the Pod name. You can see the screenshot I added in the PR description for more details.

> Am I correct?

For KubeRay, you are correct.
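The change to the header line in the diff above can be sketched as follows. `node_header` is a hypothetical helper, not a function from Ray, and falling back to the Ray node ID when no instance ID is known is an assumption of this sketch rather than behavior confirmed in this PR.

```python
from typing import Optional


def node_header(
    ray_node_id: str, node_type: str, instance_id: Optional[str] = None
) -> str:
    """Format the per-node header line.

    Prefers the instance ID (the Pod name in KubeRay) and, as an
    assumption of this sketch, falls back to the Ray node ID.
    """
    label = instance_id if instance_id else ray_node_id
    return f"Node: {label} ({node_type})"


# "abc123" is a placeholder, not a real Ray node ID.
before = node_header("abc123", "head_node")
after = node_header("abc123", "head_node", instance_id="instance1")
```

With the instance ID in the header, a KubeRay user can match the line directly against the Pod names that `kubectl get pods` lists.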

(outside of the scope of this PR and a bit esoteric)

I think the terminology I used in the top-level comment is more general. That is, it's an abstraction leak to call it "instance ID" within the autoscaler because it is not an "instance" in all cases (e.g., it's a pod in Kubernetes). So using a more generic term such as "node name" would be preferable.

Take the concrete example of the line being changed here.

With the terminology you used, it's: Node: <instance_id (but sometimes pod name)>

With my suggestion, it's: Node: <node name>

> it's an abstraction leak to call it "instance ID" within the autoscaler because it is not an "instance" in all cases

What's your definition of 'instance' here? Are you referring to a VM? In Autoscaler, an 'instance' is defined as the Ray node runner created by node providers.

https://docs.google.com/document/d/1NzQjA8Mh-oMc-QxXOa529oneWCoA8sDiVoNkBqqDb4U/edit?tab=t.0

Are you suggesting that node providers should also be able to set an "instance name", which may differ from the "instance id", and that `ray status -v` should show the "instance name" instead of the "instance id"?

I'm basically just suggesting that choosing the name "instance" in this chart was misguided :)

Thanks for posting that link; it is actually quite helpful.

…y-project#51192)

1. Currently, the output of `ray status -v` only includes information on node types (i.e., group names in KubeRay) and Ray node IDs. However, it is not easy to map a Ray node ID to the name of the corresponding Ray Pod (i.e. instance id in Autoscaler).

   <img width="496" alt="Screenshot 2025-03-08 at 11 50 43 PM" src="https://github.com/user-attachments/assets/89c66096-88c2-47fb-80d6-08067c7b9d90" />

2. Refactor

---------

Signed-off-by: kaihsun <kaihsun@anyscale.com>

…y-project#51192)

1. Currently, the output of `ray status -v` only includes information on node types (i.e., group names in KubeRay) and Ray node IDs. However, it is not easy to map a Ray node ID to the name of the corresponding Ray Pod (i.e. instance id in Autoscaler).

   <img width="496" alt="Screenshot 2025-03-08 at 11 50 43 PM" src="https://github.com/user-attachments/assets/89c66096-88c2-47fb-80d6-08067c7b9d90" />

2. Refactor

---------

Signed-off-by: kaihsun <kaihsun@anyscale.com>
Signed-off-by: Dhakshin Suriakannu <d_suriakannu@apple.com>

update

aecf561

Signed-off-by: kaihsun <kaihsun@anyscale.com>

kevin85421 changed the title ~~[core][autoscaler v2] Add instance_id (Pod name) to the output of ray status -v~~ [core][autoscaler v2] Add Pod names to the output of ray status -v Mar 9, 2025

kevin85421 changed the title ~~[core][autoscaler v2] Add Pod names to the output of ray status -v~~ [core][autoscaler] Add Pod names to the output of ray status -v Mar 9, 2025

update

e6af3b5

Signed-off-by: kaihsun <kaihsun@anyscale.com>

kevin85421 added the `go` label (add ONLY when ready to merge, run all tests) Mar 9, 2025

kevin85421 marked this pull request as ready for review March 9, 2025 17:42

kevin85421 requested a review from a team as a code owner March 9, 2025 17:42

rueian approved these changes Mar 9, 2025

kevin85421 assigned jjyao Mar 9, 2025

edoakes reviewed Mar 10, 2025

update

34f111f

Signed-off-by: kaihsun <kaihsun@anyscale.com>

edoakes approved these changes Mar 10, 2025

edoakes enabled auto-merge (squash) March 10, 2025 16:29

edoakes merged commit 902b55a into ray-project:master Mar 10, 2025

kevin85421 mentioned this pull request Mar 18, 2025 in [Umbrella] Autoscaler V2 ray-project/kuberay#2600 (Open, 51 tasks)

hainesmichaelc added the community-backlog label May 22, 2025

edoakes merged 3 commits into ray-project:master from kevin85421:20250308-devbox1-tmux-6-ray2
