[core][autoscaler] Add Pod names to the output of `ray status -v` #51192
    usage_by_node = {}
    node_type_mapping = {}
    idle_time_map = {}
    def _node_usage_report(
cc @ryanaoleary I refactored this function a bit:
- Avoid passing the whole `ClusterStatus` to the function to make it more unit-testable.
- It's not necessary to pass `verbose` into this function.
- Rename dictionaries to ....to.....
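A minimal sketch of the refactoring idea in this comment: the report is built from plain dictionaries instead of a `ClusterStatus`, and there is no `verbose` parameter, so a test can call it with literal inputs. The function name `node_usage_report`, the dictionary names, the `(used, total)` tuple layout, the output format, and the fallback to the Ray node ID when no instance id is known are all assumptions for illustration; this is not Ray's actual `_node_usage_report`.

```python
from typing import Dict, Tuple

# Hypothetical layout: resource name -> (used, total)
Usage = Dict[str, Tuple[float, float]]


def node_usage_report(
    node_id_to_usage: Dict[str, Usage],
    node_id_to_type: Dict[str, str],
    node_id_to_instance_id: Dict[str, str],
) -> str:
    """Render one block per node from plain dicts (no ClusterStatus, no verbose flag)."""
    lines = []
    for node_id in sorted(node_id_to_usage):
        # Assumption: prefer the provider's instance id (the Pod name in KubeRay)
        # and fall back to the Ray node ID when none is known.
        label = node_id_to_instance_id.get(node_id, node_id)
        node_type = node_id_to_type.get(node_id, "unknown")
        lines.append(f"Node: {label} ({node_type})")
        # One line per resource, sorted by name for deterministic output.
        for resource, (used, total) in sorted(node_id_to_usage[node_id].items()):
            lines.append(f" {used}/{total} {resource}")
    return "\n".join(lines)
```

For example, `node_usage_report({"ffff01": {"CPU": (0.0, 2.0)}}, {"ffff01": "head_node"}, {"ffff01": "instance1"})` returns `"Node: instance1 (head_node)\n 0.0/2.0 CPU"`, which a unit test can compare directly without constructing any autoscaler objects.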
cc @ryanaoleary @rueian for review
    {'GPU': 2, 'CPU': 100}: 2+ from request_resources()

    Node: fffffffffffffffffffffffffffffffffffffffffffffffffff00001 (head_node)
    Node: instance1 (head_node)
My understanding is that instance1 here is the name of the node that the autoscaler provides (in the case of KubeRay, it'd be the Pod name). Am I correct?
It should be "instance id". In KubeRay, the "instance id" is the Pod name. You can see the screenshot I added in the PR description for more details.
> Am I correct?

For KubeRay, you are correct.
(outside of the scope of this PR and a bit esoteric)
I think the terminology I used in the top-level comment is more general. That is, it's an abstraction leak to call it "instance ID" within the autoscaler because it is not an "instance" in all cases (e.g., it's a pod in Kubernetes). So using a more generic term such as "node name" would be preferable.
Take the concrete example of the line being changed here.
With the terminology you used, it's: Node: <instance_id (but sometimes pod name)>
With my suggestion, it's: Node: <node name>
> it's an abstraction leak to call it "instance ID" within the autoscaler because it is not an "instance" in all cases

What's your definition of 'instance' here? Are you referring to a VM? In Autoscaler, an 'instance' is defined as the Ray node runner created by node providers.
https://docs.google.com/document/d/1NzQjA8Mh-oMc-QxXOa529oneWCoA8sDiVoNkBqqDb4U/edit?tab=t.0
Are you suggesting that we also allow node providers to set an "instance name", which may differ from the "instance id", and have `ray status -v` show the "instance name" instead of the "instance id"?
I'm basically just suggesting that choosing the name "instance" in this chart was misguided :)
Thanks for posting that link; it is quite helpful actually.
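The thread raises the idea of letting node providers supply an "instance name" that may differ from the instance id, with `ray status -v` displaying the name. A minimal sketch of that idea, assuming a hypothetical `InstanceInfo` record and `node_header` helper (neither is part of Ray's autoscaler API): the display logic prefers the provider-supplied name and falls back to the instance id.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InstanceInfo:
    # Hypothetical record for illustration; not a Ray autoscaler class.
    instance_id: str                     # provider-assigned id (the Pod name in KubeRay)
    node_type: str                       # e.g. a KubeRay group name
    instance_name: Optional[str] = None  # optional human-facing name set by the provider

    def display_name(self) -> str:
        # Show the provider-supplied name when present, otherwise the instance id.
        return self.instance_name or self.instance_id


def node_header(info: InstanceInfo) -> str:
    # Hypothetical header format, modeled on the "Node: instance1 (head_node)" line in the diff.
    return f"Node: {info.display_name()} ({info.node_type})"
```

With no name set, `node_header(InstanceInfo("instance1", "head_node"))` yields `Node: instance1 (head_node)`; with `instance_name="gpu-worker-1"` the header shows that name instead. In this sketch, keeping the id as the fallback means a provider that never sets a name produces the same header it would have without the extra field.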
…y-project#51192)

1. Currently, the output of `ray status -v` only includes information on node types (i.e., group names in KubeRay) and Ray node IDs. However, it is not easy to map a Ray node ID to the name of the corresponding Ray Pod (i.e., instance id in Autoscaler).
<img width="496" alt="Screenshot 2025-03-08 at 11 50 43 PM" src="https://github.com/user-attachments/assets/89c66096-88c2-47fb-80d6-08067c7b9d90" />
2. Refactor

---------

Signed-off-by: kaihsun <kaihsun@anyscale.com>
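With Pod names in the verbose output, a script could recover which KubeRay group each Pod belongs to by reading the `Node:` header lines. A small sketch, assuming headers of the form shown in the review diff (`Node: <name> (<node_type>)`) and ignoring all other lines; the regular expression, the helper name `parse_node_headers`, and the sample Pod name `raycluster-worker-abcde` are hypothetical, and the exact `ray status -v` layout may differ between Ray versions.

```python
import re
from typing import Dict

# Assumed header shape, e.g. "Node: instance1 (head_node)".
_NODE_HEADER = re.compile(r"^Node: (\S+) \(([^)]+)\)\s*$")


def parse_node_headers(status_text: str) -> Dict[str, str]:
    """Map each node label in `ray status -v`-style text to its node type."""
    result: Dict[str, str] = {}
    for line in status_text.splitlines():
        match = _NODE_HEADER.match(line.strip())
        if match:
            # group(1) is the node label (Pod name or node ID), group(2) the node type.
            result[match.group(1)] = match.group(2)
    return result


sample = (
    "Node: instance1 (head_node)\n"
    " 0.0/2.0 CPU\n"
    "Node: raycluster-worker-abcde (workergroup)\n"
    " 1.0/4.0 CPU"
)
```

Here `parse_node_headers(sample)` returns `{"instance1": "head_node", "raycluster-worker-abcde": "workergroup"}`; resource lines do not match the header pattern and are skipped.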
Why are these changes needed?
The output of `ray status -v` only includes information on node types (i.e., group names in KubeRay) and Ray node IDs. However, it is not easy to map a Ray node ID to the name of the corresponding Ray Pod (i.e., the instance id in Autoscaler).

Related issue number
Checks
- `git commit -s`) in this PR.
- `scripts/format.sh` to lint the changes in this PR.
- method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.