We have been using Ray-Client (port 10001) for job execution and are now exploring a transition to Ray-Submit (port 8265). Our cluster follows one of the best practices outlined here: skip the head node when scheduling workloads. To achieve this, we set `num-cpus: 0` in the `rayStartParams` of `headGroupSpec`. This matters to us because it keeps the head node stable, and we want only worker nodes to take part in job execution.
With the current configuration, submitting jobs through Ray-Client (10001) behaves as we expect. The head node creates worker nodes and, as the dashboard shows, all workloads are handled by worker nodes:
However, with the same configuration and Ray-Submit (8265), we observe different behavior. The head node does not create worker nodes and tries to handle the jobs itself, unless we pass the `entrypoint_num_cpus` parameter to `JobSubmissionClient` as below:
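A minimal sketch of the kind of call that works for us. The dashboard address, the entrypoint command, the default of one CPU, and the helper names `build_submit_kwargs` and `submit_to_workers` are placeholders for illustration, not our actual setup:

```python
from typing import Any, Dict, Union


def build_submit_kwargs(
    entrypoint: str, entrypoint_num_cpus: Union[int, float] = 1
) -> Dict[str, Any]:
    """Hypothetical helper: build the keyword arguments for submit_job.

    Requesting a non-zero CPU count for the entrypoint means the driver
    cannot be placed on a head node that was started with num-cpus: 0.
    The default of 1 CPU is an assumption chosen for illustration.
    """
    return {
        "entrypoint": entrypoint,
        "entrypoint_num_cpus": entrypoint_num_cpus,
    }


def submit_to_workers(address: str, entrypoint: str) -> str:
    """Hypothetical wrapper: submit a job with an explicit CPU request."""
    # Imported lazily so build_submit_kwargs can be used without Ray installed.
    from ray.job_submission import JobSubmissionClient

    client = JobSubmissionClient(address)
    # submit_job returns the submission ID of the new job as a string.
    return client.submit_job(**build_submit_kwargs(entrypoint))
```

For example, `submit_to_workers("http://127.0.0.1:8265", "python my_script.py")` would submit the script with `entrypoint_num_cpus=1`, where both the address and the script name are made up.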
Given the number of end users we have, relying on each of them to pass this parameter is not ideal. We kindly request an enhancement to the default behavior of `JobSubmissionClient` in Ray-Submit so that it skips the head node and schedules jobs on worker nodes without an explicit resource specification, as Ray-Client already does.
Thank you for your time and consideration.
Use case
No response