[Core] [JobSubmissionClient] Ray Head to create Worker nodes without specifying any additional parameters #42436

sercanCyberVision · 2024-01-16T23:11:12Z

Description

We have been using Ray-Client (port 10001) for job execution and are now exploring a transition to Ray-Submit (port 8265). Our cluster follows one of the best practices outlined here: skipping the head node when scheduling workloads. To achieve this, we set num-cpus: 0 in the rayStartParams of headGroupSpec. This matters to us because it keeps the head node stable, and we want only worker nodes to take part in job execution.

In the current configuration, when we submit jobs with Ray-Client (10001), the behavior matches our expectations. The head node creates worker nodes, and the dashboard shows all workloads being handled by worker nodes.

However, with the same configuration and Ray-Submit (8265), we observe different behavior. The head node does not create worker nodes and attempts to handle the jobs itself, unless we pass the entrypoint_num_cpus parameter to JobSubmissionClient.submit_job as below:

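from ray.job_submission import JobSubmissionClient

# Assumed dashboard address for illustration; replace with your head node's address.
ray_address = "http://127.0.0.1:8265"

client = JobSubmissionClient(ray_address)
job_id = client.submit_job(
    entrypoint="python wrk_cpu.py",
    runtime_env={
        "working_dir": "./",
        # "excludes": ['']
    },
    # Requesting CPUs for the entrypoint keeps it off the 0-CPU head node.
    entrypoint_num_cpus=3,
)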

However, given the number of end users, relying on each of them to pass this parameter is not ideal. We kindly request an enhancement to the basic JobSubmissionClient submission path in Ray-Submit so that it skips the head node and schedules jobs on worker nodes without an explicit resource specification, as Ray-Client already does.
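Until such a default exists, one possible stopgap is a thin shared helper that fills in entrypoint_num_cpus so individual users do not have to. The sketch below is only an illustration: submit_on_workers and DEFAULT_ENTRYPOINT_NUM_CPUS are hypothetical names, the default of 1 CPU is an assumed value, and the helper relies only on the submit_job keyword arguments already shown above. It accepts any client object exposing submit_job, so it can be exercised without a running cluster.

```python
from typing import Any, Optional

# Hypothetical team-wide default; any value above 0 should keep the
# entrypoint off a head node that was started with num-cpus: 0.
DEFAULT_ENTRYPOINT_NUM_CPUS = 1


def submit_on_workers(
    client: Any,
    entrypoint: str,
    runtime_env: Optional[dict] = None,
    entrypoint_num_cpus: float = DEFAULT_ENTRYPOINT_NUM_CPUS,
    **submit_kwargs: Any,
) -> str:
    """Submit a job, always requesting CPUs for the entrypoint.

    `client` is expected to behave like ray.job_submission.JobSubmissionClient,
    i.e. to expose submit_job(entrypoint=..., runtime_env=..., entrypoint_num_cpus=...).
    """
    if entrypoint_num_cpus <= 0:
        # A zero-CPU entrypoint could be scheduled on the 0-CPU head node,
        # which is exactly what this helper is meant to avoid.
        raise ValueError("entrypoint_num_cpus must be greater than 0")
    return client.submit_job(
        entrypoint=entrypoint,
        runtime_env=runtime_env,
        entrypoint_num_cpus=entrypoint_num_cpus,
        **submit_kwargs,
    )
```

With a real cluster, this would be called as submit_on_workers(JobSubmissionClient(ray_address), "python wrk_cpu.py", runtime_env={"working_dir": "./"}). It does not change Ray's behavior; it only centralizes the parameter that each user would otherwise have to remember.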

Thank you for your time and consideration.

Use case

No response


sercanCyberVision · 2024-01-16T23:23:40Z

@architkulkarni, thank you and FYI.

sip-aravind-g · 2024-01-24T19:05:33Z

@architkulkarni
Is there a tentative timeline for when this will be implemented and released?

sercanCyberVision added the enhancement (Request for new feature and/or capability) and triage (Needs triage, e.g. priority, bug/not-bug, and owning component) labels Jan 16, 2024

sercanCyberVision mentioned this issue Jan 16, 2024

[Core] [JobSubmissionClient] Ray Head does create Worker nodes when submitting a job with JobSubmissionClient #42432

Closed

architkulkarni added the P2 (Important issue, but not time-critical) and job labels and removed the triage label Jan 17, 2024
