[Core] [JobSubmissionClient] Ray Head to create Worker nodes without specifying any additional parameters #42436

Open
sercanCyberVision opened this issue Jan 16, 2024 · 2 comments
Labels
enhancement Request for new feature and/or capability job P2 Important issue, but not time-critical

Comments

@sercanCyberVision

Description

We have been using Ray-Client (port 10001) to run jobs and are now exploring a transition to Ray-Submit (port 8265). Our cluster follows one of the best practices outlined here: skip the head node when scheduling workloads. To achieve this, we set num-cpus: 0 in the rayStartParams of headGroupSpec. This matters to us because it keeps the head node stable, and we want only worker nodes to take part in job execution.
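For readers unfamiliar with the setting, here is a minimal sketch of the relevant part of the head group configuration. It is written as a Python dict purely for illustration (the actual manifest would normally be YAML), and the helper `head_advertises_no_cpus` is a hypothetical name, not part of Ray or KubeRay.

```python
# Illustrative fragment only: the real RayCluster manifest would normally be YAML.
head_group_spec = {
    "rayStartParams": {
        # Advertise zero CPUs on the head so that nothing requiring CPU
        # resources is scheduled there. The value is a string, as in the manifest.
        "num-cpus": "0",
    },
}


def head_advertises_no_cpus(spec: dict) -> bool:
    """Hypothetical helper: True if the head group is started with num-cpus set to 0."""
    value = spec.get("rayStartParams", {}).get("num-cpus")
    return value is not None and int(value) == 0
```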

In the current configuration, when we submit jobs with Ray-Client (10001), the behavior aligns with our expectations. The head node creates worker nodes, as evident in the dashboard, where all workloads are handled by worker nodes:

[Image: skipping-head-node]

However, when we use the same configuration with Ray-Submit (8265), we observe different behavior. The head node does not create worker nodes and attempts to handle the jobs itself, unless we pass the entrypoint_num_cpus parameter to JobSubmissionClient's submit_job, as below:

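from ray.job_submission import JobSubmissionClient

# ray_address points at the job submission endpoint on the head node (port 8265)
client = JobSubmissionClient(ray_address)
job_id = client.submit_job(
    entrypoint="python wrk_cpu.py",
    runtime_env={
        "working_dir": "./",
        # "excludes": ['']
    },
    # Without this, the head node tries to handle the job itself
    entrypoint_num_cpus=3,
)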

However, given the number of end users, relying on each of them to pass this parameter is not ideal. We kindly request an enhancement to the basic JobSubmissionClient method in Ray-Submit so that it skips the head node and schedules jobs on worker nodes without requiring an explicit resource specification, as Ray-Client already does.
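Until something like this exists, one possible stopgap is a thin wrapper that fills in entrypoint_num_cpus when the caller omits it. This is only a sketch: `submit_job_off_head` and `DEFAULT_ENTRYPOINT_NUM_CPUS` are hypothetical names, the default of 1 is an assumption, and the wrapper relies only on the `submit_job` keyword arguments shown above.

```python
from typing import Any

# Assumed default; any positive value cannot be satisfied by a head node
# started with num-cpus: 0, so the entrypoint is placed on a worker instead.
DEFAULT_ENTRYPOINT_NUM_CPUS = 1


def submit_job_off_head(client: Any, entrypoint: str, **submit_kwargs: Any) -> str:
    """Submit a job, requesting CPUs for the entrypoint unless the caller already did."""
    # Respect an explicit entrypoint_num_cpus; otherwise apply the shared default.
    submit_kwargs.setdefault("entrypoint_num_cpus", DEFAULT_ENTRYPOINT_NUM_CPUS)
    return client.submit_job(entrypoint=entrypoint, **submit_kwargs)
```

End users would then call `submit_job_off_head(client, "python wrk_cpu.py", runtime_env={"working_dir": "./"})` with a `JobSubmissionClient` instance as `client`. This still leaves the default in user-side code rather than in Ray itself, which is why we are asking for built-in support.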

Thank you for your time and consideration.

Use case

No response

@sercanCyberVision sercanCyberVision added enhancement Request for new feature and/or capability triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Jan 16, 2024
@sercanCyberVision
Author

@architkulkarni, thank you and FYI.

@architkulkarni architkulkarni added P2 Important issue, but not time-critical job and removed triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Jan 17, 2024
@sip-aravind-g

sip-aravind-g commented Jan 24, 2024

@architkulkarni
Is there a tentative timeline for when this will be implemented and available?
