Remove ray-cluster.without-block.yaml #675

Merged

kevin85421
Member

Why are these changes needed?

We do not encourage users to run ray start without --block, for two reasons:

  1. Without --block, ray start returns immediately, so we need to append sleep infinity to the end of the ray start command to keep the container running.
  2. With --block, when the Ray process crashes, the container exits immediately, so the KubeRay operator can detect the unhealthy condition quickly. Without --block, the readiness and liveness probes can still detect the unhealthy condition, but detection may take longer.
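
To make the two container commands concrete, here is a minimal Python sketch. The helper container_command is hypothetical and is not part of KubeRay or Ray; it only formats the command line a container would run in each mode, assuming the only difference is the --block flag versus a trailing sleep infinity.

```python
import shlex


def container_command(ray_start_args, block=True):
    """Format the shell command for a Ray container.

    Hypothetical helper for illustration only; not KubeRay code.
    """
    parts = ["ray", "start", *ray_start_args]
    if block:
        # With --block, ray start stays in the foreground, so the
        # container exits as soon as the Ray processes exit.
        parts.append("--block")
        return shlex.join(parts)
    # Without --block, ray start returns right away, so something else
    # (here: sleep infinity) must keep the container alive.
    return shlex.join(parts) + " && sleep infinity"


print(container_command(["--head"]))
print(container_command(["--head"], block=False))
```

In the second form the container's lifetime is tied to sleep infinity rather than to Ray, which is why a crash is only noticed once a probe fails.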

Note for those who are still interested in ray-cluster.without-block.yaml

The configuration test framework (#605) detected two bugs in ray-cluster.without-block.yaml. See the changes to ray-cluster.without-block.yaml in kevin85421@04bdd77 for the fixes.

Related issue number

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

@kevin85421
Member Author

> 1. Without --block, we need to append sleep infinity to the end of the ray start command to keep the container running.
> 2. With --block, when the ray process crashes, the KubeRay operator can detect the unhealthy condition in a short time because the container will exit immediately. Without --block, the unhealthy condition can still be detected by both readiness and liveness probes, but it may take more time to detect it.

@DmitriGekhtman Could you double-check whether I have misunderstood the reasons why we do not encourage users to run ray start without --block? Thank you!

@DmitriGekhtman
Collaborator

Your understanding is correct.
We prefer to separate deploying a Ray cluster (kubectl apply -f raycluster.yaml) from submitting work (e.g. ray job submit stuff.py).

For users who need custom entrypoints, we had another discussion. The conclusion was that custom entrypoints should be supported in the obvious way: if an entrypoint is specified, honor it; otherwise, format the relevant ray start command.
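
The entrypoint rule can be sketched in a few lines of Python. The function resolve_command and its parameters are hypothetical names chosen for illustration; the actual operator logic may differ.

```python
def resolve_command(custom_entrypoint, ray_start_args):
    """Pick the command a Ray container should run.

    Hypothetical sketch of the rule discussed above, not operator code.
    """
    if custom_entrypoint:
        # An entrypoint was specified by the user: honor it unchanged.
        return list(custom_entrypoint)
    # No entrypoint given: fall back to a formatted ray start command.
    return ["ray", "start", *ray_start_args]


print(resolve_command(["python", "serve.py"], ["--head"]))
print(resolve_command(None, ["--head"]))
```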

@kevin85421
Member Author

> Your understanding is correct. We prefer to separate the process of deploying a Ray cluster (kubectl apply -f raycluster.yaml) and submitting work (e.g. ray job submit stuff.py)
>
> For users who need custom entry-points, we had another discussion. The conclusion was that custom entrypoints should be supported in the obvious way (if an entrypoint is specified, honor it, otherwise format the relevant ray start command).

Got it. Thank you!

DmitriGekhtman merged commit 7850773 into ray-project:master on Nov 3, 2022
@DmitriGekhtman
Collaborator

Later, we can consider injecting the --block flag automatically.
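
One possible shape for such an injection, as a rough sketch only: the helper ensure_block below is hypothetical and assumes the ray start arguments are available as a list of strings.

```python
def ensure_block(ray_start_args):
    """Return a copy of the ray start arguments that includes --block.

    Hypothetical helper; shows one way the flag could be injected.
    """
    args = list(ray_start_args)  # copy so the caller's list is not mutated
    if "--block" not in args:
        args.append("--block")
    return args


print(ensure_block(["--head"]))
print(ensure_block(["--head", "--block"]))
```

The check keeps the operation idempotent, so configurations that already pass --block are left unchanged.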
