Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[runtime env] Raise error when creating runtime env when ray[default] is not installed #19491

Merged
merged 7 commits into from Oct 19, 2021

Conversation

architkulkarni
Copy link
Contributor

@architkulkarni architkulkarni commented Oct 18, 2021

Why are these changes needed?

Prior to this, Ray would hang when creating a runtime environment without the requisite ray[default] installed. This PR adds an error message in this case.

There's two ways this can happen: the CreateRuntimeEnv RPC call on the RuntimeEnvAgent can either be made

  • from the Raylet (in the case of a per-task/actor runtime env), or
  • from the Ray Client server (in the case where we're specifying a runtime env in the Ray Client connection step.).

In the Raylet, we check if the runtime env agent has been started, and fail the CreateRuntimeEnv call if it hasn't been started (and fail the corresponding task).

In the Ray Client server, we check if ray[default] is installed on the cluster and fail the request if it isn't.

Related issue number

Closes #17845

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Copy link
Contributor

@edoakes edoakes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

raylet part looks good. Please ping when you've added ray client part.

src/ray/raylet/agent_manager.cc Outdated Show resolved Hide resolved
src/ray/raylet/agent_manager.cc Outdated Show resolved Hide resolved
src/ray/raylet/worker_pool.h Show resolved Hide resolved
@simon-mo simon-mo mentioned this pull request Oct 19, 2021
5 tasks
Comment on lines +211 to +216
if not check_dashboard_dependencies_installed():
raise RuntimeError("Not all required Ray dependencies for the "
"runtime_env feature were found on the "
"cluster. To install the required "
"dependencies, please run `pip install "
"'ray[default]'` on all cluster nodes.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason you decided to go with the dependency check instead of passing a flag to the ray client server? This should be fine for now, my only concern is it isn't as future-proof. For example, @scv119 is working on adding a private flag to disable the agent explicitly even if ray[default] is installed, which this wouldn't work for.

I'm fine to merge this as-is for now but we should follow up to avoid confusing behavior when the flag is used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally agree, this approach just seemed simpler and less error-prone for a cherry pick. I have a branch with the other approach but it requires plumbing in a lot of places and there were some edge cases with default/missing args I wasn't 100% sure about. I'll make a followup PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[runtime env] Runtime env hangs silently if "ray[default]" is not installed
2 participants