New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[runtime env] Raise error when creating runtime env when ray[default] is not installed #19491
[runtime env] Raise error when creating runtime env when ray[default] is not installed #19491
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
raylet part looks good. Please ping when you've added ray client part.
Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
if not check_dashboard_dependencies_installed(): | ||
raise RuntimeError("Not all required Ray dependencies for the " | ||
"runtime_env feature were found on the " | ||
"cluster. To install the required " | ||
"dependencies, please run `pip install " | ||
"'ray[default]'` on all cluster nodes.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason you decided to go with the dependency check instead of passing a flag to the ray client server? This should be fine for now, my only concern is it isn't as future-proof. For example, @scv119 is working on adding a private flag to disable the agent explicitly even if ray[default] is installed, which this wouldn't work for.
I'm fine to merge this as-is for now but we should follow up to avoid confusing behavior when the flag is used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Totally agree, this approach just seemed simpler and less error-prone for a cherry pick. I have a branch with the other approach but it requires plumbing in a lot of places and there were some edge cases with default/missing args I wasn't 100% sure about. I'll make a followup PR.
Why are these changes needed?
Prior to this, Ray would hang when creating a runtime environment without the requisite
ray[default]
installed. This PR adds an error message in this case.There's two ways this can happen: the CreateRuntimeEnv RPC call on the RuntimeEnvAgent can either be made
In the Raylet, we check if the runtime env agent has been started, and fail the CreateRuntimeEnv call if it hasn't been started (and fail the corresponding task).
In the Ray Client server, we check if
ray[default]
is installed on the cluster and fail the request if it isn't.Related issue number
Closes #17845
Checks
scripts/format.sh
to lint the changes in this PR.