[Core] Add a utility to check GCS / Ray cluster health #23382
Just curious, but how would this be different from Line 60 in f91a134
They look very similar. The only difference is that |
I don't have a strong opinion on which to use, but I think we should unify on one (both seem to be for health checking the cluster). cc @wuisawesome, do you see any downsides to just using this util?
Please please add this functionality to ray healthcheck (which uses CheckAliveRequest). We shouldn't let this type of functionality deviate between components.
How about using |
I thought about the use case, and only verifying that the Ray versions match should be sufficient.
OK, can we change the API name to make it clear that this is not a health check or a way to ping the cluster? Some form of cluster metadata that we can expose through the dashboard API seems reasonable.
@wuisawesome which API are you referring to, |
I will leave it up to @wuisawesome. Also cc @jon-chuang
If you're trying to add a cluster metadata api and not a health check, I'm saying you should name it Btw it's also still not clear to me why you don't just add this in the health check. The main reason to separate them would be if we have some big metadata/expensive query that we didn't want to run in a health check (which we don't). |
Actually I'm using the |
The plan is to add a public health check API in a subsequent PR.
Co-authored-by: Alex Wu <itswu.alex@gmail.com>
Why are these changes needed?
ray.init(). This utility is integrated with ray memory to provide a better error message when the Ray cluster is unavailable. There seems to be user demand for exposing this as an API as well.
Related issue number
Checks
I've run scripts/format.sh to lint the changes in this PR.
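
For readers following the thread, here is a minimal sketch of the kind of check under discussion: confirm the cluster can be reached and, optionally, that its Ray version matches the local one. This is not the code from this PR. The names `HealthCheckResult`, `check_cluster_health`, `fetch_cluster_version`, and `skip_version_check` are hypothetical, and the callable stands in for whatever RPC the real utility sends to GCS.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class HealthCheckResult:
    """Outcome of a single (hypothetical) cluster health check."""
    healthy: bool
    reason: str


def check_cluster_health(
    fetch_cluster_version: Callable[[], str],
    local_version: str,
    skip_version_check: bool = False,
) -> HealthCheckResult:
    """Report whether the cluster is reachable and version-compatible.

    `fetch_cluster_version` is an assumed stand-in for the real request to
    GCS: it returns the cluster's Ray version or raises if GCS is down.
    """
    try:
        cluster_version = fetch_cluster_version()
    except Exception as exc:
        # Any failure to reach the cluster is reported, not raised, so that
        # callers can print a friendly error message instead of a traceback.
        return HealthCheckResult(False, f"cluster unreachable: {exc}")

    if skip_version_check:
        return HealthCheckResult(True, "cluster reachable")

    if cluster_version != local_version:
        return HealthCheckResult(
            False,
            f"version mismatch: cluster={cluster_version}, local={local_version}",
        )
    return HealthCheckResult(True, "cluster reachable and versions match")
```

A command-line tool could call such a function before doing any real work and print `reason` when `healthy` is false, which is the "better error message" behavior the description mentions.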