[core] The New GcsClient binding #46186
base: master
Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
@@ -661,6 +662,8 @@ def test_get_applications_while_gcs_down(
 ):
     # Test serve REST API availability when the GCS is down.
     monkeypatch.setenv("RAY_SERVE_KV_TIMEOUT_S", "3")
+    importlib.reload(ray.serve._private.constants)  # to reload the constants set above
Why do we need to change this?
By default, Ray Serve uses an infinite timeout for internal KV put/get. To test behavior when the GCS is down, this test sets the timeout to 3s. However, the setting never took effect because the env var was not loaded.
The previous PythonGcsClient would return a GrpcUnavailable error when the GCS was down, even with an infinite timeout. The new GcsClient correctly retries indefinitely and hangs. For the env var to take effect, we need to reload the constants module.
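To illustrate why the reload is needed, here is a minimal, self-contained sketch. It assumes the constants module reads the env var once at import time; `demo_constants` and `DEMO_KV_TIMEOUT_S` are hypothetical stand-ins, not Ray Serve's actual module or variable.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical stand-in for a constants module that reads an env var
# once, at import time (assumed to mirror how the real constant is set).
MODULE_SRC = (
    "import os\n"
    'KV_TIMEOUT_S = float(os.environ.get("DEMO_KV_TIMEOUT_S", "inf"))\n'
)

tmp_dir = tempfile.mkdtemp()
with open(os.path.join(tmp_dir, "demo_constants.py"), "w") as f:
    f.write(MODULE_SRC)
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()

# First import happens with the env var unset, so the default is captured.
os.environ.pop("DEMO_KV_TIMEOUT_S", None)
import demo_constants

timeout_at_import = demo_constants.KV_TIMEOUT_S

# Setting the env var afterwards does not change the already-evaluated constant.
os.environ["DEMO_KV_TIMEOUT_S"] = "3"
timeout_before_reload = demo_constants.KV_TIMEOUT_S

# Reloading re-executes the module body, which re-reads the env var.
importlib.reload(demo_constants)
timeout_after_reload = demo_constants.KV_TIMEOUT_S
```

Any name that was imported with `from demo_constants import KV_TIMEOUT_S` before the reload would still hold the old value, so a reload only helps code that looks the constant up through the module.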
python/ray/includes/common.pxi
 cdef class GcsClientOptions:
     """Cython wrapper class of C++ `ray::gcs::GcsClientOptions`."""
     cdef:
         unique_ptr[CGcsClientOptions] inner

     @classmethod
-    def from_gcs_address(cls, gcs_address):
+    def from_gcs_address(cls, gcs_address, cluster_id_hex=None):
This method name is no longer accurate.
Also, could moving cluster_id into GcsClientOptions be its own PR?
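For readers following the thread, the idea of carrying the cluster ID on the options object can be pictured with a small pure-Python sketch. `ClientOptionsSketch` and its `create` factory are hypothetical illustrations, not Ray's Cython `GcsClientOptions`, and the hex validation is an assumption added for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ClientOptionsSketch:
    """Hypothetical options holder: address plus an optional cluster ID."""

    gcs_address: str
    cluster_id_hex: Optional[str] = None

    @classmethod
    def create(cls, gcs_address, cluster_id_hex=None):
        # Assumption for this sketch: reject malformed hex early.
        if cluster_id_hex is not None:
            bytes.fromhex(cluster_id_hex)  # raises ValueError on bad input
        return cls(gcs_address, cluster_id_hex)


# The cluster ID travels with the options instead of being a separate argument.
opts_without_id = ClientOptionsSketch.create("127.0.0.1:6379")
opts_with_id = ClientOptionsSketch.create("127.0.0.1:6379", "ab12")
```

A factory named after only one of its inputs stops describing what it does once it accepts a second one, which is the naming concern raised in the comment above.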
@@ -104,7 +104,7 @@ def ping(self):

     gcs_client = GcsClient(address=ray.get_runtime_context().gcs_address)

-    with pytest.raises(ray.exceptions.RpcError):
+    with pytest.raises(ray.exceptions.RaySystemError):
Are there any breaking changes? Ideally this PR shouldn't touch Serve code or any tests.
This is a fix forward. drain_node returns an error from the GCS side, which should not be considered an RpcError (that indicates a network issue). The Ray Serve changes fix bad test fixtures.
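The distinction drawn here, between a failure to reach the server and an error the server itself returns, can be sketched as follows. The exception classes and the `Reply` type are hypothetical stand-ins written for this example; they are not Ray's exception classes, and this is not how the binding maps statuses internally.

```python
from dataclasses import dataclass


class TransportError(Exception):
    """Stand-in for an RPC-level failure, e.g. the server is unreachable."""


class ServerRejectedError(Exception):
    """Stand-in for an error that the server itself returned."""


@dataclass
class Reply:
    reached_server: bool
    ok: bool
    message: str = ""


def check_reply(reply):
    # A request that never reached the server is a network problem.
    if not reply.reached_server:
        raise TransportError(reply.message)
    # A request that arrived but was refused is an application-level error.
    if not reply.ok:
        raise ServerRejectedError(reply.message)
    return True


def classify(reply):
    """Return a label describing which kind of failure, if any, occurred."""
    try:
        check_reply(reply)
    except TransportError:
        return "transport"
    except ServerRejectedError:
        return "rejected"
    return "ok"
```

Keeping the two cases in separate exception types lets callers retry on transport failures without also retrying requests the server has deliberately refused.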
Creates a direct Cython binding for `ray::gcs::GcsClient` and replaces the existing `PythonGcsClient` binding. The new binding is enabled by default; one can switch back with `RAY_USE_OLD_GCS_CLIENT=1`.

The new binding is in its own file `gcs_client.pxi`, included by `_raylet.pyx`.

Changes:
- `cluster_id` from arg to a `GcsClientOptions` field.
- `timeout_ms` arg for `NodeInfoAccessor::AsyncGetAll` and `JobInfoAccessor::AsyncGetAll`.
- `JobInfoAccessor::GetAll` and `NodeInfoAccessor::DrainNodes` and `NodeResourceInfoAccessor::GetAllResourceUsage`.
- `NodeInfoAccessor::GetAllNoCache`.
- `python/ray/tune/tests/test_tune_restore.py::ResourceExhaustedTest::test_resource_exhausted_info`
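As a rough illustration of the `RAY_USE_OLD_GCS_CLIENT=1` opt-out mentioned in the description, the flag could be set from Python before Ray starts. That the variable must be set before Ray is imported or started is an assumption, and `old_gcs_client_requested` is a hypothetical helper for this sketch, not a Ray function.

```python
import os

# Assumption: the flag needs to be in the environment before Ray is
# imported/started so that the process sees it when choosing a binding.
os.environ["RAY_USE_OLD_GCS_CLIENT"] = "1"


def old_gcs_client_requested(environ=os.environ):
    """Hypothetical helper: True if the opt-out flag is set to "1"."""
    return environ.get("RAY_USE_OLD_GCS_CLIENT") == "1"


requested = old_gcs_client_requested()
not_requested = old_gcs_client_requested({})
```

The same effect is usually achieved from a shell by prefixing the command with the variable assignment, which avoids ordering concerns inside the Python process.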