Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[core] Refactor how PythonGcsClient treats errors #45817

Open
wants to merge 17 commits into
base: master
Choose a base branch
from

Conversation

rynewang
Copy link
Contributor

@rynewang rynewang commented Jun 8, 2024

C++ GcsClient and PythonGcsClient (and python bindings in raylet_.pyx) are not very consistent. This PR aims to change the behavior of the latter to close to the former.

Behavior Changes

Situation C++ Behavior Python GcsClient Behavior New Python GcsClient Behavior (this PR)
Timeout ray::Status::TimedOut() raise RpcError(rpc_code=DEADLINE_EXCEEDED) raise RpcError(rpc_code=DEADLINE_EXCEEDED)
gRPC Issue (Unknown or Unavailable) Retry the call or invoke Disconnected() raise RpcError(rpc_code=the code) raise RpcError(rpc_code=the code)
Other Status Return status raise RpcError(rpc_code=the code) raise the error mapped from status
Payload Error Return reply.status raise InvalidError, or special treatment (see below) raise the error mapped from reply.status, mostly RaySystemError
OK Return OK OK OK

Payload Error Special Treatments

Method reply.status Python GcsClient Behavior New Python GcsClient Behavior (this PR)
InternalKV(Multi)Get KeyError (key invalid) return None return None
InternalKV(Multi)Get NotFound (key valid but not found) return None raise RaySystemError

PythonGcsClient mostly just return an RpcError for everything without raising a more specific error type. With this PR we will do some error type mapping and return any non OK status as returned from the server side.

Implementation changes:

  • Removed many manual error treating code and their quirks in PythonGcsClient.
  • New error treatment function HandleGcsStatuses, which behaves similar to that in the C++ GcsRpcClient. The only difference is that we don't do any retry or return Disconnected.
  • ray::Status::GrpcUnknown can carry a grpc code. To expose that rpc_code to Python, allow its ctor to accept a code.
  • Fixed all tests.

All these should be OK, because GcsClient is not a public API and we can change its internal APIs arbitrarily, as long as all Ray call sites are handled.

Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
@rynewang rynewang requested a review from a team as a code owner June 8, 2024 02:25
Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
@rynewang rynewang added the go add ONLY when ready to merge, run all tests label Jun 14, 2024
@rynewang rynewang assigned ruisearch42 and unassigned jjyao Jun 14, 2024
Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
@@ -3288,7 +3298,7 @@ def check_health(address: str, timeout=2, skip_version_check=False):
check_status(PythonCheckGcsHealth(
c_gcs_address, c_gcs_port, timeout_ms, c_ray_version,
c_skip_version_check, c_is_healthy))
except RpcError:
except (RpcError, GetTimeoutError):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetTimeoutError is for ray.get()?

I think RpcError(rpc_code=DEADLINE_EXCEEDED) is the one we want to raise

Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
…-refactor

Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
go add ONLY when ready to merge, run all tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants