[core] Refactor how PythonGcsClient treats errors #45817
Open
rynewang wants to merge 17 commits into ray-project:master from rynewang:py-gcs-client-error-refactor
+147
−184
Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
jjyao reviewed Jun 17, 2024
python/ray/_raylet.pyx
Outdated
@@ -3288,7 +3298,7 @@ def check_health(address: str, timeout=2, skip_version_check=False):
 check_status(PythonCheckGcsHealth(
 c_gcs_address, c_gcs_port, timeout_ms, c_ray_version,
 c_skip_version_check, c_is_healthy))
-except RpcError:
+except (RpcError, GetTimeoutError):
GetTimeoutError is for ray.get()?
I think RpcError(rpc_code=DEADLINE_EXCEEDED) is the one we want to raise.
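To make the reviewer's point concrete, here is a minimal sketch of a health check that treats a timeout as an RpcError carrying DEADLINE_EXCEEDED, so a single except clause covers it. The RpcError class below is a hypothetical stand-in written for this example (it is not imported from Ray), and check_health_sketch, timed_out_probe and healthy_probe are illustrative names, not functions from the PR.

```python
DEADLINE_EXCEEDED = 4  # numeric value of the gRPC DEADLINE_EXCEEDED status code


class RpcError(Exception):
    """Hypothetical stand-in for Ray's RpcError: a message plus an optional gRPC code."""

    def __init__(self, message, rpc_code=None):
        super().__init__(message)
        self.rpc_code = rpc_code


def check_health_sketch(probe):
    """Run probe(); report unhealthy (False) on any RpcError, including timeouts."""
    try:
        probe()
    except RpcError:
        return False
    return True


def timed_out_probe():
    # Assumption for this sketch: a timed-out RPC surfaces as RpcError with
    # DEADLINE_EXCEEDED rather than as a separate timeout exception type.
    raise RpcError("health check timed out", rpc_code=DEADLINE_EXCEEDED)


def healthy_probe():
    # A probe that completes without raising counts as healthy.
    return None
```

With this shape, the caller never needs to catch GetTimeoutError; it can still inspect rpc_code on the caught RpcError if it wants to tell a timeout apart from other RPC failures.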
…-refactor Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
2f34bac to 6141b7b
The C++ GcsClient and PythonGcsClient (and the Python bindings in _raylet.pyx) are not very consistent. This PR aims to bring the behavior of the latter closer to the former.
Behavior Changes
| | | |
|---|---|---|
| ray::Status::TimedOut() | raise RpcError(rpc_code=DEADLINE_EXCEEDED) | raise RpcError(rpc_code=DEADLINE_EXCEEDED) |
| Disconnected() | raise RpcError(rpc_code=the code) | raise RpcError(rpc_code=the code) |
| status | raise RpcError(rpc_code=the code) | status |
| reply.status | raise InvalidError, or special treatment (see below) | reply.status, mostly RaySystemError |
| OK | OK | OK |
Payload Error Special Treatments
PythonGcsClient mostly just returns an RpcError for everything without raising a more specific error type. With this PR we do some error type mapping and return any non-OK status as returned from the server side.
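The error type mapping described here could look roughly like the sketch below. Everything in it is an assumption made for illustration: the exception classes are stand-ins defined locally, the status code names are plain strings, and the single entry in STATUS_TO_ERROR is a made-up example rather than the mapping the PR actually implements.

```python
class RaySystemErrorSketch(Exception):
    """Hypothetical stand-in for the generic error raised for most non-OK statuses."""


class NotFoundSketch(RaySystemErrorSketch):
    """Hypothetical example of a more specific error type."""


# Hypothetical mapping from a reply status code name to an exception type.
STATUS_TO_ERROR = {"NotFound": NotFoundSketch}


def raise_for_reply_status(code, message=""):
    """Return None for "OK"; otherwise raise the mapped error type.

    Codes without a specific mapping fall back to the generic error.
    """
    if code == "OK":
        return None
    error_type = STATUS_TO_ERROR.get(code, RaySystemErrorSketch)
    raise error_type(f"{code}: {message}")
```

A lookup table with a generic fallback keeps the special cases in one place, so a status code added on the server side later would still surface as an error instead of being silently ignored.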
Implementation changes:
- HandleGcsStatuses, which behaves similarly to the one in the C++ GcsRpcClient. The only difference is that we don't do any retry or return Disconnected.
- ray::Status::GrpcUnknown can carry a grpc code. To expose that rpc_code to Python, allow its ctor to accept a code.

All these should be OK, because GcsClient is not a public API and we can change its internal APIs arbitrarily, as long as all Ray call sites are handled.
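As a rough illustration of a status object that carries a gRPC code through to the Python exception, consider the sketch below. StatusSketch, its grpc_unknown constructor and raise_for_rpc_status are hypothetical names invented for this example, and RpcError is a local stand-in; the real types live in Ray's C++ and Cython code and may differ.

```python
from dataclasses import dataclass
from typing import Optional

UNAVAILABLE = 14  # numeric value of the gRPC UNAVAILABLE status code


class RpcError(Exception):
    """Hypothetical stand-in for Ray's RpcError: a message plus an optional gRPC code."""

    def __init__(self, message, rpc_code=None):
        super().__init__(message)
        self.rpc_code = rpc_code


@dataclass(frozen=True)
class StatusSketch:
    """Hypothetical status value; rpc_code is set only for RPC-level failures."""

    kind: str
    message: str = ""
    rpc_code: Optional[int] = None

    @classmethod
    def ok(cls):
        return cls("OK")

    @classmethod
    def grpc_unknown(cls, message, rpc_code=None):
        # The constructor accepts the code so it can be exposed to Python later.
        return cls("GrpcUnknown", message, rpc_code)


def raise_for_rpc_status(status):
    """Return None for an OK status; otherwise raise RpcError with the carried code."""
    if status.kind == "OK":
        return None
    raise RpcError(status.message, rpc_code=status.rpc_code)
```

Note that this sketch raises immediately and does no retrying, which mirrors the stated difference from the C++ client.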