Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Client] Disconnect on dataclient error #16588

Merged
merged 7 commits into from
Jun 22, 2021

Conversation

ckw017
Copy link
Member

@ckw017 ckw017 commented Jun 21, 2021

Why are these changes needed?

When the data client's grpc channel disconnects (either for connection issues, or for a bad request) client is left in a state where it can't perform any remote functionality, but is still technically connected (client_ray.is_connected() is True).Since is_connected() still returns True, this causes errors if the user tries to start a new connection. This PR disconnects the client when the data client is found to be shut down by the main thread so that the user can cleanly try to reconnect.

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@ckw017 ckw017 force-pushed the disconnect_on_dataclient_error branch 2 times, most recently from 3f57979 to 502d0ac Compare June 21, 2021 19:33
@ckw017 ckw017 requested a review from ijrsvt June 21, 2021 19:44
@ckw017 ckw017 force-pushed the disconnect_on_dataclient_error branch from 8817342 to 6f56625 Compare June 21, 2021 22:05
python/ray/util/client/dataclient.py Outdated Show resolved Hide resolved
Copy link
Contributor

@ijrsvt ijrsvt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@AmeerHajAli
Copy link
Contributor

@DmitriGekhtman , if this is not too much work, it would be nice to include this PR as well in 1.4.1

@ckw017 ckw017 force-pushed the disconnect_on_dataclient_error branch from b3bbec6 to c26ab22 Compare June 22, 2021 03:41
@ckw017
Copy link
Member Author

ckw017 commented Jun 22, 2021

Rebased with master to fix failing java test, reviewers can merge at their discretion once everything is passing.

@DmitriGekhtman
Copy link
Contributor

@DmitriGekhtman , if this is not too much work, it would be nice to include this PR as well in 1.4.1

sure, np -- any additional release testing required?

@AmeerHajAli
Copy link
Contributor

AmeerHajAli commented Jun 22, 2021

@DmitriGekhtman , if this is not too much work, it would be nice to include this PR as well in 1.4.1

sure, np -- any additional release testing required?

No. Thanks!

@DmitriGekhtman
Copy link
Contributor

DmitriGekhtman commented Jun 22, 2021

test_array, test_async are both known flakey on windows -- travis build failure is widespread
Looks ready to merge from test perspective.

@AmeerHajAli AmeerHajAli merged commit b4f2cbc into ray-project:master Jun 22, 2021
DmitriGekhtman pushed a commit that referenced this pull request Jun 22, 2021
* disconnect when main thread finds dataclient shut down, update error messages

* Add test_dataclient_disconnect to small tests

* drop unused var

* add __main__ section to test

* avoid direct ray import

* rerun
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants