
[Ray] Rerun subtask for ray backend #2288

Merged: 8 commits, merged Aug 9, 2021


@keyile (Contributor) commented Aug 4, 2021

What do these changes do?

This PR adds basic support for rerunning subtasks on the Ray backend and follows up on #2198. Details:

  • Fix a hang when recovering a Ray sub pool.
  • Raise ServerClosed when a sub pool dies, matching the behavior of the original backend.
  • Fix the wait_actor_pool_recovered logic.
  • Fix coverage collection for some Ray subprocesses.

Note: This PR switches Ray backend communication back to a one-way channel, which may have side effects. If there is a better way to support ServerClosed, feel free to change this.

@keyile force-pushed the rerun_subtask_ray branch 2 times, most recently from 18df6ef to 2695c6f on August 5, 2021 02:07
@keyile changed the title from "Basic rerun subtask for ray" to "[Ray] basic rerun subtask for ray backend" on Aug 5, 2021

@qinxuye (Collaborator) commented Aug 5, 2021

There are still some Codacy issues: https://app.codacy.com/gh/mars-project/mars/pullRequest?prid=7861162
Coverage is also a bit low; please try to increase it.

@qinxuye added this to the v0.8.0a2 milestone on Aug 5, 2021

@keyile (Contributor, Author) commented Aug 5, 2021

@qinxuye Thanks for helping. I'm working on improving it.

mars/services/scheduling/worker/execution.py (outdated, resolved)
mars/oscar/backends/ray/pool.py (outdated, resolved)
mars/oscar/backends/ray/pool.py (resolved)
mars/oscar/backends/ray/pool.py (outdated, resolved)
mars/oscar/backends/ray/communication.py (resolved)
mars/oscar/backends/pool.py (outdated, resolved)
@keyile marked this pull request as ready for review on August 6, 2021 07:31

@qinxuye (Collaborator) commented Aug 6, 2021

I will review this ASAP. We plan to release v0.8.0a1 and v0.7.0 soon, and I hope this can be merged shortly after that.

@qinxuye (Collaborator) left a comment

LGTM

@wjsi (Member) left a comment

LGTM

@wjsi changed the title from "[Ray] basic rerun subtask for ray backend" to "[Ray] Rerun subtask for ray backend" on Aug 9, 2021
@wjsi merged commit 83d176a into mars-project:master on Aug 9, 2021
@keyile deleted the rerun_subtask_ray branch on August 9, 2021 06:16
wjsi pushed a commit to wjsi/mars that referenced this pull request on Aug 9, 2021

@qinxuye (Collaborator) commented Aug 9, 2021

Thanks for your contribution! We look forward to seeing more contributions from you.

chaokunyang pushed a commit to chaokunyang/mars that referenced this pull request Aug 16, 2021
qinxuye pushed a commit to qinxuye/mars that referenced this pull request Aug 18, 2021
qinxuye pushed a commit to qinxuye/mars that referenced this pull request Aug 18, 2021