[ray] Default enable fault tolerance for ray #2801

Merged

Conversation

fyrestone
Contributor

What do these changes do?

Worker fault tolerance is well tested on Ray, so it can be enabled by default.
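
The commit message in this PR lists the new defaults as `subtask_max_retries: 3` and `subtask_max_reschedules: 2`. As a rough illustration of how two such limits could interact, here is a minimal sketch. It assumes that a retry re-runs the subtask on the same worker and that a reschedule moves it to another worker after the worker is lost; the PR does not state these semantics. `FaultToleranceConfig`, `WorkerLost`, and `run_with_fault_tolerance` are hypothetical names for illustration, not part of the Mars or Ray API.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class FaultToleranceConfig:
    # Values come from the commit message; the class itself is hypothetical.
    subtask_max_retries: int = 3
    subtask_max_reschedules: int = 2


class WorkerLost(Exception):
    """Hypothetical error: the worker running the subtask went away."""


def _run_with_retries(subtask: Callable[[str], object], worker: str, max_retries: int):
    # Re-run on the same worker until success or the retry budget is spent.
    retries = 0
    while True:
        try:
            return subtask(worker)
        except WorkerLost:
            # Retrying on a lost worker is pointless; let the caller reschedule.
            raise
        except Exception:
            if retries >= max_retries:
                raise
            retries += 1


def run_with_fault_tolerance(
    subtask: Callable[[str], object],
    workers: List[str],
    config: Optional[FaultToleranceConfig] = None,
):
    config = config or FaultToleranceConfig()
    reschedules = 0
    while True:
        # Assumption: pick the next worker round-robin on each reschedule.
        worker = workers[reschedules % len(workers)]
        try:
            return _run_with_retries(subtask, worker, config.subtask_max_retries)
        except WorkerLost:
            if reschedules >= config.subtask_max_reschedules:
                raise
            reschedules += 1
```

Under these assumptions, a subtask that keeps failing with an ordinary exception is attempted at most four times on one worker (the first run plus three retries), while a lost worker consumes one of the two reschedules rather than a retry.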

Related issue number

Fixes #xxxx

Check code requirements

  • tests added / passed (if needed)
  • Ensure all linting tests pass, see here for how to run them

留宝 added 2 commits March 9, 2022 11:46
Merge branch default_enable_fo of git@gitlab.alipay-inc.com:ray-project/mars.git into master
https://code.alipay.com/ray-project/mars/pull_requests/154

Signed-off-by: 天苍 <yiming.yym@antgroup.com>

* Enable fo by default, subtask_max_retries: 3, subtask_max_reschedules: 2
Merge branch fix_default_enable_fo of git@gitlab.alipay-inc.com:ray-project/mars.git into master
https://code.alipay.com/ray-project/mars/pull_requests/164

Signed-off-by: 天苍 <yiming.yym@antgroup.com>


* Disable fo for test
@fyrestone fyrestone self-assigned this Mar 9, 2022
@fyrestone fyrestone marked this pull request as ready for review March 16, 2022 07:09
Member

@wjsi wjsi left a comment

LGTM

@qinxuye qinxuye added this to PR-In progress in v0.9 Release via automation Mar 16, 2022
@qinxuye qinxuye added this to the v0.9.0rc1 milestone Mar 16, 2022
Collaborator

@qinxuye qinxuye left a comment

LGTM

@qinxuye qinxuye merged commit 1464b0f into mars-project:master Mar 16, 2022
v0.9 Release automation moved this from PR-In progress to PR-Done Mar 16, 2022
chaokunyang pushed a commit to chaokunyang/mars that referenced this pull request May 31, 2022
Merge branch cp_2748_2755_2801 of git@gitlab.alipay-inc.com:ray-project/mars.git into master
https://code.alipay.com/ray-project/mars/pull_requests/267

Signed-off-by: 不涸 <zhongchun.yzc@antgroup.com>


* Fix long exception of asyncio.gather (mars-project#2748)

* Fix profiling band_subtasks and most_calls are empty if the slow duration is large (mars-project#2755)

* [Ray] Default enable fault tolerance for ray (mars-project#2801)
Projects
Mars on Ray
Awaiting triage

3 participants