JainTwinkle
Contributor

What does this PR do?
This is the first step towards improving resiliency and performance in Ray without modifying its source code. This PR adds a new tool that makes it convenient to configure a Ray cluster. The tool fetches and parses Ray configurations and generates resiliency profiles (e.g., strict, relaxed, recommended). We are currently deciding the configuration options for each resiliency profile manually, by evaluating them on various Ray workloads. We'll update this PR accordingly.
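
To illustrate the profile idea, here is a minimal Python sketch of deriving named profiles from a set of parsed defaults. It is not the tool's actual code: the option names, default values, per-profile overrides, and the helpers `build_profile` and `to_json` are all hypothetical placeholders, since the real options for each profile are still being evaluated.

```python
import json
from typing import Any, Dict

# Hypothetical defaults, standing in for options the tool would fetch and parse.
# These names and values are placeholders, not real Ray settings.
DEFAULT_CONFIG: Dict[str, Any] = {
    "heartbeat_timeout_ms": 30000,
    "task_max_retries": 3,
    "object_spilling_enabled": True,
}

# Hypothetical per-profile overrides applied on top of the defaults.
PROFILE_OVERRIDES: Dict[str, Dict[str, Any]] = {
    "strict": {"heartbeat_timeout_ms": 10000, "task_max_retries": 5},
    "relaxed": {"heartbeat_timeout_ms": 60000, "task_max_retries": 1},
    "recommended": {},  # assumed here to match the defaults
}


def build_profile(name: str) -> Dict[str, Any]:
    """Return a copy of the defaults with the named profile's overrides applied."""
    if name not in PROFILE_OVERRIDES:
        raise ValueError(f"unknown resiliency profile: {name!r}")
    profile = dict(DEFAULT_CONFIG)  # copy so the defaults are never mutated
    profile.update(PROFILE_OVERRIDES[name])
    return profile


def to_json(profile: Dict[str, Any]) -> str:
    """Serialize a profile to JSON with stable key order."""
    return json.dumps(profile, sort_keys=True)
```

A profile built this way could then be written to a file or passed on to whatever mechanism starts the cluster; how the real tool hands the result to Ray is not shown here.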

Description of Changes
The changes in this PR are currently independent of the main CodeFlare code. We intend to put this tool in a new folder called utils in the CodeFlare root directory.

@JainTwinkle
Contributor Author

@chcost could we assign someone to this PR review?

Thanks!

Contributor

@raghukiran1224 left a comment

@JainTwinkle Can you comment on the unmarked TODOs in the README?

@raghukiran1224
Contributor

@JainTwinkle please let me know re above comment
