[hotfix][autoscaler] Request resources refactor2 #12661
Merged
ericl
merged 86 commits into
ray-project:master
from
AmeerHajAli:request_resources_refactor2
Dec 9, 2020
This reverts commit 818a63a.
Which part is the bug here? This looks really big for a hotfix. Can you also describe what "rewriting" means here?
My bad, @wuisawesome, updated the description.
ericl
reviewed
Dec 8, 2020
ericl
approved these changes
Dec 8, 2020
LGTM, please leave TODO comments for followups before merging.
mfitton
pushed a commit
that referenced
this pull request
Dec 10, 2020
* prepare for head node
* move command runner interface outside _private
* remove space
* Eric
* flake
* min_workers in multi node type
* fixing edge cases
* eric not idle
* fix target_workers to consider min_workers of node types
* idle timeout
* minor
* minor fix
* test
* lint
* eric v2
* eric 3
* min_workers constraint before bin packing
* Update resource_demand_scheduler.py
* Revert "Update resource_demand_scheduler.py" This reverts commit 818a63a.
* reducing diff
* make get_nodes_to_launch return a dict
* merge
* weird merge fix
* auto fill instance types for AWS
* Alex/Eric
* Update doc/source/cluster/autoscaling.rst
* merge autofill and input from user
* logger.exception
* make the yaml use the default autofill
* docs Eric
* remove test_autoscaler_yaml from windows tests
* lets try changing the test a bit
* return test
* lets see
* edward
* Limit max launch concurrency
* commenting frac TODO
* move to resource demand scheduler
* use STATUS UP TO DATE
* Eric
* make logger of gc freed refs debug instead of info
* add cluster name to docker mount prefix directory
* grrR
* fix tests
* moving docker directory to sdk
* move the import to prevent circular dependency
* smallf fix
* ian
* fix max launch concurrency bug to assume failing nodes as pending and consider only load_metric's connected nodes as running
* small fix
* request_resources -> min workers
* test fixes
* add race condition tests
* Eric
* fixes
* semi final
* semi final
* lint
* lint

Co-authored-by: Ameer Haj Ali <ameerhajali@ameers-mbp.lan>
Co-authored-by: Alex Wu <alex@anyscale.io>
Co-authored-by: Alex Wu <itswu.alex@gmail.com>
Co-authored-by: Eric Liang <ekhliang@gmail.com>
Co-authored-by: Ameer Haj Ali <ameerhajali@Ameers-MacBook-Pro.local>
Closes #12498, #12005, and #12503.
Refactors request_resources() by "rewriting" it to its min_workers equivalent. request_resources() is handled by adding any additional necessary resources when calculating _add_min_workers_nodes in the scheduler. This makes handling request_resources() similar to maintaining min_workers.
The PR also prioritizes the connected nodes, sorted by last use, when "keeping nodes", so that resources are always immediately available for min_workers and request_resources().
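To illustrate the idea of treating a resource request like a min_workers floor, here is a minimal sketch. It is not the autoscaler's implementation: the function and parameter names are hypothetical, it handles a single node type, it uses simple first-fit packing, and it assumes the request is packed against whatever capacity each existing node is considered to offer.

```python
from typing import Dict, List

ResourceDict = Dict[str, float]


def _fits(bundle: ResourceDict, free: ResourceDict) -> bool:
    """Return True if every resource in `bundle` is available in `free`."""
    return all(free.get(name, 0) >= amount for name, amount in bundle.items())


def _subtract(bundle: ResourceDict, free: ResourceDict) -> None:
    """Deduct `bundle` from `free` in place."""
    for name, amount in bundle.items():
        free[name] = free.get(name, 0) - amount


def nodes_to_add_for_request(
    requested_bundles: List[ResourceDict],
    existing_nodes: List[ResourceDict],
    node_type_resources: ResourceDict,
) -> int:
    """Count the extra nodes (hypothetical helper) needed to fit all bundles.

    Bundles are first placed on existing nodes (first fit). Any bundle that
    does not fit anywhere triggers one more node of the given type, much like
    topping up a node type until its min_workers count is met.
    """
    capacity = [dict(node) for node in existing_nodes]  # copy, leave input intact
    added = 0
    for bundle in requested_bundles:
        for free in capacity:
            if _fits(bundle, free):
                _subtract(bundle, free)
                break
        else:
            # No existing or newly added node can host this bundle.
            if not _fits(bundle, node_type_resources):
                raise ValueError(f"bundle {bundle} does not fit on an empty node")
            fresh = dict(node_type_resources)
            _subtract(bundle, fresh)
            capacity.append(fresh)
            added += 1
    return added
```

For example, with one existing 4-CPU node and three requested 2-CPU bundles, two bundles fit on the existing node and the third requires one additional node.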
The PR also includes the code needed to keep the idle nodes that request_resources() requires.
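The "keeping nodes" step can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the helper name, its parameters, and the choice to order nodes most recently used first are all assumptions made for the example.

```python
from typing import Dict, List, Tuple


def split_keep_and_terminate(
    last_used: Dict[str, float],
    connected: List[str],
    num_to_keep: int,
    now: float,
    idle_timeout_s: float,
) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: decide which connected nodes stay up.

    Nodes are visited most recently used first (an assumption). The first
    `num_to_keep` nodes are kept even if idle, standing in for the nodes
    needed by min_workers and request_resources(). Any remaining node is
    kept only if it has been idle for less than `idle_timeout_s`.
    """
    ordered = sorted(
        connected, key=lambda node_id: last_used.get(node_id, 0.0), reverse=True
    )
    keep: List[str] = []
    terminate: List[str] = []
    for node_id in ordered:
        idle_for = now - last_used.get(node_id, 0.0)
        if len(keep) < num_to_keep or idle_for < idle_timeout_s:
            keep.append(node_id)
        else:
            terminate.append(node_id)
    return keep, terminate
```

Visiting the most recently used nodes first means the nodes retained for the floor are the ones most likely to still be warm, while long-idle nodes are the first candidates for termination.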
More than 200 LOC are tests.
The PR uncovered multiple bugs in the tests and the autoscaler. Addressing them required fixing race conditions involving the command runner in the tests and automatically terminating nodes that fail to initialize or update.
Checks
Run scripts/format.sh to lint the changes in this PR.