[autoscaler] Better validation for min_workers and max_workers #13779

Merged
merged 118 commits into from
Jan 29, 2021

Conversation

AmeerHajAli
Contributor

closes #13775 reported by Ed.
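For readers unfamiliar with the check named in the title, here is a minimal sketch of what validating `min_workers` against `max_workers` could look like. It is not the code merged in this PR: `validate_worker_counts` is a hypothetical helper, the config is a plain dict shaped like a cluster YAML (`max_workers` at the top level, `available_node_types` mapping names to per-type bounds), and treating "sum of `min_workers` equal to `max_workers`" as valid is an assumption.

```python
def validate_worker_counts(config: dict) -> None:
    """Hypothetical check: raise ValueError if worker bounds are inconsistent."""
    cluster_max = config["max_workers"]
    total_min = 0
    for name, node_type in config.get("available_node_types", {}).items():
        type_min = node_type.get("min_workers", 0)
        # Assumption: a node type without its own max inherits the cluster max.
        type_max = node_type.get("max_workers", cluster_max)
        if type_min > type_max:
            raise ValueError(
                f"{name}: min_workers ({type_min}) exceeds max_workers ({type_max})")
        total_min += type_min
    if total_min > cluster_max:
        raise ValueError(
            f"sum of min_workers ({total_min}) exceeds the cluster-wide "
            f"max_workers ({cluster_max})")


good = {
    "max_workers": 4,
    "available_node_types": {
        "cpu": {"min_workers": 1, "max_workers": 2},
        "gpu": {"min_workers": 2, "max_workers": 2},
    },
}
validate_worker_counts(good)  # passes silently

bad = {
    "max_workers": 2,
    "available_node_types": {
        "cpu": {"min_workers": 2, "max_workers": 2},
        "gpu": {"min_workers": 1, "max_workers": 2},
    },
}
try:
    validate_worker_counts(bad)
except ValueError as err:
    print("rejected:", err)
```

Failing at config-load time rather than during scaling keeps the error next to the YAML that caused it.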

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@AmeerHajAli
Contributor Author

@ericl, can you please merge?

@AmeerHajAli AmeerHajAli added the tests-ok The tagger certifies test failures are unrelated and assumes personal liability. label Jan 29, 2021
@ericl ericl merged commit 4d6817c into ray-project:master Jan 29, 2021
simon-mo added a commit that referenced this pull request Jan 29, 2021
@simon-mo
Contributor

Reverting this PR because it consistently fails on Windows. Please re-revert once you fix the Windows build.


================================== FAILURES ===================================
________ AutoscalingConfigTest.testValidateDefaultConfigMinMaxWorkers _________

self = <com_github_ray_project_ray.python.ray.tests.test_autoscaler_yaml.AutoscalingConfigTest testMethod=testValidateDefaultConfigMinMaxWorkers>

    def testValidateDefaultConfigMinMaxWorkers(self):
        aws_config_path = os.path.join(
            RAY_PATH, "autoscaler/aws/example-multi-node-type.yaml")
>       with open(aws_config_path) as f:
E       OSError: [Errno 22] Invalid argument: '\\\\?\\C:\\Users\\RUNNER~1\\AppData\\Local\\Temp\\Bazel.runfiles_6y4y9g7o\\runfiles\\com_github_ray_project_ray\\python\\ray\\autoscaler/aws/example-multi-node-type.yaml'

\\?\C:\Users\RUNNER~1\AppData\Local\Temp\Bazel.runfiles_6y4y9g7o\runfiles\com_github_ray_project_ray\python\ray\tests\test_autoscaler_yaml.py:51: OSError
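The path in the traceback mixes backslashes with a forward-slash suffix (`...\python\ray\autoscaler/aws/...`) under a `\\?\` prefix. Windows does not translate forward slashes for `\\?\`-prefixed paths, so this is a plausible cause of the `OSError`, though the thread does not confirm it. The sketch below uses `ntpath` (the Windows flavour of `os.path`, importable on any OS) to show how the join produces the mixed path and how passing components separately avoids it. The `RAY_PATH` value is a hypothetical stand-in, not the real runfiles directory.

```python
import ntpath  # Windows implementation of os.path; works on any platform

# Hypothetical stand-in for the RAY_PATH directory seen in the traceback.
RAY_PATH = "C:\\runfiles\\python\\ray"

# A "/"-separated suffix is appended verbatim, so the result mixes separators.
mixed = ntpath.join(RAY_PATH, "autoscaler/aws/example-multi-node-type.yaml")

# Passing each component separately lets join insert the native separator.
native = ntpath.join(
    RAY_PATH, "autoscaler", "aws", "example-multi-node-type.yaml")

print(mixed)
print(native)
```

In the test itself the same idea would be written with `os.path.join(RAY_PATH, "autoscaler", "aws", "example-multi-node-type.yaml")`, which picks the right separator on each platform.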

simon-mo added a commit that referenced this pull request Jan 29, 2021
fishbone pushed a commit to fishbone/ray that referenced this pull request Feb 16, 2021
…roject#13779)

* prepare for head node

* move command runner interface outside _private

* remove space

* Eric

* flake

* min_workers in multi node type

* fixing edge cases

* eric not idle

* fix target_workers to consider min_workers of node types

* idle timeout

* minor

* minor fix

* test

* lint

* eric v2

* eric 3

* min_workers constraint before bin packing

* Update resource_demand_scheduler.py

* Revert "Update resource_demand_scheduler.py"

This reverts commit 818a63a.

* reducing diff

* make get_nodes_to_launch return a dict

* merge

* weird merge fix

* auto fill instance types for AWS

* Alex/Eric

* Update doc/source/cluster/autoscaling.rst

* merge autofill and input from user

* logger.exception

* make the yaml use the default autofill

* docs Eric

* remove test_autoscaler_yaml from windows tests

* lets try changing the test a bit

* return test

* lets see

* edward

* Limit max launch concurrency

* commenting frac TODO

* move to resource demand scheduler

* use STATUS UP TO DATE

* Eric

* make logger of gc freed refs debug instead of info

* add cluster name to docker mount prefix directory

* grrR

* fix tests

* moving docker directory to sdk

* move the import to prevent circular dependency

* smallf fix

* ian

* fix max launch concurrency bug to assume failing nodes as pending and consider only load_metric's connected nodes as running

* small fix

* deflake test_joblib

* lint

* placement groups bypass

* remove space

* Eric

* first ocmmit

* lint

* exmaple

* documentation

* hmm

* file path fix

* fix test

* some format issue in docs

* modified docs

* joblib strikes again on windows

* add ability to not start autoscaler/monitor

* a

* remove worker_default

* Remove default pod type from operator

* Remove worker_default_node_type from rewrite_legacy_yaml_to_availble_node_types

* deprecate useless fields

* fix error msg

* validate sum min_workers < max_workers

* 1 more edge case test

* lint

Co-authored-by: Ameer Haj Ali <ameerhajali@ameers-mbp.lan>
Co-authored-by: Alex Wu <alex@anyscale.io>
Co-authored-by: Alex Wu <itswu.alex@gmail.com>
Co-authored-by: Eric Liang <ekhliang@gmail.com>
Co-authored-by: Ameer Haj Ali <ameerhajali@Ameers-MacBook-Pro.local>
Co-authored-by: root <root@ip-172-31-56-188.us-west-2.compute.internal>
Co-authored-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>
fishbone pushed a commit to fishbone/ray that referenced this pull request Feb 16, 2021
fishbone added a commit to fishbone/ray that referenced this pull request Feb 16, 2021
fishbone added a commit to fishbone/ray that referenced this pull request Feb 16, 2021