-
Notifications
You must be signed in to change notification settings - Fork 415
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Provisioner] Get rid of ray dependency locally for aws #2625
Conversation
fix sky status refresh update setup update metadata per-instance logs improve ux update comments update logging reduce retries of provisioning to match the original number fix Ray compat issue update the config with the latest changes sync events with the latest changes sync with upstream resolve conflict
* new provisioner with AWS support fix sky status refresh update setup update metadata per-instance logs improve ux update comments update logging reduce retries of provisioning to match the original number fix Ray compat issue update the config with the latest changes sync events with the latest changes sync with upstream resolve conflict * cleanup unused functions * lint * update * fix * fix * Fix network configuration and authentication * Fix creation cluster name * fix logging * format * format * slight improvement for logging * minor fix logging * fix autostop * fix autostop * Fix ray ports used * fix ray start * Fix logging for ray start and skylet * fix ports for ray dashboard * add head / worker in name * More logging information for launching * fix cluster name in progress bar * Fix ssh * format * grammar * Fix message * fix ssh_proxy_commands * fix merge error * Get rid of ray logging * use info * change logger to use contextmanager * format * [major] remove dependency for ray on aws * minor fix * allow warning to be printed * adopt changes * rename * Better spinner * format * update logging * rich_utils in provision_utils * Fix instance log path * Log head node to provision.log * remove unused var * fix ssh method * Fix ray for worker node * Fix ray cluster on worker * fix ray cluster * Fix logging and ip fetching * show cluster name * fix wait after cluster launch * Add logging * Add continue * Change back to sleep before wait * use sleep 1 instead * clean * Avoid check skylet twice * skip docker * remove get_feasible_ips * fix comments * minor fix for logs * minor fix * Fix the ip fetching logic * Add quote for `cluster_name!r` * fix return * fix test * fix comments * fix comments * fix comments * Smoke tests changes to use a private VPC region. * refactor get_ips * minor debug info fix * fix * UX fix * Fix proxy command to be None * Address comments * format * Fix quote * update logging * compatibility for python 3.7 * format * format * change launched to provisioned * clean up ray yaml * format * [Provisioner] Add docker support back (#2507) * Add docker support back * Fix * Works! * format * Add docker back to the supported features * fix docker_cmd * Fix docker cmd * fix DockerLoginConfig * Move docker login config * Fix backward compatibility for docker * Fix docker login * Fix docker login config new line issue * Fix string process in ray yaml * Update sky/provision/docker_utils.py Co-authored-by: Tian Xia <cblmemo@gmail.com> * Update sky/provision/instance_setup.py Co-authored-by: Tian Xia <cblmemo@gmail.com> * Update sky/provision/docker_utils.py Co-authored-by: Tian Xia <cblmemo@gmail.com> * Update sky/provision/instance_setup.py Co-authored-by: Tian Xia <cblmemo@gmail.com> * address comments * format * format * Add comment * Address comments * format --------- Co-authored-by: Tian Xia <cblmemo@gmail.com> * Wording for SSH connection * Fix ray status check for backward compatibility * remnant * stronger backward compatibility tests * remove unused tag * Revert "remove unused tag" This reverts commit a12df8e. * Add default value for `docker_user` * fix botocore config * lint * Fix mypy * remove unused variable * minor fix for logging * expose bootstrap error * format * minor change for `wait_instances` API * lint * use `command_runner.ssh_options_list` * fix ssh command * reword * Dim error for bootstrapping * fix user known file * Update sky/setup_files/setup.py Co-authored-by: Zongheng Yang <zongheng.y@gmail.com> * Add ray back, will fix the dependency issue #2625 * move dependency to local ray * Address renaming comments * renamed to provisioner * refactoring for logging to reduce confusion * format * rename back to meta data for metadata utils * format * move provisioner to `provision/` * Do not propagate to provisioner logger * Minor changes * Fix color for error of provisioner * Remove dimmer * Add back missing handler for `provision_logger` * add comments * Print error message for the failed ssh command * Make the error yellow --------- Co-authored-by: Zhanghao Wu <zhanghao.wu@outlook.com> Co-authored-by: Zongheng Yang <zongheng.y@gmail.com> Co-authored-by: Tian Xia <cblmemo@gmail.com>
3e14582
to
a6359ab
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Michaelvll, a quick question.
Also, previously there was some logging change, is that no longer needed?
sky/clouds/utils/__init__.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the reason of moving these <cloud>_utils.py
to a new module?
We may want to document in README.md in this dir to give future clouds some guidance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 should add some guidance in README or other doc about adding new clouds
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a simple one. We may want to add a more detailed one to the new cloud guide instead?
sky/clouds/utils/__init__.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 should add some guidance in README or other doc about adding new clouds
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks @Michaelvll.
This reverts commit 94b4ce3.
* new provisioner with AWS support fix sky status refresh update setup update metadata per-instance logs improve ux update comments update logging reduce retries of provisioning to match the original number fix Ray compat issue update the config with the latest changes sync events with the latest changes sync with upstream resolve conflict * cleanup unused functions * lint * update * fix * fix * Fix network configuration and authentication * Fix creation cluster name * fix logging * format * format * slight improvement for logging * minor fix logging * fix autostop * fix autostop * Fix ray ports used * fix ray start * Fix logging for ray start and skylet * fix ports for ray dashboard * add head / worker in name * More logging information for launching * fix cluster name in progress bar * Fix ssh * format * grammar * Fix message * fix ssh_proxy_commands * fix merge error * Get rid of ray logging * use info * change logger to use contextmanager * format * [major] remove dependency for ray on aws * minor fix * allow warning to be printed * adopt changes * rename * Better spinner * format * update logging * rich_utils in provision_utils * Fix instance log path * Log head node to provision.log * remove unused var * fix ssh method * Fix ray for worker node * Fix ray cluster on worker * fix ray cluster * Fix logging and ip fetching * show cluster name * fix wait after cluster launch * Add logging * Add continue * Change back to sleep before wait * use sleep 1 instead * clean * Avoid check skylet twice * skip docker * remove get_feasible_ips * fix comments * minor fix for logs * minor fix * Fix the ip fetching logic * Add quote for `cluster_name!r` * fix return * fix test * fix comments * fix comments * fix comments * Smoke tests changes to use a private VPC region. * refactor get_ips * minor debug info fix * fix * UX fix * Fix proxy command to be None * Address comments * format * Fix quote * update logging * compatibility for python 3.7 * format * format * change launched to provisioned * clean up ray yaml * format * [Provisioner] Add docker support back (skypilot-org#2507) * Add docker support back * Fix * Works! * format * Add docker back to the supported features * fix docker_cmd * Fix docker cmd * fix DockerLoginConfig * Move docker login config * Fix backward compatibility for docker * Fix docker login * Fix docker login config new line issue * Fix string process in ray yaml * Update sky/provision/docker_utils.py Co-authored-by: Tian Xia <cblmemo@gmail.com> * Update sky/provision/instance_setup.py Co-authored-by: Tian Xia <cblmemo@gmail.com> * Update sky/provision/docker_utils.py Co-authored-by: Tian Xia <cblmemo@gmail.com> * Update sky/provision/instance_setup.py Co-authored-by: Tian Xia <cblmemo@gmail.com> * address comments * format * format * Add comment * Address comments * format --------- Co-authored-by: Tian Xia <cblmemo@gmail.com> * Wording for SSH connection * Fix ray status check for backward compatibility * remnant * stronger backward compatibility tests * remove unused tag * Revert "remove unused tag" This reverts commit a12df8e. * Add default value for `docker_user` * fix botocore config * lint * Fix mypy * remove unused variable * minor fix for logging * expose bootstrap error * format * minor change for `wait_instances` API * lint * use `command_runner.ssh_options_list` * fix ssh command * reword * Dim error for bootstrapping * fix user known file * Update sky/setup_files/setup.py Co-authored-by: Zongheng Yang <zongheng.y@gmail.com> * Add ray back, will fix the dependency issue skypilot-org#2625 * move dependency to local ray * Address renaming comments * renamed to provisioner * refactoring for logging to reduce confusion * format * rename back to meta data for metadata utils * format * move provisioner to `provision/` * Do not propagate to provisioner logger * Minor changes * Fix color for error of provisioner * Remove dimmer * Add back missing handler for `provision_logger` * add comments * Print error message for the failed ssh command * Make the error yellow --------- Co-authored-by: Zhanghao Wu <zhanghao.wu@outlook.com> Co-authored-by: Zongheng Yang <zongheng.y@gmail.com> Co-authored-by: Tian Xia <cblmemo@gmail.com>
…#2625) * new provisioner with AWS support fix sky status refresh update setup update metadata per-instance logs improve ux update comments update logging reduce retries of provisioning to match the original number fix Ray compat issue update the config with the latest changes sync events with the latest changes sync with upstream resolve conflict * cleanup unused functions * lint * update * fix * fix * Fix network configuration and authentication * Fix creation cluster name * fix logging * format * format * slight improvement for logging * minor fix logging * fix autostop * fix autostop * Fix ray ports used * fix ray start * Fix logging for ray start and skylet * fix ports for ray dashboard * add head / worker in name * More logging information for launching * fix cluster name in progress bar * Fix ssh * format * grammar * Fix message * fix ssh_proxy_commands * fix merge error * Get rid of ray logging * use info * change logger to use contextmanager * format * [major] remove dependency for ray on aws * minor fix * allow warning to be printed * adopt changes * rename * Better spinner * format * update logging * rich_utils in provision_utils * Fix instance log path * Log head node to provision.log * remove unused var * fix ssh method * Fix ray for worker node * Fix ray cluster on worker * fix ray cluster * Fix logging and ip fetching * show cluster name * fix wait after cluster launch * Add logging * Add continue * Change back to sleep before wait * use sleep 1 instead * clean * Avoid check skylet twice * skip docker * remove get_feasible_ips * fix comments * minor fix for logs * minor fix * Fix the ip fetching logic * Add quote for `cluster_name!r` * fix return * fix test * fix comments * fix comments * fix comments * Smoke tests changes to use a private VPC region. * refactor get_ips * minor debug info fix * fix * UX fix * Fix proxy command to be None * Address comments * format * Fix quote * update logging * compatibility for python 3.7 * format * format * change launched to provisioned * clean up ray yaml * format * [Provisioner] Add docker support back (skypilot-org#2507) * Add docker support back * Fix * Works! * format * Add docker back to the supported features * fix docker_cmd * Fix docker cmd * fix DockerLoginConfig * Move docker login config * Fix backward compatibility for docker * Fix docker login * Fix docker login config new line issue * Fix string process in ray yaml * Update sky/provision/docker_utils.py Co-authored-by: Tian Xia <cblmemo@gmail.com> * Update sky/provision/instance_setup.py Co-authored-by: Tian Xia <cblmemo@gmail.com> * Update sky/provision/docker_utils.py Co-authored-by: Tian Xia <cblmemo@gmail.com> * Update sky/provision/instance_setup.py Co-authored-by: Tian Xia <cblmemo@gmail.com> * address comments * format * format * Add comment * Address comments * format --------- Co-authored-by: Tian Xia <cblmemo@gmail.com> * Wording for SSH connection * Fix ray status check for backward compatibility * remnant * stronger backward compatibility tests * remove unused tag * Revert "remove unused tag" This reverts commit a12df8e. * Add default value for `docker_user` * fix botocore config * lint * Fix mypy * remove unused variable * minor fix for logging * expose bootstrap error * format * minor change for `wait_instances` API * lint * use `command_runner.ssh_options_list` * fix ssh command * reword * Dim error for bootstrapping * fix user known file * avoid ray dependency for aws * fix setup.py * format * remove unused file * fix setup.py * __init__ for utils * format * format * Address comments * Elaborate readme --------- Co-authored-by: Siyuan <suquark@gmail.com> Co-authored-by: Zongheng Yang <zongheng.y@gmail.com> Co-authored-by: Tian Xia <cblmemo@gmail.com>
Follow up for #1702 to get rid of ray dependency locally for AWS
Tested (run the relevant ones):
bash format.sh
docker run -it --rm --name skypilot-debug berkeleyskypilot/skypilot-debug /bin/bash
time pip install -e .
: 28ssky launch --cloud aws --cpus 2 -c test-launch-aws
time pip install -e .[gcp]
: 25s ;time conda install -c conda-forge google-cloud-sdk
(1m20s)sky launch --cloud aws --cpus 2 -c test-launch-aws-with-ray
sky launch --cloud gcp --cpus 2 -c test-launch-gcp
pip install -e .[lambda]
;sky launch --cloud lambda --cpus 2+ -c test-lambda
(failed to get capacity)pip install -e .[all]
: 2m30s (originally: 2m30s);sky launch --cloud aws --cpus 2 -c test-launch-aws
pytest tests/test_smoke.py
pytest tests/test_smoke.py::test_fill_in_the_name
bash tests/backward_comaptibility_tests.sh