Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Provisioner] Get rid of ray dependency locally for aws #2625

Merged
merged 125 commits into from
Oct 2, 2023

Conversation

Michaelvll
Copy link
Collaborator

@Michaelvll Michaelvll commented Sep 28, 2023

Follow up for #1702 to get rid of ray dependency locally for AWS

Tested (run the relevant ones):

  • Code formatting: bash format.sh
  • Any manual or new tests for this PR (please specify below)
    • In a new docker container docker run -it --rm --name skypilot-debug berkeleyskypilot/skypilot-debug /bin/bash
    • time pip install -e .: 28s
    • sky launch --cloud aws --cpus 2 -c test-launch-aws
    • time pip install -e .[gcp]: 25s ; time conda install -c conda-forge google-cloud-sdk (1m20s)
    • sky launch --cloud aws --cpus 2 -c test-launch-aws-with-ray
    • sky launch --cloud gcp --cpus 2 -c test-launch-gcp
    • pip install -e .[lambda]; sky launch --cloud lambda --cpus 2+ -c test-lambda (failed to get capacity)
    • In a new container, pip install -e .[all]: 2m30s (originally: 2m30s); sky launch --cloud aws --cpus 2 -c test-launch-aws
  • All smoke tests: pytest tests/test_smoke.py
  • Relevant individual smoke tests: pytest tests/test_smoke.py::test_fill_in_the_name
  • Backward compatibility tests: bash tests/backward_comaptibility_tests.sh

@Michaelvll Michaelvll added the blocked PR blocked by other issues label Sep 28, 2023
Michaelvll added a commit that referenced this pull request Sep 29, 2023
* new provisioner with AWS support

fix sky status refresh

update setup

update metadata

per-instance logs

improve ux

update comments

update logging

reduce retries of provisioning to match the original number

fix Ray compat issue

update the config with the latest changes

sync events with the latest changes

sync with upstream

resolve conflict

* cleanup unused functions

* lint

* update

* fix

* fix

* Fix network configuration and authentication

* Fix creation cluster name

* fix logging

* format

* format

* slight improvement for logging

* minor fix logging

* fix autostop

* fix autostop

* Fix ray ports used

* fix ray start

* Fix logging for ray start and skylet

* fix ports for ray dashboard

* add head / worker in name

* More logging information for launching

* fix cluster name in progress bar

* Fix ssh

* format

* grammar

* Fix message

* fix ssh_proxy_commands

* fix merge error

* Get rid of ray logging

* use info

* change logger to use contextmanager

* format

* [major] remove dependency for ray on aws

* minor fix

* allow warning to be printed

* adopt changes

* rename

* Better spinner

* format

* update logging

* rich_utils in provision_utils

* Fix instance log path

* Log head node to provision.log

* remove unused var

* fix ssh method

* Fix ray for worker node

* Fix ray cluster on worker

* fix ray cluster

* Fix logging and ip fetching

* show cluster name

* fix wait after cluster launch

* Add logging

* Add continue

* Change back to sleep before wait

* use sleep 1 instead

* clean

* Avoid check skylet twice

* skip docker

* remove get_feasible_ips

* fix comments

* minor fix for logs

* minor fix

* Fix the ip fetching logic

* Add quote for `cluster_name!r`

* fix return

* fix test

* fix comments

* fix comments

* fix comments

* Smoke tests changes to use a private VPC region.

* refactor get_ips

* minor debug info fix

* fix

* UX fix

* Fix proxy command to be None

* Address comments

* format

* Fix quote

* update logging

* compatibility for python 3.7

* format

* format

* change launched to provisioned

* clean up ray yaml

* format

* [Provisioner] Add docker support back (#2507)

* Add docker support back

* Fix

* Works!

* format

* Add docker back to the supported features

* fix docker_cmd

* Fix docker cmd

* fix DockerLoginConfig

* Move docker login config

* Fix backward compatibility for docker

* Fix docker login

* Fix docker login config new line issue

* Fix string process in ray yaml

* Update sky/provision/docker_utils.py

Co-authored-by: Tian Xia <cblmemo@gmail.com>

* Update sky/provision/instance_setup.py

Co-authored-by: Tian Xia <cblmemo@gmail.com>

* Update sky/provision/docker_utils.py

Co-authored-by: Tian Xia <cblmemo@gmail.com>

* Update sky/provision/instance_setup.py

Co-authored-by: Tian Xia <cblmemo@gmail.com>

* address comments

* format

* format

* Add comment

* Address comments

* format

---------

Co-authored-by: Tian Xia <cblmemo@gmail.com>

* Wording for SSH connection

* Fix ray status check for backward compatibility

* remnant

* stronger backward compatibility tests

* remove unused tag

* Revert "remove unused tag"

This reverts commit a12df8e.

* Add default value for `docker_user`

* fix botocore config

* lint

* Fix mypy

* remove unused variable

* minor fix for logging

* expose bootstrap error

* format

* minor change for `wait_instances` API

* lint

* use `command_runner.ssh_options_list`

* fix ssh command

* reword

* Dim error for bootstrapping

* fix user known file

* Update sky/setup_files/setup.py

Co-authored-by: Zongheng Yang <zongheng.y@gmail.com>

* Add ray back, will fix the dependency issue #2625

* move dependency to local ray

* Address renaming comments

* renamed to provisioner

* refactoring for logging to reduce confusion

* format

* rename back to meta data for metadata utils

* format

* move provisioner to `provision/`

* Do not propagate to provisioner logger

* Minor changes

* Fix color for error of provisioner

* Remove dimmer

* Add back missing handler for `provision_logger`

* add comments

* Print error message for the failed ssh command

* Make the error yellow

---------

Co-authored-by: Zhanghao Wu <zhanghao.wu@outlook.com>
Co-authored-by: Zongheng Yang <zongheng.y@gmail.com>
Co-authored-by: Tian Xia <cblmemo@gmail.com>
@Michaelvll Michaelvll removed the blocked PR blocked by other issues label Sep 29, 2023
Copy link
Collaborator

@concretevitamin concretevitamin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Michaelvll, a quick question.

Also, previously there was some logging change, is that no longer needed?

sky/setup_files/setup.py Outdated Show resolved Hide resolved
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason of moving these <cloud>_utils.py to a new module?

We may want to document in README.md in this dir to give future clouds some guidance.

Copy link
Collaborator

@suquark suquark Oct 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 should add some guidance in README or other doc about adding new clouds

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a simple one. We may want to add a more detailed one to the new cloud guide instead?

Copy link
Collaborator

@suquark suquark Oct 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 should add some guidance in README or other doc about adding new clouds

sky/clouds/utils/scp_utils.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@concretevitamin concretevitamin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @Michaelvll.

sky/clouds/utils/README.md Outdated Show resolved Hide resolved
@Michaelvll Michaelvll merged commit 94b4ce3 into master Oct 2, 2023
18 checks passed
@Michaelvll Michaelvll deleted the get-rid-of-ray-locally-for-aws branch October 2, 2023 20:05
Michaelvll added a commit that referenced this pull request Oct 4, 2023
jc9123 pushed a commit to jc9123/skypilot that referenced this pull request Oct 11, 2023
* new provisioner with AWS support

fix sky status refresh

update setup

update metadata

per-instance logs

improve ux

update comments

update logging

reduce retries of provisioning to match the original number

fix Ray compat issue

update the config with the latest changes

sync events with the latest changes

sync with upstream

resolve conflict

* cleanup unused functions

* lint

* update

* fix

* fix

* Fix network configuration and authentication

* Fix creation cluster name

* fix logging

* format

* format

* slight improvement for logging

* minor fix logging

* fix autostop

* fix autostop

* Fix ray ports used

* fix ray start

* Fix logging for ray start and skylet

* fix ports for ray dashboard

* add head / worker in name

* More logging information for launching

* fix cluster name in progress bar

* Fix ssh

* format

* grammar

* Fix message

* fix ssh_proxy_commands

* fix merge error

* Get rid of ray logging

* use info

* change logger to use contextmanager

* format

* [major] remove dependency for ray on aws

* minor fix

* allow warning to be printed

* adopt changes

* rename

* Better spinner

* format

* update logging

* rich_utils in provision_utils

* Fix instance log path

* Log head node to provision.log

* remove unused var

* fix ssh method

* Fix ray for worker node

* Fix ray cluster on worker

* fix ray cluster

* Fix logging and ip fetching

* show cluster name

* fix wait after cluster launch

* Add logging

* Add continue

* Change back to sleep before wait

* use sleep 1 instead

* clean

* Avoid check skylet twice

* skip docker

* remove get_feasible_ips

* fix comments

* minor fix for logs

* minor fix

* Fix the ip fetching logic

* Add quote for `cluster_name!r`

* fix return

* fix test

* fix comments

* fix comments

* fix comments

* Smoke tests changes to use a private VPC region.

* refactor get_ips

* minor debug info fix

* fix

* UX fix

* Fix proxy command to be None

* Address comments

* format

* Fix quote

* update logging

* compatibility for python 3.7

* format

* format

* change launched to provisioned

* clean up ray yaml

* format

* [Provisioner] Add docker support back (skypilot-org#2507)

* Add docker support back

* Fix

* Works!

* format

* Add docker back to the supported features

* fix docker_cmd

* Fix docker cmd

* fix DockerLoginConfig

* Move docker login config

* Fix backward compatibility for docker

* Fix docker login

* Fix docker login config new line issue

* Fix string process in ray yaml

* Update sky/provision/docker_utils.py

Co-authored-by: Tian Xia <cblmemo@gmail.com>

* Update sky/provision/instance_setup.py

Co-authored-by: Tian Xia <cblmemo@gmail.com>

* Update sky/provision/docker_utils.py

Co-authored-by: Tian Xia <cblmemo@gmail.com>

* Update sky/provision/instance_setup.py

Co-authored-by: Tian Xia <cblmemo@gmail.com>

* address comments

* format

* format

* Add comment

* Address comments

* format

---------

Co-authored-by: Tian Xia <cblmemo@gmail.com>

* Wording for SSH connection

* Fix ray status check for backward compatibility

* remnant

* stronger backward compatibility tests

* remove unused tag

* Revert "remove unused tag"

This reverts commit a12df8e.

* Add default value for `docker_user`

* fix botocore config

* lint

* Fix mypy

* remove unused variable

* minor fix for logging

* expose bootstrap error

* format

* minor change for `wait_instances` API

* lint

* use `command_runner.ssh_options_list`

* fix ssh command

* reword

* Dim error for bootstrapping

* fix user known file

* Update sky/setup_files/setup.py

Co-authored-by: Zongheng Yang <zongheng.y@gmail.com>

* Add ray back, will fix the dependency issue skypilot-org#2625

* move dependency to local ray

* Address renaming comments

* renamed to provisioner

* refactoring for logging to reduce confusion

* format

* rename back to meta data for metadata utils

* format

* move provisioner to `provision/`

* Do not propagate to provisioner logger

* Minor changes

* Fix color for error of provisioner

* Remove dimmer

* Add back missing handler for `provision_logger`

* add comments

* Print error message for the failed ssh command

* Make the error yellow

---------

Co-authored-by: Zhanghao Wu <zhanghao.wu@outlook.com>
Co-authored-by: Zongheng Yang <zongheng.y@gmail.com>
Co-authored-by: Tian Xia <cblmemo@gmail.com>
jc9123 pushed a commit to jc9123/skypilot that referenced this pull request Oct 11, 2023
…#2625)

* new provisioner with AWS support

fix sky status refresh

update setup

update metadata

per-instance logs

improve ux

update comments

update logging

reduce retries of provisioning to match the original number

fix Ray compat issue

update the config with the latest changes

sync events with the latest changes

sync with upstream

resolve conflict

* cleanup unused functions

* lint

* update

* fix

* fix

* Fix network configuration and authentication

* Fix creation cluster name

* fix logging

* format

* format

* slight improvement for logging

* minor fix logging

* fix autostop

* fix autostop

* Fix ray ports used

* fix ray start

* Fix logging for ray start and skylet

* fix ports for ray dashboard

* add head / worker in name

* More logging information for launching

* fix cluster name in progress bar

* Fix ssh

* format

* grammar

* Fix message

* fix ssh_proxy_commands

* fix merge error

* Get rid of ray logging

* use info

* change logger to use contextmanager

* format

* [major] remove dependency for ray on aws

* minor fix

* allow warning to be printed

* adopt changes

* rename

* Better spinner

* format

* update logging

* rich_utils in provision_utils

* Fix instance log path

* Log head node to provision.log

* remove unused var

* fix ssh method

* Fix ray for worker node

* Fix ray cluster on worker

* fix ray cluster

* Fix logging and ip fetching

* show cluster name

* fix wait after cluster launch

* Add logging

* Add continue

* Change back to sleep before wait

* use sleep 1 instead

* clean

* Avoid check skylet twice

* skip docker

* remove get_feasible_ips

* fix comments

* minor fix for logs

* minor fix

* Fix the ip fetching logic

* Add quote for `cluster_name!r`

* fix return

* fix test

* fix comments

* fix comments

* fix comments

* Smoke tests changes to use a private VPC region.

* refactor get_ips

* minor debug info fix

* fix

* UX fix

* Fix proxy command to be None

* Address comments

* format

* Fix quote

* update logging

* compatibility for python 3.7

* format

* format

* change launched to provisioned

* clean up ray yaml

* format

* [Provisioner] Add docker support back (skypilot-org#2507)

* Add docker support back

* Fix

* Works!

* format

* Add docker back to the supported features

* fix docker_cmd

* Fix docker cmd

* fix DockerLoginConfig

* Move docker login config

* Fix backward compatibility for docker

* Fix docker login

* Fix docker login config new line issue

* Fix string process in ray yaml

* Update sky/provision/docker_utils.py

Co-authored-by: Tian Xia <cblmemo@gmail.com>

* Update sky/provision/instance_setup.py

Co-authored-by: Tian Xia <cblmemo@gmail.com>

* Update sky/provision/docker_utils.py

Co-authored-by: Tian Xia <cblmemo@gmail.com>

* Update sky/provision/instance_setup.py

Co-authored-by: Tian Xia <cblmemo@gmail.com>

* address comments

* format

* format

* Add comment

* Address comments

* format

---------

Co-authored-by: Tian Xia <cblmemo@gmail.com>

* Wording for SSH connection

* Fix ray status check for backward compatibility

* remnant

* stronger backward compatibility tests

* remove unused tag

* Revert "remove unused tag"

This reverts commit a12df8e.

* Add default value for `docker_user`

* fix botocore config

* lint

* Fix mypy

* remove unused variable

* minor fix for logging

* expose bootstrap error

* format

* minor change for `wait_instances` API

* lint

* use `command_runner.ssh_options_list`

* fix ssh command

* reword

* Dim error for bootstrapping

* fix user known file

* avoid ray dependency for aws

* fix setup.py

* format

* remove unused file

* fix setup.py

* __init__ for utils

* format

* format

* Address comments

* Elaborate readme

---------

Co-authored-by: Siyuan <suquark@gmail.com>
Co-authored-by: Zongheng Yang <zongheng.y@gmail.com>
Co-authored-by: Tian Xia <cblmemo@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants