Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Dependency] Remove AWS dependency by default #2841

Merged
merged 13 commits into from
Dec 24, 2023

Conversation

Michaelvll
Copy link
Collaborator

@Michaelvll Michaelvll commented Dec 5, 2023

Fixes #2810

This PR removes AWS dependency from the default installation dependency.

Reason:

  1. AWS dependency is not actually lightweight - with AWS dependency: 39.5s, without AWS dependency: 31.1s (20% difference)
  2. AWS depdency has a strict version requirement colorama < 0.4.5 which is quite old and can conflict with users' existing environment.
  3. sky show-gpus should be available without awscli installed.

Tested (run the relevant ones):

  • Code formatting: bash format.sh
  • Any manual or new tests for this PR (please specify below)
    • sky show-gpus without aws dependencies installed
  • All smoke tests: pytest tests/test_smoke.py
  • Relevant individual smoke tests: pytest tests/test_smoke.py::test_fill_in_the_name
  • Backward compatibility tests: bash tests/backward_comaptibility_tests.sh

@Michaelvll Michaelvll marked this pull request as draft December 5, 2023 04:52
@Michaelvll Michaelvll marked this pull request as ready for review December 5, 2023 05:18
@concretevitamin
Copy link
Collaborator

Shall we list out the pros and cons of two options and discuss:

@Michaelvll
Copy link
Collaborator Author

Michaelvll commented Dec 7, 2023

Make skypilot have no cloud dependencies (this PR) vs skypilot[minimal]

Pro:

  1. Installing the default pip install skypilot will be faster
  2. Minimizing the dependency confliction issue when the user install SkyPilot in an existing environment
  3. A user that is only planning to use a specific cloud, e.g. GCP, does not need to install AWS dependencies.
  4. A user with multiple cloud credential setup already will not be confused after installing skypilot, but only see AWS in optimizer table when sky launch is called.

Con:

  1. A user with AWS credentials already setup will not be able to use the skypilot out-of-the-box.

Please feel free to add any other points you find that is important. : )

@concretevitamin
Copy link
Collaborator

@dongreenberg - would be awesome if you could confirm this PR solved #2810 for you!

@dongreenberg
Copy link
Contributor

Confirmed!! Thank you guys!!

Copy link
Collaborator

@concretevitamin concretevitamin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @dongreenberg @Michaelvll - quick comments and otherwise LGTM.

sky/setup_files/setup.py Show resolved Hide resolved
docs/source/contribute/contributing.rst Outdated Show resolved Hide resolved
Copy link
Collaborator

@concretevitamin concretevitamin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Michaelvll. Pushed some finetuning:

  • Made the action sentence (choose your cloud) stand out
  • A bit of rewording
  • Add Kubernetes in sky check output

Wdyt?

@Michaelvll
Copy link
Collaborator Author

Thanks @Michaelvll. Pushed some finetuning:

  • Made the action sentence (choose your cloud) stand out
  • A bit of rewording
  • Add Kubernetes in sky check output

Wdyt?

Looks great! Thanks for updating it. I added the python requirement for Apple silicon back to the sentence above. Is that still valid?

@@ -5,7 +5,7 @@ Installation

.. note::

For Macs, macOS >= 10.15 is required to install SkyPilot. Apple Silicon-based devices (e.g. Apple M1) must run :code:`pip uninstall grpcio; conda install -c conda-forge grpcio=1.43.0` prior to installing SkyPilot.
For Macs, macOS >= 10.15 is required to install SkyPilot. Apple Silicon-based devices (e.g. Apple M1) must use :code:`python>=3.8` and run :code:`pip uninstall grpcio; conda install -c conda-forge grpcio=1.43.0` prior to installing SkyPilot.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm on my M1 Mac I've been using 3.7 until recently with no problems?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, interesting! Let me revert it then. Thanks for confirming!

@Michaelvll Michaelvll merged commit d6f57cc into master Dec 24, 2023
19 checks passed
@Michaelvll Michaelvll deleted the remove-aws-dependency-by-default branch December 24, 2023 03:53
@sqr00t
Copy link

sqr00t commented Dec 24, 2023

Heya @Michaelvll thanks for addressing this. I wonder if deprecation of awscli v1 would be in the pipeline, in which case awscli could also be removed from aws_dependencies.

Con:

  1. A user with AWS credentials already setup will not be able to use the skypilot out-of-the-box.

Originally posted by @Michaelvll in #2841 (comment)

I suppose this only applies for those using awscli v1? I do have AWS credentials already setup via the latest awscliv2 setup, then ran the usual aws configure .... I've forked Skypilot@ d6f57cc , removed awscli from setup.py, and still able to use Skypilot in a clean environment with pip install "skypilot[aws]@git+https://github.com/sqr00t/skypilot-no-awscli.git" --no-cache.

Is it worth to separate awscli as a separate extra? Either way, there's now no conflict with my other projects🙏 when listing skypilot and boto3 as dependencies using d6f57cc

Also to note only pip install boto3 is needed if awscli is already installed, and possibly using importlib.utils to check imports here would make a bit more sense?

Not sure if the above necessitates it's own issue/PR to address pragmatically. But keen to here your thoughts on this, and look forward to the next release :)

@Michaelvll
Copy link
Collaborator Author

Heya @Michaelvll thanks for addressing this. I wonder if deprecation of awscli v1 would be in the pipeline, in which case awscli could also be removed from aws_dependencies.

Con:

  1. A user with AWS credentials already setup will not be able to use the skypilot out-of-the-box.

Originally posted by @Michaelvll in #2841 (comment)

I suppose this only applies for those using awscli v1? I do have AWS credentials already setup via the latest awscliv2 setup, then ran the usual aws configure .... I've forked Skypilot@ d6f57cc , removed awscli from setup.py, and still able to use Skypilot in a clean environment with pip install "skypilot[aws]@git+https://github.com/sqr00t/skypilot-no-awscli.git" --no-cache.

Is it worth to separate awscli as a separate extra? Either way, there's now no conflict with my other projects🙏 when listing skypilot and boto3 as dependencies using d6f57cc

Also to note only pip install boto3 is needed if awscli is already installed, and possibly using importlib.utils to check imports here would make a bit more sense?

Not sure if the above necessitates it's own issue/PR to address pragmatically. But keen to here your thoughts on this, and look forward to the next release :)

Thanks for the comment @sqr00t!
We are aware of the deprecation of awscli v1, but unfortunately, AWS decides to not release the awscliv2 on PyPI (which I don't personally understand).

I am slightly leaning towards having the awscli in the setup, so that the user do not have to go through another instruction for installing the dependencies required for AWS. Having another extra also seems complicating the installation processes a bit. Currently, seems you could install the raw skypilot and install boto3 manually to get AWS setup when you have awsv2 manually installed. We should make it more clear in our document. : )

For the check of boto3 installation, we have included the checks in sky check which calls the following function to check the dependencies:

def check_credentials(cls) -> Tuple[bool, Optional[str]]:

If the issue still bothers, please feel free to file another issue so we can better keep track of it. : )
Just a note, we have a nightly build release that will include the latest changes in the master branch, so if you would like to get the latest commits, pip install -U skypilot-nightly would work for you.

@sqr00t
Copy link

sqr00t commented Dec 24, 2023

Thanks for your quick response! I'll start a separate issue to address a small bug related to checking imports.

Also thanks for linking the issue on AWS's decision not to release awscliv2 on pypi, the main pain point that I've encounted is:

  1. Supporting pip as an installation mechanism requires supporting large
    dependency ranges for the packages we depend on. While simple in concept, this
    is difficult to execute on reliably as it's not feasible to test all
    permutations of platform, Python version, and package version. When new
    packages we depend on are published V1 customers installing via pip are often
    broken. This leads us to have relatively aggressive version ranges and we end
    up becoming a blocker for the Python community to upgrade packages.
    See #4749 #4561 [Serve] Change back from autodown to autostop #3535

Originally posted by @joguSD in aws/aws-cli#4947 (comment)

To add awscli as a dependency means that downstream needs to track changes conforming to awscli. Things "broke" frequently even in v1. I had recommended my team to install Skypilot in a separate environment to projects, and removed awscli from our project templates (based on cookiecutter).

In dev environments, some users that are less focused on engineering (i.e. Statistician-oriented Data Scientists) would prefer to minimise context switching when they want to use Skypilot, i.e. not even needing to conda activate skypilot_env. I'd like to also highlight that It's been a bliss to easily setup cloud Jupyter environments because of Skypilot 💯 , and not relying on Sagemaker etc.

  1. We cannot include or ship additional binaries (that may not even be written
    in Python) as part of an AWS CLI install. In the future the AWS CLI can be
    partially or even entirely rewritten in another language while maintaining
    backwards compatibility.

I think there may be some exploration in the future to integrate tooling in Go or Rust. There are already more performant tools than awscli for syncing objects, like s5cmd (although benchmarks will depend on use-case).

@concretevitamin
Copy link
Collaborator

Thanks for the excellent feedback @sqr00t!

I'd like to also highlight that It's been a bliss to easily setup cloud Jupyter environments because of Skypilot 💯 , and not relying on Sagemaker etc.

That's great to hear. Curious if you use other parts of SkyPilot (e.g., managed spot?). We'd love to chat to understand how to support your use cases better; please feel free to join our Slack if you'd like. Cheers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Core] Don't include AWS dependencies by default
4 participants