[Requirements] Support extras #620

Merged
merged 23 commits into from Dec 28, 2020

Conversation


@Hedingber Hedingber commented Dec 28, 2020

Fixes #551

This PR leverages setuptools' extras_require to enable a user to install only the dependencies they really need. This will give:

  • Faster installation
  • Fewer conflicts with the local venv
  • No unneeded dependencies installed

The extras we will support are:

  • api - already existed - needed dependencies to run the API
  • s3 - boto3~=1.9 - needed dependencies to use s3 as the storage layer
  • azure-blob-storage - azure-storage-blob~=12.0 - needed dependencies to use azure blob storage as the storage layer
  • complete - all of the above excluding api
  • complete-api - all of the above including api
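
As a rough illustration of how the extras listed above could be assembled for setuptools, here is a minimal sketch. It is not the actual setup.py from this PR: the helper name `build_extras_require` is hypothetical, and the api requirements are passed in as a parameter because they are not listed in this description.

```python
def build_extras_require(api_requirements):
    """Return an extras_require mapping shaped like the list above.

    api_requirements is supplied by the caller; the real api dependency
    list is not part of this description, so nothing is assumed about it.
    """
    extras = {
        "api": list(api_requirements),
        "s3": ["boto3~=1.9"],
        "azure-blob-storage": ["azure-storage-blob~=12.0"],
    }
    # "complete" bundles every extra except "api".
    complete = sorted(
        {req for name, reqs in extras.items() if name != "api" for req in reqs}
    )
    # "complete-api" bundles every extra, including "api".
    complete_api = sorted({req for reqs in extras.values() for req in reqs})
    extras["complete"] = complete
    extras["complete-api"] = complete_api
    return extras
```

The returned mapping could then be passed as `extras_require=` to `setuptools.setup(...)`. Assuming the distribution is published under the name mlrun, a user would select an extra with something like `pip install "mlrun[s3]"`.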

For examples of how to install a package with extras using pip, see this

I wanted to measure how much this change reduces the total download size of the dependencies. I couldn't find an existing tool that does this automatically, so I ran pip install --no-cache-dir . which prints output such as:

Collecting starlette==0.13.6
  Downloading starlette-0.13.6-py3-none-any.whl (59 kB)
     |████████████████████████████████| 59 kB 2.5 MB/s

and parsed its output. I'm uploading the code I used here so we can reuse it in the future:
package_sizes.tar.gz
I also used https://github.com/naiquevin/pipdeptree to easily understand the dependency tree
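
The parsing step described above can be sketched roughly as follows. This is not the script from package_sizes.tar.gz, only a minimal illustration; the names `DOWNLOAD_LINE`, `UNIT_FACTORS`, and `parse_download_sizes` are hypothetical, and the sketch assumes pip reports sizes in decimal units as "bytes", "kB", or "MB".

```python
import re

# Matches pip lines such as:
#   Downloading starlette-0.13.6-py3-none-any.whl (59 kB)
DOWNLOAD_LINE = re.compile(r"Downloading (\S+) \(([\d.]+) (bytes|kB|MB)\)")

# Assumption: decimal units (1 kB = 1000 bytes).
UNIT_FACTORS = {"bytes": 1, "kB": 1000, "MB": 1000 ** 2}


def parse_download_sizes(pip_output):
    """Map each downloaded file name to its reported size in bytes."""
    sizes = {}
    for line in pip_output.splitlines():
        match = DOWNLOAD_LINE.search(line)
        if match is None:
            # Skip "Collecting ..." lines, progress bars, and anything else.
            continue
        filename, number, unit = match.groups()
        sizes[filename] = float(number) * UNIT_FACTORS[unit]
    return sizes


def total_size_mb(sizes):
    """Sum all parsed sizes and convert the total to megabytes."""
    return sum(sizes.values()) / UNIT_FACTORS["MB"]
```

Feeding it the captured output of `pip install --no-cache-dir .` gives a per-file size mapping that can be summed or sorted to find the largest packages.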

The old total dependency size was 83.39 MB; the new size is 81 MB (a 2.8% decrease).
The packages we no longer install are:

cryptography:1.8 MB
azure-storage-blob:328 KB
azure-core:124 KB
msrest:84 KB
isodate:45 KB

Notes:

  • Unfortunately, although boto3 moved to the s3 extra, it is still installed by default because it is a sub-requirement of nuclio-jupyter (boto3 and its sub-requirements total 7.4 MB). We may be able to remove it; I'm checking
  • Ideally I would also add an extra for kfp, but the code that uses it is not well organized, so doing that would mean adding an import inside many functions. I decided to skip it for now
  • While working on this PR I considered two more extras:
  1. v3io - v3io-frames~=0.8.5, v3io~=0.5.0 - needed dependencies to use v3io as the storage layer. It could save 5.75 MB in total, but since most MLRun users rely on it, I decided to keep it as part of the base
  2. dask - dask~=2.12 - needed dependencies to run dask functions. It could save 848 KB in total; given the small size and wide usage, I decided to keep it as well
  • Most of our dependency size comes from a handful of packages. The largest are:
numpy:15.3 MB
pyarrow:13.4 MB
pandas:10.3 MB
notebook:9.5 MB
botocore:7.2 MB
pydantic:2.3 MB
kubernetes:1.5 MB
jedi:1.4 MB
sqlalchemy:1.2 MB
protobuf:1 MB
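
The kfp note above mentions adding an import inside the functions that need an optional dependency. A minimal sketch of that general pattern is shown below, using the s3 extra as the example. The helper `require_optional` and the function `store_in_s3` are hypothetical names for illustration, not code from this PR.

```python
import importlib


def require_optional(module_name, extra):
    """Import an optional dependency lazily, with a hint about which extra provides it."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"'{module_name}' is required for this feature; "
            f"install the package with the '{extra}' extra"
        ) from exc


def store_in_s3(bucket, key, data):
    """Hypothetical function that only needs boto3 when it is actually called."""
    # Importing here, not at module level, keeps the base install free of boto3.
    boto3 = require_optional("boto3", "s3")
    client = boto3.client("s3")
    client.put_object(Bucket=bucket, Key=key, Body=data)
```

The trade-off is the one described in the note: every function that touches the optional package needs its own import, which is a lot of churn when usage is spread across many functions.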

Some analysis:

  • numpy and pyarrow are sub-requirements of pandas, so in practice pandas accounts for 39 MB, almost half of the total size (though most data scientists will already have pandas in their venv)
  • notebook is a sub-requirement of nuclio-jupyter; we may be able to remove it from there, I'm checking
  • botocore is a sub-requirement of boto3 (details above)

Also:

  • Removed google-auth<2.0dev,>=1.19.1 from requirements. It was added in Requirements fixes #373 to fix a conflict that no longer occurs
  • Added chardet>=3.0.2, <4.0 to requirements to fix a conflict

@Hedingber Hedingber marked this pull request as ready for review December 28, 2020 04:40
* add complete-api (and api out of complete)
* test also complete and complete-api
Successfully merging this pull request may close these issues.

FEATURE: Update setup.py to support multiple extras_requires