Use poetry for dependency management #111

Merged
merged 23 commits into from
Mar 2, 2023

@maik-schmidt commented Feb 16, 2023

Description

This PR introduces poetry for dependency and venv management.

Because there were some version conflicts, we also made the following changes:

  • Pin fsspec, gcsfs and s3fs versions
  • Lower pyarrow version to <11 because of a conflict with mlflow
  • Move sphinx from dev to doc dependencies
  • The "dev" extra is removed from the published package
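For reference, the published metadata further down pins pyarrow to `>=10.0.1,<11.0.0`. The snippet below is a minimal, stdlib-only sketch of how such a range is evaluated, assuming plain numeric release versions. pip and poetry implement the full PEP 440 rules (pre-releases, epochs, local versions), so `satisfies` is a hypothetical illustration, not a replacement for them.

```python
import operator
import re
from typing import Tuple

# Comparison operators supported by this simplified checker.
_OPS = {">=": operator.ge, "<=": operator.le, ">": operator.gt, "<": operator.lt, "==": operator.eq}
# One clause such as ">=10.0.1"; two-character operators are listed first so they win.
_CLAUSE = re.compile(r"^\s*(>=|<=|==|>|<)\s*(\d+(?:\.\d+)*)\s*$")


def _release(version: str) -> Tuple[int, ...]:
    """Turn "10.0.1" into (10, 0, 1). Assumes a purely numeric release version."""
    return tuple(int(part) for part in version.split("."))


def satisfies(version: str, spec: str) -> bool:
    """Return True if `version` meets every comma-separated clause of `spec`."""
    for clause in spec.split(","):
        match = _CLAUSE.match(clause)
        if match is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        op, bound = match.groups()
        left, right = _release(version), _release(bound)
        width = max(len(left), len(right))
        # Pad with zeros so that "11" compares equal to "11.0.0".
        left = left + (0,) * (width - len(left))
        right = right + (0,) * (width - len(right))
        if not _OPS[op](left, right):
            return False
    return True


PYARROW_PIN = ">=10.0.1,<11.0.0"  # as published in requires_dist
for candidate in ("10.0.0", "10.0.1", "11.0.0"):
    print(candidate, satisfies(candidate, PYARROW_PIN))
```

Only `10.0.1` passes here: `10.0.0` fails the lower bound and `11.0.0` is excluded by the `<11` cap.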

For building the Python package, we switch from setuptools to poetry-core. We still build the package through our setup.py wrapper to inject our custom versioning logic.
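The versioning logic itself is not part of this description. Judging by the published versions (`0.18.4.dev776`, `0.18.4.dev25985`), it produces PEP 440 developmental releases, which pip skips by default unless they are requested explicitly or `--pre` is passed. The sketch below shows one way such a string could be assembled; `make_dev_version` and the assumption that N is a CI build number are hypothetical, not the project's actual setup.py code.

```python
import re

# Simplified PEP 440 shape "X.Y.Z.devN" (no epoch, pre/post release or local part).
DEV_VERSION = re.compile(r"^\d+(?:\.\d+)*\.dev\d+$")


def make_dev_version(base_version: str, build_number: int) -> str:
    """Hypothetical helper: tag a release version with a CI build number as a dev release."""
    if build_number < 0:
        raise ValueError("build_number must be non-negative")
    version = f"{base_version}.dev{build_number}"
    if DEV_VERSION.match(version) is None:
        raise ValueError(f"not a simple PEP 440 dev version: {version!r}")
    return version


print(make_dev_version("0.18.4", 776))  # 0.18.4.dev776
```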

This PR keeps changes to a minimum so that it stays compatible with the current Docker and CloudBuild workflows:

  • Locally, use poetry to manage the venv (.venv/) and the dependencies (pyproject.toml, poetry.lock)
  • Export requirements.txt from the lock file in a pre-commit hook. From that point on, poetry is not required.
  • Build the Docker image for CI/CD by installing the dependencies with pip
  • For publishing the package:
    1. Build it using python -m build (uses poetry-core)
    2. Publish it using twine
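The steps above can be sketched as a single Python script, with the caveat that in this PR they run in different places (a pre-commit hook, the Docker build and CloudBuild). The exact command flags are assumptions, not the project's actual hook or CloudBuild configuration, and `run_pipeline` defaults to a dry run that only prints the commands.

```python
import glob
import os
import shlex
import subprocess
from typing import List

DIST_DIR = "dist"  # default output directory of `python -m build`

# Hypothetical commands mirroring the workflow; real flags may differ.
EXPORT_REQUIREMENTS = [
    "poetry", "export",
    "--format", "requirements.txt",
    "--output", "requirements.txt",
    "--without-hashes",
]
INSTALL_WITH_PIP = ["pip", "install", "-r", "requirements.txt"]  # no poetry needed here
BUILD_PACKAGE = ["python", "-m", "build"]  # PEP 517 build via the poetry-core backend


def _run(cmd: List[str], dry_run: bool) -> str:
    """Print the command; execute it (failing loudly) only when dry_run is False."""
    line = shlex.join(cmd)
    print(("[dry-run] " if dry_run else "$ ") + line)
    if not dry_run:
        subprocess.run(cmd, check=True)
    return line


def run_pipeline(dry_run: bool = True) -> List[str]:
    """Export, install, build and upload, in that order. Returns the command lines."""
    lines = [_run(cmd, dry_run) for cmd in (EXPORT_REQUIREMENTS, INSTALL_WITH_PIP, BUILD_PACKAGE)]
    # Resolve the wheel/sdist paths only after the build step has created them.
    dist_files = sorted(glob.glob(os.path.join(DIST_DIR, "*")))
    if not dist_files:
        dist_files = [os.path.join(DIST_DIR, "*")]  # placeholder shown in dry-run mode
    lines.append(_run(["twine", "upload", *dist_files], dry_run))
    return lines


if __name__ == "__main__":
    run_pipeline(dry_run=True)
```

Resolving the `dist/` files after the build step avoids passing an unexpanded glob or a stale file list to twine.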

As a next step, we could also use poetry to publish the package to PyPI. This would require setting up poetry in Docker.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactoring including code style reformatting
  • Other (please describe):

Checklist:

  • I have read the contributing guideline doc (external contributors only)
  • Lint and unit tests pass locally with my changes
  • I have kept the PR small so that it can be easily reviewed
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • All dependency changes have been reflected in the pip requirement files.

@maik-schmidt marked this pull request as draft February 16, 2023 09:00
@github-actions bot commented Feb 16, 2023

CLA Assistant Lite bot: All contributors have signed the CLA ✍️ ✅

@maik-schmidt changed the title from "Maik poetry setup" to "Use poetry for dependency management" Feb 16, 2023
@maik-schmidt
Contributor Author

I have read the CLA Document and I hereby sign the CLA

@maik-schmidt
Contributor Author

recheck

@maik-schmidt
Contributor Author

I've published to PyPI from CloudBuild (https://pypi.org/project/squirrel-core/0.18.4.dev776/). This is the diff of the package metadata compared to the latest version (mostly dev dependencies removed from the published package):

 {
-    "filename":"squirrel_core-0.18.4.dev25985-py3-none-any.whl",
+    "filename":"squirrel_core-0.18.4.dev776-py3-none-any.whl",
     "metadata_version":"2.1",
     "name":"squirrel-core",
-    "version":"0.18.4.dev25985",
-    "summary":"Squirrel is a Python library that enables ML teams to share, load, and transform data in a collaborative, flexible, and efficient way.",
+    "version":"0.18.4.dev776",
+    "summary":"Squirrel is a Python library that enables ML teams to share, load, and transform data in a collaborative, flexible and efficient way.",
+    "home_page":"https://merantix-momentum.com/technology/squirrel/",
     "author":"Merantix Momentum",
     "license":"Apache 2.0",
     "classifiers":[
        "Development Status :: 5 - Production/Stable",
        "License :: OSI Approved :: Apache Software License",
-       "Programming Language :: Python :: 3.8",
+       "License :: Other/Proprietary License",
+       "Programming Language :: Python :: 3",
+       "Programming Language :: Python :: 3.9",
+       "Programming Language :: Python :: 3.10",
+       "Programming Language :: Python :: 3.11",
+       "Programming Language :: Python :: 3.9",
        "Typing :: Typed"
     ],
+    "requires_python":">=3.9,<4.0",
     "requires_dist":[
-       "fsspec (>=0.8.7)",
-       "msgpack",
-       "msgpack-numpy",
-       "more-itertools",
-       "pluggy",
-       "random-name",
-       "ruamel.yaml",
-       "tqdm",
-       "numba",
-       "numpy",
-       "pyjwt (>=2.4.0)",
-       "mako (>=1.2.2)",
-       "oauthlib (>=3.2.1)",
-       "aiohttp (>=3.7.4)",
-       "twine ; extra == 'all'",
-       "wheel ; extra == 'all'",
-       "pytest (>=6.2.1) ; extra == 'all'",
-       "pytest-timeout ; extra == 'all'",
-       "pytest-cov ; extra == 'all'",
-       "pytest-xdist ; extra == 'all'",
-       "wandb ; extra == 'all'",
-       "mlflow ; extra == 'all'",
-       "pre-commit (==2.16.0) ; extra == 'all'",
-       "pip-tools (>=6.6.2) ; extra == 'all'",
-       "sphinx ; extra == 'all'",
-       "sphinx-autoapi ; extra == 'all'",
-       "sphinxcontrib-mermaid ; extra == 'all'",
-       "sphinx-rtd-theme ; extra == 'all'",
-       "gcsfs (>=2021.06.0) ; extra == 'all'",
-       "adlfs (<2021.10) ; extra == 'all'",
-       "s3fs ; extra == 'all'",
-       "zarr (==2.10.3) ; extra == 'all'",
-       "pyarrow ; extra == 'all'",
-       "dask[dataframe,distributed] ; extra == 'all'",
-       "torch (>=1.13.1) ; extra == 'all'",
-       "odfpy ; extra == 'all'",
-       "openpyxl ; extra == 'all'",
-       "pyxlsb ; extra == 'all'",
-       "xlrd ; extra == 'all'",
-       "adlfs (<2021.10) ; extra == 'azure'",
-       "dask[dataframe,distributed] ; extra == 'dask'",
-       "twine ; extra == 'dev'",
-       "wheel ; extra == 'dev'",
-       "pytest (>=6.2.1) ; extra == 'dev'",
-       "pytest-timeout ; extra == 'dev'",
-       "pytest-cov ; extra == 'dev'",
-       "pytest-xdist ; extra == 'dev'",
-       "wandb ; extra == 'dev'",
-       "mlflow ; extra == 'dev'",
-       "pre-commit (==2.16.0) ; extra == 'dev'",
-       "pip-tools (>=6.6.2) ; extra == 'dev'",
-       "sphinx ; extra == 'dev'",
-       "sphinx-autoapi ; extra == 'dev'",
-       "sphinxcontrib-mermaid ; extra == 'dev'",
-       "sphinx-rtd-theme ; extra == 'dev'",
-       "odfpy ; extra == 'excel'",
-       "openpyxl ; extra == 'excel'",
-       "pyxlsb ; extra == 'excel'",
-       "xlrd ; extra == 'excel'",
-       "pyarrow ; extra == 'feather'",
-       "gcsfs (>=2021.06.0) ; extra == 'gcp'",
-       "pyarrow ; extra == 'parquet'",
-       "s3fs ; extra == 's3'",
-       "torch (>=1.13.1) ; extra == 'torch'",
-       "zarr (==2.10.3) ; extra == 'zarr'"
+       "adlfs (<2021.10) ; extra == \"azure\" or extra == \"all\"",
+       "aiohttp (>=3.7.4,<4.0.0)",
+       "dask[dataframe,distributed] (>=2021.7.0) ; extra == \"dask\" or extra == \"all\"",
+       "fsspec (>=2021.7.0)",
+       "gcsfs (>=2021.7.0) ; extra == \"gcp\" or extra == \"all\"",
+       "mako (>=1.2.2,<2.0.0)",
+       "more-itertools (>=9.0.0,<10.0.0)",
+       "msgpack (>=1.0.4,<2.0.0)",
+       "msgpack-numpy (>=0.4.8,<0.5.0)",
+       "numba (>=0.56.4,<0.57.0)",
+       "numpy (>=1.23.5,<2.0.0)",
+       "oauthlib (>=3.2.1,<4.0.0)",
+       "odfpy (>=1.4.1,<2.0.0) ; extra == \"excel\" or extra == \"all\"",
+       "openpyxl (>=3.1.1,<4.0.0) ; extra == \"excel\" or extra == \"all\"",
+       "pluggy (>=1.0.0,<2.0.0)",
+       "pyarrow (>=10.0.1,<11.0.0) ; extra == \"feather\" or extra == \"parquet\" or extra == \"all\"",
+       "pyjwt (>=2.4.0,<3.0.0)",
+       "pyxlsb (>=1.0.10,<2.0.0) ; extra == \"excel\" or extra == \"all\"",
+       "random-name (>=0.1.1,<0.2.0)",
+       "ruamel-yaml (>=0.17.21,<0.18.0)",
+       "s3fs (>=2021.7.0) ; extra == \"s3\" or extra == \"all\"",
+       "torch (>=1.13.1,<2.0.0) ; extra == \"torch\" or extra == \"all\"",
+       "tqdm (>=4.64.1,<5.0.0)",
+       "xlrd (>=2.0.1,<3.0.0) ; extra == \"excel\" or extra == \"all\"",
+       "zarr (>=2.10.3,<3.0.0) ; extra == \"zarr\" or extra == \"all\""
+    ],
+    "project_urls":[
+       "Documentation, https://squirrel-core.readthedocs.io/en/latest/",
+       "Repository, https://github.com/merantix-momentum/squirrel-core"
     ],
     "provides_extras":[
        "all",
        "azure",
        "dask",
-       "dev",
        "excel",
        "feather",
        "gcp",
        "zarr"
     ],
     "description_content_type":"text/markdown",
-    "description":"<div align=\"center\">\n\n# <img src=\"https://raw.githubusercontent.com/merantix-momentum/squirrel-core/main/docs/_static/logo.png\" width=\"150px\"> Squirrel Core\n\n**Share, load, and transform data in a collaborative, flexible, and efficient way**\n\n[![Python](https://img.shields.io/pypi/pyversions/squirrel-core.svg?style=plastic)](https://badge.fury.io/py/squirrel-core)\n[![PyPI](https://img.shields.io/pypi/v/squirrel-core?label=pypi%20package)](https://pypi.org/project/squirrel-core/)\n[![Conda](https://img.shields.io/conda/vn/conda-forge/squirrel-core)](https://anaconda.org/conda-forge/squirrel-core)\n[![Documentation Status](https://readthedocs.org/projects/squirrel-core/badge/?version=latest)](https://squirrel-core.readthedocs.io/en/latest)\n[![Downloads](https://static.pepy.tech/personalized-badge/squirrel-core?period=total&units=international_system&left_color=grey&right_color=blue&left_text=Downloads)](https://pepy.tech/project/squirrel-core)\n[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://raw.githubusercontent.com/merantix-momentum/squirrel-core/main/LICENSE)\n[![DOI](https://zenodo.org/badge/458099869.svg)](https://zenodo.org/badge/latestdoi/458099869)\n[![Generic badge](https://img.shields.io/badge/Website-Merantix%20Momentum-blue)](https://merantix-momentum.com)\n[![Slack](https://img.shields.io/badge/slack-chat-green.svg?logo=slack)](https://join.slack.com/t/squirrel-core/shared_invite/zt-14k6sk6sw-zQPHfqAI8Xq5WYd~UqgNFw)\n\n</div>\n\n---\n\n## What is Squirrel?\n\nSquirrel is a Python library that enables ML teams to share, load, and transform data in a collaborative, flexible, and efficient way.\n\n1. **SPEED:** Avoid data stall, i.e. the expensive GPU will not be idle while waiting for the data.\n\n2. **COSTS:** First, avoid GPU stalling, and second allow to shard & cluster your data and store & load it in bundles, decreasing the cost for your data bucket cloud storage.\n\n3. 
**FLEXIBILITY:** Work with a flexible standard data scheme which is adaptable to any setting, including multimodal data.\n\n4. **COLLABORATION:** Make it easier to share data & code between teams and projects in a self-service model.\n\nStream data from anywhere to your machine learning model as easy as:\n```python\nit = (\n    Catalog.from_plugins()[\"imagenet\"]\n    .get_driver()\n    .get_iter(\"train\")\n    .map(lambda r: (augment(r[\"image\"]), r[\"label\"]))\n    .batched(100)\n)\n```\n\nCheck out our full [getting started](https://github.com/merantix-momentum/squirrel-datasets-core/blob/main/examples/01.Getting_Started.ipynb) tutorial notebook. If you have any questions or would like to contribute, join our [Slack community](https://join.slack.com/t/squirrel-core/shared_invite/zt-14k6sk6sw-zQPHfqAI8Xq5WYd~UqgNFw).\n\n## Installation\nYou can install `squirrel-core` by\n```shell\npip install squirrel-core\n```\n\nTo install all features and functionalities:\n\n```shell\npip install \"squirrel-core[all]\"\n```\n\nOr select the dependencies you need:\n\n```shell\npip install \"squirrel-core[gcs,torch]\"\n```\n\nPlease refer to the [installation](https://squirrel-core.readthedocs.io/en/latest/getting_started/installation.html) \nsection of the documentation for a complete list of supported dependencies.\n\n## Documentation\n\nRead our documentation at [ReadTheDocs](https://squirrel-core.readthedocs.io/en/latest)\n\n## Squirrel Datasets\n\n[Squirrel-datasets-core](https://github.com/merantix-momentum/squirrel-datasets-core) is an accompanying Python package that does three things.\n1. It extends the Squirrel platform for data transform, access, and discovery through custom drivers for public datasets. \n2. 
It also allows you to tap into the vast amounts of open-source datasets from [Huggingface](https://huggingface.co/), [Activeloop Hub](https://www.activeloop.ai/) and [Torchvision](https://pytorch.org/vision/stable/datasets.html), and you\\'ll get all of Squirrel\\'s functionality on top!\n3. It provides open-source and community-contributed [tutorials and example notebooks](https://github.com/merantix-momentum/squirrel-datasets-core/tree/main/examples) for using Squirrel.\n\n## Contributing\nSquirrel is open source and community contributions are welcome!\n\nCheck out the [contribution guide](https://squirrel-core.readthedocs.io/en/latest/developer/contribute.html) to learn how to get involved.\n\n## The Humans Behind Squirrel\nWe are [Merantix Momentum](https://merantix-momentum.com/), a team of ~30 machine learning engineers, developing machine learning solutions for industry and research. Each project comes with its own challenges, data types and learnings, but one issue we always faced was scalable data loading, transforming and sharing. We were looking for a solution that would allow us to load the data in a fast and cost-efficient way, while keeping the flexibility to work with any possible dataset and integrate with any API. That\\'s why we build Squirrel – and we hope you\\'ll find it as useful as we do! By the way, [we are hiring](https://merantix-momentum.com/about#jobs)!\n\n\n## Citation\n\nIf you use Squirrel in your research, please cite it using:\n```bibtex\n@article{2022squirrelcore,\n  title={Squirrel: A Python library that enables ML teams to share, load, and transform data in a collaborative, flexible, and efficient way.},\n  author={Squirrel Developer Team},\n  journal={GitHub. Note: https://github.com/merantix-momentum/squirrel-core},\n  doi={10.5281/zenodo.6418280},\n  year={2022}\n}\n```\n"
+    "description":"<div align=\"center\">\n\n# <img src=\"https://raw.githubusercontent.com/merantix-momentum/squirrel-core/main/docs/_static/logo.png\" width=\"150px\"> Squirrel Core\n\n**Share, load, and transform data in a collaborative, flexible, and efficient way**\n\n[![Python](https://img.shields.io/pypi/pyversions/squirrel-core.svg?style=plastic)](https://badge.fury.io/py/squirrel-core)\n[![PyPI](https://img.shields.io/pypi/v/squirrel-core?label=pypi%20package)](https://pypi.org/project/squirrel-core/)\n[![Conda](https://img.shields.io/conda/vn/conda-forge/squirrel-core)](https://anaconda.org/conda-forge/squirrel-core)\n[![Documentation Status](https://readthedocs.org/projects/squirrel-core/badge/?version=latest)](https://squirrel-core.readthedocs.io/en/latest)\n[![Downloads](https://static.pepy.tech/personalized-badge/squirrel-core?period=total&units=international_system&left_color=grey&right_color=blue&left_text=Downloads)](https://pepy.tech/project/squirrel-core)\n[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://raw.githubusercontent.com/merantix-momentum/squirrel-core/main/LICENSE)\n[![DOI](https://zenodo.org/badge/458099869.svg)](https://zenodo.org/badge/latestdoi/458099869)\n[![Generic badge](https://img.shields.io/badge/Website-Merantix%20Momentum-blue)](https://merantix-momentum.com)\n[![Slack](https://img.shields.io/badge/slack-chat-green.svg?logo=slack)](https://join.slack.com/t/squirrel-core/shared_invite/zt-14k6sk6sw-zQPHfqAI8Xq5WYd~UqgNFw)\n\n</div>\n\n---\n\n## What is Squirrel?\n\nSquirrel is a Python library that enables ML teams to share, load, and transform data in a collaborative, flexible, and efficient way.\n\n1. **SPEED:** Avoid data stall, i.e. the expensive GPU will not be idle while waiting for the data.\n\n2. **COSTS:** First, avoid GPU stalling, and second allow to shard & cluster your data and store & load it in bundles, decreasing the cost for your data bucket cloud storage.\n\n3. 
**FLEXIBILITY:** Work with a flexible standard data scheme which is adaptable to any setting, including multimodal data.\n\n4. **COLLABORATION:** Make it easier to share data & code between teams and projects in a self-service model.\n\nStream data from anywhere to your machine learning model as easy as:\n```python\nit = (\n    Catalog.from_plugins()[\"imagenet\"]\n    .get_driver()\n    .get_iter(\"train\")\n    .map(lambda r: (augment(r[\"image\"]), r[\"label\"]))\n    .batched(100)\n)\n```\n\nCheck out our full [getting started](https://github.com/merantix-momentum/squirrel-datasets-core/blob/main/examples/01.Getting_Started.ipynb) tutorial notebook. If you have any questions or would like to contribute, join our [Slack community](https://join.slack.com/t/squirrel-core/shared_invite/zt-14k6sk6sw-zQPHfqAI8Xq5WYd~UqgNFw).\n\n## Installation\nYou can install `squirrel-core` by\n```shell\npip install squirrel-core\n```\n\nTo install all features and functionalities:\n\n```shell\npip install \"squirrel-core[all]\"\n```\n\nOr select the dependencies you need:\n\n```shell\npip install \"squirrel-core[gcs,torch]\"\n```\n\nPlease refer to the [installation](https://squirrel-core.readthedocs.io/en/latest/getting_started/installation.html) \nsection of the documentation for a complete list of supported dependencies.\n\n## Documentation\n\nRead our documentation at [ReadTheDocs](https://squirrel-core.readthedocs.io/en/latest)\n\n## Squirrel Datasets\n\n[Squirrel-datasets-core](https://github.com/merantix-momentum/squirrel-datasets-core) is an accompanying Python package that does three things.\n1. It extends the Squirrel platform for data transform, access, and discovery through custom drivers for public datasets. \n2. 
It also allows you to tap into the vast amounts of open-source datasets from [Huggingface](https://huggingface.co/), [Activeloop Hub](https://www.activeloop.ai/) and [Torchvision](https://pytorch.org/vision/stable/datasets.html), and you\\'ll get all of Squirrel\\'s functionality on top!\n3. It provides open-source and community-contributed [tutorials and example notebooks](https://github.com/merantix-momentum/squirrel-datasets-core/tree/main/examples) for using Squirrel.\n\n## Contributing\nSquirrel is open source and community contributions are welcome!\n\nCheck out the [contribution guide](https://squirrel-core.readthedocs.io/en/latest/developer/contribute.html) to learn how to get involved.\n\n## The Humans Behind Squirrel\nWe are [Merantix Momentum](https://merantix-momentum.com/), a team of ~30 machine learning engineers, developing machine learning solutions for industry and research. Each project comes with its own challenges, data types and learnings, but one issue we always faced was scalable data loading, transforming and sharing. We were looking for a solution that would allow us to load the data in a fast and cost-efficient way, while keeping the flexibility to work with any possible dataset and integrate with any API. That\\'s why we build Squirrel – and we hope you\\'ll find it as useful as we do! By the way, [we are hiring](https://merantix-momentum.com/about#jobs)!\n\n\n## Citation\n\nIf you use Squirrel in your research, please cite it using:\n```bibtex\n@article{2022squirrelcore,\n  title={Squirrel: A Python library that enables ML teams to share, load, and transform data in a collaborative, flexible, and efficient way.},\n  author={Squirrel Developer Team},\n  journal={GitHub. Note: https://github.com/merantix-momentum/squirrel-core},\n  doi={10.5281/zenodo.6418280},\n  year={2022}\n}\n```\n\n"
   }
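The `requires_dist` part of the diff is easier to review once each entry is grouped by the extras that pull it in. Below is a small stdlib-only sketch using a hand-copied excerpt of the old and new entries; its parsing is a simplification, not a full PEP 508 parser, and the helper names are hypothetical. It confirms that the `dev` extra disappears and that `ruamel.yaml` vs `ruamel-yaml` is only a PEP 503 name normalization.

```python
import re
from typing import Dict, List, Set

# Matches extra == 'dev' and extra == "dev" (both quoting styles appear in the diff).
_EXTRA = re.compile(r"""extra\s*==\s*['"]([^'"]+)['"]""")


def _normalize(name: str) -> str:
    """PEP 503 name normalization: ruamel.yaml and ruamel-yaml are the same project."""
    return re.sub(r"[-_.]+", "-", name).lower()


def extras_by_package(requires_dist: List[str]) -> Dict[str, Set[str]]:
    """Map each required project to the extras that pull it in ("" = always installed)."""
    result: Dict[str, Set[str]] = {}
    for entry in requires_dist:
        requirement, _, marker = entry.partition(";")
        # Project name ends at the first space, bracket, paren or comparison character.
        name = re.split(r"[\s(\[<>=!~]", requirement.strip(), maxsplit=1)[0]
        extras = set(_EXTRA.findall(marker)) or {""}
        result.setdefault(_normalize(name), set()).update(extras)
    return result


def all_extras(requires_dist: List[str]) -> Set[str]:
    """Every named extra mentioned in the requirement markers."""
    return {e for extras in extras_by_package(requires_dist).values() for e in extras if e}


# Hand-copied excerpt of the old and new requires_dist entries.
OLD = [
    "ruamel.yaml",
    "pytest (>=6.2.1) ; extra == 'all'",
    "pytest (>=6.2.1) ; extra == 'dev'",
    "pyarrow ; extra == 'parquet'",
    "pyarrow ; extra == 'all'",
]
NEW = [
    "ruamel-yaml (>=0.17.21,<0.18.0)",
    'pyarrow (>=10.0.1,<11.0.0) ; extra == "feather" or extra == "parquet" or extra == "all"',
]

print("extras removed:", all_extras(OLD) - all_extras(NEW))
print("packages removed:", set(extras_by_package(OLD)) - set(extras_by_package(NEW)))
```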

@maik-schmidt marked this pull request as ready for review February 19, 2023 15:56
Resolved review thread on pyproject.toml (outdated)
@AlirezaSohofi previously approved these changes Feb 28, 2023

@AlirezaSohofi (Contributor) left a comment

LGTM 👍

Resolved review threads on docs/developer/contribute.rst (one outdated)
@ThomasWollmann (Member) left a comment

I tested this PR in conda and it worked. Steps:

  • conda env create -f .../sandbox.yaml
  • conda activate sandbox
  • poetry install --all-extras

My sandbox.yaml:

name: sandbox
dependencies:
- python=3.9
- anaconda
- pip
- pip:
    - keyrings.google-artifactregistry-auth==1.1.1
    - poetry
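The sandbox pins `python=3.9`, the lowest version accepted by the new `requires_python` (`>=3.9,<4.0`) now that the Python 3.8 classifier is gone. Poetry already validates the active interpreter against this constraint, so the snippet below is only a hypothetical illustration of what the range means, e.g. as a guard in a helper script; `python_supported` is not part of the project.

```python
import sys
from typing import Sequence, Tuple

# Bounds taken from the published "requires_python": ">=3.9,<4.0".
MIN_PYTHON: Tuple[int, int] = (3, 9)
MAX_PYTHON_EXCLUSIVE: Tuple[int, int] = (4, 0)


def python_supported(version: Sequence[int] = sys.version_info) -> bool:
    """True if (major, minor) of `version` lies in the half-open range [3.9, 4.0)."""
    major_minor = (version[0], version[1])
    return MIN_PYTHON <= major_minor < MAX_PYTHON_EXCLUSIVE


if not python_supported():
    print(f"Python {sys.version_info[0]}.{sys.version_info[1]} is outside >=3.9,<4.0")
```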

@maik-schmidt merged commit f68d4d2 into main Mar 2, 2023
@maik-schmidt deleted the maik-poetry-setup branch March 2, 2023 12:52
@github-actions bot locked and limited conversation to collaborators Mar 2, 2023