ENH: Add conda subpackages corresponding to pip extras #52490

jamesmyatt · 2023-04-06T14:26:57Z

Feature Type

Adding new functionality to pandas
Changing existing functionality in pandas
Removing existing functionality in pandas

Problem Description

Since 2.0.0, pandas extras can be installed with pip. It would be good to be able to install the same extras with conda. For example:

conda install pandas-performance

should be equivalent to

python -m pip install pandas[performance]

This matches many other packages, such as matplotlib, seaborn, dvc and black. See e.g. https://github.com/conda-forge/matplotlib-feedstock/blob/main/recipe/meta.yaml, https://dvc.org/doc/install/linux#install-with-conda, https://github.com/conda-forge/black-feedstock/blob/main/recipe/meta.yaml

Feature Description

Use subpackages in https://github.com/conda-forge/pandas-feedstock/

Alternative Solutions

Current situation:

At the start of every project using conda (or when updating the requirements), the user must find the right part of the pandas docs, read it to work out the correct minimum version of optional dependencies they need, map the pypi package names to conda ones and then add those explicitly to their environment.yml file.
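
To make the manual mapping step concrete, here is a minimal Python sketch of what a user effectively does by hand today. It is an illustration only, not part of pandas or conda: the `PIP_EXTRAS` table is a hand-copied, hypothetical excerpt (the version pins are the ones quoted later in this thread, not read from pyproject.toml), `PYPI_TO_CONDA` is an assumed name map containing only the matplotlib to matplotlib-base case mentioned in this thread, and the helper names are made up for the example.

```python
import re

# Hypothetical excerpt of pip extras; the pins are copied from the
# environment.yml example in this thread, not from pyproject.toml.
PIP_EXTRAS = {
    "performance": ["bottleneck>=1.3.4", "numba>=0.55.2", "numexpr>=2.8.0"],
    "parquet": ["pyarrow>=7.0.0"],
    "plot": ["matplotlib>=3.6.1"],
}

# Assumed PyPI -> conda-forge name map, listing only names that differ.
PYPI_TO_CONDA = {"matplotlib": "matplotlib-base"}

# Split "name<constraint>" into the name and whatever follows it.
_REQ = re.compile(r"^\s*([A-Za-z0-9._-]+)\s*(.*)$")


def to_conda_spec(requirement):
    """Turn a simple pip requirement string into a conda match spec."""
    match = _REQ.match(requirement)
    if match is None:
        raise ValueError(f"cannot parse requirement: {requirement!r}")
    name, constraint = match.groups()
    conda_name = PYPI_TO_CONDA.get(name.lower(), name.lower())
    return f"{conda_name} {constraint.strip()}".strip()


def environment_lines(extras, indent="  "):
    """Return environment.yml dependency lines for the given extras."""
    specs = []
    for extra in extras:
        for requirement in PIP_EXTRAS[extra]:
            spec = to_conda_spec(requirement)
            if spec not in specs:  # keep order, drop duplicates
                specs.append(spec)
    return [f"{indent}- {spec}" for spec in specs]


if __name__ == "__main__":
    print("\n".join(environment_lines(["performance", "parquet", "plot"])))
```

The sketch only handles plain `name>=version` strings (no environment markers or nested extras). Both tables still have to be maintained by hand, which is the duplication this issue is about.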

Additional Context

Suggest defining both pandas-base (or -core) to match pandas exactly, then pandas that just depends on pandas-base but could be expanded with recommended but not mandatory dependencies, plus all of the non-development and non-complete extras from pyproject.toml.

I can work on this PR for the pandas feedstock if it's welcome.

update: Added alternative solution to describe what currently happens.

youcanbekingagain · 2023-04-06T15:47:46Z

(I am a beginner and this is my first issue.) I am not sure what to do exactly.
See https://setuptools.pypa.io/en/latest/userguide/dependency_management.html#optional-dependencies
I can make changes in setup.py such as
extras_require = { 'plotter': ['matplotlib', 'seaborn', 'plotly'] }
and changes in pyproject.toml such as
[project.optional-dependencies]
performance = ["performance"]
I don't understand the different commands for pandas-core and pandas-base that you asked about, as I can see only one project in pyproject.toml. I also don't understand the part about the pandas feedstock.

lithomas1 · 2023-04-09T22:56:28Z

Sorry, but I am -1 on this. I think we have way too many groups for this to be sustainable.

I would rather conda actually add proper support for this (xref conda/conda#11053).

I'll leave this issue open, though, if any other people want to give feedback.

jamesmyatt · 2023-04-11T15:41:15Z

Thanks for your comment @lithomas1 , but I'm not sure that waiting for conda to implement some significant new functionality is a viable alternative, when there is a well functioning, widely adopted alternative of using conda subpackages. Besides, I don't see that proposal avoiding duplication since you need to manually map pypi packages to conda ones anyway. And there's no single source of truth for the minimum dependency versions anyway since they're also duplicated in the docs: https://github.com/pandas-dev/pandas/blob/main/doc/source/getting_started/install.rst.

Like you, I also don't like the duplication between the pyproject file and the conda recipe, but much more than that I don't like having to go to the pandas docs every time I start a new project and work out the minimum versions of all of the dependencies I want and then write this in my environment.yml file.

  - pandas >=2.0.0
  - bottleneck >=1.3.4
  - numba >=0.55.2
  - numexpr >=2.8.0
  - pyarrow >=7.0.0
  - matplotlib-base >=3.6.1  # Note this is not matplotlib which includes more optional dependencies

when I could just write this instead

  - pandas-performance >=2.0.0
  - pandas-parquet >=2.0.0
  - pandas-plot >=2.0.0

Nor is ignoring conda users a good idea. The pip extras were added in pandas 2.0 for a very good reason: they save a lot of people time and make pandas much more user-friendly. #39164.

A more valuable package manager change would be to allow pip check to check arbitrary install specs, e.g. pip check pandas[performance] rather than just checking that the current environment is consistent, since then the tests in the recipe would be able to check against the right extras. But again, I don't see waiting for another slow-moving project to make changes as a viable strategy.
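
As a rough illustration of the first step such a check would need, the Python sketch below selects the requirements that belong to one extra from a list of requirement strings. This is not how pip check works and not an existing pip feature. The function name `requirements_for_extra` and the sample strings (including the numpy pin) are made up for the example, and the marker handling is a simple regex on `extra == "..."` rather than a full parser for environment markers.

```python
import re

# Matches extra == "name" or extra == 'name' inside an environment marker.
_EXTRA = re.compile(r"""extra\s*==\s*['"]([^'"]+)['"]""")


def requirements_for_extra(requires, extra):
    """Return the requirement strings that are only needed for `extra`."""
    selected = []
    for entry in requires:
        # Everything after the first ';' is the environment marker.
        requirement, _, marker = entry.partition(";")
        if extra in _EXTRA.findall(marker):
            selected.append(requirement.strip())
    return selected


if __name__ == "__main__":
    # Hypothetical sample in the style of Requires-Dist metadata entries.
    sample = [
        "numpy>=1.21.0",
        'bottleneck>=1.3.4; extra == "performance"',
        'pyarrow>=7.0.0; extra == "parquet"',
    ]
    print(requirements_for_extra(sample, "performance"))
```

With pandas installed, `importlib.metadata.requires("pandas")` returns strings of this kind (or `None` if the metadata lists no requirements), so its result could be passed in as `requires`. Comparing the selected requirements against the installed versions would additionally need a version-specifier parser, which is left out here.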

lithomas1 · 2023-04-11T18:44:00Z

Thanks for the feedback. I don't think my opinion has changed, though.

> Thanks for your comment @lithomas1 , but I'm not sure that waiting for conda to implement some significant new functionality is a viable alternative, when there is a well functioning, widely adopted alternative of using conda subpackages. Besides, I don't see that proposal avoiding duplication since you need to manually map pypi packages to conda ones anyway. And there's no single source of truth for the minimum dependency versions anyway since they're also duplicated in the docs: https://github.com/pandas-dev/pandas/blob/main/doc/source/getting_started/install.rst.
>
> Like you, I also don't like the duplication between the pyproject file and the conda recipe, but much more than that I don't like having to go to the pandas docs every time I start a new project and work out the minimum versions of all of the dependencies I want and then write this in my environment.yml file.

This doesn't solve the issue, but you can try looking through our CI env files, which are all under the ci/deps folder (e.g. https://github.com/pandas-dev/pandas/blob/main/ci/deps/actions-38.yaml). I believe all but a couple of the dependencies there specify a minimum version.

>   - pandas >=2.0.0
>   - bottleneck >=1.3.4
>   - numba >=0.55.2
>   - numexpr >=2.8.0
>   - pyarrow >=7.0.0
>   - matplotlib-base >=3.6.1  # Note this is not matplotlib which includes more optional dependencies
>
> when I could just write this instead
>
>   - pandas-performance >=2.0.0
>   - pandas-parquet >=2.0.0
>   - pandas-plot >=2.0.0

My main gripe is that this clutters up the conda-forge channel (xref conda-forge/conda-forge.github.io#1558).

> Nor is ignoring conda users a good idea. The pip extras were added in pandas 2.0 for a very good reason: they save a lot of people time and make pandas much more user-friendly. #39164.

I understand that this is annoying, but I don't think it's reasonable or sustainable to ask every project with conda packages and pip extras to hack around the issue like this.

This is not a pandas-specific problem, but a conda problem, and I would like it fixed in the right place.

jamesmyatt added the Enhancement and Needs Triage labels on Apr 6, 2023

DeaMariaLeon added the Needs Discussion label and removed the Needs Triage label on Apr 9, 2023
