[RFC] [ci] Dask compatibility with Python 3.8 and Pandas 2.0 #6030

Closed
shiyu1994 opened this issue Aug 11, 2023 · 8 comments · Fixed by #6032

@shiyu1994
Collaborator

Description

Dask-core version 2022.12.1 is not compatible with latest pandas versions (>= 2.0). For example, the CI job fails with the combination of dask-core-2022.12.1 and pandas-2.0.3.
https://github.com/microsoft/LightGBM/actions/runs/5820545736/job/15787141680?pr=6028
To fix this issue, we want to upgrade dask to a newer version. This job runs with Python 3.8. However, I did not figure out the way to install latest dask for Python 3.8, and the dask documentation says that it supports only Python 3.9, 3.10, and 3.11.
https://docs.dask.org/en/latest/develop.html#python-versions
Python 3.8 support for dask was dropped in May 2023: https://docs.dask.org/en/stable/changelog.html#v2023-5-1

So it seems that we have two choices:

  1. Keep Python 3.8 support and CI jobs, and downgrade pandas to 1.5.3 in the CI job for Python 3.8.
  2. Drop support for Python 3.8.

Would like to hear your opinions @jameslamb @jmoralez @guolinke @StrikerRUS

@shiyu1994
Collaborator Author

Linking #6028, where CI fails because of this.

@guolinke
Collaborator

I prefer option 1.

@jameslamb
Collaborator

Thanks for opening this!

It's helpful to include the exact text of any logs / error messages that lead to a conclusion like "Dask-core version 2022.12.1 is not compatible with latest pandas versions (>= 2.0)", instead of just links to CI. That way, this issue and discussion can be found by others facing the same problem.

Especially since those links only last for a few days before CI services tend to delete the builds.

Here's what I see:

E AttributeError: module 'pandas.core.strings' has no attribute 'StringMethods'

This looks like exactly the issue I reported in dask/dask a few months ago: dask/dask#10164
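To confirm locally whether a given environment is affected, one option is to check for the attribute named in the traceback. The following is only a minimal sketch: `module_has_attr` is a hypothetical helper written for this illustration, not part of pandas, Dask, or LightGBM.

```python
import importlib


def module_has_attr(module_name, attr_name):
    """Return True/False if the module imports, or None if it is not installed.

    Hypothetical helper for illustration only.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None  # the module (or its parent package) is not available here
    return hasattr(module, attr_name)


# The module and attribute below are taken from the AttributeError above.
print(module_has_attr("pandas.core.strings", "StringMethods"))
```

A result of `False` would mean the installed pandas no longer exposes the attribute that the older Dask release tries to access, which matches the error above.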


"I did not figure out the way to install latest dask for Python 3.8"

I think it's important to note that not all Python 3.8 jobs are failing.

❌ On #6028, only the mpi wheel (macOS-latest, Python 3.8) job is failing. (build link)

dask-core-2022.12.1        |     pyhd8ed1ab_0         806 KB  conda-forge
pandas-2.0.3               |   py38h78e6021_1        11.0 MB  conda-forge
python-3.8.17              |hf9b03c3_0_cpython        13.4 MB  conda-forge

Other Python 3.8 jobs are succeeding:

✅ (Azure) Linux_latest_mpi_wheel: (build link)

dask-core-2023.5.0         |     pyhd8ed1ab_0         825 KB  conda-forge
pandas-2.0.3               |   py38h01efb38_1        11.8 MB  conda-forge
python-3.8.17              |he550d4f_0_cpython        23.4 MB  conda-forge

✅ (Azure) Linux_latest_gpu_wheel (build link)

dask-core-2023.5.0         |     pyhd8ed1ab_0         825 KB  conda-forge
pandas-2.0.3               |   py38h01efb38_1        11.8 MB  conda-forge
python-3.8.17              |he550d4f_0_cpython        23.4 MB  conda-forge

✅ (Azure) Linux bdist (build link)

dask-core-2023.5.0         |     pyhd8ed1ab_0         825 KB  conda-forge
pandas-2.0.3               |   py38h01efb38_1        11.8 MB  conda-forge
python-3.8.17              |he550d4f_0_cpython        23.4 MB  conda-forge

✅ (Azure) Linux mpi_source (build link)

dask-core-2023.5.0         |     pyhd8ed1ab_0         825 KB  conda-forge
pandas-2.0.3               |   py38h01efb38_1        11.8 MB  conda-forge
python-3.8.17              |he550d4f_0_cpython        23.4 MB  conda-forge
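Comparing these listings by hand is error-prone, so here is a rough sketch of how the rows could be parsed and checked. All function names are hypothetical, and the `(2023, 5, 0)` floor is simply the Dask version seen in the passing jobs above, not a verified minimum for pandas 2.x.

```python
def parse_conda_line(line):
    """Split one 'name-version | build  size unit  channel' row into a dict."""
    left, right = (part.strip() for part in line.split("|", 1))
    name, version = left.rsplit("-", 1)  # package names may contain '-'
    build, size, unit, channel = right.split()
    return {"name": name, "version": version, "build": build,
            "size": f"{size} {unit}", "channel": channel}


def version_tuple(version):
    """Turn '2023.5.0' into (2023, 5, 0); assumes purely numeric components."""
    return tuple(int(piece) for piece in version.split("."))


def dask_too_old_for_pandas2(lines, dask_floor=(2023, 5, 0)):
    """True if pandas is >= 2.0 while dask-core is below the assumed floor."""
    pkgs = {p["name"]: p for p in map(parse_conda_line, lines)}
    dask = version_tuple(pkgs["dask-core"]["version"])
    pandas = version_tuple(pkgs["pandas"]["version"])
    return pandas >= (2, 0) and dask < dask_floor


# Rows copied from the failing macOS job and a passing Linux job above.
failing = [
    "dask-core-2022.12.1        |     pyhd8ed1ab_0         806 KB  conda-forge",
    "pandas-2.0.3               |   py38h78e6021_1        11.0 MB  conda-forge",
    "python-3.8.17              |hf9b03c3_0_cpython        13.4 MB  conda-forge",
]
passing = [
    "dask-core-2023.5.0         |     pyhd8ed1ab_0         825 KB  conda-forge",
    "pandas-2.0.3               |   py38h01efb38_1        11.8 MB  conda-forge",
    "python-3.8.17              |he550d4f_0_cpython        23.4 MB  conda-forge",
]
print(dask_too_old_for_pandas2(failing), dask_too_old_for_pandas2(passing))
```

The only package difference this surfaces between the two environments is the dask-core version.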

I suspect that maybe there's an issue with the conda-forge metadata of one of the dependencies that ends up getting installed in that mpi wheel (macOS-latest, Python 3.8) CI job. That's the only CI job LightGBM has that tests Python 3.8 on macOS... so I'm guessing it's specific to one of the macOS release channels for conda-forge.

Such things do sometimes resolve themselves. I just re-triggered the one failing job on #6028: https://github.com/microsoft/LightGBM/actions/runs/5820545736/job/15820939242?pr=6028. Let's see if it builds successfully.

If not, then I'd support modifying these lines

LightGBM/.ci/test.sh

Lines 125 to 126 in 20975ba

dask-core \
distributed \

to

'dask-core>=2023.5.0' \
'distributed>=2023.5.0' \

to try to force conda to find a solution that involves Dask versions new enough to work with pandas>=2.0.

I do not support dropping Python 3.8 support in LightGBM over this.

@shiyu1994
Collaborator Author

@jameslamb Good suggestions. It would be a perfect solution if we could get dask-core>=2023.5.0 installed for Python 3.8. I tried to do this on my own dev machine, but it failed. Maybe we can open a PR to see if it works on the macOS agents.

$ conda create -q -n py-env python=3.8 'dask-core>=2023.5.0'  pandas 
Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... 
Found conflicts! Looking for incompatible packages.
This can take several minutes.  Press CTRL-C to abort.
failed

UnsatisfiableError: The following specifications were found to be incompatible with each other:

Output in format: Requested package -> Available versions

Package libuuid conflicts for:
pandas -> python[version='>=3.11,<3.12.0a0'] -> libuuid[version='>=1.0.3,<2.0a0|>=1.41.5,<2.0a0']
dask-core[version='>=2023.5.0'] -> python[version='>=3.10,<3.11.0a0'] -> libuuid[version='>=1.0.3,<2.0a0|>=1.41.5,<2.0a0']

Package python conflicts for:
python=3.8
dask-core[version='>=2023.5.0'] -> click[version='>=8.0'] -> python[version='>=3.5|>=3.6|>=3.7,<3.8.0a0|>=3.8,<3.9.0a0|>=3.7|>=3.6,<3.7.0a0']
dask-core[version='>=2023.5.0'] -> python[version='>=3.10,<3.11.0a0|>=3.9,<3.10.0a0|>=3.11,<3.12.0a0']

Package ca-certificates conflicts for:
python=3.8 -> openssl[version='>=3.0.9,<4.0a0'] -> ca-certificates
pandas -> python[version='>=2.7,<2.8.0a0'] -> ca-certificatesThe following specifications were found to be incompatible with your system:

  - feature:/linux-64::__glibc==2.27=0
  - feature:|@/linux-64::__glibc==2.27=0
  - python=3.8 -> libgcc-ng[version='>=11.2.0'] -> __glibc[version='>=2.17']

Your installed version is: 2.27
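For readers unfamiliar with the `python[version='...']` notation in the report above: within a conda version spec, commas join conditions that must all hold, and `|` separates alternatives. The sketch below evaluates such a spec against a candidate Python version. It is a simplified illustration, not conda's own matcher: `satisfies` and its helpers are hypothetical names, and pre-release suffixes such as `a0` are dropped, so `3.11.0a0` is treated as `3.11.0`.

```python
import operator
import re

OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq,
       ">": operator.gt, "<": operator.lt}


def to_tuple(version, width=3):
    """'3.11.0a0' -> (3, 11, 0): keep leading digits, pad with zeros (approximation)."""
    parts = [int(re.match(r"\d+", piece).group()) for piece in version.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def _check(candidate, clause):
    """Evaluate a single clause such as '>=3.9' against a version tuple."""
    match = re.fullmatch(r"(>=|<=|==|>|<)(.+)", clause.strip())
    if match is None:
        raise ValueError(f"unsupported clause: {clause!r}")
    op, bound = match.groups()
    return OPS[op](candidate, to_tuple(bound))


def satisfies(candidate, spec):
    """True if any '|' alternative has all of its ',' clauses satisfied."""
    cand = to_tuple(candidate)
    return any(
        all(_check(cand, clause) for clause in alternative.split(","))
        for alternative in spec.split("|")
    )


# Spec copied from the 'Package python conflicts for' section above.
spec = ">=3.10,<3.11.0a0|>=3.9,<3.10.0a0|>=3.11,<3.12.0a0"
print(satisfies("3.8", spec), satisfies("3.9", spec))
```

Read this way, none of the alternatives reported for `dask-core[version='>=2023.5.0']` admits Python 3.8. Keep in mind that conda's conflict report lists candidate conflicts and does not always point at the root cause.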

@jameslamb
Collaborator

Ah interesting! That's very helpful. I can do some investigation today and open up a PR trying different combinations.

@shiyu1994
Collaborator Author

I've opened a PR. You may use that branch for the trials as well. #6032

@jameslamb
Collaborator

oh great, thank you!

shiyu1994 added a commit that referenced this issue Aug 11, 2023
… (#6032)

* enforce dask version to be >=2023.5.0

* fix Python 3.7 builds

---------

Co-authored-by: James Lamb <jaylamb20@gmail.com>

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot removed the blocking label Nov 15, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 15, 2023