
WIP: [ci] remove pin on dask and distributed in CI (fixes #4285) #4307

Closed
wants to merge 5 commits into from

jameslamb
Collaborator

@jameslamb jameslamb commented May 20, 2021

#4288 introduced explicit pins on dask and distributed in this project's CI jobs, due to dask/distributed#4819.

Starting from dask / distributed 2021.5.0, the Dask maintainers decided to have distributed pin its dask requirement to an exact version (dask=={version}) to avoid such issues in the future. See dask/community#155 (comment)
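With that exact pin in place, dask and distributed should always report the same version, so a mismatch points at the package channel rather than at LightGBM. A minimal sketch of a fail-fast check a CI step could run, assuming Python 3.8+ for the standard library's `importlib.metadata`; `check_dask_versions` is a hypothetical helper, not something LightGBM's CI currently has:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict


def check_dask_versions(lookup: Callable[[str], str] = version) -> Dict[str, str]:
    """Return the installed versions of dask and distributed.

    Raises RuntimeError if either is missing or if the two versions differ.
    ``lookup`` defaults to importlib.metadata.version; it is injectable
    (hypothetical design choice) so the logic can be exercised offline.
    """
    found: Dict[str, str] = {}
    for name in ("dask", "distributed"):
        try:
            found[name] = lookup(name)
        except PackageNotFoundError:
            raise RuntimeError(f"{name} is not installed") from None
    if found["dask"] != found["distributed"]:
        raise RuntimeError(
            f"dask {found['dask']} does not match distributed {found['distributed']}"
        )
    return found
```

Called with no arguments, it reads the installed distributions of the current Python environment.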

dask-core and distributed 2021.5.0 are now available from the default conda channels. As soon as dask is available there as well, I think the workaround from #4288 can be reverted.

Just opening this PR for visibility. I think it will fail until dask is uploaded too.

@jameslamb
Collaborator Author

I checked https://repo.anaconda.com/pkgs/main/noarch/ this morning and dask=2021.5.0 is still not up 😞

@jameslamb
Collaborator Author

I've learned some interesting things investigating this tonight, but I'm still really unsure what is happening.

On conda channels, there are two packages:

  • dask-core = the dask library with minimal dependencies
  • dask = the dask library + dependencies like pandas (for dask.dataframe) and numpy (for dask.array)

This is described at https://docs.dask.org/en/latest/install.html#conda.

So, said another way, the conda package dask is just a convenience. Since LightGBM's CI already installs numpy and pandas, I thought we could just run conda install dask-core instead of conda install dask, and that this would avoid problems with dask and distributed being uploaded many days apart. dask-core often seems to be uploaded at roughly the same time as distributed (see the links in this PR's description).

However, I've found that a dependency on the conda package dask sometimes still gets pulled in. I cannot figure out how to avoid that.

In a Python 3.8 environment, installing dask-core and then distributed results in no dask package, but in an old version of distributed (version 2.10, January 2020).

docker run \
    --env CONDA_ENV=base \
    -it continuumio/miniconda3 \
    /bin/bash \
        -c "conda install -q -y -n base dask-core && 
            conda install -q -y -n base distributed &&
            conda list --explicit --name base"
https://repo.anaconda.com/pkgs/main/noarch/dask-core-2021.5.0-pyhd3eb1b0_0.conda
https://repo.anaconda.com/pkgs/main/noarch/distributed-2.10.0-py_0.conda
full environment:
https://repo.anaconda.com/pkgs/main/linux-64/_libgcc_mutex-0.1-main.conda
https://repo.anaconda.com/pkgs/main/linux-64/ca-certificates-2021.5.25-h06a4308_1.conda
https://repo.anaconda.com/pkgs/main/linux-64/ld_impl_linux-64-2.33.1-h53a641e_7.conda
https://repo.anaconda.com/pkgs/main/linux-64/libstdcxx-ng-9.1.0-hdf63c60_0.conda
https://repo.anaconda.com/pkgs/main/linux-64/libgcc-ng-9.1.0-hdf63c60_0.conda
https://repo.anaconda.com/pkgs/main/linux-64/libffi-3.3-he6710b0_2.conda
https://repo.anaconda.com/pkgs/main/linux-64/ncurses-6.2-he6710b0_1.conda
https://repo.anaconda.com/pkgs/main/linux-64/openssl-1.1.1k-h27cfd23_0.conda
https://repo.anaconda.com/pkgs/main/linux-64/xz-5.2.5-h7b6447c_0.conda
https://repo.anaconda.com/pkgs/main/linux-64/yaml-0.2.5-h7b6447c_0.conda
https://repo.anaconda.com/pkgs/main/linux-64/zlib-1.2.11-h7b6447c_3.conda
https://repo.anaconda.com/pkgs/main/linux-64/libedit-3.1.20191231-h14c3975_1.conda
https://repo.anaconda.com/pkgs/main/linux-64/readline-8.0-h7b6447c_0.conda
https://repo.anaconda.com/pkgs/main/linux-64/tk-8.6.10-hbc83047_0.conda
https://repo.anaconda.com/pkgs/main/linux-64/sqlite-3.33.0-h62c20be_0.conda
https://repo.anaconda.com/pkgs/main/linux-64/python-3.8.5-h7579374_1.conda
https://repo.anaconda.com/pkgs/main/linux-64/certifi-2020.12.5-py38h06a4308_0.conda
https://repo.anaconda.com/pkgs/main/linux-64/chardet-3.0.4-py38h06a4308_1003.conda
https://repo.anaconda.com/pkgs/main/noarch/cloudpickle-1.6.0-py_0.conda
https://repo.anaconda.com/pkgs/main/noarch/fsspec-0.9.0-pyhd3eb1b0_0.conda
https://repo.anaconda.com/pkgs/main/noarch/heapdict-1.0.1-py_0.conda
https://repo.anaconda.com/pkgs/main/noarch/idna-2.10-py_0.conda
https://repo.anaconda.com/pkgs/main/linux-64/locket-0.2.1-py38h06a4308_1.conda
https://repo.anaconda.com/pkgs/main/linux-64/msgpack-python-1.0.2-py38hff7bd54_1.conda
https://repo.anaconda.com/pkgs/main/linux-64/psutil-5.8.0-py38h27cfd23_1.conda
https://repo.anaconda.com/pkgs/main/linux-64/pycosat-0.6.3-py38h7b6447c_1.conda
https://repo.anaconda.com/pkgs/main/noarch/pycparser-2.20-py_2.conda
https://repo.anaconda.com/pkgs/main/linux-64/pysocks-1.7.1-py38h06a4308_0.conda
https://repo.anaconda.com/pkgs/main/linux-64/pyyaml-5.4.1-py38h27cfd23_1.conda
https://repo.anaconda.com/pkgs/main/linux-64/ruamel_yaml-0.15.87-py38h7b6447c_1.conda
https://repo.anaconda.com/pkgs/main/linux-64/six-1.15.0-py38h06a4308_0.conda
https://repo.anaconda.com/pkgs/main/noarch/sortedcontainers-2.3.0-pyhd3eb1b0_0.conda
https://repo.anaconda.com/pkgs/main/noarch/tblib-1.7.0-py_0.conda
https://repo.anaconda.com/pkgs/main/noarch/toolz-0.11.1-pyhd3eb1b0_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/tornado-6.1-py38h27cfd23_0.conda
https://repo.anaconda.com/pkgs/main/noarch/tqdm-4.51.0-pyhd3eb1b0_0.conda
https://repo.anaconda.com/pkgs/main/noarch/wheel-0.35.1-pyhd3eb1b0_0.conda
https://repo.anaconda.com/pkgs/main/noarch/zipp-3.4.1-pyhd3eb1b0_0.conda
https://repo.anaconda.com/pkgs/main/linux-64/cffi-1.14.3-py38h261ae71_2.conda
https://repo.anaconda.com/pkgs/main/linux-64/conda-package-handling-1.7.2-py38h03888b9_0.conda
https://repo.anaconda.com/pkgs/main/linux-64/cytoolz-0.11.0-py38h7b6447c_0.conda
https://repo.anaconda.com/pkgs/main/linux-64/importlib-metadata-3.10.0-py38h06a4308_0.conda
https://repo.anaconda.com/pkgs/main/noarch/partd-1.2.0-pyhd3eb1b0_0.conda
https://repo.anaconda.com/pkgs/main/linux-64/setuptools-50.3.1-py38h06a4308_1.conda
https://repo.anaconda.com/pkgs/main/noarch/zict-2.0.0-pyhd3eb1b0_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/brotlipy-0.7.0-py38h27cfd23_1003.conda
https://repo.anaconda.com/pkgs/main/noarch/click-8.0.1-pyhd3eb1b0_0.conda
https://repo.anaconda.com/pkgs/main/linux-64/cryptography-3.2.1-py38h3c74f83_1.conda
https://repo.anaconda.com/pkgs/main/noarch/dask-core-2021.5.0-pyhd3eb1b0_0.conda
https://repo.anaconda.com/pkgs/main/linux-64/pip-20.2.4-py38h06a4308_0.conda
https://repo.anaconda.com/pkgs/main/noarch/distributed-2.10.0-py_0.conda
https://repo.anaconda.com/pkgs/main/noarch/pyopenssl-19.1.0-pyhd3eb1b0_1.conda
https://repo.anaconda.com/pkgs/main/noarch/urllib3-1.25.11-py_0.conda
https://repo.anaconda.com/pkgs/main/noarch/requests-2.24.0-py_0.conda
https://repo.anaconda.com/pkgs/main/linux-64/conda-4.10.1-py38h06a4308_1.conda

In a Python 3.8 environment, installing dask-core and distributed together pulls in the dask package as a dependency and results in mismatched versions: dask-core and dask 2021.4.0 alongside distributed 2021.5.0.

docker run \
    --env CONDA_ENV=base \
    -it continuumio/miniconda3 \
    /bin/bash \
        -c "conda install -q -y -n base dask-core distributed && 
            conda list --explicit --name base"
https://repo.anaconda.com/pkgs/main/noarch/dask-core-2021.4.0-pyhd3eb1b0_0.conda
https://repo.anaconda.com/pkgs/main/linux-64/distributed-2021.5.0-py38h06a4308_0.conda
https://repo.anaconda.com/pkgs/main/noarch/dask-2021.4.0-pyhd3eb1b0_0.conda
full environment:
https://repo.anaconda.com/pkgs/main/linux-64/_libgcc_mutex-0.1-main.conda
https://repo.anaconda.com/pkgs/main/linux-64/blas-1.0-mkl.conda
https://repo.anaconda.com/pkgs/main/linux-64/ca-certificates-2021.5.25-h06a4308_1.conda
https://repo.anaconda.com/pkgs/main/linux-64/intel-openmp-2021.2.0-h06a4308_610.conda
https://repo.anaconda.com/pkgs/main/linux-64/ld_impl_linux-64-2.33.1-h53a641e_7.conda
https://repo.anaconda.com/pkgs/main/linux-64/libstdcxx-ng-9.1.0-hdf63c60_0.conda
https://repo.anaconda.com/pkgs/main/linux-64/libgcc-ng-9.1.0-hdf63c60_0.conda
https://repo.anaconda.com/pkgs/main/linux-64/mkl-2021.2.0-h06a4308_296.conda
https://repo.anaconda.com/pkgs/main/linux-64/jpeg-9b-h024ee3a_2.conda
https://repo.anaconda.com/pkgs/main/linux-64/libffi-3.3-he6710b0_2.conda
https://repo.anaconda.com/pkgs/main/linux-64/libwebp-base-1.2.0-h27cfd23_0.conda
https://repo.anaconda.com/pkgs/main/linux-64/lz4-c-1.9.3-h2531618_0.conda
https://repo.anaconda.com/pkgs/main/linux-64/ncurses-6.2-he6710b0_1.conda
https://repo.anaconda.com/pkgs/main/linux-64/openssl-1.1.1k-h27cfd23_0.conda
https://repo.anaconda.com/pkgs/main/linux-64/xz-5.2.5-h7b6447c_0.conda
https://repo.anaconda.com/pkgs/main/linux-64/yaml-0.2.5-h7b6447c_0.conda
https://repo.anaconda.com/pkgs/main/linux-64/zlib-1.2.11-h7b6447c_3.conda
https://repo.anaconda.com/pkgs/main/linux-64/libedit-3.1.20191231-h14c3975_1.conda
https://repo.anaconda.com/pkgs/main/linux-64/libpng-1.6.37-hbc83047_0.conda
https://repo.anaconda.com/pkgs/main/linux-64/readline-8.0-h7b6447c_0.conda
https://repo.anaconda.com/pkgs/main/linux-64/tk-8.6.10-hbc83047_0.conda
https://repo.anaconda.com/pkgs/main/linux-64/zstd-1.4.9-haebb681_0.conda
https://repo.anaconda.com/pkgs/main/linux-64/freetype-2.10.4-h5ab3b9f_0.conda
https://repo.anaconda.com/pkgs/main/linux-64/libtiff-4.2.0-h85742a9_0.conda
https://repo.anaconda.com/pkgs/main/linux-64/sqlite-3.33.0-h62c20be_0.conda
https://repo.anaconda.com/pkgs/main/linux-64/lcms2-2.12-h3be6417_0.conda
https://repo.anaconda.com/pkgs/main/linux-64/python-3.8.5-h7579374_1.conda
https://repo.anaconda.com/pkgs/main/linux-64/certifi-2020.12.5-py38h06a4308_0.conda
https://repo.anaconda.com/pkgs/main/linux-64/chardet-3.0.4-py38h06a4308_1003.conda
https://repo.anaconda.com/pkgs/main/noarch/cloudpickle-1.6.0-py_0.conda
https://repo.anaconda.com/pkgs/main/noarch/fsspec-0.9.0-pyhd3eb1b0_0.conda
https://repo.anaconda.com/pkgs/main/noarch/heapdict-1.0.1-py_0.conda
https://repo.anaconda.com/pkgs/main/noarch/idna-2.10-py_0.conda
https://repo.anaconda.com/pkgs/main/linux-64/locket-0.2.1-py38h06a4308_1.conda
https://repo.anaconda.com/pkgs/main/linux-64/markupsafe-2.0.1-py38h27cfd23_0.conda
https://repo.anaconda.com/pkgs/main/linux-64/msgpack-python-1.0.2-py38hff7bd54_1.conda
https://repo.anaconda.com/pkgs/main/noarch/olefile-0.46-py_0.conda
https://repo.anaconda.com/pkgs/main/linux-64/psutil-5.8.0-py38h27cfd23_1.conda
https://repo.anaconda.com/pkgs/main/linux-64/pycosat-0.6.3-py38h7b6447c_1.conda
https://repo.anaconda.com/pkgs/main/noarch/pycparser-2.20-py_2.conda
https://repo.anaconda.com/pkgs/main/noarch/pyparsing-2.4.7-pyhd3eb1b0_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/pysocks-1.7.1-py38h06a4308_0.conda
https://repo.anaconda.com/pkgs/main/noarch/pytz-2021.1-pyhd3eb1b0_0.conda
https://repo.anaconda.com/pkgs/main/linux-64/pyyaml-5.4.1-py38h27cfd23_1.conda
https://repo.anaconda.com/pkgs/main/linux-64/ruamel_yaml-0.15.87-py38h7b6447c_1.conda
https://repo.anaconda.com/pkgs/main/linux-64/six-1.15.0-py38h06a4308_0.conda
https://repo.anaconda.com/pkgs/main/noarch/sortedcontainers-2.3.0-pyhd3eb1b0_0.conda
https://repo.anaconda.com/pkgs/main/noarch/tblib-1.7.0-py_0.conda
https://repo.anaconda.com/pkgs/main/noarch/toolz-0.11.1-pyhd3eb1b0_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/tornado-6.1-py38h27cfd23_0.conda
https://repo.anaconda.com/pkgs/main/noarch/tqdm-4.51.0-pyhd3eb1b0_0.conda
https://repo.anaconda.com/pkgs/main/noarch/typing_extensions-3.7.4.3-pyha847dfd_0.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/wheel-0.35.1-pyhd3eb1b0_0.conda
https://repo.anaconda.com/pkgs/main/noarch/zipp-3.4.1-pyhd3eb1b0_0.conda
https://repo.anaconda.com/pkgs/main/linux-64/cffi-1.14.3-py38h261ae71_2.conda
https://repo.anaconda.com/pkgs/main/linux-64/conda-package-handling-1.7.2-py38h03888b9_0.conda
https://repo.anaconda.com/pkgs/main/linux-64/cytoolz-0.11.0-py38h7b6447c_0.conda
https://repo.anaconda.com/pkgs/main/linux-64/importlib-metadata-3.10.0-py38h06a4308_0.conda
https://repo.anaconda.com/pkgs/main/linux-64/mkl-service-2.3.0-py38h27cfd23_1.conda
https://repo.anaconda.com/pkgs/main/noarch/packaging-20.9-pyhd3eb1b0_0.conda
https://repo.anaconda.com/pkgs/main/noarch/partd-1.2.0-pyhd3eb1b0_0.conda
https://repo.anaconda.com/pkgs/main/linux-64/pillow-8.2.0-py38he98fc37_0.conda
https://repo.anaconda.com/pkgs/main/noarch/python-dateutil-2.8.1-pyhd3eb1b0_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/setuptools-50.3.1-py38h06a4308_1.conda
https://repo.anaconda.com/pkgs/main/noarch/zict-2.0.0-pyhd3eb1b0_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/brotlipy-0.7.0-py38h27cfd23_1003.conda
https://repo.anaconda.com/pkgs/main/noarch/click-8.0.1-pyhd3eb1b0_0.conda
https://repo.anaconda.com/pkgs/main/linux-64/cryptography-3.2.1-py38h3c74f83_1.conda
https://repo.anaconda.com/pkgs/main/noarch/dask-core-2021.4.0-pyhd3eb1b0_0.conda
https://repo.anaconda.com/pkgs/main/noarch/jinja2-3.0.0-pyhd3eb1b0_0.conda
https://repo.anaconda.com/pkgs/main/linux-64/numpy-base-1.20.2-py38hfae3a4d_0.conda
https://repo.anaconda.com/pkgs/main/linux-64/pip-20.2.4-py38h06a4308_0.conda
https://repo.anaconda.com/pkgs/main/noarch/pyopenssl-19.1.0-pyhd3eb1b0_1.conda
https://repo.anaconda.com/pkgs/main/noarch/urllib3-1.25.11-py_0.conda
https://repo.anaconda.com/pkgs/main/noarch/requests-2.24.0-py_0.conda
https://repo.anaconda.com/pkgs/main/linux-64/conda-4.10.1-py38h06a4308_1.conda
https://repo.anaconda.com/pkgs/main/linux-64/bokeh-2.3.2-py38h06a4308_0.conda
https://repo.anaconda.com/pkgs/main/linux-64/distributed-2021.5.0-py38h06a4308_0.conda
https://repo.anaconda.com/pkgs/main/linux-64/mkl_fft-1.3.0-py38h42c9631_2.conda
https://repo.anaconda.com/pkgs/main/linux-64/mkl_random-1.2.1-py38ha9443f7_2.conda
https://repo.anaconda.com/pkgs/main/linux-64/numpy-1.20.2-py38h2d18471_0.conda
https://repo.anaconda.com/pkgs/main/linux-64/pandas-1.2.4-py38h2531618_0.conda
https://repo.anaconda.com/pkgs/main/noarch/dask-2021.4.0-pyhd3eb1b0_0.conda

I also don't know of a good channel to report this to. Unlike the dask, dask-core, and distributed packages on conda-forge, the ones on Anaconda's main channel are not managed by Dask's maintainers, and they don't have much influence there: dask/distributed#4819 (comment).

Note that when installing from conda-forge instead, no transitive dependency on dask is introduced, and the newest releases of dask-core and distributed (2021.5.1) are installed.

docker run \
    --env CONDA_ENV=base \
    -it continuumio/miniconda3 \
    /bin/bash \
        -c "conda install -q -y -c conda-forge -n base dask-core distributed && 
            conda list --explicit --name base"
https://conda.anaconda.org/conda-forge/noarch/dask-core-2021.5.1-pyhd8ed1ab_0.tar.bz2
https://conda.anaconda.org/conda-forge/linux-64/distributed-2021.5.1-py38h578d9bd_0.tar.bz2
full environment:
https://conda.anaconda.org/conda-forge/linux-64/_libgcc_mutex-0.1-conda_forge.tar.bz2
https://conda.anaconda.org/conda-forge/linux-64/ca-certificates-2021.5.30-ha878542_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/ld_impl_linux-64-2.33.1-h53a641e_7.conda
https://conda.anaconda.org/conda-forge/linux-64/libstdcxx-ng-9.3.0-h6de172a_19.tar.bz2
https://conda.anaconda.org/conda-forge/linux-64/libgomp-9.3.0-h2828fa1_19.tar.bz2
https://conda.anaconda.org/conda-forge/linux-64/_openmp_mutex-4.5-1_gnu.tar.bz2
https://conda.anaconda.org/conda-forge/linux-64/libgcc-ng-9.3.0-h2828fa1_19.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/libffi-3.3-he6710b0_2.conda
https://repo.anaconda.com/pkgs/main/linux-64/ncurses-6.2-he6710b0_1.conda
https://conda.anaconda.org/conda-forge/linux-64/openssl-1.1.1k-h7f98852_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/xz-5.2.5-h7b6447c_0.conda
https://repo.anaconda.com/pkgs/main/linux-64/yaml-0.2.5-h7b6447c_0.conda
https://repo.anaconda.com/pkgs/main/linux-64/zlib-1.2.11-h7b6447c_3.conda
https://repo.anaconda.com/pkgs/main/linux-64/libedit-3.1.20191231-h14c3975_1.conda
https://repo.anaconda.com/pkgs/main/linux-64/readline-8.0-h7b6447c_0.conda
https://repo.anaconda.com/pkgs/main/linux-64/tk-8.6.10-hbc83047_0.conda
https://repo.anaconda.com/pkgs/main/linux-64/sqlite-3.33.0-h62c20be_0.conda
https://repo.anaconda.com/pkgs/main/linux-64/python-3.8.5-h7579374_1.conda
https://repo.anaconda.com/pkgs/main/linux-64/chardet-3.0.4-py38h06a4308_1003.conda
https://conda.anaconda.org/conda-forge/noarch/cloudpickle-1.6.0-py_0.tar.bz2
https://conda.anaconda.org/conda-forge/noarch/fsspec-2021.5.0-pyhd8ed1ab_0.tar.bz2
https://conda.anaconda.org/conda-forge/noarch/heapdict-1.0.1-py_0.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/idna-2.10-py_0.conda
https://conda.anaconda.org/conda-forge/noarch/locket-0.2.0-py_2.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/pycosat-0.6.3-py38h7b6447c_1.conda
https://repo.anaconda.com/pkgs/main/noarch/pycparser-2.20-py_2.conda
https://repo.anaconda.com/pkgs/main/linux-64/pysocks-1.7.1-py38h06a4308_0.conda
https://conda.anaconda.org/conda-forge/linux-64/python_abi-3.8-1_cp38.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/ruamel_yaml-0.15.87-py38h7b6447c_1.conda
https://repo.anaconda.com/pkgs/main/linux-64/six-1.15.0-py38h06a4308_0.conda
https://conda.anaconda.org/conda-forge/noarch/sortedcontainers-2.4.0-pyhd8ed1ab_0.tar.bz2
https://conda.anaconda.org/conda-forge/noarch/tblib-1.7.0-pyhd8ed1ab_0.tar.bz2
https://conda.anaconda.org/conda-forge/noarch/toolz-0.11.1-py_0.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/tqdm-4.51.0-pyhd3eb1b0_0.conda
https://repo.anaconda.com/pkgs/main/noarch/wheel-0.35.1-pyhd3eb1b0_0.conda
https://conda.anaconda.org/conda-forge/linux-64/certifi-2021.5.30-py38h578d9bd_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/cffi-1.14.3-py38h261ae71_2.conda
https://conda.anaconda.org/conda-forge/linux-64/click-8.0.1-py38h578d9bd_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/conda-package-handling-1.7.2-py38h03888b9_0.conda
https://conda.anaconda.org/conda-forge/linux-64/cytoolz-0.11.0-py38h497a2fe_3.tar.bz2
https://conda.anaconda.org/conda-forge/linux-64/msgpack-python-1.0.2-py38h1fd1430_1.tar.bz2
https://conda.anaconda.org/conda-forge/noarch/partd-1.2.0-pyhd8ed1ab_0.tar.bz2
https://conda.anaconda.org/conda-forge/linux-64/psutil-5.8.0-py38h497a2fe_1.tar.bz2
https://conda.anaconda.org/conda-forge/linux-64/pyyaml-5.4.1-py38h497a2fe_0.tar.bz2
https://conda.anaconda.org/conda-forge/linux-64/tornado-6.1-py38h497a2fe_1.tar.bz2
https://conda.anaconda.org/conda-forge/noarch/zict-2.0.0-py_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/brotlipy-0.7.0-py38h27cfd23_1003.conda
https://repo.anaconda.com/pkgs/main/linux-64/cryptography-3.2.1-py38h3c74f83_1.conda
https://conda.anaconda.org/conda-forge/noarch/dask-core-2021.5.1-pyhd8ed1ab_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/setuptools-50.3.1-py38h06a4308_1.conda
https://conda.anaconda.org/conda-forge/linux-64/distributed-2021.5.1-py38h578d9bd_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/pip-20.2.4-py38h06a4308_0.conda
https://repo.anaconda.com/pkgs/main/noarch/pyopenssl-19.1.0-pyhd3eb1b0_1.conda
https://repo.anaconda.com/pkgs/main/noarch/urllib3-1.25.11-py_0.conda
https://repo.anaconda.com/pkgs/main/noarch/requests-2.24.0-py_0.conda
https://conda.anaconda.org/conda-forge/linux-64/conda-4.10.1-py38h578d9bd_0.tar.bz2
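The outputs above differ in exactly the way that matters here: the default channel paired dask-core 2021.4.0 with distributed 2021.5.0, while conda-forge resolved 2021.5.1 for both. A small parser over `conda list --explicit` output could flag the first case automatically. This is a sketch, assuming conda's file-name convention `<name>-<version>-<build>` (version and build strings contain no hyphens); the function names are hypothetical:

```python
from typing import Dict, Iterable, Tuple

# File suffixes that appear in `conda list --explicit` package URLs.
_SUFFIXES = (".conda", ".tar.bz2")
# Packages whose versions are expected to move together.
_DASK_PACKAGES = {"dask", "dask-core", "distributed"}


def parse_explicit_url(url: str) -> Tuple[str, str, str]:
    """Split a package URL into (name, version, build).

    Splitting from the right keeps hyphenated names such as dask-core intact,
    because version and build strings do not contain hyphens.
    """
    filename = url.rsplit("/", 1)[-1]
    for suffix in _SUFFIXES:
        if filename.endswith(suffix):
            filename = filename[: -len(suffix)]
            break
    name, version, build = filename.rsplit("-", 2)
    return name, version, build


def dask_versions(lines: Iterable[str]) -> Dict[str, str]:
    """Map each dask-related package found in the listing to its version."""
    found: Dict[str, str] = {}
    for line in lines:
        line = line.strip()
        if not line.startswith("http"):
            continue  # skip '# platform: ...', '@EXPLICIT' and blank lines
        name, version, _build = parse_explicit_url(line)
        if name in _DASK_PACKAGES:
            found[name] = version
    return found


def is_consistent(versions: Dict[str, str]) -> bool:
    """True when all dask-related packages that were found share one version."""
    return len(set(versions.values())) <= 1
```

Fed the saved output of `conda list --explicit --name base`, `is_consistent(dask_versions(...))` would be False for the second default-channel environment above and True for the conda-forge one.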

Given all of the preceding information, and the fact that the decision was made in #4054 not to use conda-forge, I'm going to investigate installing these libraries from PyPI.

@jameslamb
Collaborator Author

Investigating #4285 (comment)

And of course this is suitable only in case these packages are updated simultaneously at PyPI, otherwise we gain nothing from switching to PyPI.

It looks like they are always released to PyPI on the same day.

| version | dask release date | distributed release date |
|---|---|---|
| 2020.12.0 | 12/10/2020 | 12/10/2020 |
| 2021.1.0 | 1/15/2021 | 1/15/2021 |
| 2021.1.1 | 1/22/2021 | 1/22/2021 |
| 2021.2.0 | 2/5/2021 | 2/5/2021 |
| 2021.3.0 | 3/5/2021 | 3/5/2021 |
| 2021.3.1 | 3/26/2021 | 3/26/2021 |
| 2021.4.0 | 4/2/2021 | 4/2/2021 |
| 2021.4.1 | 4/23/2021 | 4/23/2021 |
| 2021.5.0 | 5/14/2021 | 5/14/2021 |
| 2021.5.1 | 5/28/2021 | 5/28/2021 |
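The same comparison can be scripted against PyPI's JSON API, whose per-release endpoint `https://pypi.org/pypi/<project>/<version>/json` lists each uploaded file under `urls` with an `upload_time`. A sketch under those assumptions; the helper names are hypothetical, and `fetch` is injectable so the comparison can be tested without network access:

```python
import json
from datetime import date, datetime
from typing import Callable, Iterable, List
from urllib.request import urlopen

# Per-release metadata endpoint of PyPI's JSON API.
PYPI_RELEASE_URL = "https://pypi.org/pypi/{project}/{version}/json"


def fetch_release_json(project: str, version: str) -> dict:
    """Download metadata for one release (requires network access)."""
    url = PYPI_RELEASE_URL.format(project=project, version=version)
    with urlopen(url) as response:
        return json.load(response)


def first_upload_date(release_json: dict) -> date:
    """Date of the earliest file upload listed in a release's metadata."""
    times = [datetime.fromisoformat(f["upload_time"]) for f in release_json["urls"]]
    return min(times).date()


def release_day_mismatches(
    versions: Iterable[str],
    fetch: Callable[[str, str], dict] = fetch_release_json,
) -> List[str]:
    """Versions whose dask and distributed uploads started on different days."""
    mismatches = []
    for version in versions:
        dask_day = first_upload_date(fetch("dask", version))
        distributed_day = first_upload_date(fetch("distributed", version))
        if dask_day != distributed_day:
            mismatches.append(version)
    return mismatches
```

With the default `fetch`, a call such as `release_day_mismatches(["2021.5.0", "2021.5.1"])` queries PyPI live.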

@jmoralez
Collaborator

jmoralez commented Jun 2, 2021

Hi, James. Just out of curiosity, why is installing from conda-forge being avoided? With conda-forge's dask the version mismatch doesn't seem to be a problem since each release has both dask-core and distributed with the same version:
https://github.com/conda-forge/dask-feedstock/blob/849f8b71ca796b785864faaef87f359e1d3940eb/recipe/meta.yaml#L24-L25

@jameslamb
Collaborator Author

jameslamb commented Jun 2, 2021

see #4054 (review)

@jameslamb
Collaborator Author

A relevant issue was just opened in the Dask community repo: dask/community#160

And a corresponding one was opened at ContinuumIO/anaconda-issues, which looks like a place we might be able to go to ask about such things in the future. ContinuumIO/anaconda-issues#12447

@jameslamb
Collaborator Author

jameslamb commented Jun 3, 2021

good news: yesterday Anaconda published dask, dask-core, and distributed 2021.5.1 (the latest release) all at the same time, and committed to doing so from this point forward (ContinuumIO/anaconda-issues#12447 (comment))

bad news: in environments with version 2021.5.1 of all those libraries, LightGBM's tests are still breaking.

For example, here's the Linux sdist job:

https://dev.azure.com/lightgbm-ci/lightgbm-ci/_build/results?buildId=10228&view=logs&j=02a2c3ba-81f8-54e3-0767-5d5adbb0daa9&t=720ee3fa-96d4-5b47-dbf4-01607b74ade2

/opt/conda/envs/test-env/lib/python3.7/site-packages/dask/base.py:285: in compute
    (result,) = compute(self, traverse=False, **kwargs)
/opt/conda/envs/test-env/lib/python3.7/site-packages/dask/base.py:568: in compute
    return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])
/opt/conda/envs/test-env/lib/python3.7/site-packages/dask/base.py:568: in <listcomp>
    return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])
/opt/conda/envs/test-env/lib/python3.7/site-packages/dask/array/core.py:1069: in finalize
    return concatenate3(results)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

arrays = [array([[28, 28, 24, ...,  2, 24, 25],
       [30, 11, 20, ..., 24, 13, 12],
       [13, 12, 14, ..., 21, 21, 22],
   ...0, 11, 20, ..., 11, 13, 12],
       [28, 28, 24, ..., 13, 24, 25],
       [ 1,  1,  1, ..., 19,  1,  1]], dtype=int32)]

    def concatenate3(arrays):
              ...
>           result[idx] = arr
E           ValueError: could not broadcast input array from shape (500,20) into shape (500,)

/opt/conda/envs/test-env/lib/python3.7/site-packages/dask/array/core.py:4829: ValueError

It looks like we got the newest versions of the relevant dask libraries, but an older version of numpy.

numpy              pkgs/main/linux-64::numpy-1.20.2-py37h2d18471_0
pandas             pkgs/main/linux-64::pandas-1.2.4-py37h2531618_0

We got numpy 1.20.2 (March 27, 2021). The newest release is numpy 1.20.3 (May 10, 2021).

We got pandas 1.2.4 (April 12, 2021), the newest release.

full environment:
attrs              pkgs/main/noarch::attrs-21.2.0-pyhd3eb1b0_0
blas               pkgs/main/linux-64::blas-1.0-mkl
bokeh              pkgs/main/linux-64::bokeh-2.3.2-py37h06a4308_0
click              pkgs/main/noarch::click-8.0.1-pyhd3eb1b0_0
cloudpickle        pkgs/main/noarch::cloudpickle-1.6.0-py_0
cycler             pkgs/main/linux-64::cycler-0.10.0-py37_0
cytoolz            pkgs/main/linux-64::cytoolz-0.11.0-py37h7b6447c_0
dask               pkgs/main/noarch::dask-2021.5.1-pyhd3eb1b0_0
dask-core          pkgs/main/noarch::dask-core-2021.5.1-pyhd3eb1b0_0
dbus               pkgs/main/linux-64::dbus-1.13.18-hb2f20db_0
distributed        pkgs/main/linux-64::distributed-2021.5.1-py37h06a4308_0
expat              pkgs/main/linux-64::expat-2.4.1-h2531618_2
fontconfig         pkgs/main/linux-64::fontconfig-2.13.1-h6c09931_0
freetype           pkgs/main/linux-64::freetype-2.10.4-h5ab3b9f_0
fsspec             pkgs/main/noarch::fsspec-0.9.0-pyhd3eb1b0_0
glib               pkgs/main/linux-64::glib-2.68.2-h36276a3_0
gst-plugins-base   pkgs/main/linux-64::gst-plugins-base-1.14.0-h8213a91_2
gstreamer          pkgs/main/linux-64::gstreamer-1.14.0-h28cd5cc_2
heapdict           pkgs/main/noarch::heapdict-1.0.1-py_0
icu                pkgs/main/linux-64::icu-58.2-he6710b0_3
importlib-metadata pkgs/main/linux-64::importlib-metadata-3.10.0-py37h06a4308_0
importlib_metadata pkgs/main/noarch::importlib_metadata-3.10.0-hd3eb1b0_0
iniconfig          pkgs/main/noarch::iniconfig-1.1.1-pyhd3eb1b0_0
intel-openmp       pkgs/main/linux-64::intel-openmp-2021.2.0-h06a4308_610
jinja2             pkgs/main/noarch::jinja2-3.0.0-pyhd3eb1b0_0
joblib             pkgs/main/noarch::joblib-1.0.1-pyhd3eb1b0_0
jpeg               pkgs/main/linux-64::jpeg-9b-h024ee3a_2
kiwisolver         pkgs/main/linux-64::kiwisolver-1.3.1-py37h2531618_0
lcms2              pkgs/main/linux-64::lcms2-2.12-h3be6417_0
libgfortran-ng     pkgs/main/linux-64::libgfortran-ng-7.3.0-hdf63c60_0
libpng             pkgs/main/linux-64::libpng-1.6.37-hbc83047_0
libtiff            pkgs/main/linux-64::libtiff-4.2.0-h85742a9_0
libuuid            pkgs/main/linux-64::libuuid-1.0.3-h1bed415_2
libwebp-base       pkgs/main/linux-64::libwebp-base-1.2.0-h27cfd23_0
libxcb             pkgs/main/linux-64::libxcb-1.14-h7b6447c_0
libxml2            pkgs/main/linux-64::libxml2-2.9.10-hb55368b_3
locket             pkgs/main/linux-64::locket-0.2.1-py37h06a4308_1
lz4-c              pkgs/main/linux-64::lz4-c-1.9.3-h2531618_0
markupsafe         pkgs/main/linux-64::markupsafe-2.0.1-py37h27cfd23_0
matplotlib         pkgs/main/linux-64::matplotlib-3.3.4-py37h06a4308_0
matplotlib-base    pkgs/main/linux-64::matplotlib-base-3.3.4-py37h62a2d02_0
mkl                pkgs/main/linux-64::mkl-2021.2.0-h06a4308_296
mkl-service        pkgs/main/linux-64::mkl-service-2.3.0-py37h27cfd23_1
mkl_fft            pkgs/main/linux-64::mkl_fft-1.3.0-py37h42c9631_2
mkl_random         pkgs/main/linux-64::mkl_random-1.2.1-py37ha9443f7_2
more-itertools     pkgs/main/noarch::more-itertools-8.7.0-pyhd3eb1b0_0
msgpack-python     pkgs/main/linux-64::msgpack-python-1.0.2-py37hff7bd54_1
numpy              pkgs/main/linux-64::numpy-1.20.2-py37h2d18471_0
numpy-base         pkgs/main/linux-64::numpy-base-1.20.2-py37hfae3a4d_0
olefile            pkgs/main/linux-64::olefile-0.46-py37_0
packaging          pkgs/main/noarch::packaging-20.9-pyhd3eb1b0_0
pandas             pkgs/main/linux-64::pandas-1.2.4-py37h2531618_0
partd              pkgs/main/noarch::partd-1.2.0-pyhd3eb1b0_0
pcre               pkgs/main/linux-64::pcre-8.44-he6710b0_0
pillow             pkgs/main/linux-64::pillow-8.2.0-py37he98fc37_0
pluggy             pkgs/main/linux-64::pluggy-0.13.1-py37h06a4308_0
psutil             pkgs/main/linux-64::psutil-5.8.0-py37h27cfd23_1
py                 pkgs/main/noarch::py-1.10.0-pyhd3eb1b0_0
pyparsing          pkgs/main/noarch::pyparsing-2.4.7-pyhd3eb1b0_0
pyqt               pkgs/main/linux-64::pyqt-5.9.2-py37h05f1152_2
pytest             pkgs/main/linux-64::pytest-6.2.3-py37h06a4308_2
python-dateutil    pkgs/main/noarch::python-dateutil-2.8.1-pyhd3eb1b0_0
pytz               pkgs/main/noarch::pytz-2021.1-pyhd3eb1b0_0
pyyaml             pkgs/main/linux-64::pyyaml-5.4.1-py37h27cfd23_1
qt                 pkgs/main/linux-64::qt-5.9.7-h5867ecd_1
scikit-learn       pkgs/main/linux-64::scikit-learn-0.24.2-py37ha9443f7_0
scipy              pkgs/main/linux-64::scipy-1.6.2-py37had2a1c9_1
sip                pkgs/main/linux-64::sip-4.19.8-py37hf484d3e_0
six                pkgs/main/linux-64::six-1.15.0-py37h06a4308_0
sortedcontainers   pkgs/main/noarch::sortedcontainers-2.3.0-pyhd3eb1b0_0
tblib              pkgs/main/noarch::tblib-1.7.0-py_0
threadpoolctl      pkgs/main/noarch::threadpoolctl-2.1.0-pyh5ca1d4c_0
toml               pkgs/main/noarch::toml-0.10.2-pyhd3eb1b0_0
toolz              pkgs/main/noarch::toolz-0.11.1-pyhd3eb1b0_0
tornado            pkgs/main/linux-64::tornado-6.1-py37h27cfd23_0
typing_extensions  pkgs/main/noarch::typing_extensions-3.7.4.3-pyha847dfd_0
yaml               pkgs/main/linux-64::yaml-0.2.5-h7b6447c_0
zict               pkgs/main/noarch::zict-2.0.0-pyhd3eb1b0_0
zipp               pkgs/main/noarch::zipp-3.4.1-pyhd3eb1b0_0
zstd               pkgs/main/linux-64::zstd-1.4.9-haebb681_0

I need to do some investigation to understand why these tests are breaking.

@jmoralez
Collaborator

jmoralez commented Jun 4, 2021

Hi James. It seems that the drop_axis=1 trick doesn't always work anymore. I believe the predict function is already being called once to determine the meta (numpy array, cupy, etc.). So maybe we could call the predict function on a one-row array to get the output shape, and modify the call to map_blocks accordingly?
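The `ValueError` in the traceback above appears consistent with that explanation. Roughly, dask's `concatenate3` preallocates a result with the shape dask expects (1-D once `drop_axis=1` is passed) and then assigns each computed block into it, so a block that actually came back 2-D cannot be broadcast into place. A NumPy-only sketch of that final assignment; `fill_expected` is a hypothetical stand-in, not dask code:

```python
import numpy as np


def fill_expected(expected_shape, block):
    """Write a computed block into a result preallocated with the expected shape."""
    result = np.empty(expected_shape, dtype=block.dtype)
    result[...] = block  # NumPy raises ValueError if block cannot broadcast
    return result


# A 1-D block fits the 1-D result implied by drop_axis=1.
ok = fill_expected((500,), np.zeros(500, dtype=np.int32))

# A 2-D block, like the (500, 20) int32 array in the traceback, does not.
try:
    fill_expected((500,), np.zeros((500, 20), dtype=np.int32))
except ValueError as err:
    print("ValueError:", err)
```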

I made some modifications to lgb.dask._predict and the tests pass, except for test_classifier_pred_contrib[multiclass-classification-scipy_csr_matrix], where the output of the predict function is a list:

[<1x3 sparse matrix of type '<class 'numpy.float64'>'                                                                                       
        with 3 stored elements in Compressed Sparse Row format>, <1x3 s...format>, <1x3 sparse matrix of type '<class 'numpy.float64'>'
        with 3 stored elements in Compressed Sparse Row format>]

My modification was:

elif isinstance(data, dask_Array):
    predict_fn = partial(
        _predict_part,
        model=model,
        raw_score=raw_score,
        pred_proba=pred_proba,
        pred_leaf=pred_leaf,
        pred_contrib=pred_contrib,
    )
    data_row = data[[0]].compute()  # maybe model.client.compute(data[[0]])?
    pred_row = predict_fn(data_row)
    chunks = (data.chunks[0],)
    if len(pred_row.shape) > 1:
        chunks += (pred_row.shape[1],)
    else:
        kwargs['drop_axis'] = 1
    return data.map_blocks(
        predict_fn,
        chunks=chunks,
        meta=np.array([], dtype=dtype),
        **kwargs
    )

I was hoping meta worked the same way as for dataframes, taking the shape and dtype of the output, but it seems that here it's only used to distinguish between different types of arrays.
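Stripped of dask and LightGBM, the shape logic in that modification is small enough to test on its own. In this pure-Python sketch, `row_chunks` stands for `data.chunks[0]` and `pred_row_shape` for `pred_row.shape`; the function name is hypothetical and no dask call is made:

```python
from typing import Dict, Sequence, Tuple, Union

# A chunks spec mixing per-block row sizes with a single int for columns,
# the same structure the snippet builds.
ChunkSpec = Tuple[Union[Tuple[int, ...], int], ...]


def infer_output_chunks(
    row_chunks: Sequence[int], pred_row_shape: Tuple[int, ...]
) -> Tuple[ChunkSpec, Dict[str, int]]:
    """Derive output chunks and extra map_blocks kwargs from a one-row prediction.

    Keeps the input's row chunks; a 2-D prediction adds its column count,
    while a 1-D prediction drops the feature axis (axis 1) instead.
    """
    chunks: ChunkSpec = (tuple(row_chunks),)
    extra_kwargs: Dict[str, int] = {}
    if len(pred_row_shape) > 1:
        chunks += (pred_row_shape[1],)
    else:
        extra_kwargs["drop_axis"] = 1
    return chunks, extra_kwargs
```

The two return values correspond to the `chunks=` argument and the extra `**kwargs` passed to `data.map_blocks` in the snippet.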

@jameslamb
Collaborator Author

jameslamb commented Jun 5, 2021

Thanks @jmoralez ! Great investigation. Are you interested in submitting a PR with that fix? If not or if you don't have the time, I can do that as well.

I have some other ideas too, now that you've helped us narrow it down to that place in the code:

  1. The difference in multiclass-classification-scipy_csr_matrix is expected, and documented as a feature request in [dask] Result shape from DaskLGBMClassifier.predict(pred_contrib=True) for CSC matrices is inconsistent with LGBMClassifier #3881
  2. It still must be the root cause you identified PLUS something else specific to certain versions of dask / distributed / pandas / numpy, right? Since the Dask tests have been consistently passing for at least the last month.
  3. I'd like to explore if it's possible to figure out meta without running predict on one row. For example I know "if raw_score=True, the output will be a 1-D array for all cases except multiclass classification and have shape (X.shape[0], model.n_classes_) for multiclass classification". I'm not sure yet if that would be more or less fragile than calling predict() on one row but it's something I'd like to think about.
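For point 3, the `raw_score=True` rule could be written down as a lookup that never calls `predict()`. This is a sketch only: the helper is hypothetical, it treats "multiclass" as more than two classes (binary classification and regression stay 1-D), and `pred_leaf` / `pred_contrib` would need rules of their own:

```python
from typing import Optional, Tuple


def raw_score_shape(n_rows: int, n_classes: Optional[int] = None) -> Tuple[int, ...]:
    """Expected shape of predict(raw_score=True) output under the stated rule.

    n_classes is None for models without classes (e.g. regression); a
    multiclass model (more than two classes) gets one column per class.
    """
    if n_classes is not None and n_classes > 2:
        return (n_rows, n_classes)
    return (n_rows,)
```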

@jmoralez
Collaborator

jmoralez commented Jun 7, 2021

Sure, I can make a PR. Should I include the changes you've made here to the environment as well?

@jameslamb
Collaborator Author

yes please, thank you

@jameslamb jameslamb changed the title from "WIP: [ci] remove floor on dask and distributed in CI (fixes #4285)" to "WIP: [ci] remove pin on dask and distributed in CI (fixes #4285)" Jun 14, 2021
@jameslamb
Collaborator Author

Closing this in favor of #4351

@jameslamb jameslamb closed this Jul 7, 2021
@StrikerRUS StrikerRUS deleted the ci/dask-pin branch July 7, 2021 12:24
@github-actions

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023