Cannot install horovod[spark] for Tensorflow 2.6 #3275

Closed
LifengWang opened this issue Nov 16, 2021 · 8 comments · Fixed by #3301
@LifengWang

LifengWang commented Nov 16, 2021

Environment:

  1. Framework: TensorFlow
  2. Framework version: 2.6.2
  3. Horovod version: 0.23
  4. MPI version: 4.1.1
  5. CUDA version: N/A
  6. NCCL version: N/A
  7. Python version: 3.7
  8. Spark / PySpark version: 2.4.5
  9. Ray version: N/A
  10. OS and version: RHEL 8.4
  11. GCC version: 9.3.0
  12. CMake version: 3.5.0

Checklist:

  1. Did you search issues to find if somebody asked this question before? Yes
  2. If your question is about hang, did you read this doc? N/A
  3. If your question is about docker, did you read this doc? N/A
  4. Did you check if your question is answered in the troubleshooting guide (https://github.com/horovod/horovod/blob/master/docs/troubleshooting.rst)? Yes

Bug report:
Please describe the erroneous behavior you're observing and the steps to reproduce it.

Installing collected packages: pyparsing, pycparser, pyzmq, pyyaml, pyarrow, psutil, packaging, future, fsspec, diskcache, dill, cloudpickle, cffi, petastorm, horovod, h5py
  Attempting uninstall: h5py
    Found existing installation: h5py 3.1.0
    Uninstalling h5py-3.1.0:
      Successfully uninstalled h5py-3.1.0
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorflow 2.6.2 requires h5py~=3.1.0, but you have h5py 2.10.0 which is incompatible.
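
The last line of the output says that the installed h5py 2.10.0 does not satisfy TensorFlow's `h5py~=3.1.0` requirement. As a minimal sketch of what that compatible-release operator means, the check can be reproduced by hand for plain dotted versions. The helper names `parse_version` and `is_compatible_release` are made up for illustration; this is not pip's implementation and it ignores pre-releases, epochs, and other version forms.

```python
def parse_version(text):
    """Turn a plain dotted release such as '3.1.0' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def is_compatible_release(installed, base):
    """Approximate the '~=' operator for plain dotted releases.

    '~=3.1.0' means: at least 3.1.0, and matching the prefix that remains
    once the last component is dropped (so 3.1.x only).
    """
    installed_v = parse_version(installed)
    base_v = parse_version(base)
    prefix = base_v[:-1]
    return installed_v >= base_v and installed_v[:len(prefix)] == prefix


# The version TensorFlow 2.6.2 was installed with, and the one pip replaced it with.
print(is_compatible_release("3.1.0", "3.1.0"))   # True
print(is_compatible_release("2.10.0", "3.1.0"))  # False
```

Under this reading, any h5py release below 3 fails the check, so the downgrade to 2.10.0 cannot satisfy TensorFlow's requirement.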

Reproduce Steps:

  1. conda create -n horovod python=3.7
  2. conda activate horovod
  3. conda install pyspark=2.4.5 openmpi-mpicc cmake -c conda-forge
  4. pip install tensorflow==2.6.2
  5. HOROVOD_WITH_MPI=1 HOROVOD_WITH_TENSORFLOW=1 pip install horovod[spark]

LifengWang added the bug label Nov 16, 2021

@EnricoMi
Collaborator

Which pip version do you use?

@LifengWang
Author

Hi @EnricoMi. The pip version I used is 21.0.1.

@EnricoMi
Collaborator

EnricoMi commented Nov 16, 2021

I think h5py==2.10.0 is only needed for TF < 2.5. Can you try:

HOROVOD_WITH_MPI=1 HOROVOD_WITH_TENSORFLOW=1 pip install horovod[spark] h5py~=3.1.0

@EnricoMi
Collaborator

@tgaddair we are seeing the same error in our CI builds, but the build does not fail:

https://github.com/horovod/horovod/runs/4199828356?check_suite_focus=true#step:10:7398

Successfully built horovod
Installing collected packages: pycparser, pyzmq, pyarrow, psutil, hiredis, diskcache, dill, cloudpickle, cffi, petastorm, horovod, h5py, aioredis
  changing mode of /usr/local/bin/plasma_store to 755
  changing mode of /usr/local/bin/petastorm-copy-dataset.py to 755
  changing mode of /usr/local/bin/petastorm-generate-metadata.py to 755
  changing mode of /usr/local/bin/petastorm-throughput.py to 755
  changing mode of /usr/local/bin/horovodrun to 755
  Attempting uninstall: h5py
    Found existing installation: h5py 3.1.0
    Uninstalling h5py-3.1.0:
      Removing file or directory /usr/local/lib/python3.7/dist-packages/h5py-3.1.0.dist-info/
      Removing file or directory /usr/local/lib/python3.7/dist-packages/h5py.libs/
      Removing file or directory /usr/local/lib/python3.7/dist-packages/h5py/
      Successfully uninstalled h5py-3.1.0
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorflow-cpu 2.6.0 requires h5py~=3.1.0, but you have h5py 2.10.0 which is incompatible.
Successfully installed aioredis-1.3.1 cffi-1.15.0 cloudpickle-2.0.0 dill-0.3.4 diskcache-5.2.1 h5py-2.10.0 hiredis-2.0.0 horovod-0.23.0 petastorm-0.11.3 psutil-5.8.0 pyarrow-6.0.0 pycparser-2.21 pyzmq-22.3.0

It successfully installs horovod and then, because of horovod[spark], applies the h5py<3 constraint, which conflicts with TF's dependency. I think h5py<3 is only needed for tf<2.5; at least that is what Dockerfile.test.cpu says:

# Pin h5py only for tensorflow<2.5: https://github.com/h5py/h5py/issues/1732

We cannot express that dependency conditionally on the installed tensorflow version. I think we should remove h5py<3 entirely from the spark_require_list in setup.py, as we now have TF 2.5, 2.6 and 2.7.
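
The proposed change amounts to dropping one entry from a list of requirement strings. A minimal sketch, assuming a stand-in list: only `h5py<3` comes from this thread, the other entries of `example_spark_require_list` are hypothetical and not copied from setup.py, and the helpers `requirement_name` and `drop_requirement` are made up for illustration.

```python
import re


def requirement_name(requirement):
    """Extract the project name from a simple requirement string like 'h5py<3'."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    return match.group(1).lower() if match else ""


def drop_requirement(requirements, name):
    """Return a copy of the list without any entry for the given project."""
    return [r for r in requirements if requirement_name(r) != name.lower()]


# Hypothetical stand-in for spark_require_list; the real entries may differ.
example_spark_require_list = ["h5py<3", "petastorm", "pyarrow", "fsspec"]

trimmed = drop_requirement(example_spark_require_list, "h5py")
print(trimmed)  # ['petastorm', 'pyarrow', 'fsspec']
```

With the pin gone, horovod[spark] itself would no longer ask pip for an h5py release below 3, so it would not force a downgrade of the version TensorFlow requires.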

@Tony-Feng

This is the same issue I encountered when I tried to build a docker image using TensorFlow 2.5 with horovod[spark]: the h5py<3 constraint broke the build. Is there a workaround for this? Thanks in advance.

@EnricoMi
Collaborator

This should be fixed in the next release.

@Tony-Feng

That's great! Thanks a lot!

@EnricoMi
Collaborator

EnricoMi commented Mar 2, 2022

This has now been released in v0.24.0.
