[BUG] pip install deepspeed==0.13.0 fails #4984

Closed
apoorvkh opened this issue Jan 20, 2024 · 5 comments · Fixed by #5001

@apoorvkh

I commented on the release/build commit, but reposting as an issue for better visibility.

Installing deepspeed==0.13.0 fails on my end with:

conda create -p ./.venv python=3.9.18
conda activate ./.venv
pip install deepspeed==0.13.0

(see error below)

Possible solutions: I think torch should either become a build requirement for deepspeed, or the build code should be adjusted to prevent this error. This build-time error seems to have been introduced by one of the commits in v0.13.0. After that, I think 1c8b8f3 can be reverted.

Error

Collecting deepspeed==0.13.0
  Downloading deepspeed-0.13.0.tar.gz (1.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.3/1.3 MB 22.4 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error
  
  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [27 lines of output]
      Traceback (most recent call last):
        File "/tmp/pip-install-3_cqq7tl/deepspeed_405eafe5b6ee43f49e3b9272d170d8ba/op_builder/xpu/builder.py", line 14, in <module>
          from op_builder.builder import OpBuilder, TORCH_MAJOR, TORCH_MINOR
      ImportError: cannot import name 'TORCH_MAJOR' from 'op_builder.builder' (/tmp/pip-install-3_cqq7tl/deepspeed_405eafe5b6ee43f49e3b9272d170d8ba/op_builder/builder.py)
      
      During handling of the above exception, another exception occurred:
      
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/tmp/pip-install-3_cqq7tl/deepspeed_405eafe5b6ee43f49e3b9272d170d8ba/setup.py", line 37, in <module>
          from op_builder import get_default_compute_capabilities, OpBuilder
        File "/tmp/pip-install-3_cqq7tl/deepspeed_405eafe5b6ee43f49e3b9272d170d8ba/op_builder/__init__.py", line 48, in <module>
          module = importlib.import_module(f".{module_name}", package=op_builder_dir)
        File ".venv/lib/python3.9/importlib/__init__.py", line 127, in import_module
          return _bootstrap._gcd_import(name[level:], package, level)
        File "/tmp/pip-install-3_cqq7tl/deepspeed_405eafe5b6ee43f49e3b9272d170d8ba/op_builder/xpu/__init__.py", line 6, in <module>
          from .cpu_adam import CPUAdamBuilder
        File "/tmp/pip-install-3_cqq7tl/deepspeed_405eafe5b6ee43f49e3b9272d170d8ba/op_builder/xpu/cpu_adam.py", line 6, in <module>
          from .builder import SYCLOpBuilder
        File "/tmp/pip-install-3_cqq7tl/deepspeed_405eafe5b6ee43f49e3b9272d170d8ba/op_builder/xpu/builder.py", line 16, in <module>
          from deepspeed.ops.op_builder.builder import OpBuilder, TORCH_MAJOR, TORCH_MINOR
        File "/tmp/pip-install-3_cqq7tl/deepspeed_405eafe5b6ee43f49e3b9272d170d8ba/deepspeed/__init__.py", line 10, in <module>
          import torch
      ModuleNotFoundError: No module named 'torch'
      [WARNING] Unable to import torch, pre-compiling ops will be disabled. Please visit https://pytorch.org/ to see how to properly install torch on your system.
       [WARNING]  unable to import torch, please install it if you want to pre-compile any deepspeed ops.
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
@apoorvkh
Author

cc @mrwyattii @jeffra

@mrwyattii
Contributor

Hi @apoorvkh, while in the past we have not required torch to be installed before installing DeepSpeed, it has always been required to run DeepSpeed. This "bug" comes from a recent PR: #4547

Adding torch as a requirement to the project is problematic because pip will often install the CPU version rather than the CUDA/ROCm versions.

We are discussing internally what the proper action is, but I suspect we will restore the previous behavior of not requiring torch at install time and do a 0.13.1 patch release with this update.
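
For readers wondering what "not requiring torch at install time" can look like in build code, here is a minimal sketch. It is not DeepSpeed's actual code: the helper name `probe_torch_version` is hypothetical, and it only assumes that the module exposes a `__version__` string of the form `major.minor...`. The idea is that the version constants are always defined, so a later `from ... import TORCH_MAJOR, TORCH_MINOR` does not raise an ImportError when torch is absent.

```python
import importlib


def probe_torch_version(module_name="torch"):
    """Return (major, minor) for the named module, or (None, None) if it cannot be imported.

    Hypothetical helper for illustration; not part of DeepSpeed.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # torch is not installed: keep going so metadata generation can still succeed.
        return None, None
    major, minor = module.__version__.split(".")[:2]
    return int(major), int(minor)


# Always defined, even without torch, so importing these names never fails.
TORCH_MAJOR, TORCH_MINOR = probe_torch_version()
```

Code that needs the version would then have to check for `None` before comparing, which is the trade-off of this approach.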

@mrwyattii mrwyattii self-assigned this Jan 20, 2024
@apoorvkh
Author

Understood, that would be great -- thank you!

@delock
Contributor

delock commented Jan 22, 2024

This issue is caused by these two pieces of reflection code. The problem is that the reflection code should not iterate into directories that belong to other accelerators.

https://github.com/microsoft/DeepSpeed/blob/master/op_builder/__init__.py#L46
https://github.com/microsoft/DeepSpeed/blob/master/op_builder/all_ops.py#L22

There are two possible fixes. One is to skip the directories that belong to accelerators, with code like:
if module_name not in ['cpu', 'hpu', 'mps', 'npu', 'xpu']:
Another possible fix, which is more graceful, would be to move the CUDA OpBuilders into a cuda/ directory in op_builder, but this might need more global code changes.
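
The first fix could be sketched roughly as follows. This is an illustration only, not the code in op_builder/__init__.py: the function `discover_builders`, the throwaway package `fake_op_builder`, and the `demo` helper are all hypothetical, and only the skip list comes from the line above. The demo builds a temporary package whose `xpu` subpackage fails on import, then shows that discovery still succeeds once that directory is skipped.

```python
import importlib
import pkgutil
import sys
import tempfile
from pathlib import Path

# Accelerator-specific directories to skip (list taken from the comment above).
ACCELERATOR_DIRS = {"cpu", "hpu", "mps", "npu", "xpu"}


def discover_builders(package_name, package_path, skip=ACCELERATOR_DIRS):
    """Import every direct submodule of a package except the skipped directories."""
    loaded = {}
    for _, module_name, _ in pkgutil.iter_modules([package_path]):
        if module_name in skip:
            continue  # never import another accelerator's builders
        loaded[module_name] = importlib.import_module(f"{package_name}.{module_name}")
    return loaded


def demo():
    """Build a throwaway package with a broken xpu/ subpackage and scan it."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "fake_op_builder"
        (pkg / "xpu").mkdir(parents=True)
        (pkg / "__init__.py").write_text("")
        (pkg / "fused_adam.py").write_text("NAME = 'fused_adam'\n")
        # Importing this subpackage would raise ModuleNotFoundError.
        (pkg / "xpu" / "__init__.py").write_text("import module_that_is_not_installed_xyz\n")
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            return sorted(discover_builders("fake_op_builder", str(pkg)))
        finally:
            sys.path.remove(tmp)


if __name__ == "__main__":
    print(demo())
```

A hard-coded skip list has to be updated whenever a new accelerator directory is added, which is one reason the second fix (a dedicated cuda/ directory) may be cleaner in the long run.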

@apoorvkh
Author

Everything is working well on my end -- thanks again @mrwyattii!

mauryaavinash95 pushed a commit to mauryaavinash95/DeepSpeed that referenced this issue Feb 17, 2024
rraminen pushed a commit to ROCm/DeepSpeed that referenced this issue May 9, 2024