
[BUG] Using and Building DeepSpeedCPUAdam #5677

Open · oabuhamdan opened this issue Jun 18, 2024 · 23 comments
Labels: bug (Something isn't working), training

@oabuhamdan
Describe the bug
I installed DeepSpeed with `pip install deepspeed` and tried to use DeepSpeedCPUAdam, but I get this error:

Exception ignored in: <function DeepSpeedCPUAdam.__del__ at 0x2b0ad49eefc0>
Traceback (most recent call last):
  File "/home/3458/pytorch/venv/lib/python3.11/site-packages/deepspeed/ops/adam/cpu_adam.py", line 102, in __del__
    self.ds_opt_adam.destroy_adam(self.opt_id)
    ^^^^^^^^^^^^^^^^
AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'
Exception ignored in: <function DeepSpeedCPUAdam.__del__ at 0x2b5bec88afc0>
Traceback (most recent call last):
  File "/home/3458/pytorch/venv/lib/python3.11/site-packages/deepspeed/ops/adam/cpu_adam.py", line 102, in __del__
    self.ds_opt_adam.destroy_adam(self.opt_id)
    ^^^^^^^^^^^^^^^^
AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'
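For context, this destructor error is a symptom rather than the root cause: `__del__` references `self.ds_opt_adam`, which is only assigned once the C++ op has loaded, so any failure during `__init__` surfaces later as this AttributeError. A minimal sketch of the pattern, with a defensive guard (illustrative class and attribute names, not DeepSpeed's actual source):

```python
# Illustrative sketch (hypothetical class, not DeepSpeed's code): if
# __init__ raises before the attribute is set, __del__ still runs during
# garbage collection and trips over the missing attribute.
class CpuAdamLike:
    def __init__(self, fail=False):
        if fail:
            # stands in for the JIT build / op load failing
            raise RuntimeError("op failed to load")
        self.backend = object()  # stands in for the loaded C++ op

    def __del__(self):
        # guarding with getattr avoids the secondary AttributeError
        if getattr(self, "backend", None) is not None:
            pass  # real code would call backend.destroy_adam(...)
```

In other words, the AttributeError above is masking an earlier build or load failure, which is what the rest of this thread digs into.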

After trying one of the solutions posted in another issue here:

python -c 'import deepspeed; deepspeed.ops.adam.cpu_adam.CPUAdamBuilder().load()'

I got this error

Installed CUDA version 12.4 does not match the version torch was compiled with 12.1 but since the APIs are compatible, accepting this combination
Using /home/3458/.cache/torch_extensions/py311_cu121 as PyTorch extensions root...
Creating extension directory /home/3458/.cache/torch_extensions/py311_cu121/cpu_adam...
Emitting ninja build file /home/3458/.cache/torch_extensions/py311_cu121/cpu_adam/build.ninja...
Building extension module cpu_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
1.11.1.git.kitware.jobserver-1
Loading extension module cpu_adam...
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/3458/.local/lib/python3.11/site-packages/deepspeed/ops/op_builder/builder.py", line 508, in load
    return self.jit_load(verbose)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/3458/.local/lib/python3.11/site-packages/deepspeed/ops/op_builder/builder.py", line 555, in jit_load
    op_module = load(name=self.name,
                ^^^^^^^^^^^^^^^^^^^^
  File "/home/3458/.local/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 1309, in load
    return _jit_compile(
           ^^^^^^^^^^^^^
  File "/home/3458/.local/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 1745, in _jit_compile
    return _import_module_from_library(name, build_directory, is_python_module)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/3458/.local/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 2143, in _import_module_from_library
    module = importlib.util.module_from_spec(spec)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 573, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 1233, in create_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
ImportError: /home/3458/.cache/torch_extensions/py311_cu121/cpu_adam/cpu_adam.so: cannot open shared object file: No such file or directory
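The ImportError at the end indicates the JIT build never actually produced cpu_adam.so in the extensions cache. A small check (my addition, not from the report; assumes the default cache root `~/.cache/torch_extensions`, overridable via `TORCH_EXTENSIONS_DIR`):

```python
# Hedged sketch: list the shared objects that JIT compilation actually
# produced under the torch extensions cache. An empty result means the
# compile step failed even though loading was attempted.
import os
from pathlib import Path

def built_extensions(cache_root=None):
    """Return the .so files present under the torch extensions cache."""
    root = Path(cache_root or os.environ.get(
        "TORCH_EXTENSIONS_DIR",
        Path.home() / ".cache" / "torch_extensions"))
    return sorted(root.rglob("*.so")) if root.exists() else []
```

If the list is empty, deleting the stale `cpu_adam` build directory before retrying avoids repeatedly loading a half-built module.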

Expected behavior
DeepSpeedCPUAdam to work.

ds_report output

[2024-06-17 21:47:34,453] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Warning: The default cache directory for DeepSpeed Triton autotune, /home/3458/.triton/autotune, appears to be on an NFS system. While this is generally acceptable, if you experience slowdowns or hanging when DeepSpeed exits, it is recommended to set the TRITON_CACHE_DIR environment variable to a non-NFS path.
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
 [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
 [WARNING]  NVIDIA Inference is only supported on Ampere and newer architectures
 [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
 [WARNING]  using untested triton version (2.3.1), only 1.0.0 is known to be compatible
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
fused_adam ............. [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
cpu_lion ............... [NO] ....... [OKAY]
 [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
evoformer_attn ......... [NO] ....... [NO]
 [WARNING]  NVIDIA Inference is only supported on Ampere and newer architectures
fp_quantizer ........... [NO] ....... [NO]
fused_lamb ............. [NO] ....... [OKAY]
fused_lion ............. [NO] ....... [OKAY]
inference_core_ops ..... [NO] ....... [OKAY]
cutlass_ops ............ [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
ragged_device_ops ...... [NO] ....... [OKAY]
ragged_ops ............. [NO] ....... [OKAY]
random_ltd ............. [NO] ....... [OKAY]
 [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
 [WARNING]  using untested triton version (2.3.1), only 1.0.0 is known to be compatible
sparse_attn ............ [NO] ....... [NO]
spatial_inference ...... [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/home/3458/pytorch/venv/lib/python3.11/site-packages/torch']
torch version .................... 2.3.1+cu121
deepspeed install path ........... ['/home/3458/pytorch/venv/lib/python3.11/site-packages/deepspeed']
deepspeed info ................... 0.14.3, unknown, unknown
torch cuda version ............... 12.1
torch hip version ................ None
nvcc version ..................... 12.4
deepspeed wheel compiled w. ...... torch 0.0, cuda 0.0
shared memory (/dev/shm) size .... 251.82 GB

System info:

  • GPU count and types: 3 machines, each with one Tesla T4 GPU
  • Interconnects: 100 Gbps InfiniBand
  • Python version: 3.11.7
  • GCC: 12.2.0
  • Torch: 2.3.1+cu121
  • PyTorch Lightning: 2.3.0
  • OS: CentOS Linux release 7.8.2003 (Core)
  • CUDA: 12.4.1
  • DeepSpeed: 0.14.3
  • SLURM environment
oabuhamdan added the bug and training labels on Jun 18, 2024
@sylee96 commented Jun 18, 2024

I also encountered this bug.

I have been using Docker images to install and run DeepSpeed, and the same code worked before. However, after creating a new container from the Docker image and installing DeepSpeed, the issue described above occurs. Is this a problem with version 0.14.3 of DeepSpeed? How can I resolve it?

System info:

  • OS: Ubuntu 22.04
  • GPU count and types: 4× A6000
  • Python version: 3.10.6
  • CUDA: 12.0
  • DeepSpeed: 0.14.3
  • torch: 2.3.1
  • accelerate: 0.31.0

@delock (Contributor) commented Jun 21, 2024

I got the same error and noticed that cpu_adam.so didn't get built properly. In my case it was a missing dependency. You can scroll back further in the build output to see why the module failed to build:

[2/3] c++ -MMD -MF cpu_adam_impl.o.d -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/gma/DeepSpeed/csrc/includes -isystem /home/gma/miniforge3/envs/ds/lib/python3.11/site-packages/torch/include -isystem /home/gma/miniforge3/envs/ds/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /home/gma/miniforge3/envs/ds/lib/python3.11/site-packages/torch/include/TH -isystem /home/gma/miniforge3/envs/ds/lib/python3.11/site-packages/torch/include/THC -isystem /home/gma/miniforge3/envs/ds/include/python3.11 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -L/usr/local/cuda/lib64 -lcudart -lcublas -g -march=native -fopenmp -D__AVX256__ -D__ENABLE_CUDA__ -DBF16_AVAILABLE -c /home/gma/DeepSpeed/csrc/adam/cpu_adam_impl.cpp -o cpu_adam_impl.o
[3/3] c++ cpu_adam.o cpu_adam_impl.o -shared -lcurand -L/home/gma/miniforge3/envs/ds/lib/python3.11/site-packages/torch/lib -lc10 -ltorch_cpu -ltorch -ltorch_python -o cpu_adam.so
FAILED: cpu_adam.so
c++ cpu_adam.o cpu_adam_impl.o -shared -lcurand -L/home/gma/miniforge3/envs/ds/lib/python3.11/site-packages/torch/lib -lc10 -ltorch_cpu -ltorch -ltorch_python -o cpu_adam.so
/usr/bin/ld: cannot find -lcurand: No such file or directory
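A quick way to check whether the system linker can resolve the CUDA libraries that the cpu_adam link step passes with `-l` (my addition, not from the thread; `ctypes.util.find_library` consults the system's linker cache on Linux, so results depend on your environment):

```python
# Hedged sketch: ask the loader whether it can resolve the CUDA libraries
# (curand, cudart, cublas) that the cpu_adam.so link step needs.
import ctypes.util

def check_libs(names):
    """Map each library name to the path find_library resolves, or None."""
    return {name: ctypes.util.find_library(name) for name in names}

# Example: check_libs(["curand", "cudart", "cublas"])
# A None value suggests adding the CUDA lib64 directory to LD_LIBRARY_PATH
# (or the matching -L path) before rebuilding.
```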

@oabuhamdan (Author)

@delock that's not the case for me.
I tried

DS_BUILD_CPU_ADAM=1 pip install deepspeed

and I got:

gcc -pthread -B /opt/shared/anaconda/2024.02/compiler_compat -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /opt/shared/anaconda/2024.02/include -fPIC -O2 -isystem /opt/shared/anaconda/2024.02/include -fPIC -I/tmp/pip-install-gibr8vp7/deepspeed_b3851d81a4bb41fd902ed8835bbe2ecd/csrc/includes -I/home/3458/pytorch/venv/lib/python3.11/site-packages/torch/include -I/home/3458/pytorch/venv/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/home/3458/pytorch/venv/lib/python3.11/site-packages/torch/include/TH -I/home/3458/pytorch/venv/lib/python3.11/site-packages/torch/include/THC -I/opt/shared/cuda/12.4.1-550.54.15/include -I/home/3458/pytorch/venv/include -I/opt/shared/anaconda/2024.02/include/python3.11 -c csrc/adam/cpu_adam.cpp -o build/temp.linux-x86_64-cpython-311/csrc/adam/cpu_adam.o -O3 -std=c++17 -g -Wno-reorder -L/opt/shared/cuda/12.4.1-550.54.15/lib64 -lcudart -lcublas -g -march=native -fopenmp -D__AVX256__ -D__ENABLE_CUDA__ -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -DTORCH_EXTENSION_NAME=cpu_adam_op -D_GLIBCXX_USE_CXX11_ABI=0
      In file included from /tmp/pip-install-gibr8vp7/deepspeed_b3851d81a4bb41fd902ed8835bbe2ecd/csrc/includes/cpu_adam.h:14,
                       from csrc/adam/cpu_adam.cpp:6:
...
...
gcc: fatal error: Killed signal terminated program cc1plus
      compilation terminated.
      error: command '/opt/shared/gcc/11.2.0/bin/gcc' failed with exit code 1
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for deepspeed
  Running setup.py clean for deepspeed
Failed to build deepspeed
ERROR: Could not build wheels for deepspeed, which is required to install pyproject.toml-based projects
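"gcc: fatal error: Killed signal terminated program cc1plus" typically means the kernel's OOM killer terminated the compiler mid-build, not that the code is wrong. One common mitigation (my suggestion, not from the thread) is to cap parallel compile jobs via the `MAX_JOBS` environment variable, which `torch.utils.cpp_extension` honors and which DeepSpeed's own log above mentions:

```python
# Sketch: cap parallel extension builds to reduce peak memory while
# compiling. MAX_JOBS must be set before the build starts.
import os

os.environ["MAX_JOBS"] = "1"

# Equivalent from the shell (assumption: same environment as above):
#   MAX_JOBS=1 DS_BUILD_CPU_ADAM=1 pip install deepspeed
```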

@savage95813

@delock I got the same error; in fact, I do have libcurand.so.

@delock (Contributor) commented Jun 23, 2024

> @delock that's not the case with me. I tried DS_BUILD_CPU_ADAM=1 pip install deepspeed and I got [quoted comment above in full, ending with "ERROR: Could not build wheels for deepspeed"]

The full error message from gcc might indicate what went wrong; the real reason for the kernel build failure may be different in your case. One thing I usually try is to execute the command printed by DeepSpeed manually, so the specific error can be reproduced and triaged:

gcc -pthread -B /opt/shared/anaconda/2024.02/compiler_compat -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /opt/shared/anaconda/2024.02/include -fPIC -O2 -isystem /opt/shared/anaconda/2024.02/include -fPIC -I/tmp/pip-install-gibr8vp7/deepspeed_b3851d81a4bb41fd902ed8835bbe2ecd/csrc/includes -I/home/3458/pytorch/venv/lib/python3.11/site-packages/torch/include -I/home/3458/pytorch/venv/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/home/3458/pytorch/venv/lib/python3.11/site-packages/torch/include/TH -I/home/3458/pytorch/venv/lib/python3.11/site-packages/torch/include/THC -I/opt/shared/cuda/12.4.1-550.54.15/include -I/home/3458/pytorch/venv/include -I/opt/shared/anaconda/2024.02/include/python3.11 -c csrc/adam/cpu_adam.cpp -o build/temp.linux-x86_64-cpython-311/csrc/adam/cpu_adam.o -O3 -std=c++17 -g -Wno-reorder -L/opt/shared/cuda/12.4.1-550.54.15/lib64 -lcudart -lcublas -g -march=native -fopenmp -D__AVX256__ -D__ENABLE_CUDA__ -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -DTORCH_EXTENSION_NAME=cpu_adam_op -D_GLIBCXX_USE_CXX11_ABI=0

@oabuhamdan (Author) commented Jun 23, 2024

Hi @delock, I followed your advice. The compilation gets stuck at some point, both when building from source and when using pip. This is when using
This is when using

DS_BUILD_CPU_ADAM=1 pip install deepspeed

or

DS_BUILD_CPU_ADAM=1 pip install . # inside DeepSpeed directory

Here is the command it is stuck at

gcc -pthread -B /opt/shared/anaconda/2024.02/compiler_compat -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /opt/shared/anaconda/2024.02/include -fPIC -O2 -isystem /opt/shared/anaconda/2024.02/include -fPIC -I/home/3458/pytorch/DeepSpeed/csrc/includes -I/home/3458/pytorch/venv/lib/python3.11/site-packages/torch/include -I/home/3458/pytorch/venv/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/home/3458/pytorch/venv/lib/python3.11/site-packages/torch/include/TH -I/home/3458/pytorch/venv/lib/python3.11/site-packages/torch/include/THC -I/opt/shared/cuda/12.4.1-550.54.15/include -I/home/3458/pytorch/venv/include -I/opt/shared/anaconda/2024.02/include/python3.11 -c csrc/adam/cpu_adam.cpp -o build/temp.linux-x86_64-cpython-311/csrc/adam/cpu_adam.o -O3 -std=c++17 -g -Wno-reorder -L/opt/shared/cuda/12.4.1-550.54.15/lib64 -lcudart -lcublas -g -march=native -fopenmp -D__AVX256__ -D__ENABLE_CUDA__ -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\"

And here is where it is stuck

 building 'deepspeed.ops.adam.cpu_adam_op' extension
  gcc -pthread -B /opt/shared/anaconda/2024.02/compiler_compat -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /opt/shared/anaconda/2024.02/include -fPIC -O2 -isystem /opt/shared/anaconda/2024.02/include -fPIC -I/home/3458/pytorch/DeepSpeed/csrc/includes -I/home/3458/pytorch/venv/lib/python3.11/site-packages/torch/include -I/home/3458/pytorch/venv/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/home/3458/pytorch/venv/lib/python3.11/site-packages/torch/include/TH -I/home/3458/pytorch/venv/lib/python3.11/site-packages/torch/include/THC -I/opt/shared/cuda/12.4.1-550.54.15/include -I/home/3458/pytorch/venv/include -I/opt/shared/anaconda/2024.02/include/python3.11 -c csrc/adam/cpu_adam.cpp -o build/temp.linux-x86_64-cpython-311/csrc/adam/cpu_adam.o -O3 -std=c++17 -g -Wno-reorder -L/opt/shared/cuda/12.4.1-550.54.15/lib64 -lcudart -lcublas -g -march=native -fopenmp -D__AVX256__ -D__ENABLE_CUDA__ -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -DTORCH_EXTENSION_NAME=cpu_adam_op -D_GLIBCXX_USE_CXX11_ABI=0
  In file included from /home/3458/pytorch/DeepSpeed/csrc/includes/cpu_adam.h:14,
                   from csrc/adam/cpu_adam.cpp:6:
  /home/3458/pytorch/DeepSpeed/csrc/includes/simd.h:133: warning: ignoring '#pragma unroll ' [-Wunknown-pragmas]
    133 | #pragma unroll
        |
  [... the same -Wunknown-pragmas warning repeats for simd.h lines 154-295 ...]

@oabuhamdan (Author)

Now, when using

DS_BUILD_CPU_ADAM=1 ./install.sh

it gives the same error as above, but reports the failing line:

gcc: fatal error: Killed signal terminated program cc1plus
compilation terminated.
error: command '/opt/shared/gcc/11.2.0/bin/gcc' failed with exit code 1
Error on line 155
Fail to install deepspeed

@delock (Contributor) commented Jun 24, 2024

> Hi @delock I followed your advice. The compilation is stuck at some point when building from source or when using pip. [quoted comment above in full, including the gcc command and the repeated -Wunknown-pragmas warnings]

The messages posted are warnings, which should not stop the compiler. Compilation usually stops only when it encounters an error. When you execute the gcc line manually, what is the first error the compiler reports?
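One way to follow this advice and isolate the first real error is to run the printed command through a small wrapper that captures stderr (illustrative helper, not part of DeepSpeed):

```python
# Sketch: run a compile command and return its exit code plus stderr, so
# the first actual error line is easy to find among the warnings.
import shlex
import subprocess

def run_and_capture(cmd):
    """Run a shell-style command string; return (returncode, stderr)."""
    proc = subprocess.run(shlex.split(cmd), capture_output=True, text=True)
    return proc.returncode, proc.stderr

# Usage (assumption: paste the full gcc line printed by the build):
# rc, err = run_and_capture("gcc ... -c csrc/adam/cpu_adam.cpp -o /tmp/cpu_adam.o")
# print([line for line in err.splitlines() if "error" in line][:5])
```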

@oabuhamdan (Author) commented Jun 24, 2024

When running

DS_BUILD_CPU_ADAM=1 pip install . -vv

or

DS_BUILD_CPU_ADAM=1 pip install deepspeed -vv

I get this error:

 running build_ext
  building 'deepspeed.ops.adam.cpu_adam_op' extension
  creating build/temp.linux-x86_64-cpython-311
  creating build/temp.linux-x86_64-cpython-311/csrc
  creating build/temp.linux-x86_64-cpython-311/csrc/adam
  gcc -pthread -B /opt/shared/anaconda/2024.02/compiler_compat -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /opt/shared/anaconda/2024.02/include -fPIC -O2 -isystem /opt/shared/anaconda/2024.02/include -fPIC -I/home/3458/pytorch/DeepSpeed/csrc/includes -I/home/3458/pytorch/venv/lib/python3.11/site-packages/torch/include -I/home/3458/pytorch/venv/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/home/3458/pytorch/venv/lib/python3.11/site-packages/torch/include/TH -I/home/3458/pytorch/venv/lib/python3.11/site-packages/torch/include/THC -I/opt/shared/cuda/12.4.1-550.54.15/include -I/home/3458/pytorch/venv/include -I/opt/shared/anaconda/2024.02/include/python3.11 -c csrc/adam/cpu_adam.cpp -o build/temp.linux-x86_64-cpython-311/csrc/adam/cpu_adam.o -O3 -std=c++17 -g -Wno-reorder -L/opt/shared/cuda/12.4.1-550.54.15/lib64 -lcudart -lcublas -g -march=native -fopenmp -D__AVX256__ -D__ENABLE_CUDA__ -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -DTORCH_EXTENSION_NAME=cpu_adam_op -D_GLIBCXX_USE_CXX11_ABI=0
  In file included from /home/3458/pytorch/DeepSpeed/csrc/includes/cpu_adam.h:14,
                   from csrc/adam/cpu_adam.cpp:6:
  /home/3458/pytorch/DeepSpeed/csrc/includes/simd.h:133: warning: ignoring '#pragma unroll ' [-Wunknown-pragmas]
    133 | #pragma unroll
        |
  [... the same -Wunknown-pragmas warning repeats for simd.h lines 154-295 ...]
  gcc: fatal error: Killed signal terminated program cc1plus
  compilation terminated.
  error: command '/opt/shared/gcc/11.2.0/bin/gcc' failed with exit code 1
  error: subprocess-exited-with-error
  
  × python setup.py bdist_wheel did not run successfully.
  │ exit code: 1
  ╰─> See above for output.
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  full command: /home/3458/pytorch/venv/bin/python3 -u -c '
  exec(compile('"'"''"'"''"'"'
  # This is <pip-setuptools-caller> -- a caller that pip uses to run setup.py
  #
  # - It imports setuptools before invoking setup.py, to enable projects that directly
  #   import from `distutils.core` to work with newer packaging standards.
  # - It provides a clear error message when setuptools is not installed.
  # - It sets `sys.argv[0]` to the underlying `setup.py`, when invoking `setup.py` so
  #   setuptools doesn'"'"'t think the script is `-c`. This avoids the following warning:
  #     manifest_maker: standard file '"'"'-c'"'"' not found".
  # - It generates a shim setup.py, for handling setup.cfg-only projects.
  import os, sys, tokenize
  
  try:
      import setuptools
  except ImportError as error:
      print(
          "ERROR: Can not execute `setup.py` since setuptools is not available in "
          "the build environment.",
          file=sys.stderr,
      )
      sys.exit(1)
  
  __file__ = %r
  sys.argv[0] = __file__
  
  if os.path.exists(__file__):
      filename = __file__
      with tokenize.open(__file__) as f:
          setup_py_code = f.read()
  else:
      filename = "<auto-generated setuptools caller>"
      setup_py_code = "from setuptools import setup; setup()"
  
  exec(compile(setup_py_code, filename, "exec"))
  '"'"''"'"''"'"' % ('"'"'/home/3458/pytorch/DeepSpeed/setup.py'"'"',), "<pip-setuptools-caller>", "exec"))' bdist_wheel -d /tmp/pip-wheel-v5qo0dag
  cwd: /home/3458/pytorch/DeepSpeed/
error
  ERROR: Failed building wheel for deepspeed

Then when running the latest gcc command

gcc -pthread -B /opt/shared/anaconda/2024.02/compiler_compat -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /opt/shared/anaconda/2024.02/include -fPIC -O2 -isystem /opt/shared/anaconda/2024.02/include -fPIC -I/home/3458/pytorch/DeepSpeed/csrc/includes -I/home/3458/pytorch/venv/lib/python3.11/site-packages/torch/include -I/home/3458/pytorch/venv/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/home/3458/pytorch/venv/lib/python3.11/site-packages/torch/include/TH -I/home/3458/pytorch/venv/lib/python3.11/site-packages/torch/include/THC -I/opt/shared/cuda/12.4.1-550.54.15/include -I/home/3458/pytorch/venv/include -I/opt/shared/anaconda/2024.02/include/python3.11 -c csrc/adam/cpu_adam.cpp -o build/temp.linux-x86_64-cpython-311/csrc/adam/cpu_adam.o -O3 -std=c++17 -g -Wno-reorder -L/opt/shared/cuda/12.4.1-550.54.15/lib64 -lcudart -lcublas -g -march=native -fopenmp -D__AVX256__ -D__ENABLE_CUDA__ -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -DTORCH_EXTENSION_NAME=cpu_adam_op -D_GLIBCXX_USE_CXX11_ABI=0

I get

In file included from /home/3458/pytorch/DeepSpeed/csrc/includes/cpu_adam.h:14,
                 from csrc/adam/cpu_adam.cpp:6:
/home/3458/pytorch/DeepSpeed/csrc/includes/simd.h:133: warning: ignoring '#pragma unroll ' [-Wunknown-pragmas]
  133 | #pragma unroll
      | 
[… the same '#pragma unroll' warning repeated for simd.h lines 154–295 …]
gcc: fatal error: Killed signal terminated program cc1plus
compilation terminated.

@oabuhamdan
Author

OK, so the build sometimes gets stuck because of a stale ~/.cache/torch_extensions/ directory.
Removing the directory and rebuilding produces the error above.
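For anyone hitting the same wall: `gcc: fatal error: Killed signal terminated program cc1plus` is usually the kernel OOM killer ending the compiler, not a compiler bug. A quick check, assuming you can read the kernel log on the node:

```shell
# "Killed signal terminated program cc1plus" typically means the kernel
# OOM killer ended the compiler; look for evidence in the kernel log.
dmesg 2>/dev/null | grep -iE 'out of memory|killed process' | tail -n 5
# Clear the extension cache so the next JIT attempt starts clean.
rm -rf ~/.cache/torch_extensions
```

If the OOM killer does show up in the log, retrying the build with `MAX_JOBS=1` to serialize compilation may be enough.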

@delock
Contributor

delock commented Jun 25, 2024

This specific error indicates that gcc hit a serious problem. I ran the same command on my system, and gcc spends several seconds after the unroll warnings before it finally exits successfully. I suspect something in the code is complicated enough to stress certain versions of gcc.

gcc: fatal error: Killed signal terminated program cc1plus

What is your gcc version? I'm using gcc 12.3.0, and compilation eventually finishes.

@oabuhamdan
Author

Thanks @delock
I am using a controlled environment with VALE. I switch between GCC versions but usually I use 11.2.0.
Other info:

DeepSpeed general environment info:
torch version .................... 2.3.1+cu121
deepspeed info ................... 0.14.4, unknown, unknown
torch cuda version ............... 12.1
torch hip version ................ None
nvcc version ..................... 12.4
deepspeed wheel compiled w. ...... torch 2.3, cuda 12.1
shared memory (/dev/shm) size .... 251.82 GB

@delock
Contributor

delock commented Jun 25, 2024

Thanks @delock I am using a controlled environment with VALE. I switch between GCC versions but usually I use 11.2.0. Other info:

DeepSpeed general environment info:
torch version .................... 2.3.1+cu121
deepspeed info ................... 0.14.4, unknown, unknown
torch cuda version ............... 12.1
torch hip version ................ None
nvcc version ..................... 12.4
deepspeed wheel compiled w. ...... torch 2.3, cuda 12.1
shared memory (/dev/shm) size .... 251.82 GB

Did you encounter the same issue with other GCC versions? Have you tried gcc 12.3?

@oabuhamdan
Author

I checked out the latest code and it's back to the "stuck" phase...

I used GCC 12.2.0

@delock
Contributor

delock commented Jun 26, 2024

I saw the build pause for several seconds but never get stuck. CPU Adam got some new features last month. Hi @BacharL, did you encounter anything abnormal during the build process when you changed this kernel?

@BacharL
Contributor

BacharL commented Jun 26, 2024

I never saw the compilation abort. I used GCC 11.4.0 and have now also tested 12.2.0.
Both compilers take about 30 seconds to complete on my CPU (a Xeon server). 11.4.0 does not show the warnings while 12.2.0 does.

@oabuhamdan
Author

Both compilers take about 30 seconds to complete on my CPU (xeon server)

Can you try on a device with a GPU and CUDA please?
Thanks

@oabuhamdan
Author

(I use a supercomputer with a shared FS environment. The GPU node shares the venv with the CPU node)

When I run DS_SKIP_CUDA_CHECK=1 DS_BUILD_CPU_ADAM=1 pip install deepspeed --no-cache-dir -vv on a CPU only node, it builds successfully. Here is the output of ds_report

$ ds_report
/home/3458/pytorch/venv/lib/python3.11/site-packages/torch/cuda/__init__.py:118: UserWarning: CUDA initialization: The NVIDIA driver on your system is too old (found version 11010). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
  return torch._C._cuda_getDeviceCount() > 0
[2024-06-26 11:41:45,126] [WARNING] [real_accelerator.py:162:get_accelerator] Setting accelerator to CPU. If you have GPU or other accelerator, we were unable to detect it.
[2024-06-26 11:41:45,139] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cpu (auto detect)
/home/3458/pytorch/venv/lib/python3.11/site-packages/torch/cuda/__init__.py:619: UserWarning: Can't initialize NVML
  warnings.warn("Can't initialize NVML")
Warning: The default cache directory for DeepSpeed Triton autotune, /home/3458/.triton/autotune, appears to be on an NFS system. While this is generally acceptable, if you experience slowdowns or hanging when DeepSpeed exits, it is recommended to set the TRITON_CACHE_DIR environment variable to a non-NFS path.
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
deepspeed_not_implemented  [NO] ....... [OKAY]
deepspeed_ccl_comm ..... [NO] ....... [OKAY]
deepspeed_shm_comm ..... [NO] ....... [OKAY]
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/home/3458/pytorch/venv/lib/python3.11/site-packages/torch']
torch version .................... 2.3.1+cu121
deepspeed install path ........... ['/home/3458/pytorch/venv/lib/python3.11/site-packages/deepspeed']
deepspeed info ................... 0.14.4, unknown, unknown
deepspeed wheel compiled w. ...... torch 2.3 
shared memory (/dev/shm) size .... 251.79 GB

Then when I run the code I get this error

AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'
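(As an aside, this AttributeError is only a secondary symptom: `DeepSpeedCPUAdam.__del__` runs even when `__init__` aborted before `ds_opt_adam` was ever bound. A minimal sketch with a hypothetical stand-in class shows the mechanism; it is not the actual DeepSpeed code.)

```python
class FakeCPUAdam:
    """Hypothetical stand-in for DeepSpeedCPUAdam's init/teardown path."""

    def __init__(self, build_ok):
        if not build_ok:
            # Mimics the JIT build failing: __init__ raises before the
            # ds_opt_adam attribute is ever bound.
            raise RuntimeError("cpu_adam extension failed to build")
        self.ds_opt_adam = "loaded-extension"

    def __del__(self):
        # getattr guards against the half-initialized case, avoiding the
        # noisy "object has no attribute 'ds_opt_adam'" traceback.
        ext = getattr(self, "ds_opt_adam", None)
        if ext is not None:
            pass  # the real class would call self.ds_opt_adam.destroy_adam(...)


try:
    FakeCPUAdam(build_ok=False)
except RuntimeError as err:
    print(err)  # prints: cpu_adam extension failed to build
```

So the traceback in `__del__` is noise; the real problem is that the extension never built or loaded.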

I ran the command suggested as a fix in other issues here, but it still doesn't work

$ python -c 'import deepspeed; deepspeed.ops.adam.cpu_adam.CPUAdamBuilder().load()'
/home/3458/pytorch/venv/lib/python3.11/site-packages/torch/cuda/__init__.py:118: UserWarning: CUDA initialization: The NVIDIA driver on your system is too old (found version 11010). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
  return torch._C._cuda_getDeviceCount() > 0
[2024-06-26 12:00:22,463] [WARNING] [real_accelerator.py:162:get_accelerator] Setting accelerator to CPU. If you have GPU or other accelerator, we were unable to detect it.
[2024-06-26 12:00:22,476] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cpu (auto detect)
/home/3458/pytorch/venv/lib/python3.11/site-packages/torch/cuda/__init__.py:619: UserWarning: Can't initialize NVML
  warnings.warn("Can't initialize NVML")

Now when I log in to the GPU node an run ds_report

$ ds_report
[2024-06-26 12:03:13,649] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Warning: The default cache directory for DeepSpeed Triton autotune, /home/3458/.triton/autotune, appears to be on an NFS system. While this is generally acceptable, if you experience slowdowns or hanging when DeepSpeed exits, it is recommended to set the TRITON_CACHE_DIR environment variable to a non-NFS path.
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
 [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
 [WARNING]  NVIDIA Inference is only supported on Ampere and newer architectures
 [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
 [WARNING]  using untested triton version (2.3.1), only 1.0.0 is known to be compatible
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
fused_adam ............. [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
cpu_lion ............... [NO] ....... [OKAY]
 [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
evoformer_attn ......... [NO] ....... [NO]
 [WARNING]  NVIDIA Inference is only supported on Ampere and newer architectures
fp_quantizer ........... [NO] ....... [NO]
fused_lamb ............. [NO] ....... [OKAY]
fused_lion ............. [NO] ....... [OKAY]
inference_core_ops ..... [NO] ....... [OKAY]
cutlass_ops ............ [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
ragged_device_ops ...... [NO] ....... [OKAY]
ragged_ops ............. [NO] ....... [OKAY]
random_ltd ............. [NO] ....... [OKAY]
 [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
 [WARNING]  using untested triton version (2.3.1), only 1.0.0 is known to be compatible
sparse_attn ............ [NO] ....... [NO]
spatial_inference ...... [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/home/3458/pytorch/venv/lib/python3.11/site-packages/torch']
torch version .................... 2.3.1+cu121
deepspeed install path ........... ['/home/3458/pytorch/venv/lib/python3.11/site-packages/deepspeed']
deepspeed info ................... 0.14.4, unknown, unknown
torch cuda version ............... 12.1
torch hip version ................ None
nvcc version ..................... 12.4
deepspeed wheel compiled w. ...... torch 2.3, cuda 12.1
shared memory (/dev/shm) size .... 251.82 GB

And

python -c 'import deepspeed; deepspeed.ops.adam.cpu_adam.CPUAdamBuilder().load()'
[2024-06-26 12:08:35,396] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Warning: The default cache directory for DeepSpeed Triton autotune, /home/3458/.triton/autotune, appears to be on an NFS system. While this is generally acceptable, if you experience slowdowns or hanging when DeepSpeed exits, it is recommended to set the TRITON_CACHE_DIR environment variable to a non-NFS path.
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
 [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
 [WARNING]  NVIDIA Inference is only supported on Ampere and newer architectures
 [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
 [WARNING]  using untested triton version (2.3.1), only 1.0.0 is known to be compatible
Installed CUDA version 12.4 does not match the version torch was compiled with 12.1 but since the APIs are compatible, accepting this combination
Using /home/3458/.cache/torch_extensions/py311_cu121 as PyTorch extensions root...
Creating extension directory /home/3458/.cache/torch_extensions/py311_cu121/cpu_adam...
Emitting ninja build file /home/3458/.cache/torch_extensions/py311_cu121/cpu_adam/build.ninja...
Building extension module cpu_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
1.11.1.git.kitware.jobserver-1
Loading extension module cpu_adam...
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/3458/pytorch/venv/lib/python3.11/site-packages/deepspeed/ops/op_builder/builder.py", line 508, in load
    return self.jit_load(verbose)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/3458/pytorch/venv/lib/python3.11/site-packages/deepspeed/ops/op_builder/builder.py", line 555, in jit_load
    op_module = load(name=self.name,
                ^^^^^^^^^^^^^^^^^^^^
  File "/home/3458/pytorch/venv/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 1309, in load
    return _jit_compile(
           ^^^^^^^^^^^^^
  File "/home/3458/pytorch/venv/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 1745, in _jit_compile
    return _import_module_from_library(name, build_directory, is_python_module)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/3458/pytorch/venv/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 2143, in _import_module_from_library
    module = importlib.util.module_from_spec(spec)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 573, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 1233, in create_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
ImportError: /home/3458/.cache/torch_extensions/py311_cu121/cpu_adam/cpu_adam.so: cannot open shared object file: No such file or directory
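(A plausible explanation, not confirmed in this thread: the earlier killed compile left a cached build directory without the `.so`, and torch's JIT loader reuses the cached directory instead of rebuilding. Clearing just that op's cache forces a fresh compile.)

```shell
# cpu_adam's cached build dir exists but contains no cpu_adam.so (the
# earlier compile was killed), so the JIT loader skips the rebuild and
# then fails to dlopen the missing library. Removing the dir forces a
# fresh compile on the next load attempt.
rm -rf ~/.cache/torch_extensions/py311_cu121/cpu_adam
```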

Now when I uninstall deepspeed, remove .cache/torch_extensions, and run DS_SKIP_CUDA_CHECK=1 DS_BUILD_CPU_ADAM=1 pip install deepspeed --no-cache-dir -vv on the GPU node, the build gets stuck here

running build_ext
  building 'deepspeed.ops.adam.cpu_adam_op' extension
  creating build/temp.linux-x86_64-cpython-311
  creating build/temp.linux-x86_64-cpython-311/csrc
  creating build/temp.linux-x86_64-cpython-311/csrc/adam
  gcc -pthread -B /opt/shared/anaconda/2024.02/compiler_compat -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /opt/shared/anaconda/2024.02/include -fPIC -O2 -isystem /opt/shared/anaconda/2024.02/include -fPIC -I/tmp/pip-install-m14c3pnf/deepspeed_f9629db6986c4d97985ce99cf8ae5b76/csrc/includes -I/home/3458/pytorch/venv/lib/python3.11/site-packages/torch/include -I/home/3458/pytorch/venv/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/home/3458/pytorch/venv/lib/python3.11/site-packages/torch/include/TH -I/home/3458/pytorch/venv/lib/python3.11/site-packages/torch/include/THC -I/opt/shared/cuda/12.4.1-550.54.15/include -I/home/3458/pytorch/venv/include -I/opt/shared/anaconda/2024.02/include/python3.11 -c csrc/adam/cpu_adam.cpp -o build/temp.linux-x86_64-cpython-311/csrc/adam/cpu_adam.o -O3 -std=c++17 -g -Wno-reorder -L/opt/shared/cuda/12.4.1-550.54.15/lib64 -lcudart -lcublas -g -march=native -fopenmp -D__AVX256__ -D__ENABLE_CUDA__ -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -DTORCH_EXTENSION_NAME=cpu_adam_op -D_GLIBCXX_USE_CXX11_ABI=0
  In file included from /tmp/pip-install-m14c3pnf/deepspeed_f9629db6986c4d97985ce99cf8ae5b76/csrc/includes/cpu_adam.h:14,
                   from csrc/adam/cpu_adam.cpp:6:

or sometimes it continues to this error

 gcc: fatal error: Killed signal terminated program cc1plus
  compilation terminated.
  error: command '/opt/shared/gcc/12.2.0/bin/gcc' failed with exit code 1
  error: subprocess-exited-with-error
  
  × python setup.py bdist_wheel did not run successfully.
  │ exit code: 1
  ╰─> See above for output.
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  full command: /home/3458/pytorch/venv/bin/python3 -u -c '
  exec(compile('"'"''"'"''"'"'
  # This is <pip-setuptools-caller> -- a caller that pip uses to run setup.py
  #
  # - It imports setuptools before invoking setup.py, to enable projects that directly
  #   import from `distutils.core` to work with newer packaging standards.
  # - It provides a clear error message when setuptools is not installed.
  # - It sets `sys.argv[0]` to the underlying `setup.py`, when invoking `setup.py` so
  #   setuptools doesn'"'"'t think the script is `-c`. This avoids the following warning:
  #     manifest_maker: standard file '"'"'-c'"'"' not found".
  # - It generates a shim setup.py, for handling setup.cfg-only projects.
  import os, sys, tokenize
  
  try:
      import setuptools
  except ImportError as error:
      print(
          "ERROR: Can not execute `setup.py` since setuptools is not available in "
          "the build environment.",
          file=sys.stderr,
      )
      sys.exit(1)
  
  __file__ = %r
  sys.argv[0] = __file__
  
  if os.path.exists(__file__):
      filename = __file__
      with tokenize.open(__file__) as f:
          setup_py_code = f.read()
  else:
      filename = "<auto-generated setuptools caller>"
      setup_py_code = "from setuptools import setup; setup()"
  
  exec(compile(setup_py_code, filename, "exec"))
  '"'"''"'"''"'"' % ('"'"'/tmp/pip-install-m14c3pnf/deepspeed_f9629db6986c4d97985ce99cf8ae5b76/setup.py'"'"',), "<pip-setuptools-caller>", "exec"))' bdist_wheel -d /tmp/pip-wheel-p5lwia58
  cwd: /tmp/pip-install-m14c3pnf/deepspeed_f9629db6986c4d97985ce99cf8ae5b76/
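A retry sketch, under the assumption that the GPU node is OOM-killing the parallel compile: `MAX_JOBS` is honored by PyTorch's extension build, and the `DS_*` flags are the ones already used above.

```shell
export MAX_JOBS=1            # serialize compilation to cap peak memory
export DS_BUILD_CPU_ADAM=1   # prebuild only the cpu_adam op
export DS_SKIP_CUDA_CHECK=1  # tolerate the CUDA 12.4-vs-12.1 mismatch
# then rerun: pip install deepspeed --no-cache-dir -v
```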

This is the whole story :)

@loadams
Contributor

loadams commented Jun 26, 2024

Hi @oabuhamdan - can you summarize the state of this, is there a bug that needs more debugging, or do we think this is something perhaps unique to your setup/cuda/torch/hw?

@oabuhamdan
Author

Hi @loadams

Summary:
DeepSpeedCPUAdam doesn't work.
When I try to prebuild it (DS_BUILD_CPU_ADAM=1), the build succeeds on a CPU-only node but fails on a GPU node.
My previous comment shows a detailed description.

Thanks!

@loadams
Contributor

loadams commented Jun 28, 2024

Thanks for the quick summary @oabuhamdan - I'll test this on my side as well. Though I believe this currently runs in the nv-pre-compile-ops workflow, so this may be setup-related?

@oabuhamdan
Author

Thanks for continuing this thread @loadams.
I checked the workflow, but I don't think it reflects the issue here.
First, the workflow environment uses old versions of torch and CUDA:

Run which python
/usr/bin/python
Python 3.8.10
torch: 1.13.1+cu116 <module 'torch' from '/usr/local/lib/python3.8/dist-packages/torch/__init__.py'>

CUDA 11.6 was released in January 2022; we have 12.5 now (May 2024).
Torch 1.13 dates to December 2022; we have 2.3.1 (June 2024).

Second, the ds_report output doesn't look right. It says the accelerator is CPU, and cpu_adam is not even installed.

Run ds_report
[2024-06-28 00:14:55,508] [WARNING] [real_accelerator.py:162:get_accelerator] Setting accelerator to CPU. If you have GPU or other accelerator, we were unable to detect it.
[2024-06-28 00:14:55,515] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cpu (auto detect)
.
.
.
op name ................ installed .. compatible
cpu_adam ............... [NO] ....... [OKAY]

In my previous comment, I stated that the build works when I use a CPU node but fails on a GPU node.
Correct me if I'm mistaken about anything here.

Appreciate your help!

@loadams loadams self-assigned this Jun 28, 2024
@loadams
Contributor

loadams commented Jun 28, 2024

@oabuhamdan - thanks for clarifying, I forgot that our node for that wasn't using GPUs, I'll work on getting a repro and will share my results here.

Labels
bug Something isn't working training

6 participants