Description
Describe the bug
DPC++ when targeting NVIDIA GPUs in CMake's Debug build type results in
llvm-foreach: Segmentation fault
clang-15: error: ptxas command failed with exit code 254 (use -v to see invocation)
In Release or RelWithDebInfo mode, everything works just fine, it only crashes in Debug builds.
To Reproduce
We tried to reproduce the bug in an MWE (see attached mwe.zip; cmake -DCMAKE_BUILD_TYPE=Debug -DCMAKE_CXX_COMPILER=clang++ .. is sufficient to compile this MWE). However, more often than not the segmentation fault did not occur; the link step crashed in only a very few cases.
To reliably reproduce the bug, we used our library code:
git clone git@github.com:SC-SGS/PLSSVM.git
cd PLSSVM
git checkout 06cd4e04a11dabde90e01c5523135f7075d13be6
mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=Debug -DCMAKE_CXX_COMPILER=clang++ -DPLSSVM_TARGET_PLATFORMS="nvidia:sm_XX" -DPLSSVM_ENABLE_OPENMP_BACKEND=OFF -DPLSSVM_ENABLE_CUDA_BACKEND=OFF -DPLSSVM_ENABLE_OPENCL_BACKEND=OFF -DPLSSVM_ENABLE_SYCL_BACKEND=ON -DPLSSVM_ENABLE_LTO=OFF ..
make -j
where clang++ refers to a DPC++ installation and sm_XX is the compute architecture of the used NVIDIA GPU (the other flags only ensure that only the SYCL backend is built).
Here, the bug always occurs when compiling in Debug mode.
This bug has only been present since we added the hierarchical kernels (include/plssvm/backends/SYCL/svm_kernel_hierarchical.hpp).
The bug disappears as soon as we stop using a function object and instead put the kernel code directly inside a lambda in the respective src/plssvm/backends/SYCL/csvm.cpp file.
However, this is not an option, since it would severely bloat said .cpp file (and the function-object version works in Release mode).
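To make the two kernel styles concrete, here is a minimal host-only sketch. It does not use SYCL and is not the PLSSVM code: run_kernel is a hypothetical stand-in for submitting a callable to a queue, and scale_kernel with its scaling operation is an invented example. It only illustrates that the same kernel body can be written as a named function object (the style used in the hierarchical kernel header) or inlined as a lambda at the call site (the workaround described above).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a SYCL parallel_for: calls kernel(i) for every index.
// A real DPC++ build would submit the callable to a sycl::queue instead.
template <typename Kernel>
void run_kernel(std::size_t n, Kernel kernel) {
    for (std::size_t i = 0; i < n; ++i) {
        kernel(i);
    }
}

// Variant 1: the kernel is a named function object that can live in its own header.
// This is the style the report associates with the Debug-mode crash.
class scale_kernel {
  public:
    scale_kernel(const double *in, double *out, double factor) :
        in_{ in }, out_{ out }, factor_{ factor } {}

    void operator()(std::size_t i) const { out_[i] = factor_ * in_[i]; }

  private:
    const double *in_;
    double *out_;
    double factor_;
};

std::vector<double> scale_with_functor(const std::vector<double> &in, double factor) {
    std::vector<double> out(in.size());
    run_kernel(in.size(), scale_kernel{ in.data(), out.data(), factor });
    return out;
}

// Variant 2: the same kernel body is inlined as a lambda at the call site.
// This is the workaround from the report; it grows the calling .cpp file.
std::vector<double> scale_with_lambda(const std::vector<double> &in, double factor) {
    std::vector<double> out(in.size());
    const double *src = in.data();
    double *dst = out.data();
    run_kernel(in.size(), [=](std::size_t i) { dst[i] = factor * src[i]; });
    return out;
}

// Sanity check: both variants must produce identical output.
void check_variants_agree() {
    const std::vector<double> in{ 1.0, 2.0, 3.0 };
    assert(scale_with_functor(in, 2.0) == scale_with_lambda(in, 2.0));
}
```

Both variants compute the same result; the difference is only where the kernel body lives, which is why moving every kernel into csvm.cpp as a lambda is unattractive for a larger code base.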
Environment (please complete the following information):
- OS: Ubuntu 20.04.4 LTS
- Target device and vendor: reproduced on NVIDIA RTX 3080 and NVIDIA A100
- DPC++ version: reproduced with DPC++ commit hashes faaba28, 9f2b7bd, and a618ca7
- Dependencies version: reproduced with CUDA 11.2.2 and 11.4.3
Additional context
The full linker error (with -v):
[ 70%] Linking CXX shared library libsvm-SYCL.so
clang version 14.0.0 (https://github.com/intel/llvm faaba28541138d7ad39a7fa85fa85b863560b45f)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /data/scratch/breyerml/Programs/dpcpp_2022_02_04/build/bin
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/9
Selected GCC installation: /usr/lib/gcc/x86_64-linux-gnu/9
Candidate multilib: .;@m64
Selected multilib: .;@m64
Found CUDA installation: /import/local.ubuntu/sw/cuda/cuda-11.2.2, version 11.2
clang-offload-bundler -type=ao -targets=host-x86_64-unknown-linux-gnu -inputs=../../../../libsvm-base.a -check-section
clang-offload-bundler -type=ao -targets=host-x86_64-unknown-linux-gnu -inputs=../../../../_deps/fmt-build/libfmtd.a -check-section
clang-offload-bundler -type=o -targets=sycl-spir64-unknown-unknown -inputs=CMakeFiles/svm-SYCL.dir/detail/device_ptr.cpp.o -check-section
clang-offload-bundler -type=o -targets=sycl-spir64-unknown-unknown -inputs=CMakeFiles/svm-SYCL.dir/detail/utility.cpp.o -check-section
clang-offload-bundler -type=o -targets=sycl-spir64-unknown-unknown -inputs=CMakeFiles/svm-SYCL.dir/csvm.cpp.o -check-section
clang-offload-bundler -type=o -targets=sycl-spir64-unknown-unknown -inputs=CMakeFiles/svm-SYCL.dir/exceptions.cpp.o -check-section
clang-offload-bundler -type=o -targets=sycl-spir64-unknown-unknown -inputs=CMakeFiles/svm-SYCL.dir/__/gpu_csvm.cpp.o -check-section
clang-offload-bundler -type=ao -targets=sycl-spir64-unknown-unknown -inputs=../../../../libsvm-base.a -check-section
clang-offload-bundler -type=ao -targets=sycl-spir64-unknown-unknown -inputs=../../../../_deps/fmt-build/libfmtd.a -check-section
clang-offload-bundler -type=ao -targets=sycl-fpga_aocx-intel-unknown -inputs=../../../../libsvm-base.a -check-section
clang-offload-bundler -type=ao -targets=sycl-fpga_aocr-intel-unknown -inputs=../../../../libsvm-base.a -check-section
clang-offload-bundler -type=ao -targets=sycl-fpga_aocr_emu-intel-unknown -inputs=../../../../libsvm-base.a -check-section
clang-offload-bundler -type=ao -targets=host-x86_64-unknown-linux-gnu -inputs=../../../../libsvm-base.a -check-section
clang-offload-bundler -type=ao -targets=sycl-fpga_aocx-intel-unknown -inputs=../../../../_deps/fmt-build/libfmtd.a -check-section
clang-offload-bundler -type=ao -targets=sycl-fpga_aocr-intel-unknown -inputs=../../../../_deps/fmt-build/libfmtd.a -check-section
clang-offload-bundler -type=ao -targets=sycl-fpga_aocr_emu-intel-unknown -inputs=../../../../_deps/fmt-build/libfmtd.a -check-section
clang-offload-bundler -type=ao -targets=host-x86_64-unknown-linux-gnu -inputs=../../../../_deps/fmt-build/libfmtd.a -check-section
"/data/scratch/breyerml/Programs/dpcpp_2022_02_04/build/bin/clang-offload-bundler" -type=o -targets=host-x86_64-unknown-linux-gnu,sycl-nvptx64-nvidia-cuda-sm_50 -inputs=CMakeFiles/svm-SYCL.dir/detail/device_ptr.cpp.o -outputs=/tmp/device_ptr-4c87b6.o,/tmp/device_ptr-47e824/device_ptr-sm_50.o -unbundle -allow-missing-bundles
"/data/scratch/breyerml/Programs/dpcpp_2022_02_04/build/bin/clang-offload-bundler" -type=o -targets=host-x86_64-unknown-linux-gnu,sycl-nvptx64-nvidia-cuda-sm_50 -inputs=CMakeFiles/svm-SYCL.dir/detail/utility.cpp.o -outputs=/tmp/utility-3277ae.o,/tmp/utility-c66303/utility-sm_50.o -unbundle -allow-missing-bundles
"/data/scratch/breyerml/Programs/dpcpp_2022_02_04/build/bin/clang-offload-bundler" -type=o -targets=host-x86_64-unknown-linux-gnu,sycl-nvptx64-nvidia-cuda-sm_50 -inputs=CMakeFiles/svm-SYCL.dir/csvm.cpp.o -outputs=/tmp/csvm-a95331.o,/tmp/csvm-07d8f9/csvm-sm_50.o -unbundle -allow-missing-bundles
"/data/scratch/breyerml/Programs/dpcpp_2022_02_04/build/bin/clang-offload-bundler" -type=o -targets=host-x86_64-unknown-linux-gnu,sycl-nvptx64-nvidia-cuda-sm_50 -inputs=CMakeFiles/svm-SYCL.dir/exceptions.cpp.o -outputs=/tmp/exceptions-e30db2.o,/tmp/exceptions-1f486e/exceptions-sm_50.o -unbundle -allow-missing-bundles
"/data/scratch/breyerml/Programs/dpcpp_2022_02_04/build/bin/clang-offload-bundler" -type=o -targets=host-x86_64-unknown-linux-gnu,sycl-nvptx64-nvidia-cuda-sm_50 -inputs=CMakeFiles/svm-SYCL.dir/__/gpu_csvm.cpp.o -outputs=/tmp/gpu_csvm-724d55.o,/tmp/gpu_csvm-0fe631/gpu_csvm-sm_50.o -unbundle -allow-missing-bundles
"/data/scratch/breyerml/Programs/dpcpp_2022_02_04/build/bin/llvm-link" /tmp/device_ptr-47e824/device_ptr-sm_50.o /tmp/utility-c66303/utility-sm_50.o /tmp/csvm-07d8f9/csvm-sm_50.o /tmp/exceptions-1f486e/exceptions-sm_50.o /tmp/gpu_csvm-0fe631/gpu_csvm-sm_50.o -o /tmp/device_ptr-7395c8/device_ptr-sm_50.bc --suppress-warnings
"/data/scratch/breyerml/Programs/dpcpp_2022_02_04/build/bin/sycl-post-link" -split=auto -emit-param-info -emit-program-metadata -symbols -emit-exported-symbols -split-esimd -lower-esimd -O2 -spec-const=default -o /tmp/device_ptr-eabe77/device_ptr-sm_50.bc /tmp/device_ptr-7395c8/device_ptr-sm_50.bc
"/data/scratch/breyerml/Programs/dpcpp_2022_02_04/build/bin/file-table-tform" -extract=Code -drop_titles -o /tmp/device_ptr-08641c/device_ptr-sm_50.bc /tmp/device_ptr-eabe77/device_ptr-sm_50.bc
"/data/scratch/breyerml/Programs/dpcpp_2022_02_04/build/bin/llvm-foreach" --out-ext=s --in-file-list=/tmp/device_ptr-08641c/device_ptr-sm_50.bc --in-replace=/tmp/device_ptr-08641c/device_ptr-sm_50.bc --out-file-list=/tmp/device_ptr-38f754/device_ptr-sm_50.s --out-replace=/tmp/device_ptr-38f754/device_ptr-sm_50.s -- /data/scratch/breyerml/Programs/dpcpp_2022_02_04/build/bin/clang-14 -cc1 -triple nvptx64-nvidia-cuda -aux-triple x86_64-unknown-linux-gnu -fsycl-is-device -fdeclare-spirv-builtins -fenable-sycl-dae -Wno-sycl-strict -sycl-std=2020 -S -disable-free -clear-ast-before-backend -main-file-name device_ptr.cpp.o -mrelocation-model pic -pic-level 2 -fhalf-no-semantic-interposition -mframe-pointer=all -ffp-contract=on -fno-rounding-math -fno-verbose-asm -no-integrated-as -aux-target-cpu x86-64 -internal-isystem /data/scratch/breyerml/Programs/dpcpp_2022_02_04/build/bin/../include/sycl -internal-isystem /data/scratch/breyerml/Programs/dpcpp_2022_02_04/build/bin/../include -mlink-builtin-bitcode /data/scratch/breyerml/Programs/dpcpp_2022_02_04/build/lib/clang/14.0.0/../../clc/remangled-l64-signed_char.libspirv-nvptx64--nvidiacl.bc -mlink-builtin-bitcode /import/local.ubuntu/sw/cuda/cuda-11.2.2/nvvm/libdevice/libdevice.10.bc -target-feature +ptx72 -target-sdk-version=11.2 -target-cpu sm_50 -mllvm -treat-scalable-fixed-error-as-warning -mllvm -sycl-enable-local-accessor -debug-info-kind=constructor -dwarf-version=2 -debugger-tuning=gdb -fno-dwarf-directory-asm -v -resource-dir /data/scratch/breyerml/Programs/dpcpp_2022_02_04/build/lib/clang/14.0.0 -fdebug-compilation-dir=/data/scratch/breyerml/PLSSVM/build_dpcpp/src/plssvm/backends/SYCL -ferror-limit 19 -fgnuc-version=4.2.1 -fcolor-diagnostics -D__GCC_HAVE_DWARF2_CFI_ASM=1 -o /tmp/device_ptr-38f754/device_ptr-sm_50.s -x ir /tmp/device_ptr-08641c/device_ptr-sm_50.bc
clang -cc1 version 14.0.0 based upon LLVM 14.0.0git default target x86_64-unknown-linux-gnu
"/data/scratch/breyerml/Programs/dpcpp_2022_02_04/build/bin/llvm-foreach" --out-ext=o --in-file-list=/tmp/device_ptr-38f754/device_ptr-sm_50.s --in-replace=/tmp/device_ptr-38f754/device_ptr-sm_50.s --out-file-list=/tmp/device_ptr-c54ab7/device_ptr-sm_50.o --out-replace=/tmp/device_ptr-c54ab7/device_ptr-sm_50.o -- /import/local.ubuntu/sw/cuda/cuda-11.2.2/bin/ptxas -m64 -g --dont-merge-basicblocks --return-at-end -v --gpu-name sm_50 --output-file /tmp/device_ptr-c54ab7/device_ptr-sm_50.o /tmp/device_ptr-38f754/device_ptr-sm_50.s
llvm-foreach: Segmentation fault
clang-14: error: ptxas command failed with exit code 254 (use -v to see invocation)
clang version 14.0.0 (https://github.com/intel/llvm faaba28541138d7ad39a7fa85fa85b863560b45f)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /data/scratch/breyerml/Programs/dpcpp_2022_02_04/build/bin
clang-offload-bundler -type=ao -targets=sycl-fpga_aocx-intel-unknown -inputs=../../../../libsvm-base.a -check-section
clang-offload-bundler -type=ao -targets=sycl-fpga_aocr-intel-unknown -inputs=../../../../libsvm-base.a -check-section
clang-offload-bundler -type=ao -targets=sycl-fpga_aocr_emu-intel-unknown -inputs=../../../../libsvm-base.a -check-section
clang-offload-bundler -type=ao -targets=host-x86_64-unknown-linux-gnu -inputs=../../../../libsvm-base.a -check-section
clang-offload-bundler -type=ao -targets=sycl-fpga_aocx-intel-unknown -inputs=../../../../_deps/fmt-build/libfmtd.a -check-section
clang-offload-bundler -type=ao -targets=sycl-fpga_aocr-intel-unknown -inputs=../../../../_deps/fmt-build/libfmtd.a -check-section
clang-offload-bundler -type=ao -targets=sycl-fpga_aocr_emu-intel-unknown -inputs=../../../../_deps/fmt-build/libfmtd.a -check-section
clang-offload-bundler -type=ao -targets=host-x86_64-unknown-linux-gnu -inputs=../../../../_deps/fmt-build/libfmtd.a -check-section
clang-14: note: diagnostic msg: Error generating preprocessed source(s).
make[2]: *** [src/plssvm/backends/SYCL/CMakeFiles/svm-SYCL.dir/build.make:167: src/plssvm/backends/SYCL/libsvm-SYCL.so] Error 254
make[1]: *** [CMakeFiles/Makefile2:433: src/plssvm/backends/SYCL/CMakeFiles/svm-SYCL.dir/all] Error 2
make: *** [Makefile:160: all] Error 2