
Linker error (segmentation fault) on NVIDIA GPUs #5754

@breyerml


Describe the bug
Compiling with DPC++ for NVIDIA GPUs using CMake's Debug build type results in:

llvm-foreach: Segmentation fault
clang-15: error: ptxas command failed with exit code 254 (use -v to see invocation)

In Release or RelWithDebInfo mode everything works fine; the crash only occurs in Debug builds.

To Reproduce
We tried to reproduce the bug in a minimal working example (MWE; see the attached mwe.zip, which can be compiled with cmake -DCMAKE_BUILD_TYPE=Debug -DCMAKE_CXX_COMPILER=clang++ ..). However, with the MWE the segmentation fault occurred only rarely; most of the time it compiled without error.

To reproduce the bug reliably, we used our library code:

git clone git@github.com:SC-SGS/PLSSVM.git
cd PLSSVM
git checkout 06cd4e04a11dabde90e01c5523135f7075d13be6
mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=Debug -DCMAKE_CXX_COMPILER=clang++ -DPLSSVM_TARGET_PLATFORMS="nvidia:sm_XX" -DPLSSVM_ENABLE_OPENMP_BACKEND=OFF -DPLSSVM_ENABLE_CUDA_BACKEND=OFF -DPLSSVM_ENABLE_OPENCL_BACKEND=OFF -DPLSSVM_ENABLE_SYCL_BACKEND=ON -DPLSSVM_ENABLE_LTO=OFF ..
make -j

where clang++ refers to a DPC++ installation and sm_XX is the SM architecture of the NVIDIA GPU in use (the remaining flags ensure that only the SYCL backend is built).
With this setup, the bug always occurs when compiling in Debug mode.
The bug has only been present since we added the hierarchical kernels (include/plssvm/backends/SYCL/svm_kernel_hierarchical.hpp).
It disappears as soon as we stop using a function object and instead put the kernel code directly inside a lambda in the respective src/plssvm/backends/SYCL/csvm.cpp file.
However, this is not an option, since it would considerably bloat that .cpp file (and the function-object version works in Release mode).
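For readers unfamiliar with the distinction, the following host-only C++ sketch shows the two kernel shapes described above side by side. It deliberately avoids SYCL so that it compiles with any standard C++ compiler: `group_id` and `launch_work_groups` are hypothetical stand-ins for `sycl::group` and `sycl::handler::parallel_for_work_group`, and `scale_kernel` is an invented example body, not one of the PLSSVM kernels. Both variants compute the same result on the host; the sketch says nothing about the device code DPC++ generates for them.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a work-group handle (NOT sycl::group).
struct group_id {
    std::size_t id;
};

// Hypothetical host-only stand-in for handler::parallel_for_work_group:
// it simply invokes the kernel once per work-group, sequentially.
template <typename Kernel>
void launch_work_groups(std::size_t num_groups, const Kernel &kernel) {
    for (std::size_t g = 0; g < num_groups; ++g) {
        kernel(group_id{ g });
    }
}

// Variant 1: the kernel lives in a named function object
// (the shape described for svm_kernel_hierarchical.hpp; the body is invented).
class scale_kernel {
  public:
    scale_kernel(double *data, double factor, std::size_t group_size) :
        data_{ data }, factor_{ factor }, group_size_{ group_size } {}

    // Kernels are invoked as const callables; the pointee is still writable.
    void operator()(group_id g) const {
        for (std::size_t i = 0; i < group_size_; ++i) {
            data_[g.id * group_size_ + i] *= factor_;
        }
    }

  private:
    double *data_;
    double factor_;
    std::size_t group_size_;
};

std::vector<double> run_function_object(std::vector<double> v, double factor, std::size_t group_size) {
    assert(group_size > 0 && v.size() % group_size == 0);
    launch_work_groups(v.size() / group_size, scale_kernel{ v.data(), factor, group_size });
    return v;
}

// Variant 2: the same kernel body written inline as a lambda
// (the shape that avoids the crash when placed in csvm.cpp).
std::vector<double> run_lambda(std::vector<double> v, double factor, std::size_t group_size) {
    assert(group_size > 0 && v.size() % group_size == 0);
    double *data = v.data();
    launch_work_groups(v.size() / group_size, [=](group_id g) {
        for (std::size_t i = 0; i < group_size; ++i) {
            data[g.id * group_size + i] *= factor;
        }
    });
    return v;
}
```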

Environment:

  • OS: Ubuntu 20.04.4 LTS
  • Target device and vendor: reproduced on NVIDIA RTX 3080 and NVIDIA A100
  • DPC++ version: reproduced with DPC++ commit hashes faaba28, 9f2b7bd, and a618ca7
  • Dependency versions: reproduced with CUDA 11.2.2 and 11.4.3

Additional context
The full linker error (with -v):

[ 70%] Linking CXX shared library libsvm-SYCL.so
clang version 14.0.0 (https://github.com/intel/llvm faaba28541138d7ad39a7fa85fa85b863560b45f)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /data/scratch/breyerml/Programs/dpcpp_2022_02_04/build/bin
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/9
Selected GCC installation: /usr/lib/gcc/x86_64-linux-gnu/9
Candidate multilib: .;@m64
Selected multilib: .;@m64
Found CUDA installation: /import/local.ubuntu/sw/cuda/cuda-11.2.2, version 11.2
clang-offload-bundler -type=ao -targets=host-x86_64-unknown-linux-gnu -inputs=../../../../libsvm-base.a -check-section 
clang-offload-bundler -type=ao -targets=host-x86_64-unknown-linux-gnu -inputs=../../../../_deps/fmt-build/libfmtd.a -check-section 
clang-offload-bundler -type=o -targets=sycl-spir64-unknown-unknown -inputs=CMakeFiles/svm-SYCL.dir/detail/device_ptr.cpp.o -check-section 
clang-offload-bundler -type=o -targets=sycl-spir64-unknown-unknown -inputs=CMakeFiles/svm-SYCL.dir/detail/utility.cpp.o -check-section 
clang-offload-bundler -type=o -targets=sycl-spir64-unknown-unknown -inputs=CMakeFiles/svm-SYCL.dir/csvm.cpp.o -check-section 
clang-offload-bundler -type=o -targets=sycl-spir64-unknown-unknown -inputs=CMakeFiles/svm-SYCL.dir/exceptions.cpp.o -check-section 
clang-offload-bundler -type=o -targets=sycl-spir64-unknown-unknown -inputs=CMakeFiles/svm-SYCL.dir/__/gpu_csvm.cpp.o -check-section 
clang-offload-bundler -type=ao -targets=sycl-spir64-unknown-unknown -inputs=../../../../libsvm-base.a -check-section 
clang-offload-bundler -type=ao -targets=sycl-spir64-unknown-unknown -inputs=../../../../_deps/fmt-build/libfmtd.a -check-section 
clang-offload-bundler -type=ao -targets=sycl-fpga_aocx-intel-unknown -inputs=../../../../libsvm-base.a -check-section 
clang-offload-bundler -type=ao -targets=sycl-fpga_aocr-intel-unknown -inputs=../../../../libsvm-base.a -check-section 
clang-offload-bundler -type=ao -targets=sycl-fpga_aocr_emu-intel-unknown -inputs=../../../../libsvm-base.a -check-section 
clang-offload-bundler -type=ao -targets=host-x86_64-unknown-linux-gnu -inputs=../../../../libsvm-base.a -check-section 
clang-offload-bundler -type=ao -targets=sycl-fpga_aocx-intel-unknown -inputs=../../../../_deps/fmt-build/libfmtd.a -check-section 
clang-offload-bundler -type=ao -targets=sycl-fpga_aocr-intel-unknown -inputs=../../../../_deps/fmt-build/libfmtd.a -check-section 
clang-offload-bundler -type=ao -targets=sycl-fpga_aocr_emu-intel-unknown -inputs=../../../../_deps/fmt-build/libfmtd.a -check-section 
clang-offload-bundler -type=ao -targets=host-x86_64-unknown-linux-gnu -inputs=../../../../_deps/fmt-build/libfmtd.a -check-section 
 "/data/scratch/breyerml/Programs/dpcpp_2022_02_04/build/bin/clang-offload-bundler" -type=o -targets=host-x86_64-unknown-linux-gnu,sycl-nvptx64-nvidia-cuda-sm_50 -inputs=CMakeFiles/svm-SYCL.dir/detail/device_ptr.cpp.o -outputs=/tmp/device_ptr-4c87b6.o,/tmp/device_ptr-47e824/device_ptr-sm_50.o -unbundle -allow-missing-bundles
 "/data/scratch/breyerml/Programs/dpcpp_2022_02_04/build/bin/clang-offload-bundler" -type=o -targets=host-x86_64-unknown-linux-gnu,sycl-nvptx64-nvidia-cuda-sm_50 -inputs=CMakeFiles/svm-SYCL.dir/detail/utility.cpp.o -outputs=/tmp/utility-3277ae.o,/tmp/utility-c66303/utility-sm_50.o -unbundle -allow-missing-bundles
 "/data/scratch/breyerml/Programs/dpcpp_2022_02_04/build/bin/clang-offload-bundler" -type=o -targets=host-x86_64-unknown-linux-gnu,sycl-nvptx64-nvidia-cuda-sm_50 -inputs=CMakeFiles/svm-SYCL.dir/csvm.cpp.o -outputs=/tmp/csvm-a95331.o,/tmp/csvm-07d8f9/csvm-sm_50.o -unbundle -allow-missing-bundles
 "/data/scratch/breyerml/Programs/dpcpp_2022_02_04/build/bin/clang-offload-bundler" -type=o -targets=host-x86_64-unknown-linux-gnu,sycl-nvptx64-nvidia-cuda-sm_50 -inputs=CMakeFiles/svm-SYCL.dir/exceptions.cpp.o -outputs=/tmp/exceptions-e30db2.o,/tmp/exceptions-1f486e/exceptions-sm_50.o -unbundle -allow-missing-bundles
 "/data/scratch/breyerml/Programs/dpcpp_2022_02_04/build/bin/clang-offload-bundler" -type=o -targets=host-x86_64-unknown-linux-gnu,sycl-nvptx64-nvidia-cuda-sm_50 -inputs=CMakeFiles/svm-SYCL.dir/__/gpu_csvm.cpp.o -outputs=/tmp/gpu_csvm-724d55.o,/tmp/gpu_csvm-0fe631/gpu_csvm-sm_50.o -unbundle -allow-missing-bundles
 "/data/scratch/breyerml/Programs/dpcpp_2022_02_04/build/bin/llvm-link" /tmp/device_ptr-47e824/device_ptr-sm_50.o /tmp/utility-c66303/utility-sm_50.o /tmp/csvm-07d8f9/csvm-sm_50.o /tmp/exceptions-1f486e/exceptions-sm_50.o /tmp/gpu_csvm-0fe631/gpu_csvm-sm_50.o -o /tmp/device_ptr-7395c8/device_ptr-sm_50.bc --suppress-warnings
 "/data/scratch/breyerml/Programs/dpcpp_2022_02_04/build/bin/sycl-post-link" -split=auto -emit-param-info -emit-program-metadata -symbols -emit-exported-symbols -split-esimd -lower-esimd -O2 -spec-const=default -o /tmp/device_ptr-eabe77/device_ptr-sm_50.bc /tmp/device_ptr-7395c8/device_ptr-sm_50.bc
 "/data/scratch/breyerml/Programs/dpcpp_2022_02_04/build/bin/file-table-tform" -extract=Code -drop_titles -o /tmp/device_ptr-08641c/device_ptr-sm_50.bc /tmp/device_ptr-eabe77/device_ptr-sm_50.bc
 "/data/scratch/breyerml/Programs/dpcpp_2022_02_04/build/bin/llvm-foreach" --out-ext=s --in-file-list=/tmp/device_ptr-08641c/device_ptr-sm_50.bc --in-replace=/tmp/device_ptr-08641c/device_ptr-sm_50.bc --out-file-list=/tmp/device_ptr-38f754/device_ptr-sm_50.s --out-replace=/tmp/device_ptr-38f754/device_ptr-sm_50.s -- /data/scratch/breyerml/Programs/dpcpp_2022_02_04/build/bin/clang-14 -cc1 -triple nvptx64-nvidia-cuda -aux-triple x86_64-unknown-linux-gnu -fsycl-is-device -fdeclare-spirv-builtins -fenable-sycl-dae -Wno-sycl-strict -sycl-std=2020 -S -disable-free -clear-ast-before-backend -main-file-name device_ptr.cpp.o -mrelocation-model pic -pic-level 2 -fhalf-no-semantic-interposition -mframe-pointer=all -ffp-contract=on -fno-rounding-math -fno-verbose-asm -no-integrated-as -aux-target-cpu x86-64 -internal-isystem /data/scratch/breyerml/Programs/dpcpp_2022_02_04/build/bin/../include/sycl -internal-isystem /data/scratch/breyerml/Programs/dpcpp_2022_02_04/build/bin/../include -mlink-builtin-bitcode /data/scratch/breyerml/Programs/dpcpp_2022_02_04/build/lib/clang/14.0.0/../../clc/remangled-l64-signed_char.libspirv-nvptx64--nvidiacl.bc -mlink-builtin-bitcode /import/local.ubuntu/sw/cuda/cuda-11.2.2/nvvm/libdevice/libdevice.10.bc -target-feature +ptx72 -target-sdk-version=11.2 -target-cpu sm_50 -mllvm -treat-scalable-fixed-error-as-warning -mllvm -sycl-enable-local-accessor -debug-info-kind=constructor -dwarf-version=2 -debugger-tuning=gdb -fno-dwarf-directory-asm -v -resource-dir /data/scratch/breyerml/Programs/dpcpp_2022_02_04/build/lib/clang/14.0.0 -fdebug-compilation-dir=/data/scratch/breyerml/PLSSVM/build_dpcpp/src/plssvm/backends/SYCL -ferror-limit 19 -fgnuc-version=4.2.1 -fcolor-diagnostics -D__GCC_HAVE_DWARF2_CFI_ASM=1 -o /tmp/device_ptr-38f754/device_ptr-sm_50.s -x ir /tmp/device_ptr-08641c/device_ptr-sm_50.bc
clang -cc1 version 14.0.0 based upon LLVM 14.0.0git default target x86_64-unknown-linux-gnu
 "/data/scratch/breyerml/Programs/dpcpp_2022_02_04/build/bin/llvm-foreach" --out-ext=o --in-file-list=/tmp/device_ptr-38f754/device_ptr-sm_50.s --in-replace=/tmp/device_ptr-38f754/device_ptr-sm_50.s --out-file-list=/tmp/device_ptr-c54ab7/device_ptr-sm_50.o --out-replace=/tmp/device_ptr-c54ab7/device_ptr-sm_50.o -- /import/local.ubuntu/sw/cuda/cuda-11.2.2/bin/ptxas -m64 -g --dont-merge-basicblocks --return-at-end -v --gpu-name sm_50 --output-file /tmp/device_ptr-c54ab7/device_ptr-sm_50.o /tmp/device_ptr-38f754/device_ptr-sm_50.s
llvm-foreach: Segmentation fault
clang-14: error: ptxas command failed with exit code 254 (use -v to see invocation)
clang version 14.0.0 (https://github.com/intel/llvm faaba28541138d7ad39a7fa85fa85b863560b45f)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /data/scratch/breyerml/Programs/dpcpp_2022_02_04/build/bin
clang-offload-bundler -type=ao -targets=sycl-fpga_aocx-intel-unknown -inputs=../../../../libsvm-base.a -check-section 
clang-offload-bundler -type=ao -targets=sycl-fpga_aocr-intel-unknown -inputs=../../../../libsvm-base.a -check-section 
clang-offload-bundler -type=ao -targets=sycl-fpga_aocr_emu-intel-unknown -inputs=../../../../libsvm-base.a -check-section 
clang-offload-bundler -type=ao -targets=host-x86_64-unknown-linux-gnu -inputs=../../../../libsvm-base.a -check-section 
clang-offload-bundler -type=ao -targets=sycl-fpga_aocx-intel-unknown -inputs=../../../../_deps/fmt-build/libfmtd.a -check-section 
clang-offload-bundler -type=ao -targets=sycl-fpga_aocr-intel-unknown -inputs=../../../../_deps/fmt-build/libfmtd.a -check-section 
clang-offload-bundler -type=ao -targets=sycl-fpga_aocr_emu-intel-unknown -inputs=../../../../_deps/fmt-build/libfmtd.a -check-section 
clang-offload-bundler -type=ao -targets=host-x86_64-unknown-linux-gnu -inputs=../../../../_deps/fmt-build/libfmtd.a -check-section 
clang-14: note: diagnostic msg: Error generating preprocessed source(s).
make[2]: *** [src/plssvm/backends/SYCL/CMakeFiles/svm-SYCL.dir/build.make:167: src/plssvm/backends/SYCL/libsvm-SYCL.so] Error 254
make[1]: *** [CMakeFiles/Makefile2:433: src/plssvm/backends/SYCL/CMakeFiles/svm-SYCL.dir/all] Error 2
make: *** [Makefile:160: all] Error 2
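The step that crashes is the ptxas call wrapped by llvm-foreach, and in this Debug log ptxas receives the debug-info flags `-g --dont-merge-basicblocks --return-at-end` (presumably absent in Release builds, although no Release log is shown here). One way to narrow the problem down is to rerun that exact command by hand, once as logged and once without those three flags. The C++ sketch below pulls the wrapped command out of a saved `-v` log and drops the flags. The helper names are hypothetical, the parsing assumes the log layout shown above (one invocation per line, wrapped command after ` -- `), and rerunning the command also requires the temporary `.s` file, which the driver normally deletes (clang's `-save-temps` keeps intermediates, though the file names may differ).

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns the command wrapped by the last llvm-foreach
// invocation printed before "llvm-foreach: Segmentation fault".
// Assumes one invocation per line, with the wrapped command after " -- ".
std::string failing_foreach_command(const std::string &log) {
    std::istringstream in{ log };
    std::string line;
    std::string last_foreach;
    while (std::getline(in, line)) {
        if (line.find("llvm-foreach: Segmentation fault") != std::string::npos) {
            const auto sep = last_foreach.find(" -- ");
            return sep == std::string::npos ? std::string{} : last_foreach.substr(sep + 4);
        }
        // Invocation lines quote the tool path, e.g. "/.../bin/llvm-foreach" --out-ext=o ...
        if (line.find("llvm-foreach\"") != std::string::npos) {
            last_foreach = line;
        }
    }
    return {};  // no llvm-foreach crash found in the log
}

// Hypothetical helper: splits the command on whitespace (no shell quoting
// support, which is enough for the ptxas line in the log) and drops the
// debug-info flags present in the Debug invocation.
std::vector<std::string> without_debug_flags(const std::string &command) {
    std::istringstream in{ command };
    std::vector<std::string> args;
    for (std::string token; in >> token;) {
        if (token != "-g" && token != "--dont-merge-basicblocks" && token != "--return-at-end") {
            args.push_back(token);
        }
    }
    return args;
}
```

If ptxas succeeds on the same `.s` file without the debug flags, that would suggest the crash is tied to ptxas's handling of debug information, which could be useful additional context for this report.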

Labels: bug (Something isn't working), compiler (Compiler related issue), cuda (CUDA back-end)
