Describe the bug
While building the dpctl project with CUDA offload using intel/llvm (clang-21, assertions on), the NVPTX backend (SM90, PTX 8.7) aborts during isel with:
llvm-foreach: Aborted
clang-21: /vast/users/abagusetty/compilers/llvm/llvm/lib/Support/APInt.cpp:483: llvm::APInt llvm::APInt::extractBits(unsigned int, unsigned int) const: Assertion `bitPosition < BitWidth && (numBits + bitPosition) <= BitWidth && "Illegal bit extraction"' failed.
The crash occurs in NVPTX DAG->DAG Pattern Instruction Selection, within computeKnownBitsForPRMT.
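For readers unfamiliar with the assertion, here is a minimal standalone sketch of the precondition it checks. It is a model only, not LLVM code: `isLegalExtract` and `extractBitsModel` are hypothetical names, and the 4-bit fields and the 16-bit and 8-bit widths are illustrative assumptions. This report does not establish which widths or bit positions `computeKnownBitsForPRMT` actually passes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of LLVM): mirrors the condition quoted in the
// failed assertion, i.e.
//   bitPosition < BitWidth && (numBits + bitPosition) <= BitWidth
constexpr bool isLegalExtract(unsigned bitWidth, unsigned numBits,
                              unsigned bitPosition) {
  return bitPosition < bitWidth && (numBits + bitPosition) <= bitWidth;
}

// Hypothetical model of a checked bit-field extraction on a value that is
// treated as `bitWidth` bits wide (this model only covers widths up to 64).
inline std::uint64_t extractBitsModel(std::uint64_t value, unsigned bitWidth,
                                      unsigned numBits, unsigned bitPosition) {
  assert(bitWidth <= 64 && "model only covers widths up to 64 bits");
  assert(isLegalExtract(bitWidth, numBits, bitPosition) &&
         "Illegal bit extraction");
  const std::uint64_t mask =
      numBits >= 64 ? ~std::uint64_t{0} : ((std::uint64_t{1} << numBits) - 1);
  return (value >> bitPosition) & mask;
}

// Illustrative assumption: reading four 4-bit fields fits a 16-bit value...
static_assert(isLegalExtract(16, 4, 12), "last 4-bit field of 16 bits is legal");
// ...but the same positions on an 8-bit value run past the end.
static_assert(!isLegalExtract(8, 4, 8), "position 8 is outside an 8-bit value");
```

In an assertions-enabled build, any caller that requests a field starting at or beyond the value's bit width, or extending past it, aborts in this way, which matches the `+assertions` build configuration shown in the log.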
Reproduce:
I couldn't narrow it down to a minimal single-file reproducer, but the crash can be reproduced as follows:
git clone https://github.com/IntelPython/dpctl.git dpctl_syclcuda
cd dpctl_syclcuda
python3 scripts/build_locally.py --verbose --no-level-zero --c-compiler `which clang` --cxx-compiler `which clang++` --compiler-root /vast/users/abagusetty/gpu4pyscf/install_llvm_cuda12.9.1_09-16-2025 --cmake-opts="-DDPCTL_TARGET_CUDA=sm_90 -DCMAKE_C_COMPILER=`which clang` -DCMAKE_CXX_COMPILER=`which clang++` -DSYCL_LIBRARY_DIR=/vast/users/abagusetty/gpu4pyscf/install_llvm_cuda12.9.1_09-16-2025/lib64 -DSYCL_INCLUDE_DIR=/vast/users/abagusetty/gpu4pyscf/install_llvm_cuda12.9.1_09-16-2025/include -DIntelSyclCompiler_SYCL_LIBRARY=/vast/users/abagusetty/gpu4pyscf/install_llvm_cuda12.9.1_09-16-2025/lib"
Full stack trace:
llvm-foreach: Aborted
clang-21: /vast/users/abagusetty/compilers/llvm/llvm/lib/Support/APInt.cpp:483: llvm::APInt llvm::APInt::extractBits(unsigned int, unsigned int) const: Assertion `bitPosition < BitWidth && (numBits + bitPosition) <= BitWidth && "Illegal bit extraction"' failed.
PLEASE submit a bug report to https://github.com/intel/llvm/issues and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0. Program arguments: /vast/users/abagusetty/gpu4pyscf/install_llvm_cuda12.9.1_09-16-2025/bin/clang-21 -cc1 -triple nvptx64-nvidia-cuda -O3 -fnative-half-type -aux-triple x86_64-unknown-linux-gnu -fsycl-is-device -fdeclare-spirv-builtins -fenable-sycl-dae -Wno-sycl-strict -D__SYCL_TARGET_NVIDIA_GPU_SM_90__ -sycl-std=2020 -D__SYCL_ANY_DEVICE_HAS_ANY_ASPECT__=1 -D__SYCL_ALL_DEVICES_HAVE_ext_oneapi_bindless_images_sample_2d_usm__=1 -S -dumpdir dpctl/tensor/_tensor_reductions_impl.cpython-310-x86_64-linux-gnu.so- -disable-free -clear-ast-before-backend -main-file-name tensor_reductions.cpp.o -mrelocation-model pic -pic-level 2 -fhalf-no-semantic-interposition -fno-delete-null-pointer-checks -mframe-pointer=all -ffp-contract=on -fno-rounding-math -no-integrated-as -aux-target-cpu x86-64 -internal-isystem /vast/users/abagusetty/gpu4pyscf/install_llvm_cuda12.9.1_09-16-2025/bin/../include/sycl/stl_wrappers -internal-isystem /vast/users/abagusetty/gpu4pyscf/install_llvm_cuda12.9.1_09-16-2025/bin/../include -mllvm -enable-memcpyopt-without-libcalls -mlink-builtin-bitcode /vast/users/abagusetty/gpu4pyscf/install_llvm_cuda12.9.1_09-16-2025/bin/../lib/clc/remangled-l64-signed_char.libspirv-nvptx64-nvidia-cuda.bc -mlink-builtin-bitcode /soft/compilers/cuda/cuda-12.9.1/nvvm/libdevice/libdevice.10.bc -target-sdk-version=12.8 -target-cpu sm_90 -target-feature +ptx87 -debugger-tuning=gdb -fno-dwarf-directory-asm -fdebug-compilation-dir=/vast/users/abagusetty/gpu4pyscf/gpu4pyscf_sycl_cuda/dpctl_syclcuda/_skbuild/linux-x86_64-3.10/cmake-build -resource-dir /vast/users/abagusetty/gpu4pyscf/install_llvm_cuda12.9.1_09-16-2025/lib/clang/21 -Wall -Wextra -Winit-self -Wunused-function -Wuninitialized -Wmissing-declarations -Wstrict-prototypes -Wno-unused-parameter -Wformat -Wformat-security -ferror-limit 19 -fwrapv -fgpu-rdc -fgnuc-version=4.2.1 -fskip-odr-check-in-gmf -vectorize-loops -vectorize-slp -o /tmp/tensor_reductions-sm_90-e9432f-d8b291.s -x ir 
/tmp/tensor_reductions-sm_90-6ac920_3591.bc
1. Code generation
2. Running pass 'Function Pass Manager' on module '/tmp/tensor_reductions-sm_90-6ac920_3591.bc'.
3. Running pass 'NVPTX DAG->DAG Pattern Instruction Selection' on function '@_ZTSN4sycl3_V16detail19__pf_kernel_wrapperIN5dpctl6tensor7kernels17reduction_seq_krnIaaNS0_7maximumIaEENS4_12offset_utils26TwoOffsets_CombinedIndexerINS9_16Strided1DIndexerENS9_11NoOpIndexerEEESC_EEEE'
#0 0x00000000024fce8b llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/vast/users/abagusetty/gpu4pyscf/install_llvm_cuda12.9.1_09-16-2025/bin/clang-21+0x24fce8b)
#1 0x00000000024f9dfb llvm::sys::RunSignalHandlers() (/vast/users/abagusetty/gpu4pyscf/install_llvm_cuda12.9.1_09-16-2025/bin/clang-21+0x24f9dfb)
#2 0x00000000024f9f24 SignalHandler(int, siginfo_t*, void*) Signals.cpp:0:0
#3 0x00007f2615a57900 __restore_rt (/lib64/libc.so.6+0x57900)
#4 0x00007f2615aa8dfc __pthread_kill_implementation (/lib64/libc.so.6+0xa8dfc)
#5 0x00007f2615a57842 gsignal (/lib64/libc.so.6+0x57842)
#6 0x00007f2615a3f5cf abort (/lib64/libc.so.6+0x3f5cf)
#7 0x00007f2615a3f4e7 _nl_load_domain.cold (/lib64/libc.so.6+0x3f4e7)
#8 0x00007f2615a4fb32 (/lib64/libc.so.6+0x4fb32)
#9 0x000000000241598e (/vast/users/abagusetty/gpu4pyscf/install_llvm_cuda12.9.1_09-16-2025/bin/clang-21+0x241598e)
#10 0x0000000000e87a61 computeKnownBitsForPRMT(llvm::SDValue, llvm::KnownBits&, llvm::SelectionDAG const&, unsigned int) (.isra.0) NVPTXISelLowering.cpp:0:0
#11 0x0000000003c64d0b llvm::SelectionDAG::computeKnownBits(llvm::SDValue, llvm::APInt const&, unsigned int) const (/vast/users/abagusetty/gpu4pyscf/install_llvm_cuda12.9.1_09-16-2025/bin/clang-21+0x3c64d0b)
#12 0x0000000003c665d9 llvm::SelectionDAG::computeKnownBits(llvm::SDValue, llvm::APInt const&, unsigned int) const (/vast/users/abagusetty/gpu4pyscf/install_llvm_cuda12.9.1_09-16-2025/bin/clang-21+0x3c665d9)
#13 0x0000000003c671b1 llvm::SelectionDAG::computeKnownBits(llvm::SDValue, unsigned int) const (/vast/users/abagusetty/gpu4pyscf/install_llvm_cuda12.9.1_09-16-2025/bin/clang-21+0x3c671b1)
#14 0x0000000000e878a1 computeKnownBitsForPRMT(llvm::SDValue, llvm::KnownBits&, llvm::SelectionDAG const&, unsigned int) (.isra.0) NVPTXISelLowering.cpp:0:0
#15 0x0000000003cd8366 llvm::TargetLowering::SimplifyDemandedBitsForTargetNode(llvm::SDValue, llvm::APInt const&, llvm::APInt const&, llvm::KnownBits&, llvm::TargetLowering::TargetLoweringOpt&, unsigned int) const (/vast/users/abagusetty/gpu4pyscf/install_llvm_cuda12.9.1_09-16-2025/bin/clang-21+0x3cd8366)
#16 0x0000000003d2acb7 llvm::TargetLowering::SimplifyDemandedBits(llvm::SDValue, llvm::APInt const&, llvm::APInt const&, llvm::KnownBits&, llvm::TargetLowering::TargetLoweringOpt&, unsigned int, bool) const (/vast/users/abagusetty/gpu4pyscf/install_llvm_cuda12.9.1_09-16-2025/bin/clang-21+0x3d2acb7)
#17 0x0000000003d28ed5 llvm::TargetLowering::SimplifyDemandedBits(llvm::SDValue, llvm::APInt const&, llvm::APInt const&, llvm::KnownBits&, llvm::TargetLowering::TargetLoweringOpt&, unsigned int, bool) const (/vast/users/abagusetty/gpu4pyscf/install_llvm_cuda12.9.1_09-16-2025/bin/clang-21+0x3d28ed5)
#18 0x0000000003b2af2e (anonymous namespace)::DAGCombiner::SimplifyDemandedBits(llvm::SDValue) (.constprop.0) DAGCombiner.cpp:0:0
#19 0x0000000003b43179 (anonymous namespace)::DAGCombiner::visitTRUNCATE(llvm::SDNode*) DAGCombiner.cpp:0:0
#20 0x0000000003b68a7b (anonymous namespace)::DAGCombiner::combine(llvm::SDNode*) DAGCombiner.cpp:0:0
#21 0x0000000003b6a4af (anonymous namespace)::DAGCombiner::Run(llvm::CombineLevel) DAGCombiner.cpp:0:0
#22 0x0000000003b6d08d llvm::SelectionDAG::Combine(llvm::CombineLevel, llvm::BatchAAResults*, llvm::CodeGenOptLevel) (/vast/users/abagusetty/gpu4pyscf/install_llvm_cuda12.9.1_09-16-2025/bin/clang-21+0x3b6d08d)
#23 0x0000000003cbb9d6 llvm::SelectionDAGISel::CodeGenAndEmitDAG() (/vast/users/abagusetty/gpu4pyscf/install_llvm_cuda12.9.1_09-16-2025/bin/clang-21+0x3cbb9d6)
#24 0x0000000003cc059b llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) (/vast/users/abagusetty/gpu4pyscf/install_llvm_cuda12.9.1_09-16-2025/bin/clang-21+0x3cc059b)
#25 0x0000000003cc1d42 llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) (/vast/users/abagusetty/gpu4pyscf/install_llvm_cuda12.9.1_09-16-2025/bin/clang-21+0x3cc1d42)
#26 0x0000000003cac383 llvm::SelectionDAGISelLegacy::runOnMachineFunction(llvm::MachineFunction&) (/vast/users/abagusetty/gpu4pyscf/install_llvm_cuda12.9.1_09-16-2025/bin/clang-21+0x3cac383)
#27 0x0000000001872abd llvm::MachineFunctionPass::runOnFunction(llvm::Function&) (.part.0) MachineFunctionPass.cpp:0:0
#28 0x0000000001f1575a llvm::FPPassManager::runOnFunction(llvm::Function&) (/vast/users/abagusetty/gpu4pyscf/install_llvm_cuda12.9.1_09-16-2025/bin/clang-21+0x1f1575a)
#29 0x0000000001f15b81 llvm::FPPassManager::runOnModule(llvm::Module&) (/vast/users/abagusetty/gpu4pyscf/install_llvm_cuda12.9.1_09-16-2025/bin/clang-21+0x1f15b81)
#30 0x0000000001f16460 llvm::legacy::PassManagerImpl::run(llvm::Module&) (/vast/users/abagusetty/gpu4pyscf/install_llvm_cuda12.9.1_09-16-2025/bin/clang-21+0x1f16460)
#31 0x00000000027a29ef (anonymous namespace)::EmitAssemblyHelper::RunCodegenPipeline(clang::BackendAction, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream>>&, std::unique_ptr<llvm::ToolOutputFile, std::default_delete<llvm::ToolOutputFile>>&) BackendUtil.cpp:0:0
#32 0x00000000027a3704 clang::emitBackendOutput(clang::CompilerInstance&, clang::CodeGenOptions&, llvm::StringRef, llvm::Module*, clang::BackendAction, llvm::IntrusiveRefCntPtr<llvm::vfs::FileSystem>, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream>>, clang::BackendConsumer*) (/vast/users/abagusetty/gpu4pyscf/install_llvm_cuda12.9.1_09-16-2025/bin/clang-21+0x27a3704)
#33 0x0000000002ebd9c2 clang::CodeGenAction::ExecuteAction() (/vast/users/abagusetty/gpu4pyscf/install_llvm_cuda12.9.1_09-16-2025/bin/clang-21+0x2ebd9c2)
#34 0x000000000323d2e3 clang::FrontendAction::Execute() (/vast/users/abagusetty/gpu4pyscf/install_llvm_cuda12.9.1_09-16-2025/bin/clang-21+0x323d2e3)
#35 0x00000000031ba15e clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/vast/users/abagusetty/gpu4pyscf/install_llvm_cuda12.9.1_09-16-2025/bin/clang-21+0x31ba15e)
#36 0x000000000332d0fb clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/vast/users/abagusetty/gpu4pyscf/install_llvm_cuda12.9.1_09-16-2025/bin/clang-21+0x332d0fb)
#37 0x0000000000d2f022 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/vast/users/abagusetty/gpu4pyscf/install_llvm_cuda12.9.1_09-16-2025/bin/clang-21+0xd2f022)
#38 0x0000000000d24ab7 ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&, llvm::ToolContext const&) driver.cpp:0:0
#39 0x0000000000d2947c clang_main(int, char**, llvm::ToolContext const&) (/vast/users/abagusetty/gpu4pyscf/install_llvm_cuda12.9.1_09-16-2025/bin/clang-21+0xd2947c)
#40 0x0000000000c14863 main (/vast/users/abagusetty/gpu4pyscf/install_llvm_cuda12.9.1_09-16-2025/bin/clang-21+0xc14863)
#41 0x00007f2615a40e6c __libc_start_call_main (/lib64/libc.so.6+0x40e6c)
#42 0x00007f2615a40f35 __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x40f35)
#43 0x0000000000d24041 _start /home/abuild/rpmbuild/BUILD/glibc-2.38/csu/../sysdeps/x86_64/start.S:117:0
llvm-foreach: Aborted
clang++: error: clang frontend command failed with exit code 254 (use -v to see invocation)
clang version 21.0.0git (https://github.com/intel/llvm.git 68f3fdf41ca373e413c74da2949d807d3d7d777f)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /vast/users/abagusetty/gpu4pyscf/install_llvm_cuda12.9.1_09-16-2025/bin
Build config: +assertions
clang++: note: diagnostic msg: Error generating preprocessed source(s).
To reproduce
No response
Environment
No response
Additional context
No response