[Flang][OpenMP][Offload] HLFIR AssignOp does not lower to a friendly form for AMDGPU which is used for target offloading for OpenMP #74603

Closed
agozillon opened this issue Dec 6, 2023 · 12 comments
@agozillon
Contributor

This issue was found during the implementation of the following PR (and is dependent on it): #71766

The following example, which maps an allocatable variable and assigns a value to it on the device, compiles and works with the deprecated FIR flow but fails with the new HLFIR flow:

program main
    integer, allocatable :: test
    allocate(test)
    test = 10

!$omp target map(tofrom:test)
    test = 50
!$omp end target

    print *, test

    deallocate(test)
end program

This yields the following ICE:

LLVM ERROR: Cannot select: t20: i64,ch = dynamic_stackalloc t16:1, Constant:i64<16>, Constant:i64<0>
  t19: i64 = Constant<16>
  t6: i64 = Constant<0>
In function: __omp_offloading_fd00_4b200ae__QQmain_l5
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.	Program arguments: /work/agozillo/git/flang-dev/llvm-main-project/build/bin/clang-linker-wrapper --host-triple=x86_64-unknown-linux-gnu --linker-path=/work/agozillo/git/flang-dev/llvm-main-project/build/bin/ld.lld -- -z relro --hash-style=gnu --eh-frame-hdr -m elf_x86_64 -pie -dynamic-linker /lib64/ld-linux-x86-64.so.2 -o single-value-alloca.out /lib/x86_64-linux-gnu/Scrt1.o /lib/x86_64-linux-gnu/crti.o /usr/lib/gcc/x86_64-linux-gnu/9/crtbeginS.o -L/usr/lib/gcc/x86_64-linux-gnu/9 -L/usr/lib/gcc/x86_64-linux-gnu/9/../../../../lib64 -L/lib/x86_64-linux-gnu -L/lib/../lib64 -L/usr/lib/x86_64-linux-gnu -L/usr/lib/../lib64 -L/lib -L/usr/lib -L/home/agozillo/git/flang-dev/llvm-main-project/build/lib -L/home/agozillo/git/flang-dev/llvm-main-project/build/projects/openmp/libomptarget -L/home/agozillo/git/flang-dev/llvm-main-project/build/projects/openmp/ -L/home/agozillo/git/flang-dev/llvm-main-project/build/projects/openmp/libomptarget/DeviceRTL -L/etc/alternatives/rocm/lib /tmp/single-value-alloca-825105.o -L/work/agozillo/git/flang-dev/llvm-main-project/build/lib --whole-archive -lFortran_main --no-whole-archive -lFortranRuntime -lFortranDecimal -lm -lomp -lomptarget -lomptarget.devicertl -L/work/agozillo/git/flang-dev/llvm-main-project/build/lib -lgcc --as-needed -lgcc_s --no-as-needed -lpthread -lc -lgcc --as-needed -lgcc_s --no-as-needed /usr/lib/gcc/x86_64-linux-gnu/9/crtendS.o /lib/x86_64-linux-gnu/crtn.o
1.	Running pass 'CallGraph Pass Manager' on module 'ld-temp.o'.
2.	Running pass 'AMDGPU DAG->DAG Pattern Instruction Selection' on function '@__omp_offloading_fd00_4b200ae__QQmain_l5'
 #0 0x0000560af190f69f llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/work/agozillo/git/flang-dev/llvm-main-project/build/bin/clang-linker-wrapper+0x147969f)
 #1 0x0000560af190ce84 SignalHandler(int) Signals.cpp:0:0
 #2 0x00007fd3d84a9420 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x14420)
 #3 0x00007fd3d7f4600b raise /build/glibc-BHL3KM/glibc-2.31/signal/../sysdeps/unix/sysv/linux/raise.c:51:1
 #4 0x00007fd3d7f25859 abort /build/glibc-BHL3KM/glibc-2.31/stdlib/abort.c:81:7
 #5 0x0000560af071eb98 llvm::ConvertUTF8toUTF32(unsigned char const**, unsigned char const*, unsigned int**, unsigned int*, llvm::ConversionFlags) (.cold) ConvertUTF.cpp:0:0
 #6 0x0000560af22b02bd llvm::SelectionDAGISel::CannotYetSelect(llvm::SDNode*) (/work/agozillo/git/flang-dev/llvm-main-project/build/bin/clang-linker-wrapper+0x1e1a2bd)
 #7 0x0000560af22b2a19 llvm::SelectionDAGISel::SelectCodeCommon(llvm::SDNode*, unsigned char const*, unsigned int) (/work/agozillo/git/flang-dev/llvm-main-project/build/bin/clang-linker-wrapper+0x1e1ca19)
 #8 0x0000560af0e98117 AMDGPUDAGToDAGISel::Select(llvm::SDNode*) (/work/agozillo/git/flang-dev/llvm-main-project/build/bin/clang-linker-wrapper+0xa02117)
 #9 0x0000560af22ad240 llvm::SelectionDAGISel::DoInstructionSelection() (/work/agozillo/git/flang-dev/llvm-main-project/build/bin/clang-linker-wrapper+0x1e17240)
#10 0x0000560af22ba62e llvm::SelectionDAGISel::CodeGenAndEmitDAG() (/work/agozillo/git/flang-dev/llvm-main-project/build/bin/clang-linker-wrapper+0x1e2462e)
#11 0x0000560af22bd758 llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) (/work/agozillo/git/flang-dev/llvm-main-project/build/bin/clang-linker-wrapper+0x1e27758)
#12 0x0000560af22bf446 llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) (.part.0) SelectionDAGISel.cpp:0:0
#13 0x0000560af0ea1349 AMDGPUDAGToDAGISel::runOnMachineFunction(llvm::MachineFunction&) (/work/agozillo/git/flang-dev/llvm-main-project/build/bin/clang-linker-wrapper+0xa0b349)
#14 0x0000560af1a1e351 llvm::MachineFunctionPass::runOnFunction(llvm::Function&) (.part.0) MachineFunctionPass.cpp:0:0
#15 0x0000560af1285b71 llvm::FPPassManager::runOnFunction(llvm::Function&) (/work/agozillo/git/flang-dev/llvm-main-project/build/bin/clang-linker-wrapper+0xdefb71)
#16 0x0000560af15206e7 (anonymous namespace)::CGPassManager::runOnModule(llvm::Module&) CallGraphSCCPass.cpp:0:0
#17 0x0000560af1286652 llvm::legacy::PassManagerImpl::run(llvm::Module&) (/work/agozillo/git/flang-dev/llvm-main-project/build/bin/clang-linker-wrapper+0xdf0652)
#18 0x0000560af1ee0205 codegen(llvm::lto::Config const&, llvm::TargetMachine*, std::function<llvm::Expected<std::unique_ptr<llvm::CachedFileStream, std::default_delete<llvm::CachedFileStream>>> (unsigned int, llvm::Twine const&)>, unsigned int, llvm::Module&, llvm::ModuleSummaryIndex const&) LTOBackend.cpp:0:0
#19 0x0000560af1ee080d llvm::lto::backend(llvm::lto::Config const&, std::function<llvm::Expected<std::unique_ptr<llvm::CachedFileStream, std::default_delete<llvm::CachedFileStream>>> (unsigned int, llvm::Twine const&)>, unsigned int, llvm::Module&, llvm::ModuleSummaryIndex&) (/work/agozillo/git/flang-dev/llvm-main-project/build/bin/clang-linker-wrapper+0x1a4a80d)
#20 0x0000560af1ed6c65 llvm::lto::LTO::runRegularLTO(std::function<llvm::Expected<std::unique_ptr<llvm::CachedFileStream, std::default_delete<llvm::CachedFileStream>>> (unsigned int, llvm::Twine const&)>) (/work/agozillo/git/flang-dev/llvm-main-project/build/bin/clang-linker-wrapper+0x1a40c65)
#21 0x0000560af1ed72b8 llvm::lto::LTO::run(std::function<llvm::Expected<std::unique_ptr<llvm::CachedFileStream, std::default_delete<llvm::CachedFileStream>>> (unsigned int, llvm::Twine const&)>, std::function<llvm::Expected<std::function<llvm::Expected<std::unique_ptr<llvm::CachedFileStream, std::default_delete<llvm::CachedFileStream>>> (unsigned int, llvm::Twine const&)>> (unsigned int, llvm::StringRef, llvm::Twine const&)>) (/work/agozillo/git/flang-dev/llvm-main-project/build/bin/clang-linker-wrapper+0x1a412b8)
#22 0x0000560af07d1a5d (anonymous namespace)::linkBitcodeFiles(llvm::SmallVectorImpl<llvm::object::OffloadFile>&, llvm::SmallVectorImpl<llvm::StringRef>&, llvm::opt::ArgList const&) (.constprop.0) ClangLinkerWrapper.cpp:0:0
#23 0x0000560af07d881a llvm::Error (anonymous namespace)::linkAndWrapDeviceFiles(llvm::SmallVectorImpl<llvm::object::OffloadFile>&, llvm::opt::InputArgList const&, char**, int)::'lambda'(auto&)::operator()<llvm::SmallVector<llvm::object::OffloadFile, 3u>>(auto&) const ClangLinkerWrapper.cpp:0:0
#24 0x0000560af07dee05 (anonymous namespace)::linkAndWrapDeviceFiles(llvm::SmallVectorImpl<llvm::object::OffloadFile>&, llvm::opt::InputArgList const&, char**, int) ClangLinkerWrapper.cpp:0:0
#25 0x0000560af07248e0 main (/work/agozillo/git/flang-dev/llvm-main-project/build/bin/clang-linker-wrapper+0x28e8e0)
#26 0x00007fd3d7f27083 __libc_start_main /build/glibc-BHL3KM/glibc-2.31/csu/../csu/libc-start.c:342:3
#27 0x0000560af07c119e _start (/work/agozillo/git/flang-dev/llvm-main-project/build/bin/clang-linker-wrapper+0x32b19e)

The command used to compile this and hit the error is below. It should only require a Clang built with AMDGPU support, plus the dependent PR if that has not been committed yet:

flang-new --offload-arch=gfx90a -fopenmp test.f90 -o test.out

From what I can gather from digging a little into the issue, this comes from AMDGPU not supporting DYNAMIC_STACKALLOC nodes (the dynamic_stackalloc in the error above). I think AMDGPU only performs static allocation, but someone with more understanding of that part of the compiler will know far better than I do.
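
For readers less familiar with the static/dynamic distinction, here is a minimal sketch of the rule as I understand it: LLVM treats an alloca as static only when its size is a compile-time constant and it sits in the function's entry block, and anything else reaches instruction selection as a dynamic stack allocation. The type and function names below (AllocaModel, classify) are hypothetical and only model that rule; this is not LLVM's actual API, and the real check has further conditions. Note that the failing node has a constant size operand (Constant:i64<16>), which would be consistent with a fixed-size alloca that ended up outside the entry block, but that is an inference and not something confirmed here.

```cpp
#include <cstdint>

// Hypothetical, simplified model of an LLVM alloca instruction.
// This is NOT llvm::AllocaInst; it only captures the two properties
// that matter for the static-versus-dynamic classification.
struct AllocaModel {
  bool inEntryBlock;         // does the alloca live in the function's entry block?
  bool hasConstantSize;      // is the allocation size a compile-time constant?
  std::uint64_t sizeInBytes; // only meaningful when hasConstantSize is true
};

enum class StackAllocKind { Static, Dynamic };

// Assumed rule: static only if the size is constant AND the alloca is in
// the entry block. Everything else is dynamic and needs a target that can
// select a dynamic stack allocation.
constexpr StackAllocKind classify(const AllocaModel &a) {
  return (a.inEntryBlock && a.hasConstantSize) ? StackAllocKind::Static
                                               : StackAllocKind::Dynamic;
}
```

Under this model, classify(AllocaModel{false, true, 16}) is Dynamic even though the size is known, so a target that only handles static allocations would still reject it.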

However, the code that AMDGPU cannot handle appears to stem from the HLFIR AssignOp, which lowers to a Fortran runtime call. That call likely brings in the code that requires a dynamic stack allocation (I've unfortunately not found the exact problematic line, but there are a number of areas that might be responsible).

The only solution I can currently think of is to opt out of HLFIR AssignOp generation for AMD GPU devices, for OpenMP offload, or both, and use the old FIR flow, which does not depend on the runtime call. I am not sure how palatable that is for everyone, though, as I imagine the intent was to discard the old FIR flow in the near future. I am more than open to other suggestions, however! This is just the option I have in mind right now.

It also raises the possibility that similar errors occur in other cases where HLFIR operations lower to Fortran runtime calls, but that may be overstating things, as this is the only case I've encountered so far.

@agozillon agozillon self-assigned this Dec 6, 2023
@llvmbot
Collaborator

llvmbot commented Dec 6, 2023

@llvm/issue-subscribers-flang-ir

Author: None (agozillon)

@llvmbot
Collaborator

llvmbot commented Dec 6, 2023

@llvm/issue-subscribers-flang-runtime

Author: None (agozillon)

This issue was found during the implementation of the following PR (and is dependent on it): https://github.com//pull/71766

The following example which attempts to map and assign a value to an allocatable variable on device compiles and works for the deprecated FIR flow, but will fail using the new HLFIR flow:

program main
    integer, allocatable :: test
    allocate(test)
    test = 10

!$omp target map(tofrom:test)
    test = 50
!$omp end target

    print *, test

    deallocate(test)
end program

Yielding the following ICE error:

LLVM ERROR: Cannot select: t20: i64,ch = dynamic_stackalloc t16:1, Constant:i64&lt;16&gt;, Constant:i64&lt;0&gt;
  t19: i64 = Constant&lt;16&gt;
  t6: i64 = Constant&lt;0&gt;
In function: __omp_offloading_fd00_4b200ae__QQmain_l5
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.	Program arguments: /work/agozillo/git/flang-dev/llvm-main-project/build/bin/clang-linker-wrapper --host-triple=x86_64-unknown-linux-gnu --linker-path=/work/agozillo/git/flang-dev/llvm-main-project/build/bin/ld.lld -- -z relro --hash-style=gnu --eh-frame-hdr -m elf_x86_64 -pie -dynamic-linker /lib64/ld-linux-x86-64.so.2 -o single-value-alloca.out /lib/x86_64-linux-gnu/Scrt1.o /lib/x86_64-linux-gnu/crti.o /usr/lib/gcc/x86_64-linux-gnu/9/crtbeginS.o -L/usr/lib/gcc/x86_64-linux-gnu/9 -L/usr/lib/gcc/x86_64-linux-gnu/9/../../../../lib64 -L/lib/x86_64-linux-gnu -L/lib/../lib64 -L/usr/lib/x86_64-linux-gnu -L/usr/lib/../lib64 -L/lib -L/usr/lib -L/home/agozillo/git/flang-dev/llvm-main-project/build/lib -L/home/agozillo/git/flang-dev/llvm-main-project/build/projects/openmp/libomptarget -L/home/agozillo/git/flang-dev/llvm-main-project/build/projects/openmp/ -L/home/agozillo/git/flang-dev/llvm-main-project/build/projects/openmp/libomptarget/DeviceRTL -L/etc/alternatives/rocm/lib /tmp/single-value-alloca-825105.o -L/work/agozillo/git/flang-dev/llvm-main-project/build/lib --whole-archive -lFortran_main --no-whole-archive -lFortranRuntime -lFortranDecimal -lm -lomp -lomptarget -lomptarget.devicertl -L/work/agozillo/git/flang-dev/llvm-main-project/build/lib -lgcc --as-needed -lgcc_s --no-as-needed -lpthread -lc -lgcc --as-needed -lgcc_s --no-as-needed /usr/lib/gcc/x86_64-linux-gnu/9/crtendS.o /lib/x86_64-linux-gnu/crtn.o
1.	Running pass 'CallGraph Pass Manager' on module 'ld-temp.o'.
2.	Running pass 'AMDGPU DAG->DAG Pattern Instruction Selection' on function '@__omp_offloading_fd00_4b200ae__QQmain_l5'
 #0 0x0000560af190f69f llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/work/agozillo/git/flang-dev/llvm-main-project/build/bin/clang-linker-wrapper+0x147969f)
 #1 0x0000560af190ce84 SignalHandler(int) Signals.cpp:0:0
 #2 0x00007fd3d84a9420 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x14420)
 #3 0x00007fd3d7f4600b raise /build/glibc-BHL3KM/glibc-2.31/signal/../sysdeps/unix/sysv/linux/raise.c:51:1
 #4 0x00007fd3d7f25859 abort /build/glibc-BHL3KM/glibc-2.31/stdlib/abort.c:81:7
 #5 0x0000560af071eb98 llvm::ConvertUTF8toUTF32(unsigned char const**, unsigned char const*, unsigned int**, unsigned int*, llvm::ConversionFlags) (.cold) ConvertUTF.cpp:0:0
 #6 0x0000560af22b02bd llvm::SelectionDAGISel::CannotYetSelect(llvm::SDNode*) (/work/agozillo/git/flang-dev/llvm-main-project/build/bin/clang-linker-wrapper+0x1e1a2bd)
 #7 0x0000560af22b2a19 llvm::SelectionDAGISel::SelectCodeCommon(llvm::SDNode*, unsigned char const*, unsigned int) (/work/agozillo/git/flang-dev/llvm-main-project/build/bin/clang-linker-wrapper+0x1e1ca19)
 #8 0x0000560af0e98117 AMDGPUDAGToDAGISel::Select(llvm::SDNode*) (/work/agozillo/git/flang-dev/llvm-main-project/build/bin/clang-linker-wrapper+0xa02117)
 #9 0x0000560af22ad240 llvm::SelectionDAGISel::DoInstructionSelection() (/work/agozillo/git/flang-dev/llvm-main-project/build/bin/clang-linker-wrapper+0x1e17240)
#10 0x0000560af22ba62e llvm::SelectionDAGISel::CodeGenAndEmitDAG() (/work/agozillo/git/flang-dev/llvm-main-project/build/bin/clang-linker-wrapper+0x1e2462e)
#11 0x0000560af22bd758 llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) (/work/agozillo/git/flang-dev/llvm-main-project/build/bin/clang-linker-wrapper+0x1e27758)
#12 0x0000560af22bf446 llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) (.part.0) SelectionDAGISel.cpp:0:0
#13 0x0000560af0ea1349 AMDGPUDAGToDAGISel::runOnMachineFunction(llvm::MachineFunction&) (/work/agozillo/git/flang-dev/llvm-main-project/build/bin/clang-linker-wrapper+0xa0b349)
#14 0x0000560af1a1e351 llvm::MachineFunctionPass::runOnFunction(llvm::Function&) (.part.0) MachineFunctionPass.cpp:0:0
#15 0x0000560af1285b71 llvm::FPPassManager::runOnFunction(llvm::Function&) (/work/agozillo/git/flang-dev/llvm-main-project/build/bin/clang-linker-wrapper+0xdefb71)
#16 0x0000560af15206e7 (anonymous namespace)::CGPassManager::runOnModule(llvm::Module&) CallGraphSCCPass.cpp:0:0
#17 0x0000560af1286652 llvm::legacy::PassManagerImpl::run(llvm::Module&) (/work/agozillo/git/flang-dev/llvm-main-project/build/bin/clang-linker-wrapper+0xdf0652)
#18 0x0000560af1ee0205 codegen(llvm::lto::Config const&, llvm::TargetMachine*, std::function<llvm::Expected<std::unique_ptr<llvm::CachedFileStream, std::default_delete<llvm::CachedFileStream>>> (unsigned int, llvm::Twine const&)>, unsigned int, llvm::Module&, llvm::ModuleSummaryIndex const&) LTOBackend.cpp:0:0
#19 0x0000560af1ee080d llvm::lto::backend(llvm::lto::Config const&, std::function<llvm::Expected<std::unique_ptr<llvm::CachedFileStream, std::default_delete<llvm::CachedFileStream>>> (unsigned int, llvm::Twine const&)>, unsigned int, llvm::Module&, llvm::ModuleSummaryIndex&) (/work/agozillo/git/flang-dev/llvm-main-project/build/bin/clang-linker-wrapper+0x1a4a80d)
#20 0x0000560af1ed6c65 llvm::lto::LTO::runRegularLTO(std::function<llvm::Expected<std::unique_ptr<llvm::CachedFileStream, std::default_delete<llvm::CachedFileStream>>> (unsigned int, llvm::Twine const&)>) (/work/agozillo/git/flang-dev/llvm-main-project/build/bin/clang-linker-wrapper+0x1a40c65)
#21 0x0000560af1ed72b8 llvm::lto::LTO::run(std::function<llvm::Expected<std::unique_ptr<llvm::CachedFileStream, std::default_delete<llvm::CachedFileStream>>> (unsigned int, llvm::Twine const&)>, std::function<llvm::Expected<std::function<llvm::Expected<std::unique_ptr<llvm::CachedFileStream, std::default_delete<llvm::CachedFileStream>>> (unsigned int, llvm::Twine const&)>> (unsigned int, llvm::StringRef, llvm::Twine const&)>) (/work/agozillo/git/flang-dev/llvm-main-project/build/bin/clang-linker-wrapper+0x1a412b8)
#22 0x0000560af07d1a5d (anonymous namespace)::linkBitcodeFiles(llvm::SmallVectorImpl<llvm::object::OffloadFile>&, llvm::SmallVectorImpl<llvm::StringRef>&, llvm::opt::ArgList const&) (.constprop.0) ClangLinkerWrapper.cpp:0:0
#23 0x0000560af07d881a llvm::Error (anonymous namespace)::linkAndWrapDeviceFiles(llvm::SmallVectorImpl<llvm::object::OffloadFile>&, llvm::opt::InputArgList const&, char**, int)::'lambda'(auto&)::operator()<llvm::SmallVector<llvm::object::OffloadFile, 3u>>(auto&) const ClangLinkerWrapper.cpp:0:0
#24 0x0000560af07dee05 (anonymous namespace)::linkAndWrapDeviceFiles(llvm::SmallVectorImpl<llvm::object::OffloadFile>&, llvm::opt::InputArgList const&, char**, int) ClangLinkerWrapper.cpp:0:0
#25 0x0000560af07248e0 main (/work/agozillo/git/flang-dev/llvm-main-project/build/bin/clang-linker-wrapper+0x28e8e0)
#26 0x00007fd3d7f27083 __libc_start_main /build/glibc-BHL3KM/glibc-2.31/csu/../csu/libc-start.c:342:3
#27 0x0000560af07c119e _start (/work/agozillo/git/flang-dev/llvm-main-project/build/bin/clang-linker-wrapper+0x32b19e)

The command used to compile this and hit the error (should just require having a Clang that's compiled to support AMDGPU and the dependent PR if it's not committed already): flang-new --offload-arch=gfx90a -fopenmp test.f90 -o test.out

From what I can gather, from digging a little into the issue, this comes from AMDGPU not supporting DYNAMIC_STACKALLOC nodes (the dynamic_stackalloc in the error above). I think AMDGPU only performs static allocation, but someone with more understanding of that segment of the compiler will know far better than myself.
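To make the static/dynamic distinction concrete: LLVM treats an alloca as static only when it sits in the function's entry block and has a constant size, and anything else goes through the dynamic stack allocation path during instruction selection. The sketch below is a toy C++ model of that rule, not LLVM code; `ToyAlloca` and its fields are hypothetical names made up for illustration. One plausible reading of the error above, consistent with the later comments in this thread, is that the 16-byte allocation has a constant size but was emitted outside the entry block.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Toy model of a stack allocation. This is NOT LLVM's AllocaInst; the
// struct and field names are hypothetical and exist only for illustration.
struct ToyAlloca {
  bool inEntryBlock;                       // emitted in the function's first block?
  std::optional<std::uint64_t> constSize;  // set only when the size is a compile-time constant
};

// Rule of thumb being modelled: entry block + constant size => static
// frame slot. Everything else needs a dynamic stack allocation.
bool isStaticAlloca(const ToyAlloca &a) {
  return a.inEntryBlock && a.constSize.has_value();
}
```

Under this model, `ToyAlloca{false, 16}` (constant size, but in the middle of the kernel) is classified as dynamic, while `ToyAlloca{true, 16}` is static.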

However, the generation of this code that's unfriendly for AMDGPU appears to stem from the HLFIR AssignOp, which lowers to a Fortran runtime call, which likely brings in the code that requires a dynamic stack allocation (I've unfortunately not found the exact problematic line, but there are a number of areas that might pose the problem).

The solution that I can currently think of is to opt out of the HLFIR AssignOp generation for AMDGPU devices or for OpenMP offload (or both) and utilise the old FIR flow, which does not depend on the runtime call. I am not sure how palatable that is for everyone though, as I imagine the intent was to discard this old FIR flow in the near future. I am more than open to other suggestions, however! This is just the option I have in mind for now.

It also raises the possibility that errors like this will be encountered in other cases where HLFIR operations lower to Fortran runtime calls, but that may be hyperbole, as this is the only case I've encountered so far.

@clementval
Contributor

I don't think using the old lowering is a good solution in the long term. Did you compile the runtime for offload?

@agozillon
Contributor Author

Perhaps I have not in this case. Are there specific build commands required to do this? I'm admittedly a little unfamiliar with how the Fortran runtime works in this case, so I'd appreciate any pointers you can give.

@clementval
Contributor

I have not done it myself but there is this post: https://discourse.llvm.org/t/rfc-building-flang-runtime-for-offload-devices/70787

@agozillon
Contributor Author

agozillon commented Dec 6, 2023

Thank you @clementval, I'll have a look into that thread. I do recollect a little about it, though I can't remember where we landed on it, but it does seem Slava made some good headway with an initial patch for CUDA/NVPTX.

@agozillon
Contributor Author

agozillon commented Dec 7, 2023

I've had time to dig into this a little more. It seems that at least the above error has nothing to do with the runtime function's definition itself, as it's not linked into the module at the time the error occurs; we only have the declaration.

It seems more likely that the allocations (box and scalar) generated for the reallocation case are causing the issue. I think (I'm not completely sure and need to find it or talk to someone) there's an optimisation pass somewhere in the backend that automatically tidies them up where possible, but doesn't handle them in this case (it at least removes the allocas we create for kernel arguments on lowering and replaces them with other instructions, but it requires a load access in that particular case). In any case, the generalised LLVM dialect lowering to LLVM IR currently injects the allocas in the middle of the kernel. If I raise them to the kernel entry point, which I think makes them statically allocatable, it bypasses the previous error. However, it then hits another wall in the instruction selector, where it tries to directly use a FrameIndex (the allocation), which also seems to be a no-go for at least AMDGPU.

So I am still not entirely sure of a fix, but I did notice another possible workaround: deactivate the realloc portion of the hlfir::AssignOp operation for AMDGPU/OpenMP target offload, perhaps by adding another logic check at https://github.com/llvm/llvm-project/blob/main/flang/lib/Lower/Bridge.cpp#L3491 to toggle isWholeAllocatableAssignment to false (or something along those lines). This is similar to my original suggestion but less drastic, as I think it doesn't keep the old FIR flow around.

I suppose one other, perhaps more drastic, option would be to create a pass that raises the allocas out of the kernel and turns them into mapped arguments, but that is possibly overkill for something like AssignOp (e.g. assigning a single value to an allocatable).

@clementval
Contributor

It seems very error-prone to have specific lowering for different targets, especially in assignment, where Fortran has lots of rules for allocatables and so on.

It feels like the adaptation should be done in a target-specific pass, like the target-rewrite pass, to make the code work around the error you are seeing.

@agozillon
Contributor Author

Thank you @clementval I wasn't aware of this pass, I'll have a look into it!

@agozillon
Contributor Author

agozillon commented Dec 14, 2023

I believe I have found a series of small fixes that will get this test working. I'll open a PR or two for it when I am back from vacation, unless someone else manages to get there first:

  • Put allocations inside of the target kernel into address space 5 (all device allocations should be in here) and create address space casts from them to the program address space (0 for AMDGPU). This is necessary for AMDGPU. Sergio has a PR up that does at least some of this work (I need to test his recent update to see if it works as is with no additional PR required), and Jan had a prior PR that did exactly this, but it got stuck in limbo on Phabricator.
  • Add optnone + noinline to kernels (and possibly all device functions; the Fortran runtime is already covered if it's compiled by Clang, but user functions from Fortran are not), as Clang currently does. Without these, the later passes are free to manipulate things and, it seems, create non-functional IR for AMDGPU. The alternative is to reformat the kernel once it's lowered, moving all allocations to the top of the kernel in the entry block, but this might be a little overkill where optnone/noinline is fine and Clang uses it; it may also prevent future issues similar to this down the line.
  • Create a driver change to allow optionally including the GPU libc after libFortranRuntime, to resolve uses of memcpy etc. on the device with device variants. It is possible there's a better command-line option to force the library to be included after the Fortran runtime in the compiler's library ordering, but I'm unsure.

So the test will need two dependencies to work: the Fortran runtime built for offload and a libc built for offload as well, alongside some of the non-library-related modifications to the IR.

agozillon added a commit to agozillon/llvm-project that referenced this issue Jan 20, 2024
…se function pass to finalize, utilised in convertTarget

This patch seeks to add a mechanism to raise constant
(not ConstantExpr or runtime/dynamic) sized allocations
into the entry block for select functions that have been
inserted into a list for processing. This processing occurs
during the finalize call, after OutlinedInfo regions have
completed. This currently has only been utilised for
createOutlinedFunction, which is triggered for
TargetOp generation in the OpenMP MLIR dialect
lowering to LLVM-IR.

This currently is required for Target kernels generated by
createOutlinedFunction to avoid subsequent optimisation
passes doing some unintentional malformed optimisations
for AMD kernels (unsure if it occurs for other vendors). If
the allocas are generated inside of the kernel and are
not in the entry block and are subsequently passed to a
function this can lead to required instructions being
erased or manipulated in a way that causes the kernel
to run into a HSA access error.

This fix is related to a series of problems found in:
llvm#74603

This problem primarily presents itself for Flang's HLFIR
AssignOp currently, when utilised with a scalar temporary
constant on the RHS and a descriptor type on the LHS. It
will generate a call to a runtime function, wrap the RHS
temporary in a newly allocated descriptor (an llvm
struct), and pass both the LHS and RHS descriptor into
the runtime function call. This will currently be
embedded into the middle of the target region in the
user entry block, which means the allocas are also
embedded in the middle, which seems to pose
issues when later passes are executed. This issue
may present itself in other HLFIR operations or
unrelated operations that generate allocas as a
by-product, but for the moment, this one test case is
the only scenario in which I've found this problem.

Perhaps this is not the appropriate fix, I am very open to other
suggestions, I've tried a few others (at varying levels of the
flang/mlir compiler flow), but this one is the smallest and least
intrusive changeset. The other two, that come to mind (but I've
not fully looked into, the former I tried a little with blocks but it
had a few issues I'd need to think through):
*  Having a proper alloca only block (or region) generated for TargetOps
   that we could merge into the entry block that's generated by
   convertTarget's createOutlinedFunction.
* Or diverging a little from Clang's current target generation and using
  the CodeExtractor to generate the user code as an outlined function
  region invoked from the kernel we make, with our kernel arguments
  passed into it. Similar to the current parallel generation. I am not sure
  how well this would intermingle with the existing parallel generation
  though that's layered in.

Both of these methods seem like quite a divergence from the current
status quo, which I am not entirely sure is merited for the small test
this change aims to fix.
agozillon added a commit to agozillon/llvm-project that referenced this issue Feb 5, 2024
…se function pass to finalize, utilised in convertTarget

This patch seeks to add a mechanism to raise constant
(not ConstantExpr or runtime/dynamic) sized allocations
into the entry block for select functions that have been
inserted into a list for processing. This processing occurs
during the finalize call, after OutlinedInfo regions have
completed. This currently has only been utilised for
createOutlinedFunction, which is triggered for
TargetOp generation in the OpenMP MLIR dialect
lowering to LLVM-IR.

This currently is required for Target kernels generated by
createOutlinedFunction to avoid subsequent optimisation
passes doing some unintentional malformed optimisaitions
for AMD kernels (unsure if it occurs for other vendors). If
the allocas are generated inside of the kernel and are
not in the entry block and are subsequently passed to a
function this can lead to required instructions being
erased or manipulated in a way that causes the kernel
to run into a HSA access error.

This fix is related to a series of problems found in:
llvm#74603

This problem primarily presents itself for Flang's HLFIR
AssignOp currently, when utilised with a scalar temporary
constant on the RHS and a descriptor type on the LHS. It
will generate a call to a runtime function, wrap the RHS
temporary in a newly allocated descriptor (an llvm
struct), and pass both the LHS and RHS descriptor into
the runtime function call. This will currently be
embedded into the middle of the target region in the
user entry block, which means the allocas are also
embedded in the middle, which seems to pose
issues when later passes are executed. This issue
may present itself in other HLFIR operations or
unrelated operations that generate allocas as a by
product, but for the moment, this one test case is
the only scenario i've found this problem.

Perhaps this is not the appropriate fix, I am very open to other
suggestions, I've tried a few others (at varying levels of the
flang/mlir compiler flow), but this one is the smallest and least
intrusive changeset. The other two, that come to mind (but I've
not fully looked into, the former I tried a little with blocks but it
had a few issues I'd need to think through):
*  Having a proper alloca only block (or region) generated for TargetOps
   that we could merge into the entry block that's generated by
   convertTarget's createOutlinedFunction.
* Or diverging a little from Clang's current target generation and using
  the CodeExtractor to generate the user code as an outlined function
  region invoked from the kernel we make, with our kernel arguments
  passed into it. Similar to the current parallel generation. I am not sure
  how well this would intermingle with the existing parallel generation
  though that's layered in.

Both of these methods seem like quite a divergeance from the current
status quo, which I am not entirely sure is meritted for the small test
this change aims to fix.
agozillon added a commit that referenced this issue Feb 23, 2024
…se function pass to finalize, utilised in convertTarget (#78818)

This patch seeks to add a mechanism to raise constant (not ConstantExpr
or runtime/dynamic) sized allocations into the entry block for select
functions that have been inserted into a list for processing. This
processing occurs during the finalize call, after OutlinedInfo regions
have completed. This currently has only been utilised for
createOutlinedFunction, which is triggered for TargetOp generation in
the OpenMP MLIR dialect lowering to LLVM-IR.

This currently is required for Target kernels generated by
createOutlinedFunction to avoid subsequent optimization passes doing
some unintentional malformed optimizations for AMD kernels (unsure if it
occurs for other vendors). If the allocas are generated inside of the
kernel and are not in the entry block and are subsequently passed to a
function this can lead to required instructions being erased or
manipulated in a way that causes the kernel to run into a HSA access
error.

This fix is related to a series of problems found in:
#74603

This problem primarily presents itself for Flang's HLFIR AssignOp
currently, when utilised with a scalar temporary constant on the RHS and
a descriptor type on the LHS. It will generate a call to a runtime
function, wrap the RHS temporary in a newly allocated descriptor (an
llvm struct), and pass both the LHS and RHS descriptor into the runtime
function call. This will currently be embedded into the middle of the
target region in the user entry block, which means the allocas are also
embedded in the middle, which seems to pose issues when later passes are
executed. This issue may present itself in other HLFIR operations or
unrelated operations that generate allocas as a by-product, but for the
moment, this one test case is the only scenario in which I've found this
problem.

Perhaps this is not the appropriate fix, I am very open to other
suggestions, I've tried a few others (at varying levels of the
flang/mlir compiler flow), but this one is the smallest and least
intrusive change set. The other two, that come to mind (but I've not
fully looked into, the former I tried a little with blocks but it had a
few issues I'd need to think through):

- Having a proper alloca only block (or region) generated for TargetOps
that we could merge into the entry block that's generated by
convertTarget's createOutlinedFunction.
- Or diverging a little from Clang's current target generation and using
the CodeExtractor to generate the user code as an outlined function
region invoked from the kernel we make, with our kernel arguments passed
into it. Similar to the current parallel generation. I am not sure how
well this would intermingle with the existing parallel generation though
that's layered in.

Both of these methods seem like quite a divergence from the current
status quo, which I am not entirely sure is merited for the small test
this change aims to fix.
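
The transformation this commit message describes, raising constant-sized allocas into the entry block, can be sketched on a toy IR. The following is a minimal C++ model under stated assumptions, not the code from the patch: `ToyInst`, `ToyBlock`, `ToyFunction`, and `raiseConstantAllocas` are hypothetical names introduced here, and the `hasConstantSize` flag stands in for the "not ConstantExpr or runtime/dynamic" check mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

// Toy IR, not LLVM's classes: an instruction is just a name plus two flags.
struct ToyInst {
  std::string name;
  bool isAlloca = false;
  bool hasConstantSize = false;  // false models a ConstantExpr or runtime-sized alloca
};

using ToyBlock = std::vector<ToyInst>;
using ToyFunction = std::vector<ToyBlock>;  // element 0 is the entry block

// Move every constant-sized alloca to the front of the entry block,
// preserving the relative order of the raised allocas and of everything
// that stays behind. Returns the number of allocas placed at the top.
std::size_t raiseConstantAllocas(ToyFunction &fn) {
  if (fn.empty())
    return 0;

  std::vector<ToyInst> raised;
  for (ToyBlock &block : fn) {
    ToyBlock kept;
    for (ToyInst &inst : block) {
      if (inst.isAlloca && inst.hasConstantSize)
        raised.push_back(std::move(inst));
      else
        kept.push_back(std::move(inst));
    }
    block = std::move(kept);
  }

  const std::size_t count = raised.size();
  ToyBlock &entry = fn.front();
  entry.insert(entry.begin(), std::make_move_iterator(raised.begin()),
               std::make_move_iterator(raised.end()));
  return count;
}
```

In this model, a descriptor alloca that sits between a load and the runtime call ends up ahead of both, while a runtime-sized alloca stays where it was, which mirrors the scope the commit message sets for itself.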
@agozillon
Contributor Author

This particular case should now be resolved as the PRs that help address it have now landed.

@llvmbot
Collaborator

llvmbot commented Feb 23, 2024

@llvm/issue-subscribers-openmp

Author: None (agozillon)

This issue was found during the implementation of the following PR (and is dependent on it): https://github.com/llvm/llvm-project/pull/71766

The following example which attempts to map and assign a value to an allocatable variable on device compiles and works for the deprecated FIR flow, but will fail using the new HLFIR flow:

program main
    integer, allocatable :: test
    allocate(test)
    test = 10

!$omp target map(tofrom:test)
    test = 50
!$omp end target

    print *, test

    deallocate(test)
end program

Yielding the following ICE error:

LLVM ERROR: Cannot select: t20: i64,ch = dynamic_stackalloc t16:1, Constant:i64<16>, Constant:i64<0>
  t19: i64 = Constant<16>
  t6: i64 = Constant<0>
In function: __omp_offloading_fd00_4b200ae__QQmain_l5
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.	Program arguments: /work/agozillo/git/flang-dev/llvm-main-project/build/bin/clang-linker-wrapper --host-triple=x86_64-unknown-linux-gnu --linker-path=/work/agozillo/git/flang-dev/llvm-main-project/build/bin/ld.lld -- -z relro --hash-style=gnu --eh-frame-hdr -m elf_x86_64 -pie -dynamic-linker /lib64/ld-linux-x86-64.so.2 -o single-value-alloca.out /lib/x86_64-linux-gnu/Scrt1.o /lib/x86_64-linux-gnu/crti.o /usr/lib/gcc/x86_64-linux-gnu/9/crtbeginS.o -L/usr/lib/gcc/x86_64-linux-gnu/9 -L/usr/lib/gcc/x86_64-linux-gnu/9/../../../../lib64 -L/lib/x86_64-linux-gnu -L/lib/../lib64 -L/usr/lib/x86_64-linux-gnu -L/usr/lib/../lib64 -L/lib -L/usr/lib -L/home/agozillo/git/flang-dev/llvm-main-project/build/lib -L/home/agozillo/git/flang-dev/llvm-main-project/build/projects/openmp/libomptarget -L/home/agozillo/git/flang-dev/llvm-main-project/build/projects/openmp/ -L/home/agozillo/git/flang-dev/llvm-main-project/build/projects/openmp/libomptarget/DeviceRTL -L/etc/alternatives/rocm/lib /tmp/single-value-alloca-825105.o -L/work/agozillo/git/flang-dev/llvm-main-project/build/lib --whole-archive -lFortran_main --no-whole-archive -lFortranRuntime -lFortranDecimal -lm -lomp -lomptarget -lomptarget.devicertl -L/work/agozillo/git/flang-dev/llvm-main-project/build/lib -lgcc --as-needed -lgcc_s --no-as-needed -lpthread -lc -lgcc --as-needed -lgcc_s --no-as-needed /usr/lib/gcc/x86_64-linux-gnu/9/crtendS.o /lib/x86_64-linux-gnu/crtn.o
1.	Running pass 'CallGraph Pass Manager' on module 'ld-temp.o'.
2.	Running pass 'AMDGPU DAG->DAG Pattern Instruction Selection' on function '@__omp_offloading_fd00_4b200ae__QQmain_l5'
 #0 0x0000560af190f69f llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/work/agozillo/git/flang-dev/llvm-main-project/build/bin/clang-linker-wrapper+0x147969f)
 #1 0x0000560af190ce84 SignalHandler(int) Signals.cpp:0:0
 #2 0x00007fd3d84a9420 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x14420)
 #3 0x00007fd3d7f4600b raise /build/glibc-BHL3KM/glibc-2.31/signal/../sysdeps/unix/sysv/linux/raise.c:51:1
 #4 0x00007fd3d7f25859 abort /build/glibc-BHL3KM/glibc-2.31/stdlib/abort.c:81:7
 #5 0x0000560af071eb98 llvm::ConvertUTF8toUTF32(unsigned char const**, unsigned char const*, unsigned int**, unsigned int*, llvm::ConversionFlags) (.cold) ConvertUTF.cpp:0:0
 #6 0x0000560af22b02bd llvm::SelectionDAGISel::CannotYetSelect(llvm::SDNode*) (/work/agozillo/git/flang-dev/llvm-main-project/build/bin/clang-linker-wrapper+0x1e1a2bd)
 #7 0x0000560af22b2a19 llvm::SelectionDAGISel::SelectCodeCommon(llvm::SDNode*, unsigned char const*, unsigned int) (/work/agozillo/git/flang-dev/llvm-main-project/build/bin/clang-linker-wrapper+0x1e1ca19)
 #8 0x0000560af0e98117 AMDGPUDAGToDAGISel::Select(llvm::SDNode*) (/work/agozillo/git/flang-dev/llvm-main-project/build/bin/clang-linker-wrapper+0xa02117)
 #9 0x0000560af22ad240 llvm::SelectionDAGISel::DoInstructionSelection() (/work/agozillo/git/flang-dev/llvm-main-project/build/bin/clang-linker-wrapper+0x1e17240)
#10 0x0000560af22ba62e llvm::SelectionDAGISel::CodeGenAndEmitDAG() (/work/agozillo/git/flang-dev/llvm-main-project/build/bin/clang-linker-wrapper+0x1e2462e)
#11 0x0000560af22bd758 llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) (/work/agozillo/git/flang-dev/llvm-main-project/build/bin/clang-linker-wrapper+0x1e27758)
#12 0x0000560af22bf446 llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) (.part.0) SelectionDAGISel.cpp:0:0
#13 0x0000560af0ea1349 AMDGPUDAGToDAGISel::runOnMachineFunction(llvm::MachineFunction&) (/work/agozillo/git/flang-dev/llvm-main-project/build/bin/clang-linker-wrapper+0xa0b349)
#14 0x0000560af1a1e351 llvm::MachineFunctionPass::runOnFunction(llvm::Function&) (.part.0) MachineFunctionPass.cpp:0:0
#15 0x0000560af1285b71 llvm::FPPassManager::runOnFunction(llvm::Function&) (/work/agozillo/git/flang-dev/llvm-main-project/build/bin/clang-linker-wrapper+0xdefb71)
#16 0x0000560af15206e7 (anonymous namespace)::CGPassManager::runOnModule(llvm::Module&) CallGraphSCCPass.cpp:0:0
#17 0x0000560af1286652 llvm::legacy::PassManagerImpl::run(llvm::Module&) (/work/agozillo/git/flang-dev/llvm-main-project/build/bin/clang-linker-wrapper+0xdf0652)
#18 0x0000560af1ee0205 codegen(llvm::lto::Config const&, llvm::TargetMachine*, std::function<llvm::Expected<std::unique_ptr<llvm::CachedFileStream, std::default_delete<llvm::CachedFileStream>>> (unsigned int, llvm::Twine const&)>, unsigned int, llvm::Module&, llvm::ModuleSummaryIndex const&) LTOBackend.cpp:0:0
#19 0x0000560af1ee080d llvm::lto::backend(llvm::lto::Config const&, std::function<llvm::Expected<std::unique_ptr<llvm::CachedFileStream, std::default_delete<llvm::CachedFileStream>>> (unsigned int, llvm::Twine const&)>, unsigned int, llvm::Module&, llvm::ModuleSummaryIndex&) (/work/agozillo/git/flang-dev/llvm-main-project/build/bin/clang-linker-wrapper+0x1a4a80d)
#20 0x0000560af1ed6c65 llvm::lto::LTO::runRegularLTO(std::function<llvm::Expected<std::unique_ptr<llvm::CachedFileStream, std::default_delete<llvm::CachedFileStream>>> (unsigned int, llvm::Twine const&)>) (/work/agozillo/git/flang-dev/llvm-main-project/build/bin/clang-linker-wrapper+0x1a40c65)
#21 0x0000560af1ed72b8 llvm::lto::LTO::run(std::function<llvm::Expected<std::unique_ptr<llvm::CachedFileStream, std::default_delete<llvm::CachedFileStream>>> (unsigned int, llvm::Twine const&)>, std::function<llvm::Expected<std::function<llvm::Expected<std::unique_ptr<llvm::CachedFileStream, std::default_delete<llvm::CachedFileStream>>> (unsigned int, llvm::Twine const&)>> (unsigned int, llvm::StringRef, llvm::Twine const&)>) (/work/agozillo/git/flang-dev/llvm-main-project/build/bin/clang-linker-wrapper+0x1a412b8)
#<!-- -->22 0x0000560af07d1a5d (anonymous namespace)::linkBitcodeFiles(llvm::SmallVectorImpl&lt;llvm::object::OffloadFile&gt;&amp;, llvm::SmallVectorImpl&lt;llvm::StringRef&gt;&amp;, llvm::opt::ArgList const&amp;) (.constprop.0) ClangLinkerWrapper.cpp:0:0
#<!-- -->23 0x0000560af07d881a llvm::Error (anonymous namespace)::linkAndWrapDeviceFiles(llvm::SmallVectorImpl&lt;llvm::object::OffloadFile&gt;&amp;, llvm::opt::InputArgList const&amp;, char**, int)::'lambda'(auto&amp;)::operator()&lt;llvm::SmallVector&lt;llvm::object::OffloadFile, 3u&gt;&gt;(auto&amp;) const ClangLinkerWrapper.cpp:0:0
#<!-- -->24 0x0000560af07dee05 (anonymous namespace)::linkAndWrapDeviceFiles(llvm::SmallVectorImpl&lt;llvm::object::OffloadFile&gt;&amp;, llvm::opt::InputArgList const&amp;, char**, int) ClangLinkerWrapper.cpp:0:0
#<!-- -->25 0x0000560af07248e0 main (/work/agozillo/git/flang-dev/llvm-main-project/build/bin/clang-linker-wrapper+0x28e8e0)
#<!-- -->26 0x00007fd3d7f27083 __libc_start_main /build/glibc-BHL3KM/glibc-2.31/csu/../csu/libc-start.c:342:3
#<!-- -->27 0x0000560af07c119e _start (/work/agozillo/git/flang-dev/llvm-main-project/build/bin/clang-linker-wrapper+0x32b19e)

The command used to compile this and hit the error (it should only require a Clang that's compiled to support AMDGPU, plus the dependent PR if it has not been committed already): flang-new --offload-arch=gfx90a -fopenmp test.f90 -o test.out

From what I can gather after digging into the issue a little, this comes from AMDGPU not supporting the DYNAMIC_STACKALLOC node (the dynamic_stackalloc in the error above). I think AMDGPU only performs static allocation, but someone with more understanding of that part of the compiler will know far better than I do.
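
For anyone less familiar with the static/dynamic split, here is a toy model of the rule as I understand it from LLVM's `AllocaInst::isStaticAlloca`: an alloca is only folded into the fixed stack frame if it has a constant size, sits in the function's entry block, and is not used with `inalloca`; anything else goes through a dynamic stack allocation node. This is a Python sketch for illustration only. The `Alloca` class, its field names, and the `lowering` helper are made up for this example and are not LLVM API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alloca:
    """Hypothetical stand-in for an LLVM alloca instruction (not LLVM API)."""
    constant_size: bool               # is the allocation size a compile-time constant?
    in_entry_block: bool              # does it sit in the function's entry block?
    used_with_inalloca: bool = False  # is it marked for use with inalloca?


def is_static_alloca(a: Alloca) -> bool:
    """Static only if: constant size, in the entry block, and no inalloca use."""
    return a.constant_size and a.in_entry_block and not a.used_with_inalloca


def lowering(a: Alloca) -> str:
    """Static allocas get a fixed frame slot; everything else is dynamic."""
    return "frame slot" if is_static_alloca(a) else "DYNAMIC_STACKALLOC"


if __name__ == "__main__":
    # Constant size in the entry block: static.
    print(lowering(Alloca(constant_size=True, in_entry_block=True)))
    # Runtime-sized: dynamic.
    print(lowering(Alloca(constant_size=False, in_entry_block=True)))
    # Constant size but outside the entry block: still dynamic.
    print(lowering(Alloca(constant_size=True, in_entry_block=False)))
```

Under this model a constant-size alloca still ends up dynamic if it is emitted outside the entry block. The node in the error above has a constant size (Constant:i64<16>), so placement may be worth checking, though I have not confirmed that this is what happens here.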

However, the generation of this AMDGPU-unfriendly code appears to stem from the HLFIR AssignOp, which lowers to a Fortran runtime call, which in turn likely brings in the code that requires a dynamic stack allocation (I've unfortunately not found the exact problematic line, but there are a number of areas that might be the cause).

The solution I can currently think of is to opt out of HLFIR AssignOp generation for AMDGPU devices or for OpenMP offload (or both) and use the old FIR flow, which does not depend on the runtime call. I am not sure how palatable that is for everyone though, as I imagine the intent was to discard the old FIR flow in the near future. I am more than open to other suggestions, however! This is just the option I have in mind right now.

It also raises the possibility that errors like this will be encountered in other cases where HLFIR operations lower to Fortran runtime calls, but that may be hyperbole, as this is the only case I've encountered so far.
