[OpenMP] Different fail modes for memory_manager.cpp OpenMP test #65077

Open
doru1004 opened this issue Aug 29, 2023 · 6 comments


@doru1004
Contributor

I investigated the memory_manager.cpp test in OpenMP and it looks like, on AMD GPUs, it fails in different ways for different optimization levels.

For -O2 and -O3 the test passes consistently.
For -O1 or no optimization level specified it fails occasionally with a GPU memory error.
For -O0 it doesn't even compile; the trace is below:

clang-linker-wrapper: /home/dobercea/upstream/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp:154: virtual bool llvm::AMDGPUResourceUsageAnalysis::runOnModule(llvm::Module&): Assertion `MF && "function must have been generated already"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.      Program arguments: /home/dobercea/upstream/llvm-project/build/./bin/clang-linker-wrapper --opt-level=O0 --host-triple=x86_64-unknown-linux-gnu --linker-path=/home/dobercea/upstream/llvm-project/build/./bin/ld.lld -- -pie -z relro --hash-style=gnu --eh-frame-hdr -m elf_x86_64 -dynamic-linker /lib64/ld-linux-x86-64.so.2 -o /home/dobercea/upstream/llvm-project/build/runtimes/runtimes-bins/openmp/libomptarget/test/amdgcn-amd-amdhsa/offloading/Output/memory_manager.cpp.tmp /lib/x86_64-linux-gnu/Scrt1.o /lib/x86_64-linux-gnu/crti.o /usr/lib/gcc/x86_64-linux-gnu/9/crtbeginS.o -L/home/dobercea/upstream/llvm-project/build/runtimes/runtimes-bins/openmp/libomptarget -L/home/dobercea/upstream/llvm-project/build/./lib -L/home/dobercea/upstream/llvm-project/build/runtimes/runtimes-bins/openmp/runtime/src -L/home/dobercea/upstream/llvm-project/build/lib/clang/18/lib/x86_64-unknown-linux-gnu -L/usr/lib/gcc/x86_64-linux-gnu/9 -L/usr/lib/gcc/x86_64-linux-gnu/9/../../../../lib64 -L/lib/x86_64-linux-gnu -L/lib/../lib64 -L/usr/lib/x86_64-linux-gnu -L/usr/lib/../lib64 -L/lib -L/usr/lib -rpath /home/dobercea/upstream/llvm-project/build/runtimes/runtimes-bins/openmp/libomptarget -rpath /home/dobercea/upstream/llvm-project/build/runtimes/runtimes-bins/openmp/runtime/src -rpath /home/dobercea/upstream/llvm-project/build/./lib /tmp/memory_manager-1120de.o -lstdc++ -lm -lomp -lomptarget -lomptarget.devicertl -L/home/dobercea/upstream/llvm-project/build/lib -lgcc_s -lgcc -lpthread -lc -lgcc_s -lgcc /usr/lib/gcc/x86_64-linux-gnu/9/crtendS.o /lib/x86_64-linux-gnu/crtn.o
1.      Running pass 'Function register usage analysis' on module 'ld-temp.o'.
 #0 0x00005606ddf20d54 PrintStackTraceSignalHandler(void*) Signals.cpp:0:0
 #1 0x00005606ddf1e584 SignalHandler(int) Signals.cpp:0:0
 #2 0x00007fd634757420 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x14420)
 #3 0x00007fd6341f400b raise /build/glibc-SzIz7B/glibc-2.31/signal/../sysdeps/unix/sysv/linux/raise.c:51:1
 #4 0x00007fd6341d3859 abort /build/glibc-SzIz7B/glibc-2.31/stdlib/abort.c:81:7
 #5 0x00007fd6341d3729 get_sysdep_segment_value /build/glibc-SzIz7B/glibc-2.31/intl/loadmsgcat.c:509:8
 #6 0x00007fd6341d3729 _nl_load_domain /build/glibc-SzIz7B/glibc-2.31/intl/loadmsgcat.c:970:34
 #7 0x00007fd6341e4fd6 (/lib/x86_64-linux-gnu/libc.so.6+0x33fd6)
 #8 0x00005606dd257095 (/home/dobercea/upstream/llvm-project/build/./bin/clang-linker-wrapper+0x75e095)
 #9 0x00005606dd8a9252 llvm::legacy::PassManagerImpl::run(llvm::Module&) (/home/dobercea/upstream/llvm-project/build/./bin/clang-linker-wrapper+0xdb0252)
#10 0x00005606de4d83a5 codegen(llvm::lto::Config const&, llvm::TargetMachine*, std::function<llvm::Expected<std::unique_ptr<llvm::CachedFileStream, std::default_delete<llvm::CachedFileStream>>> (unsigned int, llvm::Twine const&)>, unsigned int, llvm::Module&, llvm::ModuleSummaryIndex const&) LTOBackend.cpp:0:0
#11 0x00005606de4d897d llvm::lto::backend(llvm::lto::Config const&, std::function<llvm::Expected<std::unique_ptr<llvm::CachedFileStream, std::default_delete<llvm::CachedFileStream>>> (unsigned int, llvm::Twine const&)>, unsigned int, llvm::Module&, llvm::ModuleSummaryIndex&) (/home/dobercea/upstream/llvm-project/build/./bin/clang-linker-wrapper+0x19df97d)
#12 0x00005606de4cf004 llvm::lto::LTO::runRegularLTO(std::function<llvm::Expected<std::unique_ptr<llvm::CachedFileStream, std::default_delete<llvm::CachedFileStream>>> (unsigned int, llvm::Twine const&)>) (/home/dobercea/upstream/llvm-project/build/./bin/clang-linker-wrapper+0x19d6004)
#13 0x00005606de4cf678 llvm::lto::LTO::run(std::function<llvm::Expected<std::unique_ptr<llvm::CachedFileStream, std::default_delete<llvm::CachedFileStream>>> (unsigned int, llvm::Twine const&)>, std::function<llvm::Expected<std::function<llvm::Expected<std::unique_ptr<llvm::CachedFileStream, std::default_delete<llvm::CachedFileStream>>> (unsigned int, llvm::Twine const&)>> (unsigned int, llvm::StringRef, llvm::Twine const&)>) (/home/dobercea/upstream/llvm-project/build/./bin/clang-linker-wrapper+0x19d6678)
#14 0x00005606dce28708 (anonymous namespace)::linkBitcodeFiles(llvm::SmallVectorImpl<llvm::object::OffloadFile>&, llvm::SmallVectorImpl<llvm::StringRef>&, llvm::opt::ArgList const&) (.constprop.0) ClangLinkerWrapper.cpp:0:0
#15 0x00005606dce2f35a llvm::Error (anonymous namespace)::linkAndWrapDeviceFiles(llvm::SmallVectorImpl<llvm::object::OffloadFile>&, llvm::opt::InputArgList const&, char**, int)::'lambda'(auto&)::operator()<llvm::SmallVector<llvm::object::OffloadFile, 3u>>(auto&) const ClangLinkerWrapper.cpp:0:0
#16 0x00005606dce357a6 (anonymous namespace)::linkAndWrapDeviceFiles(llvm::SmallVectorImpl<llvm::object::OffloadFile>&, llvm::opt::InputArgList const&, char**, int) ClangLinkerWrapper.cpp:0:0
#17 0x00005606dcd7d709 main (/home/dobercea/upstream/llvm-project/build/./bin/clang-linker-wrapper+0x284709)
#18 0x00007fd6341d5083 __libc_start_main /build/glibc-SzIz7B/glibc-2.31/csu/../csu/libc-start.c:342:3
#19 0x00005606dce17eee _start (/home/dobercea/upstream/llvm-project/build/./bin/clang-linker-wrapper+0x31eeee)
@llvmbot
Collaborator

llvmbot commented Aug 29, 2023

@llvm/issue-subscribers-openmp

@shiltian
Contributor

This doesn't look like an OpenMP issue. It looks like an AMDGPU backend issue instead.

@llvmbot
Collaborator

llvmbot commented Aug 29, 2023

@llvm/issue-subscribers-backend-amdgpu

@doru1004
Contributor Author

doru1004 commented Aug 29, 2023

This doesn't look like an OpenMP issue. It looks like an AMDGPU backend issue instead.

It may be an OpenMP issue for O1 and default optimization level.

@jdoerfert
Member

This doesn't look like an OpenMP issue. It looks like an AMDGPU backend issue instead.

It may be an OpenMP issue for O1 and default optimization level.

It might be. These look like 3 different problems. Let's fix them one by one. The crash in the backend is the easiest.
Once that is fixed, we should figure out why passing no -O flag is not equivalent to -O0.
Then we can dive into the -O1 test case and try to understand why it crashes.

@doru1004
Contributor Author

Update: adding device(0) clauses fixes the intermittent failures for -O1.
With no optimization level specified, the intermittent failures drop from 4 to 5 per 100 runs to 1 to 2 per 500 runs. This suggests that more than one issue was causing the intermittent failures and that at least one remains.
The compilation failure at -O0 is unchanged.
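
For readers unfamiliar with the clause, here is a minimal sketch of a target region pinned to a specific device with `device(0)`. It is not taken from memory_manager.cpp: the function name `sum_on_device` and the loop body are hypothetical and only show where the clause is placed. If the code is built without OpenMP offloading support, the pragma is ignored or the region falls back to the host, and the result should be the same.

```cpp
// Hypothetical illustration only; not the code from memory_manager.cpp.
// Sums the integers 0..n-1 inside a target region that is explicitly
// bound to device number 0 instead of the default device.
int sum_on_device(int n) {
  int sum = 0;
  // device(0) selects device 0 explicitly; map(tofrom: sum) copies the
  // accumulator to the device and back; reduction(+: sum) combines the
  // per-thread partial sums.
  #pragma omp target teams distribute parallel for \
      device(0) map(tofrom: sum) reduction(+: sum)
  for (int i = 0; i < n; ++i) {
    sum += i;
  }
  return sum;
}
```

Calling `sum_on_device(100)` should return 4950 whether the loop runs on the device or on the host.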
