
[Offload] libomptarget force dlopen vendor libraries by default. #92788

Merged
merged 1 commit into llvm:main on May 22, 2024

Conversation

ye-luo
Contributor

@ye-luo ye-luo commented May 20, 2024

Since #87009, libomptarget links all the plugins statically, so all of the plugins' dependencies are exposed to libomptarget: the CUDA plugin depends on libcuda and the amdgpu plugin depends on libhsa unless dlopen is forced. On a cluster with multiple compute node architectures, libomptarget may be built on one node and run on others. At build time, if CMake finds libcuda and `LIBOMPTARGET_FORCE_DLOPEN_LIBCUDA=OFF`, libomptarget links libcuda.so directly, and the resulting libomptarget may fail to run on a node without an NVIDIA driver (for example, a CPU-only or AMD GPU-only machine) with a complaint that libcuda.so cannot be found.

The solution is to set `LIBOMPTARGET_FORCE_DLOPEN_LIBCUDA` and `LIBOMPTARGET_FORCE_DLOPEN_LIBHSA` to `ON`. Preferably this should be the default, to maximize the usability of libomptarget: if CMake detects NVIDIA or AMD software on the node used to build an OS image, the resulting libomptarget may not function on the user side because it requires the vendor runtime libraries to be present.
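For illustration, here is a minimal, hypothetical C++ sketch of the pattern that `LIBOMPTARGET_FORCE_DLOPEN_LIBCUDA=ON` enables: the plugin resolves the driver library at runtime with dlopen/dlsym instead of carrying a hard link-time dependency, so the binary still loads on nodes without an NVIDIA driver. The soname and the cuInit signature follow the public CUDA driver API; everything else is illustrative, not the actual plugin code.

```cpp
// Hypothetical sketch: resolve libcuda at runtime instead of linking it.
#include <dlfcn.h>
#include <cstdio>

// CUDA driver API signature, reproduced here to avoid including cuda.h;
// CUresult is an enum, so int stands in for it.
using cuInit_t = int (*)(unsigned int);

int main() {
  void *Handle = dlopen("libcuda.so.1", RTLD_LAZY);
  if (!Handle) {
    // No NVIDIA driver on this node: report no CUDA devices and move on
    // instead of failing to load the whole runtime.
    std::fprintf(stderr, "libcuda.so.1 not found; disabling CUDA plugin\n");
    return 0;
  }
  auto CuInit = reinterpret_cast<cuInit_t>(dlsym(Handle, "cuInit"));
  if (CuInit && CuInit(/*Flags=*/0) == 0) // 0 == CUDA_SUCCESS
    std::printf("CUDA driver initialized\n");
  dlclose(Handle);
  return 0;
}
```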

@llvmbot
Collaborator

llvmbot commented May 20, 2024

@llvm/pr-subscribers-offload

@llvm/pr-subscribers-backend-amdgpu

Author: Ye Luo (ye-luo)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/92788.diff

2 Files Affected:

  • (modified) offload/plugins-nextgen/amdgpu/CMakeLists.txt (+1-1)
  • (modified) offload/plugins-nextgen/cuda/CMakeLists.txt (+1-1)
diff --git a/offload/plugins-nextgen/amdgpu/CMakeLists.txt b/offload/plugins-nextgen/amdgpu/CMakeLists.txt
index 2f4057c0ae7ef..7630e3788dae0 100644
--- a/offload/plugins-nextgen/amdgpu/CMakeLists.txt
+++ b/offload/plugins-nextgen/amdgpu/CMakeLists.txt
@@ -13,7 +13,7 @@ target_sources(omptarget.rtl.amdgpu PRIVATE src/rtl.cpp)
 target_include_directories(omptarget.rtl.amdgpu PRIVATE
                            ${CMAKE_CURRENT_SOURCE_DIR}/utils)
 
-option(LIBOMPTARGET_FORCE_DLOPEN_LIBHSA "Build with dlopened libhsa" OFF)
+option(LIBOMPTARGET_FORCE_DLOPEN_LIBHSA "Build with dlopened libhsa" ON)
 if(hsa-runtime64_FOUND AND NOT LIBOMPTARGET_FORCE_DLOPEN_LIBHSA)
   message(STATUS "Building AMDGPU plugin linked against libhsa")
   target_link_libraries(omptarget.rtl.amdgpu PRIVATE hsa-runtime64::hsa-runtime64)
diff --git a/offload/plugins-nextgen/cuda/CMakeLists.txt b/offload/plugins-nextgen/cuda/CMakeLists.txt
index 10ff612848ad2..fa5559c5e7dcb 100644
--- a/offload/plugins-nextgen/cuda/CMakeLists.txt
+++ b/offload/plugins-nextgen/cuda/CMakeLists.txt
@@ -10,7 +10,7 @@ add_target_library(omptarget.rtl.cuda CUDA)
 
 target_sources(omptarget.rtl.cuda PRIVATE src/rtl.cpp)
 
-option(LIBOMPTARGET_FORCE_DLOPEN_LIBCUDA "Build with dlopened libcuda" OFF)
+option(LIBOMPTARGET_FORCE_DLOPEN_LIBCUDA "Build with dlopened libcuda" ON)
 if(LIBOMPTARGET_DEP_CUDA_FOUND AND NOT LIBOMPTARGET_FORCE_DLOPEN_LIBCUDA)
   message(STATUS "Building CUDA plugin linked against libcuda")
   target_link_libraries(omptarget.rtl.cuda PRIVATE CUDA::cuda_driver)

@shiltian shiltian requested a review from jhuber6 May 20, 2024 17:17
Contributor

@shiltian shiltian left a comment


This makes sense to me.

@jhuber6
Contributor

jhuber6 commented May 20, 2024

Does #87009 really make a difference? I thought that in the old case we would simply dlopen the old libomptarget.rtl.cuda.so, which would then be linked against libcuda.so directly.

@ye-luo
Contributor Author

ye-luo commented May 20, 2024

> Does #87009 really make a difference? I thought that in the old case we would simply dlopen the old libomptarget.rtl.cuda.so, which would then be linked against libcuda.so directly.

In the old case, the dlopen of libomptarget.rtl.cuda.so fails but the user program continues.
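For context, a simplified, hypothetical sketch of the pre-#87009 scheme described here: libomptarget dlopened each plugin shared library and skipped any that failed to load, so a missing libcuda.so only disabled the CUDA plugin rather than the whole runtime. The plugin names and control flow are illustrative only, not the actual libomptarget code.

```cpp
// Hypothetical sketch of the old plugin-loading loop.
#include <dlfcn.h>
#include <cstdio>

int main() {
  const char *Plugins[] = {"libomptarget.rtl.cuda.so",
                           "libomptarget.rtl.amdgpu.so",
                           "libomptarget.rtl.x86_64.so"};
  for (const char *Name : Plugins) {
    void *Handle = dlopen(Name, RTLD_NOW);
    if (!Handle) {
      // A plugin whose vendor runtime is absent fails to load; the user
      // program keeps running with whatever plugins remain.
      std::fprintf(stderr, "skipping %s: %s\n", Name, dlerror());
      continue;
    }
    std::printf("loaded %s\n", Name);
  }
  return 0;
}
```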

@jhuber6
Contributor

jhuber6 commented May 20, 2024

> In the old case, the dlopen of libomptarget.rtl.cuda.so fails but the user program continues.

I see, I think this makes sense as a sane default. However, I'll wait until we discuss it in the Wednesday meeting to make the switch.

@jhuber6
Contributor

jhuber6 commented May 21, 2024

Would it be possible for you to do a quick check on one of your applications to see if there's any noticeable performance difference? I'm going to guess that calling dlsym ~100 times isn't going to show up as an appreciable difference.
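As a rough illustration of why ~100 dlsym calls amount to a one-time cost: the plugin can resolve its whole entry-point table once during initialization, and every later offload call goes through the cached function pointers. The symbol list below is an illustrative subset, not the plugin's actual table.

```cpp
// Hypothetical sketch: resolve a table of driver symbols once at startup.
#include <dlfcn.h>
#include <cstdio>

int main() {
  void *Handle = dlopen("libcuda.so.1", RTLD_LAZY);
  if (!Handle)
    return 0; // No driver on this node; nothing to resolve.

  const char *Symbols[] = {"cuInit", "cuMemAlloc_v2", "cuMemcpyHtoD_v2",
                           "cuLaunchKernel"};
  constexpr int NumSymbols = sizeof(Symbols) / sizeof(Symbols[0]);
  void *Table[NumSymbols] = {};

  // One pass during initialization; later calls just reuse Table[I].
  for (int I = 0; I < NumSymbols; ++I)
    Table[I] = dlsym(Handle, Symbols[I]);

  for (int I = 0; I < NumSymbols; ++I)
    std::printf("%-16s -> %p\n", Symbols[I], Table[I]);
  dlclose(Handle);
  return 0;
}
```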

@ye-luo
Contributor Author

ye-luo commented May 21, 2024

> Would it be possible for you to do a quick check on one of your applications to see if there's any noticeable performance difference? I'm going to guess that calling dlsym ~100 times isn't going to show up as an appreciable difference.

libomptarget initialization has never been an issue on any machine I have used, so I don't see why dlopen would be a concern. What I mean is that I won't have good measurements to address your concern; maybe you can try it out on a machine where you do have that concern?

Contributor

@jhuber6 jhuber6 left a comment


Looks good after discussion with others. I will probably rework the logic here later to make it more consistent, but this is fine for now.

@ye-luo ye-luo merged commit 831d143 into llvm:main May 22, 2024
7 checks passed
@ye-luo ye-luo deleted the dlopen-vendor branch May 22, 2024 14:40
@jplehr
Contributor

jplehr commented May 22, 2024

This appears to have caused/exposed a lot more intermittent failures of two test cases on our buildbot compared to what it used to produce:
https://lab.llvm.org/buildbot/#/builders/193/builds/52340
https://lab.llvm.org/buildbot/#/builders/193/builds/52344
https://lab.llvm.org/buildbot/#/builders/193/builds/52348

Any ideas why that would be happening?

@jhuber6
Contributor

jhuber6 commented May 22, 2024

> This appears to have caused/exposed a lot more intermittent failures of two test cases on our buildbot compared to what it used to produce: https://lab.llvm.org/buildbot/#/builders/193/builds/52340 https://lab.llvm.org/buildbot/#/builders/193/builds/52344 https://lab.llvm.org/buildbot/#/builders/193/builds/52348
>
> Any ideas why that would be happening?

Confusing, considering that it's failing on x86 and not CUDA or HSA, which is what this change directly affects.

@jplehr
Contributor

jplehr commented May 22, 2024

Yeah, I wonder if it simply exposes something that has been lingering for some time.

@jhuber6
Contributor

jhuber6 commented May 22, 2024

> Yeah, I wonder if it simply exposes something that has been lingering for some time.

I can't reproduce it locally. Do you think you could try running it manually with debugging?

searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request May 24, 2024
Disabled: [Offload] libomptarget force dlopen vendor libraries by default. (llvm#92788)
Change-Id: Ibe7da67ea294b76e9d159372e1d6151fe6b5cba2