Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AMDGPU][OpenMP] Using HSA API calls for APU detection. #90186

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

ThorBl
Copy link
Contributor

@ThorBl ThorBl commented Apr 26, 2024

A simplified APU detection algorithm will be enabled if ROCm 6.1 is available at the compiler's build time. This algorithm is more reliable and simplified, as it doesn't rely on string comparison of the GFX name. With this new algorithm, the system can effectively distinguish between an MI300A and an MI300X without the need for further checking.

@llvmbot
Copy link
Collaborator

llvmbot commented Apr 26, 2024

@llvm/pr-subscribers-offload

Author: Thorsten Blaß (ThorBl)

Changes

A simplified APU detection algorithm will be enabled if ROCm 6.1 is available at the compiler's build time. This algorithm is more reliable and simplified, as it doesn't rely on string comparison of the GFX name. With this new algorithm, the system can effectively distinguish between an MI300A and an MI300X without the need for further checking.


Full diff: https://github.com/llvm/llvm-project/pull/90186.diff

3 Files Affected:

  • (modified) offload/plugins-nextgen/amdgpu/CMakeLists.txt (+3-1)
  • (added) offload/plugins-nextgen/amdgpu/dynamic_hip/hip_version.h (+16)
  • (modified) offload/plugins-nextgen/amdgpu/src/rtl.cpp (+22-1)
diff --git a/offload/plugins-nextgen/amdgpu/CMakeLists.txt b/offload/plugins-nextgen/amdgpu/CMakeLists.txt
index f5f7096137c20f..5fb4a37b6604d2 100644
--- a/offload/plugins-nextgen/amdgpu/CMakeLists.txt
+++ b/offload/plugins-nextgen/amdgpu/CMakeLists.txt
@@ -40,7 +40,9 @@ if(hsa-runtime64_FOUND AND NOT LIBOMPTARGET_FORCE_DLOPEN_LIBHSA)
   target_link_libraries(omptarget.rtl.amdgpu PRIVATE hsa-runtime64::hsa-runtime64)
 else()
   libomptarget_say("Building AMDGPU plugin for dlopened libhsa")
-  target_include_directories(omptarget.rtl.amdgpu PRIVATE dynamic_hsa)
+  target_include_directories(omptarget.rtl.amdgpu 
+                            PRIVATE dynamic_hsa 
+                            PRIVATE dynamic_hip)
   target_sources(omptarget.rtl.amdgpu PRIVATE dynamic_hsa/hsa.cpp)
 endif()
 
diff --git a/offload/plugins-nextgen/amdgpu/dynamic_hip/hip_version.h b/offload/plugins-nextgen/amdgpu/dynamic_hip/hip_version.h
new file mode 100644
index 00000000000000..496331cf86ec8f
--- /dev/null
+++ b/offload/plugins-nextgen/amdgpu/dynamic_hip/hip_version.h
@@ -0,0 +1,16 @@
+#ifndef HIP_VERSION
+#define HIP_VERSION
+
+#define HIP_VERSION_MAJOR 0
+#define HIP_VERSION_MINOR 0
+#define HIP_VERSION_PATCH 0
+#define HIP_VERSION_GITHASH ""
+#define HIP_VERSION_BUILD_ID 0
+#define HIP_VERSION_BUILD_NAME ""
+#define HIP_VERSION                                                            \
+  (HIP_VERSION_MAJOR * 10000000 + HIP_VERSION_MINOR * 100000 +                 \
+   HIP_VERSION_PATCH)
+
+#define __HIP_HAS_GET_PCH 0
+
+#endif /* HIP_VERSION */
diff --git a/offload/plugins-nextgen/amdgpu/src/rtl.cpp b/offload/plugins-nextgen/amdgpu/src/rtl.cpp
index 00650b801b4202..58323dbd7233d2 100644
--- a/offload/plugins-nextgen/amdgpu/src/rtl.cpp
+++ b/offload/plugins-nextgen/amdgpu/src/rtl.cpp
@@ -57,6 +57,12 @@
 #endif
 
 #if defined(__has_include)
+#if __has_include("hip/hip_version.h")
+#include "hip/hip_version.h"
+#else
+#include "hip_version.h"
+#endif
+
 #if __has_include("hsa/hsa.h")
 #include "hsa/hsa.h"
 #include "hsa/hsa_ext_amd.h"
@@ -2809,7 +2815,21 @@ struct AMDGPUDeviceTy : public GenericDeviceTy, AMDGenericDeviceTy {
 
   /// Detect if current architecture is an APU.
   Error checkIfAPU() {
-    // TODO: replace with ROCr API once it becomes available.
+#if (HIP_VERSION_MAJOR >= 6 && HIP_VERSION_MINOR >= 1)
+
+    uint8_t MemoryProperties[8];
+
+    hsa_status_t Stat = hsa_agent_get_info(
+        getAgent(), (hsa_agent_info_t)HSA_AMD_AGENT_INFO_MEMORY_PROPERTIES,
+        MemoryProperties);
+
+    if (auto Err = Plugin::check(Stat, "Error: Unable to fetch the memory "
+                                       "properties of the GPU. (%s)\n"))
+      return Err;
+
+    IsAPU = hsa_flag_isset64(MemoryProperties,
+                             HSA_AMD_MEMORY_PROPERTY_AGENT_IS_APU);
+#else
     llvm::StringRef StrGfxName(ComputeUnitKind);
     IsAPU = llvm::StringSwitch<bool>(StrGfxName)
                 .Case("gfx940", true)
@@ -2832,6 +2852,7 @@ struct AMDGPUDeviceTy : public GenericDeviceTy, AMDGenericDeviceTy {
       IsAPU = true;
       return Plugin::success();
     }
+#endif
     return Plugin::success();
   }
 

@llvmbot
Copy link
Collaborator

llvmbot commented Apr 26, 2024

@llvm/pr-subscribers-backend-amdgpu

Author: Thorsten Blaß (ThorBl)

Changes

A simplified APU detection algorithm will be enabled if ROCm 6.1 is available at the compiler's build time. This algorithm is more reliable and simplified, as it doesn't rely on string comparison of the GFX name. With this new algorithm, the system can effectively distinguish between an MI300A and an MI300X without the need for further checking.


Full diff: https://github.com/llvm/llvm-project/pull/90186.diff

3 Files Affected:

  • (modified) offload/plugins-nextgen/amdgpu/CMakeLists.txt (+3-1)
  • (added) offload/plugins-nextgen/amdgpu/dynamic_hip/hip_version.h (+16)
  • (modified) offload/plugins-nextgen/amdgpu/src/rtl.cpp (+22-1)
diff --git a/offload/plugins-nextgen/amdgpu/CMakeLists.txt b/offload/plugins-nextgen/amdgpu/CMakeLists.txt
index f5f7096137c20f..5fb4a37b6604d2 100644
--- a/offload/plugins-nextgen/amdgpu/CMakeLists.txt
+++ b/offload/plugins-nextgen/amdgpu/CMakeLists.txt
@@ -40,7 +40,9 @@ if(hsa-runtime64_FOUND AND NOT LIBOMPTARGET_FORCE_DLOPEN_LIBHSA)
   target_link_libraries(omptarget.rtl.amdgpu PRIVATE hsa-runtime64::hsa-runtime64)
 else()
   libomptarget_say("Building AMDGPU plugin for dlopened libhsa")
-  target_include_directories(omptarget.rtl.amdgpu PRIVATE dynamic_hsa)
+  target_include_directories(omptarget.rtl.amdgpu 
+                            PRIVATE dynamic_hsa 
+                            PRIVATE dynamic_hip)
   target_sources(omptarget.rtl.amdgpu PRIVATE dynamic_hsa/hsa.cpp)
 endif()
 
diff --git a/offload/plugins-nextgen/amdgpu/dynamic_hip/hip_version.h b/offload/plugins-nextgen/amdgpu/dynamic_hip/hip_version.h
new file mode 100644
index 00000000000000..496331cf86ec8f
--- /dev/null
+++ b/offload/plugins-nextgen/amdgpu/dynamic_hip/hip_version.h
@@ -0,0 +1,16 @@
+#ifndef HIP_VERSION
+#define HIP_VERSION
+
+#define HIP_VERSION_MAJOR 0
+#define HIP_VERSION_MINOR 0
+#define HIP_VERSION_PATCH 0
+#define HIP_VERSION_GITHASH ""
+#define HIP_VERSION_BUILD_ID 0
+#define HIP_VERSION_BUILD_NAME ""
+#define HIP_VERSION                                                            \
+  (HIP_VERSION_MAJOR * 10000000 + HIP_VERSION_MINOR * 100000 +                 \
+   HIP_VERSION_PATCH)
+
+#define __HIP_HAS_GET_PCH 0
+
+#endif /* HIP_VERSION */
diff --git a/offload/plugins-nextgen/amdgpu/src/rtl.cpp b/offload/plugins-nextgen/amdgpu/src/rtl.cpp
index 00650b801b4202..58323dbd7233d2 100644
--- a/offload/plugins-nextgen/amdgpu/src/rtl.cpp
+++ b/offload/plugins-nextgen/amdgpu/src/rtl.cpp
@@ -57,6 +57,12 @@
 #endif
 
 #if defined(__has_include)
+#if __has_include("hip/hip_version.h")
+#include "hip/hip_version.h"
+#else
+#include "hip_version.h"
+#endif
+
 #if __has_include("hsa/hsa.h")
 #include "hsa/hsa.h"
 #include "hsa/hsa_ext_amd.h"
@@ -2809,7 +2815,21 @@ struct AMDGPUDeviceTy : public GenericDeviceTy, AMDGenericDeviceTy {
 
   /// Detect if current architecture is an APU.
   Error checkIfAPU() {
-    // TODO: replace with ROCr API once it becomes available.
+#if (HIP_VERSION_MAJOR >= 6 && HIP_VERSION_MINOR >= 1)
+
+    uint8_t MemoryProperties[8];
+
+    hsa_status_t Stat = hsa_agent_get_info(
+        getAgent(), (hsa_agent_info_t)HSA_AMD_AGENT_INFO_MEMORY_PROPERTIES,
+        MemoryProperties);
+
+    if (auto Err = Plugin::check(Stat, "Error: Unable to fetch the memory "
+                                       "properties of the GPU. (%s)\n"))
+      return Err;
+
+    IsAPU = hsa_flag_isset64(MemoryProperties,
+                             HSA_AMD_MEMORY_PROPERTY_AGENT_IS_APU);
+#else
     llvm::StringRef StrGfxName(ComputeUnitKind);
     IsAPU = llvm::StringSwitch<bool>(StrGfxName)
                 .Case("gfx940", true)
@@ -2832,6 +2852,7 @@ struct AMDGPUDeviceTy : public GenericDeviceTy, AMDGenericDeviceTy {
       IsAPU = true;
       return Plugin::success();
     }
+#endif
     return Plugin::success();
   }
 

@ThorBl ThorBl changed the title Using HSA API calls for APU detection. [AMDGPU] Using HSA API calls for APU detection. Apr 26, 2024
@ThorBl ThorBl changed the title [AMDGPU] Using HSA API calls for APU detection. [AMDGPU][OpenMP] Using HSA API calls for APU detection. Apr 26, 2024
Comment on lines +2826 to +2827
if (auto Err = Plugin::check(Stat, "Error: Unable to fetch the memory "
"properties of the GPU. (%s)\n"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we get a better / not generic error from Stat?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not aware there is a way to get a more precise error description than what is encoded in the hsa_status_t enum. I will have another look to see if I can find anything, though.

Copy link
Contributor

@jhuber6 jhuber6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this require HIP at all? We really don't want to link against the HIP runtime.

#else
#include "hip_version.h"
#endif

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing that HSA didn't bump up a HSA_AMD_INTERFACE_VERSION_MINOR for this feature or something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are right! They didn't bump up HSA_AMD_INTERFACE_VERSION_MINOR. Despite new features, the HSA versions shipped with (at least) ROCm 6.0.2 and ROCm 6.1 have the same HSA_AMD_INTERFACE_VERSION_MINOR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose we need to be more vocal to the HSA developers that we really need the API to be versioned whenever a new interface is added. I really don't want us to depend on HIP here just for a version check that should be in HSA anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I talked to one of the HSA guys. HSA_AMD_INTERFACE_VERSION_MINOR will be bumped up and made available with the 6.2 release. An internal PR for this has already been created. So, I tend to ride it out. I will adjust the #if condition to check for HSA_AMD_INTERFACE_VERSION_MINOR >= 5.

Copy link
Contributor

@jhuber6 jhuber6 Apr 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, so we'll automatically pick this up once the ROCm installation upgrades? How about in the dlopen(libhsa-runtime64.so case? I think we want an actual function to get the versions as well in those cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, if ROCm 6.2 is eventually available during the build process, the new implementation will be used.

Regarding your second question. Are you referring to a method that determines the HSA version at runtime?

Copy link
Contributor

@jhuber6 jhuber6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it seems that HSA bumped up to 6.1 and introduced new runtime functions but neglected to bump up any sort of version indicator that would let us detect if they are available. So now, even though these are HSA functions, we're including a completely unrelated library just to check if they exist. This is a really annoying situation and I wish HSA would actually version things.

@carlobertolli carlobertolli self-requested a review April 26, 2024 15:45
@jhuber6
Copy link
Contributor

jhuber6 commented Apr 28, 2024

Honestly I think we should probably just add a version for this into hsa_amd_ext and wait for 6.2.

A simplified APU detection algorithm will be enabled if ROCm 6.1 is available at the compiler's build time. This algorithm is more reliable and simplified, as it doesn't rely on string comparison of the GFX name. With this new algorithm, the system can effectively distinguish between an MI300A and an MI300X without the need for further checking.
@ThorBl ThorBl force-pushed the thoblass/UseHsaForAPUDetection branch from e9635da to 127c59e Compare April 30, 2024 10:18
Copy link

github-actions bot commented Apr 30, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

Copy link
Contributor

@jhuber6 jhuber6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LG, thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants