@a4lg
Contributor

@a4lg a4lg commented Dec 4, 2025

This PR adds the device library path used by TheRock, AMD's new ROCm distribution, to Clang.

a4lg added 2 commits December 4, 2025 00:11
TheRock uses a slightly different path (relative to ROCM_PATH)
for device libraries.  This commit adds a search path for device libraries
on a TheRock-based distribution.

Signed-off-by: Tsukasa OI <floss_llm@irq.a4lg.com>
@a4lg
Contributor Author

a4lg commented Dec 4, 2025

Background

This was originally submitted as ROCm#739 to the amd-staging branch.

AMD is experimenting with TheRock, a new HIP/ROCm build system and distribution.

With TheRock, we can install the full ROCm SDK as Python packages (and multiple versions can coexist through venv).

To fully utilize a TheRock-based toolchain, we normally set the environment variable ${ROCM_PATH} to $(rocm-sdk path --root) (rocm-sdk is a TheRock-specific command for preparing and testing the ROCm SDK) so that external programs and libraries can locate the TheRock-based ROCm SDK.

One problem is that some programs (such as vLLM) do not use hipcc and instead invoke $(rocm-sdk path --root)/lib/llvm/bin/clang++ directly. A second problem is that TheRock has a slightly different toolchain layout from the release versions of the ROCm SDK.

ROCm SDK | Device libraries path
Production Release (7.1) | ${ROCM_PATH}/amdgcn/bitcode
TheRock (7.9 / Nightly) | ${ROCM_PATH}/lib/llvm/amdgcn/bitcode

Clang does not search the latter by default, so programs that invoke Clang directly hit a compiler error and fail to configure the ROCm toolchain.

Note that this error does not occur if ROCM_PATH is unset, because the fallback path generates a ROCm installation candidate equivalent to ${ROCM_PATH}/lib/llvm. However, leaving it unset prevents configuring ROCm-dependent programs through the standard ROCM_PATH environment variable.
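
To see which of the two layouts a given installation uses, the lookup order proposed in this PR can be approximated in a few lines of shell. This is only a sketch: find_device_libs is a hypothetical helper name, and the "contains ocml.bc" test is an assumption standing in for whatever Clang's CheckDeviceLib actually verifies.

```shell
# Sketch only: mirrors the lookup order in this PR, not Clang's real checks.
# Assumption: a directory counts as a device library directory if it
# contains ocml.bc (the file the driver test matches on).
find_device_libs() {
  root=$1
  # Legacy layout first, then the TheRock layout.
  for sub in amdgcn/bitcode lib/llvm/amdgcn/bitcode; do
    if [ -f "$root/$sub/ocml.bc" ]; then
      printf '%s\n' "$root/$sub"
      return 0
    fi
  done
  return 1
}
```

For example, find_device_libs "$ROCM_PATH" prints the first matching directory, or returns non-zero if neither layout is present.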

The workaround below works but is inconvenient, so I hope this issue can be resolved in one of two places:

  1. TheRock distribution (separate PR required): Change the path of device libraries or make links/copies to the same path as the production releases.
  2. LLVM (this PR): Add TheRock search path

Workaround (without this PR)

Until this is resolved, we can avoid the device library issue by setting:

export HIP_DEVICE_LIB_PATH=$(rocm-sdk path --root)/lib/llvm/amdgcn/bitcode
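
If the workaround goes into a shared setup script, it may be worth guarding it so that it does not override an explicit setting or point at a directory that does not exist. A minimal sketch, assuming the TheRock layout described above; use_therock_device_libs is a hypothetical helper name, not part of rocm-sdk or Clang.

```shell
# Sketch only. use_therock_device_libs is a hypothetical helper name.
# Exports HIP_DEVICE_LIB_PATH for a TheRock-style root, but leaves the
# variable alone if it is already set, and fails if the directory is absent.
use_therock_device_libs() {
  dir="$1/lib/llvm/amdgcn/bitcode"
  if [ -n "${HIP_DEVICE_LIB_PATH:-}" ]; then
    return 0   # respect an explicit user setting
  fi
  if [ -d "$dir" ]; then
    export HIP_DEVICE_LIB_PATH="$dir"
    return 0
  fi
  echo "no TheRock device libraries under $1" >&2
  return 1
}
```

It would be called as use_therock_device_libs "$(rocm-sdk path --root)".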

@llvmbot added the clang, backend:AMDGPU, and clang:driver labels Dec 4, 2025
@llvmbot
Member

llvmbot commented Dec 4, 2025

@llvm/pr-subscribers-clang-driver

@llvm/pr-subscribers-backend-amdgpu

Author: Tsukasa OI (a4lg)

Changes

This PR adds the device library path used by TheRock, AMD's new ROCm distribution, to Clang.


Full diff: https://github.com/llvm/llvm-project/pull/170590.diff

5 Files Affected:

  • (modified) clang/lib/Driver/ToolChains/AMDGPU.cpp (+8-2)
  • (added) clang/test/Driver/Inputs/rocm-therock/include (+1)
  • (added) clang/test/Driver/Inputs/rocm-therock/lib/llvm/amdgcn (+1)
  • (added) clang/test/Driver/Inputs/rocm-therock/share/hip/version (+1)
  • (modified) clang/test/Driver/hip-device-libs.hip (+9-1)
diff --git a/clang/lib/Driver/ToolChains/AMDGPU.cpp b/clang/lib/Driver/ToolChains/AMDGPU.cpp
index 87ccd40372681..69ada73342127 100644
--- a/clang/lib/Driver/ToolChains/AMDGPU.cpp
+++ b/clang/lib/Driver/ToolChains/AMDGPU.cpp
@@ -436,15 +436,21 @@ void RocmInstallationDetector::detectDeviceLibrary() {
   if (HasDeviceLibrary)
     return;
 
-  // Find device libraries in a legacy ROCm directory structure
-  // ${ROCM_ROOT}/amdgcn/bitcode/*
+  // Find device libraries in a ROCm directory structure
   auto &ROCmDirs = getInstallationPathCandidates();
   for (const auto &Candidate : ROCmDirs) {
+    // Legacy: ${ROCM_PATH}/amdgcn/bitcode/*
     LibDevicePath = Candidate.Path;
     llvm::sys::path::append(LibDevicePath, "amdgcn", "bitcode");
     HasDeviceLibrary = CheckDeviceLib(LibDevicePath, Candidate.StrictChecking);
     if (HasDeviceLibrary)
       return;
+    // TheRock: ${ROCM_PATH}/lib/llvm/amdgcn/bitcode/*
+    LibDevicePath = Candidate.Path;
+    llvm::sys::path::append(LibDevicePath, "lib", "llvm", "amdgcn", "bitcode");
+    HasDeviceLibrary = CheckDeviceLib(LibDevicePath, Candidate.StrictChecking);
+    if (HasDeviceLibrary)
+      return;
   }
 }
 
diff --git a/clang/test/Driver/Inputs/rocm-therock/include b/clang/test/Driver/Inputs/rocm-therock/include
new file mode 120000
index 0000000000000..13265e5ed3db8
--- /dev/null
+++ b/clang/test/Driver/Inputs/rocm-therock/include
@@ -0,0 +1 @@
+../rocm/include
\ No newline at end of file
diff --git a/clang/test/Driver/Inputs/rocm-therock/lib/llvm/amdgcn b/clang/test/Driver/Inputs/rocm-therock/lib/llvm/amdgcn
new file mode 120000
index 0000000000000..79d18ba840474
--- /dev/null
+++ b/clang/test/Driver/Inputs/rocm-therock/lib/llvm/amdgcn
@@ -0,0 +1 @@
+../../../rocm/amdgcn
\ No newline at end of file
diff --git a/clang/test/Driver/Inputs/rocm-therock/share/hip/version b/clang/test/Driver/Inputs/rocm-therock/share/hip/version
new file mode 120000
index 0000000000000..62ff49a023cb9
--- /dev/null
+++ b/clang/test/Driver/Inputs/rocm-therock/share/hip/version
@@ -0,0 +1 @@
+../../../rocm/bin/.hipVersion
\ No newline at end of file
diff --git a/clang/test/Driver/hip-device-libs.hip b/clang/test/Driver/hip-device-libs.hip
index effce40d67ebd..f5813c06ae600 100644
--- a/clang/test/Driver/hip-device-libs.hip
+++ b/clang/test/Driver/hip-device-libs.hip
@@ -9,7 +9,7 @@
 // RUN: 2>&1 | FileCheck %s --check-prefixes=ALL,FLUSHD,ROCMDIR
 
 
-// Test subtarget with flushing off by ddefault.
+// Test subtarget with flushing off by default.
 // RUN: %clang -### --target=x86_64-linux-gnu \
 // RUN:  --cuda-gpu-arch=gfx900 \
 // RUN:  --rocm-path=%S/Inputs/rocm \
@@ -85,6 +85,13 @@
 // RUN:   %S/Inputs/hip_multiple_inputs/b.hip \
 // RUN: 2>&1 | FileCheck %s --check-prefixes=ALL,FLUSHD,ROCMDIR
 
+// Test TheRock toolchain layout
+// RUN: %clang -### --target=x86_64-linux-gnu \
+// RUN:   --offload-arch=gfx803 -nogpuinc \
+// RUN:   --rocm-path=%S/Inputs/rocm-therock \
+// RUN:   %S/Inputs/hip_multiple_inputs/b.hip \
+// RUN: 2>&1 | FileCheck %s --check-prefixes=ALL,FLUSHD,ROCMDIR-THEROCK
+
 // Test finding device lib in resource dir
 // RUN: %clang -### --target=x86_64-linux-gnu \
 // RUN:   --offload-arch=gfx803 -nogpuinc \
@@ -210,6 +217,7 @@
 
 // RESDIR-SAME: "-mlink-builtin-bitcode" "[[DEVICELIB_DIR:[^"]+(/|\\\\)rocm_resource_dir(/|\\\\)lib(64)?(/|\\\\)amdgcn(/|\\\\).*]]ocml.bc"
 // ROCMDIR-SAME: "-mlink-builtin-bitcode" "[[DEVICELIB_DIR:[^"]+(/|\\\\)rocm(/|\\\\)amdgcn(/|\\\\).*]]ocml.bc"
+// ROCMDIR-THEROCK-SAME: "-mlink-builtin-bitcode" "[[DEVICELIB_DIR:[^"]+(/|\\\\)rocm-therock(/|\\\\)lib(/|\\\\)llvm(/|\\\\)amdgcn(/|\\\\).*]]ocml.bc"
 
 // ALL-SAME: "-mlink-builtin-bitcode" "[[DEVICELIB_DIR]]ockl.bc"
 

@llvmbot
Member

llvmbot commented Dec 4, 2025

@llvm/pr-subscribers-clang

Author: Tsukasa OI (a4lg)

// Find device libraries in a ROCm directory structure
auto &ROCmDirs = getInstallationPathCandidates();
for (const auto &Candidate : ROCmDirs) {
// Legacy: ${ROCM_PATH}/amdgcn/bitcode/*

The true path is $ROCM/lib/llvm/lib/clang/NN/amdgcn/bitcode. I'm not clear on the reason for moving it.

"clang -print-resource-dir" will get you up to the amdgcn directory.
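
As a rough illustration of that suggestion, the resource directory can be queried from the specific Clang binary in use and then checked for an amdgcn subdirectory. This is a sketch under assumptions: resource_amdgcn_dir is a hypothetical helper name, and the lib/amdgcn subpath is taken from the RESDIR pattern in this PR's driver test rather than from any documented guarantee.

```shell
# Sketch only. resource_amdgcn_dir is a hypothetical helper name.
# Takes the clang binary to query (default: clang on PATH) and prints
# <resource-dir>/lib/amdgcn if that directory exists.
# Assumption: device libraries sit under lib/amdgcn in the resource dir.
resource_amdgcn_dir() {
  clang_bin=${1:-clang}
  res_dir=$("$clang_bin" -print-resource-dir) || return 1
  if [ -d "$res_dir/lib/amdgcn" ]; then
    printf '%s\n' "$res_dir/lib/amdgcn"
    return 0
  fi
  return 1
}
```

Passing the full path of the TheRock clang++ as the argument avoids accidentally querying a different Clang that happens to be first on PATH.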

Contributor Author

@a4lg a4lg Dec 4, 2025

@b-summer Wait, in my environment (ROCm 7.1.0), I only find device libraries of the ROCm SDK release at /opt/rocm/amdgcn/bitcode. Am I missing something?

Contributor Author

Okay, this seems to be distribution-dependent.
On rocm/vllm-dev:rocm7.1.1_navi_ubuntu24.04_py3.12_pytorch_2.8_vllm_0.10.2rc1 container with Ubuntu 24.04,
/opt/rocm/amdgcn is a symbolic link to /opt/rocm/lib/llvm/lib/clang/20/lib/amdgcn.

Could you clarify what I actually need to change?
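
The difference between the two environments above comes down to whether amdgcn under the ROCm root is a real directory or a symbolic link. A small sketch for checking this on a given installation; describe_amdgcn_link is a hypothetical helper name, and it assumes a readlink that supports -f (GNU coreutils does).

```shell
# Sketch only. describe_amdgcn_link is a hypothetical helper name.
# Reports whether <root>/amdgcn is a symlink, a real directory, or absent.
# Assumes a readlink that supports -f (GNU coreutils does).
describe_amdgcn_link() {
  path="$1/amdgcn"
  if [ -L "$path" ]; then
    # Print the fully resolved target of the link.
    printf 'symlink -> %s\n' "$(readlink -f "$path")"
  elif [ -d "$path" ]; then
    printf 'directory\n'
  else
    printf 'missing\n'
  fi
}
```

For example, describe_amdgcn_link /opt/rocm reports which of the three cases applies to that root.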

Contributor

You seem to be manually reconstructing the path to the resource directory, which you shouldn't need to do? Is the problem you are scraping the resource directory of a different clang?

We really, really should not have to do this. These should be treated as an integral part of the compiler, taken solely from the resource directory of the current build, and nowhere else. We're going to be stuck handling these cross-build uses for a while, though.

Contributor

If we still have the symlink, I'd probably prefer to just leave it as-is and not look directly into the resource directory.

Labels

backend:AMDGPU, clang:driver, clang

5 participants