[Clang][AMDGPU] Search TheRock-based device libraries #170590
base: main
TheRock has a slightly different path (relative to ROCM_PATH) for device libraries. This commit adds a search path for device libraries on a TheRock-based distribution.
Signed-off-by: Tsukasa OI <floss_llm@irq.a4lg.com>
Background
This was originally submitted as ROCm#739 to the
AMD is experimenting with TheRock, a new HIP/ROCm build system and distribution. With TheRock, we can install the full ROCm SDK as Python packages (and multiple versions can coexist through
To fully utilize a TheRock-based toolchain, we normally set the environment variable
One of the problems is: some programs (such as vLLM) don't use
Clang does not search the latter by default, which causes a compiler error and a failure to configure the ROCm toolchain in such Clang-dependent programs. Note that this error does not occur if
The workaround below works but is inconvenient, so I hope this issue is resolved in either:
Workaround (without this PR)
With the current state, we can avoid device library issues by:
export HIP_DEVICE_LIB_PATH=$(rocm-sdk path --root)/lib/llvm/amdgcn/bitcode
@llvm/pr-subscribers-clang-driver @llvm/pr-subscribers-backend-amdgpu
Author: Tsukasa OI (a4lg)
Changes
This PR adds the device library path of TheRock, AMD's new ROCm distribution, to Clang.
Full diff: https://github.com/llvm/llvm-project/pull/170590.diff
5 Files Affected:
diff --git a/clang/lib/Driver/ToolChains/AMDGPU.cpp b/clang/lib/Driver/ToolChains/AMDGPU.cpp
index 87ccd40372681..69ada73342127 100644
--- a/clang/lib/Driver/ToolChains/AMDGPU.cpp
+++ b/clang/lib/Driver/ToolChains/AMDGPU.cpp
@@ -436,15 +436,21 @@ void RocmInstallationDetector::detectDeviceLibrary() {
if (HasDeviceLibrary)
return;
- // Find device libraries in a legacy ROCm directory structure
- // ${ROCM_ROOT}/amdgcn/bitcode/*
+ // Find device libraries in a ROCm directory structure
auto &ROCmDirs = getInstallationPathCandidates();
for (const auto &Candidate : ROCmDirs) {
+ // Legacy: ${ROCM_PATH}/amdgcn/bitcode/*
LibDevicePath = Candidate.Path;
llvm::sys::path::append(LibDevicePath, "amdgcn", "bitcode");
HasDeviceLibrary = CheckDeviceLib(LibDevicePath, Candidate.StrictChecking);
if (HasDeviceLibrary)
return;
+ // TheRock: ${ROCM_PATH}/lib/llvm/amdgcn/bitcode/*
+ LibDevicePath = Candidate.Path;
+ llvm::sys::path::append(LibDevicePath, "lib", "llvm", "amdgcn", "bitcode");
+ HasDeviceLibrary = CheckDeviceLib(LibDevicePath, Candidate.StrictChecking);
+ if (HasDeviceLibrary)
+ return;
}
}
diff --git a/clang/test/Driver/Inputs/rocm-therock/include b/clang/test/Driver/Inputs/rocm-therock/include
new file mode 120000
index 0000000000000..13265e5ed3db8
--- /dev/null
+++ b/clang/test/Driver/Inputs/rocm-therock/include
@@ -0,0 +1 @@
+../rocm/include
\ No newline at end of file
diff --git a/clang/test/Driver/Inputs/rocm-therock/lib/llvm/amdgcn b/clang/test/Driver/Inputs/rocm-therock/lib/llvm/amdgcn
new file mode 120000
index 0000000000000..79d18ba840474
--- /dev/null
+++ b/clang/test/Driver/Inputs/rocm-therock/lib/llvm/amdgcn
@@ -0,0 +1 @@
+../../../rocm/amdgcn
\ No newline at end of file
diff --git a/clang/test/Driver/Inputs/rocm-therock/share/hip/version b/clang/test/Driver/Inputs/rocm-therock/share/hip/version
new file mode 120000
index 0000000000000..62ff49a023cb9
--- /dev/null
+++ b/clang/test/Driver/Inputs/rocm-therock/share/hip/version
@@ -0,0 +1 @@
+../../../rocm/bin/.hipVersion
\ No newline at end of file
diff --git a/clang/test/Driver/hip-device-libs.hip b/clang/test/Driver/hip-device-libs.hip
index effce40d67ebd..f5813c06ae600 100644
--- a/clang/test/Driver/hip-device-libs.hip
+++ b/clang/test/Driver/hip-device-libs.hip
@@ -9,7 +9,7 @@
// RUN: 2>&1 | FileCheck %s --check-prefixes=ALL,FLUSHD,ROCMDIR
-// Test subtarget with flushing off by ddefault.
+// Test subtarget with flushing off by default.
// RUN: %clang -### --target=x86_64-linux-gnu \
// RUN: --cuda-gpu-arch=gfx900 \
// RUN: --rocm-path=%S/Inputs/rocm \
@@ -85,6 +85,13 @@
// RUN: %S/Inputs/hip_multiple_inputs/b.hip \
// RUN: 2>&1 | FileCheck %s --check-prefixes=ALL,FLUSHD,ROCMDIR
+// Test TheRock toolchain layout
+// RUN: %clang -### --target=x86_64-linux-gnu \
+// RUN: --offload-arch=gfx803 -nogpuinc \
+// RUN: --rocm-path=%S/Inputs/rocm-therock \
+// RUN: %S/Inputs/hip_multiple_inputs/b.hip \
+// RUN: 2>&1 | FileCheck %s --check-prefixes=ALL,FLUSHD,ROCMDIR-THEROCK
+
// Test finding device lib in resource dir
// RUN: %clang -### --target=x86_64-linux-gnu \
// RUN: --offload-arch=gfx803 -nogpuinc \
@@ -210,6 +217,7 @@
// RESDIR-SAME: "-mlink-builtin-bitcode" "[[DEVICELIB_DIR:[^"]+(/|\\\\)rocm_resource_dir(/|\\\\)lib(64)?(/|\\\\)amdgcn(/|\\\\).*]]ocml.bc"
// ROCMDIR-SAME: "-mlink-builtin-bitcode" "[[DEVICELIB_DIR:[^"]+(/|\\\\)rocm(/|\\\\)amdgcn(/|\\\\).*]]ocml.bc"
+// ROCMDIR-THEROCK-SAME: "-mlink-builtin-bitcode" "[[DEVICELIB_DIR:[^"]+(/|\\\\)rocm-therock(/|\\\\)lib(/|\\\\)llvm(/|\\\\)amdgcn(/|\\\\).*]]ocml.bc"
// ALL-SAME: "-mlink-builtin-bitcode" "[[DEVICELIB_DIR]]ockl.bc"
// Find device libraries in a ROCm directory structure
auto &ROCmDirs = getInstallationPathCandidates();
for (const auto &Candidate : ROCmDirs) {
// Legacy: ${ROCM_PATH}/amdgcn/bitcode/*
The true path is $ROCM/lib/llvm/lib/clang/NN/amdgcn/bitcode. I'm not clear on the reason for moving it.
"clang -print-resource-dir" will get you up to the amdgcn directory.
@b-summer Wait, in my environment (ROCm 7.1.0), I only find device libraries of the ROCm SDK release at /opt/rocm/amdgcn/bitcode. Am I missing something?
Okay, this seems to be distribution-dependent.
On the rocm/vllm-dev:rocm7.1.1_navi_ubuntu24.04_py3.12_pytorch_2.8_vllm_0.10.2rc1 container with Ubuntu 24.04,
/opt/rocm/amdgcn is a symbolic link to /opt/rocm/lib/llvm/lib/clang/20/lib/amdgcn.
Could you tell me what I really need to change?
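To compare installations, the symlink observation above can be checked with a few lines of standard C++. This is only a sketch: describeAmdgcnDir is a hypothetical helper, not anything in Clang or ROCm, and it just reports whether a path such as /opt/rocm/amdgcn is a symlink (and to what) or a plain directory.

```cpp
#include <filesystem>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper (not part of Clang or ROCm): describes what a path such
// as /opt/rocm/amdgcn is on the current system. Returns an empty string if the
// path does not exist.
std::string describeAmdgcnDir(const fs::path &Path) {
  std::error_code EC;
  // is_symlink inspects the link itself rather than following it.
  if (fs::is_symlink(Path, EC))
    return "symlink -> " + fs::read_symlink(Path, EC).string();
  if (fs::is_directory(Path, EC))
    return "directory: " + Path.string();
  return "";
}
```

Running it against /opt/rocm/amdgcn on each distribution would show whether the legacy location is a real directory or a link into the compiler's resource directory.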
You seem to be manually reconstructing the path to the resource directory, which you shouldn't need to do. Is the problem that you are scraping the resource directory of a different clang?
We really, really should not have to do this. These should be treated as an integral part of the compiler, taken solely from the resource directory of the current build, and nowhere else. We're going to be stuck handling these cross-build uses for a while, though.
If we still have the symlink, I'd probably prefer to just leave it as-is and not look directly into the resource directory.