[ROCm] Custom gfx1100 kernel sample fails to build (clang-offload-bundler not found) #16899

kuhar · 2024-03-26T03:24:56Z

Error:

➜ ninja all iree-test-deps && ctest -j32 --label-exclude '^driver=cuda|metal' --output-on-failure 
[0/2] Re-checking globbed directories...
[53/53] Generating kernels_gfx1100.co
FAILED: samples/custom_dispatch/hip/kernels/kernels_gfx1100.co /home/jakub/iree/build/relass/samples/custom_dispatch/hip/kernels/kernels_gfx1100.co 
cd /home/jakub/iree/build/relass/samples/custom_dispatch/hip/kernels && /home/jakub/iree/build/relass/llvm-project/bin/clang-18 -x hip --offload-device-only --offload-arch=gfx1100 --rocm-path=/opt/rocm -fuse-cuid=none -O3 /home/jakub/iree/iree/samples/custom_dispatch/hip/kernels/kernels.cu -o /home/jakub/iree/build/relass/samples/custom_dispatch/hip/kernels/kernels_gfx1100.co
clang-18: error: unable to execute command: Executable "clang-offload-bundler" doesn't exist!
clang-18: error: amdgcn-link command failed with exit code 1 (use -v to see invocation)

My rocm installation is under /opt/rocm, the version is 5.7.1.

benvanik · 2024-03-26T03:45:51Z

Is it a complete install? My Windows SDK has it.

kuhar · 2024-03-26T03:48:38Z

Yes, I even used the cursed amdgpu-pro installer.

➜ ls /opt/rocm/bin 
amdclang     amdclang-cpp  hipcc      hipcc_cmake_linker_helper  hipconfig.pl               hipdemangleatp      hipfc         hipvars.pm    roc-obj-extract  rocm_agent_enumerator
amdclang++   amdflang      hipcc.bin  hipconfig                  hipconvertinplace-perl.sh  hipexamine-perl.sh  hipify-clang  offload-arch  roc-obj-ls       rocminfo
amdclang-cl  amdlld        hipcc.pl   hipconfig.bin              hipconvertinplace.sh       hipexamine.sh       hipify-perl   roc-obj       rocm-smi

raikonenfnu · 2024-03-26T18:12:08Z

I think @sogartar faced something similar?
Can you try my build script at https://gist.github.com/raikonenfnu/7d2843107929b161b12e56c057e8735d to see if the issue persists?

kuhar · 2024-03-26T18:37:04Z

@raikonenfnu can you first confirm where the clang-offload-bundler binary should be? Do you have it under /opt/rocm like Ben or installed system-wide?

kuhar · 2024-03-26T18:37:26Z

We may need to check for this during the cmake configuration step.
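As a rough illustration of what such a check would have to do, here is a minimal shell sketch that looks for a tool on PATH first and then under a ROCm root. The function name find_rocm_tool is hypothetical and not part of IREE, and the two subdirectories it searches (llvm/bin and bin) are an assumption based on the locations mentioned in this thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of IREE): print the full path of a ROCm LLVM
# tool, checking PATH first and then assumed subdirectories of a ROCm root.
find_rocm_tool() {
  tool="$1"
  rocm_root="${2:-/opt/rocm}"

  # Already reachable through PATH: report where it resolves.
  if found=$(command -v "$tool" 2>/dev/null); then
    printf '%s\n' "$found"
    return 0
  fi

  # Otherwise look in <root>/llvm/bin, then in <root>/bin.
  for dir in "$rocm_root/llvm/bin" "$rocm_root/bin"; do
    if [ -x "$dir/$tool" ]; then
      printf '%s\n' "$dir/$tool"
      return 0
    fi
  done

  echo "error: $tool not found on PATH or under $rocm_root" >&2
  return 1
}
```

For example, find_rocm_tool clang-offload-bundler /opt/rocm prints the resolved path or fails with a message, which is roughly the behavior a configure-time check could mirror.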

raikonenfnu · 2024-03-26T18:58:23Z

I only have it in /opt/rocm/llvm/bin/, not system-wide. IIRC the clang commands to generate the bitcode should not need clang-offload-bundler at all.

I also do not have clang-offload-bundler in my env and was able to compile.

raikonenfnu · 2024-03-26T19:02:32Z

Oh wait, you are talking about the macrokernel, not the microkernel, so my previous assumptions/comments might not be correct here. The previous comments were more about the microkernel. I need to check a bit more about the samples macrokernel.

I think it may be the --rocm-path option? I was able to compile hsaco/co with https://github.com/raikonenfnu/macroHipKernel/blob/main/generate_hsaco.sh#L2-L4

Perhaps missing a nogpulib option?

raikonenfnu · 2024-03-26T19:15:32Z

@kuhar Was able to repro your issue on my system as well. But if I specify export IREE_ROCM_PATH=/opt/rocm, then my error would be:

(EDIT: Deleted log from using -nogpulib)

(EDIT: this one actually works if we point to where clang-offload-bundler lives, which is /opt/rocm/llvm/bin)
It seems that if we append the ROCm LLVM path, it compiles OK:

PATH=$PATH:/opt/rocm/llvm/bin /home/stanley/nod/iree-build-notrace/llvm-project/bin/clang-19 -x hip --offload-device-only --offload-arch=gfx1100 --rocm-path=/opt/rocm -fuse-cuid=none -O3 /home/stanley/nod/iree/samples/custom_dispatch/hip/kernels/kernels.cu -o /home/stanley/nod/iree-build-notrace/samples/custom_dispatch/hip/kernels/kernels_gfx1100.co

kuhar · 2024-03-27T04:56:04Z

Thanks, with export PATH="$PATH:/opt/rocm/llvm/bin" set, it makes more progress and then errors out with:

➜ ninja all                                                                                                               
[0/2] Re-checking globbed directories...
[57/332] Generating rocm_executable_cache_test.bin from executable_cache_test.mlir
FAILED: runtime/plugins/hal/drivers/rocm/cts/rocm_executable_cache_test.bin /home/jakub/iree/build/relass/runtime/plugins/hal/drivers/rocm/cts/rocm_executable_cache_test.bin 
cd /home/jakub/iree/build/relass/runtime/plugins/hal/drivers/rocm/cts && /home/jakub/iree/build/relass/tools/iree-compile --output-format=vm-bytecode --mlir-print-op-on-diagnostic=false --compile-mode=hal-executable --iree-hal-target-backends=rocm --iree-rocm-target-chip=gfx908 /home/jakub/iree/iree/runtime/src/iree/hal/cts/testdata/executable_cache_test.mlir -o rocm_executable_cache_test.bin --iree-hal-executable-object-search-path=\"/home/jakub/iree/build/relass\"
/home/jakub/iree/iree/runtime/src/iree/hal/cts/testdata/executable_cache_test.mlir:15:1: error: cannot find ROCM bitcode files. Check your installation consistency and in the worst case, set --iree-rocm-bc-dir= to a path on your system.
hal.executable.source public @executable {
^
/home/jakub/iree/iree/runtime/src/iree/hal/cts/testdata/executable_cache_test.mlir:15:1: error: failed to serialize executable for target backend rocm
hal.executable.source public @executable {
^
/home/jakub/iree/iree/runtime/src/iree/hal/cts/testdata/executable_cache_test.mlir:15:1: error: failed to serialize executables
hal.executable.source public @executable {
^
[58/332] Generating rocm_command_buffer_dispatch_test.bin from command_buffer_dispatch_test.mlir

I both set IREE_ROCM_PATH as a CMake variable and exported it as an env var. What am I missing @raikonenfnu?

Separately from solving this, why do we even build this test data in the all target? I'd assume it should only be a dependency for iree-test-deps, no?

kuhar · 2024-03-28T20:42:22Z

OK, it does work after switching from the ROCm installation from the amdgpu-pro installer to https://github.com/nod-ai/TheRock/releases/tag/nightly-staging-20240328.41 , setting -DIREE_ROCM_PATH, and doing a clean build.

kuhar · 2024-03-28T20:43:19Z

The last remaining issue is the following error:

➜  ninja iree-test-deps       
[0/2] Re-checking globbed directories...
[1266/1266] Generating kernels_gfx1100.co
FAILED: samples/custom_dispatch/hip/kernels/kernels_gfx1100.co /home/jakub/iree/build/relass/samples/custom_dispatch/hip/kernels/kernels_gfx1100.co 
cd /home/jakub/iree/build/relass/samples/custom_dispatch/hip/kernels && /home/jakub/iree/build/relass/llvm-project/bin/clang-19 -x hip --offload-device-only --offload-arch=gfx1100 --rocm-path=/home/jakub/bin/therock -fuse-cuid=none -O3 /home/jakub/iree/iree/samples/custom_dispatch/hip/kernels/kernels.cu -o /home/jakub/iree/build/relass/samples/custom_dispatch/hip/kernels/kernels_gfx1100.co
In file included from /home/jakub/iree/iree/samples/custom_dispatch/hip/kernels/kernels.cu:7:
In file included from /home/jakub/bin/therock/include/hip/hip_runtime.h:62:
In file included from /home/jakub/bin/therock/include/hip/amd_detail/amd_hip_runtime.h:432:
/home/jakub/iree/build/relass/llvm-project/lib/clang/19/include/__clang_cuda_complex_builtins.h:194:27: error: use of undeclared identifier 'max'; did you mean 'fmax'?
  194 |   double __logbw = _LOGBd(_fmaxd(_ABSd(__c), _ABSd(__d)));
      |                           ^
/home/jakub/iree/build/relass/llvm-project/lib/clang/19/include/__clang_cuda_complex_builtins.h:45:16: note: expanded from macro '_fmaxd'
   45 | #define _fmaxd max
      |                ^
/home/jakub/iree/build/relass/llvm-project/lib/clang/19/include/__clang_cuda_math_forward_declares.h:73:19: note: 'fmax' declared here
   73 | __DEVICE__ double fmax(double, double);
      |                   ^
In file included from /home/jakub/iree/iree/samples/custom_dispatch/hip/kernels/kernels.cu:7:
In file included from /home/jakub/bin/therock/include/hip/hip_runtime.h:62:
In file included from /home/jakub/bin/therock/include/hip/amd_detail/amd_hip_runtime.h:432:
/home/jakub/iree/build/relass/llvm-project/lib/clang/19/include/__clang_cuda_complex_builtins.h:227:26: error: use of undeclared identifier 'max'; did you mean 'fmax'?
  227 |   float __logbw = _LOGBf(_fmaxf(_ABSf(__c), _ABSf(__d)));
      |                          ^
/home/jakub/iree/build/relass/llvm-project/lib/clang/19/include/__clang_cuda_complex_builtins.h:46:16: note: expanded from macro '_fmaxf'
   46 | #define _fmaxf max
      |                ^
/home/jakub/iree/build/relass/llvm-project/lib/clang/19/include/__clang_cuda_math_forward_declares.h:74:18: note: 'fmax' declared here
   74 | __DEVICE__ float fmax(float, float);
      |                  ^
2 errors generated when compiling for gfx1100.
ninja: build stopped: subcommand failed

kuhar · 2024-03-28T20:45:24Z

@raikonenfnu @antiagainst should we disable these rocm kernels and make them experimental? They don't seem to work out of the box on a typical linux installation but are included in the main ninja targets all (sic!) and iree-test-deps.

kuhar · 2024-04-11T21:23:02Z

Ping. This still doesn't build for me. After manually patching the cuda kernel, I'm hitting an issue with another tool missing from path:

➜  ninja all iree-test-deps          
[0/2] Re-checking globbed directories...
[638/2136] Generating kernels_gfx1100.co
FAILED: samples/custom_dispatch/hip/kernels/kernels_gfx1100.co /home/jakub/iree/build/relass/samples/custom_dispatch/hip/kernels/kernels_gfx1100.co 
cd /home/jakub/iree/build/relass/samples/custom_dispatch/hip/kernels && /home/jakub/iree/build/relass/llvm-project/bin/clang-19 -x hip --offload-device-only --offload-arch=gfx1100 --rocm-path=/home/jakub/bin/therock -fuse-cuid=none -O3 /home/jakub/iree/iree/samples/custom_dispatch/hip/kernels/kernels.cu -o /home/jakub/iree/build/relass/samples/custom_dispatch/hip/kernels/kernels_gfx1100.co
/home/jakub/bin/therock/bin/clang-offload-bundler: error: unable to find 'llvm-objcopy' in path
clang-19: error: amdgcn-link command failed with exit code 1 (use -v to see invocation)
[641/2136] Building CXX object tracy/CMakeFiles/IREETracyProfiler.dir/__/__/__/third_party/tracy/profiler/src/main.cpp.o
ninja: build stopped: subcommand failed.

Seems like this needs a very specific system-wide installation.
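
To see at a glance which of these tools are unreachable before starting a long build, a small preflight along these lines may help. This is only a sketch: missing_tools is a hypothetical helper, and the tool list in the usage note covers just the two binaries that failed in this thread, not everything the sample might need.

```shell
#!/bin/sh
# Hypothetical preflight (not part of IREE): print the names of the given
# tools that cannot be resolved through PATH, separated by spaces.
# Returns 0 when every tool is found and 1 otherwise.
missing_tools() {
  missing=""
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
  done
  # Drop the leading space left by the accumulation above.
  printf '%s\n' "${missing# }"
  [ -z "$missing" ]
}
```

Running missing_tools clang-offload-bundler llvm-objcopy prints whichever of the two is not on PATH, so the right directory can be added to PATH before invoking ninja.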

kuhar added the codegen/rocm ROCm code generation compiler backend label Mar 26, 2024

kuhar assigned raikonenfnu Mar 26, 2024
