Building with ROCm/HIP fails on a system without GPU #969

Open
lahwaacz opened this issue Apr 28, 2024 · 5 comments

@lahwaacz

The cuda_lt.sh script contains a --offload-arch=native flag for amdclang:

ucc/cuda_lt.sh

Line 31 in c1734db

cmd="${@:3:2} -x hip -target x86_64-unknown-linux-gnu --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx940 --offload-arch=gfx941 --offload-arch=gfx942 --offload-arch=gfx1030 --offload-arch=gfx1100 --offload-arch=gfx1101 --offload-arch=gfx1102 --offload-arch=native ${@:5} -fPIC -O3 -o ${pic_filepath}"

This should select the native architecture of the GPU present in the build system. However, if the build system does not have any GPU, the command fails:

$ /opt/rocm/lib/llvm/bin/amdclang -c -x hip -target x86_64-unknown-linux-gnu --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx940 --offload-arch=gfx941 --offload-arch=gfx942 --offload-arch=gfx1030 --offload-arch=gfx1100 --offload-arch=gfx1101 --offload-arch=gfx1102 --offload-arch=native ec_rocm_executor_kernel.cu -I/usr/include/ -D__HIP_PLATFORM_AMD__ -I/opt/rocm/include/hip -I/opt/rocm/include -I/opt/rocm/llvm/include -I/opt/rocm/include/hsa -I/opt/rocm/include -I/build/openucc/src/ucc-1.3.0 -I/build/openucc/src/ucc-1.3.0 -I/build/openucc/src/ucc-1.3.0/src -I/build/openucc/src/ucc-1.3.0/src -I/build/openucc/src/ucc-1.3.0/src/components/ec/rocm -fPIC -O3 -o ./.libs/ec_rocm_executor_kernel.o
/opt/rocm/lib/llvm/bin/amdclang -c -x hip -target x86_64-unknown-linux-gnu --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx940 --offload-arch=gfx941 --offload-arch=gfx942 --offload-arch=gfx1030 --offload-arch=gfx1100 --offload-arch=gfx1101 --offload-arch=gfx1102 --offload-arch=native ec_rocm_reduce.cu -I/usr/include/ -D__HIP_PLATFORM_AMD__ -I/opt/rocm/include/hip -I/opt/rocm/include -I/opt/rocm/llvm/include -I/opt/rocm/include/hsa -I/opt/rocm/include -I/build/openucc/src/ucc-1.3.0 -I/build/openucc/src/ucc-1.3.0 -I/build/openucc/src/ucc-1.3.0/src -I/build/openucc/src/ucc-1.3.0/src -I/build/openucc/src/ucc-1.3.0/src/components/ec/rocm -fPIC -O3 -o ./.libs/ec_rocm_reduce.o
clang: error: cannot determine amdgcn architecture: /opt/rocm/lib/llvm/bin/amdgpu-arch: ; consider passing it via '--offload-arch'
clang: error: cannot determine amdgcn architecture: /opt/rocm/lib/llvm/bin/amdgpu-arch: ; consider passing it via '--offload-arch'
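
One way to avoid the failure would be to add `--offload-arch=native` only when a GPU is actually detected. The following is a minimal sketch, not the project's fix: `native_arch_flag` and the `AMDGPU_ARCH_BIN` override are hypothetical names, and the default probe path is simply the one shown in the error message above. It assumes the probe prints nothing (or exits non-zero) when no GPU is visible, which is what the empty output in the error suggests.

```shell
#!/bin/bash
# Hypothetical helper: print --offload-arch=native only if the probe reports a GPU.
# AMDGPU_ARCH_BIN is an assumed override variable, not something cuda_lt.sh defines.
native_arch_flag() {
    local probe="${AMDGPU_ARCH_BIN:-/opt/rocm/lib/llvm/bin/amdgpu-arch}"
    local detected
    # Treat a failing or missing probe the same as "no GPU found".
    detected="$("$probe" 2>/dev/null)" || detected=""
    if [ -n "$detected" ]; then
        printf '%s' "--offload-arch=native"
    fi
}
```

The result could then be spliced into the compile command in place of the hard-coded flag, so that a GPU-less build host falls back to the explicit gfx list only.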
@edgargabriel
Contributor

edgargabriel commented Apr 29, 2024

@lahwaacz Thank you for the bug report; we will look into this. The UCC CI runs through exactly the same scenario (i.e. compiling UCC with the ROCm stack installed but without an AMD GPU available). However, because of an issue we faced with clang-tidy versions newer than clang-12, we pinned the ROCm version in the UCC CI to ROCm 5.7.1, which still uses hipcc to compile the kernels instead of the new clang --offload-arch=... approach that we use with ROCm > 6.0. That is why the CI did not catch this failure.

For now, I think your best options are either to compile on a platform with an AMD GPU present, or change the cuda_lt.sh file and remove the --offload-arch=native argument.
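
The second workaround (removing the flag) can be sketched with `sed`. This is only an illustration: `strip_native_arch` is a hypothetical helper name, and the in-place variant in the comment assumes GNU `sed` and that the flag appears in `cuda_lt.sh` exactly as quoted in the report.

```shell
#!/bin/bash
# Hypothetical filter: drop every " --offload-arch=native" from the text on stdin.
strip_native_arch() {
    sed 's/ --offload-arch=native//g'
}

# Applied to the script itself (GNU sed, run from the UCC source root), this would be:
#   sed -i 's/ --offload-arch=native//' cuda_lt.sh
```

The explicit `--offload-arch=gfx...` entries are left untouched, so the kernels are still built for every architecture in the hard-coded list.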

@lahwaacz
Author

@edgargabriel Thanks, I've patched it for the Arch Linux package: https://gitlab.archlinux.org/archlinux/packaging/packages/openucc/-/commit/f5618b46d08fa2c41f218366871d17133145cde9#9b9baac1eb9b72790eef5540a1685306fc43fd6c_50_42

@romintomasetti

Hi @edgargabriel !

We encountered the same issue (cannot determine amdgcn architecture) because we build UCC inside a Docker build step, where devices such as GPUs are not exposed.

It would be nice if we could provide, at the configuration step, the list of architectures that we want to compile for. Right now cuda_lt.sh lists many ROCm architectures, but we only need a few of them (not to mention those that are not listed). An option like --offload-arch=A,B,C would be welcome. For backward compatibility, it could default to what is already in cuda_lt.sh when not provided. The same applies to the enabled CUDA architectures: being able to pass only a chosen subset when compiling UCC would help us reduce compile time and binary size.

Note that we circumvented the problem by patching cuda_lt.sh to remove the --offload-arch=native flag.
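
The requested behaviour could look roughly like the sketch below, which turns a comma-separated list into repeated `--offload-arch=` flags and falls back to a default list when none is given. Everything here is an assumption for illustration: `offload_arch_flags` and `DEFAULT_ROCM_ARCHS` are hypothetical names, no such configure option exists in UCC as of this thread, and the default simply mirrors the list quoted from cuda_lt.sh above, minus `native`.

```shell
#!/bin/bash
# Assumed default: the architectures currently hard-coded in cuda_lt.sh, without "native".
DEFAULT_ROCM_ARCHS="gfx908,gfx90a,gfx940,gfx941,gfx942,gfx1030,gfx1100,gfx1101,gfx1102"

# Hypothetical helper: expand "A,B,C" into "--offload-arch=A --offload-arch=B --offload-arch=C".
offload_arch_flags() {
    local list="${1:-$DEFAULT_ROCM_ARCHS}"
    local flags="" arch
    local IFS=','            # split the list on commas inside this function only
    for arch in $list; do
        flags="${flags:+$flags }--offload-arch=$arch"
    done
    printf '%s\n' "$flags"
}
```

For example, `offload_arch_flags gfx90a,gfx942` would produce only the two flags needed for those targets, which is the compile-time and size saving the request is after.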

@edgargabriel
Contributor

@romintomasetti Thank you, it is on our list, and we definitely plan to have it fixed for the next release. I think the fix is not entirely trivial, since cuda_lt.sh is used by both the CUDA and ROCm components, so it might require a deeper rework of that part of the code.

@romintomasetti

Also, please note that there is already the option --with-nvcc-gencode for CUDA:

ucc/config/m4/cuda.m4

Lines 109 to 118 in 1522ccf

AS_IF([test "x$cuda_happy" = "xyes"],
      [AS_IF([test "x$with_nvcc_gencode" = "xdefault"],
             [AS_IF([test $CUDA_MAJOR_VERSION -eq 12],
                    [NVCC_ARCH="${ARCH7} ${ARCH8} ${ARCH9} ${ARCH10} ${ARCH110} ${ARCH111} ${ARCH120}"],
                    [AS_IF([test $CUDA_MAJOR_VERSION -eq 11],
                           [AS_IF([test $CUDA_MINOR_VERSION -lt 1],
                                  [NVCC_ARCH="${ARCH7} ${ARCH8} ${ARCH9} ${ARCH10} ${ARCH110}"],
                                  [NVCC_ARCH="${ARCH7} ${ARCH8} ${ARCH9} ${ARCH10} ${ARCH110} ${ARCH111}"])])])],
             [NVCC_ARCH="$with_nvcc_gencode"])
       AC_SUBST([NVCC_ARCH], ["$NVCC_ARCH"])])
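
Since the snippet above assigns the option value to `NVCC_ARCH` verbatim, the value is presumably a string of nvcc `-gencode` flags. Under that assumption, a small helper can build it from a list of compute capabilities. `gencode_flags` is a hypothetical name, and the configure invocation in the comment is a sketch rather than a verified command line.

```shell
#!/bin/bash
# Hypothetical helper: turn compute capabilities (e.g. 80 90) into nvcc -gencode flags.
gencode_flags() {
    local flags="" cc
    for cc in "$@"; do
        flags="${flags:+$flags }-gencode=arch=compute_${cc},code=sm_${cc}"
    done
    printf '%s\n' "$flags"
}

# Possible use at configure time (assumes the value is passed through to nvcc unchanged):
#   ./configure --with-nvcc-gencode="$(gencode_flags 80 90)"
```

Restricting the list this way would build the CUDA kernels only for the chosen targets instead of the version-dependent default set.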
