Compilation of GPU kernel code generates warnings #732

Open
edgargabriel opened this issue Feb 17, 2023 · 5 comments

@edgargabriel
Contributor

When compiling ucc with support for GPUs, two different compilers might be used: the compiler for the host code (e.g. gcc), and the compiler for the kernel code (e.g. hipcc, nvcc). The two compilers do not necessarily have identical feature sets. At the moment, configure only captures the capabilities and features of the host compiler. This can lead to warnings when compiling the GPU kernel code.

One example is shown below, where hipcc does not recognize the attribute "optimize" that is set in arch/cpu.h (through the UCC_F_NOOPTIMIZE macro):

/usr/bin/hipcc -c ec_rocm_reduce.cu -I/home/egabriel/UCX/include/ -D__HIP_PLATFORM_AMD__ -I/opt/rocm/include/hip -I/opt/rocm/include -I/opt/rocm/include/hsa -I/opt/rocm/include -I/home/egabriel/ucc -I/home/egabriel/ucc -I/home/egabriel/ucc/src -I/home/egabriel/ucc/src -I/home/egabriel/ucc/src/components/ec/rocm -fPIC -o ./.libs/ec_rocm_reduce.o
In file included from ec_rocm_reduce.cu:8:
In file included from /home/egabriel/ucc/src/components/ec/rocm/ec_rocm.h:11:
In file included from /home/egabriel/ucc/src/components/ec/base/ucc_ec_base.h:11:
In file included from /home/egabriel/ucc/src/utils/ucc_component.h:12:
In file included from /home/egabriel/ucc/src/utils/ucc_parser.h:16:
In file included from /home/egabriel/ucc/src/utils/arch/cpu.h:102:
/home/egabriel/ucc/src/utils/arch/x86_64/cpu.h:26:43: warning: unknown attribute 'optimize' ignored [-Wunknown-attributes]
ucc_cpu_model_t  ucc_arch_get_cpu_model() UCC_F_NOOPTIMIZE;
                                          ^~~~~~~~~~~~~~~~
/home/egabriel/ucc/src/utils/ucc_compiler_def.h:53:41: note: expanded from macro 'UCC_F_NOOPTIMIZE'
#define UCC_F_NOOPTIMIZE __attribute__((optimize("O0")))
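
One possible way to avoid the warning without touching configure is to select the attribute per compiler in the header itself. The following is only a minimal sketch under assumptions, not the UCC source: the `EXAMPLE_*` names and `example_get_cpu_model` are hypothetical, and falling back to clang's `optnone` attribute is a swapped-in technique that is similar to, but not the same as, `optimize("O0")`.

```c
/* Hypothetical guard, not taken from ucc_compiler_def.h. */
#ifndef __has_attribute
#  define __has_attribute(x) 0 /* no builtin available: treat every attribute as unsupported */
#endif

#if defined(__clang__) && __has_attribute(optnone)
/* clang-based compilers such as hipcc ignore optimize("O0") but accept optnone */
#  define EXAMPLE_F_NOOPTIMIZE __attribute__((optnone))
#elif defined(__GNUC__) && __has_attribute(optimize)
/* GCC (and nvcc when it uses GCC for host code) understands optimize("O0") */
#  define EXAMPLE_F_NOOPTIMIZE __attribute__((optimize("O0")))
#else
/* unknown compiler: expand to nothing rather than emit a warning */
#  define EXAMPLE_F_NOOPTIMIZE
#endif

/* Declaration in the style of the one that triggered the warning. */
int example_get_cpu_model(void) EXAMPLE_F_NOOPTIMIZE;

int example_get_cpu_model(void)
{
    return 1; /* placeholder value that stands in for real CPU detection */
}
```

With a guard like this, both the host compiler and the kernel compiler would see an attribute they understand (or none at all), so the header can be shared between the two.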

@torehl

torehl commented Jul 6, 2023

Which version of ROCm are you using? I didn't see this with 5.2.3. I configured with:

$ ../configure --prefix=/cm/shared/apps/ucc/1.2.0 --with-avx --with-sse42 --with-ucx=/cm/shared/apps/ucx/1.14.1 --with-cuda=/cm/shared/apps/cuda11.8/toolkit/11.8.0 --with-nccl --with-profiling --with-rocm=/cm/shared/apps/amd/rocm/5.2.3 --with-rccl

@edgargabriel
Contributor Author

It's a very fundamental problem that was also discussed in the UCC developers meetings. I can't recall a ROCm version where I did not see this warning, but the most recent ones I have built against include 5.4.3, 5.5.1, and 5.6.0.

@torehl

torehl commented Jan 22, 2024

I see this with AMD ROCm 5.7.1 and ucc 1.2.0, configured with

../configure --prefix=/cm/shared/apps/ucc/1.2.0 --with-avx2 --with-sse42 --with-ucx --with-cuda=/cm/shared/apps/cuda12.3/toolkit/12.3.2 --with-nccl --with-profiling --with-valgrind --with-avx --with-rocm=/cm/shared/apps/amd/rocm/5.7.1 --with-mpi=/cm/shared/apps/openmpi4-cuda11.8-ofed5-gcc11/4.1.4 --enable-gtest

and here is a snippet of the build output:

/cm/shared/apps/amd/rocm/5.7.1/bin/hipcc -c ../../../../../../src/components/ec/rocm/kernel/ec_rocm_executor_kernel.cu -D__HIP_PLATFORM_AMD__ -I/cm/shared/apps/amd/rocm/5.7.1/include/hip -I/cm/shared/apps/amd/rocm/5.7.1/include -I/cm/shared/apps/amd/rocm/5.7.1/include/hsa -I/cm/shared/apps/amd/rocm/5.7.1/include -I/home/torel/workspace/UCC/ucc-1.2.0/Build-x86_64 -I/home/torel/workspace/UCC/ucc-1.2.0 -I/home/torel/workspace/UCC/ucc-1.2.0/src -I/home/torel/workspace/UCC/ucc-1.2.0/Build-x86_64/src -I/home/torel/workspace/UCC/ucc-1.2.0/src/components/ec/rocm -fPIC -o ./.libs/ec_rocm_executor_kernel.o
In file included from /home/torel/workspace/UCC/ucc-1.2.0/src/utils/arch/rocm_def.h:16,
                 from /home/torel/workspace/UCC/ucc-1.2.0/src/components/ec/rocm/ec_rocm.h:15,
                 from ../../../../../../src/components/ec/rocm/kernel/ec_rocm_executor_kernel.cu:8:
/cm/shared/apps/amd/rocm/5.7.1/include/hip/hip_runtime_api.h:8486:2: error: #error ("Must define exactly one of __HIP_PLATFORM_AMD__ or __HIP_PLATFORM_NVIDIA__");
 8486 | #error("Must define exactly one of __HIP_PLATFORM_AMD__ or __HIP_PLATFORM_NVIDIA__");
      |  ^~~~~
In file included from /home/torel/workspace/UCC/ucc-1.2.0/src/utils/arch/rocm_def.h:16,
                 from /home/torel/workspace/UCC/ucc-1.2.0/src/components/ec/rocm/ec_rocm.h:15,
                 from ../../../../../../src/components/ec/rocm/kernel/ec_rocm_reduce.cu:8:
/cm/shared/apps/amd/rocm/5.7.1/include/hip/hip_runtime_api.h:8486:2: error: #error ("Must define exactly one of __HIP_PLATFORM_AMD__ or __HIP_PLATFORM_NVIDIA__");
 8486 | #error("Must define exactly one of __HIP_PLATFORM_AMD__ or __HIP_PLATFORM_NVIDIA__");
      |  ^~~~~

@torehl

torehl commented Jan 22, 2024

Is there any way to get around this?

@edgargabriel
Contributor Author

I am confused: you are talking about ROCm 5.7.1, but then you set --with-cuda=... (instead of --with-rocm=...) and --with-nccl (instead of --with-rccl=...). cuda/nccl are for NVIDIA GPUs and rocm/rccl are for AMD GPUs; you cannot interchange them.

The output that you get is an actual error (not a warning).
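
The error text itself states the condition being enforced: exactly one of `__HIP_PLATFORM_AMD__` and `__HIP_PLATFORM_NVIDIA__` must be defined. As a rough diagnostic sketch (not the contents of hip_runtime_api.h; the `EXAMPLE_*` names and both functions are hypothetical), a small translation unit like the following could be compiled with the same flags as the failing command to see how many of the two macros end up set:

```c
/* Record which of the two HIP platform macros this translation unit sees. */
#ifdef __HIP_PLATFORM_AMD__
#  define EXAMPLE_AMD_SET 1
#else
#  define EXAMPLE_AMD_SET 0
#endif

#ifdef __HIP_PLATFORM_NVIDIA__
#  define EXAMPLE_NVIDIA_SET 1
#else
#  define EXAMPLE_NVIDIA_SET 0
#endif

/* Number of platform macros defined: 0, 1, or 2. */
int example_hip_platform_count(void)
{
    return EXAMPLE_AMD_SET + EXAMPLE_NVIDIA_SET;
}

/* Non-zero only when exactly one platform macro is defined. */
int example_hip_platform_ok(void)
{
    return example_hip_platform_count() == 1;
}
```

Since the failing command already passes -D__HIP_PLATFORM_AMD__, a count of 2 would suggest that something in the mixed CUDA/ROCm setup also defines the NVIDIA macro, though that is an assumption to verify rather than a confirmed cause.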
