Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rocm conflicts with compilers that do not support rocm #43638

Open
wants to merge 13 commits into
base: develop
Choose a base branch
from

Conversation

becker33
Copy link
Member

In #41375 we introduced a bug that rocm packages using CachedCMakePackage cannot build with non-hipcc compiler executables. Reading the discussion there and in discussions with @white238, it seems that the line causing the bug was added to address the problem that most compilers do not support the hip language.

In this PR, we introduce a conflict between the +rocm variant in RocmPackage and each of the compilers that does not support the hip language. That allows us to remove the line causing the problems without reintroducing the bug that it fixed.

@adrienbernede does this address the concerns you had with mixing compilers in #41375 sufficiently?

@spackbot-app spackbot-app bot added build-systems conflicts core PR affects Spack core functionality labels Apr 12, 2024
@pranav-sivaraman
Copy link
Contributor

Need to add clang and aocc to the list of supported compilers for ROCm, based on the other compilers supported in Spack.

@becker33
Copy link
Member Author

@pranav-sivaraman my understanding is that aocc is a cpu-only compiler and cannot compile hip code.

@spackbot-app spackbot-app bot added the gitlab Issues related to gitlab integration label Apr 16, 2024
@pranav-sivaraman
Copy link
Contributor

@becker33 I guess its implied from the AMD compiler disambiguation reference. But I can confirm that I can compile HIP applications with aocc either via CMAKE_HIP_COMPILER=clang++ or just doing clang -x hip --hip-link myapp.cpp.

tgamblin
tgamblin previously approved these changes Apr 17, 2024
@tgamblin tgamblin enabled auto-merge (squash) April 17, 2024 05:26
tgamblin
tgamblin previously approved these changes Apr 17, 2024
tgamblin
tgamblin previously approved these changes Apr 17, 2024
Comment on lines +148 to +153
compilers_supporting_rocm = ("cce", "rocmcc", "clang", "aocc")
conflicts("amdgpu_target=none", when="+rocm")
# If this variable shadows a property, it overrides it
for cmp_name in spack.compilers.supported_compilers():
if cmp_name not in compilers_supporting_rocm:
conflicts(f"%{cmp_name}", when="+rocm")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would be a mistake to enforce for all ROCmPackages. This breaks projects that use CMake's enable_language(HIP) properly, i.e. use e.g. gcc for host compilation and HIP for device code; exactly like for CUDA. If there are some projects that require using hipcc or similar to compile the whole project, the conflicts could be in those packages instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is currently not a single package in Spack using CMAKE_HIP_COMPILER to properly separate cxx from hip compilation. I 100% agree that we need to support this better in the future, but currently my understanding is all of these packages are instead lying to Spack about which compiler is used and then just using the hip compiler for all compilation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goal here is to do something quickly that gets our CI back to correctly running, and then we can refine it later. But if my characterization of the state of the packages is incorrect we can try to do something more clever now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DLA-Future is one of the packages that uses CMAKE_HIP_COMPILER/enable_language(HIP): https://github.com/spack/spack/blob/develop/var/spack/repos/builtin/packages/dla-future/package.py. It's not visible explicitly in the package, but with HIP in the build environment CMake finds it and sets CMAKE_HIP_COMPILER correctly. pika used to set CMAKE_CXX_COMPILER to hipcc, but no longer does it and now also relies on CMAKE_HIP_COMPILER being set:

# HIP support requires compiling with hipcc for < 0.8.0
if spec.satisfies("@:0.7 +rocm"):
args.append(self.define("CMAKE_CXX_COMPILER", spec["hip"].hipcc))
if spec.satisfies("^cmake@3.21.0:3.21.2"):
args.append(self.define("__skip_rocmclang", True))
if spec.satisfies("@0.8: +rocm"):
rocm_archs = spec.variants["amdgpu_target"].value
if "none" not in rocm_archs:
rocm_archs = ";".join(rocm_archs)
args.append(self.define("CMAKE_HIP_ARCHITECTURES", rocm_archs))
.

I understand the need to get the CI working as soon as possible. If this goes in, how temporary of a solution do you think this will be? I can certainly patch this change out for myself if/when it gets merged, but that feels a bit backwards.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess part of the issue here is that only CachedCMakePackages have had the implicit CMAKE_CXX_COMPILER=hipcc so far and those are the packages that may have been relying on it inadvertently. Regular CMakePackages have had to do it correctly, or have a local fix. Putting these conflicts into ROCmPackage makes it affect also the CMakePackages that were doing it correctly in the first place.

Could the added conflicts live in CachedCMakePackage instead, and replace

        # We defined hipcc as top-level compiler for packages when +rocm.
        # This avoid problems coming from rocm flags being applied to another compiler.
        if "+rocm" in spec:
            entries.insert(0, cmake_cache_path("CMAKE_CXX_COMPILER", self.spec["hip"].hipcc))

? It's still semantically the wrong place, but it's as localized as the previous fix was and won't affect regular CMakePackages?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@msimberg we're going to change how we approach this, thanks for bringing this up.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thanks @becker33!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
build-systems conflicts core PR affects Spack core functionality gitlab Issues related to gitlab integration update-package
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants