-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rocm conflicts with compilers that do not support rocm #43638
base: develop
Are you sure you want to change the base?
Conversation
a5c2b0d
to
0b1f74b
Compare
Need to add |
@pranav-sivaraman my understanding is that |
compilers_supporting_rocm = ("cce", "rocmcc", "clang", "aocc") | ||
conflicts("amdgpu_target=none", when="+rocm") | ||
# If this variable shadows a property, it overrides it | ||
for cmp_name in spack.compilers.supported_compilers(): | ||
if cmp_name not in compilers_supporting_rocm: | ||
conflicts(f"%{cmp_name}", when="+rocm") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would be a mistake to enforce for all ROCmPackage
s. This breaks projects that use CMake's enable_language(HIP)
properly, i.e. use e.g. gcc for host compilation and HIP for device code; exactly like for CUDA. If there are some projects that require using hipcc or similar to compile the whole project, the conflicts could be in those packages instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is currently not a single package in Spack using CMAKE_HIP_COMPILER
to properly separate cxx from hip compilation. I 100% agree that we need to support this better in the future, but currently my understanding is all of these packages are instead lying to Spack about which compiler is used and then just using the hip compiler for all compilation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The goal here is to do something quickly that gets our CI back to correctly running, and then we can refine it later. But if my characterization of the state of the packages is incorrect we can try to do something more clever now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DLA-Future is one of the packages that uses CMAKE_HIP_COMPILER
/enable_language(HIP)
: https://github.com/spack/spack/blob/develop/var/spack/repos/builtin/packages/dla-future/package.py. It's not visible explicitly in the package, but with HIP in the build environment CMake finds it and sets CMAKE_HIP_COMPILER
correctly. pika used to set CMAKE_CXX_COMPILER
to hipcc, but no longer does it and now also relies on CMAKE_HIP_COMPILER
being set:
spack/var/spack/repos/builtin/packages/pika/package.py
Lines 210 to 219 in 9e2558b
# HIP support requires compiling with hipcc for < 0.8.0 | |
if spec.satisfies("@:0.7 +rocm"): | |
args.append(self.define("CMAKE_CXX_COMPILER", spec["hip"].hipcc)) | |
if spec.satisfies("^cmake@3.21.0:3.21.2"): | |
args.append(self.define("__skip_rocmclang", True)) | |
if spec.satisfies("@0.8: +rocm"): | |
rocm_archs = spec.variants["amdgpu_target"].value | |
if "none" not in rocm_archs: | |
rocm_archs = ";".join(rocm_archs) | |
args.append(self.define("CMAKE_HIP_ARCHITECTURES", rocm_archs)) |
I understand the need to get the CI working as soon as possible. If this goes in, how temporary of a solution do you think this will be? I can certainly patch this change out for myself if/when it gets merged, but that feels a bit backwards.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess part of the issue here is that only CachedCMakePackage
s have had the implicit CMAKE_CXX_COMPILER=hipcc
so far and those are the packages that may have been relying on it inadvertently. Regular CMakePackage
s have had to do it correctly, or have a local fix. Putting these conflicts into ROCmPackage
makes it affect also the CMakePackage
s that were doing it correctly in the first place.
Could the added conflicts live in CachedCMakePackage
instead, and replace
# We defined hipcc as top-level compiler for packages when +rocm.
# This avoid problems coming from rocm flags being applied to another compiler.
if "+rocm" in spec:
entries.insert(0, cmake_cache_path("CMAKE_CXX_COMPILER", self.spec["hip"].hipcc))
? It's still semantically the wrong place, but it's as localized as the previous fix was and won't affect regular CMakePackage
s?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@msimberg we're going to change how we approach this, thanks for bringing this up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, thanks @becker33!
In #41375 we introduced a bug that rocm packages using
CachedCMakePackage
cannot build with non-hipcc compiler executables. Reading the discussion there and in discussions with @white238, it seems that the line causing the bug was added to address the problem that most compilers do not support the hip language.In this PR, we introduce a conflict between the
+rocm
variant inRocmPackage
and each of the compilers that does not support the hip language. That allows us to remove the line causing the problems without reintroducing the bug that it fixed.@adrienbernede does this address the concerns you had with mixing compilers in #41375 sufficiently?