
Add variant to allow unsupported compiler & CUDA combinations #19736

Merged

Conversation

@davidbeckingsale (Contributor) commented Nov 4, 2020

I would be totally fine changing the variant name, and how it gets applied (e.g. should it remove all compiler version checking?) - currently it only relaxes the checks for combinations I have tried that work with a patched CUDA header like the one we have at LLNL.

@davidbeckingsale (Contributor Author)

CC @white238

@adamjstewart (Member)

@ax3l @Rombur

@Rombur (Contributor) commented Nov 5, 2020

how it gets applied (e.g. should it remove all compiler version checking?) - currently it only relaxes the checks for combinations I have tried that work with a patched CUDA header like the one we have at LLNL

Yes, I think we should remove the compiler check. It's kind of strange to say that we allow unsupported compilers but still check them. If someone wants to use this option, it should be their responsibility to make sure it works.

@davidbeckingsale (Contributor Author)

Got it, I will make that change.

@davidbeckingsale (Contributor Author)

I've added the variant everywhere. The only question I have is about the name - happy to take suggestions for something more descriptive (and potentially shorter) 😄

@alalazo (Member) commented Nov 19, 2020

Just my two cents, but I think this PR needs to add patching of the cuda headers in the cuda package; otherwise, if I recall correctly:

foo+cuda+allow-unsupported-compilers ^cuda

will hit an #error directive at build time. What we could do is have a variant in cuda as well, and something like:

depends_on('cuda+unsafe', when='+allow-unsupported-compilers')

in CudaPackage. Does that make sense to you?
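Concretely, a rough sketch of the mixin side (variant names as discussed in this thread; whether the ~unsafe counterpart is needed depends on how we want the default to propagate):

# In CudaPackage: expose the variant and forward it to the cuda dependency
variant('allow-unsupported-compilers', default=False,
        description='Allow unsupported host compiler and CUDA combinations')
depends_on('cuda+unsafe', when='+cuda+allow-unsupported-compilers')
depends_on('cuda~unsafe', when='+cuda~allow-unsupported-compilers')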

Regarding the naming, my proposal would be to use unsafe also in CudaPackage:

variant('unsafe', default=False,
        description='Allow combinations of host compiler and CUDA that are not officially supported')

but feel free to discard it if it sounds too negative and patching vendor headers is a common practice.

@ax3l (Member) commented Nov 30, 2020

shorter

How about check-host-cxx or supported-host-cxx? ;-) Also omits the negation.

@ax3l requested review from ax3l and naromero77 on November 30, 2020
@naromero77 (Contributor)

Here is another idea for naming the variant:

variant('patch', default=False,
        description='Allow combinations of host compiler and CUDA that are not officially supported using a patched header file')

unsafe is OK, but a bit ambiguous IMO. I would also be OK with naming the variant unsupported.

@alalazo is correct, you would probably hit a compile-time error from the CUDA header files.

It would be desirable to get the patched header file, but I don't think it's critical -- we have already said that it is not officially supported.

@ax3l (Member) commented Jan 8, 2021

Let's not forget to document the new capability in the docs alongside the improved wording from #20742 once this PR is ready.

@ax3l (Member) commented Jan 8, 2021

patch: the naming could be too unspecific - this will land as a variant in all downstream packages that pull in this mixin class :)

@becker33 (Member)

We should keep the current name -- it's the name of the matching option to nvcc.

@davidbeckingsale force-pushed the feature/allow-untested-cuda-versions branch from cce80fd to 1c92d83 on June 10, 2021
@ax3l (Member) commented Jun 12, 2021

The PR looks OK to me - it will just be confusing for packages that support HIP, SYCL, CUDA, etc. at the same time and now see an allow-unsupported-compilers option that only applies to CUDA.

@becker33 (Member)

I think this is ready to go once the style failure is resolved. @davidbeckingsale can you fix the style check?

@becker33 (Member)

Looks like the issue is just a bunch of overly long lines.

@davidbeckingsale (Contributor Author)

@becker33 I'll wait until after our meeting tomorrow. I think @tgamblin had some other potential solutions to this issue that would avoid the variant.

@becker33 (Member)

@davidbeckingsale ok. I think most likely any answer starts with "merge this and then ...", based on conversation with @tgamblin today.

@ax3l previously approved these changes on Jun 28, 2021
@alalazo (Member) commented Jul 29, 2021

Not sure what the state of this PR is, but in case there's an intention to push it forward, we have a when context manager that might be of use, e.g.:

with when('~allow-unsupported-compilers'):
    conflicts('%gcc@5:', when='+cuda ^cuda@:7.5 target=x86_64:')
    ...
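That way the compiler constraints stay declared once in CudaPackage and get switched off wholesale by the variant, instead of threading ~allow-unsupported-compilers into the when= argument of every individual conflicts() call.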

@davidbeckingsale (Contributor Author)

Not sure what the state of this PR is, but in case there's an intention to push it forward, we have a when context manager that might be of use, e.g.:

with when('~allow-unsupported-compilers'):
    conflicts('%gcc@5:', when='+cuda ^cuda@:7.5 target=x86_64:')
    ...

It's almost done, but I went on vacation :D I'll try the context manager again - I gave it a shot in commit 82cd07a, but I think I might not have expressed the condition quite right, as I couldn't get it working.

@davidbeckingsale (Contributor Author)

Hey all, AFAIK this is now ready. Please let me know if there are any final changes you would like to see.

@tgamblin merged commit aabece4 into spack:develop on Sep 1, 2021
@haampie (Member) commented Oct 13, 2021

Unfortunately we should rethink this approach, because of the unintended side effect that clingo outsmarts us by simply toggling this variant to avoid the conflicts, which the user will only notice after a multi-GB cuda download & installation has finished :(

Case in point:

spack spec sirius %gcc@9: +cuda ^cuda@:9

gives

    ^cuda@9.2.88%gcc@10.3.0+allow-unsupported-compilers~dev arch=linux-ubuntu20.04-zen2
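(A stopgap in the meantime, assuming you want the checks enforced: pin the variant off explicitly, e.g. spack spec sirius %gcc@9: +cuda ^cuda@:9~allow-unsupported-compilers, so that concretization fails with the conflict instead of silently enabling the variant.)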

@alalazo (Member) commented Oct 14, 2021

What would people think about:

packages:
  all:  # this could be "sirius" or anything if we want to be more specific
    conflicts:
    - spec: '^cuda+allow-unsupported-compilers'
      msg: 'Unsupported compilers for CUDA are prohibited'

to add extra user-defined conflicts to specs? I am currently trying to think of a way that is:

  1. Backward compatible
  2. Minimal in code changes

since we want to tag v0.17 soon. Doing something like this would, I think, let us reuse all the code we already have for conflicts declared in packages. The idea is that conflicts are a way to prevent clingo from looking in certain parts of the search space, which is what we want here. The drawback would probably be the backward logic: if I want to use only vetted compilers, I have to not allow unsupported ones (a double negation), but that is partially due to the modelling in cuda.

@alalazo (Member) commented Oct 14, 2021

I guess the question is: would the extension be worthwhile beyond this single use case?

@haampie (Member) commented Oct 14, 2021

I guess the question is: would the extension be worthwhile beyond this single use case?

This is important. IIUC, the reason allow-unsupported-compilers was added to cuda/package.py was that compilers were being used that don't really identify as compilers known to Spack (clang on an untagged commit), and with this variant there was a single point of reference to enable/disable the constraints put on all dependents of cuda, which is convenient.

However, if that's the use case, then the risk is that this opens the floodgates to adding allow-unsupported-compilers to many packages that also define constraints on the compiler. Maybe that won't happen, and maybe you just need it on cuda, but still, the toggle on the cuda package may not be the right solution for this problem. I've suggested a hacky workaround: register your untagged clang 10-and-a-half compiler both as clang@10 and clang@11 (so, 2 compiler specs, same underlying compiler paths), so that you don't have to commit to a particular version but can use, say, %clang@10:11; that way you don't end up in concretization errors on clang@:10 and clang@11: constraints, as clingo may change compilers on a per-package basis to satisfy constraints.

The other reason why allow-unsupported-compilers may have been introduced is: an advanced user knows what they are doing; they compile cuda code and mix it with an incompatible host compiler, but happen to know that what they are doing is justified, so that package does not need the compiler constraint. However, instead of depending on ^cuda+allow-unsupported-compilers, this package can simply decide not to inherit from CudaPackage because it is too stringent. Or we can modify CudaPackage to not add compiler constraints unconditionally. Or we could make CudaPackage inherit from CudaPackageWithoutTheConstraints to offer the user some of the cuda-template goodness without the strictness. None of these solutions needs a variant on cuda/package.py, because the constraints are defined by the dependent.

What would people think about

I think it's kinda similar to what has been proposed for environments before, where environments are interpreted as declarative package definitions (particularly so with concretization: together), the specs section being like a set of declarative depends_on statements; it wouldn't hurt to be able to specify conflicts there too, as well as when clauses in the list of specs.

@davidbeckingsale (Contributor Author)

I like the general approach of constraining the clingo search space. Since disabling host compiler checks for CUDA is an "expert user" type thing, maybe the default Spack configuration should include the snippet @alalazo posted.

I think @haampie has done a good job explaining the use cases, but to add a more concrete example: we have CUDA+compiler combinations on machines at LLNL that absolutely work but are caught by the conflicts. Our CUDA installs are patched to remove the compiler version checks and allow this to work. The way I view the issue: if there is no way to remove constraints, then Spack packages/build systems (like CUDA) cannot aggressively add constraints that they have no way of verifying.

I think the way this is modeled is confusing, because the variant ends up on the cuda build system, but the conflict checking has to go in each CUDA package, even though these conflicts have nothing to do with the specific packages. I am also not sure how per-package conflicts would work, because if one package decides to disable the conflicts, you have to impose that same disabling on every downstream CUDA package.

@becker33 (Member)

In cuda 11+, there is also a compiler flag to nvcc that disables the same checks, so it's not just relevant for the old hacked LLNL installs. If you use that flag, you don't have the same conflicts (although we might eventually discover conflicts where NVIDIA is actually correct that it doesn't work).
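(For reference, I believe that flag is nvcc's -allow-unsupported-compiler, which is also where the variant name comes from.)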

@haampie (Member) commented Oct 14, 2021

Just to list yet another solution (I know it's a hack, but just to have it mentioned): a virtual cuda with 2 providers, 1 strict and 1 not strict; CudaPackage's conflicts go on the strict cuda package, and the default packages.yaml only lists cuda-strict as a provider. Then you can opt out of all constraints by overriding the provider list with [cuda-nonstrict].

(For the time being, should we merge #26721? I've had bad experiences building an environment from scratch only to realize that the root spec did not build because the cuda conflict was not satisfied on Cray for gcc 10, as the conflict only applied to linux back then -- it wasn't just a rebuild of one package, I had to recompile the entire environment with gcc 9.)

@alalazo (Member) commented Jan 27, 2022

We discussed this issue again yesterday at the weekly telco (https://github.com/spack/spack/wiki/Telcon%3A-2022-01-26). @bvanessen needs this feature on CUDA to be able to run on machines where the compiler + CUDA version combination in use is not among the officially supported ones.

I'll submit a PR that will add a sticky argument to variants:

variant('allow-unsupported-compilers', default=False, sticky=True,
        description='Allow unsupported host compiler and CUDA version combinations')

When sticky=True, any variant that is not explicitly set in a literal spec takes its default value, without the possibility for clingo to change it to work around conflicts.
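Applied to the sirius example above: since spack spec sirius %gcc@9: +cuda ^cuda@:9 never sets the variant explicitly, it would stay at its default of False, and concretization would fail with the compiler conflict instead of silently pulling in an unsupported cuda.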
