
Add variant to allow unsupported compiler & CUDA combinations #19736

Merged

Conversation

@davidbeckingsale (Contributor) commented Nov 4, 2020

I would be totally fine changing the variant name, and how it gets applied (e.g. should it remove all compiler version checking?) - currently it only relaxes the checks for combinations I have tried that work with a patched CUDA header like the one we have at LLNL.

@davidbeckingsale (Contributor Author)

CC @white238

@adamjstewart (Member)

@ax3l @Rombur

@Rombur (Contributor) commented Nov 5, 2020

how it gets applied (e.g. should it remove all compiler version checking?) - currently it only relaxes the checks for combinations I have tried that work with a patched CUDA header like the one we have at LLNL

Yes, I think we should remove the compiler check. It's kind of strange to say that we allow unsupported compilers but still check them. If someone wants to use this option, it should be their responsibility to make sure it works.

@davidbeckingsale (Contributor Author)

Got it, I will make that change.

@davidbeckingsale (Contributor Author)

I've added the variant everywhere. The only question I have is about the name - happy to take suggestions for something more descriptive (and potentially shorter) 😄

@alalazo (Member) commented Nov 19, 2020

Just my two cents, but I think this PR needs to add patching of the cuda headers in the cuda package; otherwise, if I recall correctly:

foo+cuda+allow-unsupported-compilers ^cuda

will hit an #error directive at build time. What we could do is have a variant in cuda as well, and something like:

depends_on('cuda+unsafe', when='+allow-unsupported-compilers')

in CudaPackage. Does that make sense to you?
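Concretely, a rough sketch of the mixin side (variant names as discussed in this thread; whether the ~unsafe counterpart is needed depends on how we want the default to propagate):

# In CudaPackage: expose the variant and forward it to the cuda dependency
variant('allow-unsupported-compilers', default=False,
        description='Allow unsupported host compiler and CUDA combinations')
depends_on('cuda+unsafe', when='+cuda+allow-unsupported-compilers')
depends_on('cuda~unsafe', when='+cuda~allow-unsupported-compilers')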

Regarding the naming, my proposal would be to use unsafe also in CudaPackage:

variant('unsafe', default=False,
        description='Allow combinations of host compiler and CUDA that are not officially supported')

but feel free to discard it if it sounds too negative and patching vendor headers is a common practice.

@ax3l (Member) commented Nov 30, 2020

shorter

How about check-host-cxx or supported-host-cxx? ;-) Also omits the negation.

@ax3l requested review from ax3l and naromero77 on November 30, 2020
@naromero77 (Contributor)

Here is another idea for naming the variant:

variant('patch', default=False,
        description='Allow combinations of host compiler and CUDA that are not officially supported using a patched header file')

unsafe is OK, but a bit ambiguous IMO. I would also be OK with naming the variant unsupported.

@alalazo is correct, you would probably hit a compile-time error from the CUDA header files.

It would be desirable to get the patched header file, but I don't think it's critical -- we have already said that it is not officially supported.

@ax3l (Member) commented Jan 8, 2021

Let's not forget to document the new capability in the docs alongside the improved wording from #20742 once this PR is ready.

@ax3l (Member) commented Jan 8, 2021

patch: the naming could be too unspecific - this will land as a variant in all downstream packages that pull in this mixin class :)

@becker33 (Member)

We should keep the current name -- it's the name of the matching option to nvcc.

@davidbeckingsale force-pushed the feature/allow-untested-cuda-versions branch from cce80fd to 1c92d83 on June 10, 2021
@ax3l (Member) commented Jun 12, 2021

The PR looks OK to me - it will just be confusing for packages that support HIP, SYCL, CUDA, etc. at the same time and now see an allow-unsupported-compilers option that only applies to CUDA.

@becker33 (Member)

I think this is ready to go once the style failure is resolved. @davidbeckingsale can you fix the style check?

@becker33 (Member)

Looks like the issue is just a bunch of overly long lines.

@davidbeckingsale (Contributor Author)

@becker33 I'll wait until after our meeting tomorrow. I think @tgamblin had some other potential solutions to this issue that would avoid the variant.

@becker33 (Member)

@davidbeckingsale ok. I think most likely any answer starts with "merge this and then ...", based on conversation with @tgamblin today.

@ax3l previously approved these changes on Jun 28, 2021
@alalazo (Member) commented Jul 29, 2021

Not sure what the state of this PR is, but in case there's an intention to push it forward, we have a when context manager that might be of use, e.g.:

with when('~allow-unsupported-compilers'):
    conflicts('%gcc@5:', when='+cuda ^cuda@:7.5 target=x86_64:')
    ...
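That way the compiler constraints stay declared once in CudaPackage and get switched off wholesale by the variant, instead of threading ~allow-unsupported-compilers into the when= argument of every individual conflicts() call.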

@davidbeckingsale (Contributor Author)

Not sure what the state of this PR is, but in case there's an intention to push it forward, we have a when context manager that might be of use, e.g.:

with when('~allow-unsupported-compilers'):
    conflicts('%gcc@5:', when='+cuda ^cuda@:7.5 target=x86_64:')
    ...

It's almost done, but I went on vacation :D I'll try the context manager again - I gave it a shot in commit 82cd07a, but I think I might not have expressed the condition quite right, as I couldn't get it working.

@davidbeckingsale (Contributor Author)

Hey all, AFAIK this is now ready. Please let me know if there are any final changes you would like to see.

@tgamblin merged commit aabece4 into spack:develop on Sep 1, 2021
@haampie (Member) commented Oct 13, 2021

Unfortunately we should rethink this approach, because of the unintended side effect that clingo outsmarts us by simply toggling this variant to avoid the conflicts, which the user will only notice after a multi-GB cuda download & installation has finished :(

Case in point:

spack spec sirius %gcc@9: +cuda ^cuda@:9

gives

    ^cuda@9.2.88%gcc@10.3.0+allow-unsupported-compilers~dev arch=linux-ubuntu20.04-zen2
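(A stopgap in the meantime, assuming you want the checks enforced: pin the variant off explicitly, e.g. spack spec sirius %gcc@9: +cuda ^cuda@:9~allow-unsupported-compilers, so that concretization fails with the conflict instead of silently enabling the variant.)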

@alalazo (Member) commented Oct 14, 2021

What would people think about:

packages:
  all:  # this could be "sirius" or anything if we want to be more specific
    conflicts:
    - spec: '^cuda+allow-unsupported-compilers'
      msg: 'Unsupported compilers for CUDA are prohibited'

to add extra user-defined conflicts to specs? I am currently trying to think of a way that is:

  1. Backward compatible
  2. Minimal in code changes

since we want to tag v0.17 soon. Doing something like this would, I think, let us reuse all the code we already have for conflicts declared in packages. The idea is that conflicts are a way to prevent clingo from looking in certain parts of the search space, which is what we want here. The drawback would probably be the backward logic: if I want to use only vetted compilers, I have to not allow unsupported ones (a double negation), but that is partially due to the modelling in cuda.

@alalazo (Member) commented Oct 14, 2021

I guess the question is: would the extension be worthwhile beyond this single use case?

@haampie (Member) commented Oct 14, 2021

I guess the question is: would the extension be worthwhile beyond this single use case?

This is important. IIUC, the reason allow-unsupported-compilers was added to cuda/package.py was that compilers were being used that don't really identify as compilers known to Spack (clang on an untagged commit), and with this variant there was a single point of reference to enable/disable the constraints put on all dependents of cuda, which is convenient.

However, if that's the use case, then the risk is that this opens the floodgates to adding allow-unsupported-compilers to many packages that also define constraints on the compiler. Maybe that won't happen, and maybe you just need it on cuda, but still, the toggle on the cuda package may not be the right solution for this problem. I've suggested a hacky workaround: register your untagged clang 10-and-a-half compiler both as clang@10 and clang@11 (so, 2 compiler specs, same underlying compiler paths), so that you don't have to commit to a particular version but can use, say, %clang@10:11; that way you don't end up in concretization errors on clang@:10 and clang@11: constraints, as clingo may change compilers on a per-package basis to satisfy constraints.

The other reason why allow-unsupported-compilers may have been introduced is: an advanced user knows what they are doing; they compile cuda code and mix it with an incompatible host compiler, but happen to know that what they are doing is justified, so that package does not need the compiler constraint. However, instead of depending on ^cuda+allow-unsupported-compilers, this package can simply decide not to inherit from CudaPackage because it is too stringent. Or we can modify CudaPackage to not add compiler constraints unconditionally. Or we could make CudaPackage inherit from CudaPackageWithoutTheConstraints to offer the user some of the cuda-template goodness without the strictness. None of these solutions needs a variant on cuda/package.py, because the constraints are defined by the dependent.

What would people think about

I think it's kinda similar to what has been proposed for environments before, where environments are interpreted as declarative package definitions (particularly so with concretization: together), the specs section being like a set of declarative depends_on statements; it wouldn't hurt to be able to specify conflicts there too, as well as when clauses in the list of specs.

@davidbeckingsale (Contributor Author)

I like the general approach of constraining the clingo search space. Since disabling host compiler checks for CUDA is an "expert user" type thing, maybe the default Spack configuration should include the snippet @alalazo posted.

I think @haampie has done a good job explaining the use cases, but to add a more concrete example: we have CUDA+compiler combinations on machines at LLNL that absolutely work but are caught by the conflicts. Our CUDA installs are patched to remove the compiler version checks and allow this to work. The way I view the issue: if there is no way to remove constraints, then Spack packages/build systems (like CUDA) cannot aggressively add constraints that they have no way of verifying.

I think the way this is modeled is confusing, because the variant ends up on the cuda build system, but the conflict checking has to go in each CUDA package, even though these conflicts have nothing to do with the specific packages. I am also not sure how per-package conflicts would work, because if one package decides to disable the conflicts, you have to impose that same disabling on every downstream CUDA package.

@becker33 (Member)

In cuda 11+, there is also a compiler flag to nvcc that disables the same checks, so it's not just relevant for the old hacked LLNL installs. If you use that flag, you don't have the same conflicts (although we might eventually discover conflicts where NVIDIA is actually correct that it doesn't work).
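(For reference, I believe that flag is nvcc's -allow-unsupported-compiler, which is also where the variant name comes from.)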

@haampie (Member) commented Oct 14, 2021

Just to list yet another solution (I know it's a hack, but just to have it mentioned): a virtual cuda with 2 providers, 1 strict and 1 not strict; CudaPackage's conflicts go on the strict cuda package, and the default packages.yaml only lists cuda-strict as a provider. Then you can opt out of all constraints by overriding the provider list with [cuda-nonstrict].

(For the time being, should we merge #26721? I've had bad experiences building an environment from scratch only to realize that the root spec did not build because the cuda conflict was not satisfied on Cray for gcc 10, as the conflict only applied to linux back then -- it wasn't just a rebuild of one package, I had to recompile the entire environment with gcc 9.)

@alalazo (Member) commented Jan 27, 2022

We discussed this issue again yesterday at the weekly telco (https://github.com/spack/spack/wiki/Telcon%3A-2022-01-26). @bvanessen needs this feature on CUDA to be able to run on machines where the compiler + CUDA version combination in use is not among the officially supported ones.

I'll submit a PR that will add a sticky argument to variants:

variant('allow-unsupported-compilers', default=False, sticky=True,
        description='Allow unsupported host compiler and CUDA version combinations')

When sticky=True, any variant that is not explicitly set in a literal spec takes its default value, without the possibility for clingo to change it to work around conflicts.
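Applied to the sirius example above: since spack spec sirius %gcc@9: +cuda ^cuda@:9 never sets the variant explicitly, it would stay at its default of False, and concretization would fail with the compiler conflict instead of silently pulling in an unsupported cuda.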
