Add variant to allow unsupported compiler & CUDA combinations #19736
CC @white238 |
Yes, I think we should remove the compiler check. It's kind of strange to say that we allow unsupported compilers but we still check them. If someone wants to use this option, it should be their responsibility to make sure it works. |
Got it, I will make that change. |
I've added the variant everywhere. The only question I have is on the name of it - happy to take suggestions for something more descriptive (and potentially shorter) 😄 |
Just my two cents, but I think this PR needs to add patching of the cuda headers in the
will hit an
depends_on('cuda+unsafe', when='+allow-unsupported-compilers')
in
Regarding the naming, my proposal would be to use
variant('unsafe', default=False,
        description='Allow combinations of host compiler and CUDA that are not officially supported')
but feel free to discard it if it sounds too negative and patching vendor headers is a common practice. |
How about |
Here is another idea for naming the variant
@alalazo is correct, you would probably hit a compile-time error from the CUDA header files. It would be desirable to get the patched header file, but I don't think it's critical, since we have already said that it is not officially supported. |
Let's not forget to document the new capability in the docs alongside the improved wording from #20742 once this PR is ready. |
We should keep the current name -- it's the name of the matching option to |
PR looks ok to me - it will just be confusing for packages that support HIP, SYCL, CUDA, etc. at the same time and now see an |
I think this is ready to go once the style failure is resolved. @davidbeckingsale can you fix the style check? |
Looks like the issue is just a bunch of overly long lines. |
@davidbeckingsale ok. I think most likely any answer starts with "merge this and then ...", based on conversation with @tgamblin today. |
Not sure what is the state of this PR, but in case there's intention to push it forward we have a
with when('~allow-unsupported-compilers'):
    conflicts('%gcc@5:', when='+cuda ^cuda@:7.5 target=x86_64:')
    ...
|
It's almost done, but I went on vacation :D I'll try the context manager again - I did give it a shot in commit 82cd07a but I think I might have not expressed the condition quite right as I couldn't get it working. |
Hey all, afaik this is now ready. Please let me know if there are any final changes you would like to see. |
Unfortunately we should rethink this approach, because of the unintended side effect that clingo outsmarts us by simply toggling this variant to avoid the conflicts, which the user will only notice after a multi-GB cuda download & installation has finished :( Case in point:
gives
|
What would people think about:
packages:
  all: # this could be "sirius" or anything if we want to be more specific
    conflicts:
    - spec: '^cuda+allow-unsupported-compilers'
      msg: 'Unsupported compilers for CUDA are prohibited'
to add extra user-defined conflicts to specs? I am currently trying to think of a way that is:
since we want to tag v0.17 soon. Doing something like this would, I think, let us reuse all the code we already have for conflicts declared in packages. The idea is that conflicts are a way to prevent |
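As a rough sketch of how a user-defined conflict like the one proposed above could be evaluated, the following plain Python checks a concretized dependency graph against a list of rules. Everything here is an assumption made for illustration: the `USER_CONFLICTS` structure, the `violated_conflicts` helper, and the dictionary representation of the graph are hypothetical and do not correspond to Spack's configuration parser or solver.

```python
# Hypothetical in-memory form of the proposed packages.yaml snippet.
# The keys "package" and "variant" are assumptions for this sketch.
USER_CONFLICTS = [
    {
        "package": "cuda",
        "variant": "allow-unsupported-compilers",
        "msg": "Unsupported compilers for CUDA are prohibited",
    },
]


def violated_conflicts(concrete_dag, rules):
    """Return the messages of all rules matched by the graph.

    `concrete_dag` maps a package name to a dict of variant name -> bool.
    A rule matches when the package is present and the variant is enabled.
    """
    messages = []
    for rule in rules:
        variants = concrete_dag.get(rule["package"])
        if variants is not None and variants.get(rule["variant"], False):
            messages.append(rule["msg"])
    return messages


# The solver toggled the variant on cuda: the user-defined rule rejects it.
toggled = {"sirius": {"cuda": True}, "cuda": {"allow-unsupported-compilers": True}}
# The variant stays off: nothing is violated.
strict = {"sirius": {"cuda": True}, "cuda": {"allow-unsupported-compilers": False}}

print(violated_conflicts(toggled, USER_CONFLICTS))
print(violated_conflicts(strict, USER_CONFLICTS))
```

A real implementation would feed such rules to the solver as constraints rather than checking after the fact, so that the offending solution is never produced in the first place.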
I guess the question is: would the extension be worthwhile beside this single use case? |
This is important. IIUC the reason However, if that's the use case, then the risk is that this opens the floodgates of adding The other reason why
I think it's kinda similar to what has been proposed for environments before too, where environments are kinda interpreted as declarative package definitions (in particular so when concretization: together), where the specs-section is like declarative depends_on statements, and it wouldn't hurt to be able to specify conflicts there too, as well as when clauses in the list of specs. |
I like the general approach of constraining the clingo search space. Since disabling host compiler checks for CUDA is an "expert user" type of thing, maybe the default Spack configuration should include the snippet @alalazo posted.

I think @haampie has done a good job explaining the use cases, but to add a more concrete example: we have CUDA+compiler combinations on machines at LLNL that absolutely work, but are caught by the conflicts. Our CUDA installs are patched to remove the compiler version checks and allow this to work. The way I view the issue is that if there is no way to remove constraints, then Spack packages/build_systems (like CUDA) cannot aggressively add constraints that they have no way of verifying.

I think the way this is modeled is confusing because the variant ends up on the cuda build system, but the conflict checking has to go in each CUDA package, even though these conflicts have nothing to do with the specific packages. I am not sure how having conflicts on a per-package basis would work, because if one package decides to disable the conflicts you have to impose that same disabling on every downstream CUDA package. |
In cuda 11+, there is also a compiler flag to nvcc to disable the same checks. It's not just relevant for the old hacked llnl installs. So if you use that flag, you don't have the same conflicts (although we might eventually discover conflicts where NVIDIA is actually correct that it doesn't work) |
Just to list yet another solution (I know it's a hack, but just to have mentioned it): a virtual cuda with 2 providers, 1 strict and 1 not strict. CudaPackage's conflicts are on the strict cuda package, and default/packages.yaml only lists cuda-strict as a provider. Then you can opt out of all constraints by overriding the provider list with [cuda-nonstrict]. (For the time being, should we merge #26721? I've had bad experiences building an environment from scratch only to realize that the root spec did not build because the cuda conflict was not satisfied on cray for gcc 10, as the conflict only applied to linux back then. It wasn't just a rebuild of one package; the entire environment had to be recompiled with gcc 9.) |
We discussed this issue again yesterday at the weekly telco https://github.com/spack/spack/wiki/Telcon%3A-2022-01-26 @bvanessen has a need for this feature on CUDA to be able to run on machines where the tested compiler + CUDA version is not among the officially supported ones. I'll submit a PR that will add a variant('allow-unsupported-compilers', default=False, sticky=True,
description='Allow unsupported host compiler and CUDA version combinations') When |
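The difference that `sticky=True` is meant to make can be sketched with a toy brute-force search. This is not clingo and not Spack's concretizer: the `solve` function, its arguments, and the cost function (number of variants that deviate from their defaults) are assumptions made for this example. It models the understanding that a sticky variant keeps its default value unless the user sets it explicitly, so the solver can no longer toggle it just to escape a conflict.

```python
from itertools import product


def solve(variants, conflicts, user_choices=None):
    """Toy search: pick the valid assignment closest to the defaults.

    variants: name -> {"default": bool, "sticky": bool}
    conflicts: predicates that return True for a forbidden assignment
    user_choices: name -> bool for values the user set explicitly
    Returns the chosen assignment, or None if nothing is valid.
    """
    user_choices = user_choices or {}
    names = sorted(variants)
    best = None
    for values in product([False, True], repeat=len(names)):
        assignment = dict(zip(names, values))
        # Explicit user choices must always be honored.
        if any(assignment[n] != v for n, v in user_choices.items()):
            continue
        # A sticky variant may only leave its default if the user said so.
        if any(
            variants[n]["sticky"]
            and n not in user_choices
            and assignment[n] != variants[n]["default"]
            for n in names
        ):
            continue
        if any(conflict(assignment) for conflict in conflicts):
            continue
        cost = sum(assignment[n] != variants[n]["default"] for n in names)
        if best is None or cost < best[0]:
            best = (cost, assignment)
    return None if best is None else best[1]


NAME = "allow-unsupported-compilers"
# Assume the host compiler is unsupported, so the conflict fires
# whenever the variant is disabled.
unsupported_host = [lambda a: not a[NAME]]

non_sticky = {NAME: {"default": False, "sticky": False}}
sticky = {NAME: {"default": False, "sticky": True}}

print(solve(non_sticky, unsupported_host))        # solver silently toggles it
print(solve(sticky, unsupported_host))            # no solution: error is surfaced
print(solve(sticky, unsupported_host, {NAME: True}))  # explicit opt-in works
```

In the non-sticky case the search quietly enables the variant, which mirrors the behavior reported earlier in the thread; in the sticky case the failure appears at solve time, before any CUDA download or installation starts.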
I would be totally fine changing the variant name, and how it gets applied (e.g. should it remove all compiler version checking) - it's currently just doing ones I have tried that work with a patched CUDA header like we have at LLNL.