
Add +cuda modifier to nvhpc package #29155

Closed
wants to merge 7 commits

Conversation

@wyphan (Contributor) commented Feb 23, 2022

Following up on the discussion during the Spack conference call to enable the CUDA version that is bundled with NVIDIA HPC SDK, here's what I've come up with. Would be happy if someone from NVIDIA can check that it serves the intended purpose, and suggest changes/additions to it.

@scheibelp self-assigned this Feb 23, 2022
@samcmill (Contributor):

How does this get around the limitation that a virtual package cannot exist with the same name as a real package?

@wyphan (Contributor, Author) commented Feb 23, 2022

@samcmill Probably I just haven't tested it rigorously enough? I only tried concretizing the spec in #19365 (comment) , but I can test installation shortly with the same spec (magma cuda_arch=70 ^nvhpc). Will report back as soon as it's done.

Edit: here are the test results; it looks like the nvhpc spec needs some more massaging to expose cuBLAS and cuSPARSE from the math_libs directory. Also, if +mpi is not explicitly given, Spack defaults to the non-MPI version, as expected. On this system I have cuda@11.6.112 (deb package version 11.6.1-1, managed by apt with automatic updates through the official CUDA repos for Ubuntu) registered as an external, so Spack doesn't try to reinstall cuda when +cuda is not given to the nvhpc spec.
https://gist.github.com/wyphan/225751c0a96a435dde96508517272721

@wyphan (Contributor, Author) commented Feb 23, 2022

Just found out that the CUDA installation bundled with NVHPC is organized slightly differently from the standalone CUDA Toolkit. For instance, libcublas.so and libcusparse.so both live in math_libs/lib64 instead of cuda/<version>/lib64, whereas in the CUDA Toolkit both sit in the same lib64 subdirectory as the rest of the toolkit. I guess the next commit will expose these libraries from math_libs.
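To illustrate the two layouts, here is a minimal sketch of a library-search helper that checks the NVHPC math_libs/lib64 directory before falling back to the CTK-style lib64. The function name and directory list are hypothetical, for illustration only; a real Spack package would use the package's `libs` property and `find_libraries` instead:

```python
import glob
import os


def find_math_libs(prefix):
    """Locate cuBLAS/cuSPARSE under either CUDA layout.

    In the NVHPC bundle these libraries live in math_libs/lib64,
    whereas the standalone CUDA Toolkit keeps them in lib64 next to
    the rest of the toolkit. Hypothetical helper, not Spack API.
    """
    candidates = [
        os.path.join(prefix, "math_libs", "lib64"),  # NVHPC bundle layout
        os.path.join(prefix, "lib64"),               # CUDA Toolkit layout
    ]
    found = {}
    for libdir in candidates:
        for name in ("libcublas", "libcusparse"):
            hits = sorted(glob.glob(os.path.join(libdir, name + ".so*")))
            if hits and name not in found:
                found[name] = hits[0]
    return found
```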

@wyphan (Contributor, Author) commented Mar 1, 2022

After some testing and debugging with @samcmill over MS Teams, we found that since nvhpc now provides cuda, it trumps the CUDA Toolkit spec in the concretizer: any package that depends_on('cuda') prefers the CUDA version bundled with NVHPC (nvhpc+cuda) over the actual CUDA Toolkit (CTK) cuda spec. That means this PR is blocked until #19365 is resolved, since as it stands it removes the option to choose between the bundled NVHPC version and the CTK version.

Also, starting with version 22.1, NVHPC provides a CMake config in ${NVHPC_ROOT}/<platform>/<version>/cmake.

@wyphan (Contributor, Author) commented Mar 16, 2022

@fspiga I'm in the process of switching the download links to the single-CUDA versions. Do you know where I can access the SHA256 checksums?

@ax3l (Member) commented Mar 17, 2022

Doesn't NVHPC provide at least three different CUDA versions per release?

@wyphan (Contributor, Author) commented Mar 17, 2022

@ax3l That is correct. However, we are currently discussing stripping nvhpc down to just the compilers.
Regardless, using the single-CUDA downloads instead of the multi-CUDA ones will save both bandwidth and disk space 😂

@fspiga (Contributor) commented Mar 28, 2022

> @fspiga I'm in the process of switching the download links to the single-CUDA versions. Do you know where I can access the SHA256 checksums?

Sorry I did not reply earlier; somehow I don't get email notifications anymore. I compute the SHA256 checksums myself after verifying all three packages locally.

@haampie (Member) commented Mar 29, 2022

I think you could drop the +cuda bit and add a bunch of unconditional provides(...) statements for cuda:

provides('cuda@10.2.?,11.0.?,11.6.?', when='@22.3')

The patch version number seems to be found in <nvhpc prefix>/*/*/cuda/*/version.{json,txt} (JSON for 11.x, txt for 10.x). For the 11.x versions you can diff <cuda prefix>/version.json against <nvhpc prefix>/*/*/cuda/11.*/version.json. Or maybe just stick to minor versions if that makes it easier to get this PR in.

@wyphan (Contributor, Author) commented Mar 29, 2022

@haampie Interesting; if it is a JSON or text file, we might as well define a Python function to extract the needed info. Give me a few days to design and implement this.

@wyphan (Contributor, Author) commented Mar 29, 2022

Btw, the file looks like this for nvhpc-slim 22.2 x86_64 and CUDA 11.6:

{
   "cuda" : {
      "name" : "CUDA SDK",
      "version" : "11.6.20220110"
   },
   "cuda_cccl" : {
      "name" : "CUDA C++ Core Compute Libraries",
      "version" : "11.6.55"
   },
   "cuda_cudart" : {
      "name" : "CUDA Runtime (cudart)",
      "version" : "11.6.55"
   },
   "cuda_cuobjdump" : {
      "name" : "cuobjdump",
      "version" : "11.6.55"
   },
   "cuda_cupti" : {
      "name" : "CUPTI",
      "version" : "11.6.55"
   },
   "cuda_cuxxfilt" : {
      "name" : "CUDA cu++ filt",
      "version" : "11.6.55"
   },
   "cuda_demo_suite" : {
      "name" : "CUDA Demo Suite",
      "version" : "11.6.55"
   },
   "cuda_gdb" : {
      "name" : "CUDA GDB",
      "version" : "11.6.55"
   },
   "cuda_memcheck" : {
      "name" : "CUDA Memcheck",
      "version" : "11.6.55"
   },
   "cuda_nsight" : {
      "name" : "Nsight Eclipse Plugins",
      "version" : "11.6.55"
   },
   "cuda_nvcc" : {
      "name" : "CUDA NVCC",
      "version" : "11.6.55"
   },
   "cuda_nvdisasm" : {
      "name" : "CUDA nvdisasm",
      "version" : "11.6.55"
   },
   "cuda_nvml_dev" : {
      "name" : "CUDA NVML Headers",
      "version" : "11.6.55"
   },
   "cuda_nvprof" : {
      "name" : "CUDA nvprof",
      "version" : "11.6.55"
   },
   "cuda_nvprune" : {
      "name" : "CUDA nvprune",
      "version" : "11.6.55"
   },
   "cuda_nvrtc" : {
      "name" : "CUDA NVRTC",
      "version" : "11.6.55"
   },
   "cuda_nvtx" : {
      "name" : "CUDA NVTX",
      "version" : "11.6.55"
   },
   "cuda_nvvp" : {
      "name" : "CUDA NVVP",
      "version" : "11.6.58"
   },
   "cuda_samples" : {
      "name" : "CUDA Samples",
      "version" : "11.6.101"
   },
   "cuda_sanitizer_api" : {
      "name" : "CUDA Compute Sanitizer API",
      "version" : "11.6.55"
   },
   "libcublas" : {
      "name" : "CUDA cuBLAS",
      "version" : "11.8.1.74"
   },
   "libcufft" : {
      "name" : "CUDA cuFFT",
      "version" : "10.7.0.55"
   },
   "libcurand" : {
      "name" : "CUDA cuRAND",
      "version" : "10.2.9.55"
   },
   "libcusolver" : {
      "name" : "CUDA cuSOLVER",
      "version" : "11.3.2.55"
   },
   "libcusparse" : {
      "name" : "CUDA cuSPARSE",
      "version" : "11.7.1.55"
   },
   "libnpp" : {
      "name" : "CUDA NPP",
      "version" : "11.6.0.55"
   },
   "libnvjpeg" : {
      "name" : "CUDA nvJPEG",
      "version" : "11.6.0.55"
   },
   "nsight_compute" : {
      "name" : "Nsight Compute",
      "version" : "2022.1.0.12"
   },
   "nsight_systems" : {
      "name" : "Nsight Systems",
      "version" : "2021.5.2.53"
   },
   "nvidia_driver" : {
      "name" : "NVIDIA Linux Driver",
      "version" : "510.39.01"
   }
}

I wonder which field is the "canonical" CUDA version that Spack detects.
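Whatever field ends up being canonical, a minimal sketch of the extractor floated above could look like this (the function name is hypothetical; the top-level component keys and their "version" fields follow the sample file shown in this comment):

```python
import json


def cuda_component_versions(path):
    """Parse an NVHPC-bundled CUDA version.json and return a mapping
    of component name -> version string, e.g. 'cuda_nvcc' -> '11.6.55'.

    Assumes the layout seen in nvhpc-slim 22.2: a top-level object
    whose values each carry a "name" and a "version" field.
    """
    with open(path) as f:
        data = json.load(f)
    return {component: info["version"] for component, info in data.items()}
```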

@haampie mentioned this pull request Mar 29, 2022
@pauleonix (Contributor) commented Mar 30, 2022

I fear none of them. Just from the names I would have thought the first one (CUDA SDK) would be right, but it uses a date instead of the update number (11.6.2 is the latest CUDA Toolkit release, and I don't think the 2 has anything to do with a date).
I get:
I get:

cuda-11.6.0: 11.6.20220110
cuda-11.6.1: 11.6.20220214
cuda-11.6.2: 11.6.20220318
nvhpc-22.2-cuda-11.6: 11.6.20220110
nvhpc-22.3-cuda-11.6: 11.6.20220214

So according to this one can map 22.2 -> 11.6.0 and 22.3 -> 11.6.1, but I don't see an easy way to determine the mapping automatically.
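Lacking an automatic rule, one fallback is a hand-maintained lookup table built from the date stamps above. A sketch, where the table contains only the data points listed in this comment and the names are hypothetical:

```python
# Map the date-stamped "cuda" version from version.json to the
# corresponding CUDA Toolkit release. Entries are only the data
# points observed above; new releases must be added by hand.
CUDA_DATESTAMP_TO_RELEASE = {
    "11.6.20220110": "11.6.0",  # also shipped in nvhpc 22.2
    "11.6.20220214": "11.6.1",  # also shipped in nvhpc 22.3
    "11.6.20220318": "11.6.2",
}


def toolkit_release(datestamp):
    """Return the CTK release for a date-stamped version, or None
    when the date stamp has not been catalogued yet."""
    return CUDA_DATESTAMP_TO_RELEASE.get(datestamp)
```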

@haampie (Member) commented Mar 30, 2022

I just downloaded all CUDA 10 and 11 releases and all nvhpc releases, then checksummed all the version.{json,txt} files.

It's already in PR #29782, which combines the multiple open PRs to make cuda a virtual package.

$ find . \( -iname version.txt -o -iname version.json \) -exec md5sum {} \; | sort 
0e2b0d9a6e7b6e3cd4d4ffd2c47a4b0a  ./cudas/cuda_11.0.3_450.51.06_linux/builds/cuda_documentation/version.txt
0e2b0d9a6e7b6e3cd4d4ffd2c47a4b0a  ./nvhpcs/<snip>/20.11/cuda/11.0/version.txt
0e2b0d9a6e7b6e3cd4d4ffd2c47a4b0a  ./nvhpcs/<snip>/20.9/cuda/11.0/version.txt
0e2b0d9a6e7b6e3cd4d4ffd2c47a4b0a  ./nvhpcs/<snip>/21.11/cuda/11.0/version.txt
0e2b0d9a6e7b6e3cd4d4ffd2c47a4b0a  ./nvhpcs/<snip>/21.1/cuda/11.0/version.txt
0e2b0d9a6e7b6e3cd4d4ffd2c47a4b0a  ./nvhpcs/<snip>/21.2/cuda/11.0/version.txt
0e2b0d9a6e7b6e3cd4d4ffd2c47a4b0a  ./nvhpcs/<snip>/21.3/cuda/11.0/version.txt
0e2b0d9a6e7b6e3cd4d4ffd2c47a4b0a  ./nvhpcs/<snip>/21.5/cuda/11.0/version.txt
0e2b0d9a6e7b6e3cd4d4ffd2c47a4b0a  ./nvhpcs/<snip>/21.7/cuda/11.0/version.txt
0e2b0d9a6e7b6e3cd4d4ffd2c47a4b0a  ./nvhpcs/<snip>/21.9/cuda/11.0/version.txt
0e2b0d9a6e7b6e3cd4d4ffd2c47a4b0a  ./nvhpcs/<snip>/22.1/cuda/11.0/version.txt
0e2b0d9a6e7b6e3cd4d4ffd2c47a4b0a  ./nvhpcs/<snip>/22.2/cuda/11.0/version.txt
0e2b0d9a6e7b6e3cd4d4ffd2c47a4b0a  ./nvhpcs/<snip>/22.3/cuda/11.0/version.txt

1a03101ba4a5a8d60bcfa43e6e14e157  ./cudas/cuda_11.6.1_510.47.03_linux/builds/version.json
1a03101ba4a5a8d60bcfa43e6e14e157  ./nvhpcs/<snip>/22.3/cuda/11.6/version.json

1ac718f7a379638c28817ad7c231d8a5  ./cudas/cuda_11.4.1_470.57.02_linux/builds/version.json
1ac718f7a379638c28817ad7c231d8a5  ./nvhpcs/<snip>/21.9/cuda/11.4/version.json

2e9ccb968eeeb836356cdb00bd82d4d5  ./cudas/cuda_11.6.0_510.39.01_linux/builds/version.json
2e9ccb968eeeb836356cdb00bd82d4d5  ./nvhpcs/<snip>/22.2/cuda/11.6/version.json

3dbf018c4a063d1bee73ffc94e2315da  ./cudas/cuda_11.3.0_465.19.01_linux/builds/version.json
3dbf018c4a063d1bee73ffc94e2315da  ./nvhpcs/<snip>/21.5/cuda/11.3/version.json

47aa6a390567f3135d9f52145b007c1b  ./cudas/cuda_10.1.243_418.87.00_linux/builds/cuda-toolkit/version.txt
47aa6a390567f3135d9f52145b007c1b  ./nvhpcs/<snip>/20.7/cuda/10.1/version.txt
47aa6a390567f3135d9f52145b007c1b  ./nvhpcs/<snip>/20.9/cuda/10.1/version.txt

6a1ae7bf0a1d97717d7cbb3a4c07ad13  ./cudas/cuda_11.5.1_495.29.05_linux/builds/version.json
6a1ae7bf0a1d97717d7cbb3a4c07ad13  ./nvhpcs/<snip>/21.11/cuda/11.5/version.json
6a1ae7bf0a1d97717d7cbb3a4c07ad13  ./nvhpcs/<snip>/22.1/cuda/11.5/version.json

9234ba9af224d314084e00cd0bbed1a1  ./cudas/cuda_10.2.89_440.33.01_linux/builds/cuda-toolkit/version.txt
9234ba9af224d314084e00cd0bbed1a1  ./nvhpcs/<snip>/20.11/cuda/10.2/version.txt
9234ba9af224d314084e00cd0bbed1a1  ./nvhpcs/<snip>/20.7/cuda/10.2/version.txt
9234ba9af224d314084e00cd0bbed1a1  ./nvhpcs/<snip>/20.9/cuda/10.2/version.txt
9234ba9af224d314084e00cd0bbed1a1  ./nvhpcs/<snip>/21.11/cuda/10.2/version.txt
9234ba9af224d314084e00cd0bbed1a1  ./nvhpcs/<snip>/21.1/cuda/10.2/version.txt
9234ba9af224d314084e00cd0bbed1a1  ./nvhpcs/<snip>/21.2/cuda/10.2/version.txt
9234ba9af224d314084e00cd0bbed1a1  ./nvhpcs/<snip>/21.3/cuda/10.2/version.txt
9234ba9af224d314084e00cd0bbed1a1  ./nvhpcs/<snip>/21.5/cuda/10.2/version.txt
9234ba9af224d314084e00cd0bbed1a1  ./nvhpcs/<snip>/21.7/cuda/10.2/version.txt
9234ba9af224d314084e00cd0bbed1a1  ./nvhpcs/<snip>/21.9/cuda/10.2/version.txt
9234ba9af224d314084e00cd0bbed1a1  ./nvhpcs/<snip>/22.1/cuda/10.2/version.txt
9234ba9af224d314084e00cd0bbed1a1  ./nvhpcs/<snip>/22.2/cuda/10.2/version.txt
9234ba9af224d314084e00cd0bbed1a1  ./nvhpcs/<snip>/22.3/cuda/10.2/version.txt

ab9e99b72fe71006e3dae5e0b0f9d94d  ./cudas/cuda_11.2.1_460.32.03_linux/builds/version.json
ab9e99b72fe71006e3dae5e0b0f9d94d  ./nvhpcs/<snip>/22.2/cuda/11.2/version.json

d1cd3d4b88f7841c060c7a225e45ef75  ./cudas/cuda_11.0.2_450.51.05_linux/builds/cuda_documentation/version.txt
d1cd3d4b88f7841c060c7a225e45ef75  ./nvhpcs/<snip>/20.7/cuda/11.0/version.txt

db18b4b34d57ec4f910dc66f0d8bd704  ./cudas/cuda_11.4.0_470.42.01_linux/builds/version.json
db18b4b34d57ec4f910dc66f0d8bd704  ./nvhpcs/<snip>/21.7/cuda/11.4/version.json

@haampie (Member) commented Mar 31, 2022

Closed in favor of #29782

@haampie closed this Mar 31, 2022
@wyphan wyphan deleted the wyphan/nvhpc-cuda branch March 31, 2022 14:27

7 participants