mpich not configuring with clang compilers on Perlmutter #6954
Comments
Can you share the
Here it is.
Oh sorry. The clang install was deleted. I should have checked the log before raising the issue. Thanks for the help.
Sorry, I closed the issue a bit early. I had a different issue earlier too, where the configure did not accept:

```
checking cuda_runtime_api.h usability... yes
checking cuda_runtime_api.h presence... yes
checking for cuda_runtime_api.h... yes
checking for cudaStreamSynchronize in -lcudart... yes
configure: WARNING: Using user-provided nvcc: 'clang++'
checking whether nvcc works... no
configure: error: CUDA was requested but it is not functional
configure: error: YAKSA configure failed
```

FYI - the configure command worked with llvm/16 but not with llvm/17 and higher.
Could you also send the
I did some digging into the config.log:

```
configure:17946: WARNING: Using user-provided nvcc: 'clang++'
configure:17961: checking whether nvcc works
configure:17974: clang++ -c conftest.cu >&5
clang++: warning: CUDA version is newer than the latest partially supported version 12.1 [-Wunknown-cuda-version]
clang++: error: GPU arch sm_35 is supported by CUDA versions between 7.0 and 11.8 (inclusive), but installation at /opt/nvidia/hpc_sdk/Linux_x86_64/23.9/cuda/12.2 is 12.2; use '--cuda-path' to specify a different CUDA install, pass a different GPU arch with '--cuda-gpu-arch', or pass '--no-cuda-version-check'
configure:17974: $? = 1
configure: failed program was:
```

I tried the other options of passing the CUDA information but those flags were not accepted. Attaching the respective config.log.
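For reference, a sketch of how the same compile could be pointed at the right architecture when driving CUDA through clang, using the flags suggested by the error message above (the paths and the `conftest.cu` file name come from the log; the arch choice is an assumption for an A100 node):

```shell
# clang targets sm_35 by default when no GPU arch is given, which this
# CUDA 12.2 installation no longer supports. Overriding the arch (and,
# if needed, the version check) sidesteps the failure above.
clang++ -c conftest.cu \
  --cuda-path=/opt/nvidia/hpc_sdk/Linux_x86_64/23.9/cuda/12.2 \
  --cuda-gpu-arch=sm_80 \
  --no-cuda-version-check
```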
@rgayatri23 That looks like the main issue. Let us know if you can resolve it on your own.
Yaksa tries to detect the NVIDIA GPU at configure time in order to select the right code generation flags. Is there a GPU in your build node? What type is it?
@raffenet - Yes, there is a GPU in my build node. It is an NVIDIA A100 (so sm_80) with cudatoolkit/12.2.
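As an aside, a quick way to double-check what the configure-time detection should find on the build node (this query field is available in newer nvidia-smi/driver versions):

```shell
# Print the compute capability of each visible GPU;
# an A100 reports 8.0, i.e. sm_80.
nvidia-smi --query-gpu=compute_cap --format=csv,noheader
```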
Try adding
@raffenet - That did not resolve the issue. It still fails with the same error. I also tried passing the option
OK I just realized that `NVCC="clang++ --cuda-gpu-arch=sm_80"`
It looks like all of these are unrecognized options:

```
configure: WARNING: unrecognized options: --with-craypmi, --with-cuda-sm
configure: error: unrecognized option: `--cuda-gpu-arch=sm_80'
```
These flags were ignored when the user specified a compiler other than the nvcc included in the CUDA installation. Make sure to include them for consistency. See pmodels/mpich#6954.

Signed-off-by: Ken Raffenetti <raffenet@mcs.anl.gov>
What is your configure line and what version of clang are you using? The config.log you shared suggested
Thanks for the info on pmi options. I am using clang/18.0.1 with mpich/4.2. Here is my configure line:

```
./configure --prefix= --enable-fast=O2 --with-pm=no --with-pmi=pmi2 --with-pmi2=<path-to-cray-pmi> --with-xpmem=<path-to-xpmem> --with-wrapper-dl-type=rpath --enable-threads=multiple --enable-shared=yes --enable-static=no --with-namepublisher=file --with-libfabric=<path-to-libfabric> --with-libfabric-include=<path-to-libfabric-include> --with-libfabric-lib=<path-to-libfabric-lib> --with-device=ch4:ofi --with-ch4-shmmods=posix,xpmem --enable-thread-cs=per-vci --with-cuda=<path-to-cuda> CPPFLAGS=-I<path-to-pmi-include> CC=clang CFLAGS= NVCC=clang++ --with-cuda-sm=80 NVCC_FLAGS=-allow-unsupported-compiler CXX=clang++ FC= FCFLAGS= F77= FFLAGS= 'LIBS=-lpmi -lpmi2 -Wl,--as-needed,-lcudart,--no-as-needed -lcuda' 'LDFLAGS=-L<path-to-pmi-lib> -L<path-to-libfabric-lib64> -L<path-to-cuda-lib64>' MPICHLIB_CFLAGS=-fPIC MPICHLIB_CXXFLAGS=-fPIC MPICHLIB_FFLAGS=-fPIC MPICHLIB_FCFLAGS=-fPIC
```
```
configure: WARNING: unrecognized options: --with-cuda-sm
```

I am attaching the config.log for the latest without using the
You need to add quotes around the full value: `NVCC="clang++ --cuda-gpu-arch=sm_80"`
Here is the raw configure line. It is passing:

```
./configure --prefix= --enable-fast=O2 --with-pm=no --with-pmi=pmi2 --with-pmi2=/opt/cray/pe/pmi/default --with-xpmem=/opt/cray/xpmem/default --with-wrapper-dl-type=rpath --enable-threads=multiple --enable-shared=yes --enable-static=no --with-namepublisher=file --with-libfabric=/opt/cray/libfabric/1.15.2.0 --with-libfabric-include=/opt/cray/libfabric/1.15.2.0/include --with-libfabric-lib=/opt/cray/libfabric/1.15.2.0/lib64 --with-device=ch4:ofi --with-ch4-shmmods=posix,xpmem --enable-thread-cs=per-vci --with-cuda=/opt/nvidia/hpc_sdk/Linux_x86_64/23.9/cuda/12.2 CPPFLAGS=-I/opt/cray/pe/pmi/default/include CC=clang CFLAGS= 'NVCC="clang++' '--cuda-gpu-arch=sm_80"' NVCC_FLAGS=-allow-unsupported-compiler CXX=clang++ FC= FCFLAGS= F77= FFLAGS= 'LIBS=-lpmi -lpmi2 -Wl,--as-needed,-lcudart,--no-as-needed -lcuda' 'LDFLAGS=-L/opt/cray/pe/pmi/default/lib -L/opt/cray/libfabric/1.15.2.0/lib64 -L/opt/nvidia/hpc_sdk/Linux_x86_64/23.9/cuda/12.2/lib64' MPICHLIB_CFLAGS=-fPIC MPICHLIB_CXXFLAGS=-fPIC MPICHLIB_FFLAGS=-fPIC MPICHLIB_FCFLAGS=-fPIC
```

and it fails with:

```
configure: error: unrecognized option: `--cuda-gpu-arch=sm_80"'
Try `./configure --help' for more information
```
What are these single quotes in the configure line? Can you remove them? `'NVCC="clang++' '--cuda-gpu-arch=sm_80"'`
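A minimal sketch of how those stray quotes typically arise (the variable name is illustrative): building up configure arguments in a plain string inside a bash script and expanding it unquoted splits on spaces, but keeps the embedded quote characters as literal text. Using an array keeps each assignment intact:

```shell
#!/usr/bin/env bash
# Broken: the embedded quotes survive word splitting as literal
# characters, producing two mangled words.
opts='NVCC="clang++ --cuda-gpu-arch=sm_80"'
printf '<%s>\n' $opts
# <NVCC="clang++>
# <--cuda-gpu-arch=sm_80">

# Correct: a bash array keeps the whole assignment as one word,
# with the quotes consumed by the shell as intended.
args=(NVCC="clang++ --cuda-gpu-arch=sm_80")
printf '<%s>\n' "${args[@]}"
# <NVCC=clang++ --cuda-gpu-arch=sm_80>
```

This is the usual reason a configure line that looks right in a script arrives at `./configure` with quote characters glued onto the arguments.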
Ok thanks. It's weird, I was doing the configuration through a bash script and it did not do what I thought it did. Thanks for pointing it out. Now it fails with:

```
In file included from /usr/lib64/gcc/x86_64-suse-linux/12/../../../../include/c++/12/cmath:47:
/usr/lib64/gcc/x86_64-suse-linux/12/../../../../include/c++/12/bits/std_abs.h:103:7: error: __float128 is not supported on this target
  103 |   abs(__float128 __x)
      |       ^
/usr/lib64/gcc/x86_64-suse-linux/12/../../../../include/c++/12/bits/std_abs.h:102:3: error: __float128 is not supported on this target
  102 |   __float128
      |   ^
/usr/lib64/gcc/x86_64-suse-linux/12/../../../../include/c++/12/bits/std_abs.h:103:18: note: '__x' defined here
  103 |   abs(__float128 __x)
```
This is what I saw on my system as well. This issue suggested adding
Even though this gets past
No timeline to supporting this configuration at the moment.
On the Polaris system here at ANL, I am able to build using
Ok. But thanks, I see a few feature requests and patches based on this issue :-)
I was actually thinking in terms of
The reason it doesn't work is that we override the host compiler for
You should be able to work around the host compiler issue by doing this in configure.
The configure script will see that you've supplied an NVCC and skip the host compiler override. I have tested it successfully on Polaris.
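Based on that description, the workaround amounts to something like the following (paths and GPU arch are illustrative; the point is simply that supplying `NVCC` yourself makes configure skip its host compiler override):

```shell
# Quote the whole value so NVCC stays a single assignment.
./configure CC=clang CXX=clang++ \
    NVCC="clang++ --cuda-gpu-arch=sm_80" \
    --with-cuda=/path/to/cuda
```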
Thanks, this worked. I think this is what I will do for now.
You can close this issue for now. But it would be great if I could be notified whenever it's possible to use
@rgayatri23 see pmodels/yaksa#251 for tracking supporting
I am trying to install mpich/4.2.0 with clang/18.0.1 on Perlmutter and get the error that clang as the C compiler does not work. Here is my configure line:

I also tested this with `nvcc` as the CUDA compiler and still get the same error. Am I missing any particular flag to allow this configuration?