mvapich2-tce application build fails when slurm is not installed #4455
Well this works:
and then when flux runs the executable, it sets LD_LIBRARY_PATH appropriately.
How does slurm handle libpmi2? Is it just always installed in /usr/lib64? I'm not sure how we usually do this, but my first thought would be to treat it as an alternative, in the update-alternatives sense, and symlink one or the other into place depending on which is set up.
Yes.
Not a bad thought! I was thinking we could package a symlink in an RPM that is optionally installed on flux-only clusters. Sysadmins could also maintain the symlink with ansible, which is closer to the alternatives approach.
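The symlink idea could be sketched roughly as below. This is only an illustration, not something from the thread: `link_pmi2` is a hypothetical helper name, and the location of Flux's `libpmi2.so` is an assumption that should be checked against the actual install.

```shell
# Hypothetical helper: point <libdir>/libpmi2.so at a chosen provider's library.
# Usage: link_pmi2 <target-library> <libdir>
link_pmi2() {
    target=$1   # provider's libpmi2.so (its path is an assumption; verify locally)
    libdir=$2   # directory where builds look for libpmi2.so, e.g. /usr/lib64

    # The provider library must exist, or the link would dangle.
    [ -e "$target" ] || { echo "missing: $target" >&2; return 1; }

    # Refuse to clobber a regular file, such as one owned by the slurm package.
    if [ -e "$libdir/libpmi2.so" ] && [ ! -L "$libdir/libpmi2.so" ]; then
        echo "refusing to replace regular file $libdir/libpmi2.so" >&2
        return 1
    fi

    # Create the symlink, replacing any existing symlink in place.
    ln -sfn "$target" "$libdir/libpmi2.so"
}
```

On a flux-only node this might be called as `link_pmi2 /usr/lib64/flux/libpmi2.so /usr/lib64`, where the first path is a guess at where Flux installs its library. An RPM or an ansible task would do the same thing declaratively.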
I think, like mpich, mvapich does not need to link directly with this library. In fact it should have the PMI 1 wire protocol built in, so it should not even need to dlopen any PMI dso. IOW, this is a mvapich2 config issue.
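One way to check whether a given build links directly against a PMI library is to look at the `NEEDED` entries in its dynamic section. The sketch below assumes `readelf` from binutils is available; `has_pmi_dep` is a hypothetical helper name, and it only detects link-time dependencies, not libraries opened later with dlopen.

```shell
# Hypothetical helper: read `readelf -d` output on stdin and succeed if a
# PMI library (libpmi.so or libpmi2.so) is listed as a direct dependency.
has_pmi_dep() {
    # NEEDED entries look like:
    #  0x0000000000000001 (NEEDED)  Shared library: [libpmi2.so.0]
    grep -E '\(NEEDED\).*\[libpmi2?\.so' >/dev/null
}

# Example (paths are placeholders):
#   readelf -d ./hello | has_pmi_dep && echo "links directly against PMI"
#   readelf -d /path/to/libmpi.so | has_pmi_dep || echo "no direct PMI dependency"
```

If neither the application nor the MPI library shows such an entry, the build is not depending on a system-installed `libpmi2.so` at link time.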
As noted in the jira ticket mentioned above (not public), the following config options result in an mvapich that works on a flux only system and on a system with both slurm and flux installed:
I’m guessing the fortran flag is there because of the Fortran error. If so, this gets around it without disabling fortran:
`FFLAGS='-fallow-argument-mismatch'`.
…On 10 Aug 2022, at 14:39, Jim Garlick wrote:
As noted in the jira ticket mentioned above (not public), the
following config options result in an mvapich that works on a flux
only system and on a system with both slurm and flux installed
```
module --force purge
./configure \
--enable-shared \
--enable-romio \
--disable-silent-rules \
--disable-new-dtags \
--enable-threads=multiple \
--with-ch3-rank-bits=32 \
--enable-wrapper-rpath=yes \
--disable-alloc \
--enable-fast=all \
--disable-cuda \
--enable-registration-cache \
--with-device=ch3:mrail \
--with-rdma=gen2 \
--disable-mcast \
--with-file-system=lustre+nfs+ufs \
--enable-llnl-site-specific-options \
--enable-debuginfo \
--with-pm=hydra \
--prefix=/g/g0/garlick/opt/mvapich2-2.3.7-1-hydra
# --enable-fortran=all \
# --with-pmi=pmi2 --with-pm=slurm --with-slurm=/usr
```
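The `FFLAGS` suggestion from earlier in the thread could be combined with the configure options above along these lines. This is a sketch under assumptions: that the Fortran error is the argument-mismatch error that gfortran 10 and later report by default, and that older gfortran releases do not accept the flag. `fortran_compat_flags` is a hypothetical helper name.

```shell
# Hypothetical helper: print the extra Fortran flag only for gfortran
# major versions that know it (the option was introduced in GCC 10).
fortran_compat_flags() {
    major=$1
    if [ "$major" -ge 10 ]; then
        echo "-fallow-argument-mismatch"
    fi
}

# Example (not run here): re-enable Fortran in the configure step above.
#   major=$(gfortran -dumpversion | cut -d. -f1)
#   FFLAGS=$(fortran_compat_flags "$major") ./configure \
#       --enable-fortran=all \
#       ...remaining options as above...
```

With an older compiler the helper prints nothing, so `FFLAGS` stays empty and the build proceeds with default flags.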
I actually just omitted fortran to save time on the build since I wasn't going to test it. So I don't know whether I would have hit that or not. I'll go ahead and try.
Adding |
On fluke, where only Flux is installed, trying to build a simple mpi hello world program fails with:
Edit: see also https://lc.llnl.gov/jira/browse/TCE-29 (not public)