tighten up PMI implementation for OpenMPI #926
Current coverage is 76.01% (diff: 50.00%)

@@            master    #926   diff  @@
=========================================
  Files          149     149
  Lines        26006   26010     +4
  Methods          0       0
  Messages         0       0
  Branches         0       0
=========================================
+ Hits         19764   19772     +8
+ Misses        6242    6238     -4
  Partials         0       0
@garlick, this all seems pretty straightforward. I think you just need to rebase on current master if everything else is good.
Problem: the OpenMPI Flux component calls the PMI clique functions (PMI_Get_clique_size(), PMI_Get_clique_ranks()), but when our library acts as a simple v1 PMI client, they return PMI_ERR_INIT. Change the code to fall through to emulation by parsing PMI_process_mapping from the KVS.
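The clique emulation can be sketched as follows. This is a minimal illustration, not Flux's actual code: it assumes the MPICH-style encoding of PMI_process_mapping, `(vector,(nodeid,nnodes,ppn),...)`, where each block places `ppn` consecutive ranks on each of `nnodes` nodes starting at node `nodeid`, and it takes the mapping string as an argument instead of fetching it from the KVS. The names `struct map_block`, `parse_process_mapping()`, `node_of_rank()`, and `clique_ranks()` are hypothetical.

```c
#include <stdio.h>
#include <string.h>

#define MAX_BLOCKS 64

/* One "(nodeid,nnodes,ppn)" block of a PMI_process_mapping vector.
 * Hypothetical type for this sketch. */
struct map_block {
    int nodeid;   /* first node id covered by this block */
    int nodes;    /* number of consecutive nodes */
    int procs;    /* ranks placed on each node */
};

/* Parse "(vector,(n,c,p),...)" into blocks[].
 * Returns the number of blocks, or -1 on a malformed string. */
static int parse_process_mapping (const char *s, struct map_block *blocks,
                                  int maxblocks)
{
    const char *p;
    int count = 0;

    if (strncmp (s, "(vector,", 8) != 0)
        return -1;
    p = s + 8;
    while (*p == '(') {
        struct map_block b;
        int consumed = 0;

        if (count == maxblocks)
            return -1;
        /* %n is only stored if the closing ')' matched */
        if (sscanf (p, "(%d,%d,%d)%n", &b.nodeid, &b.nodes, &b.procs,
                    &consumed) != 3 || consumed == 0)
            return -1;
        if (b.nodeid < 0 || b.nodes < 1 || b.procs < 1)
            return -1;
        blocks[count++] = b;
        p += consumed;
        if (*p == ',')
            p++;
    }
    return *p == ')' ? count : -1;
}

/* Node id hosting rank, or -1 if the mapping does not cover it.
 * Ranks are assigned block by block, node by node, ppn at a time. */
static int node_of_rank (const struct map_block *b, int nblocks, int rank)
{
    int base = 0;   /* first rank of block i */

    for (int i = 0; i < nblocks; i++) {
        int size = b[i].nodes * b[i].procs;
        if (rank >= base && rank < base + size)
            return b[i].nodeid + (rank - base) / b[i].procs;
        base += size;
    }
    return -1;
}

/* Fill ranks[] with every rank sharing a node with rank.
 * Returns the clique size, or -1 on error. */
static int clique_ranks (const struct map_block *b, int nblocks, int rank,
                         int *ranks, int maxranks)
{
    int node = node_of_rank (b, nblocks, rank);
    int n = 0;
    int base = 0;

    if (node < 0)
        return -1;
    for (int i = 0; i < nblocks; i++) {
        for (int k = 0; k < b[i].nodes; k++) {
            if (b[i].nodeid + k != node)
                continue;
            for (int j = 0; j < b[i].procs; j++) {
                if (n == maxranks)
                    return -1;
                ranks[n++] = base + k * b[i].procs + j;
            }
        }
        base += b[i].nodes * b[i].procs;
    }
    return n;
}
```

For example, with `(vector,(0,2,4),(2,1,3))`, rank 5 lives on node 1 and its clique is ranks 4 through 7, so an emulated PMI_Get_clique_size() would report 4. Scanning every block for the node id, rather than only the block containing the rank, keeps the result correct if a mapping lists the same node in more than one block.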
Problem: a LOG_ERR message is generated whenever PMI_KVS_Get() fails with ENOENT, and OpenMPI appears to routinely request keys that don't exist. Suppress the log message if the error is ENOENT.
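A minimal sketch of the suppression, assuming a lookup helper that reports a missing key by failing with `errno == ENOENT`. The in-memory table, `kvs_lookup()`, `kvs_get()`, `log_err()`, and the `log_err_count` counter are hypothetical stand-ins for the real KVS access and LOG_ERR logging; only the decision of when to log mirrors the change described above.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory store standing in for the Flux KVS. */
struct kv {
    const char *key;
    const char *val;
};
static const struct kv store[] = {
    { "PMI_process_mapping", "(vector,(0,1,2))" },
};

/* Hypothetical stand-in for LOG_ERR logging: count messages so the
 * suppression is observable. */
static int log_err_count = 0;

static void log_err (const char *key, const char *why)
{
    fprintf (stderr, "PMI_KVS_Get %s: %s\n", key, why);
    log_err_count++;
}

/* Look up key. On failure return -1 with errno set (ENOENT if absent). */
static int kvs_lookup (const char *key, const char **val)
{
    for (size_t i = 0; i < sizeof (store) / sizeof (store[0]); i++) {
        if (strcmp (store[i].key, key) == 0) {
            *val = store[i].val;
            return 0;
        }
    }
    errno = ENOENT;
    return -1;
}

/* PMI_KVS_Get-style wrapper: copy the value for key into buf.
 * Returns 0 on success, -1 on failure. A missing key still fails,
 * but is not logged, since clients such as OpenMPI probe for keys
 * that may not exist. */
static int kvs_get (const char *key, char *buf, size_t len)
{
    const char *val;
    int n;

    if (kvs_lookup (key, &val) < 0) {
        if (errno != ENOENT)
            log_err (key, strerror (errno));
        return -1;
    }
    n = snprintf (buf, len, "%s", val);
    if (n < 0 || (size_t)n >= len) {
        log_err (key, "value too long for buffer");
        return -1;
    }
    return 0;
}
```

The call still fails for a missing key, so the client sees the error; only the log noise goes away, while genuinely unexpected failures are still reported.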
Problem: when Flux launches Flux, FLUX_JOB_ID and other environment variables describing the enclosing job are inherited by children of the second Flux instance, which makes it hard to tell whether a process is merely a child of the second Flux or a job spawned by it. After PMI bootstrap, the broker unsets FLUX_JOB_ID, FLUX_JOB_SIZE, and FLUX_JOB_NNODES.
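A sketch of the environment cleanup using POSIX unsetenv(). The function name `clear_enclosing_job_env()` is hypothetical; the variable list comes from the commit message, and the assumption is that the broker calls this once PMI bootstrap has finished, before it spawns any children.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdlib.h>

/* Enclosing-job variables named in the commit message. */
static const char *const job_env_vars[] = {
    "FLUX_JOB_ID",
    "FLUX_JOB_SIZE",
    "FLUX_JOB_NNODES",
};

/* Hypothetical helper: remove enclosing-job variables from the broker's
 * environment so processes it spawns later do not inherit them.
 * Returns 0 on success, -1 if any unsetenv() call fails. */
static int clear_enclosing_job_env (void)
{
    int rc = 0;

    for (size_t i = 0; i < sizeof (job_env_vars) / sizeof (job_env_vars[0]); i++) {
        if (unsetenv (job_env_vars[i]) < 0)
            rc = -1;
    }
    return rc;
}
```

unsetenv() succeeds for a variable that is not set, so a helper like this is safe to call whether or not the broker was itself launched by Flux.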
Just rebased on current master. The only thing I'm uncertain about is setting TMPDIR to
It does seem a bit strange to do this to all users' jobs by default, but I can't think of an actual problem with it. It would be nice if we could suggest to OpenMPI that they add the node rank to the shmem filename so that it works without outside intervention.
Let's go with this for now. I've probably given Ralph enough headaches for one week.
Does I do notice from the FAQ:
Can we try something like an env variable
Excellent point! Yes, it does get removed. If something like the above suggestion works, that is much better! Let me poke at that.
The new OpenMPI Flux component uses FLUX_PMI_LIBRARY_PATH at runtime to dlopen() our PMI library, so LD_LIBRARY_PATH does not need to be set for it to work in the default configuration. Therefore, don't change it in the openmpi Lua script. Do set OMPI_MCA_orte_tmpdir_base to the rank-specific directory, so that shared memory segment names do not collide when multiple Flux ranks on one (real) node each launch an MPI program with multiple MPI ranks per (Flux) node.
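The actual change lives in the openmpi Lua script; the C sketch below only illustrates the idea of exporting OMPI_MCA_orte_tmpdir_base per broker rank. It assumes the rank-specific directory is formed by appending the broker rank to a base path, which is a hypothetical layout, and `set_ompi_tmpdir()` is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: build "<base>/<rank>" in buf and export it as
 * OMPI_MCA_orte_tmpdir_base, so each broker rank on a physical node
 * hands OpenMPI a different session directory base.
 * Returns 0 on success, -1 on error. */
static int set_ompi_tmpdir (const char *base, int rank, char *buf, size_t len)
{
    size_t blen = strlen (base);
    /* avoid a doubled slash if base already ends in '/' */
    const char *sep = (blen > 0 && base[blen - 1] == '/') ? "" : "/";
    int n = snprintf (buf, len, "%s%s%d", base, sep, rank);

    if (n < 0 || (size_t)n >= len)
        return -1;   /* path did not fit in buf */
    return setenv ("OMPI_MCA_orte_tmpdir_base", buf, 1);
}
```

Because each Flux rank on a physical node gets its own directory, the shared memory files created by the OpenMPI jobs those ranks launch should end up in separate places even when the file names inside are identical.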
OK, just force-pushed that change. Seems to work!
Cool! Merging.
This PR fixes a couple more issues that came up while building OpenMPI support for Flux.
Some general cleanup, environment cleanup as discussed in #923, implementing the clique functions, and suppressing a log message for a failed PMI_KVS_Get(), since that's not necessarily a Flux error. All pretty minor.