Improve PMI tracing and support PMI-2 subset for OpenMPI #747
The main point of this PR is to support OpenMPI when configured for SLURM's pmi2 MPI personality. When configured this way, OpenMPI plugins are linked against SLURM's libpmi2.so.
Although SLURM's libpmi2.so speaks the PMI-2 wire protocol as a client, it does not honor protocol version negotiation, as noted in #746. To get OpenMPI working, Flux can provide a stripped-down libpmi2.so, implemented internally on top of PMI-1, and set LD_LIBRARY_PATH so this library is found first.
In testing with a hello-world MPI program, this seems to circumvent the problem.
I just fixed the test failures, which were actually exposing a problem in how the PMI client sized the buffers it uses for wire protocol messages. I also added a callback for server tracing: instead of calling fprintf() directly, the place where the PMI server is embedded (flux-start, wrexecd) gets a callback and can print, log, or whatever from there. This might be a place where we could do the sort of "workload capture" discussed in #599, but for now it improves the trace output a little: instead of displaying a pointer as a client identifier, it can decode and display the rank.
I'm not sure why coveralls thinks test coverage has decreased. Hmm.