Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve PMI tracing and support PMI-2 subset for OpenMPI #747

Merged
merged 6 commits into from Aug 10, 2016

Conversation

Projects
None yet
3 participants
@garlick
Copy link
Member

garlick commented Aug 10, 2016

The main point of this PR is to support OpenMPI when configured for SLURM's pmi2 MPI personality. When configured this way, OpenMPI plugins are linked against SLURM's libpmi2.so.

Although SLURM's libpmi2.so is a wire protocol client, it does not honor the protocol version negotiation as noted in #746. To get OpenMPI working, Flux can provide a stripped down libpmi2.so, internally implemented on top of PMI-1, then set LD_LIBRARY_PATH to include this library.

This seems to circumvent the problem in tests with a hello world MPI program.

@garlick garlick added the review label Aug 10, 2016

@garlick garlick force-pushed the garlick:pmi_opal_tweaks branch from 7257647 to 4e28bdc Aug 10, 2016

@coveralls

This comment has been minimized.

Copy link

coveralls commented Aug 10, 2016

Coverage Status

Coverage decreased (-0.3%) to 74.96% when pulling 4e28bdc on garlick:pmi_opal_tweaks into 2556295 on flux-framework:master.

@coveralls

This comment has been minimized.

Copy link

coveralls commented Aug 10, 2016

Coverage Status

Coverage decreased (-0.4%) to 74.87% when pulling 4348439 on garlick:pmi_opal_tweaks into 2556295 on flux-framework:master.

@garlick garlick force-pushed the garlick:pmi_opal_tweaks branch from 4348439 to 733822d Aug 10, 2016

@coveralls

This comment has been minimized.

Copy link

coveralls commented Aug 10, 2016

Coverage Status

Coverage decreased (-0.3%) to 74.923% when pulling 733822d on garlick:pmi_opal_tweaks into 2556295 on flux-framework:master.

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Aug 10, 2016

I just fixed the problem with the test failures, which actually were exposing a problem in the way the PMI client sized its buffers used for wire protocol messages. In addition, I added a callback for server tracing so instead of just calling fprintf() directly, the place where the PMI server is embedded (flux-start, wrexecd) gets a callback and can print or log or whatever from there. I was thinking this might be a place where we could do the sort of "workload capture" discussed in #599, but for now it improves the trace output a little - instead of displaying a pointer as a client identifier, it can decode and display the rank.

I'm not sure why coveralls thinks test coverage has decreased. Hmm.

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Aug 10, 2016

Yeah, too bad coveralls doesn't support a "coverage diff", but codecov.io says it supports this! In fact it supposedly can put coverage diffs directly into PRs. Hmm... Might have to play with that one.

On topic -- this PR looks good to me! Ready for merge?

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Aug 10, 2016

OK.

garlick added some commits Aug 8, 2016

libpmi: cache rank early if debugging
Problem: debugging output is prefixed with "<rank>:", however the rank
is reported as -1 for all calls before the user calls PMI_Get_Rank().

Add code to call PMI_Get_Rank() inside PMI_Init() if debugging is
enabled so we can get better debug traces.
libpmi: add trace flag
Instead of testing an environment variable to enable tracing,
add a 'flags' argument to pmi_simple_server_create(), and a
PMI_SIMPLE_SERVER_TRACE flag.

Then add options to flux-start and flux-wreckrun:
   flux start --trace-pmi-server ...
   flux wreckrun --options=trace-pmi-server
libpmi: provide a subset of PMI-2 API externally
Problem: when OpenMPI is configured to use SLURM's pmi2 MPI
personality, it links against SLURM's libpmi2.so, which connects
to SLURM's PMI-2 server using the PMI-2 wire protocol.
Unfortunately, SLURM's implementation ignores the protocol version
handshake and does not negotiate down to version 1.1 to work
with Flux like MPICH does natively.

Solution: Emulate a subset of the PMI-2 API and provide it in
a Flux libpmi2.so.  Coerce OpenMPI programs to use it by setting
LD_LIBRARY_PATH.  This libpmi2.so is implemented in terms of the
PMI-1 API, which in turn invokes the Flux PMI-1.1 wire protocol.

Fixes #746
wreck/lua.d: add openmpi.lua
Add an "MPI personality" for OpenMPI that sets LD_LIBRARY_PATH
to find Flux libpmi2.so.
libpmi: conform client bufsize to server
Problem: simple_client and simple_server separately hardwired
the buffer size for wire protocol messages.  If the server's value
is larger than the client's, the client cannot handle key, value,
and name strings that are within limits obtained from the server
but too large to be marshalled by the client.

Drop the pmi_simple_server_get_maxrequest() server API accessor
and define SIMPLE_MAX_PROTO_LINE in simple_server.h.  This is an
internal header - no reason to hide the fact that this value is
static.  It allows a per-client malloc to be dropped in the
flux-start(1) PMI server code and the pmi_simple.t unit test.

Instead of a static buffer, the client goes through init using
SIMPLE_MAX_PROTO_LINE, then calculates a new buffer size based
on the "maxes" returned in the handshake and afterwards uses that.
libpmi: debug_trace callback to server
In the simple_server, tracing was accomplished by directly
calling fprintf (stderr, ...) to dump request/response messages.
Replace this with a callback, and implement callbacks in
flux-start(1) and wrexecd server instances.  Besides allowing
for more flexible disposition of trace data in the future,
this allows the rank for a message to be determined and added
to he output (the rank is not known inside the server implementation).

@garlick garlick force-pushed the garlick:pmi_opal_tweaks branch from 733822d to 2d1fd8e Aug 10, 2016

@garlick garlick referenced this pull request Aug 10, 2016

Closed

0.4.0 release notes #739

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Aug 10, 2016

I rebased on the updated master and fixed the flux-start option conflict noted in #748. I think this is ready once travis passes.

@coveralls

This comment has been minimized.

Copy link

coveralls commented Aug 10, 2016

Coverage Status

Coverage decreased (-0.3%) to 75.049% when pulling 2d1fd8e on garlick:pmi_opal_tweaks into bc8ed62 on flux-framework:master.

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Aug 10, 2016

Restarted the travis gcc build - that was #512, the known jstat failure.

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Aug 10, 2016

Ok, looks good!

@grondo grondo merged commit 1fd19b2 into flux-framework:master Aug 10, 2016

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
coverage/coveralls Coverage decreased (-0.3%) to 75.049%
Details

@grondo grondo removed the review label Aug 10, 2016

@garlick garlick deleted the garlick:pmi_opal_tweaks branch Aug 11, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.