libpmi: clean up simple server wire protocol code #2185
The "server" end of the PMI-1 code in libpmi was pretty crufty, lacked documentation, and was sparse on tests.
This PR implements a little refactoring that hopefully makes it clearer, and adds tests for code that wasn't being exercised.
It also removes the server code from the installed
Problem: ldd shows that the installed libpmi.so.0.0.0 depends on libczmq, libzmq, and their dependencies, but libpmi was designed to be standalone. These dependencies come from including simple_server.[ch] in the "noinst" internal libpmi.la, even though the simple_server code is for internal use only and is not needed in the installed client-only library. In libpmi, build two noinst libraries, libpmi_client.la and libpmi_server.la, and include only the former in the installed libpmi.so.
Problem: the code in libpmi/simple_server.[ch] is hard to understand. Add the rank to the pmi_simple_server_request() arguments and create per-client state, hashed by rank. The per-client state holds the multi-line spawn command parsing state and the user-supplied void *client handle. Simplify barrier handling so that a count, rather than a list of client pointers, is accumulated. On barrier exit, iterate over the client hash to generate responses. Simplify the multi-line spawn command parsing so that lines between 'mcmd' and 'endcmd' are discarded rather than stored, since the only multi-line command is spawn, which is not implemented. Add a documentation block at the top of simple_server.c that should help the next person who has to understand this code. Update flux-start and test/server_thread.c for the pmi_simple_server_request() argument change.
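To make the per-client state and counted barrier concrete, here is a minimal sketch of the idea. It is not the code in simple_server.c: every `sketch_` name is hypothetical, a fixed array indexed by rank stands in for the rank-keyed hash, and the wire strings (`cmd=barrier_in`, `cmd=barrier_out`, `cmd=spawn_result rc=-1`) are assumptions about the PMI-1 simple protocol rather than text taken from this PR.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_MAX_CLIENTS 64   /* hypothetical upper bound on ranks */
#define SKETCH_BUFSZ 64         /* size of the demo response buffers */

/* Hypothetical callback used to deliver one response line to a client. */
typedef void (*sketch_respond_f)(void *handle, const char *msg);

/* Per-client state: spawn parsing flag plus the user-supplied handle. */
struct sketch_client {
    int in_mcmd;     /* nonzero between 'mcmd' and 'endcmd' */
    void *handle;    /* opaque handle passed in with each request */
};

struct sketch_server {
    int size;                /* number of ranks */
    int barrier_count;       /* ranks that have entered the barrier */
    sketch_respond_f respond;
    struct sketch_client clients[SKETCH_MAX_CLIENTS]; /* array, not a hash */
};

void sketch_server_init(struct sketch_server *s, int size,
                        sketch_respond_f respond)
{
    assert(size > 0 && size <= SKETCH_MAX_CLIENTS);
    memset(s, 0, sizeof(*s));
    s->size = size;
    s->respond = respond;
}

/* Demo responder: assumes handle points at a SKETCH_BUFSZ char buffer. */
void sketch_record_response(void *handle, const char *msg)
{
    snprintf((char *)handle, SKETCH_BUFSZ, "%s", msg);
}

/* Handle one request line from 'rank'.
 * Returns the number of responses sent, or -1 for an unknown command. */
int sketch_request(struct sketch_server *s, const char *line,
                   int rank, void *handle)
{
    struct sketch_client *c;
    int i, sent = 0;

    assert(rank >= 0 && rank < s->size);
    c = &s->clients[rank];
    c->handle = handle;

    if (c->in_mcmd) {
        /* Lines inside a multi-line command are discarded, not stored. */
        if (strcmp(line, "endcmd\n") == 0) {
            c->in_mcmd = 0;
            s->respond(c->handle, "cmd=spawn_result rc=-1\n");
            sent = 1;
        }
        return sent;
    }
    if (strncmp(line, "mcmd=", 5) == 0) {
        c->in_mcmd = 1;
        return 0;
    }
    if (strcmp(line, "cmd=barrier_in\n") == 0) {
        /* Accumulate a count; on the last entry, answer every client. */
        if (++s->barrier_count == s->size) {
            for (i = 0; i < s->size; i++) {
                s->respond(s->clients[i].handle, "cmd=barrier_out\n");
                sent++;
            }
            s->barrier_count = 0;
        }
        return sent;
    }
    return -1;
}
```

Because each rank's handle is remembered in its own slot, the barrier needs no list of waiting clients: the last rank to arrive triggers a walk over all slots, and the counter resets for the next barrier.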
Problem: there is no test coverage for server-side interpretation of publish, unpublish, lookup, and spawn requests. These requests all return PMI_FAIL and are not fully implemented on the server, but we should at least exercise the code and make sure it gets the response to the user. Add dummy client code to the test and make each request. Enable tracing on the server side and route trace telemetry through libtap diag(), then remove a few explicit diag() calls that are now redundant.
@@            Coverage Diff             @@
##           master    #2185      +/-   ##
==========================================
+ Coverage    80.9%   81.06%   +0.16%
==========================================
  Files         199      199
  Lines       31715    31688      -27
==========================================
+ Hits        25658    25687      +29
+ Misses       6057     6001      -56