rework local connector to be fully nonblocking + mod_main() prototype fix #289
Oh, great improvement! Do the flood ping tests actually end up being good verification that the local-connector module doesn't ever block? Is there a test where you try doing "bad stuff" in the client and ensure you don't block the broker plugin?

I feel a bit bad about this next comment, but... This probably isn't worth holding up this PR, but at the beginning of a new naming convention (as in My input would be that in a naming schema, the most general name should come first. It isn't that important, but it does have some nice side effects (like in This most general name first convention seems to be pretty standard across various projects (think Lua/Python package names, Go packages, etc.). With Go you can even have qualifiers including a domain name, e.g.

    import (
        "golang.org/x/net/context"
    )

It would be neat if you could (eventually) load broker modules in this style, e.g. if both LLNL and CEA had a scheduler you could refer to them as Sorry to co-opt this PR for my diatribe, but the new convention for naming connectors got me started...
The flood ping tests are sort of a first order verification that we aren't losing messages while exercising the non-blocking reactor logic. The Regarding naming - the other hierarchy we might consider would reflect modules loading modules. For example, if "sched" loads a "backfill" module, a (future) recursive
Well I'd prefer that slightly, however it seems like a lot of work to make that change.
No it seems reasonable and it won't be a lot of work. I'll go ahead and do that.
RFC 5 defines the mod_main() function prototype to be

    int mod_main (void *context, int argc, char **argv);

We were still mid-transition from zhash_t arguments. Complete the transition by:
- broker: call mod_main() with RFC 5 arguments
- kvs module: parse key=val arguments
- pymod module: convert internally to zhash_t
- connector-local module: sockpath=path converted to fixed argv[0] argument. There are currently no users of this.
- other in-tree modules: update mod_main() prototype, but no arguments are handled
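A minimal sketch of how a module could consume argc, argv style arguments of the key=val form mentioned for the kvs module. The helper parse_keyval() and the 64-byte key limit are hypothetical and not taken from flux-core; mod_main() only follows the prototype quoted in the commit message and ignores its context argument.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of flux-core): split "key=val".
 * Copies the key into 'key' and points '*val' at the text after '='.
 * Returns 0 on success, -1 if '=' is missing, the key is empty,
 * or the key does not fit in 'keysz' bytes. */
static int parse_keyval (const char *arg, char *key, size_t keysz,
                         const char **val)
{
    const char *eq;
    size_t klen;

    assert (arg != NULL && key != NULL && val != NULL);
    eq = strchr (arg, '=');
    if (!eq || eq == arg)
        return -1;
    klen = (size_t)(eq - arg);
    if (klen >= keysz)
        return -1;
    memcpy (key, arg, klen);
    key[klen] = '\0';
    *val = eq + 1;
    return 0;
}

/* Prototype as quoted from RFC 5 in the commit message above. */
int mod_main (void *context, int argc, char **argv)
{
    int i;

    (void)context;              /* unused in this sketch */
    for (i = 0; i < argc; i++) {
        char key[64];
        const char *val;

        if (parse_keyval (argv[i], key, sizeof (key), &val) < 0) {
            fprintf (stderr, "malformed argument: %s\n", argv[i]);
            return -1;
        }
        printf ("%s -> %s\n", key, val);
    }
    return 0;
}
```

A module that takes no arguments, like the "other in-tree modules" in the list, would simply ignore argc and argv.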
flux_msg_encode() and flux_msg_decode() are simply flux_msg_t alternatives to zmsg_encode() and zmsg_decode().
Add replacements for zfd_send(), zfd_recv() that use flux_msg_t and handle non-blocking file descriptors properly:

    int flux_msg_sendfd (int fd, const flux_msg_t *msg, struct flux_msg_iobuf *iobuf);
    flux_msg_t *flux_msg_recvfd (int fd, struct flux_msg_iobuf *iobuf);

On EOF, recvfd returns NULL with errno set to EPROTO.

If iobuf is non-NULL, and EWOULDBLOCK/EAGAIN is returned, you may call the function again with identical arguments to continue reading/writing.

The 'iobuf' may be NULL if the file descriptor is in blocking mode. If iobuf is NULL and an EWOULDBLOCK/EAGAIN is encountered, this is internally converted to an EPROTO error, after which the channel is in an undefined state.

iobuf should be initialized with flux_msg_iobuf_init() before use. While EWOULDBLOCK/EAGAIN handling is in progress, iobuf contains internally allocated storage. If you need to dispose of this, call flux_msg_iobuf_clean(), which is a no-op if no internal storage is allocated.
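The "call again with identical arguments" contract can be illustrated with plain POSIX calls. This is a sketch of the general technique only, not the flux-core implementation: struct iobuf, iobuf_init(), iobuf_clean(), send_frame(), set_nonblocking() and the 4-byte length prefix are all hypothetical stand-ins. Unlike flux_msg_sendfd(), this sketch requires a non-NULL iobuf.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical stand-in for struct flux_msg_iobuf: remembers how much
 * of a frame has been written so a later call can resume. */
struct iobuf {
    uint8_t *buf;   /* internally allocated frame, NULL when idle */
    size_t size;    /* total bytes in buf */
    size_t done;    /* bytes already written */
};

void iobuf_init (struct iobuf *io)
{
    memset (io, 0, sizeof (*io));
}

void iobuf_clean (struct iobuf *io)
{
    free (io->buf);     /* free (NULL) is a no-op */
    iobuf_init (io);
}

int set_nonblocking (int fd)
{
    int flags = fcntl (fd, F_GETFL);
    if (flags < 0)
        return -1;
    return fcntl (fd, F_SETFL, flags | O_NONBLOCK);
}

/* Write a 4-byte length prefix followed by 'len' bytes of 'data'.
 * Returns 0 once the whole frame is written, -1 with errno set
 * otherwise.  If errno is EAGAIN/EWOULDBLOCK, progress is kept in
 * 'io'; call again with the same arguments when fd is writable. */
int send_frame (int fd, const void *data, uint32_t len, struct iobuf *io)
{
    assert (io != NULL);
    if (!io->buf) {                     /* first call for this frame */
        io->size = sizeof (len) + len;
        if (!(io->buf = malloc (io->size)))
            return -1;
        memcpy (io->buf, &len, sizeof (len));
        if (len > 0)
            memcpy (io->buf + sizeof (len), data, len);
        io->done = 0;
    }
    while (io->done < io->size) {
        ssize_t n = write (fd, io->buf + io->done, io->size - io->done);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;                  /* io still holds the frame */
        }
        io->done += (size_t)n;
    }
    iobuf_clean (io);                   /* frame complete, release storage */
    return 0;
}
```

A caller would typically retry send_frame() from a writable-fd watcher in its event loop, and call iobuf_clean() if it abandons a frame partway through.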
Implement non-blocking reactor handling for client send/recv path. Add a message queue for each client, where messages for the client can be stored while the client is not ready for writing. Also: address possible starvation issue in the client read path by handling at most one message before letting the reactor run. Fixes flux-framework#83
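The per-client queue idea can be sketched as follows: messages are appended to a FIFO, and whenever the descriptor is reported writable the queue is drained until the kernel would block. This is a self-contained illustration using raw byte buffers rather than flux_msg_t; struct client, struct pending, client_init(), client_enqueue() and client_flush() are hypothetical names, not the connector's actual code.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

struct pending {
    struct pending *next;
    size_t len;             /* total bytes in data */
    size_t off;             /* bytes already written */
    char data[];            /* copy of the message */
};

struct client {
    int fd;                         /* non-blocking descriptor */
    struct pending *head, *tail;    /* FIFO of messages awaiting write */
};

/* Put fd in non-blocking mode and start with an empty queue. */
int client_init (struct client *c, int fd)
{
    int flags = fcntl (fd, F_GETFL);
    if (flags < 0 || fcntl (fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;
    c->fd = fd;
    c->head = c->tail = NULL;
    return 0;
}

/* Queue a copy of msg.  Returns 0 on success, -1 if out of memory. */
int client_enqueue (struct client *c, const void *msg, size_t len)
{
    struct pending *p = malloc (sizeof (*p) + len);
    if (!p)
        return -1;
    p->next = NULL;
    p->len = len;
    p->off = 0;
    if (len > 0)
        memcpy (p->data, msg, len);
    if (c->tail)
        c->tail->next = p;
    else
        c->head = p;
    c->tail = p;
    return 0;
}

/* Call when c->fd is writable.  Returns 0 if the queue is now empty,
 * 1 if messages remain (keep watching for writability), -1 on error. */
int client_flush (struct client *c)
{
    while (c->head) {
        struct pending *p = c->head;
        ssize_t n = write (c->fd, p->data + p->off, p->len - p->off);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            if (errno == EAGAIN || errno == EWOULDBLOCK)
                return 1;           /* client not ready, try later */
            return -1;
        }
        p->off += (size_t)n;
        if (p->off == p->len) {     /* message fully sent, dequeue it */
            c->head = p->next;
            if (!c->head)
                c->tail = NULL;
            free (p);
        }
    }
    return 0;
}
```

With this shape, a slow client only grows its own queue; the sender never waits on the descriptor, which is the property the flood ping tests aim to exercise.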
Now flux_send() on this connector no longer ignores FLUX_O_NONBLOCK.
zfd_send() and zfd_recv() were never designed for nonblocking I/O. They also lived in libutil and used zmsg_t, and since libutil doesn't depend on anything in libflux, couldn't be converted to flux_msg_t. Better functions are now provided in libflux, and users converted, so these are euthanized.
This test wasn't driven by anything in the check target, and it was using the zfd class that was just removed. Vanquished!
Add basic tests for encode/decode and sendfd/recvfd. Update message test to flux_msg_t (mostly).
And update the dictionary.
In preparation for a sharness test that pumps some large messages through the local connector using flux-ping, add a bit more verification that data has not been mangled.
With --batch --count, responses are not processed until count requests have been sent. This may be useful for testing the non-blocking client write path in the local-connector module.
Run flood pings of various sizes and counts under timeout. If any message is mangled or lost, the test should fail because flux-ping will hang and the sharness timer will catch it. This is a stress test of sorts on the local connector and the corresponding 'local-connector' comms module.
Just forced a push with local-connector renamed to connector-local.
Looks good! Also checked out on my c9.io image. Merging...
This change goes along with flux-framework/flux-core#289 where the prototype of mod_main() changed to use argc, argv style argument passing rather than a zhash_t.
In this PR the "api" module is renamed to "local-connector", is updated to use new reactor/message interfaces, and is restructured to implement non-blocking management of client (unix domain socket) file descriptors. In addition, the connector itself now honors FLUX_O_NONBLOCK if specified for its methods backing flux_send() and flux_recv(). This fixes #83.

flux_msg_t now has some new functions with unit tests and man pages:
- flux_msg_encode() to be used instead of zmsg_encode()
- flux_msg_decode() to be used instead of zmsg_decode()
- flux_msg_sendfd() to be used instead of zfd_send()
- flux_msg_recvfd() to be used instead of zfd_recv()

The zfd functions, now removed, were a holdout user of zmsg_t, and were designed for blocking operation and hacked to be "a little bit non-blocking". Gone!

A sharness test t0007-ping.t was added along with some enhancements to the flux-ping program to move messages through the connector in both directions to exercise the non-blocking client management.

Finally, an unrelated change: mod_main() in all in-tree comms modules was updated to use argc, argv style arguments, as specified in RFC 5, rather than a zhash_t. (This will require a change to out-of-tree modules like those in flux-sched; I will submit a PR to flux-sched.)