flux-start: add embedded server #3650
This pull request introduces 1 alert when merging 5d3b389 into 166bff3.
(branch updated from fad351f to d94e8cc)
This is pretty neat. My mind is starting to explore all the neat things for which this could be used. The flux-shell application is a good example. Does this open up the ability for programs to route messages as well? How do you derive the URI for one of these connections? Could there be a special URI to attach to a named process with
librouter has other abstractions that make that relatively straightforward (e.g. see
Heh, do you mean having a special URI path that instructs flux_open to internally substitute the final component of FLUX_URI? That's kind of a neat idea that would save a bit of code. Let me think about that one. Edit:
Well, there is some precedent for this (I'm thinking of something like Sorry if the idea got you a little off track.
I'm not sure that would work for the shell, since if there are multiple jobs running under the same broker, the rundir is not unique. Also, the shell may not have write permission there to create a socket (as a guest). There might be other approaches to "socket discovery" that could make a flux ecosystem easier to navigate, but maybe that's a topic for another day?
Just realized I had not documented another caveat - that the server
Yeah, duh, that is a problem. 🤦 Yeah, we'll just kick the can down the road awhile longer. It is not likely we'll add this capability to the shell anytime soon anyway, and there's no use discussing this for test-only flux-start use.
I force-pushed a change to set a new environment variable
Also, Edit: Realized I didn't make it clear - I'm fine with the implementation as it is now. The suggestion to put the server behind a I do think the
As discussed at coffee, I don't think it's necessary to provide an option to enable/disable this, as the service is low overhead and seems unlikely to get in the way. We could easily add it later if needed. We can't sanitize FLUX_START_URI in the broker because we need it to be passed through when a test program is run as the initial program. Should we sanitize it in the job shell, e.g.:
diff --git a/src/shell/task.c b/src/shell/task.c
index 5d1912db7..a90123d01 100644
--- a/src/shell/task.c
+++ b/src/shell/task.c
@@ -150,6 +150,9 @@ struct shell_task *shell_task_create (struct shell_info *info,
                           getenv ("FLUX_KVS_NAMESPACE")) < 0)
             goto error;
     }
+    /* Sanitize FLUX variables that should not propagate to jobs.
+     */
+    flux_cmd_unsetenv (task->cmd, "FLUX_START_URI");
     return task;
 error:
     shell_task_destroy (task);
It feels slightly wrong, like we're on a slippery slope to complex environment propagation rules. Maybe it's a better idea to revert the change that added the env var?
I think you had convinced me that propagation of
OK, then let's leave it in and press forward. Thanks for the discussion! I restarted CI - one builder got stuck on that shell input sharness test, which I think is a known one.
As discussed, LGTM!
@@ -191,7 +191,7 @@ int flux_msg_set_flags (flux_msg_t *msg, uint8_t flags);
 /* Get/set string payload.
  * flux_msg_set_string() accepts a NULL 's' (no payload).
  * flux_msg_get_string() will set 's' to NULL if there is no payload
- * N.B. the raw paylaod includes C string \0 terminator.
+ * N.B. the raw payload includes C string \0 terminator.
commit message typo? I see only one comment misspelling, not two :-)
Huh, weird. I could have sworn there were two. Oh, found the same misspelled word in event.h. I'll tack that on and fix the commit message.
Problem: spelling errors in header comments

Correct spelling.

Problem: some function protos spill over 80 columns, and others don't line up.

Consistently indent parameters that span lines. Break long lines where needed. Drop duplicate inline comments that happened to run over 80 cols.

Problem: need to add "server" functionality to flux-start.

Add a generic "usock_service_create()" function that establishes a usock (PF_LOCAL) socket listener, and returns a server-side flux_t handle. Clients may use the flux API to flux_open() local://${path} and access services registered on the server side flux_t handle in the usual way.

Limitations:
- guest users are rejected (message cred.userid != server userid)
- event messages may not be published or subscribed to
- clients may not register services
- rank routing information in requests is ignored
- a request with topic 'disconnect' is sent upon client disconnect, not "<service>.disconnect" as specified in RFC 6.
- server flux_t handle requires async reactor operation (one cannot call flux_recv() in a loop and expect it to make progress)

Add simple unit test.
Problem: tests that involve starting and stopping brokers are difficult to orchestrate using flux-start, but we will need support for running such tests in CI.

Use the new usock_service to embed a server in flux-start. The server creates a socket named 'start' in the rundir, and sets FLUX_START_URI in the environment for clients.

Currently the server has support for the following methods:

  start.status
    Return an array of procs that includes broker PIDs in rank order

  disconnect
    Log receipt of disconnect message. This is a placeholder for future streaming socket management.

Add a test front end command that provides the client side for the start.status RPC, and is available to add sub-commands for simple, shell script driven testing. More sophisticated, event driven test programs would be written in python and combine broker and flux-start communication.

Add a few tests to t0001-basic.t to exercise basic function.
Codecov Report
@@            Coverage Diff             @@
##           master    #3650      +/-   ##
==========================================
- Coverage   82.76%   82.74%   -0.02%
==========================================
  Files         325      326       +1
  Lines       49025    49166     +141
==========================================
+ Hits        40576    40684     +108
- Misses       8449     8482      +33
OK, force-pushed with a slightly expanded set of spelling corrections and a more vague header :-) Thanks for that. I'll set MWP.
This adds a convenience function for setting up an embedded server that listens on a local:// socket and routes messages between connected clients and a server side flux_t handle. It's a bit of an abomination if you think about it too hard, but it does let arbitrary programs offer "services" in the usual way by registering message handlers, and lets clients connect to it and use RPCs the way they would with flux, so it's pretty convenient. Maybe this will come in handy if the shell needs to offer a standalone service to applications, especially if that application is flux?
A server is then embedded in flux-start for the purpose of enabling system tests that restart brokers. In this PR, the only service method just lists broker PIDs. I thought getting this much in with tests might be a good place to cut this PR and start the next one?