-
Notifications
You must be signed in to change notification settings - Fork 49
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
libsubprocess: pre-exec & post-fork hooks #2152
Conversation
Oh wait, for the PRE_EXEC hook, it's executed in the child. So the |
Good observations!
I think yes. One use case might be a post-fork hook in which the parent moves the child task to another cgroup, or adjusts the subprocess before it has a chance to call
Hm, yeah that is a bit tricky, but I think it would work for the time being. We could also handle this by allowing optional
Yeah, possibly POST_FORK is not necessary -- it would be a shame to add all this code only to have one place where it is required, so I think we should think about this further. I'll try to gather some existing use cases and we can maybe go through them to see if there is a much simpler way to handle all cases. PRE_EXEC is a bit more tricky because this needs to run in the child process not the parent, and I don't think it would be safe to try to use the reactor callbacks in that context. |
It is probably safe to use the |
Regarding this, a key requirement for post_fork is that this callback happens after the |
Maybe I'll see if an implementation of #1948 using |
I like the idea of a
We'll need POST_FORK then. By the time FLUX_SUBPROCESS_STARTED is sent to the callback, the exec may have already happened. |
It seems like this might be fixable for "local" exec. We pause the child process after If it ends up being too difficult to enable this intended behavior for the FLUX_SUBPROCESS_STARTED reactor callback, then we need POST_FORK as you say. However, then I'd suggest that there isn't a real use case for FLUX_SUBPROCESS_STARTED, and we should remove that reactor callback. |
Does it seem like we made a misstep in API design by not having a way to create a Also, I'm kind of questioning my original proposal for the design of these subprocess hooks (or maybe just one hook as the case may be) in #2008, especially now that we're discussing the actual implementation. I'm really open to any changes here if anyone has ideas. For instance, I'm not really sure the libsubprocess interface for these hooks needs to support a stack of functions. The use case I had in mind was to allow the job shell or other components managing processes to be able to support a configurable stack of plugins (e.g. like Slurm spank plugins). However, support for multiple plugins would be more easily supported on top of the |
I think it'd be hard at this point to change it. The state change callback mechanism is sort of independent of anything else going on, so it might be hard to synchronize things. And obviously for remote processes, the synchronizing would be even tougher.
I see your point. I think its primary purpose is that it is the "first state" in the state change DAG that the user sees.
INIT is never reported to the user, so without STARTED, the first state a user would see is RUNNING or EXEC_FAILED. Which I suppose could be fine. It seems that STARTED is used in one place in cron.
But this was simply done right after the |
Looking back at #1331, the original design did not have aux set/get involved, so I guess it was something added later. I remember discussing it at one point in the development, and since all callbacks were called after re-entering the reactor, it was considered "safe". But that was of course before #2008. If we supported a I hate to even suggest it, but would something like:
be a good idea? |
Hm, it does solve the immediate problem, though. Another idea would be to add Another question I had looking at this again is if libsuprocess should be updated to use the common |
I thought about that as well, but wasn't sure if we might need to add different auxiliary data for each subprocess? Then the API might be inconvenient.
Internally it already does. |
Oops! yeah, I was somehow on an old branch when looking at the code 😞
I was thinking that the aux data for the flux_cmd_t parent of a flux_subprocess_t would be overridden by the aux data for each subprocess. That is, kind of like the case with |
One subtlety I disliked about having another arg variable in
is that the user now has to manage the destruction of If we no longer need to support a stack of hooks, then
Edit: I now realize the above may not make much sense. What I meant was:
It's a little tricky to determine when its safe to destroy certain data. Do we pass in a |
912cf6c
to
d383f35
Compare
re-pushed trying the approach of having a struct like
available. I think the API and implementation should be cleaner and more obvious what's going on, as well as what is the responsibility of the caller. |
Did you consider making the hooks an optional part of the ops structure? Since the hooks are the uncommon case it would be nice not to add an extra NULL parameter for the very common case. (Note: I didn't look at this it could be a dumb idea) |
I hadn't thought of that It's a good idea. I guess the minor con of doing so is adding hooks into the I think it's a minor pro/con on either direction. |
while trying to add a simple unit test for the First I did something like:
i.e. have the hook increment a counter. But it didn't work, eventually realizing, "Oh yeah, it's updating a count value that's in the child process memory space". Ok .., lets instead do
having the hook just write some data to a pipe to be read back in the parent. This didn't work either and stumbled me for a bit, until I realized, "Oh yeah,
|
Use case for pre-exec hook is to have some state created for all children in the parent, then apply it in each child before exec. In all cases the "state" will be read-only and won't need to be modified in the parent. E.g. in the job shell use case the "state" might be a structure of assigned values for the child process, e.g. local taskid, global taskid, pre-assigned default cpu affinity, etc. A good test case would be to create a |
In that case, I guess the |
Just pushed a branch that I would consider a reasonable merge candidate, going with the new |
Thanks! Perhaps I'll try an implementation in |
Is this for #1948? I can take a crack at it as well, maybe see how it works. I lack some background for the issue in question though, so could chat about it tomorrow. |
re-pushed branch, I lazily added |
Yes. I already had most of the code written before I realized there was no way to implement it. |
One very minor thing I noticed at first: typedef struct {
flux_subprocess_hook_f pre_exec_cb;
void *pre_exec_arg;
flux_subprocess_hook_f post_fork_cb;
void *post_fork_arg;
} flux_subprocess_hooks_t; For consistency with |
Proof of concept at least seemed to work out well -- here's the most relevant bit: @@ -548,6 +631,15 @@ void pmi_server_finalize (void)
pmi_simple_server_destroy (ctx.pmi.srv);
}
+void pre_exec_hook (flux_subprocess_t *p, void *arg)
+{
+ struct client *cli = arg;
+ char *s = NULL;
+ hwloc_bitmap_list_asprintf (&s, cli->cpuset);
+ fprintf (stderr, "setting cpubind for client rank=%d: %s\n", cli->rank, s);
+ hwloc_set_cpubind (topology, cli->cpuset, 0);
+}
+
int client_run (struct client *cli)
{
flux_subprocess_ops_t ops = {
@@ -557,6 +649,10 @@ int client_run (struct client *cli)
.on_stdout = NULL,
.on_stderr = NULL,
};
+ flux_subprocess_hooks_t hooks = {
+ .pre_exec_cb = pre_exec_hook,
+ .pre_exec_arg = cli,
+ };
/* We want stdio fallthrough so subprocess can capture tty if
* necessary (i.e. an interactive shell)
*/
@@ -564,7 +660,7 @@ int client_run (struct client *cli)
FLUX_SUBPROCESS_FLAGS_STDIO_FALLTHROUGH,
cli->cmd,
&ops,
- NULL)))
+ &hooks)))
log_err_exit ("flux_exec");
if (flux_subprocess_aux_set (cli->p, "cli", cli, NULL) < 0)
log_err_exit ("flux_subprocess_aux_set");
grondo@asp:~/git/flux-core.git$ src/cmd/flux start -s 2
flux-start: warning: setting --bootstrap=selfpmi due to --size option
setting cpubind for client rank=0: 0-1
setting cpubind for client rank=1: 2-3
grondo@asp:~/git/flux-core.git$ flux hwloc info
2 Machines, 4 Cores, 4 PUs |
Add flux_subprocess_hooks_t argument to flux_exec() and flux_local_exec(), to allow caller to run special callbacks right before child executes command, in both child and parent. Also add `in_hook` flag to ensure that many libsubprocess API functions cannot be called inside a hook callback. Add unit tests and update callers appropriately. Fixes flux-framework#2008
Re-pushed fixing that nit. |
Codecov Report
@@ Coverage Diff @@
## master #2152 +/- ##
==========================================
- Coverage 80.47% 80.47% -0.01%
==========================================
Files 200 200
Lines 31817 31831 +14
==========================================
+ Hits 25604 25615 +11
- Misses 6213 6216 +3
|
You happy with this one @chu11? Happy to merge this as is, any minor tweaks can be done later. |
yeah, I'm happy with it, it seems to accomplish what is needed at this point. |
WIP for #2008 . Ignore most of this code & commits, which are garbage and I haven't even tested really. Mostly posting this to get initial thoughts on the API in
subprocess.h
. After going back and forth on a few ideas, I think this is not bad, but there's one hairy issue which I detail below.So basically, we have a new
flux_subprocess_hooks_t
data structure that users create/setup and pass intoflux_exec()
orflux_local_exec()
as a new arg. Pros / cons I found along the way:I considered adding "hooks" to
flux_cmd_t
, but I did not like the fact that hooks only work against local subprocess and not remote ones. So I felt the the hook API/infrastructure had to be outside of the thecmd
part.It's a tad inconsistent that we have
flux_cmd_t
which is a created/destroyed object,flux_subprocess_ops_t
, which is just a struct, and nowflux_subprocess_hooks_t
which is a created/destroyed object. I guess it's not a big deal, but that's there.I assume the hook callbacks need to be able to call accessors, such as
flux_subprocess_pid()
? I think there are racy bits here and there that need to be handled for that case. But that made me wonder, do hook callbacks need access to theflux_subprocess_t
structure? If no, that makes things easier.The callback set in
flux_subprocess_hooks_add()
uses the currentflux_subprocess_f
function prototype (which I picked half randomly), which is justvoid (*flux_subprocess_f) (flux_subprocess_t *p);
. So right now there is no way to pass an extra argument to the callback. Is this important?flux_subprocess_aux_set()
, b/c there's no way to know if a process has been forked yet or not by the time a user callsflux_subprocess_aux_set()
.subprocess_fork()
andsubprocess_exec()
that allowed callers to pause between hook points.libsubprocess
API in the general case b/c callbacks are handled via the reactor. So as long asflux_subprocess_aux_set()
is called before re-entering the reactor, you're guaranteed to get your aux pointer in any of theops
callbacks.POST_FORK
hook could be handled via a reactor callback, but I'm assuming thePRE_EXEC
cannot, but didn't look into this too much.void *arg
argument to the hook callback? Andflux_subprocess_hooks_add()
would be changed to something likeflux_subprocess_hooks_add (p, type, cb, arg, arg_free)
?