Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: flux-shell: support signal forwarding #2244

Merged
merged 21 commits into from Jul 25, 2019

Conversation

@grondo
Copy link
Contributor

commented Jul 17, 2019

This PR is in progress, but since the current code is functional, I'll post it for early review.
Still needs some form of testing, but other open questions (some of which already discussed with @garlick)

  • The job-kill event name has the jobid embedded currently, e.g. job-kill.<jobid>, so that the shell only needs to subscribe to events for the current jobid. In talks with @garlick, this may have been a false economy. Easy to change now if necessary.
  • The job-manager.kill RPC only supports signaling all tasks in a job. @garlick had wondered if it would be useful to allow signaling a subset of the tasks. This would be easy to implement (but would require a little extra work and more importantly extra tests)

Other TODO

  • add signal handler to job shell to forward some signals, allow special handling of other signals(?) (main use case is testing in standalone mode)
  • improve job exception handling in job-exec by use of job-kill event (?)

Fixes #2191

@grondo grondo force-pushed the grondo:job-kill branch from 35731b2 to dfbd597 Jul 17, 2019
@dongahn

This comment has been minimized.

Copy link
Contributor

commented Jul 18, 2019

The job-manager.kill RPC only supports signaling all tasks in a job. @garlick had wondered if it would be useful to allow signaling a subset of the tasks. This would be easy to implement (but would require a little extra work and more importantly extra tests)

If this is easy enough to do, I would go for it. For computational steering, I remember LLNL's common cases were to act on the master rank or all of the rank processes. I can imagine a scheme in between in the future with coupled and/or ensembles approach being increasingly common.

@garlick

This comment has been minimized.

Copy link
Member

commented Jul 18, 2019

I think the event topic is fine as is. We can always revist during scale testing if the number of subscriptions registered with the broker turns out to be a factor, which it probably is not.

In the interest of keeping our schedule, I'd suggest we open an issue on flux job kill IDSET with @dongahn's rationale quoted, and defer until we have time.

This looks great overall to me.

@grondo grondo force-pushed the grondo:job-kill branch from c37210a to 0bc0a57 Jul 19, 2019
@grondo

This comment has been minimized.

Copy link
Contributor Author

commented Jul 19, 2019

Ok, I've pushed a set of incremental changes here, to be squashed later:

  • Moved the kill_all_tasks function to shell.h as flux_shell_killall() for use from multiple shell modules
  • Added local signal handlers to the shell for SIGTERM and SIGINT so that the shell itself can forward these signals to tasks instead of dying itself.
  • Added a bunch of tests for flux job kill and the job-manager.kill RPC, etc.

The only TODO left here is to modify job-exec exception handling to use SIGTERM instead of SIGKILL to terminate a job after an exception.

Copy link
Member

left a comment

This seems like it's close to ready. I added a couple minor suggestions inline.

if (!(rolemask & FLUX_ROLE_OWNER) && userid != getuid ()) {
log_msg ("kill: invalid userid %u", userid);
return;
}
if (flux_msg_unpack (msg, "{s:i}", "signum", &signum) < 0) {

This comment has been minimized.

Copy link
@garlick

garlick Jul 19, 2019

Member

Suggestion: flux_event_unpack().

This comment has been minimized.

Copy link
@grondo

grondo Jul 19, 2019

Author Contributor

Good suggestion. However, isn't flux_msg_unpack() == flux_event_unpack() if you don't pass storage for topic? (Here we don't need the topic, just the payload) And we already are guaranteed FLUX_MSGTYPE_EVENT because of FLUX_MATCH_EVENT used on msghandler.

This comment has been minimized.

Copy link
@garlick

garlick Jul 19, 2019

Member

True, just for clarity I suppose, or the general argument that the event format could change. Meh, not a big deal.

This comment has been minimized.

Copy link
@grondo

grondo Jul 19, 2019

Author Contributor

Hm, good point. If event format might change, though, we might want to have flux_event_t instead of flux_msg_t to avoid confusion down the road..

This comment has been minimized.

Copy link
@garlick

garlick Jul 19, 2019

Member

Not today, but point taken.

if (flux_msg_unpack (msg, "{s:i}", "signum", &signum) < 0) {
log_msg ("kill: invalid kill event msg payload");
const char *p;

This comment has been minimized.

Copy link
@garlick

garlick Jul 19, 2019

Member

I assume this was here to debug a problem, but it's perhaps overkill now?

This comment has been minimized.

Copy link
@grondo

grondo Jul 19, 2019

Author Contributor

Yes, sorry that crept in!

*
* (ignoring SIGRT*)
*/
static const char *sigmap[] = {

This comment has been minimized.

Copy link
@garlick

garlick Jul 19, 2019

Member

Duplicate of sys_siglist[]? (see psignal(3))

This comment has been minimized.

Copy link
@garlick

garlick Jul 19, 2019

Member

oh duh, not! (forehead slap)

This comment has been minimized.

Copy link
@grondo

grondo Jul 19, 2019

Author Contributor

I did feel this was a bit hacky, and should at least wrap each definition in #ifdef SIGXXX like is done in coreutils kill.c -- however, I felt maybe this was ok for now and could be "fixed" later?

This comment has been minimized.

Copy link
@garlick

garlick Jul 19, 2019

Member

Agreed.

@@ -277,6 +285,9 @@ int main (int argc, char *argv[])
if (!(shell.io = shell_io_create (shell.h, shell.info)))
log_err_exit ("shell_io_create");

if (shell_pre_exec (&shell) < 0)

This comment has been minimized.

Copy link
@garlick

garlick Jul 19, 2019

Member

Were you wanting to move all pre-exec initialization to this function? (Fine by me)

This comment has been minimized.

Copy link
@grondo

grondo Jul 19, 2019

Author Contributor

I wasn't thinking of moving all pre-exec initialization to this function, but was thinking the shell_pre_exec() function as a start of a hook for plugins. I was thinking that the signals and kill-event handlers setup fit well into this model as kind of internal plugins. The other initialization being done in the shell somehow seemed more fundamental, so didn't lend itself to easily move into this function. However, maybe at least pmi initialization should go in there...

How about I tack this on to the end once I've had a chance to fully complete this feature (starting to get lost with the number of fixup commits...)

This comment has been minimized.

Copy link
@garlick

garlick Jul 19, 2019

Member

Glad you're forward thinking about plugins! Sure, however you like.

@grondo grondo force-pushed the grondo:job-kill branch from 0bc0a57 to f78ebc5 Jul 19, 2019
@grondo

This comment has been minimized.

Copy link
Contributor Author

commented Jul 19, 2019

Just squashed all the fixup! commits to make the PR a bit more sane. One or two commits still pending on the job-exec module...

@grondo

This comment has been minimized.

Copy link
Contributor Author

commented Jul 19, 2019

Also - based on a suggestion by @garlick, the job-kill.<jobid> event will be renamed to shell-<jobid>.kill, and the shell will subscribe to shell-<jobid>.* events at the top level, so the kill module (and any other plugin that wants to receive events) only needs to set up the msg handler for events of interest.

@grondo

This comment has been minimized.

Copy link
Contributor Author

commented Jul 20, 2019

Some updates here:

  • move event subscription to shell initialization and rename event topic shell-<id>.
  • add a helper function for shell components that want to subscribe to subtopics and use this in the kill event code
  • avoid sending SIGKILL to job shells from job-exec module on job exceptions, instead send SIGTERM then SIGKILL after a configurable timeout.
@grondo grondo force-pushed the grondo:job-kill branch 2 times, most recently from 3310508 to 1b127e4 Jul 20, 2019
@grondo

This comment has been minimized.

Copy link
Contributor Author

commented Jul 21, 2019

Fixed a spurious test failure on travis (probably a /bin/sh bash vs dash issue) and did a tiny bit of squashing (sorry if that disrupts someone)

If travis passes, later I will squash away the intermediate work that changed the kill event name to the more standard shell-<id>.kill topic name.

@grondo grondo force-pushed the grondo:job-kill branch from 338e0be to 1f724f3 Jul 22, 2019
grondo added 15 commits Jul 17, 2019
Add handler for "shell-<id>.kill" events to support forwarding
signals to job tasks via the shell.
Add a shell_task_kill () wrapper for flux_subprocess_kill ()
in shell/task.c.
Add a job-manager.kill RPC, which forwards signals for active
jobs via a `shell-<id>.kill` event.
Add flux_job_kill() which sends the job-manager.kill RPC.
Add `flux job kill` as an interface to deliver signals to active
jobs.
Add simple tests for flux-job kill command, mainly error handling.
Ensure job shell processes job-kill events as sent by flux-job kill
command (via job-manager.kill RPC).
Add a flux_shell_killall() function to send a signal to all tasks
currently running in the shell.
Setup signal watchers for SIGINT,SIGTERM delivered to the shell
so that these can be forwarded to tasks instead of directly terminating
the shell.

Adds a new signals.c module for one-stop shopping for all your
signals handling needs.
Ensure job shell forwards locally delivered SIGTERM.
Protect against killing a non-running job in the testexec mock
execution implementation. This is needed because jobinfo_tasks_complete()
is called directly from testexec kill implementation, unlike a real
exec implementation in which tasks would complete asynchronously.
(and thus kill of non-running job is just a no-op)
Problem: The job-exec module immediately sends SIGKILL to all
job shells in a job when an exception occurs. This doesn't give
the shells a chance to terminate the tasks of job and perform
other cleanup.

Instead, send SIGTERM (for which the job shell will have a handler)
and then follow up with SIGKILL after a configurable timeout. The
kill timeout is by default 5.0s, but is configurable via a broker
attribute: job-exec.kill_timeout.
Noww that job-exec sends SIGTERM before SIGTERM, updated
the testexec based tests to look for status code 15 instead
of 9.
In order to test SIGTERM/SIGKILL delivery by job-exec module to
shells on exception, add an optional trap handler in the dummy
shell if the "TRAPS" environment variable is set in the job.
Ensure job-exec module first sends SIGTERM, then SIGKILL after
job-exec.kill_timeout.

Also update the other exception test to check for status=15 instead
of 9.
grondo added 6 commits Jul 20, 2019
Identify job shell output lines with jobid instead of pid
(which is meaningless in this context)
Always subscribe to topic "shell-<id>.*" events and add a new
helper function `flux_shell_add_event_handler` which shell components
can use to add handlers for specific sub-topic events.

The msg_handler_t is not returned to the user and is instead tied
to the flux_t handle for destruction.
Problem: t2500-job-attach.t sometimes fails with flux-start
terminated a broker with SIGKILL.

Presumably this is due to a broker being slow to exit from
a job still running in the background. Cancel the running
job to ensure the test exits with a clean job queue.
Fix typo in final test name:  s/job-exec/job-shell/
Support hex input/output in flux-job id command.
Test flux-job id hex conversion capabilities.
@grondo grondo force-pushed the grondo:job-kill branch from 64d9e1c to d38fde4 Jul 25, 2019
@codecov-io

This comment has been minimized.

Copy link

commented Jul 25, 2019

Codecov Report

Merging #2244 into master will increase coverage by 0.02%.
The diff coverage is 85.63%.

@@            Coverage Diff             @@
##           master    #2244      +/-   ##
==========================================
+ Coverage   80.75%   80.77%   +0.02%     
==========================================
  Files         210      213       +3     
  Lines       33233    33397     +164     
==========================================
+ Hits        26836    26977     +141     
- Misses       6397     6420      +23
Impacted Files Coverage Δ
src/modules/job-manager/job-manager.c 66.03% <100%> (+2.03%) ⬆️
src/modules/job-exec/testexec.c 85.96% <100%> (+0.12%) ⬆️
src/shell/kill.c 100% <100%> (ø)
src/modules/job-exec/exec.c 74.72% <100%> (+0.28%) ⬆️
src/common/libjob/job.c 75.15% <60%> (-0.5%) ⬇️
src/modules/job-exec/job-exec.c 73.59% <70.58%> (-0.29%) ⬇️
src/shell/shell.c 80.51% <74.35%> (-1.54%) ⬇️
src/shell/task.c 90.52% <75%> (+0.87%) ⬆️
src/modules/job-manager/kill.c 91.89% <91.89%> (ø)
src/shell/signals.c 93.33% <93.33%> (ø)
... and 15 more
@grondo

This comment has been minimized.

Copy link
Contributor Author

commented Jul 25, 2019

I think this one is close to ready. I switched the synchronization for an unreliable test to use flux kvs get --waitcreate on the job namespace instead of flux kvs watch -c1 --full --waitcreate --watch .. and the test seems to be reliable now.

In order to do the above I added a hex argument to flux job id --to/--from.

@garlick

This comment has been minimized.

Copy link
Member

commented Jul 25, 2019

This is great. Merging.

@garlick garlick merged commit be3d53b into flux-framework:master Jul 25, 2019
2 checks passed
2 checks passed
Summary 1 potential rule
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@grondo

This comment has been minimized.

Copy link
Contributor Author

commented Jul 25, 2019

Thanks!

@grondo grondo deleted the grondo:job-kill branch Jul 25, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.