shell: live output #2366

Merged: 8 commits merged into flux-framework:master from chu11:shell_live_output on Sep 21, 2019

Conversation

@chu11 (Contributor) commented Sep 11, 2019

Builds on top of my misc cleanups PR #2360.

  • have the shell send output to the KVS live, as it happens
  • have the shell emit an "output-ready" event to exec.eventlog
    • good name? I considered "output {"ready":"true"}" as well.
  • job attach checks for "output-ready" in exec.eventlog, and then reads / outputs from guest.output as it occurs (see the sketch after this list)
  • Fix racy completion in the shell as discussed in #2363
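
A minimal sketch of the attach-side flow described above, not the PR's actual flux-job.c code: block until an "output-ready" line appears in the guest exec.eventlog, then follow the guest "output" key as it is appended. It assumes the guest KVS namespace name has already been resolved into ns, that the FLUX_KVS_WATCH_APPEND flag is available, and it reduces eventlog parsing to a substring check for brevity.

#include <string.h>
#include <stdio.h>
#include <flux/core.h>

/* Watch the guest exec.eventlog until an "output-ready" event appears.
 * Note: if the event is never emitted this blocks forever, which is
 * essentially the hang discussed later in this thread.
 */
static int wait_for_output_ready (flux_t *h, const char *ns)
{
    flux_future_t *f;
    const char *entry;
    int rc = -1;

    if (!(f = flux_kvs_lookup (h, ns, FLUX_KVS_WATCH | FLUX_KVS_WATCH_APPEND,
                               "exec.eventlog")))
        return -1;
    while (flux_kvs_lookup_get (f, &entry) == 0) {
        if (strstr (entry, "output-ready")) {   /* crude: real code parses JSON */
            rc = 0;
            break;
        }
        flux_future_reset (f);                  /* wait for the next append */
    }
    flux_kvs_lookup_cancel (f);
    flux_future_destroy (f);
    return rc;
}

/* Follow guest.output as it grows, writing each appended chunk to stdout. */
static int follow_output (flux_t *h, const char *ns)
{
    flux_future_t *f;
    const char *chunk;

    if (!(f = flux_kvs_lookup (h, ns, FLUX_KVS_WATCH | FLUX_KVS_WATCH_APPEND,
                               "output")))
        return -1;
    while (flux_kvs_lookup_get (f, &chunk) == 0) {
        fputs (chunk, stdout);                  /* real code decodes RFC 24 entries */
        flux_future_reset (f);
    }
    flux_future_destroy (f);
    return 0;
}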
@chu11 force-pushed the chu11:shell_live_output branch from d354e82 to 67ef752 Sep 11, 2019
@garlick (Member) left a comment

First, nice job, this appears to work!

I made a couple of early comments requesting that we structure the code to plan for some future commit batching, as I think even for the wreck parity milestone we may want to dial this back a bit.

Also, we can push out the guest.exec event, the header, and the first chunk of output in a single KVS commit. There's no reason to emit those separately as we can't really do anything until we have all three.

flux_future_destroy (f);
}

static int shell_output_commit (struct shell_output *out, json_t *entry)

@garlick (Member) commented Sep 11, 2019

This code should probably not be limited to one eventlog entry per KVS commit, since that's just one extreme of what we will likely need to support. I would suggest continuing to keep a list of entries in struct shell_output, and write this function so it aggregates all of them in one KVS transaction and empties the list. For the initial implementation, we'll just call this function after every append, but later we can call it based on elapsed time and/or number of entries in the list.
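
A minimal sketch of the batching suggested here (not the PR's final code): entries accumulate on a json array kept in struct shell_output, are folded into a single KVS transaction, and the array is then emptied. The member names (h, entries) and the "output" key are illustrative assumptions, and the commit is done synchronously for brevity where the real code would use flux_future_then().

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <jansson.h>
#include <flux/core.h>

struct shell_output {
    flux_t *h;          /* assumed: leader shell's flux handle */
    json_t *entries;    /* assumed: accumulated output eventlog entries */
};

static int shell_output_commit_batch (struct shell_output *out)
{
    flux_kvs_txn_t *txn;
    flux_future_t *f = NULL;
    size_t i;
    int rc = -1;

    if (!(txn = flux_kvs_txn_create ()))
        return -1;
    for (i = 0; i < json_array_size (out->entries); i++) {
        char *s = json_dumps (json_array_get (out->entries, i), JSON_COMPACT);
        size_t len = s ? strlen (s) + 2 : 0;
        char *line = s ? malloc (len) : NULL;
        if (!line) {
            free (s);
            goto done;
        }
        snprintf (line, len, "%s\n", s);    /* eventlog entries are JSON lines */
        free (s);
        /* every entry lands in the same transaction, hence one commit */
        if (flux_kvs_txn_put (txn, FLUX_KVS_APPEND, "output", line) < 0) {
            free (line);
            goto done;
        }
        free (line);
    }
    if (!(f = flux_kvs_commit (out->h, NULL, 0, txn)))
        goto done;
    if (flux_future_get (f, NULL) < 0)      /* blocking here only for the sketch */
        goto done;
    json_array_clear (out->entries);        /* list has been flushed, empty it */
    rc = 0;
done:
    flux_future_destroy (f);
    flux_kvs_txn_destroy (txn);
    return rc;
}

For an initial implementation this could be called after every append; later it could be driven by a timer or a length threshold on the entries list, as suggested above.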

}
return 0;
if (shell_output_commit (out, o) < 0)

@garlick (Member) commented Sep 11, 2019

Now that we have the guest eventlog event, header can go out in the same commit as the first data entry.

log_err ("flux_kvs_txn_put");
goto error;
}
if (!(out->output_ready_f = flux_kvs_commit (out->shell->h,

@garlick (Member) commented Sep 11, 2019

This should go in the same commit as the output header (and probably the first data entry).

@chu11 force-pushed the chu11:shell_live_output branch from 67ef752 to bc903c9 Sep 12, 2019
@chu11 (Contributor, Author) commented Sep 12, 2019

Re-pushed with changes per @garlick's comments above.

  • A fair amount of code was re-worked as a consequence, so the PR looks quite different.

  • Changes in shell/output.c were far smaller, since we could keep more of the original code.

  • There is no longer an "output.ready" reference, because we've added the "output-ready" eventlog entry to an already existing "commit" transaction.

  • I'm not 100% certain about the error handling I did in shell_output_commit(). There are a number of "if an error happens here, what happens" question marks. Like if json_array_clear() doesn't work, would we duplicate output later? But I suspect we have this all over the place and it's not worth thinking too hard about right now.

@garlick (Member) commented Sep 12, 2019

Quick comment on error handling in the shell: It should be OK to print to stderr and exit (1) for unrecoverable errors, and the job-exec module should clean up from that gracefully. Looks like a json_array_clear() failure only occurs if you pass it something that's not a json array, so it's probably one of those.

@grondo (Contributor) commented Sep 12, 2019

It should be OK to print to stderr and exit (1) for unrecoverable errors,

Just a mental note: after tasks have been started, there is no way for job-exec to signal the tasks if the shell exits with an error. In the case of a system instance using containers, the epilog would remove the container, thus cleaning up tasks, but for a single user instance remaining tasks may just be orphaned.

@chu11 force-pushed the chu11:shell_live_output branch from bc903c9 to a7491e5 Sep 12, 2019
@chu11 (Contributor, Author) commented Sep 12, 2019

I'm not 100% certain with the error handling I did in shell_output_commit(). There are a number of "if an error happens here, what happens" question marks.

Looking through the code and thinking about it, I think the only error case I truly care about is the output header and output-ready event failing to commit. Anything else, while undesirable, is perhaps survivable (e.g. double output, etc.). So I think I can just goto error on every error within shell_output_commit(). (Re-pushed with this change.)

So on the very first call to shell_output_commit(), I'd like the shell to error out if there's an error. This is called via shell_output_init() -> shell_output_create() -> shell_output_header() -> shell_output_commit().

Unfortunately, plugstack_call() appears to return 0 even if an error occurs in one of the init functions. Should plugstack_call() not do that?

@grondo (Contributor) commented Sep 13, 2019

Unfortunately, plugstack_call() appears to return 0 even if an error occurs in one of the init functions. Should plugstack_call() not do that?

This will be fixed as part of current shell plugin work.

During init no tasks have been launched, so it is appropriate to log_err_exit here. We may eventually require a more advanced "error and exit" function in the shell that cleans up tasks and exits, but for now I think we should just try to exit with an error only before tasks are started.

@chu11 chu11 force-pushed the chu11:shell_live_output branch from a7491e5 to e29c8b0 Sep 13, 2019
@chu11 (Contributor, Author) commented Sep 13, 2019

This will be fixed as part of current shell plugin work.

Ok, we could just accept that this will be fixed by your PR? Or shall we wait for your PR to go in first?

Rebased & re-pushed, with minor tweaks to the order of calls in shell_output_commit() and some added comments.

@grondo (Contributor) commented Sep 13, 2019

I think we can accept the bug for now and I'll be sure it gets fixed in my PR

@chu11 (Contributor, Author) commented Sep 13, 2019

rebased

@garlick (Member) left a comment

A couple of questions inline - not sure if they rise to the level of "review comments" :-)

* The leader shell implements an "shell-<id>.output" service that all
* ranks send task output to. Output objects accumulate in a json
* array on the leader. Output objects are written to the "output"
* key in the job's guest KVS namespace, or directly to stdout/stderr

@garlick (Member) commented Sep 16, 2019

Cite RFC 24?

if (shell_output_header_append (shell, out->output) < 0)
if (!(out->pending_commits = zlist_new ()))
goto error;
if (shell_output_header (out) < 0)

@garlick (Member) commented Sep 16, 2019

You mentioned you wanted failure to commit the header to be fatal, but it looks like if the commit RPC returns an error, it is non-fatal in shell_output_commit_completion()?

@chu11 (Author, Contributor) commented Sep 16, 2019

Ahh, good catch. I was primarily concerned with a fatal error on the request side and had forgotten the response side.

@chu11 (Author, Contributor) commented Sep 16, 2019

Hmmm, on such an error would it be better to run flux_shell_killall() rather than just exit, since all tasks have started?

@grondo (Contributor) commented Sep 16, 2019

I think we need a shell_fatal_error () or similar call that attempts to clean up before a fatal error when tasks have started, instead of issuing the killall at the site of the error. Or perhaps with some thinking we could come up with a cleaner way to handle fatal errors with proper cleanup -- haven't thought about it too much at this point.

@chu11 (Author, Contributor) commented Sep 16, 2019

Yeah, I was just starting to think that. Perhaps for now log_err_exit() is fine, and we can figure out the proper / best / cleanest way to deal with this in another issue.
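
For reference, a purely hypothetical sketch of the shell_fatal_error() helper floated above; nothing like this exists in the PR. It assumes a kill-all call with the shape of the flux_shell_killall() mentioned earlier (shell handle plus signal number), declared here only as an illustration.

#include <signal.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

struct flux_shell;                                                /* opaque here */
void flux_shell_killall (struct flux_shell *shell, int signum);   /* assumed signature */

static void shell_fatal_error (struct flux_shell *shell, const char *fmt, ...)
{
    va_list ap;

    va_start (ap, fmt);
    vfprintf (stderr, fmt, ap);                    /* report the error first */
    fputc ('\n', stderr);
    va_end (ap);
    if (shell)
        flux_shell_killall (shell, SIGKILL);       /* avoid orphaning started tasks */
    exit (1);
}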

@@ -204,6 +207,10 @@ static int shell_output_commit (struct shell_output *out)
out->output_ready_sent = true;
if (flux_future_then (f, -1, shell_output_commit_completion, out) < 0)
goto error;
if (flux_shell_add_completion_ref (out->shell, "output.commit") < 0) {

@garlick (Member) commented Sep 16, 2019

With the completion references taken for outstanding futures, will shell_output_destroy() ever be called with a non-empty out->pending_writes? (Maybe tracking futures in the list is not necessary anymore?)

@chu11 (Author, Contributor) commented Sep 16, 2019

I actually had pondered that, and thought there were still circumstances where it was needed, such as if the shell was signaled. But now that I look at it again, perhaps it's not needed after all. I'll remove it.
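
The pattern behind this exchange, condensed from the diffs in this PR: each asynchronous KVS commit takes a named completion reference before it starts and drops it in its continuation, so the shell cannot exit while commits are in flight, which is what makes a separate list of pending futures unnecessary. The struct layout is sketched (not the shell's exact internal one) and the headers are the shell's internal ones.

#include <flux/core.h>
#include "shell.h"   /* internal: flux_shell_add/remove_completion_ref() */
#include "log.h"     /* internal: log_err(), log_err_exit() */

struct shell_output {
    flux_shell_t *shell;    /* assumed member name, per the diffs above */
    flux_t *h;              /* out->shell->h in the PR's code */
};

static void commit_continuation (flux_future_t *f, void *arg)
{
    struct shell_output *out = arg;

    if (flux_future_get (f, NULL) < 0)
        log_err_exit ("shell_output_commit");   /* made fatal later in this thread */
    flux_future_destroy (f);
    /* drop the reference taken in commit_async(); the shell may now exit */
    if (flux_shell_remove_completion_ref (out->shell, "output.commit") < 0)
        log_err ("flux_shell_remove_completion_ref");
}

static int commit_async (struct shell_output *out, flux_kvs_txn_t *txn)
{
    flux_future_t *f;

    if (!(f = flux_kvs_commit (out->h, NULL, 0, txn)))
        return -1;
    if (flux_future_then (f, -1, commit_continuation, out) < 0
        || flux_shell_add_completion_ref (out->shell, "output.commit") < 0) {
        flux_future_destroy (f);
        return -1;
    }
    return 0;
}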

@chu11 (Contributor, Author) commented Sep 16, 2019

Rebased & re-pushed, addressing @garlick's comments above. Per the discussion above, since there isn't a very clean way to "fatal error" after tasks have started, I lazily call log_err_exit() on the fatal error of the output header / output-ready not succeeding. Created a new issue #2378 for this particular case.

Removed the "pending_commits" list, as it was no longer necessary.

Found a corner case along the way: we shouldn't call shell_output_commit() for the output header in the standalone case. We should call shell_output_flush() instead. Technically, shell_output_flush() doesn't use the header at the moment, but that will change in the future.

@garlick (Member) commented Sep 18, 2019

Having another look at this. It looks pretty close IMHO.

One thing I'm not too sure about is the decision to make a failed KVS commit after the header non-fatal. Losing output could be a serious thing, depending on how the job is using its I/O. My vote would be to make all related (unlikely) failures fatal to the shell, to be sure the job fails. When we get a better interface that cleans up any tasks, we can switch to that, but I worry a bit about allowing a job to report success when it has in fact misplaced the "42" that was the result of your long simulation (OK, probably an extreme hypothetical!).

@chu11 (Contributor, Author) commented Sep 19, 2019

Rebased & re-pushed with the suggested fix of making errors on commits (and also on flushes to the terminal) fatal, calling log_err_exit(). Very tiny fixes compared to what you just reviewed.

diff --git a/src/shell/output.c b/src/shell/output.c
index 7d82adc..a2229e5 100644
--- a/src/shell/output.c
+++ b/src/shell/output.c
@@ -141,13 +141,10 @@ static void shell_output_commit_completion (flux_future_t *f, void *arg)
 {
     struct shell_output *out = arg;
 
-    if (flux_future_get (f, NULL) < 0) {
-        if (flux_future_aux_get (f, "output::ready"))
-            /* failng to commit output-ready or header is a fatal
-             * error.  Should be cleaner in future. Issue #2378 */
-            log_err_exit ("shell_output_commit ready");
-        log_err ("shell_output_commit");
-    }
+    /* Error failing to commit is a fatal error.  Should be cleaner in
+     * future. Issue #2378 */
+    if (flux_future_get (f, NULL) < 0)
+        log_err_exit ("shell_output_commit");
     flux_future_destroy (f);
 
     if (flux_shell_remove_completion_ref (out->shell, "output.commit") < 0)
@@ -206,12 +203,8 @@ static int shell_output_commit (struct shell_output *out)
     }
     if (!(f = flux_kvs_commit (out->shell->h, NULL, 0, txn)))
         goto error;
-    if (!out->output_ready_sent) {
-        /* used as flag, content of set irrelevant */
-        if (flux_future_aux_set (f, "output::ready", f, NULL) < 0)
-            goto error;
+    if (!out->output_ready_sent)
         out->output_ready_sent = true;
-    }
     if (flux_future_then (f, -1, shell_output_commit_completion, out) < 0)
         goto error;
     if (flux_shell_add_completion_ref (out->shell, "output.commit") < 0) {
@@ -261,13 +254,15 @@ static void shell_output_write_cb (flux_t *h,
         errno = ENOMEM;
         goto error;
     }
+    /* Error failing to commit is a fatal error.  Should be cleaner in
+     * future. Issue #2378 */
     if (out->shell->standalone) {
         if (shell_output_flush (out) < 0)
-            log_err ("shell_output_flush");
+            log_err_exit ("shell_output_flush");
     }
     else {
         if (shell_output_commit (out) < 0)
-            log_err ("shell_output_commit");
+            log_err_exit ("shell_output_commit");
     }
     if (eof) {
         if (--out->eof_pending == 0) {

@chu11 force-pushed the chu11:shell_live_output branch from 8ec3546 to 4040bba Sep 19, 2019
@grondo (Contributor) commented Sep 19, 2019

On this branch I accidentally ran an executable that doesn't exist, and it seems to cause a hang from which you can't escape:

$ flux srun nofile
flux-job: task(s) exited with exit code 1
^Cflux-job: one more ctrl-C within 2s to cancel or ctrl-Z to detach
^Cflux-job: cancel: unknown job id
^Cflux-job: cancel: unknown job id
^Cflux-job: cancel: unknown job id
^Cflux-job: cancel: unknown job id

This doesn't occur on master:

$ flux srun nofile
flux-job: task(s) exited with exit code 1
$
@chu11 (Contributor, Author) commented Sep 19, 2019

@grondo aw crud. I'm thinking there's a corner case there where the guest.exec.eventlog will never be created, thus a hang forever. The hang doesn't exist in my branch with this and the fix for #2338 (PR #2374).

Minimally, a test should be added because we don't have a test for this. Think we should combine PR #2374 with this issue and make a test? Or try to fix it in this PR? Or just deal with it in that PR? I'll see how easy/hard it might be to fix.

@garlick (Member) commented Sep 19, 2019

Is it possible to reverse the order of this and PR #2374?

@chu11 (Contributor, Author) commented Sep 19, 2019

Basically in this case, the job has "started", but has an obvious error which leads to no output. flux-job sees that job has "started" and believes there is output, so begins watching guest.output. No EOFs ever come, thus the watch on guest.output hangs.

Potential solutions:

  • #2374 - the watch on guest.output will return ENODATA at the end of guest.output
  • output the output-ready event later, after the first "real" stdout/stderr occurs, or perhaps when the state in libsubprocess turns to RUNNING, so we know the subprocess actually ran for real

Re-ordering #2374 first is totally doable/fine and would solve this, but I think the latter solution is more correct. I was previously outputting "output-ready" when we output the header (under the assumption that meant the job was running). But that no longer seems correct.

@grondo (Contributor) commented Sep 19, 2019

output the output-ready event later, after the first "real" stdout/stderr occurs

For a job with little or no output, would this cause the output-ready event to be delayed until the first output stream is closed? If the output-ready event is delayed until the first other stream event, then is there truly a need for the output-ready event?

BTW, I was actually surprised there was no output at all in this case. Even for an exec failure, shouldn't the child processes still generate some stderr of the form exec: No such file or directory or similar?

Plugins that run in the pre-exec hook may also generate errors to stderr, which we have to ensure the shell captures and sends to the output eventlog, even if the error is fatal.

@chu11 (Contributor, Author) commented Sep 19, 2019

For a job with little or no output, would this cause the output-ready event to be delayed until the first output stream is closed? If the output-ready event is delayed until the first other stream event, then is there truly a need for the output-ready event?

Yeah. Outputting output-ready when the job is in the RUNNING state is better/correct.

BTW, I was actually surprised there was no output at all in this case. Even for an exec failure, shouldn't the child processes still generate some stderr of the form exec: No such file or directory or similar?

Plugins that run in the pre-exec hook may also generate errors to stderr, which we have to ensure the shell captures and sends to the output eventlog, even if the error is fatal.

In these cases I believe the error would be in the broker log? guest.output should only be storing stdout/stderr from the actual executable? Not from the shell?

@grondo (Contributor) commented Sep 19, 2019

In these cases I believe the error would be in the broker log? guest.output should only be storing stdout/stderr from the actual executable? Not from the shell?

I'm talking about in the process after fork(2). Any std output generated there should already be dup'd onto stdout/err of the job, right? (I could be wrong b/c I haven't looked at actual code)

It is much more useful to send errors from the child process to the stderr of the job, perhaps prefixed with flux-shell, than to send them back to the job-exec module and broker.log. This will be less of a concern when we send shell log messages to an output eventlog instead of stderr.

This is also how Slurm handles stdout/err after fork(2), e.g. you get:

$ srun foo
slurmstepd: error: execve(): foo: No such file or directory
srun: error: fluke3: task 0: Exited with exit code 2

Instead of just a report of a bad exit code with no context.
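
A minimal sketch of the behavior being described, using plain POSIX calls rather than libsubprocess: after fork(2) the child's stdout/stderr are assumed to already be dup'd onto the job's output, so an exec failure can be reported on the job's stderr, Slurm-style, before the child exits.

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

static pid_t spawn_task (char **argv)
{
    pid_t pid = fork ();

    if (pid == 0) {                 /* child: fds 1,2 are the job's stdout/stderr */
        execvp (argv[0], argv);
        /* only reached if exec failed; this lands in the job's stderr */
        fprintf (stderr, "flux-shell: execvp: %s: %s\n",
                 argv[0], strerror (errno));
        _exit (127);
    }
    return pid;                     /* parent: -1 if fork failed */
}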

@grondo (Contributor) commented Sep 19, 2019

Any std output generated there should already be dup'd onto stdout/err of the job, right?

Ah, sorry I wasn't thinking clearly. Since exec() is wrapped in libsubprocess the shell doesn't have a way to emit any errors from the child process.

@grondo (Contributor) commented Sep 19, 2019

Actually I see libsubprocess local_child() can emit errors, but uses flux_log_error() on the parent's flux handle. I wonder if that is not appropriate and/or safe.

I'll try to open an issue.

@chu11 (Contributor, Author) commented Sep 19, 2019

I'm talking about in the process after fork(2). Any std output generated there should already be dup'd onto stdout/err of the job, right?

I see your point. For the specific case here, an error on the exec() call leads to an error returned to the original caller of flux_exec(). Do you think there should be something printed to stderr as well?

@chu11 (Contributor, Author) commented Sep 19, 2019

After implementing a solution where I emit "output-ready" after a subprocess has begun running, I realized there was a different issue. I originally emitted "output-ready" to indicate that the "guest.output" directory has been created. But if I emit "output-ready" after a subprocess has begun running, technically I don't know if the "guest.output" directory has been created yet. It's racy.

So I think PR #2374 is the right fix. I'll re-order some commits so that we can push that PR first.

chu11 added 6 commits Sep 5, 2019
Instead of saving all output data and writing it out when a job completes,
write output to the KVS (or stdout/stderr in standalone mode) as output
data arrives at the leader shell.

In this change, every output that is received by the leader shell is
appended to an output array before it is immediately written to the KVS (or
stdout/stderr in standalone mode).  This output array is not explicitly
needed but has been left intact for future use in aggregating output.

Add missing test, in which flux-srun is passed an invalid executable.

For consistency with other shell code, make type flux_shell_task_io_f
a pointer to a function, instead of a typedef to a function.

Alongside the output header, write the output-ready event to the
exec.eventlog directory.  This event will indicate that the
guest.output directory has been created.

Under racy scenarios, the shell could exit before several futures
completed within the shell output.  Add completion references to
ensure those futures complete within the shell output before
allowing the shell to exit.

As a side effect, the completion references ensure that output commits
will complete before the shell exits.  Therefore, the pending_commits
list is no longer needed.

For consistency, rename the old completion reference "io-leader" to
"output.write".

Fixes #2363
@chu11 force-pushed the chu11:shell_live_output branch from 4040bba to 8c92505 Sep 20, 2019
@chu11 (Contributor, Author) commented Sep 20, 2019

re-pushed and rebased on master.

chu11 added 2 commits Sep 6, 2019
Add several new tests to ensure the output behavior of job attach
works, most notably that output can occur while the job is running,
not only after the job finishes.

Adjust attach for the fact that output is now live and occurs as the
job runs.  Watch exec.eventlog to determine when the guest.output
directory is ready for reading.  Once it is ready, watch guest.output
and output data as it becomes available, rather than after the job completes.
@chu11 force-pushed the chu11:shell_live_output branch from 8c92505 to 2b3a8cd Sep 20, 2019
@chu11 (Contributor, Author) commented Sep 20, 2019

Re-pushed; noticed I had an unused variable.

@codecov-io commented Sep 20, 2019

Codecov Report

Merging #2366 into master will increase coverage by 0.01%.
The diff coverage is 79.21%.

@@            Coverage Diff            @@
##           master   #2366      +/-   ##
=========================================
+ Coverage   80.89%   80.9%   +0.01%     
=========================================
  Files         222     222              
  Lines       34901   34991      +90     
=========================================
+ Hits        28233   28310      +77     
- Misses       6668    6681      +13
Impacted Files Coverage Δ
src/shell/task.c 79.16% <ø> (ø) ⬆️
src/shell/output.c 73.33% <78.94%> (-3.3%) ⬇️
src/cmd/flux-job.c 84.93% <79.51%> (-0.59%) ⬇️
src/broker/modservice.c 70.67% <0%> (-0.76%) ⬇️
src/cmd/flux-module.c 83.96% <0%> (-0.24%) ⬇️
src/cmd/flux-event.c 77.97% <0%> (ø) ⬆️
src/shell/shell.c 80.63% <0%> (+0.45%) ⬆️
src/broker/content-cache.c 75.27% <0%> (+0.66%) ⬆️
... and 2 more
@garlick (Member) commented Sep 21, 2019

OK, this LGTM. Thanks!

@garlick merged commit 1e9f645 into flux-framework:master Sep 21, 2019
3 of 4 checks passed:
  • codecov/patch: 79.21% of diff hit (target 80.89%)
  • Summary: 1 potential rule
  • codecov/project: 80.9% (+0.01%) compared to e03e2c8
  • continuous-integration/travis-ci/pr: The Travis CI build passed
@garlick referenced this pull request Sep 30, 2019