flux-shell: add output handling #2246
This PR adds the most basic output handling to the shell as proposed in #2190.
Task I/O is sent to a service on the leader shell (shell rank 0). Each line of I/O is represented as a JSON object. The lines are accumulated in a JSON array which is committed to the job's kvs guest namespace upon completion.
A reusable I/O forwarding service library has been proposed in #2208. This PR was worked on in parallel with that, so I will look at converting the quick and dirty "service" in here over to that (this one, for example, doesn't base64-encode output data, and doesn't provide a way to read the data back remotely).
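For context on the base64 remark: JSON strings have to be valid text, so raw task output containing arbitrary bytes cannot be stored in a JSON object as-is. A minimal Python sketch of the encode/decode round trip follows. The `data` field name and both function names are assumptions made for illustration and are not taken from this PR or from #2208.

```python
import base64
import json


def encode_output(data: bytes) -> str:
    """Wrap raw output bytes in a JSON object, base64-encoding the payload."""
    # "data" is a hypothetical field name chosen for this sketch.
    return json.dumps({"data": base64.b64encode(data).decode("ascii")})


def decode_output(obj: str) -> bytes:
    """Recover the original bytes from the JSON object."""
    return base64.b64decode(json.loads(obj)["data"])
```

Without the base64 step, output such as `b"\xff\x00"` could not be decoded as UTF-8 and so could not be placed in a JSON string directly.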
Hmm, I did have one spurious failure in
Sure! Won't get to this later anyway…
On Sat, Jul 20, 2019, 10:16 AM Jim Garlick wrote: Thanks! I can squash the fixups now if you prefer.
I added some tests to exercise output capture a bit more and began seeing a frequent (but not always) failure of
Also: flux attach doesn't report anything on its stderr when this happens; it just exits with a nonzero code. It definitely seems like it should be printing some error in that case, although I'm not sure it should always print something when it gets back a nonzero finish status (flux srun
I'm going to push my test which will likely make travis sad, and probably work on it tomorrow.
Sorry if I introduced this in the last PR!
I've been able to run job-shell under valgrind with a wrapper script, e.g. (at the top-level of my flux-core working copy)
    $ cat valgrind-shell.sh
    #!/bin/sh
    libtool e valgrind $(dirname $0)/src/shell/flux-shell "$@"
    $ flux setattr job-exec.job-shell=$(pwd)/valgrind-shell.sh
    $ flux srun hostname
    # Because valgrind's output is on shell's stderr, we have to get it from dmesg:
    $ flux dmesg | sed -n 's/.*STDERR://p'
    valgrind: /home/grondo/git/f.git/src/flux-shell: No such file or directory
    ==26092== Memcheck, a memory error detector
    ==26092== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
    ==26092== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
    ==26092== Command: /home/grondo/git/f.git/src/shell/.libs/flux-shell 1080687591424
    ==26092==
    ==26158== Warning: invalid file descriptor 1024 in syscall close()
    ==26158== Warning: invalid file descriptor 1025 in syscall close()
    ==26158== Warning: invalid file descriptor 1026 in syscall close()
    ==26158== Warning: invalid file descriptor 1027 in syscall close()
    ==26158== Use --log-fd=<number> to select an alternative log fd.
    ==26158== Warning: invalid file descriptor 1028 in syscall close()
    ==26158== Warning: invalid file descriptor 1029 in syscall close()
    ==26092==
    ==26092== HEAP SUMMARY:
    ==26092== in use at exit: 4,640 bytes in 4 blocks
    ==26092== total heap usage: 1,185 allocs, 1,181 frees, 11,208,571 bytes allocated
    ==26092==
    ==26092== LEAK SUMMARY:
    ==26092== definitely lost: 0 bytes in 0 blocks
    ==26092== indirectly lost: 0 bytes in 0 blocks
    ==26092== possibly lost: 512 bytes in 1 blocks
    ==26092== still reachable: 4,128 bytes in 3 blocks
    ==26092== suppressed: 0 bytes in 0 blocks
    ==26092== Rerun with --leak-check=full to see details of leaked memory
    ==26092==
    ==26092== For counts of detected and suppressed errors, rerun with: -v
    ==26092== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
(We'll need the shell to aggregate individual task exit status before we can give this detail though. New issue?)
For our other run interfaces I agree it might not make sense to be verbose about non-zero exit codes by default, though I imagine users find the error messages useful in batch script output.
I/O handling will need shell completion references, so pass flux_shell_t into the shell_io constructor.
Add shell_task_io_at_eof() for testing if a task's stream has reached EOF, to be called after shell_task_io_readline() returns len == 0. Modify shell_task_io_readline() so that it tests for EOF rather than subprocess exited to return final data, which may not be terminated as a line. To avoid signaling EOF twice, don't call task->io_cb() in shell_task_destroy().
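The readline/EOF contract described in this commit can be modeled roughly as below. This is a Python toy, not the shell's C code: `TaskStream`, `feed()`, `set_eof()` and `drain()` are hypothetical names, and only the behavior (return complete lines, return the final unterminated data once EOF is seen, then report EOF after a zero-length read) is taken from the commit message.

```python
class TaskStream:
    """Toy model of one buffered task output stream."""

    def __init__(self):
        self._buf = b""
        self._eof = False

    def feed(self, data):
        """Append data read from the task."""
        self._buf += data

    def set_eof(self):
        """Record that the task closed its end of the stream."""
        self._eof = True

    def readline(self):
        """Return the next line, or b"" when nothing can be returned yet.

        At EOF, leftover data without a trailing newline is returned as
        the final "line".
        """
        idx = self._buf.find(b"\n")
        if idx >= 0:
            line, self._buf = self._buf[: idx + 1], self._buf[idx + 1 :]
            return line
        if self._eof and self._buf:
            line, self._buf = self._buf, b""
            return line
        return b""

    def at_eof(self):
        """True once EOF was seen and all buffered data was consumed."""
        return self._eof and not self._buf


def drain(stream):
    """Read all available lines, then test for EOF after the empty read."""
    lines = []
    while True:
        line = stream.readline()
        if not line:
            break
        lines.append(line)
    return lines, stream.at_eof()
```

In this model the caller signals EOF downstream exactly once, when `drain()` first reports `True`, which is the reason for not invoking the I/O callback again at task destruction.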
Problem: flux_rpc_vpack() is not exported, but it is useful for creating a wrapped RPC function in the shell. Define in the public rpc.h header and drop 'static'.
Add helpers for registering services on the leader shell, and for making RPCs to them from any shell without needing to build topic strings or calculate leader rank. This will be useful for getting some simple distributed services implemented, such as logging the first task exit or aggregating task stdio on the leader shell.
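A rough Python model of what such helpers hide from callers is sketched below. Everything in it is an assumption made for illustration: the `Router` stand-in for the broker, the `shell-<jobid>.<method>` topic format, and the rule that the leader runs on the first broker rank in the job's rank list. None of these are taken from the actual implementation.

```python
class Router:
    """Stand-in for the message broker: maps (broker rank, topic) to handlers."""

    def __init__(self):
        self.handlers = {}

    def send(self, rank, topic, payload):
        return self.handlers[(rank, topic)](payload)


class Shell:
    """Toy shell that can register leader services and call them."""

    def __init__(self, router, jobid, broker_ranks, shell_rank):
        self.router = router
        self.jobid = jobid
        self.broker_ranks = broker_ranks  # index = shell rank
        self.shell_rank = shell_rank

    def _topic(self, method):
        # Hypothetical topic format; the real one is not given in this PR text.
        return f"shell-{self.jobid}.{method}"

    def _leader_broker_rank(self):
        # Assumption: shell rank 0 (the leader) runs on the first broker rank.
        return self.broker_ranks[0]

    def service_register(self, method, handler):
        """Register a service; only valid on the leader shell."""
        if self.shell_rank != 0:
            raise RuntimeError("services are registered on the leader only")
        key = (self._leader_broker_rank(), self._topic(method))
        self.router.handlers[key] = handler

    def leader_rpc(self, method, payload):
        """Call a leader service from any shell, including the leader itself."""
        return self.router.send(
            self._leader_broker_rank(), self._topic(method), payload
        )
```

Callers only ever name the method (`"write"`, say) and never build a topic string or look up a rank themselves.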
Problem: services of the leader shell cannot be used reliably until leader initialization has completed. To ensure init has completed across all shells, add a synchronous barrier between init and task start. This is a no-op if --standalone or the job consists of only one shell.
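The ordering guarantee can be illustrated with a small Python sketch in which threads stand in for shell processes and `threading.Barrier` stands in for the distributed barrier. Both substitutions, and the function names, are assumptions of this sketch rather than details of the PR.

```python
import threading


def make_init_barrier(shell_count, standalone=False):
    """Return a wait function; a no-op when there is nothing to synchronize."""
    if standalone or shell_count == 1:
        return lambda: None
    return threading.Barrier(shell_count).wait


def run_shells(shell_count, standalone=False):
    """Run toy shells and record the order of init and task-start events."""
    events = []
    lock = threading.Lock()
    wait = make_init_barrier(shell_count, standalone)

    def shell(rank):
        with lock:
            events.append(("init", rank))
        wait()  # no shell starts tasks until every shell has initialized
        with lock:
            events.append(("start", rank))

    threads = [threading.Thread(target=shell, args=(r,)) for r in range(shell_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return events
```

With the barrier in place, every `init` event is recorded before the first `start` event, so a task on any shell can rely on leader services being registered.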
Problem: if jobspec and R are provided on the command line, and the shell is NOT in standalone mode, the shell sends an empty lookup request to job-info. Change shell_init_jobinfo() logic so that the lookup occurs only if R or jobspec are missing. Update descriptive comment block over function.
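The corrected decision logic might look roughly like this in Python. The `lookup` callable is a hypothetical stand-in for the job-info request, and fetching only the missing keys (rather than both) is an assumption of this sketch.

```python
def keys_to_lookup(jobspec, R):
    """Return the keys that were not supplied on the command line."""
    return [key for key, value in (("jobspec", jobspec), ("R", R)) if value is None]


def init_jobinfo(jobspec, R, lookup):
    """Fill in jobspec and R, contacting job-info only when something is missing."""
    missing = keys_to_lookup(jobspec, R)
    if missing:  # skip the request entirely when both were provided
        result = lookup(missing)
        jobspec = result.get("jobspec", jobspec)
        R = result.get("R", R)
    return jobspec, R
```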
Problem: if flux-shell attempts to use loop:// connector in standalone mode, it needs a valid FLUX_CONNECTOR_PATH. Set this manually to point to the source tree build dir in the t2601-job-shell-standalone.t test.
Problem: leader services don't work in standalone mode. It's convenient to be able to send messages to oneself, e.g. so there don't need to be two code paths for stdio aggregation. Change standalone mode to use a valid loop:// broker handle so it is still possible to send messages to self.
Send each line of I/O from tasks to the leader rank, accumulating them as objects in a json array. Once EOF has been received from all tasks, release a completion reference on the leader. When I/O object is destroyed on leader, commit json array to job's guest KVS namespace under "output" key. If running in standalone mode, simply emit output on shell's stdio as before, with labels.
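The leader-side bookkeeping described here can be sketched in Python as follows. This is a toy model under stated assumptions: the message fields (`rank`, `stream`, `data`, `eof`), the class name, the use of one EOF message per task, and the dict standing in for the KVS are all hypothetical. Only the "output" key and the overall sequence come from the commit message.

```python
import json


class LeaderOutput:
    """Toy model of the leader-side output service (not the shell's C code)."""

    def __init__(self, ntasks, release_completion_ref):
        self.pending_eof = ntasks  # tasks that have not yet sent EOF
        self.lines = []  # accumulated JSON-serializable objects
        self.release_completion_ref = release_completion_ref

    def handle(self, msg):
        """Process one message from a task: a line of data or an EOF marker."""
        if msg.get("eof"):
            self.pending_eof -= 1
            if self.pending_eof == 0:
                # All tasks are done writing; let the shell proceed to exit.
                self.release_completion_ref()
        else:
            self.lines.append(msg)

    def destroy(self, kvs):
        """Commit the accumulated array under the "output" key."""
        kvs["output"] = json.dumps(self.lines)
```

Holding the completion reference until the last EOF arrives keeps the leader from tearing down the service while output from other shells is still in flight.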
After job is finished, display output. If -l,--label option is provided, prepend the task id to each line.
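As an illustration of the labeling step, here is a short Python sketch that renders accumulated output entries with or without a task id prefix. The entry fields and the `"<id>: "` label format are assumptions; the actual format printed by the command is not shown in this thread.

```python
def format_output(entries, label=False):
    """Render output entries as text lines, optionally prefixed by task id."""
    rendered = []
    for entry in entries:
        # Hypothetical label format: task id, colon, space.
        prefix = f"{entry['rank']}: " if label else ""
        rendered.append(prefix + entry["data"])
    return rendered
```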
Update tests to find output data in the kvs via flux job attach, rather than looking for it in the broker log, where it was originally going.
Problem: valgrind job cancel test relies on prototype job-exec behavior of sleeping for duration (from before we had a job shell). Run sleep 60 instead of /bin/true to extend the window for cancellation.
To tidy things up, split up the 'job' script:
- split out info test to job-info script.
- split out cancellation test to job-cancel script.
- in job script, attach to jobs to make sure they are finished, don't drain as this makes other scripts unable to submit jobs.
Fix a cut-and-paste error in the job-exec/exec.c kill callback which increased the reference count of a signaled jobinfo object instead of removing the reference.
Bring back t3000-mpi-basic.t from the v0.11 series with the following modifications:
- TEST_MPI=t not required to run; just check for mpi/hello prog
- use default rc "personality" (not wreck)
- use flux srun not flux wreckrun
- eliminate OPTS loop since we don't have any options yet
- drop "oversubscribe" test
Fixes #2192
Problem: flux attach will look for output key in KVS and fail if it doesn't find it, which is incompatible with the flux-exec test mode that doesn't produce output. The t2500-job-attach.t combines flux attach and test mode, and thus will fail. Alter the test to use the real job shell, which was unavailable when the test was written. Drop jq dependency.
Add "ripple test" program like lptest(1) from the lpr package, for testing shell stdio transport. It's so trivial, it's probably better to write one and include it as a test program than add a dependency on the 'lpr' package, complicated by the fact that cups provides 'lpr' too, but not 'lptest'.
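For readers unfamiliar with the pattern, a ripple test generator can be sketched in a few lines of Python. This is not the test program added by this commit. The character range (`!` through `~`) and the defaults of 79 columns by 200 lines are assumptions modeled on lptest's conventional behavior.

```python
def ripple(length=79, count=200):
    """Return `count` lines of `length` characters, each shifted by one."""
    first, last = 0x21, 0x7E  # '!' through '~' (assumed character range)
    span = last - first + 1
    return [
        "".join(chr(first + (row + col) % span) for col in range(length))
        for row in range(count)
    ]


if __name__ == "__main__":
    print("\n".join(ripple()))
```

Because every line is predictable from its line number, dropped, duplicated, or reordered lines in the captured output are easy to spot.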
Rebased on master after merging @chu11's PR (thanks!). I also squashed the fixups, tweaked a few commit messages, and dropped the TODO oversubscribe MPI test, which it turns out could pass given the right hwloc setup.
I think this is probably ready to consider for merging so I'll drop the WIP prefix.
Restarted centos builder that failed here:
This looks good to me, and I think ready for a merge so I can rebase #2244 on top.
I did try a run with an insane amount of output (600K lines), and got this surprising error in a loop:
which made me wonder if there was a matchtag leak lurking here (or maybe just messages being sent too quickly to process -- I imagine
Since the current code is a stopgap I vote we merge anyway, perhaps opening an issue for the observation above?
    @@            Coverage Diff             @@
    ##           master    #2246      +/-   ##
    ==========================================
    - Coverage   80.75%   80.73%   -0.02%
    ==========================================
      Files         209      210       +1
      Lines       33056    33223     +167
    ==========================================
    + Hits        26693    26822     +129
    - Misses       6363     6401      +38