job shell: add flux_shell_t and completion references #2240
This PR addresses #2237.
I apologize in advance because this will be a difficult PR to review. I attempt to make a minor advance by adding a
As noted above, this branch adds a new
Finally, to address #2237, a concept from
Problem: As the job shell grows in complexity, functions and callbacks may need to access more than one of the shell's internal data types at a time, potentially including the flux handle, the reactor, shell job information, etc. Currently there is no easy or common way for internal code to do this without custom data types, globals, etc. To reduce this future effort and provide a common approach, add a `struct flux_shell` data type as a container for the other shell data types, and refactor the shell to pass this object to submodules, which simplifies their implementation somewhat. As part of this effort, fetching of jobspec and R is moved entirely into the info.c module, whether they come from the command line or the KVS. Command line options, such as the jobid and the standalone and verbose flags, are moved up into `struct flux_shell`.
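A minimal sketch of the container pattern described above, using stand-in types so it compiles on its own. `struct shell_sketch`, `fake_handle`, `fake_reactor`, `fake_info`, `shell_sketch_init()` and `submodule_init()` are hypothetical names, not the actual `struct flux_shell` layout or a flux-core API. The point is that a submodule receives one pointer instead of a growing list of arguments or globals.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the real types (flux_t, flux_reactor_t,
 * and the shell info struct from info.c). */
struct fake_handle { const char *uri; };
struct fake_reactor { int active_watchers; };
struct fake_info { const char *jobspec; const char *R; };

/* Container for shell-wide state, modeled on the idea of
 * `struct flux_shell`; field names are illustrative only. */
struct shell_sketch {
    struct fake_handle *h;
    struct fake_reactor *r;
    struct fake_info *info;   /* jobspec and R, however they were fetched */
    unsigned long jobid;      /* command line options live on the container */
    bool standalone;
    bool verbose;
};

/* Populate the container once at startup; submodules only read it.
 * The handle and reactor are left NULL in this sketch. */
static void shell_sketch_init (struct shell_sketch *shell,
                               struct fake_info *info,
                               unsigned long jobid,
                               bool standalone,
                               bool verbose)
{
    assert (shell != NULL);
    memset (shell, 0, sizeof (*shell));
    shell->info = info;
    shell->jobid = jobid;
    shell->standalone = standalone;
    shell->verbose = verbose;
}

/* A submodule entry point takes the whole container and picks out
 * whatever it needs. Writes a summary into buf; returns 0 or -1. */
static int submodule_init (struct shell_sketch *shell, char *buf, size_t len)
{
    if (!shell || !shell->info || !buf || len == 0)
        return -1;
    int n = snprintf (buf, len, "job %lu standalone=%d R=%s",
                      shell->jobid,
                      shell->standalone ? 1 : 0,
                      shell->info->R ? shell->info->R : "(none)");
    return (n < 0 || (size_t) n >= len) ? -1 : 0;
}
```

With this shape, sharing a new object with every submodule later means adding a field to the container rather than changing each submodule's signature.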
Problem: Once the flux-shell starts adding reactor watchers other than those of libsubprocess (e.g. message handlers, signalfd watchers), exiting the internal reactor when all tasks have completed will no longer be straightforward, because the reactor returns on its own only once no active watchers remain. Instead of adding a single `is_complete()` function that must check that all tasks have exited, all I/O is complete, etc., set up a system where shell components can take a "completion reference" on the shell. Once all outstanding completion refs are released, the shell's reactor is stopped manually with flux_reactor_stop(). Completion references are named to aid future debugging: if a shell appears to be stuck, the keys of the internal hash can be dumped to determine which components have not completed.
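The named completion-reference scheme can be sketched in plain C as follows. This is an illustration under stated assumptions, not the flux-shell code: the PR keeps named refs in an internal hash, whereas this sketch uses a small fixed-size table, and setting a `stopped` flag on the hypothetical `fake_reactor` stands in for calling `flux_reactor_stop()`. `add_completion_ref()`, `remove_completion_ref()` and `dump_completion_refs()` are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_REFS 32
#define NAME_MAX_LEN 64

/* Stand-in for the shell's reactor; the real shell would call
 * flux_reactor_stop() where this sketch sets a flag. */
struct fake_reactor { int stopped; };

/* One named completion reference. A count lets the same component
 * hold its name more than once. */
struct completion_ref {
    char name[NAME_MAX_LEN];
    int count;
};

/* Hypothetical holder for outstanding refs (a hash in the PR). */
struct shell_refs {
    struct completion_ref refs[MAX_REFS];
    int nrefs;
    struct fake_reactor *r;
};

static struct completion_ref *ref_lookup (struct shell_refs *s,
                                          const char *name)
{
    for (int i = 0; i < s->nrefs; i++)
        if (strcmp (s->refs[i].name, name) == 0)
            return &s->refs[i];
    return NULL;
}

/* Take a named completion ref. Returns the new count for name, or -1. */
static int add_completion_ref (struct shell_refs *s, const char *name)
{
    struct completion_ref *ref = ref_lookup (s, name);
    if (!ref) {
        if (s->nrefs == MAX_REFS || strlen (name) >= NAME_MAX_LEN)
            return -1;
        ref = &s->refs[s->nrefs++];
        strcpy (ref->name, name);
        ref->count = 0;
    }
    return ++ref->count;
}

/* Drop a named ref. When no refs of any name remain, stop the reactor.
 * Returns the remaining count for name, or -1 if name is not held. */
static int remove_completion_ref (struct shell_refs *s, const char *name)
{
    struct completion_ref *ref = ref_lookup (s, name);
    if (!ref)
        return -1;
    int remaining = --ref->count;
    assert (remaining >= 0);
    if (remaining == 0)
        *ref = s->refs[--s->nrefs];   /* delete by moving last entry here */
    if (s->nrefs == 0)
        s->r->stopped = 1;            /* flux_reactor_stop () in the shell */
    return remaining;
}

/* Debug aid: list the components still holding the shell open. */
static void dump_completion_refs (struct shell_refs *s, FILE *fp)
{
    for (int i = 0; i < s->nrefs; i++)
        fprintf (fp, "pending: %s (%d)\n", s->refs[i].name, s->refs[i].count);
}
```

Stopping on the last release, rather than letting the reactor run out of watchers, is what allows long-lived watchers to coexist with a clean exit; the dump function corresponds to dumping the hash keys of a stuck shell.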
Use flux_shell_add_completion_ref() to take a reference on the shell for each task executed. Drop the reference in the task's completion callback so that the shell's reactor is stopped manually after the final task exits. This doesn't change any functionality of the shell, but ensures the shell will still exit the reactor once additional reactor watchers (e.g. message watchers) are added in future commits.
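Applied to tasks, the pattern amounts to one reference taken when each task starts and dropped in that task's completion callback. Below is a minimal count-based sketch, assuming hypothetical `struct shell_state`, `struct task`, `start_tasks()` and `task_completion_cb()`; the real shell uses named refs via flux_shell_add_completion_ref() and libsubprocess callbacks, both omitted here.

```c
#include <assert.h>

#define NTASKS 4

/* Minimal stand-ins: a count of outstanding completion refs and a flag
 * playing the role of flux_reactor_stop(). Names are hypothetical. */
struct shell_state {
    int refs;
    int reactor_stopped;
};

struct task {
    int rank;
    int exited;
    struct shell_state *shell;
};

static void shell_ref_add (struct shell_state *shell)
{
    shell->refs++;
}

static void shell_ref_remove (struct shell_state *shell)
{
    assert (shell->refs > 0);
    if (--shell->refs == 0)
        shell->reactor_stopped = 1;   /* real shell: flux_reactor_stop () */
}

/* Take one reference per task as it is started. */
static void start_tasks (struct shell_state *shell, struct task *tasks, int n)
{
    for (int i = 0; i < n; i++) {
        tasks[i].rank = i;
        tasks[i].exited = 0;
        tasks[i].shell = shell;
        shell_ref_add (shell);
    }
}

/* Completion callback: runs once when a task exits and drops the
 * reference taken for it in start_tasks(). */
static void task_completion_cb (struct task *t)
{
    t->exited = 1;
    shell_ref_remove (t->shell);
}
```

Because the stop is triggered by the last released reference rather than by the reactor running out of watchers, a watcher added later (for example a message handler) does not keep the shell running after the final task exits.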
Coverage Diff:

| | master | #2240 | +/- |
|---|---|---|---|
| Coverage | 80.74% | 80.77% | +0.02% |
| Files | 209 | 209 | |
| Lines | 32988 | 33037 | +49 |
| Hits | 26637 | 26685 | +48 |
| Misses | 6351 | 6352 | +1 |