New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
shell: add support for plugins and shell initrc #2376
Conversation
4a1ae89
to
b554edf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a couple of superficial comments/questions so far.
This looks really well done!
src/shell/initrc.lua
Outdated
@@ -0,0 +1,17 @@ | |||
|
|||
-- Load all *.so plugins from plugin.serarchpath |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spelling: "searchpath"
It may be worth noting that this is the default in-tree initrc.lua in the comment at the top, and also in the commit message which says it's an example initrc.lua.
I may be missing it but is there intended to be an installed default initrc.lua also?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may be missing it but is there intended to be an installed default initrc.lua also?
Yeah, that is not there yet, sorry. I meant to mention that this PR is up for early review to ensure the general approach is acceptable (before spending time writing tests on something that will have to change)
The example initrc.lua
might be a good default, however I started to get a bit nervous about jobs reading from ~/.flux/shell/initrc.lua
by default, because a small error/typo in that script would cause a job to fail at runtime. It seemed like perhaps this approach might be setting users up for frustration.
Maybe it would be better if the user initrc was put directly into the jobspec (not exactly sure how that would work), allowing per-job customization instead of one global initrc for all of a user's jobs?
Anyway, I wasn't sure at this point and wanted to field a consensus before proceeding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be worth noting that this is the default in-tree initrc.lua in the comment at the top, and also in the commit message which says it's an example initrc.lua.
Yeah, src/shell/initrc.lua
is the default in-tree shell initrc. I'll update the confusing commit message. I'm actually a bit torn on what should be in the default in-tree initrc. Might not be wise or necessary to read from a user's custom initrc here!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok fixed the spelling error and updated the commit message to be more explicit.
src/shell/initrc.lua
Outdated
source (shell.rcpath .. "/lua.d/*.lua") | ||
|
||
-- Now source user initrc.lua if it exists | ||
source_if_exists "~/.flux/shell/initrc.lua" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
~/.flux/shell/initrc.lua
is a cool feature! Maybe this is a more general "user-configurability vs provenance" question, but should we be logging the rc files that a job sources, or maybe the list of plugins that were loaded so that the hotline or whoever has something to go on if the user's rc file or custom plugin stack are causing jobs to fail?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the logging is another piece of this PR that isn't in place yet.
Provenance is another reason I was thinking it would be nice to add the user initrc to the jobspec, though of course this wouldn't be feasible for custom .so
plugins, or lua fragments sourced via source "/path/to/*.lua"
, so there would be some loss of flexibility there.
At the very least logging to the job shell log should happen, I'm not sure about other provenance ideas -- it could use some discussion.
src/shell/shell.c
Outdated
if (shell->verbose) | ||
log_msg ("Loading %s", rcfile); | ||
|
||
if ((shell_rc (shell, rcfile) < 0) && (errno != ENOENT || required)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might need to audit shell_rc(()
to ensure errno is not spuriously set on success or set incorrectly on failure, by passing through from another non-fatal error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, though errno
should not be checked on success (if I am that is a bug). I'll carefully review shell_rc()
to ensure it only return ENOENT
on missing file.
Actually, would it be better to just stat()
the file before calling shell_rc()
to make the missing file case more explicit?
I went back and forth on error handling for missing initrc files during development. It seemed like the rules should be
- it is not an error if the main initrc doesn't exist, unless
--initrc FILE
was used. - it is not an erorr if
source
with a glob pattern has no matches, e.g.source *.lua
shouldn't fail when there are no matching files. source
of a non-pattern should fail if the file doesn't exist. (Thus the unfortunate need to add the functionsource_if_exists
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh right, sorry - misspoke on the success case.
Agreed, those rules LGTM.
Actually, would it be better to just
stat()
the file before callingshell_rc()
to make the missing file case more explicit?
Or access(3)
. No opinion here, either way works IMHO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I switched to access
in my working copy. Will add some basic tests before pushing again.
b554edf
to
f8035b1
Compare
Rebased on current master and force-pushed. |
Just pushed more in-progress work, mostly addressing some comments above and adding some basic initrc tests in Basic error handling was added to Some improvements and changes to the public I also threw in some example mpi initrc "plugins" in There's still a bit left to do here unfortunately, including
|
281a513
to
7abfa1d
Compare
Hm, of course there is an issue with plugins running in the shell in "standalone" mode, since there is no flux instance from which to query the necessary attributes. I don't know how I didn't catch this during development. I'm actually not sure the best way to proceed. I don't think it will be tractable to have every plugin check for standalone mode. For now I'll just return early from the default in-tree initrc if standalone mode is active. Standalone mode, or something like it, would be really useful for admins and users to run local tests of their own plugins. However it has its limitations, and I can't see users hand crafting proper R, etc. It also doesn't allow testing of multi-shell runs, etc, so we'll need actual parent instances to test distributed plugins. I wonder if what we need is a higher level |
08b0301
to
ba44394
Compare
Ok, I think I've resolved some LUA_PATH issues when running under vpath builds. (the initrc test scripts needed the explicit path to find All shell "standalone" tests are now run either via The default in-tree This should hopefully get at least the current set of tests running and passing on travis. |
ba44394
to
c6af38d
Compare
Rebased on current master. With #2381 merged, Still working on sharness tests for |
Just pushed a bit more incremental work. Mainly some sanity testing that .so plugins can be loaded from the shell, including a testsuite for standard "bad arguments" checks against the public shell API (plugin api). A few fixes found during testing were thrown in along the way. At this point I think the PR might be close to ready, depending on how much more effort we want to throw into testing. The PR also needs a big rebase and squashing. Let me know when it might be safe to force push the result. |
Also still need to test and install into default location the included Lua "mpi" plugins, and |
Excellent! Please rebase + squash when you're ready and I'll make a review pass afterwards. |
aaf7262
to
3a35f7c
Compare
Ok, I've squashed this PR down a bunch in an attempt to remove some of the incremental development. Also, the default The There's been a lot of churn so I apologies if there are still some rough patches in this PR. It could use another review, though it isn't perfect yet. |
64aae07
to
11b6b50
Compare
Rebased on current master. I also added some tests for corner cases and one fix discovered during that testing. (These will need to be squashed later) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This represents a big chunk of work that looks really well thought out to me.
I'm still looking through the changes but I see no reason to hold this up, so please add merge-when-passing when you're happy with the squash.
Ok, let me squash the most recent changes if Travis passes. I think other fixes and enhancements can be added on incrementally as needed. |
11b6b50
to
c0b569e
Compare
Squashed a few fixup commits and rebased on current master. I have some other work pending, including reading |
c0b569e
to
56efe9e
Compare
Switch shell plugins to use libflux provided plugin interface.
Add plugstack_load and plugstack_loadall functions to load a DSO plugin into the shell's plugin stack or load from a search path respectively. Allow a builtin searchpath with getter/setter, and an optional set of plugstack "aux" items which are propagated to all plugins loaded via the plugin loading functions of plugstack.
Update shell plugstack tests to include plugin loading, including the use of searchpath, and unit tests for plugstack_plugin_aux_set().
Errors are logged but not returned from plugstack_call(). In lieu of any other solution for throwing errors from plugin callbacks, just return -1 from plugstack_call() if any plugin callback fails.
The shell will need to dynamically export all flux_shell_* symbols in order to load plugins. Rename flux_shell_init so it isn't exported.
Save the local index of tasks in the shell task structure.
Add a flag to track when a task is in the "pre exec" hook. This is necessary to determine what "info" should be available to fetch during callbacks issued at this time.
Add checks for clearly bad arguments in shell public functions. (i.e. the shell api for plugins)
Add some public shell functions to the shell plugin api exported via shell.h, including: * flux_shell_getenv,setenvf,unsetenv() for maniupulating the global job environment (before tasks are created). * flux_shell_get_environ() to get a copy of the environment * flux_shell_get_info() to get basic info about current shell * flux_shell_get_rank_info() to get shell rank information * flux_shell_task_get_info() to get basic shell task information * flux_shell_plugstack_call() to allow plugins to call other plugins * enahnce flux_shell_task_getinfo() including: - "pid" when available - a "state" string, either "Init", "Exec", "Running", or "Exited" - exit status for exited tasks, including standardized exitcode and terminating signal, if any.
Link flux-shell with -export-dynamic and ensure only flux_shell_* symbols are exported. This is required for the shell to load plugins.
flux-shell may want to link against bindings, so be sure those are built before the shell.
Add static configuration for the default shell plugin search path.
Add intree and installed paths to a default "initrc" for shell initialization purposes.
Set the default shell plugin search path from flux static config.
Use plugstack_plugin_aux_set() to ensure the shell object is pushed into all plugins' aux data. This is necessary for flux_plugin_get_shell() to work.
Add a lua-based rc script implementation used to register plugins for the shell. Supports: - loading of plugins with plugin.load() - adjustment of plugin searchpath via setting of plugin.searchpath - inclusion of other initrc files with source() and source_if_exists() - registration of lightweight Lua "plugins" via plugin.register() - shell.info, shell.rankinfo, shell.get_rankinfo() - shell.setenv,getenv,unsetenv - task.info task.setenv,getenv,unsetenv in task.* callbacks
Load initrc in shell by default. Allow default system initrc to be overridden by --initrc=FILE option. It is not an error if initrc.lua does not exist unless an explicit file is specified on the command line.
Add a default initrc.lua for use in-tree. Loads all plugins from src/shell/plugins/*.so and any Lua scripts from src/shell/lua.d/*.lua. Additionally copies jobspec attributes.system.shell.options table. Skips loading of user initrc.lua.
Add some lua scripts that port mpi plugins from wrexec to initrc style "plugins".
Problem: the shell only exits with non-zero exit status if any task returns exit code > 0. However, there may be other cases where we want the shell to "fail", but all tasks were successful. Allow shell->rc to be preset by a shell component to non-zero, indicating failure. If any task exited with a higher exit code, then that will still be reflected in the shell's exit, but initializing shell->rc to 1 ensures failure. Use this capability to ensure the shell fails if any shell.exit plugin returns < 0.
Install shell default system initrc.lua and some lua.d initrc scripts.
Problem: A standalone run from build directory needs more than just the connector path set. Now that Lua plugins are supported, Lua paths must be correct, and in the future it may be something else. For now, ensure correct environment by running the standalone shell under flux(1) cmddriver.
Add a basic set of tests for the shell Lua initrc implementation, including loading of Lua and .so style plugins.
Add a set of tests that ensure the shell mpi plugins are working as designed. Only minimal testing is enabled at this stage.
56efe9e
to
08ff6aa
Compare
Codecov Report
@@ Coverage Diff @@
## master #2376 +/- ##
==========================================
+ Coverage 80.91% 81.07% +0.16%
==========================================
Files 222 223 +1
Lines 34994 35560 +566
==========================================
+ Hits 28314 28832 +518
- Misses 6680 6728 +48
|
Nice work! Merging. |
This PR is still in somewhat early stages, but is up for early review.
The ability to load shell plugins is added and working, but there may still be some rough edges and tests and error handling are not yet fully updated.
More detail:
This PR updates the existing plugin stack implementation in the shell (
plugstack.c
) with the proposed plugin interface added to libflux in #2357, along with support for a plugin searchpath and loading of DSO plugins into the stack.Plugins loaded into the shell have a property of overriding any existing plugin "name", e.g. the builtin PMI implementation can be overridden by loading a new plugin with the name "pmi". This reverses the normal precedence of a search path, so the shell actually traverses the search path in reverse order. (at the time I thought it was counter-intuitive to traverse the search path in order, but now I'm having second thoughts)
Control of which plugins are loaded into the shell and their load time configuration is implemented via a new "initrc.lua" rc script executed by the shell (if found) at startup. I apologize for resurrecting more Lua stuff in flux, but this was really the most efficient way to get this done for now.
The shell initrc script not only supports loading plugins via a
plugin.load()
function, but also supportsplugin.searchpath
shell.info
andshell.rankinfo
shell.verbose = 1
shell.getenv
,shell.setenv
andshell.unsetenv
source
andsource_if_exists
Finally, because it wasn't too difficult, plugins can be added to the shell right from
initrc.lua
. This feature can allow implementation of the "mpi" plugins in initrc instead of as.so
plugins. As part of support for Lua rc plugins, atask
object is available in alltask.*
callbacks to get task specific info and to implement per-tasktask.setenv
task.unsetenv
, etc.Example default
initrc.lua
, which loads user initrc if it exists.Here's an example of an mvapich initrc "plugin":