Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

shell: add support for plugins and shell initrc #2376

Merged
merged 25 commits into from Sep 23, 2019

Conversation

grondo
Copy link
Contributor

@grondo grondo commented Sep 15, 2019

This PR is still in somewhat early stages, but is up for early review.

The ability to load shell plugins is added and working, but there may still be some rough edges and tests and error handling are not yet fully updated.

More detail:

This PR updates the existing plugin stack implementation in the shell (plugstack.c) with the proposed plugin interface added to libflux in #2357, along with support for a plugin searchpath and loading of DSO plugins into the stack.

Plugins loaded into the shell have a property of overriding any existing plugin "name", e.g. the builtin PMI implementation can be overridden by loading a new plugin with the name "pmi". This reverses the normal precedence of a search path, so the shell actually traverses the search path in reverse order. (at the time I thought it was counter-intuitive to traverse the search path in order, but now I'm having second thoughts)

Control of which plugins are loaded into the shell and their load time configuration is implemented via a new "initrc.lua" rc script executed by the shell (if found) at startup. I apologize for resurrecting more Lua stuff in flux, but this was really the most efficient way to get this done for now.

The shell initrc script not only supports loading plugins via a plugin.load() function, but also supports

  • modification of the shell plugin searchpath via plugin.searchpath
  • access to the jobspec, R, and shell "info" via shell.info and shell.rankinfo
  • modification of shell options, e.g. shell.verbose = 1
  • modification of job environment via shell.getenv, shell.setenv and shell.unsetenv
  • "sourcing" other initrc files via source and source_if_exists

Finally, because it wasn't too difficult, plugins can be added to the shell right from initrc.lua. This feature can allow implementation of the "mpi" plugins in initrc instead of as .so plugins. As part of support for Lua rc plugins, a task object is available in all task.* callbacks to get task specific info and to implement per-task task.setenv task.unsetenv, etc.

Example default initrc.lua, which loads user initrc if it exists.

-- shell.verbose = 1

-- Load all *.so plugins in searchpath, modify searchpath by writing to
-- plugin.searchpath, e.g: 
-- plugin.searchpath = "/usr/lib64/flux/shell/plugins" .. ":" .. plugin.searchpath

plugin.load { file = "*.so", conf = {} }

-- copy any jobspec shell options to shell.options for convenience
jobspec = shell.info.jobspec
if jobspec.attributes.system.shell then
    shell.options = jobspec.attributes.system.shell.options or {}
end

-- Source all rc files in lua.d/*.lua of shell rcpath:
source (shell.rcpath .. "/lua.d/*.lua")

-- Now source user shell initrc if it exists
source_if_exists "~/.flux/shell/initrc.lua"

Here's an example of an mvapich initrc "plugin":

plugin.register {
    name = "mvapich",
    handlers = {
        { 
	 topic = "shell.init",
         fn = function ()
            local dirname = require 'flux.posix'.dirname
            local f, err = require 'flux'.new ()
            if not f then error (err) end
            local libpmi = f:getattr ('conf.pmi_library_path')
            local ldpath = dirname (libpmi)
            local current = shell.getenv ('LD_LIBRARY_PATH')
            if current ~= nil then
                ldpath = ldpath .. ':' .. current
            end
            shell.setenv ("LD_LIBRARY_PATH", ldpath)
            shell.setenv ("MPIRUN_NTASKS", shell.info.ntasks)
            shell.setenv ("MPIRUN_RSH_LAUNCH", 1)
         end
        },
        {
         topic = "task.init",
         fn = function ()
            local rank = task.info.rank
            task.setenv ("MPIRUN_RANK", rank)
         end
        }
    }
}

Copy link
Member

@garlick garlick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple of superficial comments/questions so far.
This looks really well done!

@@ -0,0 +1,17 @@

-- Load all *.so plugins from plugin.serarchpath
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spelling: "searchpath"

It may be worth noting that this is the default in-tree initrc.lua in the comment at the top, and also in the commit message which says it's an example initrc.lua.

I may be missing it but is there intended to be an installed default initrc.lua also?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be missing it but is there intended to be an installed default initrc.lua also?

Yeah, that is not there yet, sorry. I meant to mention that this PR is up for early review to ensure the general approach is acceptable (before spending time writing tests on something that will have to change)

The example initrc.lua might be a good default, however I started to get a bit nervous about jobs reading from ~/.flux/shell/initrc.lua by default, because a small error/typo in that script would cause a job to fail at runtime. It seemed like perhaps this approach might be setting users up for frustration.

Maybe it would be better if the user initrc was put directly into the jobspec (not exactly sure how that would work), allowing per-job customization instead of one global initrc for all of a user's jobs?

Anyway, I wasn't sure at this point and wanted to field a consensus before proceeding.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be worth noting that this is the default in-tree initrc.lua in the comment at the top, and also in the commit message which says it's an example initrc.lua.

Yeah, src/shell/initrc.lua is the default in-tree shell initrc. I'll update the confusing commit message. I'm actually a bit torn on what should be in the default in-tree initrc. Might not be wise or necessary to read from a user's custom initrc here!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok fixed the spelling error and updated the commit message to be more explicit.

source (shell.rcpath .. "/lua.d/*.lua")

-- Now source user initrc.lua if it exists
source_if_exists "~/.flux/shell/initrc.lua"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

~/.flux/shell/initrc.lua is a cool feature! Maybe this is a more general "user-configurability vs provenance" question, but should we be logging the rc files that a job sources, or maybe the list of plugins that were loaded so that the hotline or whoever has something to go on if the user's rc file or custom plugin stack are causing jobs to fail?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the logging is another piece of this PR that isn't in place yet.

Provenance is another reason I was thinking it would be nice to add the user initrc to the jobspec, though of course this wouldn't be feasible for custom .so plugins, or lua fragments sourced via source "/path/to/*.lua", so there would be some loss of flexibility there.

At the very least logging to the job shell log should happen, I'm not sure about other provenance ideas -- it could use some discussion.

if (shell->verbose)
log_msg ("Loading %s", rcfile);

if ((shell_rc (shell, rcfile) < 0) && (errno != ENOENT || required)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might need to audit shell_rc(() to ensure errno is not spuriously set on success or set incorrectly on failure, by passing through from another non-fatal error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, though errno should not be checked on success (if I am that is a bug). I'll carefully review shell_rc() to ensure it only return ENOENT on missing file.

Actually, would it be better to just stat() the file before calling shell_rc() to make the missing file case more explicit?

I went back and forth on error handling for missing initrc files during development. It seemed like the rules should be

  • it is not an error if the main initrc doesn't exist, unless --initrc FILE was used.
  • it is not an erorr if source with a glob pattern has no matches, e.g. source *.lua shouldn't fail when there are no matching files.
  • source of a non-pattern should fail if the file doesn't exist. (Thus the unfortunate need to add the function source_if_exists

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right, sorry - misspoke on the success case.

Agreed, those rules LGTM.

Actually, would it be better to just stat() the file before calling shell_rc() to make the missing file case more explicit?

Or access(3). No opinion here, either way works IMHO.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I switched to access in my working copy. Will add some basic tests before pushing again.

@grondo
Copy link
Contributor Author

grondo commented Sep 16, 2019

Rebased on current master and force-pushed.

@grondo
Copy link
Contributor Author

grondo commented Sep 18, 2019

Just pushed more in-progress work, mostly addressing some comments above and adding some basic initrc tests in t2603-job-shell-initrc.t.

Basic error handling was added to plugstack_call -- if any callback returns an error then plugstack_call() returns failure (though not immediately, only after handling all callbacks in the stack)

Some improvements and changes to the public shell.h "api" were also made along the way to satisfy use cases, etc. Including a way to fetch the entire environment, and fleshing out the flux_shell_task_get_info() function.

I also threw in some example mpi initrc "plugins" in src/shell/lua.d/*.lua.

There's still a bit left to do here unfortunately, including

  • tests for .so plugin loading
  • decision on what initrc.lua should be installed by default, install default lua.d initrc scripts
  • implementation of shell options and propagation of that change to the initrc interface
  • the interface to push a flux handle into lua was removed, we might want to re-enable it to avoid opening a new flux connection from each lua plugin that needs to grab an attribute (or wrap flux_attr_get() in a flux_shell_ call).
  • probably more fixes that will be found during testing.

@grondo
Copy link
Contributor Author

grondo commented Sep 18, 2019

Hm, of course there is an issue with plugins running in the shell in "standalone" mode, since there is no flux instance from which to query the necessary attributes. I don't know how I didn't catch this during development.

I'm actually not sure the best way to proceed. I don't think it will be tractable to have every plugin check for standalone mode. For now I'll just return early from the default in-tree initrc if standalone mode is active.

Standalone mode, or something like it, would be really useful for admins and users to run local tests of their own plugins. However it has its limitations, and I can't see users hand crafting proper R, etc. It also doesn't allow testing of multi-shell runs, etc, so we'll need actual parent instances to test distributed plugins.

I wonder if what we need is a higher level flux shell or flux shell-test command that makes it appear like one or more shells are run directly, but actually invokes a job under flux start -s and allows the user to specify test env parameters on the command line. Then the --standalone option could be hidden and used only for internal testing.

@grondo grondo force-pushed the shell-initrc branch 3 times, most recently from 08b0301 to ba44394 Compare September 19, 2019 16:41
@grondo
Copy link
Contributor Author

grondo commented Sep 19, 2019

Ok, I think I've resolved some LUA_PATH issues when running under vpath builds. (the initrc test scripts needed the explicit path to find fluxometer under $SHARNESS_TEST_DIRECTORY)

All shell "standalone" tests are now run either via flux(1) directly or under a test_under_flux so that all paths are set up correctly.

The default in-tree initrc.lua returns immediately if standalone mode is detected.

This should hopefully get at least the current set of tests running and passing on travis.

@grondo
Copy link
Contributor Author

grondo commented Sep 19, 2019

Rebased on current master. With #2381 merged, t3000-mpi-basic.t now passes on TOSS systems in this branch.

Still working on sharness tests for .so plugin loading. Coming soon.

@grondo
Copy link
Contributor Author

grondo commented Sep 20, 2019

Just pushed a bit more incremental work. Mainly some sanity testing that .so plugins can be loaded from the shell, including a testsuite for standard "bad arguments" checks against the public shell API (plugin api).

A few fixes found during testing were thrown in along the way.

At this point I think the PR might be close to ready, depending on how much more effort we want to throw into testing.

The PR also needs a big rebase and squashing. Let me know when it might be safe to force push the result.

@grondo
Copy link
Contributor Author

grondo commented Sep 20, 2019

Also still need to test and install into default location the included Lua "mpi" plugins, and
install a default shell initrc.lua (and decide what goes in there)

@garlick
Copy link
Member

garlick commented Sep 20, 2019

Excellent! Please rebase + squash when you're ready and I'll make a review pass afterwards.

@grondo
Copy link
Contributor Author

grondo commented Sep 20, 2019

Ok, I've squashed this PR down a bunch in an attempt to remove some of the incremental development.

Also, the default initrc.lua is installed along with the set of lua.d/*.lua "plugins" currently included. The spectrum plugin is not yet tested.

The t3001-mpi-personalities.t test script was resurrected to ensure at least some of the MPI plugins work as designed (We'll see if this passes in travis)

There's been a lot of churn so I apologies if there are still some rough patches in this PR.

It could use another review, though it isn't perfect yet.

@grondo
Copy link
Contributor Author

grondo commented Sep 20, 2019

Rebased on current master.

I also added some tests for corner cases and one fix discovered during that testing. (These will need to be squashed later)

Copy link
Member

@garlick garlick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This represents a big chunk of work that looks really well thought out to me.

I'm still looking through the changes but I see no reason to hold this up, so please add merge-when-passing when you're happy with the squash.

@grondo grondo changed the title shell: add support for loading plugins via initrc.lua shell: support plugins and initrc.lua Sep 20, 2019
@grondo
Copy link
Contributor Author

grondo commented Sep 20, 2019

Ok, let me squash the most recent changes if Travis passes.

I think other fixes and enhancements can be added on incrementally as needed.

@grondo grondo changed the title shell: support plugins and initrc.lua shell: add support for plugins and shell initrc Sep 22, 2019
@grondo
Copy link
Contributor Author

grondo commented Sep 22, 2019

Squashed a few fixup commits and rebased on current master.

I have some other work pending, including reading attributes.system.shell when parsing jobspec (including ability to set path to initrc in attributes.system.shell.initrc), but this could go in as a follow-on PR.

Switch shell plugins to use libflux provided plugin interface.
Add plugstack_load and plugstack_loadall functions to load a DSO
plugin into the shell's plugin stack or load from a search path
respectively.

Allow a builtin searchpath with getter/setter, and an optional
set of plugstack "aux" items which are propagated to all plugins
loaded via the plugin loading functions of plugstack.
Update shell plugstack tests to include plugin loading, including
the use of searchpath, and unit tests for plugstack_plugin_aux_set().
Errors are logged but not returned from plugstack_call().
In lieu of any other solution for throwing errors from plugin callbacks,
just return -1 from plugstack_call() if any plugin callback fails.
The shell will need to dynamically export all flux_shell_*
symbols in order to load plugins. Rename flux_shell_init so
it isn't exported.
Save the local index of tasks in the shell task structure.
Add a flag to track when a task is in the "pre exec" hook. This
is necessary to determine what "info" should be available to
fetch during callbacks issued at this time.
Add checks for clearly bad arguments in shell public functions.
(i.e. the shell api for plugins)
Add some public shell functions to the shell plugin api exported via
shell.h, including:

 * flux_shell_getenv,setenvf,unsetenv() for maniupulating the global
   job environment (before tasks are created).

 * flux_shell_get_environ() to get a copy of the environment

 * flux_shell_get_info() to get basic info about current shell

 * flux_shell_get_rank_info() to get shell rank information

 * flux_shell_task_get_info() to get basic shell task information

 * flux_shell_plugstack_call() to allow plugins to call other plugins

 * enahnce flux_shell_task_getinfo() including:
   - "pid" when available
   - a "state" string, either "Init", "Exec", "Running", or "Exited"
   - exit status for exited tasks, including standardized exitcode
     and terminating signal, if any.
Link flux-shell with -export-dynamic and ensure only flux_shell_*
symbols are exported. This is required for the shell to load
plugins.
flux-shell may want to link against bindings, so be sure those
are built before the shell.
Add static configuration for the default shell plugin search path.
Add intree and installed paths to a default "initrc" for shell
initialization purposes.
Set the default shell plugin search path from flux static config.
Use plugstack_plugin_aux_set() to ensure the shell object is
pushed into all plugins' aux data.

This is necessary for flux_plugin_get_shell() to work.
Add a lua-based rc script implementation used to register plugins
for the shell.

Supports:

 - loading of plugins with plugin.load()
 - adjustment of plugin searchpath via setting of plugin.searchpath
 - inclusion of other initrc files with source() and source_if_exists()
 - registration of lightweight Lua "plugins" via plugin.register()
 - shell.info, shell.rankinfo, shell.get_rankinfo()
 - shell.setenv,getenv,unsetenv
 - task.info task.setenv,getenv,unsetenv in task.* callbacks
Load initrc in shell by default. Allow default system initrc to
be overridden by --initrc=FILE option. It is not an error if
initrc.lua does not exist unless an explicit file is specified
on the command line.
Add a default initrc.lua for use in-tree.

Loads all plugins from src/shell/plugins/*.so and any Lua scripts
from src/shell/lua.d/*.lua. Additionally copies jobspec
attributes.system.shell.options table.

Skips loading of user initrc.lua.
Add some lua scripts that port mpi plugins from wrexec to
initrc style "plugins".
Problem: the shell only exits with non-zero exit status if any
task returns exit code > 0. However, there may be other cases
where we want the shell to "fail", but all tasks were successful.

Allow shell->rc to be preset by a shell component to non-zero,
indicating failure. If any task exited with a higher exit code,
then that will still be reflected in the shell's exit, but
initializing shell->rc to 1 ensures failure.

Use this capability to ensure the shell fails if any shell.exit
plugin returns < 0.
Install shell default system initrc.lua and some lua.d initrc
scripts.
Problem: A standalone run from build directory needs more than
just the connector path set. Now that Lua plugins are supported,
Lua paths must be correct, and in the future it may be something
else.

For now, ensure correct environment by running the standalone
shell under flux(1) cmddriver.
Add a basic set of tests for the shell Lua initrc implementation,
including loading of Lua and .so style plugins.
Add a set of tests that ensure the shell mpi plugins are working
as designed.

Only minimal testing is enabled at this stage.
@codecov-io
Copy link

Codecov Report

Merging #2376 into master will increase coverage by 0.16%.
The diff coverage is 88.35%.

@@            Coverage Diff             @@
##           master    #2376      +/-   ##
==========================================
+ Coverage   80.91%   81.07%   +0.16%     
==========================================
  Files         222      223       +1     
  Lines       34994    35560     +566     
==========================================
+ Hits        28314    28832     +518     
- Misses       6680     6728      +48
Impacted Files Coverage Δ
src/common/libflux/conf.c 100% <ø> (ø) ⬆️
src/shell/pmi.c 76.76% <100%> (+0.11%) ⬆️
src/shell/output.c 73.58% <100%> (+0.25%) ⬆️
src/shell/kill.c 100% <100%> (ø) ⬆️
src/shell/builtins.c 95.23% <100%> (+0.5%) ⬆️
src/shell/signals.c 93.75% <100%> (+0.41%) ⬆️
src/shell/task.c 81.94% <85.71%> (+2.77%) ⬆️
src/shell/rc.c 87.46% <87.46%> (ø)
src/shell/plugstack.c 89.36% <88.49%> (+9.57%) ⬆️
src/shell/shell.c 84.43% <88.8%> (+3.8%) ⬆️
... and 10 more

@garlick
Copy link
Member

garlick commented Sep 23, 2019

Nice work! Merging.

@garlick garlick merged commit a1a1ac8 into flux-framework:master Sep 23, 2019
@garlick garlick mentioned this pull request Sep 30, 2019
@grondo grondo deleted the shell-initrc branch January 28, 2020 00:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants