
move MPI/PMI environment variables to wreck plugin(s) #669

Closed
garlick opened this Issue May 9, 2016 · 5 comments

2 participants
@garlick
Member

garlick commented May 9, 2016

wrexecd sets the following environment variables for subprocesses:

Flux-specific per-job:

  • FLUX_JOB_ID - numerical job id
  • FLUX_JOB_NNODES - number of nodes in job
  • FLUX_NODE_ID - rank of local broker running task
  • FLUX_JOB_SIZE - number of global ranks in job
  • FLUX_LOCAL_RANKS - the set of global ranks on node (comma-separated)
  • FLUX_URI - URI for local broker connection
  • FLUX_TASK_RANK - global rank (zero origin for program)
  • FLUX_TASK_LOCAL_ID - local rank (zero origin on each node)

Synonyms for FLUX_JOB_SIZE:

  • MPIRUN_NPROCS
  • PMI_SIZE

Synonyms for FLUX_TASK_RANK:

  • MPIRUN_RANK
  • PMI_RANK
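To make the proposed split concrete, here is a minimal Python sketch (not wrexecd's actual C code) of how the per-task environment listed above could be assembled. The Flux-specific variables and the MPI/PMI synonyms are built by separate functions, so the synonyms could move into a per-MPI plugin. The function names are hypothetical, and the sketch assumes FLUX_TASK_LOCAL_ID is the task's position within FLUX_LOCAL_RANKS.

```python
def flux_task_env(jobid, nnodes, node_id, job_size, local_ranks, uri, task_rank):
    """Hypothetical sketch: Flux-specific per-task variables set by wrexecd."""
    return {
        "FLUX_JOB_ID": str(jobid),
        "FLUX_JOB_NNODES": str(nnodes),
        "FLUX_NODE_ID": str(node_id),
        "FLUX_JOB_SIZE": str(job_size),
        # Global ranks on this node, comma-separated
        "FLUX_LOCAL_RANKS": ",".join(str(r) for r in local_ranks),
        "FLUX_URI": uri,
        "FLUX_TASK_RANK": str(task_rank),
        # Assumption: local id is the task's index among the node's global ranks
        "FLUX_TASK_LOCAL_ID": str(local_ranks.index(task_rank)),
    }


def mpi_synonyms(env):
    """Hypothetical MPI/PMI plugin: copy size and rank under MPI-specific names."""
    extra = {}
    for name in ("MPIRUN_NPROCS", "PMI_SIZE"):
        extra[name] = env["FLUX_JOB_SIZE"]
    for name in ("MPIRUN_RANK", "PMI_RANK"):
        extra[name] = env["FLUX_TASK_RANK"]
    return extra


if __name__ == "__main__":
    # Task with global rank 3 on a node hosting global ranks 2 and 3 of a 4-task job
    env = flux_task_env(42, 2, 1, 4, [2, 3], "local:///tmp/flux-sock", 3)
    env.update(mpi_synonyms(env))
    print(env["FLUX_TASK_LOCAL_ID"], env["PMI_RANK"], env["PMI_SIZE"])  # prints: 1 3 4
```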

In addition, the broker clears the following for the initial program (run level 2):

  • PMI_FD
  • PMI_RANK
  • PMI_SIZE

and sets

  • FLUX_URI
  • I_MPI_PMI_LIBRARY

and prepends the directory containing Flux's libpmi.so to

  • LD_LIBRARY_PATH
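The initial-program adjustments listed above amount to a small transformation of the inherited environment. A minimal Python sketch follows; the function name and the library directory are hypothetical, and it assumes I_MPI_PMI_LIBRARY is set to the full path of Flux's libpmi.so, which the list above does not state.

```python
def initial_program_env(parent_env, uri, pmi_libdir):
    """Hypothetical sketch of the broker's run level 2 environment adjustments."""
    env = dict(parent_env)  # work on a copy; the caller's dict is untouched
    # Clear PMI variables inherited from any enclosing launcher
    for name in ("PMI_FD", "PMI_RANK", "PMI_SIZE"):
        env.pop(name, None)
    env["FLUX_URI"] = uri
    # Assumption: the value is the full path to Flux's libpmi.so
    env["I_MPI_PMI_LIBRARY"] = f"{pmi_libdir}/libpmi.so"
    # Prepend the libpmi.so directory so it is searched before existing entries
    old = env.get("LD_LIBRARY_PATH")
    env["LD_LIBRARY_PATH"] = f"{pmi_libdir}:{old}" if old else pmi_libdir
    return env


if __name__ == "__main__":
    parent = {"PMI_FD": "5", "PMI_RANK": "0", "PMI_SIZE": "16",
              "LD_LIBRARY_PATH": "/usr/lib64"}
    env = initial_program_env(parent, "local:///tmp/flux-sock", "/opt/flux/lib/flux")
    print(env["LD_LIBRARY_PATH"])  # prints: /opt/flux/lib/flux:/usr/lib64
```

Every program started from the initial program inherits these values, including ones that are not parallel programs at all.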

This is getting a bit confusing. Would it make sense to move the MPI-specific environment variables to wreck plugins for each specific MPI implementation?

In particular I feel that setting I_MPI_PMI_LIBRARY and LD_LIBRARY_PATH for the initial program just to have those settings inherited by programs run by the initial program seems wrong, since the initial program cannot be (directly) a parallel program (e.g. it cannot be flux or an MPI program - one has to call wreckrun from the initial program to accomplish that).

@grondo

Contributor

grondo commented May 9, 2016

That sounds good to me!

Just some questions to throw out there:
Do you think we should have a list of plugins that are all loaded by default? What do we do about conflicting environment variables, for instance?

For now we could just write a small extension to flux-wreckrun that runs through the lua.d/* plugins and lets them set environment variables in a "frontend" context or something, as a short-term stopgap. That might help us learn what kind of interface we need in the end for these plugins.
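One possible shape for that stopgap, sketched in Python rather than Lua: run each plugin's frontend hook in order and merge the environment variables it sets, treating two plugins that set the same variable to different values as an error. The function and exception names are hypothetical, and the conflict policy is only one possible answer to the question above (later-wins or explicit priorities would also work). Variables already present in the inherited environment are simply overridden.

```python
class EnvConflict(Exception):
    """Two plugins tried to set the same variable to different values."""


def apply_frontend_plugins(plugins, env):
    """plugins: ordered (name, {var: value}) pairs, e.g. one per plugin file."""
    result = dict(env)
    owner = {}  # var -> name of the first plugin that set it
    for name, settings in plugins:
        for var, value in settings.items():
            if var in owner and result[var] != value:
                raise EnvConflict(f"{name} and {owner[var]} both set {var}")
            owner.setdefault(var, name)
            result[var] = value  # plugins override the inherited environment
    return result


if __name__ == "__main__":
    env = apply_frontend_plugins(
        [("intelmpi", {"I_MPI_PMI_LIBRARY": "/opt/flux/lib/flux/libpmi.so"}),
         ("pmi", {"PMI_SIZE": "4"})],
        {"PATH": "/usr/bin"},
    )
    print(sorted(env))
```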

Note that it would also be good if we eventually allow plugins to add cmdline options to the launcher, somewhat like SLURM SPANK plugins do. However, a big challenge with that framework is actually getting the option arguments to the remote side. I think we could do a lot better in Flux by using the KVS to store plugin "state". (In fact, for Lua plugins we could use the KVS as a "global variable" store, making distributed plugins quite easy to write, or maybe even more difficult? I'm not sure.)

@garlick

Member Author

garlick commented May 9, 2016

For now we could just write a small extension to flux-wreckrun that runs through lua.d/* plugins and could set environment variables in a "frontend" context or something as a short term stopgap. That might help us learn what kind of interface we need in the end for these plugins.

Yes I think that would be a good start!

Note that it would also be good if we eventually allow plugins to add cmdline options to the launcher, somewhat like SLURM SPANK plugins do

Unless I'm missing something, that seems really straightforward! Not a serious proposal, but could this be as simple as:

  1. add flux wreckrun --setopt PLUGNAME:NAME=VALUE option
  2. before executing plugins, kvs_put (lwj.n.option.PLUGNAME.NAME, VALUE)
  3. provide read-only access to the lwj.n.option.PLUGNAME namespace to the plugin in each context?
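The three steps can be simulated in a few lines of Python, with a plain dict standing in for the KVS (this does not use the real kvs_put API). The helper names are hypothetical; the key layout lwj.<n>.option.PLUGNAME.NAME follows step 2, and a read-only mapping approximates step 3.

```python
from types import MappingProxyType


def parse_setopt(arg):
    """Step 1: split 'PLUGNAME:NAME=VALUE' into its three parts."""
    lhs, sep, value = arg.partition("=")  # VALUE may itself contain '='
    plugin, colon, name = lhs.partition(":")
    if not sep or not colon or not plugin or not name:
        raise ValueError(f"expected PLUGNAME:NAME=VALUE, got {arg!r}")
    return plugin, name, value


def store_options(kvs, jobid, setopts):
    """Step 2: write each option under lwj.<jobid>.option before plugins run."""
    for arg in setopts:
        plugin, name, value = parse_setopt(arg)
        kvs[f"lwj.{jobid}.option.{plugin}.{name}"] = value


def plugin_options(kvs, jobid, plugin):
    """Step 3: a read-only view of one plugin's option namespace."""
    prefix = f"lwj.{jobid}.option.{plugin}."
    return MappingProxyType(
        {key[len(prefix):]: val for key, val in kvs.items() if key.startswith(prefix)}
    )


if __name__ == "__main__":
    kvs = {}  # stand-in for the KVS
    store_options(kvs, 7, ["mvapich:debug=1", "intelmpi:pmi=/opt/flux/libpmi.so"])
    print(dict(plugin_options(kvs, 7, "mvapich")))  # prints: {'debug': '1'}
```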

There's the eventual consistency of the KVS to think about but presumably we've already solved that in wreck by triggering on the lwj creation and doing it atomically? EC may prove more challenging in other plugin state communication situations.

@grondo

Contributor

grondo commented May 9, 2016

I was thinking that plugins could register new command-line options with the wreckrun frontend by declaring a table in the plugin source, and could check whether the option was used with wreck:getopt ("option") or similar from any context. Behind the scenes it could work as you say, and that might not be a bad way to start out.
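A rough Python analogue of that interface: each plugin declares a table of the options it accepts, options the plugin did not declare are rejected, and any context can query a value through getopt. The PluginContext class and the declaration format are hypothetical stand-ins for the Lua wreck:getopt() idea, which the thread does not specify further.

```python
class UnknownOption(Exception):
    """An option was given for a plugin that did not declare it."""


class PluginContext:
    """Hypothetical stand-in for the 'wreck' object a plugin sees in any context."""

    def __init__(self, name, declared, supplied):
        # declared: {option: help text}, the table in the plugin source
        # supplied: {option: value} as parsed from the wreckrun command line
        unknown = sorted(set(supplied) - set(declared))
        if unknown:
            raise UnknownOption(f"{name}: unknown option(s) {', '.join(unknown)}")
        self.name = name
        self._values = dict(supplied)

    def getopt(self, option):
        """Return the option's value, or None if it was not given."""
        return self._values.get(option)


if __name__ == "__main__":
    declared = {"pmi-library": "path to the PMI library to use"}
    ctx = PluginContext("intelmpi", declared,
                        {"pmi-library": "/opt/flux/lib/flux/libpmi.so"})
    print(ctx.getopt("pmi-library"))
```

In this sketch, `supplied` could be filled from a per-plugin option namespace in the KVS, so the frontend and remote contexts see the same values.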

@grondo

Contributor

grondo commented May 9, 2016

There's the eventual consistency of the KVS to think about but presumably we've already solved that in wreck by triggering on the lwj creation and doing it atomically? EC may prove more challenging in other plugin state communication situations.

Good point, but for the simple plugin case at least, the "options" set for the various plugins would follow the same procedure as existing lwj fields (like cmdline, env, cwd, etc.), as you say, so I don't think there are any consistency issues here.

I probably spoke too soon about writing a generic distributed service using global Lua variables tied to KVS values; that isn't really a good or sane use case.

garlick referenced this issue May 9, 2016 in environment cleanup #671 (Merged)

@garlick

Member Author

garlick commented May 17, 2016

Closing after merge of #671. If we need an issue on option registration, etc., we can open a new one.
