
PMI compatibility #665

Closed
trws opened this issue May 3, 2016 · 3 comments

Comments

@trws
Member

trws commented May 3, 2016

TL;DR: we should do each of the following:

  1. set the MV2_USE_MPIRUN_MAPPING environment variable to 0, as a stopgap to get MVAPICH2 working with flux today
  2. set the PMI_VERSION and PMI_SUBVERSION environment variables to 1 and 0 respectively until we support something higher
  3. set the PMI_process_mapping PMI KVS value to a valid string, then set PMI_SUBVERSION to 1 to advertise PMI 1.1 support
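The stopgap settings above could be sketched roughly like this (a purely illustrative Python snippet; the variables are the ones named in the list, but the wrapper itself is hypothetical, not flux code):

```python
import os

# Build a child environment with the stopgap settings before launching
# an MVAPICH2 rank.
env = dict(os.environ)

# 1. Disable the mpirun-mapping path so MVAPICH2 falls back to generic PMI.
env["MV2_USE_MPIRUN_MAPPING"] = "0"

# 2. Advertise plain PMI 1.0 until something higher is supported.
env["PMI_VERSION"] = "1"
env["PMI_SUBVERSION"] = "0"

# 3. Once PMI_process_mapping is populated in the KVS, bump the
#    subversion to advertise PMI 1.1 instead:
# env["PMI_SUBVERSION"] = "1"
```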

It turns out that version 2.1 of MVAPICH2 has some faulty logic for deciding how to handle its initial wireup. The core of it is that it assumes that if it is getting PMI at all, the PMI is either SLURM's or hydra's, so using PMI 1.1 features should be safe as long as the launcher isn't SLURM; and if it isn't getting PMI, it expects to be using PMGR or some other MVAPICH-specific launching mechanism. None of those assumptions holds for flux, so we'll need to chip away at them. Explicitly telling it we're not using the MPIRUN_MAPPING interface gets us to par, but only because wreck sets PMGR-like environment variables, so at least IMO we should try to put the PMI_process_mapping value together. The extra wrinkle is that this value has an undocumented format, so the only real places to find it are in hydra and in MPID.

hydra code to build such a string: https://github.com/adk9/hydra/blob/f0ce4451f04d26c55e2f59f12b59d222da838a2c/pm/pmiserv/pmiserv_utils.c#L114
MPID code to interpret it: http://fossies.org/dox/mvapich2-2.2rc1/ch3_2src_2mpid__vc_8c_source.html#l00983

Format description from the latter:

    /* parse string of the form:
       '(' <format> ',' '(' <num> ',' <num> ',' <num> ')' {',' '(' <num> ',' <num> ',' <num> ')'} ')'

       the values of each 3-tuple have the following meaning (X,Y,Z):
         X - node id start value
         Y - number of nodes with size Z
         Z - number of processes assigned to each node
     */
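A minimal sketch of building and parsing a string in that format (the helper names are hypothetical; the "vector" format token matches what hydra's implementation emits):

```python
import re

def build_process_mapping(blocks):
    """Build a PMI_process_mapping string from (X, Y, Z) tuples, where
    X is the starting node id, Y the number of nodes, and Z the number
    of processes per node, per the MPID comment above."""
    body = ",".join("(%d,%d,%d)" % b for b in blocks)
    return "(vector,%s)" % body

def parse_process_mapping(mapping):
    """Inverse of build_process_mapping: return (format, [(X, Y, Z), ...])."""
    m = re.match(r"\((\w+),(.*)\)$", mapping)
    if not m:
        raise ValueError("malformed mapping: %r" % mapping)
    blocks = [tuple(int(n) for n in t)
              for t in re.findall(r"\((\d+),(\d+),(\d+)\)", m.group(2))]
    return m.group(1), blocks

# Example: 4 nodes starting at node id 0, 2 processes on each node.
mapping = build_process_mapping([(0, 4, 2)])
```

Here build_process_mapping([(0, 4, 2)]) yields "(vector,(0,4,2))", i.e. eight ranks laid out two per node across four nodes.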

With this set, all of the MVAPICH2 versions I've tested work, and they use the mapping to do their wireup. None of this helps OpenMPI as far as I can tell, but I'm not sure what does; it still doesn't work in either the opt or dotkit versions, and I'll have to dig into that further to see why.

@garlick
Member

garlick commented Dec 29, 2016

[triage] Responding to the TLDR:

  1. we're not setting MV2_USE_MPIRUN_MAPPING
  2. we're not setting PMI_VERSION or PMI_SUBVERSION in the environment (those values are part of the PMI wire protocol handshake - is that what was meant?)
  3. we are setting PMI_process_mapping in the kvs as documented in RFC 13

For next steps we should probably retry launching an mvapich program on the RHEL 7-based systems and see whether any of the above are needed.

Any mvapich-specific environment should be set up in src/modules/wreck/lua.d/mvapich.lua

@trws
Member Author

trws commented Dec 29, 2016

I think with TOSS 3's mvapich these are no longer necessary. It was an issue with a specific version of MVAPICH that looked for slurm's PMI specifically, failed to find it, and then didn't try the simple PMI because of how it was built.

@garlick
Member

garlick commented Dec 29, 2016

OK, then let's close this and if new problems arise, open bugs for those specifically.

@garlick garlick closed this as completed Dec 29, 2016