This repository was archived by the owner on Sep 30, 2022. It is now read-only.

@rhc54 rhc54 commented Aug 31, 2016

It is possible for one or more procs to get through PMIx_Init, and thus be marked as in state "registered", before all local procs have been started. If that happens, we report some of the procs in state "running" and the others in state "registered", which means that the HNP misses the "running" stage of the state machine.
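The race described above can be illustrated with a small sketch. This is not the code from the patch: `ProcState`, `launch_report_naive`, `launch_report_clamped`, and `hnp_sees_job_running` are hypothetical names invented for illustration, and clamping already-registered procs to "running" in the launch report is only one assumed way to keep the HNP from skipping that stage.

```python
from enum import IntEnum


class ProcState(IntEnum):
    """Hypothetical, ordered subset of proc states (not the real ORTE values)."""
    INIT = 0
    RUNNING = 1
    REGISTERED = 2


def launch_report_naive(local_states):
    """Report each local proc's current state when the launch report is sent.

    If a proc has already passed PMIx_Init, it shows up as REGISTERED.
    """
    return dict(local_states)


def launch_report_clamped(local_states):
    """Assumed mitigation: report every started proc as RUNNING.

    Procs that have already registered are clamped back to RUNNING for
    this report, so the receiver always observes the RUNNING stage.
    """
    return {
        rank: ProcState.RUNNING if state >= ProcState.RUNNING else state
        for rank, state in local_states.items()
    }


def hnp_sees_job_running(report):
    """Stand-in for the HNP: the job counts as 'running' only if every
    proc in the report is in state RUNNING."""
    return all(state == ProcState.RUNNING for state in report.values())


# Rank 0 finished PMIx_Init before ranks 1 and 2 were reported as started.
states = {0: ProcState.REGISTERED, 1: ProcState.RUNNING, 2: ProcState.RUNNING}
naive_ok = hnp_sees_job_running(launch_report_naive(states))
clamped_ok = hnp_sees_job_running(launch_report_clamped(states))
```

With the mixed states above, the naive report never satisfies the "all running" check, while the clamped report does.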

Thanks to Jingchao Zhang for his patience in tracking this down.

@rhc54 rhc54 added this to the v2.0.1 milestone Aug 31, 2016
@mellanox-github

Test PASSed.
See http://bgate.mellanox.com/jenkins/job/gh-ompi-release-pr/2138/ for details.

@rhc54
Author

rhc54 commented Aug 31, 2016

Tested and reported as "good" by the user who originally reported the problem on the mailing list.

👍

@jsquyres
Member

@hppritcha I'm good with this for v2.0.1:

  1. It fixes a critical race condition
  2. It was reported by a user on the mailing list (i.e., it's a real-world problem)
  3. The same user tested the patch and found that it works

@hppritcha
Member

Is this a one-off fix for v2.x, or is there a corresponding commit on master?

@jsquyres
Member

Looks like this is the corresponding commit from master: open-mpi/ompi@9b991bd

@hppritcha hppritcha merged commit fb71b10 into open-mpi:v2.x Aug 31, 2016
@ibm-ompi

Build Failed with GNU compiler! Please review the log, and get in touch if you have questions.

Gist: https://gist.github.com/ibm-ompi/514bb6fe123a40de30f56e64a600d85b

@ibm-ompi

Build Failed with XL compiler! Please review the log, and get in touch if you have questions.

Gist: https://gist.github.com/ibm-ompi/7f110ab6c94da788f0fed01d83c12215
