This repository was archived by the owner on Sep 30, 2022. It is now read-only.

Conversation

@jsquyres
Member

It is verboten to use opal_output() after the fork() but before the
exec()! It results in all manner of undefined behavior. For example,
on some OS X systems, if you run a trivial "hello world" MPI program
with a high level of ODLS verbosity:

```sh
$ mpirun -np 3 --mca odls_base_verbose 100 ./hello_c
```

You will see a bunch of output from the mpirun ODLS base, but then it
may hang in odls_default_module.c:do_child() -- after the fork() but
before the exec() -- while trying to opal_output() some debugging
statements.

The solution is to remove these extraneous opal_output() statements.
Indeed, the ODLS base is already outputting the same information that
these opal_output() statements are trying to emit, anyway.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>

(cherry picked from commit open-mpi/ompi@dd9a819)
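
For readers unfamiliar with the constraint described above, here is a minimal sketch of what "safe" diagnostics between fork() and exec() can look like. It is not the Open MPI code and does not reflect what the ODLS actually does. The helper names `child_log` and `spawn_with_safe_logging` are hypothetical, invented for illustration. The sketch assumes a POSIX system and restricts the child to functions that POSIX lists as async-signal-safe (`write`, `execv`, `_exit`), which avoids depending on any locks or buffers inherited from the parent.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: emit a fixed message using only write(2).
 * The length is computed by hand so that nothing beyond write()
 * is called in the child. */
static void child_log(const char *msg)
{
    size_t len = 0;
    while (msg[len] != '\0') {
        len++;
    }
    ssize_t rc = write(STDERR_FILENO, msg, len);
    (void) rc; /* nothing useful can be done if the write fails */
}

/* Hypothetical helper: fork, log from the child, then exec argv[0]
 * (which must be a full path, because execv() does not search PATH).
 * Returns the child's exit status, or -1 on error. */
static int spawn_with_safe_logging(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }

    if (pid == 0) {
        /* Child: no buffered or lock-protected logging here. */
        child_log("child: about to exec\n");
        execv(argv[0], argv);

        /* Only reached if execv() failed. */
        child_log("child: exec failed\n");
        _exit(127);
    }

    /* Parent: ordinary logging facilities are fine again here. */
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling `spawn_with_safe_logging` with an argument vector such as `{ "/bin/sh", "-c", "exit 0", NULL }` runs the command and returns its exit status. The point of the design is that every diagnostic the child needs is either emitted by the parent before the fork() or written with raw `write()` calls.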

@hppritcha This is safe for v2.0.1 because a) it only happens on some systems, and b) it's fairly esoteric to run with odls_base_verbose >10 (which is what causes the problem).

@rhc54 Please review for v2.0.1.

@jsquyres added the bug label May 25, 2016
@jsquyres added this to the v2.0.1 milestone May 25, 2016
@mellanox-github

Test PASSed.
See http://bgate.mellanox.com/jenkins/job/gh-ompi-release-pr/1706/ for details.

@rhc54

rhc54 commented May 25, 2016

👍

@hppritcha merged commit 7049f28 into open-mpi:v2.x May 25, 2016