@artpol84
Contributor

The fact that an application proc called Abort (read: failed) doesn't
mean that the ORTE subsystem has failed. On the contrary, it is doing
its job of gracefully exiting the whole application.

An orted exiting with a non-zero status creates a problem for at least
plm/slurm environments, where orteds are launched via `srun` with the
`--kill-on-bad-exit` flag. If one of the orteds exits with a non-zero
status, Slurm immediately kills all the other orteds. As a result, we
see a lot of leftovers in the `/tmp` directory.
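
The failure mode described above can be sketched with a toy model. This is not ORTE or Slurm code: `Daemon`, `daemon_status`, and `run_step` are hypothetical names, and the model assumes each daemon cleans up its own files just before it exits, and that a kill prevents that cleanup.

```python
from dataclasses import dataclass


@dataclass
class Daemon:
    name: str
    exit_status: int          # status the daemon itself exits with
    cleaned_up: bool = False  # whether it got to remove its temporary files


def daemon_status(app_aborted: bool, propagate_app_failure: bool) -> int:
    """Exit status of a daemon whose local application proc may have aborted."""
    return 1 if (app_aborted and propagate_app_failure) else 0


def run_step(daemons, kill_on_bad_exit=True):
    """Let daemons exit one by one; return the names that left files behind.

    Assumption: with kill_on_bad_exit set, the first non-zero exit causes
    every daemon that has not yet exited to be killed before it cleans up.
    """
    for d in daemons:
        d.cleaned_up = True  # this daemon finishes its cleanup, then exits
        if kill_on_bad_exit and d.exit_status != 0:
            break            # the remaining daemons are killed
    return [d.name for d in daemons if not d.cleaned_up]


# The application proc under orted0 aborts in both scenarios.
propagating = [Daemon(f"orted{i}", daemon_status(i == 0, True)) for i in range(3)]
graceful = [Daemon(f"orted{i}", daemon_status(i == 0, False)) for i in range(3)]

leftovers_propagating = run_step(propagating)
leftovers_graceful = run_step(graceful)
```

In this model, propagating the application's failure into the daemon's own exit status leaves `orted1` and `orted2` without cleanup, while exiting with zero lets every daemon finish.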

(ported from 4af7a08)

Signed-off-by: Artem Polyakov <artpol84@gmail.com>

@artpol84 artpol84 added the bug label Apr 14, 2017
@artpol84 artpol84 added this to the v2.1.1 milestone Apr 14, 2017
@artpol84 artpol84 requested a review from rhc54 April 14, 2017 21:08
@hppritcha
Member

bot:lanl:retest
