Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure that we always send a SIGTERM prior to SIGKILL to give child processes a chance to cleanup #2528

Merged
merged 1 commit into from Dec 7, 2016

Conversation

rhc54
Copy link
Contributor

@rhc54 rhc54 commented Dec 6, 2016

Thanks to Noel Rycroft for the report

Signed-off-by: Ralph Castain rhc@open-mpi.org

…rocesses a chance to cleanup

Thanks to Noel Rycroft for the report

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
@rhc54
Copy link
Contributor Author

rhc54 commented Dec 6, 2016

Replaces #2498

@rhc54
Copy link
Contributor Author

rhc54 commented Dec 6, 2016

@noelrycroft Had to replace the earlier PR due to a repo issue.

@noelrycroft
Copy link

@rhc54, @jsquyres sent me another tarball.. this branch looks good to me. I can verify that the SIGTERM is being propagated to the MPI tasks as expected. Thanks for fixing this issue so quickly..

./sleeptest.sh
[24018] : Installing signal handler for signal : INT
[24018] : Installing signal handler for signal : TERM
[24018] : Installing signal handler for signal : 0
[24018] : Parallel Sleep Test MPI
[24018] : /users/xxx/data/RUN/377/openmpi/openmpiinstall/gitclone2/bin/mpirun --prefix /users/xxx/data/RUN/377/openmpi/openmpiinstall/gitclone2 -np 2 sleep.sh 30 &
[24024] : Installing signal handler for signal : INT
[24024] : Installing signal handler for signal : TERM
[24024] : Installing signal handler for signal : 0
[24024] : ./sleep.openmpi 30
[24025] : Installing signal handler for signal : INT
[24025] : Installing signal handler for signal : TERM
[24025] : Installing signal handler for signal : 0
[24025] : ./sleep.openmpi 30
Hello world! I'm 0 out of 2 on dougie.lebanon.xxx
My rank = 0 PID = 24026
0 sleeping for 1 seconds
Hello world! I'm 1 out of 2 on dougie.lebanon.xxx
My rank = 1 PID = 24027
1 sleeping for 1 seconds
2 sleeping for 1 seconds
3 sleeping for 1 seconds
4 sleeping for 1 seconds
5 sleeping for 1 seconds
6 sleeping for 1 seconds
7 sleeping for 1 seconds
8 sleeping for 1 seconds
9 sleeping for 1 seconds
10 sleeping for 1 seconds
11 sleeping for 1 seconds
12 sleeping for 1 seconds
# top level sleeptest.sh receives SIGTERM
[24018] : sleeptest.sh Terminated
[24018] : Killing child : 24019
# explicitly propagates SIGTERM to mpirun
[24018] : kill -TERM 24019
13 sleeping for 1 seconds
# 1st MPI task receives SIGTERM and propagates to child
[24024] : sleep.sh Terminated
[24024] : Killing child : 24026
[24024] : kill -TERM 24026
[24024] : Normal Exit : 143
# 2nd MPI task receives SIGTERM and propagates to child
[24025] : sleep.sh Terminated
[24025] : Killing child : 24027
[24025] : kill -TERM 24027
[24025] : Normal Exit : 143
# top level sleeptest.sh gets exit code from mpirun and exits..
[24018] : Normal Exit : 1

@jsquyres
Copy link
Member

jsquyres commented Dec 7, 2016

@noelrycroft Thanks for testing so quickly!

Copy link
Member

@jsquyres jsquyres left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed by testing / functionality.

@jsquyres
Copy link
Member

jsquyres commented Dec 7, 2016

@hppritcha and I discussed this the other day; I'm confident that he's ok with me merging it. I'd like to get this merged so that it can hit MTT.

@jsquyres
Copy link
Member

jsquyres commented Dec 7, 2016

@hppritcha It looks like LANL Jenkins is wedged.

@rhc54 Is there a master version of this commit?

@rhc54
Copy link
Contributor Author

rhc54 commented Dec 7, 2016

85a6349

@hppritcha
Copy link
Member

bot:lanl:retest

@hppritcha
Copy link
Member

@jsquyres is this ready to merge?

@jsquyres jsquyres merged commit d0b97d7 into open-mpi:v2.0.x Dec 7, 2016
@rhc54 rhc54 deleted the cmr20x/signals branch February 13, 2017 23:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants