Fix sys:get_status/1 noproc crash during release installation #822

Closed

Contributor

@RJ RJ commented Sep 1, 2015

When release_handler_1:get_supervised_procs/0 does a recursive walk of
the supervision tree, it calls sys:get_status/1 on supervisors, to check
if they are suspended or running.

This fixes a race condition where a list of supervisor pids is gathered,
one of them (legitimately) exits before release_handler can examine it,
and sys:get_status/1 is then called with a dead pid, causing an exit(noproc).

See: http://erlang.org/pipermail/erlang-questions/2015-August/085712.html

(To recreate this problem for testing, I added a timer:sleep into the
release_handler_1 code and killed a supervisor while
get_supervised_procs was running.)
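
As an illustration of the race described above, here is a minimal sketch of guarding a sys:get_status/1 call against a process that has already exited. It is not the code from this patch: the module name `status_probe_sketch` and the functions `safe_state/1` and `demo/0` are hypothetical, and the sketch assumes the exit reason has the shape `{noproc, {sys, get_status, [Pid]}}` (a bare `noproc` is matched as well, to be safe).

```erlang
-module(status_probe_sketch).
-export([safe_state/1, demo/0]).

%% Hypothetical helper (not the patched release_handler_1 code).
%% Returns the sys state of Pid (running | suspended), or the atom
%% noproc if the process is gone by the time it is queried.
safe_state(Pid) ->
    try sys:get_status(Pid) of
        {status, _Pid, {module, _Mod},
         [_PDict, SysState, _Parent, _Dbg, _Misc]} ->
            SysState
    catch
        %% Assumed shape: sys wraps the reason together with the MFA.
        exit:{noproc, _MFA} -> noproc;
        %% Also accept a bare noproc reason.
        exit:noproc -> noproc
    end.

%% Queries one live supervisor (kernel_sup) and one pid that is
%% known to be dead, to exercise both branches.
demo() ->
    Live = whereis(kernel_sup),
    Dead = spawn(fun() -> ok end),
    Ref = erlang:monitor(process, Dead),
    %% Wait for the 'DOWN' message so the pid is certainly dead.
    receive {'DOWN', Ref, process, Dead, _Reason} -> ok end,
    {safe_state(Live), safe_state(Dead)}.
```

Only `noproc` is handled here. A process that dies while the call is in flight can produce other exit reasons, which this sketch deliberately lets propagate.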

@RJ
Contributor Author

RJ commented Sep 16, 2015

For the record, I deployed this change and have successfully done upgrades with it.

I have yet to hit the condition where it would have crashed during normal upgrades.

@OTP-Maintainer

The summary line of the commit message is too long and/or ends with a "."
Make sure the whole message follows the guidelines here: https://github.com/erlang/otp/wiki/Writing-good-commit-messages.

Bad message: Fix sys:get_status/1 noproc crash during release installation


I am a script, I am not human


@RJ RJ force-pushed the release-handler-noproc-fix branch from bba9065 to 1e390ab September 24, 2015 12:48
@OTP-Maintainer

Patch has passed first testings and has been assigned to be reviewed


I am a script, I am not human


@sirihansen
Contributor

Looks good. I did a minor fix to the text in the warning message, and added the PR to our nightly tests. Thanks!

@sirihansen
Contributor

Test ok for five nights. Including PR in OTP-18.3.

@sirihansen sirihansen closed this Feb 3, 2016