Fix for broken stop_child #12

Merged
merged 7 commits into from Jun 10, 2013

Conversation

Projects
None yet
2 participants
@davidw
Contributor

davidw commented Jun 7, 2013

There are three important bits here:

  • unsigned types in ei++.h make date calculations actually work with negative numbers.
  • check_children reuses stop_child to end processes that have gone on beyond their deadline.
  • the internal kill function ensures that kill(-1, ...) will not be called, which is a fairly bad thing to have happen.

davidw added some commits Jun 7, 2013

Use our own kill function in order to avoid kill(-1), and rework kill…
…ing in check_children.

check_children now utilizes stop_child in order to kill child
processes in order to reuse the same logic used elsewhere.
@davidw

This comment has been minimized.

Show comment
Hide comment
@davidw

davidw Jun 7, 2013

Contributor

Hrm... pull request has improved things, but not all is right. When I do application:stop(exec) it does not clean up properly.

Contributor

davidw commented Jun 7, 2013

Hrm... pull request has improved things, but not all is right. When I do application:stop(exec) it does not clean up properly.

@davidw

This comment has been minimized.

Show comment
Hide comment
@davidw

davidw Jun 7, 2013

Contributor

Thinking out load: terminate() should wait on a message from the c++ app...

Contributor

davidw commented Jun 7, 2013

Thinking out load: terminate() should wait on a message from the c++ app...

@davidw

This comment has been minimized.

Show comment
Hide comment
@davidw

davidw Jun 7, 2013

Contributor

https://github.com/davidw/erlexec/tree/application_stop_fix

This branch has a few more fixes that seem to stop things cleanly when we do application:stop(exec). One bit that I was unsure of is where I just put a sleep(1) - maybe that's not the right way of doing things, but it seems to work for me.

Here's the motivation behind a lot of this work:

I'm managing some external processes, and init:stop gets called, which in turn stops exec, but some of the applications that get stopped catch SIGTERM and take a few seconds to shut down (or have issues and require SIGKILL). When those processes are shutting down, they'd still like to log stuff which they do via Erlang (they're really open_ports that are managed by exec). So it'd be nice if Erlang and the logger application would wait until exec has finished shutting down, which this code seems to accomplish.

I will be testing it more on Monday...

Contributor

davidw commented Jun 7, 2013

https://github.com/davidw/erlexec/tree/application_stop_fix

This branch has a few more fixes that seem to stop things cleanly when we do application:stop(exec). One bit that I was unsure of is where I just put a sleep(1) - maybe that's not the right way of doing things, but it seems to work for me.

Here's the motivation behind a lot of this work:

I'm managing some external processes, and init:stop gets called, which in turn stops exec, but some of the applications that get stopped catch SIGTERM and take a few seconds to shut down (or have issues and require SIGKILL). When those processes are shutting down, they'd still like to log stuff which they do via Erlang (they're really open_ports that are managed by exec). So it'd be nice if Erlang and the logger application would wait until exec has finished shutting down, which this code seems to accomplish.

I will be testing it more on Monday...

@saleyn

This comment has been minimized.

Show comment
Hide comment
@saleyn

saleyn Jun 9, 2013

Owner

This pull request looks good except for fprintf() calls which are terminated by "\n" rather than "\r\n". The later is needed so that when the port program prints debug info it shows up properly in console window (otherwise new lines are continued without carriage return on some platforms).

Owner

saleyn commented Jun 9, 2013

This pull request looks good except for fprintf() calls which are terminated by "\n" rather than "\r\n". The later is needed so that when the port program prints debug info it shows up properly in console window (otherwise new lines are continued without carriage return on some platforms).

@saleyn

This comment has been minimized.

Show comment
Hide comment
@saleyn

saleyn Jun 9, 2013

Owner

Also regarding the application_stop_fix branch - the unconditional use of sleep(1) is a problem in case when there is a stuck process that can't be killed for some reason. Say you called a manage(OsPid) on a pid that has different user's ownership, and can't be killed. In that case the port program would get stuck. Prior solution relied on a deadline given to exec to terminate. I believe the original approach is better. Perhaps the default deadline needs to be increased?

Owner

saleyn commented Jun 9, 2013

Also regarding the application_stop_fix branch - the unconditional use of sleep(1) is a problem in case when there is a stuck process that can't be killed for some reason. Say you called a manage(OsPid) on a pid that has different user's ownership, and can't be killed. In that case the port program would get stuck. Prior solution relied on a deadline given to exec to terminate. I believe the original approach is better. Perhaps the default deadline needs to be increased?

@davidw

This comment has been minimized.

Show comment
Hide comment
@davidw

davidw Jun 10, 2013

Contributor

Ok, this one is fixed up. I will open a separate pull request for the other branch so we can hash out the details there.

Contributor

davidw commented Jun 10, 2013

Ok, this one is fixed up. I will open a separate pull request for the other branch so we can hash out the details there.

@davidw davidw referenced this pull request Jun 10, 2013

Merged

Application stop fix #13

saleyn added a commit that referenced this pull request Jun 10, 2013

@saleyn saleyn merged commit 8f51c70 into saleyn:master Jun 10, 2013

1 check passed

default The Travis CI build passed
Details
@saleyn

This comment has been minimized.

Show comment
Hide comment
@saleyn

saleyn Jun 10, 2013

Owner

Thanks for your contribution!

Owner

saleyn commented Jun 10, 2013

Thanks for your contribution!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment