Skip to content

Fix for broken stop_child #12

Merged
merged 7 commits into from Jun 10, 2013

2 participants

@davidw
davidw commented Jun 7, 2013

There are three important bits here:

  • unsigned types in ei++.h make date calculations actually work with negative numbers.

  • check_children reuses stop_child to end processes that have gone on beyond their deadline.

  • the internal kill function ensures that kill(-1, ...) will not be called, which is a fairly bad thing to have happen.

davidw added some commits Jun 7, 2013
@davidw
davidw commented Jun 7, 2013

Hrm... pull request has improved things, but not all is right. When I do application:stop(exec) it does not clean up properly.

@davidw
davidw commented Jun 7, 2013

Thinking out load: terminate() should wait on a message from the c++ app...

@davidw
davidw commented Jun 7, 2013

https://github.com/davidw/erlexec/tree/application_stop_fix

This branch has a few more fixes that seem to stop things cleanly when we do application:stop(exec). One bit that I was unsure of is where I just put a sleep(1) - maybe that's not the right way of doing things, but it seems to work for me.

Here's the motivation behind a lot of this work:

I'm managing some external processes, and init:stop gets called, which in turn stops exec, but some of the applications that get stopped catch SIGTERM and take a few seconds to shut down (or have issues and require SIGKILL). When those processes are shutting down, they'd still like to log stuff which they do via Erlang (they're really open_ports that are managed by exec). So it'd be nice if Erlang and the logger application would wait until exec has finished shutting down, which this code seems to accomplish.

I will be testing it more on Monday...

@saleyn
Owner
saleyn commented Jun 9, 2013

This pull request looks good except for fprintf() calls which are terminated by "\n" rather than "\r\n". The later is needed so that when the port program prints debug info it shows up properly in console window (otherwise new lines are continued without carriage return on some platforms).

@saleyn
Owner
saleyn commented Jun 9, 2013

Also regarding the application_stop_fix branch - the unconditional use of sleep(1) is a problem in case when there is a stuck process that can't be killed for some reason. Say you called a manage(OsPid) on a pid that has different user's ownership, and can't be killed. In that case the port program would get stuck. Prior solution relied on a deadline given to exec to terminate. I believe the original approach is better. Perhaps the default deadline needs to be increased?

@davidw
davidw commented Jun 10, 2013

Ok, this one is fixed up. I will open a separate pull request for the other branch so we can hash out the details there.

@davidw davidw referenced this pull request Jun 10, 2013
Merged

Application stop fix #13

@saleyn saleyn merged commit 8f51c70 into saleyn:master Jun 10, 2013

1 check passed

Details default The Travis CI build passed
@saleyn
Owner
saleyn commented Jun 10, 2013

Thanks for your contribution!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.