ranch_listener_sup fails to restart ranch_listener #29

Closed
jinnipark opened this Issue Jan 24, 2013 · 10 comments

@jinnipark

I tried chaos_monkey on ranch.
Eventually chaos_monkey killed ranch_listener.
I then got a supervisor report from ranch_listener_sup, followed by a crash report saying:

{exit, {{badmatch,false},
        [{ranch_server,insert_listener,2,
                              [{file,"src/ranch_server.erl"},{line,57},

I guess the line true = ets:insert_new(...) in ranch_server:insert_listener/2 could be changed to true = ets:insert(...), or a ranch_server:remove_listener (or similar) would have to be defined and called from ranch_listener:terminate/2.
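To illustrate the difference between the two calls, here is a minimal sketch that does not use Ranch at all. The module name, table name, and key (insert_new_demo, demo_listeners, my_listener) are made up for the example; only the ets calls are real. It assumes the stale entry of a killed listener is still in the table when the restarted one registers.

```erlang
-module(insert_new_demo).
-export([run/0]).

%% Hypothetical stand-in for the table that ranch_server owns; the table
%% name and key shape are assumptions for this sketch, not Ranch's own.
run() ->
    Tab = ets:new(demo_listeners, [set, public]),
    OldPid = spawn(fun() -> receive stop -> ok end end),
    NewPid = spawn(fun() -> receive stop -> ok end end),
    %% The first registration succeeds because the key is not present yet.
    true = ets:insert_new(Tab, {my_listener, OldPid}),
    %% A kill signal cannot be trapped, so no terminate/2 cleanup runs
    %% and the old entry stays in the table.
    exit(OldPid, kill),
    %% insert_new/2 refuses to overwrite an existing key ...
    First = ets:insert_new(Tab, {my_listener, NewPid}),
    %% ... while insert/2 replaces the stale entry.
    Second = ets:insert(Tab, {my_listener, NewPid}),
    [{my_listener, Stored}] = ets:lookup(Tab, my_listener),
    NewPid ! stop,
    ets:delete(Tab),
    {First, Second, Stored =:= NewPid}.
```

Calling insert_new_demo:run() returns {false, true, true}: the second insert_new/2 fails on the stale key, which is the same badmatch pattern as in the crash report, while insert/2 overwrites it.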

@essen
Nine Nines member

Can't rely on terminate if the process receives a kill signal.

Can you provide a test for this?

@jinnipark

exit(ranch_server:lookup_listener(Name), kill).

@essen

I meant a common_test case for the issue you describe.

@jinnipark

I found it not with common_test but with chaos_monkey, which kills processes at random.

@jinnipark

Do you want me to commit something to your repo?

@essen

Yes, a common_test case that crashes it and checks that Ranch is still up a second later. There are already similar tests, the supervisor_* ones. Then we can fix it and make sure it doesn't appear again.

@jinnipark

I doubt that you can find it via common_test, because it doesn't always happen. It seems to be a timing issue between ranch_listener_sup and ranch_server, as both are monitoring ranch_listener. What if the supervisor tries to restart the listener before ranch_server handles the 'DOWN' event? If I had to write the ct case, though, it might look like the one below.

exit(ranch_server:lookup_listener(test_listener), kill),
timer:sleep(1000),
case process_info(ranch_server:lookup_listener(test_listener)) of
  undefined -> ct:fail(listener_restart);
  _ -> ok
end.
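
The ordering I suspect can be forced deterministically outside of Ranch. The following is only a sketch under assumptions: registry/2 is a hypothetical stand-in for ranch_server (it monitors a process and deletes its entry on 'DOWN'), and the pause/resume messages are invented to simulate a registry that has not yet handled 'DOWN' when the restart happens.

```erlang
-module(down_race_demo).
-export([run/0]).

%% Hypothetical registry standing in for ranch_server. It removes the
%% entry of a monitored process once it handles the 'DOWN' message.
registry(Tab, Notify) ->
    receive
        {monitor, Pid} ->
            erlang:monitor(process, Pid),
            registry(Tab, Notify);
        {pause, From} ->
            %% Simulate a busy registry: nothing is handled until resume.
            From ! paused,
            receive resume -> ok end,
            registry(Tab, Notify);
        {'DOWN', _Ref, process, Pid, _Reason} ->
            ets:match_delete(Tab, {'_', Pid}),
            Notify ! {cleaned, Pid},
            registry(Tab, Notify);
        stop ->
            ok
    end.

run() ->
    Tab = ets:new(demo_listeners, [set, public]),
    Self = self(),
    Registry = spawn(fun() -> registry(Tab, Self) end),
    OldPid = spawn(fun() -> receive stop -> ok end end),
    true = ets:insert_new(Tab, {my_listener, OldPid}),
    Registry ! {monitor, OldPid},
    Registry ! {pause, Self},
    %% Once paused arrives, the monitor is set up and the registry is blocked.
    receive paused -> ok end,
    exit(OldPid, kill),
    %% The "supervisor" restarts the listener before 'DOWN' is handled.
    NewPid = spawn(fun() -> receive stop -> ok end end),
    Early = ets:insert_new(Tab, {my_listener, NewPid}),
    %% Let the registry catch up, then register again.
    Registry ! resume,
    receive {cleaned, OldPid} -> ok end,
    Late = ets:insert_new(Tab, {my_listener, NewPid}),
    NewPid ! stop,
    Registry ! stop,
    ets:delete(Tab),
    {Early, Late}.
```

down_race_demo:run() returns {false, true}: registration fails while the stale entry is still there and succeeds once the 'DOWN' message has been handled. Whether this is exactly what happens inside Ranch is my guess, not something I have verified.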

@essen

You need to determine the conditions under which it happens and reproduce those conditions in a test case so it can be fixed. I can't really look blindly for a bug I have never observed. :)

@jinnipark

This is how I reproduce the error. I'm quite sure you can do the same.
Starting from the ranch home directory:

  1. cd examples/tcp_echo
  2. edit rebar.config to insert {chaos_monkey, ".*", {git, "git://github.com/dLuna/chaos_monkey.git"}} in deps
  3. rebar get-deps; rebar compile
  4. ./start.sh
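
For step 2, the deps section of rebar.config might end up looking roughly like the fragment below. This is a sketch only: the chaos_monkey tuple is the one given above, and whatever entries the file already contains should be kept as they are.

```erlang
%% rebar.config (fragment). Keep the entries that are already in deps
%% and add the chaos_monkey tuple to the list.
{deps, [
    {chaos_monkey, ".*", {git, "git://github.com/dLuna/chaos_monkey.git"}}
]}.
```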

Now, in the Erlang shell:

1> application:which_applications().
[{tcp_echo,"Ranch TCP Echo example.","1"},
 {ranch,"Socket acceptor pool for TCP protocols.","0.6.1"},
 {stdlib,"ERTS  CXC 138 10","1.18.1"},
 {kernel,"ERTS  CXC 138 10","2.15.1"}]
2> application:start(pman).
ok
3> application:start(chaos_monkey).
ok
4> chaos_monkey:on([{apps,[ranch]}]).
{ok,started}
...
<0.55.0> chaos_monkey:handle_info/2 #130 Killed {<0.73.0>,ranch, im_killing_you}
<0.55.0> chaos_monkey:handle_info/2 #130 Killed {<0.71.0>,ranch, im_killing_you}
<0.55.0> chaos_monkey:handle_info/2 #130 Killed {<0.42.0>,ranch, im_killing_you}

=INFO REPORT==== 30-Jan-2013::14:59:41 ===
    application: ranch
    exited: shutdown
    type: temporary

5> application:which_applications().
[{chaos_monkey,"A Monkey that Spreads Chaos","007847c"},
 {pman,"pman The Process Manager","2.7.1.2"},
 {tcp_echo,"Ranch TCP Echo example.","1"},
 {stdlib,"ERTS  CXC 138 10","1.18.1"},
 {kernel,"ERTS  CXC 138 10","2.15.1"}]

As you can see, chaos_monkey killed ranch, and the process killed just before the application shut down was ranch_listener. Note that chaos_monkey never kills a top-level supervisor.

@essen

We have a PR open for this, so I'm closing this ticket. Thanks!

@essen closed this Apr 2, 2013