Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The net_kernel:setopts call can deadlock the net_kernel process. #6129

Closed
villiros opened this issue Jul 4, 2022 · 4 comments · Fixed by #6216
Closed

The net_kernel:setopts call can deadlock the net_kernel process. #6129

villiros opened this issue Jul 4, 2022 · 4 comments · Fixed by #6216
Assignees
Labels
bug Issue is reported as a bug in progress team:VM Assigned to OTP team VM

Comments

@villiros
Copy link

villiros commented Jul 4, 2022

Describe the bug
The net_kernel:setopts(new, ...) can deadlock the net_kernel process permanently. This happens if the call is made concurrently with an incoming dist connection handshake.

To Reproduce
With the official docker container:

$ docker run --name otpbug -h otpbug -it erlang:25.0.2 /bin/bash
$ erl -sname foo
% Repeatedly call setopts(new)
> spawn_link(fun D() -> net_kernel:setopts(new, [{buffer, 10000}]), timer:sleep(10), D() end).

In another shell in the container:

$ docker exec -ti otpbug /bin/bash
% Repeatedly connect&disconnect with erl_call
$ while true; do /usr/local/lib/erlang/lib/erl_interface-5.3/bin/erl_call -n foo@otpbug -a 'erlang now []' ; echo; sleep 0.5; done

The erl_call loop quickly gets stuck and the node no longer accepts dist connections.

Affected versions
We've seen this in our code on R22, and I've tried 24 and 25 with same result. Bug probably was there since setopts was introduced?

Additional context

I think what happens is that net_kernel:setopts_new_1/1 is calling into pending connections. The incoming one is pending (doing handshake), but at the end of the handshake it calls into net_kernel (dist_util:mark_nodeup/2), so it's a deadlock.

For completeness, here's the process_info and backtrace of net_kernel:

(foo@otpbug)2> process_info(whereis(net_kernel)).
[{registered_name,net_kernel},
 {current_function,{net_kernel,call_owner,2}},
 {initial_call,{proc_lib,init_p,5}},
 {status,waiting},
 {message_queue_len,22},
 {links,[<0.62.0>,<0.160.0>,#Port<0.3>,<0.63.0>,<0.58.0>]},
 {dictionary,[{longnames,false},
              {'$initial_call',{net_kernel,init,1}},
              {'$ancestors',[net_sup,kernel_sup,<0.47.0>]}]},
 {trap_exit,true},
 {error_handler,error_handler},
 {priority,max},
 {group_leader,<0.46.0>},
 {total_heap_size,752},
 {heap_size,376},
 {stack_size,30},
 {reductions,53487},
 {garbage_collection,[{max_heap_size,#{error_logger => true,kill => true,size => 0}},
                      {min_bin_vheap_size,46422},
                      {min_heap_size,233},
                      {fullsweep_after,65535},
                      {minor_gcs,96}]},
 {suspending,[]}]
(foo@otpbug)3> {_, X} = process_info(whereis(net_kernel), backtrace), io:format("~s", [X]).
Program counter: 0x00007f274e2781d0 (net_kernel:call_owner/2 + 344)
y(0)     []
y(1)     []
y(2)     #Ref<0.2457213282.1092354051.122147>

0x00007f2727281eb8 Return addr 0x00007f274e071e9c (lists:foreach_1/2 + 148)
y(0)     []
y(1)     #Fun<net_kernel.9.28134678>

0x00007f2727281ed0 Return addr 0x00007f274e278a4c (net_kernel:setopts_new_1/1 + 684)

0x00007f2727281ed8 Return addr 0x00007f274e2696c4 (net_kernel:handle_call/3 + 1356)
y(0)     {state,'foo@otpbug',shortnames,{tick,<0.63.0>,60000,4},7000,sys_dist,#{<0.160.0>=>'c17@otpbug'},#{},#{},[{listen,#Port<0.3>,<0.62.0>,{net_address,{{0,0,0,0},40037},"otpbug",tcp,inet},inet_tcp_dist}],[],0,all,#{},net_sup}
y(1)     {<0.140.0>,#Ref<0.2457213282.1092354051.122143>}

0x00007f2727281ef0 Return addr 0x00007f274e1acb1c (gen_server:try_handle_call/4 + 244)
y(0)     []
y(1)     []
y(2)     Catch 0x00007f274e1acb70 (gen_server:try_handle_call/4 + 328)

0x00007f2727281f10 Return addr 0x00007f274e1ad028 (gen_server:handle_msg/6 + 200)
y(0)     []
y(1)     []
y(2)     []
y(3)     {setopts,new,[{buffer,10000}]}
y(4)     {<0.140.0>,#Ref<0.2457213282.1092354051.122143>}
y(5)     infinity
y(6)     net_kernel
y(7)     {state,'foo@otpbug',shortnames,{tick,<0.63.0>,60000,4},7000,sys_dist,#{<0.160.0>=>'c17@otpbug'},#{},#{},[{listen,#Port<0.3>,<0.62.0>,{net_address,{{0,0,0,0},40037},"otpbug",tcp,inet},inet_tcp_dist}],[],0,all,#{},net_sup}
y(8)     net_kernel
y(9)     <0.58.0>

0x00007f2727281f68 Return addr 0x00007f274e094740 (proc_lib:init_p_do_apply/3 + 192)
y(0)     []
y(1)     []
y(2)     Catch 0x00007f274e09475e (proc_lib:init_p_do_apply/3 + 222)

0x00007f2727281f88 Return addr 0x00007f274df94c30 (<terminate process normally>)

The incoming connection's backtrace:

Program counter: 0x00007f274e363dcc (dist_util:mark_nodeup/3 + 332)
y(0)     'c17@otpbug'
y(1)     []
y(2)     []
y(3)     <0.61.0>
y(4)     {hs_data,<0.61.0>,'c17@otpbug','foo@otpbug',#Port<0.31>,<0.161.0>,17297788828,[],undefined,17297788828,true,fun inet_tcp:send/2,fun inet_tcp:recv/3,#Fun<inet_tcp_dist.0.130765917>,#Fun<inet_tcp_dist.1.130765917>,#Fun<inet_tcp_dist.2.130765917>,#Fun<inet_tcp_dist.3.130765917>,#Fun<inet_tcp_dist.4.130765917>,fun inet_tcp_dist:getstat/1,normal,fun inet_tcp_dist:setopts/2,fun inet_tcp_dist:getopts/2,undefined,0,0,0,undefined,23251}

0x00007f2727266790 Return addr 0x00007f274e362e90 (dist_util:connection/1 + 464)
y(0)     {24,#Ref<0.2457213282.1092485123.120891>}
y(1)     []
y(2)     'c17@otpbug'
y(3)     hidden
y(4)     #Port<0.31>
y(5)     {hs_data,<0.61.0>,'c17@otpbug','foo@otpbug',#Port<0.31>,<0.161.0>,17297788828,[],undefined,17297788828,true,fun inet_tcp:send/2,fun inet_tcp:recv/3,#Fun<inet_tcp_dist.0.130765917>,#Fun<inet_tcp_dist.1.130765917>,#Fun<inet_tcp_dist.2.130765917>,#Fun<inet_tcp_dist.3.130765917>,#Fun<inet_tcp_dist.4.130765917>,fun inet_tcp_dist:getstat/1,normal,fun inet_tcp_dist:setopts/2,fun inet_tcp_dist:getopts/2,undefined,0,0,0,undefined,23251}

0x00007f27272667c8 Return addr 0x00007f274df94c30 (<terminate process normally>)
@villiros villiros added the bug Issue is reported as a bug label Jul 4, 2022
@rickard-green rickard-green added the team:VM Assigned to OTP team VM label Jul 6, 2022
@rickard-green rickard-green self-assigned this Jul 6, 2022
@rickard-green
Copy link
Contributor

Thanks for the detailed bug report! Yes, this bug has existed since the introduction of the setopts() functionality. I'll begin working on a PR fixing it,

rickard-green added a commit to rickard-green/otp that referenced this issue Aug 11, 2022
…-18198' into rickard/net_kernel-setopts-fix/24.3.4/erlangGH-6129/OTP-18198

* rickard/net_kernel-setopts-fix/23.3.4/erlangGH-6129/OTP-18198:
  [kernel] Fix deadlock caused by net_kernel:setopts(new, _)
rickard-green added a commit to rickard-green/otp that referenced this issue Aug 11, 2022
…-18198' into rickard/net_kernel-setopts-fix/25.0.3/erlangGH-6129/OTP-18198

* rickard/net_kernel-setopts-fix/24.3.4/erlangGH-6129/OTP-18198:
  [kernel] Fix deadlock caused by net_kernel:setopts(new, _)
@rickard-green
Copy link
Contributor

The PR #6216 should fix this issue (not very well tested yet, though). Unless something unexpected happens it will be part of the next patch.

@rickard-green rickard-green linked a pull request Aug 11, 2022 that will close this issue
rickard-green added a commit that referenced this issue Aug 15, 2022
…' into maint

* rickard/net_kernel-setopts-fix/25.0.3/GH-6129/OTP-18198:
  [kernel] Fix deadlock caused by net_kernel:setopts(new, _)
lthor pushed a commit that referenced this issue Aug 18, 2022
…' into maint-24

* rickard/net_kernel-setopts-fix/24.3.4/GH-6129/OTP-18198:
  [kernel] Fix deadlock caused by net_kernel:setopts(new, _)
rickard-green pushed a commit that referenced this issue Aug 18, 2022
…' into maint-25

* rickard/net_kernel-setopts-fix/25.0.3/GH-6129/OTP-18198:
  [kernel] Fix deadlock caused by net_kernel:setopts(new, _)
@rickard-green
Copy link
Contributor

Fixed in patches OTP 25.0.4 and OTP 24.3.4.3 which we released today.

@villiros
Copy link
Author

Awesome. Thank you very much :)

kikofernandez pushed a commit that referenced this issue Sep 5, 2022
…' into maint-23

* rickard/net_kernel-setopts-fix/23.3.4/GH-6129/OTP-18198:
  [kernel] Fix deadlock caused by net_kernel:setopts(new, _)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Issue is reported as a bug in progress team:VM Assigned to OTP team VM
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants