Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not emit "unexpected" message on shutdown #6810

Merged

Conversation

max-au
Copy link
Contributor

@max-au max-au commented Feb 7, 2023

During shutdown, net_kernel process that is linked to an acceptor, gracefully terminates, triggering
{EXIT, net_kernel, shutdown} message sent to the acceptor.

However acceptor does not expect it, and instead of a graceful shutdown logs a confusing warning.

During shutdown, net_kernel process that is linked to an
acceptor, gracefully terminates, triggering
`{EXIT, net_kernel, shutdown}` message sent to the acceptor.

However acceptor does not expect it, and instead of a graceful
shutdown logs a confusing warning.
@github-actions
Copy link
Contributor

github-actions bot commented Feb 7, 2023

CT Test Results

       2 files       64 suites   44m 12s ⏱️
   745 tests    712 ✔️   33 💤 0
3 537 runs  2 862 ✔️ 675 💤 0

Results for commit df0ef03.

♻️ This comment has been updated with latest results.

To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass.

See the TESTING and DEVELOPMENT HowTo guides for details about how to run test locally.

Artifacts

// Erlang/OTP Github Action Bot

@max-au
Copy link
Contributor Author

max-au commented Feb 7, 2023

See the discussion: https://erlangforums.com/t/tls-distribution-unexpected-message-exit-0-178-0-shutdown and original error rabbitmq/rabbitmq-server#7182

Note: this does not fix another issue with the supervision tree of the TLS dist. That second problem manifests like this:

2023-02-03 15:07:44.565913+00:00 [error] <0.2756.0>     supervisor: {<0.2756.0>,tls_dyn_connection_sup}
2023-02-03 15:07:44.565913+00:00 [error] <0.2756.0>     errorContext: child_terminated
2023-02-03 15:07:44.565913+00:00 [error] <0.2756.0>     reason: killed
2023-02-03 15:07:44.565913+00:00 [error] <0.2756.0>     offender: [{pid,<0.2757.0>},
2023-02-03 15:07:44.565913+00:00 [error] <0.2756.0>                {id,sender},
2023-02-03 15:07:44.565913+00:00 [error] <0.2756.0>                {mfargs,{tls_sender,start_link,undefined}},
2023-02-03 15:07:44.565913+00:00 [error] <0.2756.0>                {restart_type,temporary},
2023-02-03 15:07:44.565913+00:00 [error] <0.2756.0>                {significant,false},
2023-02-03 15:07:44.565913+00:00 [error] <0.2756.0>                {shutdown,5000},
2023-02-03 15:07:44.565913+00:00 [error] <0.2756.0>                {child_type,worker}]

This appears to be caused by net_kernel shutdown (e.g net:stop() is called for dynamic distribution). net_kernel is linked to a TLS connection receiver (gen_statem process), and not to the tls_dyn_connection_sup. I tracked it down to 0078ebf (when TLS was split into sender/receiver), and I believe the fix would be to link the supervisor process instead of the receiver.

Currently #sslsocket structure does not contain the supervisor pid. I can add a second commit that would use process_info(DistCtrl, parent) to fetch that in an obscure way, but before doing so, I'd be interested in what OTP team thinks (@RaimoNiskanen and @IngelaAndin ).

@RaimoNiskanen
Copy link
Contributor

Regarding df0ef03; I think it is spot on. It will collide with changes I have in 'master', but that is what it is.

Regarding the SSL distribution supervision tree vs. net_kernel and Kernel's net_sup tree, and how they should cooperate to avoid such an ugly and incorrect report during shutdown; we (I and @IngelaAndin) would have to take an internal discussion with the VM team e.g @rickard-green and maybe more to sort this out.

net_kernel does not care much for distribution implementation that has other supervisors. It kills the distribution process it is linked to. And SSL's supervision tree, I suppose, should to be taken down from the top in an orderly manner...

We will see

@bjorng bjorng added the team:PS Assigned to OTP team PS label Feb 8, 2023
@RaimoNiskanen RaimoNiskanen added testing currently being tested, tag is used by OTP internal CI and removed testing currently being tested, tag is used by OTP internal CI labels Feb 8, 2023
@RaimoNiskanen RaimoNiskanen merged commit 6daf35f into erlang:maint Feb 9, 2023
@RaimoNiskanen
Copy link
Contributor

Thank you for your pull request!

We will have to take a closer look later about what to do about the child in the SSL distribution supervision tree that is being killed during node shutdown...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
team:PS Assigned to OTP team PS
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants