erts_no_of_hidden_dist_entries=-1 when racing happens between erts_do_net_exits and abort_pending_connection #6247

zzydxm · 2022-08-25T23:16:46Z

Describe the bug
We occasionally see coredumps with backtraces like:

#0  0x00000000006139bf in erts_do_net_exits (dep=<optimized out>, reason=<optimized out>) at beam/dist.c:947
#1  0x000000000046c6fb in erts_continue_exit_process (p=0x17798b8) at beam/erl_process.c:13975
#2  0x00007f7c6a663fe0 in erts_beamasm:continue_exit/0 ()
#3  0x00007f7c61b347a8 in proc_lib:exit_p/3 () at proc_lib.erl:261
#4  0x00007f7c6a663f38 in erts_beamasm:normal_exit/0-CodeInfoPrologue ()

#0  0x0000000000523107 in unlink_carrier (crr=0x7fc1411a8ec0, cl=0x7fc0f385bee0) at beam/erl_alloc_util.c:1258
#1  destroy_carrier (allctr=0x7fc0f385bd80, blk=0x7fc1411a8ee0, busy_pcrr_pp=<optimized out>) at beam/erl_alloc_util.c:4305
#2  0x0000000000530e8e in do_erts_alcu_free (busy_pcrr_pp=0x0, p=0x7fc1411a8ee8, allctr=0x7fc0f385bd80, type=1250)
    at beam/erl_alloc_util.c:5953
#3  do_erts_alcu_free (busy_pcrr_pp=0x0, p=<optimized out>, allctr=0x7fc0f385bd80, type=1250) at beam/erl_alloc_util.c:5929
#4  erts_alcu_free_thr_spec (type=1250, extra=<optimized out>, p=<optimized out>) at beam/erl_alloc_util.c:5993
#5  0x0000000000613b7e in erts_do_net_exits (dep=<optimized out>, reason=<optimized out>) at beam/dist.c:950
#6  0x000000000046c6fb in erts_continue_exit_process (p=0x17b6750) at beam/erl_process.c:13975

The coredump happens on node shutdown when there is pending connections. In both case, we see that erts_no_of_hidden_dist_entries=-1. This means there are duplicated calls of erts_set_dist_entry_not_connected, causing wrong value of erts_no_of_hidden_dist_entries. (Even though there is rwlock for dist table, it is still possible that one operation is blocked and the other operation pass through. Then blocked operation decreased the value again.)

We don't know exactly why unlink_carrier of local variable "DistEntry** pending" would fail, or why beam/dist.c:947 abort_pending_connection(pending[i], pending[i]->connection_id, NULL) would fail, but the pending shutdown racing surely worth fixing.

To Reproduce
This is a rare racing, but logically it surely can happen.

Expected behavior
erts_no_of_hidden_dist_entries should never drop below 0

Affected versions
25.0.2

Additional information
We are running duel dist protocol, non-hidden node connection is using inet_drv, and hidden node connection is using inet_tls_dist

Also, on recent coredumps, we all have erts_no_of_pending_dist_entries = 1

The text was updated successfully, but these errors were encountered:

zzydxm · 2022-08-26T20:57:14Z

dist table is completely messed up

(gdb) set $cnt=0
(gdb) set $de=erts_pending_dist_entries
(gdb) while $de
 >    set $cnt=$cnt+1
 >    set $de=$de->next
 >end
(gdb) p $cnt
$29 = 8109
(gdb) p erts_no_of_pending_dist_entries
$30 = 1
(gdb) set $cnt=0
(gdb) set $de=erts_not_connected_dist_entries
(gdb) while $de
 >    set $cnt=$cnt+1
 >    set $de=$de->next
 >end
(gdb) p $cnt
$32 = 3818
(gdb) p erts_no_of_not_connected_dist_entries
$33 = 11930
(gdb) set $cnt=0
(gdb) set $de=erts_visible_dist_entries
(gdb) while $de
 >    set $cnt=$cnt+1
 >    set $de=$de->next
 >end
(gdb) p $cnt
$35 = 10809
(gdb) p erts_no_of_visible_dist_entries
$36 = 10809
(gdb) set $cnt=0
(gdb) set $de=erts_hidden_dist_entries
(gdb) while $de
 >    set $cnt=$cnt+1
 >    set $de=$de->next
 >end
(gdb) p $cnt
$38 = 0
(gdb) p erts_no_of_hidden_dist_entries
$39 = -1
(gdb)

zzydxm · 2022-08-26T21:12:34Z

Ah I think it just happened that dep->state == ERTS_DE_STATE_IDLE
So if dep->state == ERTS_DE_STATE_IDLE, we can just return without doing anything

sverker · 2022-08-29T22:15:04Z

I think I found the problem. PR #6258 contains a preliminary fix.

zzydxm · 2022-08-29T22:45:28Z

I think I found the problem. PR #6258 contains a preliminary fix.

Thanks a lot for the quick fix! #6258 seems fixed a legit bug but I'm not sure if it solves our problem. We would need several weeks to test it, because the problem currently happens approximately once per week in our entire cluster.

Is it still possible that net_kernel shutdown when there is a pending connection, and the pending connection is also shutdown because port closed before erts_do_net_exits call abort_pending_connection, so that erts_set_dist_entry_not_connected is called twice?

sverker · 2022-08-30T10:03:16Z

I still think this could explain your problem.

A terminating process (net_kernel) is market with ERTS_PSFLG_EXITING before erts_continue_process() is called where it detects F_DISTRIBUTION and calls erts_do_net_exits(). It may even yield in between and execute other things, like terminating ports.

That would allow (I think) for erts_internal:create_dist_channel to fail send msg to exiting net_kernel process, leave the DistEntry PENDING and getting erts_do_net_exits called on the port before it's called on net_kernel.

I will get a second opinion on the fix though...

sverker · 2022-08-30T12:08:50Z

I've updated the fix #6258 slightly. Would be good if you could try it.

sverker · 2022-08-30T12:38:39Z

I can't say that I understand exactly how this bug would explain the particular messed up state of the DistEntry lists and their counters.

In case there still is some other bug messing up the DistEntry lists and you don't want/can run debug emulator, a much more lightweight alternative could be to enable ASSERT's by changing them to ERTS_ASSERT in

erts_set_dist_entry_not_connected()
erts_set_dist_entry_pending()
erts_set_dist_entry_connected()

zzydxm · 2022-08-31T17:46:05Z

I still haven't been able to do a local reproduce, we will test this fix on our prod, but it may take few weeks to know if it works.

You patch currently does following things twice, is it expected?
erts_atomic32_read_bor_nob(&pp->state, ERTS_PORT_SFLG_DISTRIBUTION);
erts_prtsd_set(pp, ERTS_PRTSD_DIST_ENTRY, dep);
erts_prtsd_set(pp, ERTS_PRTSD_CONN_ID, (void*)(UWord)dep->connection_id);

zzydxm · 2022-08-31T17:51:06Z

If in any case that one dep in "pending" list happened that "dep->state == ERTS_DE_STATE_IDLE and dep is currently in erts_not_connected_dist_entries list". I think it will result in follow things:

head = &erts_hidden_dist_entries
erts_no_of_pending_dist_entries end up with 1 when erts_do_net_exits is done
erts_no_of_hidden_dist_entries end up with -1 when erts_do_net_exits is done
*head != dep so erts_hidden_dist_entries won't be changed and eventually be empty

dep will be moved to the head of the erts_not_connected_dist_entries
What I don't understand is why the entries after dep previously will be moved to erts_pending_dist_entries,

sverker · 2022-08-31T18:35:55Z

You patch currently does following things twice, is it expected?

No. If sending the message to net_kernel fails, it resets the values:
if (set_res == 0) {
erts_atomic32_read_band_nob(&pp->state, ~ERTS_PORT_SFLG_DISTRIBUTION);
erts_prtsd_set(pp, ERTS_PRTSD_DIST_ENTRY, NULL);
erts_prtsd_set(pp, ERTS_PRTSD_CONN_ID, NULL);
goto badarg;
}

sverker · 2022-08-31T19:44:59Z

If in any case that one dep in "pending" list happened that "dep->state == ERTS_DE_STATE_IDLE and dep is currently in erts_not_connected_dist_entries list". I think it will result in follow things:

Sorry, I don't follow.
The bug I fixed would make erts_do_net_exits() change dep->state from PENDING to EXITING here:

if (dep->state == ERTS_DE_STATE_EXITING) {
    ASSERT(erts_atomic32_read_nob(&dep->qflgs) & ERTS_DE_QFLG_EXIT);
}
else {
    dep->state = ERTS_DE_STATE_EXITING;

This is not allowed, only CONNECTED can be set to EXITING.

It will then call erts_set_dist_entry_not_connected() which will treat the dep as if it is not in pending list, but rather in either visible or hidden lists. If dep is first in list, it will mess up the lists and it gets really complicated (for me at least) to figure out how subsequent list operations can possibly further mess it up the lists and if what you saw is a possible outcome.

zzydxm · 2022-08-31T21:05:22Z

What I would imagine the steps that triggered the issue would be:

A dist connection is in pending state
net_kernel is shutting down and triggered erts_do_net_exits, which store the dist entry dep in the pending variable
The inet_drv port of the dist connection closed due to some reason (e.g. remote node closed the connection) and also triggered erts_do_net_exits, which eventually did erts_set_dist_entry_not_connected and set dep->state = ERTS_DE_STATE_IDLE, dep -> dflag = 0
erts_do_net_exits for net_kernel calls abort_pending_connection on dep
erts_set_dist_entry_not_connected happens on a dep where dep->state = ERTS_DE_STATE_IDLE, which is unexpected
dist table is messed up and erts_no_of_hidden_dist_entries is set to -1 because when dep -> dflag = 0, head variable in erts_set_dist_entry_not_connected will be set to erts_hidden_dist_entries
Is this possible?

I think you fix solved that

A dist connection is in pending state
net_kernel is shutting down and triggered erts_do_net_exits, which store the dist entry in the "pending" variable
dist_util.erl triggers erlang:setnode and then create_dist_channel but it failed and goes to exiting state
dep->state was set to ERTS_DE_STATE_IDLE
erts_set_dist_entry_not_connected happens on a dep where dep->state = ERTS_DE_STATE_IDLE
dist table is messed up and erts_no_of_hidden_dist_entries is set to -1

I think both case could happen and you fix is solving one of them, am I understanding right?

sverker · 2022-09-01T10:30:43Z

A DistEntry (including its state) is protected by its read-write lock:

erts_de_rwlock(dep);
erts_de_rwunlock(dep);

and the DistEntry lists and counters are further protected by one single read-write lock:

erts_rwmtx_rwlock(&erts_dist_table_rwmtx);
erts_rwmtx_rwunlock(&erts_dist_table_rwmtx);

You seem to think that erts_set_dist_entry_not_connected() gets called with dep->state == IDLE. I don't think that was/is possible.

erts_set_dist_entry_not_connected() is called in two places:

In erts_abort_pending_connection_rwunlock() where it only gets called if dep->state == PENDING
In erts_do_net_exits() if found via a port or process. A DistEntry is only hooked into a port with erts_prtsd_set(prt, ERTS_PRTSD_DIST_ENTRY, dep) by erts_internal_create_dist_channel_3() were we change from PENDING to CONNECTED. And there is a similar case when hooked into a controlling process.

My bug scenario was

dep in PENDING state.
Node start shutting down. Start terminate net_kernel marking it as EXITING.
Other scheduler runs erts_internal_create_dist_channel_3
hooks PENDING dep into port
but then fails to send msg to net_kernel as it's EXITING.
The bug leaves PENDING dep while still hooked into port.
erts_do_net_exit() is called for the port with PENDING dep (as the node is shutting down).
dep->state is changed from PENDING to EXITING (which is wrong as it's still in pending list)
erts_set_dist_entry_not_connected() is called which sees EXITING and assumes dep is in visible or hidden lists and messes up counters and the lists.

zzydxm · 2022-09-01T16:18:04Z

erts_dist_table_rwmtx is locked during setting pending list, but the lock is released before running abort_pending_connection, during that time, dep is also not locked, so state change could happen on dep
So I wound imagine dep can go to CONNECT and then IDLE during that period, (EXITING also introduces problem)

sverker · 2022-09-13T13:38:01Z

I think #6258 will fix this. Closing issue.

zzydxm added the bug Issue is reported as a bug label Aug 25, 2022

zzydxm mentioned this issue Aug 26, 2022

Coredump during beam shutdown #6057

Closed

rickard-green added the team:VM Assigned to OTP team VM label Aug 29, 2022

rickard-green assigned sverker Aug 29, 2022

sverker mentioned this issue Aug 29, 2022

Fix DistEntry corruption at node shutdown #6258

Merged

rickard-green added the in progress label Sep 5, 2022

sverker closed this as completed Sep 13, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

erts_no_of_hidden_dist_entries=-1 when racing happens between erts_do_net_exits and abort_pending_connection #6247

erts_no_of_hidden_dist_entries=-1 when racing happens between erts_do_net_exits and abort_pending_connection #6247

zzydxm commented Aug 25, 2022 •

edited

Loading

zzydxm commented Aug 26, 2022 •

edited

Loading

zzydxm commented Aug 26, 2022

sverker commented Aug 29, 2022

zzydxm commented Aug 29, 2022 •

edited

Loading

sverker commented Aug 30, 2022

sverker commented Aug 30, 2022

sverker commented Aug 30, 2022 •

edited

Loading

zzydxm commented Aug 31, 2022

zzydxm commented Aug 31, 2022

sverker commented Aug 31, 2022 •

edited

Loading

sverker commented Aug 31, 2022

zzydxm commented Aug 31, 2022 •

edited

Loading

sverker commented Sep 1, 2022

zzydxm commented Sep 1, 2022

sverker commented Sep 13, 2022

erts_no_of_hidden_dist_entries=-1 when racing happens between erts_do_net_exits and abort_pending_connection #6247

erts_no_of_hidden_dist_entries=-1 when racing happens between erts_do_net_exits and abort_pending_connection #6247

Comments

zzydxm commented Aug 25, 2022 • edited Loading

zzydxm commented Aug 26, 2022 • edited Loading

zzydxm commented Aug 26, 2022

sverker commented Aug 29, 2022

zzydxm commented Aug 29, 2022 • edited Loading

sverker commented Aug 30, 2022

sverker commented Aug 30, 2022

sverker commented Aug 30, 2022 • edited Loading

zzydxm commented Aug 31, 2022

zzydxm commented Aug 31, 2022

sverker commented Aug 31, 2022 • edited Loading

sverker commented Aug 31, 2022

zzydxm commented Aug 31, 2022 • edited Loading

sverker commented Sep 1, 2022

zzydxm commented Sep 1, 2022

sverker commented Sep 13, 2022

zzydxm commented Aug 25, 2022 •

edited

Loading

zzydxm commented Aug 26, 2022 •

edited

Loading

zzydxm commented Aug 29, 2022 •

edited

Loading

sverker commented Aug 30, 2022 •

edited

Loading

sverker commented Aug 31, 2022 •

edited

Loading

zzydxm commented Aug 31, 2022 •

edited

Loading