Coredump during handle_alias_message #7996

Open
itssundeep opened this issue Jan 4, 2024 · 2 comments
@itssundeep
Contributor

Describe the bug
We see occasional core dumps during message cleanup in handle_alias_message. It looks like a race condition that results in a double free.

To Reproduce
Happens occasionally.

Expected behavior
No coredumps

Affected versions
26.0

Additional context

Coredump bt

* thread #1, name = 'beam.frmptr.smp', stop reason = signal SIGSEGV: address not mapped to object
    frame #0: 0x0000000000885077 beam.frmptr.smp`erts_deref_dist_entry [inlined] ethr_native_atomic64_add_return_mb(incr=-1, var=0xffffffffffffffd8) at atomic.h:240:5
    frame #1: 0x0000000000885070 beam.frmptr.smp`erts_deref_dist_entry [inlined] ethr_atomic_add_read(val=-1, var=0xffffffffffffffd8) at ethr_atomics.h:4219:25
    frame #2: 0x0000000000885070 beam.frmptr.smp`erts_deref_dist_entry [inlined] ethr_atomic_dec_read(var=0xffffffffffffffd8) at ethr_atomics.h:4806:11
    frame #3: 0x0000000000885070 beam.frmptr.smp`erts_deref_dist_entry [inlined] erts_refc_dectest(min_val=0, refcp=0xffffffffffffffd8) at sys.h:1032:23
    frame #4: 0x0000000000885070 beam.frmptr.smp`erts_deref_dist_entry [inlined] erts_bin_release(bp=0xffffffffffffffd0) at erl_binary.h:508:9
    frame #5: 0x0000000000885070 beam.frmptr.smp`erts_deref_dist_entry [inlined] de_refc_dec(min=0, dep=0x0000000000000000) at erl_node_tables.c:98:5
    frame #6: 0x0000000000885070 beam.frmptr.smp`erts_deref_dist_entry(dep=0x0000000000000000) at erl_node_tables.c:120:5
    frame #7: 0x00000000007e64bd beam.frmptr.smp`erts_free_dist_ext_copy(edep=0x00007f3f6d322c50) at external.c:890:5
    frame #8: 0x00000000007a2ced beam.frmptr.smp`erts_cleanup_messages at erl_message.c:227:13
  * frame #9: 0x00000000007a2cd8 beam.frmptr.smp`erts_cleanup_messages(msgp=0x00007f3f6d322bf0) at erl_message.c:251:2
    frame #10: 0x00000000007b465e beam.frmptr.smp`handle_alias_message(c_p=0x00007f3f71377640, sig=0x00007f3f6d322bf0, next_nm_sig=0x00007f3f713777b0) at erl_proc_sig_queue.c:5305:9
    frame #11: 0x00000000007b57a2 beam.frmptr.smp`erts_proc_sig_handle_incoming(c_p=0x00007f3f71377640, statep=0x00007f3f113faafc, redsp=0x00007f3f113fab00, max_reds=4000, local_only=<unavailable>) at erl_proc_sig_queue.c:6018:20
    frame #12: 0x0000000000621619 beam.frmptr.smp`erts_schedule(esdp=<unavailable>, p=0x00007f3f71377640, calls=0) at erl_process.c:10151:14
    
(lldb) f 9
frame #9: 0x00000000007a2cd8 beam.frmptr.smp`erts_cleanup_messages(msgp=0x00007f3f6d322bf0) at erl_message.c:251:2
(lldb) p msgp
(ErtsMessage *) 0x00007f3f6d322bf0
(lldb) p *msgp
(ErtsMessage) {
  next = NULL
  data = {
    heap_frag = 0x00007f3f6d322c20
    attached = 0x00007f3f6d322c20
  }
  m = ([0] = 0, [1] = 59, [2] = 2236427, [3] = 59)
  hfrag = {
    next = 0x00007f3f6d310dc8
    off_heap = {
      first = NULL
      overhead = 0
    }
    alloc_size = 1
    used_size = 1
    mem = ([0] = 139910391598642)
  }
}

Frame 7 shows that we are releasing a NULL dep (dist entry):

(lldb) f 7
frame #7: 0x00000000007e64bd beam.frmptr.smp`erts_free_dist_ext_copy(edep=0x00007f3f6d322c50) at external.c:890:5
(lldb) p edep
(ErtsDistExternal *) 0x00007f3f6d322c50
(lldb) p *edep
(ErtsDistExternal) {
  heap_size = 26388279066776
  dep = NULL
  flags = 0
  connection_id = 0
  data = NULL
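
The addresses in frames 3 to 5 of the backtrace are consistent with a NULL dep. A minimal sketch of the arithmetic follows. The two offsets are inferred only from the addresses printed in the trace (dep=0x0, bp=0xffffffffffffffd0, refcp=0xffffffffffffffd8), not read from the ERTS headers, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: both offsets are inferred from the backtrace addresses only,
 * not taken from the ERTS source. */
#define DIST_ENTRY_OFFSET_IN_BIN UINT64_C(0x30) /* dep - bp (frames 5 and 4) */
#define REFC_OFFSET_IN_BIN       UINT64_C(0x08) /* refcp - bp (frames 3 and 4) */

/* Hypothetical helper: address of the binary that encloses a dist entry.
 * Unsigned subtraction wraps around, which is well defined in C. */
static uint64_t bin_addr_from_dep(uint64_t dep)
{
    return dep - DIST_ENTRY_OFFSET_IN_BIN;
}

/* Hypothetical helper: address of the reference counter inside that binary. */
static uint64_t refc_addr_from_bin(uint64_t bp)
{
    return bp + REFC_OFFSET_IN_BIN;
}

/* Reproduces the addresses seen in the trace, starting from dep == NULL. */
void check_trace_addresses(void)
{
    uint64_t dep = 0;                        /* frame 5: dep=0x0000000000000000 */
    uint64_t bp = bin_addr_from_dep(dep);    /* frame 4: bp=0xffffffffffffffd0 */
    uint64_t refcp = refc_addr_from_bin(bp); /* frame 3: refcp=0xffffffffffffffd8 */

    assert(bp == UINT64_C(0xffffffffffffffd0));
    assert(refcp == UINT64_C(0xffffffffffffffd8));
}
```

Calling check_trace_addresses() returns without tripping an assertion. Under these assumed offsets, the faulting address 0xffffffffffffffd8 is what the reference-count decrement would touch when dep is NULL, which matches the SIGSEGV in frame 0.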
@itssundeep itssundeep added the bug Issue is reported as a bug label Jan 4, 2024
@IngelaAndin IngelaAndin added the team:VM Assigned to OTP team VM label Jan 6, 2024
@max-au
Contributor

max-au commented Jan 6, 2024

This very much reminds me of #7915, which contains a change I don't fully understand (this one: d94af94) but which appears somewhat related.
Has that PR been merged in the OTP version that caused crashes?

@rickard-green rickard-green self-assigned this Jan 8, 2024
@itssundeep
Contributor Author

> Has that PR been merged in the OTP version that caused crashes?

No, we are on 26.0.

Thanks for the response. We are planning to move to 26.2.1, so we can check whether that reduces the issue.
