How to debug SIGBUS error #3168

Closed
Wolvan opened this issue Jan 8, 2021 · 7 comments

@Wolvan

Wolvan commented Jan 8, 2021

  • Node.js Version: 14.15.1, 15.5.1, 10.16.3
  • OS: macOS 10.14.6
  • Scope (install, code, runtime, meta, other?): runtime
  • Module (and version) (if relevant):

While working on a piece of code that does a lot of fs interaction, my Node application terminates unrecoverably with a SIGBUS error. No stack trace is available that could point me towards the faulting module or Node core.

I decided to listen for the SIGBUS signal and execute dmesg at crash time, which worked and produced the following output:

 address 0x12ca180b0000, protections were read-write
Data/Stack execution not permitted: node[pid 14036] at virtual address 0x12ca180b0000, protections were read-write
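
The handler itself is not shown in the post. A minimal sketch of what such a listener might look like follows; `dumpOnSignal` and its options are hypothetical names introduced here for illustration, and the `dmesg` invocation in the usage comment is an assumption. Note that the Node.js documentation warns that SIGBUS raised by a real fault (rather than by kill) can leave the process in a state where it is not safe to call JS listeners, so this is a best-effort diagnostic at most.

```javascript
const { execSync } = require('child_process');

// Hypothetical helper (not from the original post): when the process
// receives `signal`, run a shell command and pass its output to `log`.
function dumpOnSignal(signal, command, { run = execSync, log = console.error } = {}) {
  const handler = () => {
    try {
      log(String(run(command, { encoding: 'utf8' })));
    } catch (err) {
      log(`"${command}" failed: ${err.message}`);
    }
  };
  process.on(signal, handler);
  return handler; // returned so the caller can remove it with process.off()
}

// Assumed usage; dmesg may require elevated privileges:
// dumpOnSignal('SIGBUS', 'dmesg | tail -n 20');
```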

The next run crashed again some time in, with similar dmesg output but at virtual address 0x105004000.

So far it seemed to be related to executing remove/unlink on a file or directory, but when I changed my strategy to calling the shell's rm command instead of Node's built-in unlink, the same issue occurred.

I know there is the segfault-handler module, but it does not catch stack traces for SIGBUS; I already tried.

@gireeshpunathil
Member

Can you run with --report-signal=SIGBUS --report-on-signal (https://nodejs.org/dist/latest-v15.x/docs/api/cli.html#cli_report_on_signal)? That may capture a stack trace.
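
For cases where changing the command line is awkward, the same two settings are also exposed on `process.report`. This is a minimal sketch, assuming it runs at startup in the main script; the guard is there because older releases such as 10.x do not have `process.report`.

```javascript
// Programmatic counterpart of --report-on-signal --report-signal=SIGBUS.
// Assumption: this runs early, before the crash can happen.
if (process.report) {
  process.report.reportOnSignal = true; // write a diagnostic report on the signal
  process.report.signal = 'SIGBUS';     // the signal that triggers the report
}
```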

@Wolvan
Author

Wolvan commented Jan 8, 2021

Thank you, I didn't know this existed. I'll try it and report back.

@Wolvan
Author

Wolvan commented Jan 8, 2021

Unfortunately, the report doesn't really seem to help me much.
I attached it in this gist for easy viewing:
https://gist.github.com/Wolvan/deb0adec145d3cb80b7b961dba290b22

@gireeshpunathil
Member

@Wolvan - agreed. It looks like the stack shows the signal handler context, not the failing context (where the signal originated), so it is not very useful.

Do you have lldb installed? If so, running the code under lldb will let the debugger trap and stop at SIGBUS, at which point you can run bt to get the call stack.

@Wolvan
Author

Wolvan commented Jan 13, 2021

The problem seems to come from the FSEvents module. I'm not getting the SIGBUS signal in lldb, but the EXC_BAD_ACCESS it stops on seems to be very much related.

I updated FSEvents and will see if the problem persists, but I attached the lldb output of the error and stack below; maybe you see something that I don't? This was my first time using lldb, after all.

Process 30404 stopped
* thread #14, stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
    frame #0: 0x00000001047a6875 fsevents.node`fse_handle_events + 228
fsevents.node`fse_handle_events:
->  0x1047a6875 <+228>: jmpq   *%rcx
    0x1047a6877 <+230>: addq   $0x28, %rsp
    0x1047a687b <+234>: popq   %rbx
    0x1047a687c <+235>: popq   %r12
(llnode) jsstack
 * thread #14: tid = 0x590e4, 0x00000001047a6875 fsevents.node`fse_handle_events + 228, stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
  * frame #0: 0x00000001047a6875 fsevents.node`fse_handle_events + 228
    frame #1: 0x00007fff4b15e12e FSEvents`implementation_callback_rpc + 2991
    frame #2: 0x00007fff4b15d50a FSEvents`_Xcallback_rpc + 231
    frame #3: 0x00007fff4b15d406 FSEvents`FSEventsD2F_server + 55
    frame #4: 0x00007fff4b15fcbf FSEvents`FSEventsClientProcessMessageCallback + 43
    frame #5: 0x00007fff49bb8329 CoreFoundation`__CFMachPortPerform + 246
    frame #6: 0x00007fff49bb8227 CoreFoundation`__CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE1_PERFORM_FUNCTION__ + 41
    frame #7: 0x00007fff49bb8185 CoreFoundation`__CFRunLoopDoSource1 + 527
    frame #8: 0x00007fff49ba00b0 CoreFoundation`__CFRunLoopRun + 2524
    frame #9: 0x00007fff49b9f482 CoreFoundation`CFRunLoopRunSpecific + 455
    frame #10: 0x00007fff49b9f296 CoreFoundation`CFRunLoopRun + 40
    frame #11: 0x00000001047a675c fsevents.node`fse_run_loop + 87
    frame #12: 0x00007fff75d3d2eb libsystem_pthread.dylib`_pthread_body + 126
    frame #13: 0x00007fff75d40249 libsystem_pthread.dylib`_pthread_start + 66
    frame #14: 0x00007fff75d3c40d libsystem_pthread.dylib`thread_start + 13

@gireeshpunathil
Member

  • as you already rightly said, Node is not involved in the failing sequence; the fsevents module is
  • the failing instruction (jmpq *%rcx) suggests a wild branch
  • the only dynamic branch / call in fse_handle_events is at https://github.com/fsevents/fsevents/blob/328ae396700969fd8345f13cc4fb88c495517cd9/src/fsevents.c#L185 , which is the call to the instance callback
  • so the most probable cause is that the instance object (data in the function's parameter list) or its callback field is bad / corrupt
  • this can be verified by i) dumping all instructions of fse_handle_events up to the failing context, ii) dumping the instance content as fse_instance_t *, iii) dumping the content of rcx

@Wolvan
Author

Wolvan commented Jan 14, 2021

Knowing it was fsevents that caused my problems put me on the right track.
I used npm list fsevents to figure out which module made use of it; it turned out to be chokidar.
A bit of googling later, I figured out that this error can happen when you call watcher.close() on a chokidar watcher, don't wait for the returned promise to resolve, and then perform fs operations in the watched directory. Putting an await in front of close() seems to have done the trick. Thanks a lot for your help!
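
A minimal sketch of that ordering follows. `closeWatcherThenRemove` is a hypothetical helper written for illustration, not code from the affected project; it accepts any object whose close() returns a promise (as chokidar v3 watchers do), and it uses `fs.promises.rm`, which needs Node 14.14 or later.

```javascript
const fsp = require('fs').promises;

// Hypothetical helper: `watcher` is anything whose close() returns a promise;
// `dir` is the directory it was watching.
async function closeWatcherThenRemove(watcher, dir) {
  await watcher.close(); // wait until the watcher is fully torn down
  await fsp.rm(dir, { recursive: true, force: true }); // only then touch the directory
}

// Assumed usage with chokidar (the path is illustrative):
// const watcher = require('chokidar').watch('./build');
// await closeWatcherThenRemove(watcher, './build');
```

The point of the helper is only the ordering: no fs operation on the watched directory starts until the close() promise has resolved.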

It actually helped me understand debugging C++ code, so that was a nice side effect. I appreciate the time you've taken for me!
