Description
- Version: v7.1.0
- Platform: Darwin Rodgers-MacBook-Pro.local 16.3.0 Darwin Kernel Version 16.3.0: Tue Nov 29 12:39:07 PST 2016; root:xnu-3789.31.2~11/RELEASE_X86_64 x86_64
- Subsystem: libuv
When running a gulp server on a complex project, I see a crash approximately daily. The backtrace on the crashing thread looks like this:
* thread #9: tid = 0x0008, 0x00007fffb73f5dd6 libsystem_kernel.dylib`__pthread_kill + 10, stop reason = signal SIGSTOP
* frame #0: 0x00007fffb73f5dd6 libsystem_kernel.dylib`__pthread_kill + 10
frame #1: 0x00007fffb74e1787 libsystem_pthread.dylib`pthread_kill + 90
frame #2: 0x00007fffb735b420 libsystem_c.dylib`abort + 129
frame #3: 0x0000000100949c8e node`uv_cond_wait + 20
frame #4: 0x000000010093d47b node`worker + 227
frame #5: 0x00007fffb74deaab libsystem_pthread.dylib`_pthread_body + 180
frame #6: 0x00007fffb74de9f7 libsystem_pthread.dylib`_pthread_start + 286
frame #7: 0x00007fffb74de1fd libsystem_pthread.dylib`thread_start + 13
That's this call to abort(): https://github.com/libuv/libuv/blob/3064ae98e5c3cee223f9e229ff20f86cb1b06b8b/src/unix/thread.c#L498
In this call to uv_cond_wait() (line 75 in db1087c):
uv_cond_wait(&cond, &mutex);
libuv doesn't provide any detail on the failure, but we can guess that pthread_cond_wait is returning EINVAL. This is odd: main() hadn't exited and exit() hadn't been called, so cleanup() presumably hadn't run, and mutex and cond shouldn't have been destroyed. The mutex and condition variable are both static and only used together, so there's no mismatch, and a quick read of the code indicates the mutex is definitely locked when the call is made.
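One way to turn the EINVAL guess into a fact would be to log the return code of pthread_cond_wait before giving up. The following is only a minimal sketch of that idea, not libuv's code: `checked_cond_wait`, `signaler`, and `run_wait_demo` are hypothetical names made up for illustration, and the static `mutex`, `cond`, and `ready` variables merely mimic the static pair described above.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Stand-ins for the static mutex/condition pair (assumption: not libuv's own). */
static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static int ready = 0;

/* Hypothetical helper, not a libuv API: wait on the condition variable and
 * print the error code on failure. Returns 0 on success. */
static int checked_cond_wait(pthread_cond_t *c, pthread_mutex_t *m) {
  int err = pthread_cond_wait(c, m);
  if (err != 0)
    fprintf(stderr, "pthread_cond_wait failed: %d (%s)\n", err, strerror(err));
  return err;
}

/* Sets the predicate under the mutex and wakes the waiter. */
static void *signaler(void *arg) {
  (void) arg;
  pthread_mutex_lock(&mutex);
  ready = 1;
  pthread_cond_signal(&cond);
  pthread_mutex_unlock(&mutex);
  return NULL;
}

/* Runs one wait/signal round trip. Returns 0 on success, otherwise the
 * first nonzero pthread error code encountered. */
int run_wait_demo(void) {
  pthread_t thread;
  int err;

  ready = 0;
  err = pthread_create(&thread, NULL, signaler, NULL);
  if (err != 0)
    return err;

  pthread_mutex_lock(&mutex);
  /* Loop on the predicate to tolerate spurious wakeups. */
  while (!ready && err == 0)
    err = checked_cond_wait(&cond, &mutex);
  /* If every wait succeeded, the predicate must have been set. */
  assert(err != 0 || ready);
  pthread_mutex_unlock(&mutex);

  pthread_join(thread, NULL);
  return err;
}
```

Calling `run_wait_demo()` from a small `main()` should return 0 on a healthy system. In the crashing process, the same kind of logging in place of the bare abort() would show whether the code really is EINVAL or something else, such as EPERM.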
So… I really have no idea what's going on here. My only remaining guess is some sort of memory corruption, but you'd think we could eliminate that since we're crashing at the same point so consistently.
I notice I'm a minor version behind, so I'll upgrade and see if that resolves it, but the relevant source file hasn't changed since 2015 and there's nothing in the recent changelog about this (that I can find), so I'd be surprised if that were the issue.
Any ideas?