Send signal to node process to output a stack trace #25263
Comments
@dirkmc hmm... your best bet would be to profile the application under a stress test in a staging or dev environment. You've got something there tying up the event loop. Even if we had a mechanism for inspecting the current trace via a signal, I'm not sure it would help much here because your event loop would still be getting stuck. @mhdawson might have some better suggestions, however. |
Thanks @jasnell. I poked around the node and v8 code, and it seems a solution could be to add a signal handler in node.cc that tells v8 to print a stack trace. Unfortunately v8 doesn't expose the method that does so. I was able to compile by simply declaring that function in v8::Isolate (rather than v8::internal::Isolate), but then the linker fails because it can't find the object. My C++ fu is pretty out of date; do you guys know if there's a way of compiling v8 so that its output includes the objects in the v8::internal namespace? Thanks again |
I was able to get a stack trace by compiling my own version of v8 and node.
Then I added a listener for SIGUSR2 in node.cc
And a similar line in the
When my service went to 100% CPU I sent it a SIGUSR1 to put it into debug mode, and then a SIGUSR2 to print a stack trace. This worked 2 out of 3 times. It seems like @bnoordhuis wrote the code for SIGUSR1; perhaps he has a better idea for a more reliable solution. |
I'm guessing it may be possible to output a stack trace by sending a message to the debugger (so that you don't need a custom compile of v8) |
When you say it worked 2 out of 3 times, what happened the 1 time out of 3 that it didn't? Did you get no stack trace, or not the one you expected? |
Yeah, you can do this without patching V8. Assuming a

    function x() {
      var i = 0;
      while (1) {
        i++;
      }
    }
    function y() {
      x();
    }
    function z() {
      y();
    }
    z();

In another terminal:
(wait for the node process to start using 100% CPU)
Or to see the nice stack trace:
and switch over to the other terminal to see:
|
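The thread mentions sending SIGUSR1 to a running node process to put it into debug mode. As a rough sketch only (this is not the commenter's original commands, which did not survive on this page), the same signal can be sent from a second Node.js process with `process.kill`. The helper name `requestDebugMode` and its optional `kill` parameter are hypothetical, added here so the function can be exercised without signalling a real process.

```javascript
// Sketch: ask another node process to enter debug mode by sending SIGUSR1.
// `requestDebugMode` and the injectable `kill` argument are illustrative
// names, not part of any Node.js API.
function requestDebugMode(pid, kill) {
  if (!Number.isInteger(pid) || pid <= 0) {
    throw new TypeError('pid must be a positive integer');
  }
  // Default to the real process.kill, which sends a signal despite its name.
  const send = kill || process.kill.bind(process);
  send(pid, 'SIGUSR1');
  return true;
}

// Example usage (commented out so that running this file signals nothing):
// requestDebugMode(12345);
```

After the target has entered debug mode, a debugger client still has to attach to it to pause execution and request a backtrace.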
@mhdawson: Once out of the three times it just gave me an empty stack trace. I think for some reason it wasn't dropping into debug mode when I sent it SIGUSR1, so when I sent SIGUSR2 there was nothing on the stack. I don't really understand very well what happens to libuv when CPU usage goes to 100%, or the interaction between libuv and the debugger, so that's just a guess. |
@ivan are you able to do this without starting node in debug mode? My situation is complicated because node is running on Amazon EC2 instances, inside Docker containers, and they were only spiking to 100% CPU about once a week, so I didn't want to leave node sitting there running in debug mode on all my servers, and to deal with Docker port mapping etc. I just wanted to be able to send a running instance a signal that would give me a stack trace, much like you can send ctrl-break to a Java program to dump the stack. |
@dirkmc IIRC, I explained on the v8-users mailing list that what you're trying to do is unsafe from inside a signal handler. You're probably going to have more luck attaching a debugger. |
@bnoordhuis I don't recall seeing that message, however I did see your comment on this line of the code saying something similar. I don't understand the libuv/debugger interaction very well but from what I can gather, on SIGUSR1 you signal the debugger thread to invoke a callback that tells the debugger agent to go into debug mode. So I wondered if it would be possible to do something similar to tell it to print a stack trace, perhaps by using the debug protocol to request a backtrace. |
Oh, it looks like the post was dropped. For posterity, here is my reply:
The SIGUSR1 handler in node is very careful to only call async signal-safe functions. (I'm not sure about node to be honest, but the one in io.js is.) |
Thanks Ben. FYI I also thought I replied to my own post but it didn't show up, maybe there was an issue with that mailing list. |
I was wondering about the frequency and the symptoms, because I don't think there is any guarantee with respect to which thread is going to handle your signal. It may be that when the signal was handled by the thread consuming the CPU you get the expected stack trace while in the other case it was some other thread that ended up handling the signal. |
@mhdawson that's interesting. From poking around in the code I got the impression that the javascript code is executed in a single thread, managed by libuv, while the c++ main thread handles signals. It appears that the code in the main thread that listens for SIGUSR1 will kick off the debugger in a third thread, and then signal the debugger thread to invoke a callback. I assume a similar mechanism could be used to guarantee a stack trace of the javascript thread, perhaps by using the debug protocol to request a backtrace, but my knowledge of the thread model is pretty limited. Do you know if that would make sense? |
While it is true that a single thread is used to run most of the code, other threads are used and the signal might get handled by one of them. Some examples include threads within libuv for DNS queries and some file system requests, as well as the debugger one you mentioned. Unfortunately I've not had time to dig into the debug protocol/implementation yet, so I can't make an informed comment. At least with pthreads it is possible to send a signal to a specific thread with pthread_kill, so it's possible the debugger could ensure it runs on the right thread. If you had a list of the threads you could send the signal to all of them to get a stack trace, but that would require getting the pthread_t for all of the threads. Having a signal run on multiple threads would also likely increase the chance of running across the issues mentioned by bnoordhuis. One other thought is that V8 does have an option to do profiling, which we use in HealthCenter (http://www-01.ibm.com/support/knowledgecenter/SS3KLZ/com.ibm.java.diagnostics.healthcenter.doc/topics/enablingagent.html). Since we can dynamically enable profiling, what you might be able to do is to dynamically enable it when you send the signal. The profiling info might in turn help you identify what is causing the spike in CPU. |
Thanks for the detailed explanation. The one part I'm not sure about, if there are several threads, how is it guaranteed that the main thread will receive the SIGUSR1 signal when a user wants to put node into debug mode? |
I apologize in advance for the commercial plug, but it sounds like you have a bug in production, and I'd feel bad not mentioning a tool we (strongloop) have for finding specifically these kinds of bugs. It's a variant of the v8 debugger that only kicks in and starts profiling when an event loop blocks for more than a configurable threshold: http://docs.strongloop.com/display/SLC/Smart+profiling+with+slc That doesn't address the feature request, but if this is a production bug, you may wish to investigate this tool. |
Sorry for the late response. With respect to the question "how is it guaranteed that the main thread will receive the SIGUSR1 signal when a user wants to put node into debug mode?": I think the answer is that there is no guarantee. The code that runs when the signal is received has to work around this, for example by getting the list of all threads and sending signals directly to the threads using pthread_kill.
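The idea of reacting when the event loop blocks for longer than a threshold can be illustrated with a plain timer. This is only a sketch of the general technique, not how the tool mentioned above works internally. The names `measureLag` and `startLagMonitor` are made up for illustration. It schedules a repeating timer and compares when it actually fired with when it was expected to fire.

```javascript
// Sketch: detect event-loop stalls by measuring how late a timer fires.
// `measureLag` and `startLagMonitor` are illustrative names, not a library API.

// Lag is how many milliseconds after the expected time the timer fired.
function measureLag(expectedAt, firedAt) {
  return Math.max(0, firedAt - expectedAt);
}

function startLagMonitor(intervalMs, thresholdMs, onBlocked) {
  let expectedAt = Date.now() + intervalMs;
  const timer = setInterval(() => {
    const now = Date.now();
    const lag = measureLag(expectedAt, now);
    if (lag > thresholdMs) {
      onBlocked(lag); // the loop was busy for roughly `lag` ms
    }
    expectedAt = now + intervalMs;
  }, intervalMs);
  timer.unref(); // do not keep the process alive just for monitoring
  return timer;
}

// Example usage (commented out):
// startLagMonitor(100, 50, (lag) => console.error('event loop blocked ~' + lag + 'ms'));
```

Because the check itself runs on the event loop, it can only report a stall after the blocking code returns, so it helps with long pauses but cannot report on a true infinite loop.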
@mhdawson makes sense. The code doesn't appear to do that at present, which may explain why it seems like I wasn't always able to get node to drop into debug mode with a signal. @sam-github that's interesting, but frankly I'd rather hear that there was an open source solution. It seems less than ideal that your company has a commercial interest in not resolving my issue. |
I have a node application that periodically spikes to 100% on my production server. I would like to be able to send a signal to the node process that will give me a stack trace so I can find where in the code the problem is. Is there any easy way to do this?
My server is Ubuntu 14.04.2 and I'm running node 0.12.2.
Note that I cannot use console.trace() because that will simply show a trace at the point in the code it is invoked, eg
Note also that in my tests I found that if the program is in an infinite loop, it does not process user signals until the loop has completed, ie never, eg
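A minimal sketch of the behavior described above, assuming a POSIX platform (SIGUSR2 is not available on Windows) and a reasonably recent Node.js. The names `busyWait` and `signalDuringBusyLoop` are made up for illustration. A handler registered with `process.on` runs on the event loop, so a signal that arrives while synchronous code is spinning is not handled until that code returns. The loop here is bounded so the script terminates; with a true infinite loop the handler would never run.

```javascript
// Sketch: a JavaScript-level signal handler cannot run while the loop is busy.
let handled = false;

process.on('SIGUSR2', () => {
  handled = true;
  // This trace shows the handler's own stack, not the code that was busy,
  // which is the limitation of console.trace() noted above.
  console.trace('SIGUSR2 received');
});

// Spin synchronously for `ms` milliseconds, blocking the event loop.
function busyWait(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) { /* spin */ }
}

// Send SIGUSR2 to this process, then block; report whether the handler ran
// before the blocking code returned.
function signalDuringBusyLoop(ms) {
  process.kill(process.pid, 'SIGUSR2');
  busyWait(ms);
  return handled;
}

const handledWhileBusy = signalDuringBusyLoop(100);
console.log('handled while busy:', handledWhileBusy);

// Once control returns to the event loop, the pending signal can be processed.
setTimeout(() => console.log('handled after yielding:', handled), 50);
```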