Skip to content
This repository has been archived by the owner on Apr 22, 2023. It is now read-only.

Send signal to node process to output a stack trace #25263

Closed
dirkmc opened this issue May 7, 2015 · 19 comments
Closed

Send signal to node process to output a stack trace #25263

dirkmc opened this issue May 7, 2015 · 19 comments

Comments

@dirkmc
Copy link

dirkmc commented May 7, 2015

I have a node application that periodically spikes to 100% on my production server. I would like to be able to send a signal to the node process that will give me a stack trace so I can find where in the code the problem is. Is there any easy way to do this?

My server is ubuntu 14.04.2 and I'm running node 0.12.2

Note that I cannot use console.trace() because that will simply show a trace at the point in the code it is invoked, eg

process.on('SIGUSR2', function() {
    console.trace();
})

Trace at process.<anonymous> (worker.js:14:13)
 at process.emit (events.js:104:17)
 at Signal.wrap.onsignal (node.js:655:46)

Note also that in my tests I found that if the program is in an infinite loop, it does not process user signals until the loop has completed, ie never, eg

process.on('SIGUSR2', function() {
    console.log('Even if a signal is sent, this line is never reached');
});
var i = 0;
while(true) { i++;i--; }
@jasnell
Copy link
Member

jasnell commented May 15, 2015

@dirkmc hmm.. your best bet would be to profile the application in a stress test in a staging or dev environment. You've got something there tying up the event loop. Even if we had a mechanism for inspecting the current trace via a signal I'm not sure it would help much here because your event loop would still be getting stuck. @mhdawson might have some better suggestions, however.

Issue type:question
Answered: Yes
Specific to core: Yes
@joyent/node-coreteam

@dirkmc
Copy link
Author

dirkmc commented May 15, 2015

Thanks @jasnell. I poked around the node and v8 code, and it seems a solution could be to add a signal handler in node.cc that tells v8 to print a stack trace. Unfortunately v8 doesn't expose the method which does so, v8::internal::Isolate::PrintStack(FILE*) I asked them a question about it here, but no response so far:
https://groups.google.com/forum/#!topic/v8-users/faObWjXrC-o

I was able to compile if I simply declare that function in v8::Isolate (rather than v8::internal::Isolate), but then the linker fails because it can't find the object. My c++ foo is pretty out of date, do you guys know if there's a way of compiling v8 such that it will include the objects in the v8::internal namespace in its output?

Thanks again
Dirk

@dirkmc
Copy link
Author

dirkmc commented May 24, 2015

I was able to get a stack trace by compiling my own version of v8 and node.
I added a declaration for the Isolate::PrintStack(FILE*) method in v8.h, and defined it in api.cc:

void Isolate::PrintStack(FILE* out) {
    i::Isolate* isolate = reinterpret_cast<i::Isolate*>(this);
    isolate->PrintStack(out);
}

Then I added a listener for SIGUSR2 in node.cc

static void SignalUsr2(int signo) {
    v8::Isolate::GetCurrent()->PrintStack(stdout);

    uv_tty_reset_mode();
    RegisterSignalHandler(SIGUSR2, SignalUsr2, true);
}

And a similar line in the SignalExit(int) function for good measure.

When my service went to 100% CPU I sent it a SIGUSR1 to put it into debug mode, and then a SIGUSR2 to print a stack trace. This worked 2 out of 3 times. It seems like @bnoordhuis wrote the code for SIGUSR1, perhaps he has a better idea for a more reliable solution.

@dirkmc
Copy link
Author

dirkmc commented May 24, 2015

I'm guessing it may be possible to output a stack trace by sending a message to the debugger (so that you don't need a custom compile of v8)

@mhdawson
Copy link
Member

When you say it worked 2 out of 3 times, what happened in the 1 out of 3 times. Did you get no stack trace or not one that you expected ?

@ivan
Copy link

ivan commented May 25, 2015

Yeah, you can do this without patching V8.

Assuming a buggy.js containing:

function x() {
    var i = 0;
    while(1) {
        i++;
    }
}

function y() {
    x();
}

function z() {
    y();
}

z();
# node --debug-brk buggy.js
Debugger listening on port 5858

In another terminal:

# node debug -p $(pgrep -f 'node.*buggy')
connecting to 127.0.0.1:5858 ... ok
break in buggy.js:16
 14 }
 15 
>16 z();
 17 
 18 });
debug> cont

(wait for the node process to start using 100% CPU)

debug> step
break in buggy.js:3
  1 function x() {
  2     var i = 0;
> 3     while(1) {
  4             i++;
  5     }
debug> bt
#0 buggy.js:3:8
#1 buggy.js:9:2
#2 buggy.js:13:2
#3 buggy.js:16:1

Or to see the nice stack trace:

debug> repl
Press Ctrl + C to leave debug repl
> console.trace()

and switch over to the other terminal to see:

Trace
    at eval (eval at x (/home/at/buggy.js:3:8), <anonymous>:1:9)
    at x (/home/at/buggy.js:3:8)
    at y (/home/at/buggy.js:9:2)
    at z (/home/at/buggy.js:13:2)
    at Object.<anonymous> (/home/at/buggy.js:16:1)
    at Module._compile (module.js:431:26)
    at Object.Module._extensions..js (module.js:449:10)
    at Module.load (module.js:356:32)
    at Function.Module._load (module.js:311:12)
    at Module.runMain [as _onTimeout] (module.js:472:10)

@dirkmc
Copy link
Author

dirkmc commented May 26, 2015

@mhdawson: Once out of the three times it just gave me an empty stack trace. I think for some reason it wasn't dropping into debug mode when I sent it SIGUSR1, so when I sent SIGUSR2 there was nothing on the stack. I don't really understand very well what happens to libuv when CPU usage goes to 100%, or the interaction between libuv and the debugger, so that's just a guess.

@dirkmc
Copy link
Author

dirkmc commented May 26, 2015

@ivan are you able to do this without starting node in debug mode?

My situation is complicated because node is running on Amazon EC2 instances, inside Docker containers, and they were only spiking to 100% CPU about once a week, so I didn't want to leave node sitting there running in debug mode on all my servers, and to deal with Docker port mapping etc. I just wanted to be able to send a running instance a signal that would give me a stack trace, much like you can send ctrl-break to a Java program to dump the stack.

@bnoordhuis
Copy link
Member

@dirkmc IIRC, I explained on the v8-users mailing list that what you're trying to do is unsafe from inside a signal handler. You're probably going to have more luck attaching a debugger.

@dirkmc
Copy link
Author

dirkmc commented May 26, 2015

@bnoordhuis I don't recall seeing that message, however I did see your comment on this line of the code saying something similar. I don't understand the libuv/debugger interaction very well but from what I can gather, on SIGUSR1 you signal the debugger thread to invoke a callback that tells the debugger agent to go into debug mode. So I wondered if it would be possible to do something similar to tell it to print a stack trace, perhaps by using the debug protocol to request a backtrace.

@bnoordhuis
Copy link
Member

Oh, it looks like the post was dropped. For posterity, here is my reply:

On Thu, May 14, 2015 at 11:34 PM,  <dirk@redacted> wrote:
> I have a node.js server that periodically spikes to 100% CPU usage in
> production. I would like to be able to send it a signal which causes a stack
> trace, to see where the problem is. So I want to build a custom version of
> node that listens for a signal and calls v8 to print the current stack. It
> looks like the function I need is
> v8::internal::Isolate::Current()->PrintStack(stdout);
>
> My c++ skills are pretty rusty, however if I understand correctly, because
> this is in the internal namespace I'm not able to access it:
>
> ../src/node.cc:3767:21: error: incomplete type 'v8::internal::Isolate' named
> in nested name specifier
>
>       v8::internal::Isolate::Current()->PrintStack(stdout);
>
>       ~~~~~~~~~~~~~~^~~~~~~~~
>
> ../deps/v8/include/v8.h:130:7: note: forward declaration of
> 'v8::internal::Isolate'
>
> class Isolate;
>
>       ^
>
> 1 error generated.
>
>
> Is there some other way to cause v8 to print a stack trace?
> Thanks,
> Dirk

There are two things to consider:

1. As a user of the V8 API, you cannot use anything from the
v8::internal namespace.  The nearest equivalent in the public API is
v8::Message::PrintCurrentStackTrace().  You can find its declaration
in include/v8.h.

2. It's not safe to call into V8 from inside a signal handler or
another thread.  If you do, the odds are good it's going to segfault.

The SIGUSR1 handler in node is very careful to only call async signal-safe functions. (I'm not sure about node to be honest, but the one in io.js is.)

@dirkmc
Copy link
Author

dirkmc commented May 26, 2015

Thanks Ben. FYI I also thought I replied to my own post but it didn't show up, maybe there was an issue with that mailing list.

@mhdawson
Copy link
Member

mhdawson commented Jun 1, 2015

I was wondering about the frequency and the symptoms, because I don't think there is any guarantee with respect to which thread is going to handle your signal. It may be that when the signal was handled by the thread consuming the CPU you get the expected stack trace while in the other case it was some other thread that ended up handling the signal.

@dirkmc
Copy link
Author

dirkmc commented Jun 2, 2015

@mhdawson that's interesting. From poking around in the code I got the impression that the javascript code is executed in a single thread, managed by libuv, while the c++ main thread handles signals. It appears that the code in the main thread that listens for SIGUSR1 will kick off the debugger in a third thread, and then signal the debugger thread to invoke a callback. I assume a similar mechanism could be used to guarantee a stack trace of the javascript thread, perhaps by using the debug protocol to request a backtrace but my knowledge of the thread model is pretty limited, do you know if that would make sense?

@mhdawson
Copy link
Member

mhdawson commented Jun 2, 2015

While it is true that a single thread is used to run most of the code, other threads are used() and the signal might get handled by one of them. Some examples include within libuv for dns queries and some file system requests as well as the debugger one you mentioned.

Unfortunately I've not had time to dig into the debug protocol/implementation yet so I can't make and informed comment.

At least with pthreads it is possible to send a signal to a specific thread with pthread_kill so its possible the debugger could ensure it runs on the right thread. If you had a list of the threads you could send the signal to all of them to get a stack trace but that would require getting the pthread_t for all of the threads. Having a signal run on multiple threads would also likely increase the chance of running across the issues mentioned by bnoordhuis.

One other thought is that V8 does have an option to do profiling which we use in HealthCenter (http://www-01.ibm.com/support/knowledgecenter/SS3KLZ/com.ibm.java.diagnostics.healthcenter.doc/topics/enablingagent.html). Since we can dynamically enable profiling what you might be able to do is to dynamically enable it when you send the signal. The profiling info might in turn help you identify what is causing the spike in cpu.

@dirkmc
Copy link
Author

dirkmc commented Jun 3, 2015

Thanks for the detailed explanation. The one part I'm not sure about, if there are several threads, how is it guaranteed that the main thread will receive the SIGUSR1 signal when a user wants to put node into debug mode?

@sam-github
Copy link

I apologize in advance for the commercial plug, but it sounds like you have a bug in production, and I'd feel bad not mentioning a tool we (strongloop) have for finding specifically these kinds of bugs.

Its a variant of the v8 debugger that only kicks in and starts profiling when an event loop blocks for more than a configurable threshold: http://docs.strongloop.com/display/SLC/Smart+profiling+with+slc

That doesn't address the feature request, but if this is a production bug, you may wish to investigate this tool.

@mhdawson
Copy link
Member

Sorry for the last response. In respect to the question: " how is it guaranteed that the main thread will receive the SIGUSR1 signal when a user wants to put node into debug mode?".

I think the answer is that there is no guarantee. The code that runs when the signal is received has to work around this, for example by getting the list of all threads and sending signals directly to the threads using pthread_kill

@dirkmc
Copy link
Author

dirkmc commented Jul 7, 2015

@mhdawson makes sense. The code doesn't appear to do that at present, which may explain why it seems like I wasn't always able to get node to drop into debug mode with a signal.

@sam-github that's interesting, but frankly I'd rather hear that there was an open source solution. It seems less than ideal that your company has a commercial interest in not resolving my issue.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

7 participants