Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

investigate switching to uncaughtExceptionMonitor event #2367

Open
trentm opened this issue Oct 12, 2021 · 1 comment
Open

investigate switching to uncaughtExceptionMonitor event #2367

trentm opened this issue Oct 12, 2021 · 1 comment
Labels
agent-nodejs Make available for APM Agents project planning.

Comments

@trentm
Copy link
Member

trentm commented Oct 12, 2021

Node v12.17.0 added the uncaughtExceptionMonitor process event, separate from uncaughtException, I'm guessing to provide a mechanism for doing something with these without otherwise changing node's default uncaughtException handler. It would be good to use this when possible. Perhaps we guard on the current node version, or we wait until v12.17 is a base supported version.

@trentm
Copy link
Member Author

trentm commented Aug 8, 2023

With the coming 4.x release the base supported Node.js version will be v14. That means that considering using uncaughtExceptionMonitor -- instead of our current usage of uncaughtException -- for capturing uncaught exceptions is possible.

tl;dr: I think it would be a good idea, but ultimately not something we should do now because there is more dev effort involved than I want to do right now.


Ideally, we want to get the Node.js core handling of uncaughtException. I.e. to get the sync process termination and to get this output on stderr:

/Users/trentm/el/apm-agent-nodejs12/foo.js:47
    throw new Error('boom');
    ^

Error: boom
    at Immediate.<anonymous> (/Users/trentm/el/apm-agent-nodejs12/foo.js:47:11)
    at processImmediate (node:internal/timers:466:21)

That first block (up to the ^) is handled in node_errors.cpp#GetErrorSource. Our current logUncaughtExceptions: true handling just does console.error(err), which does not print that first block.

The only way to get this Node.js core handling is to:

  1. Not have an uncaughtException handler registered. This is so that process.emit('uncaughtException', ...) in Node.js core (in lib/internal/process/execution.js#createOnGlobalUncaughtException()) returns true. This can be done by using uncaughtExceptionMonitor or by shimming process.emit().
  2. And we must report synchronously. The only way to do this async HTTP request to APM server is to do it in a sync child process. (I'm excluding a potential alternative to write out some error data synchronously to file, then attempt to pick that up and send it if/when the service restarted -- because this would be limited and brittle.)

Pros (of using uncaughtExceptionMonitor and sync reporting):

  • The APM agent behaviour on uncaughtException is less invasive. The current behaviour has these impacts on user code:
    • Any process.on('uncaughtException', ...) handler by the user that itself does process.exit(1) will break the APM agent's async reporting.
    • Because our handler is async, the user's application keeps running for a while (the time to capture the error and flush to APM server) after the exception.
  • We can use uncaughtExceptionMonitor, which means we get the core Node.js exception handling -- just with the delay to capture and flush the error to APM server.

Cons:

  • This will take some time to implement. It isn't trivial because the current stacktrace collection (and source-map handling) is async and at least some of that (gathering CallSite info) needs to be done in-process.
  • Ultimately the overhead added from time-error-is-thrown to time-the-process-exits will be increased because there will be the added time of a child process. I think this is minor: if a user's app is crashing, then that added latency shouldn't be relevant.
  • The child process could fail, if the application cannot create child processes; e.g. because of capabilities or being in an environment with exhausted handles.
  • Note that the out of process reporting would report just the error and not any other APM events currently in the queue to be flushed. That's both unfortunate and reasonable.

Other notes:

  • Using process.setUncaughtExceptionCaptureCallback(fn) does not help us much because, it is just as invasive (maybe more so) than adding our own uncaughtException handler. When used, the uncaughtException event is not sent at all. (I suppose that does meant that a user's own uncaughtException handler won't break the APM agent's own, but it would be a confusion side-effect for the user.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
agent-nodejs Make available for APM Agents project planning.
Projects
None yet
Development

No branches or pull requests

1 participant