Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(node): App Not Responding with stack traces #9079

Merged
merged 15 commits into from Sep 25, 2023

Conversation

timfish
Copy link
Collaborator

@timfish timfish commented Sep 20, 2023

This PR adds ANR detection for Node via a forked child process. The child process runs the same entry point as the main app. To ensure that the main app code does not run in the child process, we use a promise that only resolves in the main process.

When the captureStackTrace option is enabled, debugging is enabled in the main app process and the child process uses WebSockets to capture stack traces via the v8 inspector API.

Overhead is expected to be minimal. With no ANR detected, the only overhead in the main app process is polling the child process over IPC by default every 50ms. The ANR child process consumes around 50-60 MB or RAM and simply keeps track of the polling. Once ANR has been detected, the main process will get paused briefly in the debugger to capture a stack trace frames. At this point, the event loop has been blocked for seconds so the debugging overhead can be considered negligible.

Once an ANR event has been reported, the child process exits to prevent further duplicate events.

import { init, enableANRDetection } from '@sentry/node';

init({ dsn: "__DSN__" });

// ESM supports top-level await
await enableANRDetection({ captureStackTrace: true });
runApp();

// for CJS you'll need then
enableANRDetection({ captureStackTrace: true }).then(() => {
  runApp();
})

It has the following options:

interface Options {
  /**
   * The app entry script. This is used to run the same script as the child process.
   *
   * Defaults to `process.argv[1]`.
   */
  entryScript: string;
  /**
   * Interval to send alive messages to the child process.
   *
   * Defaults to 50ms.
   */
  pollInterval: number;
  /**
   * Threshold in milliseconds to trigger an ANR event.
   *
   * Defaults to 5000ms.
   */
  anrThreshold: number;
  /**
   * Whether to capture a stack trace when the ANR event is triggered.
   *
   * Defaults to `false`.
   *
   * This uses the node debugger which enables the inspector API and opens the required ports.
   */
  captureStackTrace: boolean;
  /**
   * Log debug information.
   */
  debug: boolean;
}

Here is an example event:
image

@@ -13,7 +13,7 @@ export interface Mechanism {
* it hits the global error/rejection handlers, whether through explicit handling by the user or auto instrumentation.
* Converted to a tag on ingest and used in various ways in the UI.
*/
handled: boolean;
handled?: boolean;
Copy link
Collaborator Author

@timfish timfish Sep 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The android SDK submits ARN events without handled set, and neither true/false make sense here so I made this optional.

@timfish timfish marked this pull request as ready for review September 21, 2023 14:04
@AbhiPrasad AbhiPrasad self-requested a review September 21, 2023 16:42
packages/node-integration-tests/suites/anr/test.ts Dismissed Show dismissed Hide dismissed
packages/node-integration-tests/suites/anr/test.ts Dismissed Show dismissed Hide dismissed
Copy link
Member

@mydea mydea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really cool! Great work 🎉

Copy link
Member

@AbhiPrasad AbhiPrasad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The promise not resolving in the child process is so clever.

Can we add checks + docstrings for minimum node version?

Not going to merge in just yet, I'm going to bikeshed naming with the team on monday and then we can merge it in after final decision has been made there.

packages/node/src/anr/index.ts Outdated Show resolved Hide resolved
packages/utils/src/anr.ts Outdated Show resolved Hide resolved
@timfish
Copy link
Collaborator Author

timfish commented Sep 22, 2023

Can we add checks + docstrings for minimum node version?

The CJS test is passing on the Node v10 integration tests and I can't see any reason why I wouldn't also work on v8 too.

I only testing ESM on Node >= v14 since the test was failing for v12. When was top-level await supported from?

Co-authored-by: Abhijeet Prasad <devabhiprasad@gmail.com>
@AbhiPrasad
Copy link
Member

Top level await is supported for Node 14+, so makes sense for me. Let's make it min required Node 10.

@timfish
Copy link
Collaborator Author

timfish commented Sep 25, 2023

One thing I think this PR doesn't yet cover well enough is handling of unexpected errors. For example if the child process fails to start, we probably want to just log that rather than stop the entire app from starting.

packages/node/src/anr/index.ts Show resolved Hide resolved
packages/utils/src/anr.ts Outdated Show resolved Hide resolved
@AbhiPrasad AbhiPrasad enabled auto-merge (squash) September 25, 2023 15:10
@AbhiPrasad AbhiPrasad merged commit 5a8d4aa into getsentry:develop Sep 25, 2023
69 checks passed
@timfish timfish deleted the feat/node-anr branch September 25, 2023 15:15
billyvg pushed a commit that referenced this pull request Sep 26, 2023
This PR adds ANR detection for Node via a forked child process. The
child process runs the same entry point as the main app. To ensure that
the main app code does not run in the child process, we use a promise
that only resolves in the main process.

When the `captureStackTrace` option is enabled, debugging is enabled in
the main app process and the child process uses WebSockets to capture
stack traces via the v8 inspector API.

Overhead is expected to be minimal. With no ANR detected, the only
overhead in the main app process is polling the child process over IPC
by default every 50ms. The ANR child process consumes around 50-60 MB or
RAM and simply keeps track of the polling. Once ANR has been detected,
the main process will get paused briefly in the debugger to capture a
stack trace frames. At this point, the event loop has been blocked for
seconds so the debugging overhead can be considered negligible.

Once an ANR event has been reported, the child process exits to prevent
further duplicate events.

```ts
import { init, enableANRDetection } from '@sentry/node';

init({ dsn: "__DSN__" });

// ESM supports top-level await
await enableANRDetection({ captureStackTrace: true });
runApp();

// for CJS you'll need then
enableANRDetection({ captureStackTrace: true }).then(() => {
  runApp();
})
```

Co-authored-by: Abhijeet Prasad <devabhiprasad@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants