feat(dashboards): Add Node.js runtime metrics prebuilt dashboard UI#113517

Open
chargome wants to merge 10 commits into master from cg/nodejs-health-dashboard-frontend

Conversation

Member

@chargome chargome commented Apr 21, 2026

Add the Node.js Runtime Metrics prebuilt dashboard and its onboarding panel. Seven widgets:

  • Event Loop Utilization, CPU Utilization, Process Uptime (KPIs)
  • Memory Usage (RSS / heap total / heap used — area chart)
  • CPU Utilization Over Time (line)
  • Event Loop Delay (p50 / p99)
  • HTTP Request Duration (p50 / p95 — line, spans dataset)

Surfaces the node.runtime.* trace metrics emitted by nodeRuntimeMetricsIntegration (getsentry/sentry-javascript#19923) so users can monitor CPU, memory, and event loop health alongside request latency.

Metric names, types, and aggregations are aligned with the SDK:

  • max() on the pre-computed p50/p99 event loop delay percentiles (averaging pre-computed percentiles is statistically incorrect).
  • sum() on process.uptime — the SDK emits elapsed-since-last-tick deltas as a counter, so summing across the window gives total process-seconds.

Trade-off: users with transactions but no runtime metrics will see empty widgets rather than the onboarding panel.

Onboarding: (screenshot)

Dashboard: (screenshot)

depends on #113516
ref getsentry/sentry-javascript#19923
closes https://linear.app/getsentry/issue/DAIN-1532/create-prebuilt-nodejs-health-dashboard

@chargome chargome self-assigned this Apr 21, 2026
@chargome chargome requested a review from a team as a code owner April 21, 2026 08:40
@github-actions github-actions Bot added the Scope: Frontend Automatically applied to PRs that change frontend components label Apr 21, 2026
Member

@gggritso gggritso left a comment


This LGTM overall, but the aggregation of percentiles is … very scary, are you sure you want to do that?

Comment on lines +128 to +139
name: '',
// max() is used because p50/p99 are precomputed percentiles from the SDK's
// event loop delay histogram. Averaging precomputed percentiles is statistically
// incorrect; max() surfaces the worst-observed value per bucket.
fields: [
metric('max', 'node.runtime.event_loop.delay.p50', 'gauge', 'second'),
metric('max', 'node.runtime.event_loop.delay.p99', 'gauge', 'second'),
],
aggregates: [
metric('max', 'node.runtime.event_loop.delay.p50', 'gauge', 'second'),
metric('max', 'node.runtime.event_loop.delay.p99', 'gauge', 'second'),
],
Member

Oh man! This feels like a statistics crime… am I guessing right that the SDK doesn't emit the actual measurements? What interval is the p50 and p99 collected over? The docs just say "periodically". IMO this is highly spooky. Re-aggregating aggregated metrics in any context is pretty misleading. Imagine someone has ~100 VMs running Node.js, and they're collecting metrics on this. If there were 1,000,000 measurements of 10ms and two measurements of 1,000ms the true p99 is 10ms, but with aggregation-on-aggregation this could spit out something totally different! Is there any other way to do this?

Member Author

Yeah, fair point: this basically shows the worst observed percentile, since we max() on it instead of avg(). The SDK can't emit raw measurements here, only pre-aggregated values from the Node API. I don't really see a solution, since we can't show this per instance (e.g. in the serverless case). We could either call this out explicitly or just drop the widget, wdyt?

Member

🤔 I feel like this data is valuable, and I don't want to hide it completely, it's specifically the aggregation I'm worried about. What do you think about, instead of using a chart, showing a table of measurements? Sentry metrics are individual events, so maybe we can take advantage of that here, and show a table of worst p99 event loop delay measurements?

Member Author

Replaced the widget with two tables showing the top 10 worst cases for p50 and p99 respectively; no more aggregation on this data.

Member

😭 sorry I misled you here, we don't support Table widgets for Metrics datasets! We will soon though, I think, so maybe worth waiting?

Member Author

Oh right, I just ported this over from the custom dashboard, where it did work. We can ship without it and create a follow-up, maybe?

Member

Sounds good to me 👍

Member Author

done!

Member

@gggritso gggritso left a comment


LGTM, just two nits about using existing types 👍🏻

@@ -0,0 +1,13 @@
type TraceMetricAggregation = 'avg' | 'sum' | 'max';
type TraceMetricType = 'gauge' | 'counter';
Member

You can use the existing TraceMetricTypeValue type

@@ -0,0 +1,13 @@
type TraceMetricAggregation = 'avg' | 'sum' | 'max';
type TraceMetricType = 'gauge' | 'counter';
type TraceMetricUnit = 'none' | 'byte' | 'second';
Member

You can use DataUnit for this one 👍🏻

Contributor

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Reviewed by Cursor Bugbot for commit 1d159e0.

Comment thread static/app/views/dashboards/utils/prebuiltConfigs/utils/traceMetricField.ts Outdated