feat(dashboards): Add Node.js runtime metrics prebuilt dashboard UI#113517
Conversation
gggritso
left a comment
This LGTM overall, but the aggregation of percentiles is … very scary, are you sure you want to do that?
```ts
name: '',
// max() is used because p50/p99 are precomputed percentiles from the SDK's
// event loop delay histogram. Averaging precomputed percentiles is statistically
// incorrect; max() surfaces the worst-observed value per bucket.
fields: [
  metric('max', 'node.runtime.event_loop.delay.p50', 'gauge', 'second'),
  metric('max', 'node.runtime.event_loop.delay.p99', 'gauge', 'second'),
],
aggregates: [
  metric('max', 'node.runtime.event_loop.delay.p50', 'gauge', 'second'),
  metric('max', 'node.runtime.event_loop.delay.p99', 'gauge', 'second'),
],
```
Oh man! This feels like a statistics crime… am I guessing right that the SDK doesn't emit the actual measurements? What interval are the p50 and p99 collected over? The docs just say "periodically". IMO this is highly spooky. Re-aggregating already-aggregated metrics is misleading in any context. Imagine someone has ~100 VMs running Node.js and they're collecting metrics on this. If there were 1,000,000 measurements of 10ms and two measurements of 1,000ms, the true p99 is 10ms, but aggregation-on-aggregation could spit out something totally different! Is there any other way to do this?
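The concern above can be sketched numerically. This is a toy illustration with made-up data (two hypothetical hosts, not anything from the PR): taking `max()` or `avg()` over per-host p99s gives a very different answer than the p99 of the pooled measurements.

```typescript
// Toy sketch: re-aggregating per-host percentiles vs. the true pooled percentile.
// Nearest-rank percentile over an already-sorted array of millisecond samples.
function percentile(sortedMs: number[], p: number): number {
  const idx = Math.min(sortedMs.length - 1, Math.floor((p / 100) * sortedMs.length));
  return sortedMs[idx];
}

const hostA: number[] = new Array(1_000_000).fill(10); // 1,000,000 samples of 10ms
const hostB: number[] = [1000, 1000];                  // 2 samples of 1,000ms

const p99A = percentile(hostA, 99); // 10
const p99B = percentile(hostB, 99); // 1000

const maxOfP99s = Math.max(p99A, p99B);           // 1000 — worst observed p99
const avgOfP99s = (p99A + p99B) / 2;              // 505  — statistically meaningless
const trueP99 = percentile(hostA.concat(hostB), 99); // 10 — concat stays sorted here

console.log({ maxOfP99s, avgOfP99s, trueP99 });
```

So `max()` at least has a defensible reading ("worst observed p99 from any reporter"), while `avg()` produces a number that corresponds to no real quantile of the underlying data.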
Yeah, fair point. This basically shows the worst observed percentile, since we max() it instead of avg(). The SDK can't emit raw measurements here; it only gets pre-aggregated values from the Node API. I don't really see a solution, since we can't show this per instance (e.g. in the serverless case). We could either call this out explicitly or drop the widget, wdyt?
🤔 I feel like this data is valuable, and I don't want to hide it completely, it's specifically the aggregation I'm worried about. What do you think about, instead of using a chart, showing a table of measurements? Sentry metrics are individual events, so maybe we can take advantage of that here, and show a table of worst p99 event loop delay measurements?
replaced the widget with two tables showing the top 10 worst cases for p50 and p99 respectively; no more aggregation on this data
😭 sorry I misled you here, we don't support Table widgets for Metrics datasets! We will soon though, I think, so maybe worth waiting?
oh right, I just ported this over from the custom dashboard, where it did work. We can ship without it and create a follow-up, maybe?
gggritso
left a comment
LGTM, just two nits about using existing types 👍🏻
```ts
type TraceMetricAggregation = 'avg' | 'sum' | 'max';
type TraceMetricType = 'gauge' | 'counter';
```
You can use the existing TraceMetricTypeValue type
```ts
type TraceMetricAggregation = 'avg' | 'sum' | 'max';
type TraceMetricType = 'gauge' | 'counter';
type TraceMetricUnit = 'none' | 'byte' | 'second';
```
You can use DataUnit for this one 👍🏻

Add the Node.js Runtime Metrics prebuilt dashboard (seven widgets) and its onboarding panel.

Surfaces the `node.runtime.*` trace metrics emitted by `nodeRuntimeMetricsIntegration` (getsentry/sentry-javascript#19923) so users can monitor CPU, memory, and event loop health alongside request latency.

Metric names, types, and aggregations are aligned with the SDK:

- `max()` on the pre-computed p50/p99 event loop delay percentiles (averaging pre-computed percentiles is statistically incorrect).
- `sum()` on `process.uptime`: the SDK emits elapsed-since-last-tick deltas as a counter, so summing across the window gives total process-seconds.

Trade-off: users with transactions but no runtime metrics will see empty widgets rather than the onboarding panel.
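The `sum()`-over-deltas point can be illustrated with a toy sketch (hypothetical tick values, not real SDK output): because each tick reports only the seconds elapsed since the previous tick, the deltas add up cleanly across both time and instances.

```typescript
// Toy sketch: summing per-tick uptime deltas across instances yields
// total process-seconds for the window. Numbers are made up.
const instanceATicks: number[] = [10, 10, 10]; // three ticks, 10s apart
const instanceBTicks: number[] = [10, 10];     // a second instance that started later

const totalProcessSeconds = instanceATicks
  .concat(instanceBTicks)
  .reduce((sum, delta) => sum + delta, 0);

console.log(totalProcessSeconds); // 50
```

This is why a counter of deltas re-aggregates safely with `sum()`, unlike the pre-computed percentiles discussed above.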
Onboarding: *(screenshot omitted)*

Dashboard: *(screenshot omitted)*
depends on #113516
ref getsentry/sentry-javascript#19923
closes https://linear.app/getsentry/issue/DAIN-1532/create-prebuilt-nodejs-health-dashboard