feat: runtime metrics #375
Codecov Report
| | main | #375 | +/- |
| -------- | ------ | ------ | ------ |
| Coverage | 93.63% | 93.08% | -0.55% |
| Files | 9 | 12 | +3 |
| Lines | 330 | 434 | +104 |
| Branches | 87 | 104 | +17 |
| Hits | 309 | 404 | +95 |
| Misses | 21 | 30 | +9 |
docs/advanced-config.md
| Environment variable<br>`startMetrics()` argument | Default value | Support | Notes |
| --------------------------------------------------------------- | ----------------------- | ------- | --- |
| `SPLUNK_METRICS_ENABLED`<br>`enabled` | `false` | Experimental | Enables metrics export. See [metrics documentation](metrics.md) for more information. |
| `SPLUNK_METRICS_ENDPOINT`<br>`endpoint` | `http://localhost:9943` | Experimental | The SignalFx metrics endpoint to send to. |
| `SPLUNK_METRICS_EXPORT_INTERVAL`<br>`exportInterval` | `5000` | Experimental | The interval, in milliseconds, of metrics collection and exporting. |
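The table above pairs each environment variable with a `startMetrics()` argument and a default. The sketch below shows one way those three sources could be combined. It is a hypothetical illustration only: `resolveMetricsOptions` is not part of the library, and both the precedence order (explicit argument, then environment variable, then default) and the string parsing rules are assumptions, not behavior confirmed by this PR.

```javascript
// Hypothetical helper, NOT the library's API.
// Assumed precedence: startMetrics() argument > environment variable > default.
const DEFAULTS = {
  enabled: false,
  endpoint: 'http://localhost:9943',
  exportInterval: 5000, // milliseconds
};

function resolveMetricsOptions(options = {}, env = process.env) {
  // Assumption: only the string "true" (any case) enables metrics.
  const envEnabled =
    env.SPLUNK_METRICS_ENABLED === undefined
      ? undefined
      : env.SPLUNK_METRICS_ENABLED.trim().toLowerCase() === 'true';

  // Assumption: a non-numeric interval falls back to the default.
  const envInterval =
    env.SPLUNK_METRICS_EXPORT_INTERVAL === undefined
      ? undefined
      : Number.parseInt(env.SPLUNK_METRICS_EXPORT_INTERVAL, 10);

  return {
    enabled: options.enabled ?? envEnabled ?? DEFAULTS.enabled,
    endpoint: options.endpoint ?? env.SPLUNK_METRICS_ENDPOINT ?? DEFAULTS.endpoint,
    exportInterval:
      options.exportInterval ??
      (Number.isFinite(envInterval) ? envInterval : DEFAULTS.exportInterval),
  };
}

console.log(resolveMetricsOptions({}, { SPLUNK_METRICS_ENABLED: 'true' }));
```

Because metrics are opt-in, calling the helper with no arguments and an empty environment yields `enabled: false`.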
Perhaps an extra column for the `startMetrics` argument would make it easier to scan the table.
s/SignalFx/Splunk
Added a link to the SignalFx client. I followed the existing doc style: the `startTracing` table doesn't have the extra column either. Should I split both of them into separate columns?
docs/advanced-config.md
#### Additional `startMetrics` config options
- `signalfx`: A JS object with optional `client` and `dimensions` fields. If you have already set up a SignalFx client with custom configuration, you can use it for sending instead of creating and configuring a new one. The `dimensions` object adds pre-defined dimensions to each datapoint. The format for `dimensions` is `{key: value, ...}`.
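The `dimensions` option above attaches the same key/value pairs to every datapoint. A minimal sketch of that merge follows. `addDimensions` is a hypothetical helper, the `{ metric, value, dimensions }` datapoint shape is assumed for illustration, and letting a datapoint's own dimensions override the configured ones is a design assumption rather than documented behavior.

```javascript
// Hypothetical helper, not the library's implementation.
// Assumed datapoint shape: { metric, value, dimensions? }.
function addDimensions(datapoints, dimensions = {}) {
  // Return new objects so neither the input datapoints nor the
  // configured dimensions object are mutated.
  return datapoints.map((dp) => ({
    ...dp,
    // Assumption: per-datapoint dimensions win over configured ones.
    dimensions: { ...dimensions, ...(dp.dimensions || {}) },
  }));
}

const configured = { service: 'checkout', env: 'staging' };
const points = addDimensions(
  [
    { metric: 'nodejs.memory.rss', value: 52428800 },
    { metric: 'nodejs.memory.heap.used', value: 1048576, dimensions: { env: 'prod' } },
  ],
  configured
);
console.log(points);
```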
Suggested change:
- `signalfx`: A JS object with optional `client` and `dimensions` fields. If you have already set up a Splunk client with custom configuration, you can use it for sending instead of creating and configuring a new one. The `dimensions` object adds pre-defined dimensions to each datapoint. The format for `dimensions` is `{key: value, ...}`.
Can we rename the setting to "splunk"?
Hmm, I explicitly wanted to make it clear that we are dealing with the SignalFx Node.js client. What do you think about removing the `signalfx` field completely and replacing it with two fields, `client` and `dimensions`?
The old library? Then it makes sense. I'm just wondering about backward compatibility if we switch to OTel metrics in the future (see Owais's comments).
I added a link to the SignalFx client there and mentioned that metrics will change once OpenTelemetry becomes available.
How will this work with OTel? What happens when OTel introduces similar runtime metrics? Are we marking this as unstable for now so we have a path to migrate to the OTel implementation later? Or why aren't we contributing this to OTel contrib?
Yes, it'll be unstable for now. Runtime metrics are disabled by default for that reason. When the OTel semconvs and the metrics SDK are available, there will be a breaking change. Once the semconvs are available, we can start contributing the module upstream.
OK. If it is experimental and users understand things can break, then it's fine, I think. That said, SignalFx never had a single library for metrics and traces; customers always had to use two different libraries. Why wouldn't we let customers use the SignalFx metrics library until we have counterparts ready in OTel?
I tend to agree with @owais here. What's Ivo's take on this?
I'd still rather see the native extension have a separate life-cycle in another package, but this works for now.
We should at least try to add the missing conventions to the OTel metrics semantic conventions. https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/semantic_conventions/system-metrics.md
I think I can restate the goals
Whether or not the macro requirements above can be fulfilled with one or two client-facing artifacts is an engineering call. It may be worth a call as well to go over the concerns.
Description
Add runtime metrics collection in a similar fashion to `signalfx-nodejs-collect`. However, `signalfx-nodejs-collect` used two different native extensions to gather event loop and GC metrics. For the event loop, it reported only the duration of the whole cycle (including the poll phase), which is of little use. It should have measured event loop lag: the whole event loop cycle minus the time spent polling for I/O and timers (i.e. time when no CPU is used). Besides this, `signalfx-nodejs-collect` had a lot of dead code for events that were never sent (e.g. on each GC cycle). So the decision was to drop `signalfx-nodejs-collect` in favor of a custom native extension.

Currently the export step still uses the SignalFx client.
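The lag definition above (cycle duration minus poll time) and the min/max gauges listed below can be illustrated in plain JavaScript. This is a hypothetical sketch, not the PR's native extension: the per-iteration sample shape (`cycleNs`, `pollNs`) is invented for illustration, and aggregating samples into a min and max per export interval is an assumption based on the gauge names.

```javascript
// Hypothetical sketch; the PR computes this in a native extension.
// Each sample describes one event loop iteration, in nanoseconds:
//   cycleNs: duration of the whole iteration
//   pollNs:  time spent blocked polling for I/O and timers (no CPU used)
function eventLoopLag(sample) {
  // Clamp at zero in case of clock inconsistencies.
  return Math.max(0, sample.cycleNs - sample.pollNs);
}

// Reduce one export interval's samples to the two lag gauges.
function summarizeLag(samples) {
  if (samples.length === 0) return null; // nothing to report this interval
  let min = Infinity;
  let max = -Infinity;
  for (const s of samples) {
    const lag = eventLoopLag(s);
    if (lag < min) min = lag;
    if (lag > max) max = lag;
  }
  return {
    'nodejs.event_loop.lag.min': min,
    'nodejs.event_loop.lag.max': max,
  };
}

const interval = [
  { cycleNs: 12_000_000, pollNs: 10_000_000 }, // 2 ms of CPU work
  { cycleNs: 3_000_000, pollNs: 2_500_000 },   // 0.5 ms of CPU work
  { cycleNs: 40_000_000, pollNs: 5_000_000 },  // 35 ms blocked by synchronous code
];
console.log(summarizeLag(interval));
```

Node's built-in `perf_hooks.monitorEventLoopDelay()` offers a JavaScript-level histogram of a related quantity, but it samples timer delay rather than subtracting poll time from each cycle.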
List of metrics exported (see `signalfx-nodejs-collect` for the previous list):

- `nodejs.memory.heap.total` (gauge, bytes) - Heap total via `process.memoryUsage().heapTotal`
- `nodejs.memory.heap.used` (gauge, bytes) - Heap used via `process.memoryUsage().heapUsed`
- `nodejs.memory.rss` (gauge, bytes) - Resident set size via `process.memoryUsage().rss`
- `nodejs.memory.gc.size` (cumulative_counter, bytes) - Memory collected by the garbage collector
- `nodejs.memory.gc.pause` (cumulative_counter, nanoseconds) - Time spent doing GC
- `nodejs.memory.gc.count` (cumulative_counter, count) - Number of times GC ran (`signalfx-nodejs-collect`: `nodejs.memory.gc.total`)
- `nodejs.event_loop.lag.max` (gauge, nanoseconds) - Max event loop lag (`signalfx-nodejs-collect`: `nodejs.event_loop.max`)
- `nodejs.event_loop.lag.min` (gauge, nanoseconds) - Min event loop lag (`signalfx-nodejs-collect`: `nodejs.event_loop.min`)

Environment variables introduced:
- `SPLUNK_METRICS_ENABLED` (default: `'false'`) - Metrics are opt-in
- `SPLUNK_METRICS_ENDPOINT` (default: `http://localhost:9943`)
- `SPLUNK_METRICS_EXPORT_INTERVAL` (default: `5000`)

Misc:
- `getSignalFxClient`? Custom metrics will still need to be supported. This PR allows you to provide your own SignalFx client to `startMetrics`, but since we configure it with the same access token anyway, users can just use this helper method to access the currently used client.
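The three memory gauges in the metrics list are read directly from Node's `process.memoryUsage()`, which returns `heapTotal`, `heapUsed`, and `rss` in bytes. The sketch below shows that mapping. `collectMemoryGauges` is a hypothetical name, and the `{ metric, value }` datapoint shape is an assumption for illustration, not the exporter's exact payload.

```javascript
// Hypothetical sketch of the memory gauges; not the PR's code.
// process.memoryUsage() reports all of these values in bytes.
function collectMemoryGauges() {
  const { heapTotal, heapUsed, rss } = process.memoryUsage();
  return [
    { metric: 'nodejs.memory.heap.total', value: heapTotal },
    { metric: 'nodejs.memory.heap.used', value: heapUsed },
    { metric: 'nodejs.memory.rss', value: rss },
  ];
}

console.log(collectMemoryGauges());
```

In practice such a collector would run once per export interval (`SPLUNK_METRICS_EXPORT_INTERVAL`, 5000 ms by default) rather than once at startup.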