Less bloated collection of debug metrics from modules #6260
Labels
enhancement
New feature or request
Request
"Debug metrics" refers to the metrics available on the Agent's /metrics endpoint. Those metrics show how various internals are working, and they are used by the alerts and dashboards that monitor the state of Agent instances.

Unfortunately, Agents running with modules have a few issues with their metrics.
Issue 1: The component_id labels can be very long.
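The Python sketch below shows roughly how such IDs grow. It assumes, without confirmation from this issue, that a nested component's ID is built by joining the IDs of its enclosing module components with "/" and then appending its own ID. The labels outer, inner and other are hypothetical. The sketch also compares the result against the 2048-character Mimir default mentioned in this issue. Finally, it shows a limitation of simply stripping the module prefix: components from different modules end up with the same label value.

```python
# Default value of Mimir's max_label_value_length, as quoted in this issue.
MIMIR_DEFAULT_MAX_LABEL_VALUE_LENGTH = 2048


def nested_component_id(modules, local_id):
    """Build a component_id, ASSUMING module IDs are joined with "/".

    `modules` lists the enclosing module components, outermost first.
    """
    return "/".join([*modules, local_id])


def strip_module_prefix(component_id):
    """Keep only the local part of the ID (the text after the last "/")."""
    return component_id.rsplit("/", 1)[-1]


# Hypothetical IDs: a module.string imported by another module.string,
# with the inner one using prometheus.remote_write.
deep = nested_component_id(
    ["module.string.outer", "module.string.inner"],
    "prometheus.remote_write.default",
)
# The same component type and label, but inside a different module.
other = nested_component_id(["module.string.other"], "prometheus.remote_write.default")

print(deep)
print(len(deep), len(deep) <= MIMIR_DEFAULT_MAX_LABEL_VALUE_LENGTH)
# Stripping the prefix shortens the label but merges the two components.
print(strip_module_prefix(deep) == strip_module_prefix(other))
```

At this nesting depth, the label is still far below 2048 characters. In practice, the cost is mostly dashboard readability and the ambiguity shown by the last check.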
The labels are long because the ID includes the ID of the "module" component (e.g. module.string) that imported the component. If a module.string imports a module.string which uses prometheus.remote_write, then the ID label on the metric will get quite long.

If the label is extremely long, it may even hit a limit in other systems such as Mimir. That said, Mimir sets its max_label_value_length config parameter to 2048 by default, which should be long enough for most uses.

Long component IDs can make dashboards look awkward if they need to show a component ID in a drop down or a graph legend:
… prometheus.remote_write in a drop down, then the dashboards will show prometheus.remote_write metrics from any module.

Issue 2: Each Flow controller has its own set of metric series.
If an Agent uses multiple Flow controllers, the controller metrics could bloat the /metrics endpoint. To avoid the additional series, could we consolidate controller functionality so that it runs only once per process?
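The Python sketch below works through the series arithmetic. It assumes that each loaded module runs its own Flow controller, and that every controller exports the same metric families, told apart by a label. The metric names and controller IDs are made up for illustration. When each controller exports its own copy, the series count grows linearly with the number of controllers. With controller functionality consolidated to run once per process, the count stays fixed.

```python
# Hypothetical controller-level metric families; the names are made up.
CONTROLLER_METRICS = (
    "example_controller_evaluations_total",
    "example_controller_components",
)


def per_controller_series(controller_ids):
    """One (metric, controller) series per metric for every Flow controller."""
    return {(metric, cid) for cid in controller_ids for metric in CONTROLLER_METRICS}


def consolidated_series():
    """Series if controller functionality ran only once per process."""
    return {(metric, "process") for metric in CONTROLLER_METRICS}


# A root controller plus one controller per loaded module (hypothetical IDs).
controllers = [
    "root",
    "module.string.outer",
    "module.string.outer/module.string.inner",
]

print(len(per_controller_series(controllers)))  # grows with len(controllers)
print(len(consolidated_series()))               # independent of len(controllers)
```

With 2 metric families and 3 controllers, this gives 6 series instead of 2. Real controllers export more families, so the gap widens accordingly.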
Should we make debug metrics more configurable?
There might not be a "one size fits all" solution. We might have to solve this by adding settings for how debug metrics are gathered and transformed. For example, could there be a metrics block similar to the existing logging block?
Use case
Ease of use.