Less bloated collection of debug metrics from modules #6260

ptodev · 2024-01-26T19:19:03Z

Request

"Debug metrics" refers to the metrics available on the Agent's /metrics endpoint. These metrics indicate how various internals are working, and they are used for alerts and dashboards which monitor the state of Agent instances.

Unfortunately, Agents running with modules have a few issues with their metrics.

Issue 1: The component_id labels can be very long.

This is because the ID includes the "module" component (e.g. module.string) which imported that component, and this repeats for every level of nesting. If a module.string imports a module.string which uses prometheus.remote_write, then the ID label on the metric will get quite long.

If the label is extremely long, it may even hit a limit in other systems such as Mimir. That said, by default Mimir sets its max_label_value_length config parameter to 2048, which should be long enough for most uses.

Long component IDs can make dashboards look awkward if they want to show a component ID in a drop down or a graph legend.

Maybe we could make those drop downs work by not using exact component names? E.g. if there is a prometheus.remote_write in a drop down, then the dashboards would show prometheus.remote_write metrics from any module.
Alternatively, there could be separate labels for the "module path" and for the leaf component name? This would mean that there would no longer be a single metric label which identifies a component. Losing such ID labels is not ideal because they have their own usefulness.
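
To make the "module path" plus leaf name idea concrete, here is a minimal sketch in Python. It is not the Agent's implementation. It assumes nested component IDs join their segments with "/", and the function name split_component_id is hypothetical.

```python
def split_component_id(component_id: str, sep: str = "/") -> tuple[str, str]:
    """Split a nested component ID into (module_path, leaf_component).

    Assumption: nested IDs join their segments with `sep`; the real
    separator used by the Agent may differ.
    """
    # rpartition splits on the LAST separator, so everything before it
    # is the module path and everything after it is the leaf component.
    # If there is no separator, the module path is the empty string.
    module_path, _, leaf = component_id.rpartition(sep)
    return module_path, leaf


if __name__ == "__main__":
    nested = "module.string.outer/module.string.inner/prometheus.remote_write.default"
    print(split_component_id(nested))
```

With two labels like these, a dashboard drop down could list only the leaf names, while the pair of labels together still identifies a component.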

Issue 2: Each Flow controller has its own set of metric series.

If an Agent uses multiple Flow controllers, the controller metrics could bloat the /metrics endpoint.
To avoid the additional series, could we maybe consolidate controller functionality so that it runs only once per process?

Should we make debug metrics more configurable?

There might not be a "one size fits all" solution. We might have to solve this by adding some additional settings for how debug metrics should be gathered and transformed? E.g. could there be a metrics block similar to the existing logging block?

Use case

Ease of use.


tpaschalis · 2024-02-05T19:15:29Z

We've discussed this offline and came to the conclusion that a good head start would be to separate the parent path into a new label. This would be an immediate benefit and would also allow us to work with different solutions in the future (e.g. hashing long parent paths stemming from nested modules).
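
As an illustration of the hashing idea mentioned above, here is a rough sketch in Python. It is only one possible approach, not what the linked PR does. The function name shorten_parent_path, the 64-character threshold, and the "hash:" prefix are all hypothetical.

```python
import hashlib


def shorten_parent_path(parent_path: str, max_len: int = 64) -> str:
    """Return the parent path unchanged if it is short enough,
    otherwise return a short, deterministic hash of it.

    Assumptions: the threshold and the "hash:" prefix are illustrative
    choices only.
    """
    if len(parent_path) <= max_len:
        return parent_path
    # SHA-256 is deterministic, so the same parent path always maps to
    # the same label value. The digest is truncated to 16 hex characters.
    digest = hashlib.sha256(parent_path.encode("utf-8")).hexdigest()[:16]
    return f"hash:{digest}"


if __name__ == "__main__":
    deep = "/".join(f"module.string.level{i}" for i in range(10))
    print(shorten_parent_path("module.string.outer"))
    print(shorten_parent_path(deep))
```

The trade-off is that a hashed value is no longer human readable, so a dashboard would need some other way to map it back to the full path.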

ptodev added the enhancement New feature or request label Jan 26, 2024

rfratto assigned tpaschalis Feb 28, 2024

This was referenced Mar 28, 2024

flow: separate parent component and controller IDs into new labels #6786

Merged

flaky panic on CustomComponentRegistry.getDeclare #6795

Closed

tpaschalis closed this as completed in #6786 Apr 3, 2024

github-actions bot added the frozen-due-to-age Locked due to a period of inactivity. Please open new issues or PRs if more discussion is needed. label May 4, 2024

github-actions bot locked as resolved and limited conversation to collaborators May 4, 2024
