Less bloated collection of debug metrics from modules #6260

Closed
ptodev opened this issue Jan 26, 2024 · 1 comment · Fixed by #6786
ptodev (Contributor) commented Jan 26, 2024

Request

"Debug metrics" refers to the metrics available on the Agent's /metrics endpoint. These metrics indicate how various internals are working, and are used for alerts and dashboards that monitor the state of Agent instances.

Unfortunately, Agents running with modules have a few issues with their metrics.

Issue 1: The component_id labels can be very long.

This is because the ID includes every "module" component (e.g. module.string) that imported the component. If a module.string imports another module.string which uses prometheus.remote_write, the component_id label on the metric gets quite long.

If the label is extremely long, it may even hit a limit in other systems such as Mimir. That said, Mimir's max_label_value_length config parameter defaults to 2048, which should be long enough for most uses.
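To get a feel for how the label grows with nesting, here is a minimal Python sketch. It assumes that a nested component's ID is the chain of importing module components joined with "/", followed by the component's own name. The separator, the helper names and the example labels (outer, inner, default) are illustrative assumptions, not the Agent's actual code. The length is measured in characters here, and 2048 is the Mimir default mentioned above.

```python
# Hypothetical sketch: this is not the Agent's actual ID-building logic.
MIMIR_MAX_LABEL_VALUE_LENGTH = 2048  # Mimir default for max_label_value_length


def nested_component_id(module_chain, component):
    """Join the importing modules and the leaf component with "/" (assumed separator)."""
    return "/".join([*module_chain, component])


def exceeds_limit(label_value, limit=MIMIR_MAX_LABEL_VALUE_LENGTH):
    """Return True if the label value is longer than the given limit."""
    return len(label_value) > limit


if __name__ == "__main__":
    chain = ["module.string.outer", "module.string.inner"]
    cid = nested_component_id(chain, "prometheus.remote_write.default")
    print(cid, len(cid), exceeds_limit(cid))
```

Two levels of nesting stay far below the default limit, so the dashboard readability problem is likely to show up long before the Mimir limit does.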

Long component IDs can make dashboards look awkward when a component ID is shown in a drop-down or a graph legend:

  • Maybe we could make those drop-downs work without exact component names? E.g. if prometheus.remote_write is selected in a drop-down, the dashboard would show prometheus.remote_write metrics from any module.
  • Alternatively, there could be separate labels for the "module path" and for the leaf component name. This would mean that there would no longer be a single metric label which identifies a component. Losing such ID labels is not ideal because they are useful in their own right.
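The second option can be sketched roughly as follows. This assumes a "/"-separated full ID; the label names module_path and component are hypothetical and only serve to illustrate the split. Note that the pair of labels together still identifies the component, even though no single label does.

```python
# Hypothetical sketch: the label names and the "/" separator are assumptions.
def split_component_id(component_id, sep="/"):
    """Split a full ID into (module_path, leaf_component).

    A top-level component (no separator in the ID) gets an empty module path.
    """
    module_path, _, leaf = component_id.rpartition(sep)
    return module_path, leaf


def to_labels(component_id):
    """Build the two separate labels from one full component ID."""
    module_path, leaf = split_component_id(component_id)
    return {"module_path": module_path, "component": leaf}


if __name__ == "__main__":
    full_id = "module.string.outer/module.string.inner/prometheus.remote_write.default"
    print(to_labels(full_id))
```

With labels like these, a dashboard drop-down could list only the short leaf names and filter on the module path separately.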

Issue 2: Each Flow controller has its own set of metric series.

If an Agent uses multiple Flow controllers, the controller metrics could bloat the /metrics endpoint.
To avoid the additional series, could we consolidate controller functionality so that it runs only once per process?

Should we make debug metrics more configurable?

There might not be a "one size fits all" solution. We might have to solve this by adding settings for how debug metrics are gathered and transformed. E.g. could there be a metrics block similar to the existing logging block?

Use case

Ease of use.

@ptodev ptodev added the enhancement New feature or request label Jan 26, 2024
tpaschalis (Member) commented

We've discussed this offline and concluded that a good first step would be to separate the parent path into a new label. This would be an immediate benefit and would also leave room for different solutions in the future (e.g. hashing long parent paths stemming from nested modules).
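As a rough illustration of the hashing idea, a long parent path could be swapped for a short, stable digest once it passes some threshold. This is only a sketch under assumptions: the function name, the 64-character threshold, the choice of SHA-256 and the 16-character truncation are all hypothetical and were not part of the discussion.

```python
# Hypothetical sketch: threshold, hash function and digest length are assumptions.
import hashlib


def shorten_parent_path(parent_path, max_len=64, digest_len=16):
    """Return the path unchanged if it is short, otherwise a truncated SHA-256 hex digest."""
    if len(parent_path) <= max_len:
        return parent_path
    digest = hashlib.sha256(parent_path.encode("utf-8")).hexdigest()
    return digest[:digest_len]


if __name__ == "__main__":
    short = "module.string.outer"
    long_path = "/".join(f"module.string.level{i}" for i in range(10))
    print(shorten_parent_path(short))
    print(shorten_parent_path(long_path))
```

The trade-off is that a hashed label is no longer human-readable, so a lookup from hash to full path would have to be available somewhere else.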

@github-actions github-actions bot added the frozen-due-to-age Locked due to a period of inactivity. Please open new issues or PRs if more discussion is needed. label May 4, 2024
@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 4, 2024