Allow custom metric_relabel_config for Prometheus #12194

hvoigt · 2024-03-04T14:54:08Z

What problem are you trying to solve?

Linkerd proxy metrics tend to have quite high cardinality. For example, this Medium article suggests dropping some le values to reduce cardinality:

metric_relabel_configs:
  - action: drop
    source_labels: [le]
    regex: "2.*|3.*|4.*|5.*"
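Prometheus anchors relabel regexes at both ends, so 2.* matches every le value that begins with 2 (2, 20, 250, 2000, ...), not just the value 2. The Python sketch below approximates the drop rule above with re.fullmatch. It assumes a hypothetical list of bucket boundaries (not Linkerd's exact list). Prometheus itself uses Go's RE2 syntax, which agrees with Python's re for a pattern this simple.

```python
import re

# Hypothetical "le" bucket boundaries, for illustration only (not Linkerd's exact list).
LE_VALUES = ["1", "2", "3", "4", "5", "10", "20", "30", "40", "50",
             "100", "200", "300", "400", "500", "1000", "2000", "3000",
             "4000", "5000", "10000", "20000", "30000", "40000", "50000", "+Inf"]

# Regex from the drop rule above.
DROP_REGEX = "2.*|3.*|4.*|5.*"


def relabel_drop(le_values, regex):
    """Return the le values that survive an `action: drop` rule.

    Prometheus anchors relabel regexes at both ends, which re.fullmatch mimics.
    A series is dropped when its source label value matches; a missing label
    counts as "" and does not match this regex, so such series are kept.
    """
    pattern = re.compile(regex)
    return [le for le in le_values if not pattern.fullmatch(le)]


if __name__ == "__main__":
    print(relabel_drop(LE_VALUES, DROP_REGEX))
    # → ['1', '10', '100', '1000', '10000', '+Inf']
```

Because of the anchoring, the rule also removes buckets such as 50 and 500, so only the powers-of-ten boundaries and +Inf remain in this illustration.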

We have also found that response_latency_ms_bucket has too high a cardinality.

How should the problem be solved?

Provide a custom configuration option for metric_relabel_configs in the linkerd-viz Helm chart. For example:

prometheus:
[...]
  proxyMetricRelabelConfigs:
  - action: keep
    source_labels: [le]
    regex: "(?i)(|10|50|100|500|1000|10000|30000|\\+Inf)"
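A keep rule works the other way round: only series whose le value fully matches one of the alternatives survive. The empty alternative at the start of the group matters. Prometheus treats a missing source label as the empty string, so without it every series that has no le label (plain counters and gauges) would be dropped too. The Python sketch below approximates this behavior. The series names and bucket values are hypothetical, and Python's re stands in for Prometheus' RE2.

```python
import re

# Regex from the proposed proxyMetricRelabelConfigs example above.
# The YAML string "\\+Inf" becomes the regex \+Inf, i.e. a literal "+Inf".
KEEP_REGEX = r"(?i)(|10|50|100|500|1000|10000|30000|\+Inf)"

# Hypothetical scraped series as (metric name, labels); illustrative only.
SERIES = [
    ("response_latency_ms_bucket", {"le": "1"}),
    ("response_latency_ms_bucket", {"le": "10"}),
    ("response_latency_ms_bucket", {"le": "20"}),
    ("response_latency_ms_bucket", {"le": "500"}),
    ("response_latency_ms_bucket", {"le": "+Inf"}),
    ("request_total", {}),  # no le label at all
]


def relabel_keep(series, source_label, regex):
    """Keep only series whose source label value fully matches regex.

    A missing label is treated as the empty string, as Prometheus does,
    so an empty alternative in the regex keeps series without that label.
    """
    pattern = re.compile(regex)
    return [(name, labels) for name, labels in series
            if pattern.fullmatch(labels.get(source_label, ""))]


if __name__ == "__main__":
    for name, labels in relabel_keep(SERIES, "le", KEEP_REGEX):
        print(name, labels)
```

Removing the leading | from the group would also discard request_total, which is an easy way to lose unrelated metrics with a keep rule.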

Any alternatives you've considered?

We are currently copying the linkerd-viz Helm chart and applying this change manually. This is annoying on every update, because we have to reapply the change and release our own local version.

How would users interact with this feature?

Users of the Prometheus instance that ships with Linkerd would define the metric_relabel_configs they want in the linkerd-viz Helm chart's values.

Would you like to work on this feature?

yes


It is currently not possible to easily drop metrics with the Prometheus configuration that is included with the linkerd-viz helm chart. To enable this we extend the Prometheus configuration template to add a metric_relabel_config section. This configuration is usually used when dropping metrics and thus allows users to customize their recorded metrics to avoid high cardinality. E.g. there is a blog post that describes the problem in depth: https://itnext.io/optimizing-linkerd-metrics-in-prometheus-de607ec10f6b With this change we are able to deploy the helm chart without the need to do custom modifications to the templates.

Fixes linkerd#12194

Signed-off-by: Heiko Voigt <heiko.voigt@jimdo.com>

hvoigt · 2024-03-12T13:29:03Z

@mateiidavid, @kflynn I attached a PR that solves this issue. What do you think?

kflynn · 2024-03-14T15:26:08Z

@hvoigt This is one of those requests that feels simple, but hides a fair amount of complexity. 😐 The challenge here is that metrics are pretty central to Linkerd's operation, and relabelings can actually break things that many people rely on.

As @alpeb wrote in #11445, "If you know exactly what you're doing, like reducing the cardinality of response_latency_ms_bucket, my advice is to not use the stock linkerd PodMonitors, and instead provide your own with your custom changes, plugging that in your pipeline via the standard mechanisms (kustomize, helm post-rendering, sub-charts, or a separate chart)." This feels like a better route than a broad control in the stock chart.

I'm going to close this one, but thank you for digging into this! Please feel free to reach out here or on Slack if you have more questions.

kflynn · 2024-03-14T15:27:47Z

Whoops, I also meant to link to the instructions for running your own Prometheus with Linkerd, because another excellent tack here is to do exactly that and then give your Prometheus whatever configuration you want.


hvoigt · 2024-03-21T16:17:07Z

Sorry, I missed that this issue had been closed. This whole issue -> PR discussion feels a bit disconnected.

> @hvoigt This is one of those requests that feels simple, but hides a fair amount of complexity. 😐 The challenge here is that metrics are pretty central to Linkerd's operation, and relabelings can actually break things that many people rely on.

Could you elaborate on what, other than linkerd-viz, this could break? It would most likely only break things for people whose Prometheus is already broken by the memory usage of high-cardinality metrics. If we document it that way, I do not see much harm in allowing this config.

> As @alpeb wrote in #11445, "If you know exactly what you're doing, like reducing the cardinality of response_latency_ms_bucket, my advice is to not use the stock linkerd PodMonitors, and instead provide your own with your custom changes, plugging that in your pipeline via the standard mechanisms (kustomize, helm post-rendering, sub-charts, or a separate chart)." This feels like a better route than a broad control in the stock chart.

Bringing your own Prometheus adds unnecessary complexity in my opinion. What is so bad about allowing users to fine-tune the stock Prometheus setup? For example, we added a warning to this option's documentation in the PR.

> I'm going to close this one, but thank you for digging into this! Please feel free to reach out here or on Slack if you have more questions.

I hope we can revisit this decision, as multiple people seem to be suffering from high cardinality, which could be solved more simply with a tuning option in the stock chart.


It is currently not possible to easily drop metrics with the Prometheus configuration that is included with the linkerd-viz helm chart. To enable this we extend the Prometheus configuration template to add a metric_relabel_config section. This configuration is usually used when dropping metrics and thus allows users to customize their recorded metrics to avoid high cardinality. E.g. there is a blog post that describes the problem in depth: https://itnext.io/optimizing-linkerd-metrics-in-prometheus-de607ec10f6b With this change we are able to deploy the helm chart without the need to do custom modifications to the templates.

Fixes linkerd#12194

Signed-off-by: Heiko Voigt <heiko.voigt@jimdo.com>
Signed-off-by: Mark S <the@wondersmith.dev>

hvoigt added the enhancement label Mar 4, 2024

mateiidavid assigned kflynn Mar 7, 2024

hvoigt mentioned this issue Mar 12, 2024

linkerd-viz helm: add support for metric_relabel_configs #12248

Merged

kflynn closed this as completed Mar 14, 2024

github-actions bot locked as resolved and limited conversation to collaborators Apr 21, 2024
