Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow custom metric_relabel_config for Prometheus #12194

Closed
hvoigt opened this issue Mar 4, 2024 · 4 comments · Fixed by #12248
Closed

Allow custom metric_relabel_config for Prometheus #12194

hvoigt opened this issue Mar 4, 2024 · 4 comments · Fixed by #12248
Assignees

Comments

@hvoigt
Copy link
Contributor

hvoigt commented Mar 4, 2024

What problem are you trying to solve?

Linkerd proxy metrics tend to have quite high cardinality. E.g. see this Medium article suggests to drop some le values to reduce cardinality:

metric_relabel_configs:
  - action: drop
    source_labels: [le]
    regex: "2.*|3.*|4.*|5.*"

We also experience that response_latency_ms_bucket has too high cardinality.

How should the problem be solved?

Provide a custom configuration for metric_relabel_configs in the linkerd-viz helm chart. E.g.

prometheus:
[...]
  proxyMetricRelabelConfigs:
  - action: keep
    source_labels: [le]
    regex: "(?i)(|10|50|100|500|1000|10000|30000|\\+Inf)"

Any alternatives you've considered?

We are currently copying the linkerd-viz helm chart and manually applying this change locally. This is annoying in case of an update as we have to reapply this change and release our own local version.

How would users interact with this feature?

If they use the Prometheus that comes with linkerd they define their wanted metric_relabel_configs using the helm charts values.

Would you like to work on this feature?

yes

hvoigt added a commit to hvoigt/linkerd2 that referenced this issue Mar 12, 2024
It is currently not possible to easily drop metrics with the Prometheus
configuration that is included with the linkerd-viz helm chart. To
enable this we extend the Prometheus configuration template to add a
metric_relabel_config section. This configuration is usually used when
dropping metrics and thus allows users to customize their recorded
metrics to avoid high cardinality.

E.g. there is a blog post that describes the problem in depth:
https://itnext.io/optimizing-linkerd-metrics-in-prometheus-de607ec10f6b

With this change we are able to deploy the helm chart without the need
to do custom modifications to the templates.

Fixes linkerd#12194

Signed-off-by: Heiko Voigt <heiko.voigt@jimdo.com>
@hvoigt
Copy link
Contributor Author

hvoigt commented Mar 12, 2024

@mateiidavid, @kflynn I attached a PR that solves this issue. What do you think?

@kflynn
Copy link
Member

kflynn commented Mar 14, 2024

@hvoigt This is one of those requests that feels simple, but hides a fair amount of complexity. 😐 The challenge here is that metrics are pretty central to Linkerd's operation, and relabelings can actually break things that many people rely on.

As @alpeb wrote in #11445, "If you know exactly what you're doing, like reducing the cardinality of response_latency_ms_bucket, my advise is to not use the stock linkerd PodMonitors, and instead provide your own with your custom changes, pluging that in your pipeline via the standard mechanisms (kustomize, helm post-rendering, sub-charts, or a separate chart)." This feels like a better route than a broad control in the stock chart.

I'm going to close this one, but thank you for digging into this! Please feel free to reach out here or on Slack if you have more questions.

@kflynn kflynn closed this as completed Mar 14, 2024
@kflynn
Copy link
Member

kflynn commented Mar 14, 2024

Whoops, I also meant to link to the instructions for running your own Prometheus with Linkerd because another excellent tack here is doing that, then giving your Prometheus whatever configuration you want.

hvoigt added a commit to hvoigt/linkerd2 that referenced this issue Mar 19, 2024
It is currently not possible to easily drop metrics with the Prometheus
configuration that is included with the linkerd-viz helm chart. To
enable this we extend the Prometheus configuration template to add a
metric_relabel_config section. This configuration is usually used when
dropping metrics and thus allows users to customize their recorded
metrics to avoid high cardinality.

E.g. there is a blog post that describes the problem in depth:
https://itnext.io/optimizing-linkerd-metrics-in-prometheus-de607ec10f6b

With this change we are able to deploy the helm chart without the need
to do custom modifications to the templates.

Fixes linkerd#12194

Signed-off-by: Heiko Voigt <heiko.voigt@jimdo.com>
hvoigt added a commit to hvoigt/linkerd2 that referenced this issue Mar 19, 2024
It is currently not possible to easily drop metrics with the Prometheus
configuration that is included with the linkerd-viz helm chart. To
enable this we extend the Prometheus configuration template to add a
metric_relabel_config section. This configuration is usually used when
dropping metrics and thus allows users to customize their recorded
metrics to avoid high cardinality.

E.g. there is a blog post that describes the problem in depth:
https://itnext.io/optimizing-linkerd-metrics-in-prometheus-de607ec10f6b

With this change we are able to deploy the helm chart without the need
to do custom modifications to the templates.

Fixes linkerd#12194

Signed-off-by: Heiko Voigt <heiko.voigt@jimdo.com>
@hvoigt
Copy link
Contributor Author

hvoigt commented Mar 21, 2024

Sorry I missed that this issue has been closed. This whole issue -> PR discussion feels a bit disconnected.

@hvoigt This is one of those requests that feels simple, but hides a fair amount of complexity. 😐 The challenge here is that metrics are pretty central to Linkerd's operation, and relabelings can actually break things that many people rely on.

Could you elaborate what things other than linkerd-viz this could break? And likely it only breaks for people that already have a broken Prometheus because of high cardinality memory usage. If we document it that way I do not see big harm in allowing this config.

As @alpeb wrote in #11445, "If you know exactly what you're doing, like reducing the cardinality of response_latency_ms_bucket, my advise is to not use the stock linkerd PodMonitors, and instead provide your own with your custom changes, pluging that in your pipeline via the standard mechanisms (kustomize, helm post-rendering, sub-charts, or a separate chart)." This feels like a better route than a broad control in the stock chart.

Bringing your own Prometheus adds unnecessary complexity in my opinion. What is so bad allowing users to fine-tune the stock Prometheus setup? E.g. we added a warning to the documentation of this option over at the PR.

I'm going to close this one, but thank you for digging into this! Please feel free to reach out here or on Slack if you have more questions.

I hope we can revisit this decision as it seems there are multiple people suffering from high cardinality which could be solved simpler by a stock tuning option.

adleong pushed a commit that referenced this issue Apr 11, 2024
It is currently not possible to easily drop metrics with the Prometheus
configuration that is included with the linkerd-viz helm chart. To
enable this we extend the Prometheus configuration template to add a
metric_relabel_config section. This configuration is usually used when
dropping metrics and thus allows users to customize their recorded
metrics to avoid high cardinality.

E.g. there is a blog post that describes the problem in depth:
https://itnext.io/optimizing-linkerd-metrics-in-prometheus-de607ec10f6b

With this change we are able to deploy the helm chart without the need
to do custom modifications to the templates.

Fixes #12194

Signed-off-by: Heiko Voigt <heiko.voigt@jimdo.com>
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 21, 2024
the-wondersmith pushed a commit to the-wondersmith/linkerd2 that referenced this issue Apr 24, 2024
It is currently not possible to easily drop metrics with the Prometheus
configuration that is included with the linkerd-viz helm chart. To
enable this we extend the Prometheus configuration template to add a
metric_relabel_config section. This configuration is usually used when
dropping metrics and thus allows users to customize their recorded
metrics to avoid high cardinality.

E.g. there is a blog post that describes the problem in depth:
https://itnext.io/optimizing-linkerd-metrics-in-prometheus-de607ec10f6b

With this change we are able to deploy the helm chart without the need
to do custom modifications to the templates.

Fixes linkerd#12194

Signed-off-by: Heiko Voigt <heiko.voigt@jimdo.com>
Signed-off-by: Mark S <the@wondersmith.dev>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants