Considerably high metric cardinality #1479

Closed
paulfantom opened this issue Jun 4, 2021 · 7 comments

@paulfantom
Contributor

Describe the bug

A basic installation of Flux v2 produces ~6000 metric series. The majority (~5000) of those come from rest_client_request_latency_seconds_.* buckets. As far as I can see, only a small subset of the data from those metrics is actually used (I found them in only one panel of the "Flux Control Plane" dashboard).

Are those metrics used for anything else? If so, maybe there is a way to reduce their cardinality?

To Reproduce

Steps to reproduce the behaviour:

  1. Applied https://github.com/fluxcd/flux2/releases/download/v0.14.2/install.yaml to a cluster
  2. All manifests used for installation are available at https://github.com/thaum-xyz/ankhmorpork/tree/d24dd02a77479c7884965a23b394a9ee86f279a4

Expected behavior

Fewer metrics, but of higher quality.

Additional context

  • Kubernetes version: k3s 1.19.7
  • Git provider: ---
  • Container registry provider: ---

Below please provide the output of the following commands:

flux --version
flux check
kubectl -n <namespace> get all
kubectl -n <namespace> logs deploy/source-controller
kubectl -n <namespace> logs deploy/kustomize-controller

@paulfantom
Contributor Author

Data from the Prometheus query count({job="flux-system/flux-system"}):

Just after installation: 6522
After discarding rest_client_request_latency_seconds_.*: 950
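
To see which metrics dominate, a per-metric breakdown can be obtained with a query along the lines of topk(10, count by (__name__) ({job="flux-system/flux-system"})) (assuming the same job label as above); the rest_client_request_latency_seconds_.* series should come out on top.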

@stefanprodan
Member

stefanprodan commented Jun 4, 2021

These metrics come from controller-runtime, the Kubernetes SDK that we use to develop the GitOps toolkit controllers. Feel free to create Prometheus rules to drop the things you don't need, or open an issue on controller-runtime.
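
For those running plain Prometheus rather than the Prometheus Operator, a minimal sketch of dropping these series at scrape time via metric_relabel_configs could look roughly like the following (job name illustrative, service discovery and other scrape settings omitted):

  scrape_configs:
    - job_name: flux-system                 # illustrative job name; target discovery omitted
      metric_relabel_configs:
        # Drop the high-cardinality client latency histograms before ingestion
        - source_labels: [__name__]
          regex: rest_client_request_latency_seconds_.*
          action: drop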

@paulfantom
Copy link
Contributor Author

These metrics come from controller-runtime, the Kubernetes SDK that we use to develop the GitOps toolkit controllers.

Sorry, but that is only an excuse and not really a fix :) The issue is still present in Flux, even if it is caused by an upstream library misbehaving.

For anyone who finds this issue in the future, here is a relabelling that removes all rest_client_request_latency_seconds_.* metrics (including the ones used in one, relatively uninformative, panel of the Flux dashboard): https://github.com/thaum-xyz/ankhmorpork/blob/d24dd02a77479c7884965a23b394a9ee86f279a4/base/flux-system/podmonitor.yaml#L26-L30
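
The linked file is the authoritative version; as a rough sketch, the relevant part of such a PodMonitor (port name and surrounding fields are illustrative) looks like this:

  spec:
    podMetricsEndpoints:
      - port: http-prom                     # illustrative port name
        metricRelabelings:
          # Drop the high-cardinality client latency histograms at scrape time
          - sourceLabels: [__name__]
            regex: rest_client_request_latency_seconds_.*
            action: drop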

@selaux

selaux commented Oct 25, 2021

We also encountered this issue and had to disable Prometheus scraping for Flux, as the cost was not justifiable. It has been fixed in controller-runtime from version 0.10.0 upwards. Any chance we will get an update?

@stefanprodan
Member

stefanprodan commented Oct 25, 2021

We are rolling the update out to all Flux controllers; in the latest release some of them are already on controller-runtime v0.10.2. Once all of them have been updated, I will close this issue.

@stefanprodan
Member

As of Flux 0.24.0, all controllers have been updated to controller-runtime v0.10, so this issue is finally fixed.

Now we need to remove the graph that uses rest_client_request_latency_seconds from our Grafana dashboard.
