dealing with mostly broken Grafana dashboards #361

Open
danfromtitan opened this issue Jul 23, 2021 · 6 comments · May be fixed by #427

@danfromtitan

danfromtitan commented Jul 23, 2021

I built dashboards following the mixtool instructions and uploaded them to Grafana, only to find myself lost in a sea of "No data". Most dashboards fall into that category: reads, writes, compactor resources, queries, etc.; only the object store shows half of its graphs working.

Digging into one of the queries tells me the issue is with how the queries were implemented:

sum(cluster_namespace_job:cortex_distributor_received_samples:rate5m{cluster=~"$cluster", job=~"($namespace)/(distributor|cortex$)"})

The metric above comes out of Prometheus with different labels; there is no way it would work out of the box:

cluster_namespace_job:cortex_distributor_received_samples:rate5m{job="cortex-distributor", namespace="cortex"}

More or less, the same is true for all the other "No data" charts; it looks like each one needs to be changed to make it work.

I installed Cortex from the latest version of your Helm chart (v0.6.0 at the time), and it deploys the latest Cortex image, so the metric labels should be up to date.

Before I take on the monumental task of fixing these charts, I wanted to ask: is there something I missed? My expectation was that, by having the latest Cortex deployed from the latest Helm chart, most if not all of the provided dashboards would come with queries matching the metrics exposed.

@pracucci
Collaborator

I would configure your Prometheus jobs to add the following labels:

  • cluster: name of the K8S cluster
  • job: "<namespace>/<deployment|statefulset>" (e.g. if Cortex is deployed in the "cortex-01" namespace, then ingesters would have the job label "cortex-01/ingester")
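
A minimal sketch of what that could look like in the Prometheus scrape config, assuming pod discovery and that each pod carries a "name" label equal to its deployment/statefulset name (the job name, cluster name, and source labels here are assumptions; adjust them to your setup):

  scrape_configs:
    - job_name: cortex
      kubernetes_sd_configs:
        - role: pod
      relabel_configs:
        # Attach the cluster name as a static label (an external_label on
        # Prometheus itself is another common way to do this).
        - target_label: cluster
          replacement: my-k8s-cluster   # hypothetical cluster name
        # Build job="<namespace>/<deployment>", e.g. "cortex-01/ingester",
        # assuming pods carry a "name" label equal to their deployment name.
        - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_pod_label_name]
          separator: /
          target_label: job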

@fculpo

fculpo commented Aug 18, 2021

Could you clarify the job part?

Currently my Prometheus is sending an external_label for cluster, and jobs are automatically named after the deployment, like cortex-compactor.

@pracucci
Collaborator

Could you clarify the job part?

The job label is one we expect to be added by Prometheus (configured in the Prometheus scrape config), whose value is <namespace>/<deployment|statefulset|daemonset>, where <namespace> is the namespace where the pod is running and <deployment|statefulset|daemonset> is the name of the pod's deployment/statefulset/daemonset.
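
One way to verify what your setup actually produces is to query cortex_build_info, a metric the Cortex components expose:

  count by (cluster, job) (cortex_build_info)

With the scrape config described above, this should return series with values like job="cortex-01/distributor" rather than job="cortex-distributor".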

@danfromtitan
Author

danfromtitan commented Aug 20, 2021

There is a job label in the metrics; it's just not in the value format Grafana expects:

Prometheus records the label as: job="cortex-distributor"
Grafana regex looks to match the label as: job=~"($namespace)/(distributor|cortex$)"
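
To make the mismatch concrete: with the dashboard variable $namespace set to cortex, the regex expands to

  job=~"(cortex)/(distributor|cortex$)"

so it can only ever match values of the form "<namespace>/<component>".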

There is no way these two would match. I'm still hanging on to the thought that these label values should match between what Prometheus collects and what Grafana expects, and I'm not sure how they ended up so far apart.

The job label value is just one example; there are other labels in the Grafana queries that would prevent results from showing in the dashboards.

Anyway, the way I intend to deal with this is to fork cortex-mixin and fix the queries to match the Prometheus labels as much as possible, then adjust the remaining dashboards manually.

@danfromtitan
Author

The job label is one we expect to be added by Prometheus (configured in the Prometheus scrape config), whose value is <namespace>/<deployment|statefulset|daemonset>, where <namespace> is the namespace where the pod is running and <deployment|statefulset|daemonset> is the name of the pod's deployment/statefulset/daemonset.

Are there relabeling rules documented somewhere? I'm trying to eliminate the guesswork from this effort. I deployed Cortex from your Helm chart, and the metrics come straight out of the ServiceMonitor without any relabeling. I've added the recording rules from cortex-mixin next to that, but I didn't come across any scrape config requirements in my reading.
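
For reference, here is a hypothetical prometheus-operator snippet showing the kind of relabeling I assume is needed; the port name and the app.kubernetes.io/component pod label are guesses that would need to match whatever the Helm chart actually sets:

  apiVersion: monitoring.coreos.com/v1
  kind: ServiceMonitor
  metadata:
    name: cortex
  spec:
    selector:
      matchLabels:
        app.kubernetes.io/name: cortex
    endpoints:
      - port: http-metrics   # hypothetical port name
        relabelings:
          # Rewrite job to "<namespace>/<component>", e.g. "cortex/distributor",
          # assuming pods carry an "app.kubernetes.io/component" label.
          - sourceLabels: [__meta_kubernetes_namespace, __meta_kubernetes_pod_label_app_kubernetes_io_component]
            separator: /
            targetLabel: job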

@danfromtitan
Author

danfromtitan commented Oct 4, 2021

To follow up on this issue, I opened cortexproject/cortex-helm-chart#233 to ensure the container names produced by the Cortex Helm chart align with cortex-mixin's container label values.
Other than that, I had to significantly modify cortex-mixin to get the dashboards to work after deploying Cortex from the official Helm chart. I'm waiting for a small change in the Helm chart before I can PR this change.
