dealing with mostly broken Grafana dashboards #361

Open
danfromtitan opened this issue Jul 23, 2021 · 6 comments · May be fixed by #427

@danfromtitan

danfromtitan commented Jul 23, 2021

I built dashboards following the mixtool instructions and uploaded them to Grafana, only to find myself lost in a sea of "No data". Most dashboards fall into that category: reads, writes, compactor resources, queries, etc.; only the object store shows half of its graphs working.

Digging into one of the queries tells me the issue is with how the queries were implemented:

sum(cluster_namespace_job:cortex_distributor_received_samples:rate5m{cluster=~"$cluster", job=~"($namespace)/(distributor|cortex$)"})

The metric above comes out of Prometheus with different labels; there is no way it would work out of the box:

cluster_namespace_job:cortex_distributor_received_samples:rate5m{job="cortex-distributor", namespace="cortex"}

More or less, the same is true for all the other "No data" charts; it looks like each one needs to be changed to make it work.

I installed Cortex from the latest version of your Helm chart (v0.6.0 at the time), and it deploys the latest Cortex image, so the metric labels should be up to date.

Before I take on the monumental task of fixing these charts, I wanted to ask: is there something I missed? My expectation was that, by having the latest Cortex deployed from the latest Helm chart, most if not all of the provided dashboards would come with queries matching the metrics exposed.

@pracucci
Collaborator

I would configure your Prometheus jobs to add the following labels:

  • cluster: name of the K8S cluster
  • job: "<namespace>/<deployment|statefulset>" (e.g. if Cortex is deployed in the "cortex-01" namespace, then ingesters would have the job label "cortex-01/ingester")
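
A minimal sketch of what that could look like in the Prometheus scrape config, assuming pod discovery and that each pod carries a "name" label equal to its deployment/statefulset name (the job name, cluster name, and source labels here are assumptions; adjust them to your setup):

  scrape_configs:
    - job_name: cortex
      kubernetes_sd_configs:
        - role: pod
      relabel_configs:
        # Attach the cluster name as a static label (an external_label on
        # Prometheus itself is another common way to do this).
        - target_label: cluster
          replacement: my-k8s-cluster   # hypothetical cluster name
        # Build job="<namespace>/<deployment>", e.g. "cortex-01/ingester",
        # assuming pods carry a "name" label equal to their deployment name.
        - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_pod_label_name]
          separator: /
          target_label: job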

@fculpo

fculpo commented Aug 18, 2021

Could you clarify the job part?

Currently my Prometheus is sending an external_label for cluster, and jobs are automatically named after the deployment, like cortex-compactor.

@pracucci
Collaborator

Could you clarify the job part?

The job label is one we expect to be added by Prometheus (configured in the Prometheus scrape config), whose value is <namespace>/<deployment|statefulset|daemonset>, where <namespace> is the namespace where the pod is running and <deployment|statefulset|daemonset> is the name of the pod's deployment/statefulset/daemonset.
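
One way to verify what your setup actually produces is to query cortex_build_info, a metric the Cortex components expose:

  count by (cluster, job) (cortex_build_info)

With the scrape config described above, this should return series with values like job="cortex-01/distributor" rather than job="cortex-distributor".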

@danfromtitan
Author

danfromtitan commented Aug 20, 2021

There is a job label in the metrics; it's just not in the value format Grafana expects:

Prometheus records the label as: job="cortex-distributor"
Grafana regex looks to match the label as: job=~"($namespace)/(distributor|cortex$)"
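
To make the mismatch concrete: with the dashboard variable $namespace set to cortex, the regex expands to

  job=~"(cortex)/(distributor|cortex$)"

so it can only ever match values of the form "<namespace>/<component>".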

There is no way these two would match. I'm still hanging on to the thought that these label values should match between what Prometheus collects and what Grafana expects, and I'm not sure how they ended up so far apart.

The job label value is just one example; there are other labels in the Grafana queries that would prevent results from showing in the dashboards.

Anyway, the way I intend to deal with this is to fork cortex-mixin and fix the queries to match the Prometheus labels as much as possible, then adjust the remaining dashboards manually.

@danfromtitan
Author

The job label is one we expect to be added by Prometheus (configured in the Prometheus scrape config), whose value is <namespace>/<deployment|statefulset|daemonset>, where <namespace> is the namespace where the pod is running and <deployment|statefulset|daemonset> is the name of the pod's deployment/statefulset/daemonset.

Are there relabeling rules documented somewhere? I'm trying to eliminate the guesswork from this effort. I deployed Cortex from your Helm chart, and the metrics come straight out of the ServiceMonitor without any relabeling. I've added the recording rules from cortex-mixin next to that, but I didn't come across any scrape config requirements in my reading.
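
For reference, here is a hypothetical prometheus-operator snippet showing the kind of relabeling I assume is needed; the port name and the app.kubernetes.io/component pod label are guesses that would need to match whatever the Helm chart actually sets:

  apiVersion: monitoring.coreos.com/v1
  kind: ServiceMonitor
  metadata:
    name: cortex
  spec:
    selector:
      matchLabels:
        app.kubernetes.io/name: cortex
    endpoints:
      - port: http-metrics   # hypothetical port name
        relabelings:
          # Rewrite job to "<namespace>/<component>", e.g. "cortex/distributor",
          # assuming pods carry an "app.kubernetes.io/component" label.
          - sourceLabels: [__meta_kubernetes_namespace, __meta_kubernetes_pod_label_app_kubernetes_io_component]
            separator: /
            targetLabel: job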

@danfromtitan
Author

danfromtitan commented Oct 4, 2021

To follow up on this issue, I opened cortexproject/cortex-helm-chart#233 to ensure the container names produced by the Cortex Helm chart align with cortex-mixin's container label values.
Other than that, I had to significantly modify cortex-mixin to get the dashboards to work after deploying Cortex from the official Helm chart. I'm waiting for a small change in the Helm chart before I can PR this change.
