Grafana dashboard isn't using the good label for querying prometheus #4380

Closed
aimbot31 opened this issue May 12, 2020 · 7 comments · Fixed by #5012

@aimbot31
Contributor

aimbot31 commented May 12, 2020

Bug Report

What is the issue?

The dashboard "Kubernetes cluster monitoring (via Prometheus)" isn't using the correct label when querying Prometheus.

https://github.com/linkerd/linkerd2/blob/master/grafana/dashboards/kubernetes.json

It uses pod_name, but I do not have this label in my Prometheus.

How can it be reproduced?

I've just installed a new cluster running Kubernetes 1.16.8 (3 nodes, 1 master). I deployed ArgoCD, Linkerd and a microservice app (bookinfo).

Logs, error output, etc

img

linkerd check output

kubernetes-api
--------------
√ can initialize the client
√ can query the Kubernetes API

kubernetes-version
------------------
√ is running the minimum Kubernetes API version
√ is running the minimum kubectl version

linkerd-existence
-----------------
√ 'linkerd-config' config map exists
√ heartbeat ServiceAccount exist
√ control plane replica sets are ready
√ no unschedulable pods
√ controller pod is running
√ can initialize the client
√ can query the control plane API

linkerd-config
--------------
√ control plane Namespace exists
√ control plane ClusterRoles exist
√ control plane ClusterRoleBindings exist
√ control plane ServiceAccounts exist
√ control plane CustomResourceDefinitions exist
√ control plane MutatingWebhookConfigurations exist
√ control plane ValidatingWebhookConfigurations exist
√ control plane PodSecurityPolicies exist

linkerd-identity
----------------
√ certificate config is valid
√ trust roots are using supported crypto algorithm
√ trust roots are within their validity period
√ trust roots are valid for at least 60 days
√ issuer cert is using supported crypto algorithm
√ issuer cert is within its validity period
√ issuer cert is valid for at least 60 days
√ issuer cert is issued by the trust root

linkerd-api
-----------
√ control plane pods are ready
√ control plane self-check
√ [kubernetes] control plane can talk to Kubernetes
√ [prometheus] control plane can talk to Prometheus
√ tap api service is running

linkerd-version
---------------
√ can determine the latest version
√ cli is up-to-date

control-plane-version
---------------------
√ control plane is up-to-date
√ control plane and cli versions match

Status check results are √

Environment

  • Kubernetes Version: 1.16.8
  • Cluster Environment: self-hosted deployed with kubespray
  • Host OS: ubuntu 18.04
  • Linkerd version: Client version: stable-2.7.1
    Server version: stable-2.7.1

Possible solution

Use pod or name instead of pod_name.
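
A minimal Python sketch of this suggestion, assuming the dashboard is plain Grafana JSON whose Prometheus queries sit under `expr` keys and whose legends sit under `legendFormat` keys. The helper names (`rename_label`, `rewrite_dashboard`) and the trimmed `dashboard` fragment are hypothetical and not taken from the Linkerd repository; the sketch only illustrates a whole-word rename of `pod_name` to `pod`.

```python
import json
import re


def rename_label(text: str, old: str, new: str) -> str:
    """Replace whole-word occurrences of a label name in a query string."""
    # \b keeps longer identifiers such as "my_pod_name" untouched.
    return re.sub(rf"\b{re.escape(old)}\b", new, text)


def rewrite_dashboard(node, old, new, keys=("expr", "legendFormat")):
    """Recursively rewrite label names in query-related string fields."""
    if isinstance(node, dict):
        return {
            k: rename_label(v, old, new) if k in keys and isinstance(v, str)
            else rewrite_dashboard(v, old, new, keys)
            for k, v in node.items()
        }
    if isinstance(node, list):
        return [rewrite_dashboard(item, old, new, keys) for item in node]
    return node


# Hypothetical, heavily trimmed dashboard fragment for illustration only.
dashboard = {
    "title": "demo",
    "rows": [{"panels": [{"targets": [{
        "expr": "sum(rate(container_cpu_usage_seconds_total[1m])) by (pod_name)",
        "legendFormat": "{{ pod_name }}",
    }]}]}],
}

fixed = rewrite_dashboard(dashboard, "pod_name", "pod")
print(json.dumps(fixed, indent=2))
```

A blanket rename like this breaks the dashboard on clusters that still expose `pod_name`, so it is worth checking which labels a given Prometheus actually reports before applying it.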

Additional context

This is what I get when I query container_cpu_usage_seconds_total in my Prometheus:

container_cpu_usage_seconds_total{beta_kubernetes_io_arch="amd64",beta_kubernetes_io_os="linux",container="POD",cpu="total",id="/kubepods/besteffort/pod0efc5b94-5155-43ec-a76d-7fef302d941a/8df0af43b56d3edc499f39a92fd4b6334e4aa754f0987c3f3049f18e5c8a4b90",image="gcr.io/google_containers/pause-amd64:3.1",instance="node1",job="kubernetes-nodes-cadvisor",kubernetes_io_arch="amd64",kubernetes_io_hostname="node1",kubernetes_io_os="linux",name="k8s_POD_argocd-server-7696cd5f89-g8sj5_argocd_0efc5b94-5155-43ec-a76d-7fef302d941a_0",namespace="argocd",pod="argocd-server-7696cd5f89-g8sj5"}

I think the problem is the same for container_name, but I haven't had time to check that one.
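
One way to check both label pairs at once is to parse the label set of a returned series. The Python sketch below is a rough illustration, not part of the dashboard or of Linkerd: `parse_labels` is a hypothetical helper, and `series` is an abbreviated copy of the sample above (several labels dropped for brevity).

```python
import re

# Matches name="value" pairs, allowing escaped characters inside the value.
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_labels(series: str) -> dict:
    """Extract the label set from a series in Prometheus text notation."""
    start = series.index("{")
    end = series.rindex("}")
    return dict(LABEL_RE.findall(series[start + 1:end]))


# Abbreviated copy of the series shown in this issue.
series = (
    'container_cpu_usage_seconds_total{container="POD",cpu="total",'
    'instance="node1",job="kubernetes-nodes-cadvisor",'
    'name="k8s_POD_argocd-server-7696cd5f89-g8sj5_argocd_'
    '0efc5b94-5155-43ec-a76d-7fef302d941a_0",'
    'namespace="argocd",pod="argocd-server-7696cd5f89-g8sj5"}'
)

labels = parse_labels(series)
for candidate in ("pod_name", "pod", "container_name", "container"):
    status = "present" if candidate in labels else "missing"
    print(f"{candidate}: {status}")
```

For this sample, `pod` and `container` are present while `pod_name` and `container_name` are missing, which is consistent with the guess about `container_name`.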

@aimbot31 aimbot31 changed the title Grafana dashboards isn't showing the good value Grafana dashboards isn't using the good label May 12, 2020
@aimbot31 aimbot31 changed the title Grafana dashboards isn't using the good label Grafana dashboards isn't using the good label for querying prometheus May 12, 2020
@aimbot31 aimbot31 changed the title Grafana dashboards isn't using the good label for querying prometheus Grafana dashboard isn't using the good label for querying prometheus May 12, 2020
@grampelberg grampelberg added this to To do in Help Wanted via automation May 12, 2020
@grampelberg
Contributor

I guess the labels changed, that's a bummer! We'll have to figure that dashboard out again.

@aimbot31
Contributor Author

aimbot31 commented May 12, 2020

I would love to help on this one; I'm going to open a PR to try to fix this.

@grampelberg
Contributor

That'd be awesome @aimbot31 ! I believe we got that dashboard off grafana hub fwiw. They might have a fix up there already.

@aimbot31
Contributor Author

No update in 2 years :/
https://grafana.com/grafana/dashboards/315

aimbot31 added a commit to aimbot31/linkerd2 that referenced this issue May 14, 2020
aimbot31 added a commit to aimbot31/linkerd2 that referenced this issue May 14, 2020
Prometheus use a relabel rule to change pod_name to pod

Change "pod_name" to "pod" in the grafana dashboard

Run some tests on the Grafana dashboard

Fixes linkerd#4380

Signed-off-by: Florian Davasse <aimbot31@gmail.com>
aimbot31 added a commit to aimbot31/linkerd2 that referenced this issue May 14, 2020
Prometheus use a relabel rule to change pod_name to pod

Change "pod_name" to "pod" in the grafana dashboard

Run some tests on the Grafana dashboard

Fixes linkerd#4380

Signed-off-by: Florian Davasse <aimbot31@gmail.com>
aimbot31 added a commit to aimbot31/linkerd2 that referenced this issue Sep 15, 2020
Prometheus use a relabel rule that changed since 1.16

Use "pod_name" and "pod" to avoid breaking changes.
Also use "container" and "container_name" for the
same reasons.

Run some tests on the Grafana dashboard

Fixes linkerd#4380

Signed-off-by: Florian Davasse <florian.davasse@stack-labs.com>
@aimbot31
Contributor Author

The associated PR has been updated as suggested. Can someone re-open it, please?
@grampelberg or @ihcsim maybe?

@ihcsim
Contributor

ihcsim commented Sep 25, 2020

@aimbot31 GH won't let me re-open the old PR; it's complaining about your branch. I think the easiest thing to do is just to submit a new PR. Thanks!

image

@aimbot31
Contributor Author

Here you go @ihcsim :) #5012

aimbot31 added a commit to aimbot31/linkerd2 that referenced this issue Sep 29, 2020
Prometheus use a relabel rule that changed since 1.16

Use "pod_name" and "pod" to avoid breaking changes.
Also use "container" and "container_name" for the
same reasons.

Run some tests on the Grafana dashboard

Fixes linkerd#4380

Signed-off-by: Florian Davasse <florian.davasse@stack-labs.com>
Help Wanted automation moved this from To do to Done Sep 29, 2020
alpeb pushed a commit that referenced this issue Sep 29, 2020
Prometheus use a relabel rule that changed since 1.16

Use "pod_name" and "pod" to avoid breaking changes.
Also use "container" and "container_name" for the
same reasons.

Fixes #4380

Signed-off-by: Florian Davasse <florian.davasse@stack-labs.com>
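
One possible reading of "use both labels", sketched in Python as a small query builder. This is an assumption about the general technique, not a copy of the merged dashboard change: the function names are hypothetical, and the sketch relies on PromQL treating a label that is absent from a series as empty when grouping, so the same expression groups by `pod` on newer clusters and by `pod_name` on older ones.

```python
def cpu_by_pod_query(labels=("pod", "pod_name"), window="1m"):
    """Build a PromQL expression that groups by every candidate label."""
    by_clause = ", ".join(labels)
    return (
        f"sum by ({by_clause}) "
        f"(rate(container_cpu_usage_seconds_total[{window}]))"
    )


def legend_for(labels=("pod", "pod_name")):
    """Build a Grafana legend template that references every candidate label."""
    # Assumption: a label missing from the series renders as an empty string.
    return "".join("{{" + name + "}}" for name in labels)


query = cpu_by_pod_query()
legend = legend_for()
print(query)
print(legend)
```

The same pattern would apply to `container` and `container_name` by passing `labels=("container", "container_name")`.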
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 17, 2021