
Grafana dashboards do not correctly show Kubernetes 1.16 metrics #18827

Closed
howardjohn opened this issue Nov 9, 2019 · 5 comments · Fixed by istio/installer#568

Comments

@howardjohn
Member

Per https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.16.md#removed-metrics, pod_name and container_name, which are widely used in our dashboards, no longer exist. See #18820 for an example where the dashboard test fails on 1.16.

Updates exported cadvisor metric labels to match the Kubernetes instrumentation guidelines. Duplicate labels were included in the 1.14 and 1.15 releases

Since Istio 1.5 will support 1.14 through 1.16, we could switch over to only use the new labels if we need to, but maybe there is something smarter we can do in prom/grafana - not sure.

Here is what a new metric looks like:

container_cpu_usage_seconds_total{beta_kubernetes_io_arch="amd64",beta_kubernetes_io_os="linux",container="discovery",cpu="total",id="/docker/b24ca077d3eddbdefc25a99ba25929bdcd6fc526c1019f84d8e392a38133ffc6/kubepods/burstable/pod3803ed4c-0513-4561-adb1-0a68b4eab19c/f87b4840ab767d836a05d13087d28ea12dd69ecc0dd86fb5832f6d96caa749dd",image="gcr.io/istio-testing/pilot:1.5-alpha.31948644cc53bb8569e4073486d7796540909adc",instance="kind-control-plane",job="kubernetes-cadvisor",kubernetes_io_arch="amd64",kubernetes_io_hostname="kind-control-plane",kubernetes_io_os="linux",name="f87b4840ab767d836a05d13087d28ea12dd69ecc0dd86fb5832f6d96caa749dd",namespace="istio-system",pod="istio-pilot-59cc469dd9-mkvfr"}
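
For illustration (this is a hypothetical panel expression, not one copied from the dashboard JSON), the change for a CPU panel would look roughly like this:

# old labels (pre-1.16)
sum(rate(container_cpu_usage_seconds_total{container_name="discovery", pod_name=~"istio-pilot-.*"}[1m]))

# new labels (added in 1.14, the only option on 1.16)
sum(rate(container_cpu_usage_seconds_total{container="discovery", pod=~"istio-pilot-.*"}[1m]))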

@gargnupur @douglas-reid

@howardjohn
Member Author

@douglas-reid assigned you to at least point us in the right direction.

Should we just switch the dashboards to use the new labels? Or is there some clever Prometheus trick (either at query time or at scrape time) to seamlessly query for either of them?
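
One query-time option (just a sketch, not verified against the actual dashboards) would be to write each panel expression as an "or" of the two label sets, so whichever labels the cluster exports produce the result:

sum(rate(container_cpu_usage_seconds_total{container="discovery"}[1m]))
or
sum(rate(container_cpu_usage_seconds_total{container_name="discovery"}[1m]))

At scrape time the rough equivalent would be a metric_relabel_configs rule that copies container_name/pod_name into container/pod, but that touches everyone's Prometheus config rather than just the dashboards.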

@douglas-reid
Contributor

Ha. I can't even easily spin up a 1.15 cluster at the moment, let alone a 1.16. It was nice of them to make a pointless but breaking change.

If the new way is supported in 1.14 and we are truly no longer supporting 1.13, then we should just update the dashboards.

@howardjohn
Member Author

I think in the past, while we claimed to support 3 versions of k8s, that was just the messaging we gave; things worked on plenty of other versions. Right now we have tests for 1.12-1.17 and things seem to work on all versions (we don't run the dashboard test, though). I am not sure how many people will be on 1.14 by the time Istio 1.5 is out.

That being said, it is just the resource utilization part of the dashboard, so it's not the end of the world if someone on an unsupported version finds that it doesn't work. So I agree, let's just change it.

By the way, the easiest way to test new versions of Kubernetes is probably https://kind.sigs.k8s.io/, but it is sort of a pain specifically for this because it doesn't install a metrics server by default. We do have the YAMLs to get that running at Prow/config/metrics/.

@douglas-reid
Contributor

@howardjohn should we close this now?

@howardjohn
Member Author

I still need to backport this to the operator.

howardjohn added a commit to howardjohn/istio-installer that referenced this issue Nov 24, 2019
istio-testing pushed a commit to istio/installer that referenced this issue Nov 25, 2019
sdake pushed a commit to sdake/istio that referenced this issue Dec 1, 2019