Contribute grafana dashboards #205

Merged

2 participants
@povilasv
Contributor

commented May 15, 2019

Some changes from my stuff include:

  • added tags and prefix for dashboards
  • added kube-proxy selector
  • renamed Dashboards
  • removed Kube prefix

kube-proxy:

  • Added dash for network programming rate + latency
  • Switched to seconds metrics instead of microseconds.

kube-scheduler:

  • switched to seconds metrics instead of microseconds.
  • Added scheduler_volume_scheduling_duration_seconds_bucket
  • Latency rate 5m instead of 10h
  • Added scheduling rate
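
As a rough illustration of the 5m latency window on the seconds-based histogram, a panel query could be assembled like this. This is only a sketch: the helper name `latency_quantile_expr`, the 0.99 quantile and the `sum ... by (le)` aggregation are assumptions, not the expressions committed in the dashboards. `histogram_quantile` and `rate` are standard PromQL functions.

```python
def latency_quantile_expr(metric: str, quantile: float = 0.99, window: str = "5m") -> str:
    """Build a PromQL latency-quantile expression for a *_bucket histogram metric.

    Hypothetical helper for illustration; the real dashboards may use
    different quantiles, label selectors and grouping.
    """
    # rate() over a short window (5m rather than 10h) keeps the panel
    # responsive to recent latency changes.
    return f"histogram_quantile({quantile}, sum(rate({metric}[{window}])) by (le))"


# Metric name taken from the list above.
expr = latency_quantile_expr("scheduler_volume_scheduling_duration_seconds_bucket")
```

Since the metric is already in seconds, the panel unit can be set to seconds directly, with no division by 1e6 in the query.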

kube-controller-manager:

  • Remove deprecated work queue metrics
  • Add new workqueue metrics.

kube api server

  • Changes in etcd cache duration, went from summary to buckets.
  • Fixed cache latency -> cache duration title
  • apiserver_request_count -> apiserver_request_total
  • apiserver_request_latencies_bucket -> apiserver_request_duration_seconds_bucket
  • Admission controller queue -> generic work queue metrics.
  • etcd_helper_cache_entry_count -> etcd_helper_cache_entry_total
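
For anyone carrying their own copies of these dashboards, the renames above could be applied to query expressions mechanically. A minimal sketch, assuming the expressions are available as plain strings; `RENAMES` and `rename_metrics` are hypothetical names, and this is not how the dashboards in this PR were actually updated (that was done by hand):

```python
import re

# Old -> new metric names, copied from the list above.
RENAMES = {
    "apiserver_request_count": "apiserver_request_total",
    "apiserver_request_latencies_bucket": "apiserver_request_duration_seconds_bucket",
    "etcd_helper_cache_entry_count": "etcd_helper_cache_entry_total",
}

# \b ensures only whole metric names match: "_" counts as a word character,
# so a longer identifier that merely starts with an old name is left alone.
_PATTERN = re.compile(r"\b(" + "|".join(re.escape(name) for name in RENAMES) + r")\b")


def rename_metrics(expr: str) -> str:
    """Replace deprecated metric names in a PromQL expression string."""
    return _PATTERN.sub(lambda match: RENAMES[match.group(1)], expr)


old = 'sum(rate(apiserver_request_count{job="apiserver"}[5m]))'
new = rename_metrics(old)
```

A rename alone is not enough for the latency histogram: the new metric reports seconds, so panel units and any thresholds still need to be checked by hand.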

kubelet

  • Changed a lot of metrics to use _seconds instead of _microseconds
  • Use histogram kubelet_cgroup_manager_duration_seconds_bucket instead of summary
  • Use histogram kubelet_pleg_relist_duration_seconds_bucket
  • Add PLEG relist interval
  • kubelet_runtime_operations -> kubelet_runtime_operations_total
  • Add rate storage_operation_errors_total
  • Removed kubelet_network_plugin_operations_latency_microseconds_count and kubelet_network_plugin_operations_latency_microseconds
  • Add the instance label everywhere; now that there are histograms for everything, we can have an aggregate view!
  • Add actual volume count
  • Add desired volume count
  • Added config error rate
  • Remove volume table, as there are no more metrics for it? (maybe an issue with kind)
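
To spell out why histograms make the aggregate view possible: bucket counters from different instances can be summed per `le` bound before estimating a quantile, which is not possible with pre-computed summary quantiles. A minimal sketch with made-up bucket data and hypothetical helper names; the interpolation only approximates what PromQL's `histogram_quantile` does:

```python
import math
from typing import Dict, List

Buckets = Dict[float, float]  # upper bound (le) -> cumulative observation count


def merge_buckets(per_instance: List[Buckets]) -> Buckets:
    """Sum cumulative bucket counts across instances, bound by bound."""
    merged: Buckets = {}
    for buckets in per_instance:
        for le, count in buckets.items():
            merged[le] = merged.get(le, 0.0) + count
    return merged


def estimate_quantile(q: float, buckets: Buckets) -> float:
    """Estimate a quantile by linear interpolation inside the matching bucket."""
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    bounds = sorted(buckets)
    total = buckets[bounds[-1]]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for le in bounds:
        count = buckets[le]
        if count >= rank:
            if math.isinf(le):
                # Cannot interpolate into the +Inf bucket; fall back to the
                # highest finite bound.
                return lower_bound
            return lower_bound + (le - lower_bound) * (rank - lower_count) / (count - lower_count)
        lower_bound, lower_count = le, count
    return lower_bound


# Made-up data: two kubelets with different latency profiles.
instance_a = {0.1: 8, 0.5: 10, math.inf: 10}
instance_b = {0.1: 0, 0.5: 10, math.inf: 10}
cluster_median = estimate_quantile(0.5, merge_buckets([instance_a, instance_b]))
```

Averaging the two per-instance medians would give a different number than `cluster_median`, which is why the merged buckets are the thing to aggregate.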

povilasv added some commits May 15, 2019

@metalmatze

Member

commented May 15, 2019

On Kubernetes Slack in #monitoring-mixins we agreed to move these dashboards to Kubernetes 1.14 before merging.

@povilasv povilasv force-pushed the povilasv:contribute-grafana-dashboards branch from 73666cc to 35e5132 May 16, 2019

@povilasv

Contributor Author

commented May 16, 2019

Kube proxy on kind v1.14.1:

Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.1", GitCommit:"b7394102d6ef778017f2ca4046abbaa23b88c290", GitTreeState:"clean", BuildDate:"2019-04-08T17:11:31Z", GoVersion:"go1.12.1", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.1", GitCommit:"b7394102d6ef778017f2ca4046abbaa23b88c290", GitTreeState:"clean", BuildDate:"2019-05-14T01:43:56Z", GoVersion:"go1.12.1", Compiler:"gc", Platform:"linux/amd64"}

Added dash for network programming rate + latency

[Screenshots: 2019-05-16-081355, 2019-05-16-081403]

@povilasv povilasv changed the title Contribute grafana dashboards WIP: Contribute grafana dashboards May 16, 2019

@povilasv

Contributor Author

commented May 16, 2019

Kube-Scheduler:

[Screenshots: 2019-05-16-083032, 2019-05-16-083039]

@povilasv

Contributor Author

commented May 16, 2019

Kube controller manager:

[Screenshots: 2019-05-16-083747, 2019-05-16-083755, 2019-05-16-083802]

@povilasv

Contributor Author

commented May 16, 2019

Kube API server:
The etcd cache duration doesn't show because the cache is 0 in my kind cluster.

[Screenshots: 2019-05-16-085707, 2019-05-16-085715]

@povilasv

Contributor Author

commented May 16, 2019

Leaving Kubelet for tomorrow as it doesn't work at all :)

This is how the menu looks:

[Screenshot: 2019-05-16-085957]

@metalmatze

Member

commented May 16, 2019

> Leaving Kubelet for tomorrow as it doesn't work at all :)
>
> This is how the menu looks:
>
> [Screenshot: 2019-05-16-085957]

All of the above look really good. Looking forward to merging them!
Just for consistency: I don't think you need to prefix those dashboard names twice.

- Kubernetes / Kube API server
+ Kubernetes / API server

That should be fine.

@povilasv povilasv force-pushed the povilasv:contribute-grafana-dashboards branch from 7bde859 to d19d940 May 17, 2019

@povilasv povilasv force-pushed the povilasv:contribute-grafana-dashboards branch from d19d940 to 62f46ad May 17, 2019

@povilasv

Contributor Author

commented May 17, 2019

@metalmatze Done, I've also renamed the files

-rw-r--r-- 1 povilasv povilasv 35925 May 17 06:03 apiserver.json
-rw-r--r-- 1 povilasv povilasv 34874 May 17 06:03 controller-manager.json
-rw-r--r-- 1 povilasv povilasv 26422 May 17 06:03 k8s-cluster-rsrc-use.json
-rw-r--r-- 1 povilasv povilasv 26293 May 17 06:03 k8s-node-rsrc-use.json
-rw-r--r-- 1 povilasv povilasv 44314 May 17 06:03 k8s-resources-cluster.json
-rw-r--r-- 1 povilasv povilasv 28509 May 17 06:03 k8s-resources-namespace.json
-rw-r--r-- 1 povilasv povilasv 30336 May 17 06:03 k8s-resources-pod.json
-rw-r--r-- 1 povilasv povilasv 30165 May 17 06:03 k8s-resources-workload.json
-rw-r--r-- 1 povilasv povilasv 31309 May 17 06:03 k8s-resources-workloads-namespace.json
-rw-r--r-- 1 povilasv povilasv 62919 May 17 06:03 kubelet.json
-rw-r--r-- 1 povilasv povilasv 40564 May 17 06:03 nodes.json
-rw-r--r-- 1 povilasv povilasv 16180 May 17 06:03 persistentvolumesusage.json
-rw-r--r-- 1 povilasv povilasv 18830 May 17 06:03 pods.json
-rw-r--r-- 1 povilasv povilasv 32995 May 17 06:03 proxy.json
-rw-r--r-- 1 povilasv povilasv 26116 May 17 06:03 scheduler.json
-rw-r--r-- 1 povilasv povilasv 27220 May 17 06:03 statefulset.json

[Screenshot: 2019-05-17-060243]

povilasv added some commits May 17, 2019

@povilasv

Contributor Author

commented May 17, 2019

Kubelet

[Screenshots: 2019-05-17-074416, 2019-05-17-074424, 2019-05-17-074431, 2019-05-17-074437, 2019-05-17-074444, 2019-05-17-074453]

@povilasv

Contributor Author

commented May 17, 2019

TODO:

  • Maybe add scheduling rate?
  • Fix some places we still have microseconds
  • retest everything after fixes
  • for next metric overhaul:

job_adds -> job_add_total
job_queue_latency -> job_queue_latency_seconds
job_work_duration -> job_work_duration_seconds
and maybe move these to histogram buckets instead of summaries.

Edit: Turns out all of those metrics were deprecated, so I moved us onto the non-deprecated metrics and everything is in seconds now. I also added the scheduling rate, so everything is fixed. ❤️ the metric overhaul

@povilasv

Contributor Author

commented May 17, 2019

Controller manager after fixes

[Screenshots: 2019-05-17-081016, 2019-05-17-081034, 2019-05-17-081040]

@povilasv

Contributor Author

commented May 17, 2019

API server after fixes:
[Screenshots: 2019-05-17-083836, 2019-05-17-083851, 2019-05-17-083858]

@povilasv povilasv force-pushed the povilasv:contribute-grafana-dashboards branch from 88c5c85 to 9de2c7c May 17, 2019

@povilasv

Contributor Author

commented May 17, 2019

Scheduler fixes:

[Screenshots: 2019-05-17-085214, 2019-05-17-085259]

@povilasv povilasv changed the title WIP: Contribute grafana dashboards Contribute grafana dashboards May 17, 2019

@povilasv

Contributor Author

commented May 17, 2019

Super glad to do a v2 of this :D It involved a lot of manual work, but as they say, repetition is the mother of learning 👯‍♂

@metalmatze

Member

commented May 17, 2019

This is super awesome and thanks for putting all the work into it. 🎉
To not make things even more complicated, we want to merge this now and then create follow-up PRs.

I'll create an issue with a TODO list of things we want to improve about these in the future.

Thanks again!

@metalmatze metalmatze merged commit 0e4fc48 into kubernetes-monitoring:master May 17, 2019

1 check passed

ci/circleci Your tests passed on CircleCI!

@metalmatze metalmatze referenced this pull request May 17, 2019

Open

Improve new control plane dashboards #206

0 of 5 tasks complete
@povilasv

Contributor Author

commented May 17, 2019

@metalmatze cool, let me know, I might help :)

@povilasv povilasv deleted the povilasv:contribute-grafana-dashboards branch May 17, 2019
