
Kubernetes monitoring #5392

Open
ilyam8 opened this Issue Feb 14, 2019 · 3 comments


ilyam8 commented Feb 14, 2019

Kubernetes monitoring is a very complex task.

#5387 is only about monitoring pods/containers, not a Kubernetes cluster as a whole.

The task is:

  • investigate what Kubernetes monitoring entails; analyze other monitoring solutions (there are a lot of very nice ones out there).
  • split that big task into smaller pieces.
  • understand in what direction we need to move.

The big challenge is not collecting the data, but representing it in a meaningful way, which may not be achievable with the current netdata implementation.

I will update the OP post with all my findings and thoughts.

@ilyam8 ilyam8 added this to the v1.13-rc1 milestone Feb 14, 2019

@ilyam8 ilyam8 self-assigned this Feb 14, 2019

@cakrit cakrit modified the milestones: v1.13-rc1, v1.13 Feb 28, 2019

@cakrit cakrit self-assigned this Feb 28, 2019


ilyam8 commented Mar 14, 2019

Ports:

  • 2379 (etcd)
  • 8443 (api server)
  • 10248 (kubelet)
  • 10249 (proxy status)
  • 10250 (kubelet)
  • 10251 (insecure scheduler)
  • 10252 (insecure kube controller manager)
  • 10253 (insecure cloud controller manager)
  • 10255 (kubelet read only)
  • 10256 (proxy healthz)
  • 10257 (kube controller manager)
  • 10258 (cloud controller manager)
  • 10259 (kube scheduler)

Kubernetes components to monitor.

Location: control plane node, separate nodes?

  • etcd
    • http://localhost:2379/metrics

Location: master node.

  • api-server
    • https://localhost:8443/metrics
  • kube-scheduler
    • http://127.0.0.1:10251/metrics
  • kube-controller-manager
    • http://127.0.0.1:10252/metrics
  • kube-state-metrics
    • http://127.0.0.1:80/metrics (?)

Location: slave node.

  • kubelet
    • http://127.0.0.1:10255/metrics
    • http://127.0.0.1:10255/metrics/cadvisor
    • http://127.0.0.1:10255/pods
    • http://127.0.0.1:10255/specs
    • http://127.0.0.1:10255/healthz
    • http://127.0.0.1:10255/stats
    • http://127.0.0.1:10255/stats/summary
    • http://127.0.0.1:10255/stats/container
    • http://127.0.0.1:10255/stats/<pod name>/<container name>
    • http://127.0.0.1:10255/stats/<namespace>/<pod name>/<uid>/<container name>
  • kube-dns (optional: kubedns, dnsmasq, sidecar)
    • http://127.0.0.1:10055/metrics
  • kube-dns (default: coreDNS)
    • http://127.0.0.1:9153/metrics
  • kube-proxy
    • http://127.0.0.1:10249/metrics

@cakrit cakrit removed this from the v1.13 milestone Mar 14, 2019


cakrit commented Mar 14, 2019

Sprint 1

  • @ilyam8 : Collect data from the kubelet (port 10255). go collector. Also collect info for each pod/container, if the kubelet provides valuable metrics that are not available via cgroups. #5639
  • @cakrit : Investigate discrepancies between kubelet pods info and API server pods info #5636
  • @cakrit : Investigate access of endpoints for kube-dns #5636
  • @cakrit : Helmchart TODOs 1 (netdata/helmchart#1) #5637.

Sprint 2

  • @ilyam8 : Collect proxy and DNS metrics
  • @cakrit : Investigate how we can monitor applications running in other pods (e.g. mysql, nginx logs). Perhaps installing a netdata on the same pod? Can we at least extend the collectors that get their info via TCP to automatically discover new pods?
  • @cakrit , @ktsaou, @ilyam8 : Extend netdata to accept labels/tags - Design
  • @cakrit , @ktsaou, @ilyam8, @gmosx: Change console UI to present k8s information more meaningfully on each host (right menu) - Design

Sprint 3

  • TBD: Extend netdata to accept labels/tags - Implementation
  • @gmosx: Change console UI to present k8s information more meaningfully on each host (right menu) - Implementation
  • @ilyam8 : Extend existing plugins to autodetect ephemeral pods in a k8s environment e.g. for nginx, apache. Design. (to be discussed)

Sprint 4

TBD

  • Investigate access of endpoints for master node endpoints and etcd
  • k8s network and service view implementation (cloud) v2
  • Consider making external plugins truly external (i.e. communication over TCP instead of pipes). This will also help netdata stop running as root. It's related to k8s, because we could then install a collector on the same pod as e.g. apache to read its logs and push them to a netdata daemon (like the prometheus exporters).

cakrit commented Mar 21, 2019

Regarding

Investigate how we can monitor applications running in other pods (e.g. mysql, nginx logs). Perhaps installing a netdata on the same pod? Can we at least extend the collectors that get their info via TCP to automatically discover new pods?

We have two things we need to do, neither of which is related to the helm chart:

  • For collectors that get metrics via TCP, attempt autodiscovery of the IP/port, and perhaps allow specification of endpoints using label-based configuration, or something better if we can manage it.
  • For collectors that get metrics by reading files, we need to see how people would install netdata as a sidecar container in the same pod.

The second one is obviously a quick workaround that could be used even for collectors that read metrics via TCP. The caveat is that people would end up installing more netdata instances than they really need, so we should avoid this with a powerful autodiscovery mechanism.
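
The label-based configuration idea above can be sketched as matching pod labels against collector rules to produce scrape endpoints. This is a hypothetical rule format for illustration, not the actual netdata configuration; the `Pod`/`Rule` types and the nginx `stub_status` path are assumptions.

```go
package main

import "fmt"

// Pod is a minimal view of a discovered pod: its IP and labels.
type Pod struct {
	IP     string
	Labels map[string]string
}

// Rule maps a label selector to a collector endpoint (hypothetical format).
type Rule struct {
	MatchLabels map[string]string // all labels must match
	Port        int
	Path        string
}

// discoverEndpoints returns one scrape URL per (pod, rule) pair whose
// labels satisfy the rule's selector.
func discoverEndpoints(pods []Pod, rules []Rule) []string {
	var endpoints []string
	for _, p := range pods {
		for _, r := range rules {
			match := true
			for k, v := range r.MatchLabels {
				if p.Labels[k] != v {
					match = false
					break
				}
			}
			if match {
				endpoints = append(endpoints,
					fmt.Sprintf("http://%s:%d%s", p.IP, r.Port, r.Path))
			}
		}
	}
	return endpoints
}

func main() {
	pods := []Pod{
		{IP: "10.244.1.5", Labels: map[string]string{"app": "nginx"}},
		{IP: "10.244.2.9", Labels: map[string]string{"app": "mysql"}},
	}
	rules := []Rule{
		{MatchLabels: map[string]string{"app": "nginx"}, Port: 80, Path: "/stub_status"},
	}
	fmt.Println(discoverEndpoints(pods, rules))
}
```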
