
Instrumentation needed for kubelet, apiserver, etc #1625

Closed
thockin opened this issue Oct 7, 2014 · 46 comments
Labels
area/introspection kind/design Categorizes issue or PR as related to design. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/scalability Categorizes an issue or PR as relevant to SIG Scalability.

Comments

@thockin
Member

thockin commented Oct 7, 2014

We do not report much in the way of stats (memory used, latency, counters, etc) for our core components.

@thockin thockin added area/introspection kind/design Categorizes issue or PR as related to design. kind/enhancement labels Oct 7, 2014
@lavalamp
Member

lavalamp commented Oct 7, 2014

Let's use http://golang.org/pkg/expvar/ to publish metrics?
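
For reference, publishing a couple of counters with expvar takes only a few lines. A minimal sketch with hypothetical metric names, relying on the package's default /debug/vars endpoint:

```go
package main

import (
	"expvar"
	"net/http"
)

// Hypothetical kubelet counters; expvar serves every published variable as
// JSON from /debug/vars on http.DefaultServeMux.
var (
	podSyncs   = expvar.NewInt("kubelet_pod_syncs")
	syncErrors = expvar.NewInt("kubelet_sync_errors")
)

func main() {
	podSyncs.Add(1)
	syncErrors.Add(0)
	http.ListenAndServe(":8080", nil) // GET /debug/vars to read the metrics
}
```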

@ddysher
Contributor

ddysher commented Dec 2, 2014

I've proposed the same thing in #2675.

@lavalamp The expvar package uses the fixed URL "/debug/vars", which is annoying. The downstream issue pointed to the package https://github.com/armon/go-metrics; I'll take a look.

@ddysher ddysher self-assigned this Dec 2, 2014
@bgrant0607 bgrant0607 added the priority/backlog Higher priority than priority/awaiting-more-evidence. label Dec 4, 2014
@ddysher
Contributor

ddysher commented Dec 6, 2014

Most of the existing tools try to daemonize the stats/metrics process, which is not what we are looking for. I'd like a simple package that lets us expose metrics per component/server; it's probably easy enough to implement our own.

Speaking of which, running the stats collection as a per-node daemon is also an option, but I think we are not there yet.

@bgrant0607
Member

This is redundant with #621, but I'll close that one, as this has more discussion.

@bgrant0607
Member

/cc @satnam6502

@bgrant0607 bgrant0607 added the sig/scalability Categorizes an issue or PR as relevant to SIG Scalability. label Dec 16, 2014
@bgrant0607
Member

#415 is related.

@bgrant0607
Member

#490 is also somewhat related.

@bgrant0607
Member

/cc @nikhiljindal

@bgrant0607
Member

cc @rsokolowski

@a-robinson
Contributor

I assume our goal is to add an HTTP endpoint to these jobs so that it's easy for third parties to do what they will with the metrics, right? That's a much more extensible approach than having to add new plugins or libraries to kubernetes to support exporting to different metric aggregators.

Assuming that's the case, has anyone looked much into or worked with any of the existing libraries we could use for this? I see the following from a quick search:

  1. expvar - the standard? It looks somewhat limited in that it doesn't allow for label fields beyond using a map. It also doesn't support distributions.
  2. https://github.com/rcrowley/go-metrics - the most popular library I found on Google by a long shot. Supports a number of different metric types, including distributions. Doesn't appear to support label fields at all, or have a default way of exposing an HTTP endpoint. rcrowley has said he'd be willing to merge support for exposing through expvar, but has ignored all pull requests for the last few months.
  3. https://github.com/codahale/metrics - a small wrapper around expvar. Supports distributions by periodically turning them into percentile histograms and exporting the percentiles as gauges. No label fields or maps.
  4. https://github.com/armon/go-metrics - pretty standard counter, gauge, and distribution metrics, mostly just seems notable for its large number of output formats.

@thockin
Member Author

thockin commented Feb 5, 2015

AFAIK nobody has looked at anything here. I want to see things like, for example, a histogram of query response time or a timer stack for scheduling or a count of bindings processed...
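
For concreteness, a request-latency histogram and a bindings counter along these lines could look roughly like the sketch below, using the Prometheus Go client that later comments in this thread say the components ended up exposing; the metric names, labels, and handler are illustrative only.

```go
package main

import (
	"net/http"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// Illustrative metrics; the real apiserver/scheduler metric names differ.
var (
	requestLatency = prometheus.NewHistogramVec(
		prometheus.HistogramOpts{
			Name:    "apiserver_request_duration_seconds",
			Help:    "Response latency distribution, by verb and resource.",
			Buckets: prometheus.DefBuckets,
		},
		[]string{"verb", "resource"},
	)
	bindingsProcessed = prometheus.NewCounter(prometheus.CounterOpts{
		Name: "scheduler_bindings_processed_total",
		Help: "Number of pod bindings processed.",
	})
)

func init() {
	prometheus.MustRegister(requestLatency, bindingsProcessed)
}

// handlePods records how long each request took, labeled by verb and resource.
func handlePods(w http.ResponseWriter, r *http.Request) {
	defer func(start time.Time) {
		requestLatency.WithLabelValues(r.Method, "pods").Observe(time.Since(start).Seconds())
	}(time.Now())
	w.Write([]byte("ok"))
}

func main() {
	bindingsProcessed.Inc()
	http.HandleFunc("/api/v1/pods", handlePods)
	http.Handle("/metrics", promhttp.Handler())
	http.ListenAndServe(":8080", nil)
}
```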

@xiang90
Contributor

xiang90 commented Feb 5, 2015

@ddysher expvar does not use a fixed url.

@thockin @a-robinson For publishing metrics over HTTP, etcd uses https://github.com/codahale/metrics. We looked at other options too, but they weren't suitable for a simple HTTP metrics endpoint.
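
To illustrate the point about the URL: the stdlib handler is registered at /debug/vars on the default mux, but the published variables can be served from any path by walking them with expvar.Do; a minimal sketch with an arbitrary path and metric name follows.

```go
package main

import (
	"expvar"
	"fmt"
	"net/http"
)

// metricsHandler serves all published expvar variables as JSON from an
// arbitrary path, rather than relying on the built-in /debug/vars route.
func metricsHandler(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "application/json; charset=utf-8")
	fmt.Fprint(w, "{")
	first := true
	expvar.Do(func(kv expvar.KeyValue) {
		if !first {
			fmt.Fprint(w, ",")
		}
		first = false
		fmt.Fprintf(w, "%q: %s", kv.Key, kv.Value)
	})
	fmt.Fprint(w, "}")
}

func main() {
	expvar.NewInt("requests_total").Add(1)
	http.HandleFunc("/metrics", metricsHandler) // the path is up to the component
	http.ListenAndServe(":8080", nil)
}
```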

@thockin
Member Author

thockin commented Feb 5, 2015

I don't have a preference up front; I just want something that is a) human-friendly, b) machine-parseable, c) unambiguous, and d) reasonably self-describing.

@a-robinson
Contributor

@xiang90, you do? All I see in etcd's repo is this custom wrapper around expvar. Maybe it was inspired by codahale's library (minus the histogram stuff), but it doesn't look like you're using codahale's stuff directly.

@xiang90
Contributor

xiang90 commented Feb 5, 2015

@a-robinson
https://github.com/coreos/etcd/blob/210/wal/wal.go#L332
https://github.com/coreos/etcd/blob/210/rafthttp/entry_reader.go#L43
https://github.com/coreos/etcd/blob/210/rafthttp/entry_reader.go#L69.

Actually, we first went with go-metrics and found it was overkill for an HTTP endpoint.
Then we tried wrapping expvar so we could go the standard way without introducing a dependency.
Later we wanted to support histograms and clean up our wrapper, and we realized that what we came up with was almost the same as coda's implementation.

@a-robinson
Contributor

Ah, I'm sorry. GitHub hadn't indexed your change yet, so my searches for codahale and for metrics didn't turn that up.

Other than its support for distributions, the codahale library looks worse than directly using expvar. It uses global mutexes for all updates on counters, gauges, and histograms (one mutex for each category), and it doesn't seem to make the interfaces any nicer than expvar's.

I'd propose just using expvar directly for everything other than histograms, for which we could try using the implementation in rcrowley's larger library or codahale's hdrhistogram, with a bit of added logic to export them through expvar.

I'll throw a few simple metrics into the apiserver as a proof of concept.
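
A sketch of what that could look like, using codahale's hdrhistogram and exporting a few percentiles through an expvar.Func; the metric name, bounds, and choice of microseconds are assumptions, not a worked-out proposal:

```go
package main

import (
	"expvar"
	"net/http"
	"sync"
	"time"

	"github.com/codahale/hdrhistogram"
)

// latencyHist wraps an HDR histogram behind a mutex (the histogram itself is
// not goroutine-safe) and publishes selected percentiles via expvar.
type latencyHist struct {
	mu   sync.Mutex
	hist *hdrhistogram.Histogram
}

func newLatencyHist(name string) *latencyHist {
	l := &latencyHist{hist: hdrhistogram.New(1, 60000000, 3)} // 1us .. 60s, 3 sig figs
	expvar.Publish(name, expvar.Func(func() interface{} {
		l.mu.Lock()
		defer l.mu.Unlock()
		return map[string]interface{}{
			"count": l.hist.TotalCount(),
			"p50":   l.hist.ValueAtQuantile(50),
			"p99":   l.hist.ValueAtQuantile(99),
		}
	}))
	return l
}

func (l *latencyHist) Observe(d time.Duration) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.hist.RecordValue(int64(d / time.Microsecond))
}

func main() {
	h := newLatencyHist("apiserver_request_latency_us")
	h.Observe(1500 * time.Microsecond)
	http.ListenAndServe(":8080", nil) // percentiles show up under /debug/vars
}
```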

@xiang90
Contributor

xiang90 commented Feb 5, 2015

It uses global mutexes for all updates on counters, gauges, and histograms (one mutex for each category), and it doesn't seem to make the interfaces any nicer than expvar's.

Sure. We are not worried about the performance of stats at the moment (since it is low-frequency and low-contention).
We need the wrapper mainly because we want to be able to remove a metric and to support duplicate metrics; expvar simply panics in those cases. Other than that, expvar is good enough.

@davidopp
Member

Here are a few other ideas. Maybe we should put this stuff in a doc... (a rough sketch of a couple of the scheduler-related items follows the list)

  • etcd
    • read/write rate
    • is there some kind of queue of reads and/or writes waiting? (could collect this from the "front" of etcd or the "back" of apiserver, since apiserver is etcd's only client)
    • other custom metrics exported by etcd? (e.g. raft propagation lag to replicas, # leader elections per last 1m/15m/hour, # replicas up/down, ... obviously these don't matter until we are running multiple etcd instances)
  • conflicts
    • client-specified resourceVersion out-of-date, detected at API server
    • IIUC we don't use CAS at etcd level yet, so there are no version number conflicts at etcd yet?
  • scheduler
    • time to complete most recent loop of trying to assign all unassigned pods
    • number of unassigned pods (mentioned earlier) - break down between "just arrived and we haven't tried scheduling it yet" from "has been in the queue because there just isn't any machine where it fits", but also have an undifferentiated count from API server viewpoint to detect when the scheduler has deadlocked/livelocked
    • average waiting time between unassigned and assigned - need to exclude pods that we've tried to assign but don't fit
    • assignments considered OK by scheduler but rejected by Kubelet

    • note: can we measure this stuff in the API server rather than the scheduler, so we get "for free" detecting that the scheduler has disappeared/died?
  • end-to-end latency from pod created to pod status == ready (need to filter out pods that were not feasible on the first try)
  • NodeController
    • machines in NodeReady=={True, False, Unknown}

    • compare # of machines registered in cloud provider (that should turn into k8s Nodes) vs. # of k8s Nodes
  • kubelet
    • stats on package pull times
    • we could do various pod-level stats like restarts, container crashes, etc.

all components

  • restarts in the last 1m/15m/60m

  • resource usage and usage growth, as measured by cAdvisor stats
  • any kinds of go-level measurements (number of goroutines or whatever?)
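
As a rough illustration of how a couple of the scheduler-level items above might be expressed (metric names and values are made up, and a real component would surface these through whatever endpoint we settle on):

```go
package main

import (
	"github.com/prometheus/client_golang/prometheus"
)

// Made-up metric names for two of the ideas above.
var (
	unassignedPods = prometheus.NewGauge(prometheus.GaugeOpts{
		Name: "scheduler_unassigned_pods",
		Help: "Pods currently without a node assignment.",
	})
	podStartupLatency = prometheus.NewHistogram(prometheus.HistogramOpts{
		Name:    "pod_startup_duration_seconds",
		Help:    "Time from pod creation to status Ready (feasible pods only).",
		Buckets: prometheus.ExponentialBuckets(0.5, 2, 10), // 0.5s up to ~256s
	})
)

func init() {
	prometheus.MustRegister(unassignedPods, podStartupLatency)
}

func main() {
	unassignedPods.Set(3)          // refreshed on every scheduling loop
	podStartupLatency.Observe(4.2) // recorded when a pod first reports Ready
	// In a real component these would be read from its metrics endpoint
	// rather than set from main.
}
```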

@vmarmol
Contributor

vmarmol commented Apr 28, 2015

Note that because the master does not register itself as a node, the usage of the pods there is not tracked. I think someone was working on fixing that.

@vmarmol
Contributor

vmarmol commented Apr 28, 2015

We measure package pull times in the Kubelet (and latency of Docker operations), but we don't have restarts, crashes, etc. We should add those.

Prometheus also exports some metrics from the Go runtime by default.

@a-robinson
Contributor

+1 on etcd metrics - since it seems to be a common pain point, more visibility would be great

As for what Prometheus automatically exports - resident and virtual memory usage, cpu usage, number of goroutines, number (and max number) of open file descriptors, and process start time, although I don't know how accurate it all is.

@lavalamp
Member

IIUC we don't use CAS at etcd level yet, so there are no version number conflicts at etcd yet?

I'm not sure what this means-- we definitely do have etcd do CAS for us. Every CAS is sent to etcd.

  • assignments considered OK by scheduler but rejected by Kubelet

Good thing to track, but scheduler has no idea about the latter, so it may not be the best place to track it. Instead, how about "rejected by apiserver", and keep a histogram of rejection reasons. This would let us see the double-scheduling rate (to verify that it's very low).

  • restarts in the last 1m/15m/60m

This should already be gettable from the status kubelet writes about the pod that the component runs in. I think we should only talk about things that aren't applicable to all components in this bug, because those things should be implemented by kubernetes on behalf of all pods in the system.

Other ideas:

  • Endpoint & replication manager controllers:
    • current queue length
    • stats on queue length
    • stats on amount of time between {item marked as dirty/inserted into queue} and {begin processing item}
    • stats on amount of time between {begin processing item} and {finish processing item}
    • throughput: {finish processing item} per second

(stats on X means: rolling median, 99th %ile, maybe average)

All of that can be added to the workqueue object-- the interface allows it to collect all that info.
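
A sketch of what such an instrumented queue could look like; this is a hypothetical FIFO with made-up metric names, not the actual workqueue package:

```go
package main

import (
	"sync"
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

var (
	queueDepth = prometheus.NewGauge(prometheus.GaugeOpts{
		Name: "controller_queue_depth",
		Help: "Items waiting to be processed.",
	})
	queueWait = prometheus.NewSummary(prometheus.SummaryOpts{
		Name: "controller_queue_wait_seconds",
		Help: "Time between enqueue and dequeue.",
	})
	workDuration = prometheus.NewSummary(prometheus.SummaryOpts{
		Name: "controller_work_duration_seconds",
		Help: "Time to process a dequeued item.",
	})
)

func init() { prometheus.MustRegister(queueDepth, queueWait, workDuration) }

type item struct {
	key      string
	enqueued time.Time
}

type instrumentedQueue struct {
	mu    sync.Mutex
	items []item
}

// Add enqueues a key and records the new queue depth.
func (q *instrumentedQueue) Add(key string) {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.items = append(q.items, item{key, time.Now()})
	queueDepth.Set(float64(len(q.items)))
}

// Process pops one item, records its time in the queue, and times the work.
func (q *instrumentedQueue) Process(work func(key string)) {
	q.mu.Lock()
	if len(q.items) == 0 {
		q.mu.Unlock()
		return
	}
	it := q.items[0]
	q.items = q.items[1:]
	queueDepth.Set(float64(len(q.items)))
	q.mu.Unlock()

	queueWait.Observe(time.Since(it.enqueued).Seconds())
	start := time.Now()
	work(it.key)
	workDuration.Observe(time.Since(start).Seconds())
}

func main() {
	q := &instrumentedQueue{}
	q.Add("default/my-service")
	q.Process(func(key string) { /* sync endpoints for key */ })
}
```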

@davidopp
Member

Thanks for the feedback. Yeah, sorry about my confusion about etcd index vs. apiserver resourceVersion.

@a-robinson

As for what Prometheus automatically exports - resident and virtual memory usage, cpu usage,
number of goroutines, number (and max number) of open file descriptors, and process start time,
although I don't know how accurate it all is.

Oh that's cool, I thought the only stuff we were exporting was the HTTP handlers you explicitly instrumented (PRs listed earlier in this issue). How do I access the Prometheus stats that are exported? And is every k8s component (api server, scheduler, controller manager, kubelet, etc.) linked with Prometheus, so we're getting these stats for every component?

@lavalamp

Instead, how about "rejected by apiserver", and keep a histogram of rejection reasons. This would let us see the double-scheduling rate (to verify that it's very low).

Yeah sorry, I didn't mean to imply this would be counted by the scheduler (I should have labeled that section "scheduling", not "scheduler"). What I had in mind was something like exporting a count of the number of pods (cluster-wide) that have gone to phase podFailed with one of the message strings listed in handleNotFittingPods(). Does that sound reasonable? BTW, I wasn't clear on what you meant by "rejected by apiserver".

@lavalamp

I think we should only talk about things that aren't applicable to all components in this bug, because
those things should be implemented by kubernetes on behalf of all pods in the system.

I guess it depends on whether you view this bug as being about "requirements" or "implementation." I was hoping this issue could basically be "additional instrumentation needed for 1.0 to make us feel comfortable that users will have enough information to debug problems." In that sense I think it doesn't matter whether it's something we only do in one component or in all components.

@fgrzadkowski
Contributor

@davidopp All of our components (i.e. apiserver, scheduler, controller manager, kubelet) export Prometheus metrics on the /metrics handler. I believe it's enabled everywhere now.
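
For reference, with the current client_golang API the handler registration is a one-liner, and the default registry already includes the Go and process collectors mentioned above (the port is illustrative):

```go
package main

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
	// The default registry ships with the Go and process collectors, so
	// goroutine counts, memory stats, open file descriptors, and process
	// start time are exported without any extra instrumentation.
	http.Handle("/metrics", promhttp.Handler())
	http.ListenAndServe(":8080", nil)
}
```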

@roberthbailey
Contributor

It sounds like this issue might be resolved. @davidopp / @a-robinson can you verify that we have metrics available, and then we can file individual issues if we believe any further metrics are necessary for v1.0?

@wojtek-t
Member

wojtek-t commented May 7, 2015

+1 for @roberthbailey - I think we already have a bunch of different metrics. If we find something missing, we can file another issue.

@davidopp
Member

davidopp commented May 7, 2015

IMO this issue isn't finished. Maybe we can move it out of 1.0, but there is a long list of metrics that we don't have yet that will be useful for debugging production clusters (for example see my earlier comment in this issue:
#1625 (comment)
)

I'd rather not file individual issues right now as that will just explode the number of open issues.

@a-robinson
Contributor

I'd definitely be interested in scheduling latency as well as more visibility into etcd, such as errors broken down by type and number of open watches, but after thinking a little more I'm not convinced anything more is absolutely needed for 1.0.

@roberthbailey roberthbailey modified the milestones: v1.0-post, v1.0 May 12, 2015
@roberthbailey
Contributor

Thanks @a-robinson. Moving this to the v1.0-post milestone so that we can follow up with all of the great ideas for metrics discussed herein.

@davidopp
Member

We did an experiment today and discovered that it actually is possible to access the /metrics endpoint on the master components (previously we had thought not, because the master node does not register as a cluster node).

One way to do this (for the apiserver; the same should work for other master components, but you need to know their port number instead of 8080 -- see pkg/master/ports/ports.go for port numbers):

  1. run "kubectl proxy" (rest of example assumes it runs on port 8001)
  2. in web browser go to
    http://localhost:8001/api/v1beta3/pods
    and find the apiserver pod's selfLink, for example
    /api/v1beta3/namespaces/default/pods/kube-apiserver-kubernetes-master
  3. construct a URL from that selfLink, like so: http://localhost:8001/api/v1beta3/proxy/namespaces/default/pods/kube-apiserver-kubernetes-master:8080/metrics

/cc @dchen1107

@roberthbailey
Contributor

I think that is because today the master kubelet is registered with the cluster. It'll be interesting to see if this still works after I explicitly detach them in #6949.

@a-robinson
Contributor

Do we have any visibility into or tracking of master component crashes/restarts?

@bgrant0607 bgrant0607 removed this from the v1.0-post milestone Jul 24, 2015
@bgrant0607 bgrant0607 added the sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. label Mar 9, 2017
@bgrant0607
Member

I'm sure we could do more here, but closing in favor of more specific issues.

soltysh pushed a commit to soltysh/kubernetes that referenced this issue Aug 2, 2023
OCPBUGS-15866: remove readiness check for cache exclusion