
Instrumentation needed for kubelet, apiserver, etc #1625

Closed
thockin opened this issue Oct 7, 2014 · 46 comments
Labels
area/introspection kind/design Categorizes issue or PR as related to design. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/scalability Categorizes an issue or PR as relevant to SIG Scalability.

Comments

@thockin
Member

thockin commented Oct 7, 2014

We do not report much in the way of stats (memory used, latency, counters, etc) for our core components.

@thockin thockin added area/introspection kind/design Categorizes issue or PR as related to design. kind/enhancement labels Oct 7, 2014
@lavalamp
Member

lavalamp commented Oct 7, 2014

Let's use http://golang.org/pkg/expvar/ to publish metrics?
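
For reference, publishing a couple of counters with expvar takes only a few lines. A minimal sketch with hypothetical metric names, relying on the package's default /debug/vars endpoint:

```go
package main

import (
	"expvar"
	"net/http"
)

// Hypothetical kubelet counters; expvar serves every published variable as
// JSON from /debug/vars on http.DefaultServeMux.
var (
	podSyncs   = expvar.NewInt("kubelet_pod_syncs")
	syncErrors = expvar.NewInt("kubelet_sync_errors")
)

func main() {
	podSyncs.Add(1)
	syncErrors.Add(0)
	http.ListenAndServe(":8080", nil) // GET /debug/vars to read the metrics
}
```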

@ddysher
Contributor

ddysher commented Dec 2, 2014

I've proposed the same thing in #2675.

@lavalamp The expvar package uses the fixed URL "/debug/vars", which is annoying. The downstream issue pointed to the package https://github.com/armon/go-metrics; I'll take a look.

@ddysher ddysher self-assigned this Dec 2, 2014
@bgrant0607 bgrant0607 added the priority/backlog Higher priority than priority/awaiting-more-evidence. label Dec 4, 2014
@ddysher
Contributor

ddysher commented Dec 6, 2014

Most of the existing tools try to daemonize the stats/metrics process, which is not what we are looking for. I'd like a simple package that lets us expose metrics per component/server; it's probably easy enough to implement our own.

Speaking of which, running the stats collection as a per-node daemon is also an option, but I think we are not there yet.

@bgrant0607
Member

This is redundant with #621, but I'll close that one, as this has more discussion.

@bgrant0607
Member

/cc @satnam6502

@bgrant0607 bgrant0607 added the sig/scalability Categorizes an issue or PR as relevant to SIG Scalability. label Dec 16, 2014
@bgrant0607
Member

#415 is related.

@bgrant0607
Member

#490 is also somewhat related.

@bgrant0607
Member

/cc @nikhiljindal

@bgrant0607
Member

cc @rsokolowski

@a-robinson
Contributor

I assume our goal is to add an HTTP endpoint to these jobs so that it's easy for third parties to do what they will with the metrics, right? That's a much more extensible approach than having to add new plugins or libraries to kubernetes to support exporting to different metric aggregators.

Assuming that's the case, has anyone looked much into or worked with any of the existing libraries we could use for this? I see the following from a quick search:

  1. expvar - the standard? It looks somewhat limited in that it doesn't allow for label fields beyond using a map. It also doesn't support distributions.
  2. https://github.com/rcrowley/go-metrics - the most popular library I found on Google by a long shot. Supports a number of different metric types, including distributions. Doesn't appear to support label fields at all, or have a default way of exposing an HTTP endpoint. rcrowley has said he'd be willing to merge support for exposing through expvar, but has ignored all pull requests for the last few months.
  3. https://github.com/codahale/metrics - a small wrapper around expvar. Supports distributions by periodically turning them into percentile histograms and exporting the percentiles as gauges. No label fields or maps.
  4. https://github.com/armon/go-metrics - pretty standard counter, gauge, and distribution metrics, mostly just seems notable for its large number of output formats.

@thockin
Member Author

thockin commented Feb 5, 2015

AFAIK nobody has looked at anything here. I want to see things like, for example, a histogram of query response time or a timer stack for scheduling or a count of bindings processed...
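
For concreteness, a request-latency histogram and a bindings counter along these lines could look roughly like the sketch below, using the Prometheus Go client that later comments in this thread say the components ended up exposing; the metric names, labels, and handler are illustrative only.

```go
package main

import (
	"net/http"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// Illustrative metrics; the real apiserver/scheduler metric names differ.
var (
	requestLatency = prometheus.NewHistogramVec(
		prometheus.HistogramOpts{
			Name:    "apiserver_request_duration_seconds",
			Help:    "Response latency distribution, by verb and resource.",
			Buckets: prometheus.DefBuckets,
		},
		[]string{"verb", "resource"},
	)
	bindingsProcessed = prometheus.NewCounter(prometheus.CounterOpts{
		Name: "scheduler_bindings_processed_total",
		Help: "Number of pod bindings processed.",
	})
)

func init() {
	prometheus.MustRegister(requestLatency, bindingsProcessed)
}

// handlePods records how long each request took, labeled by verb and resource.
func handlePods(w http.ResponseWriter, r *http.Request) {
	defer func(start time.Time) {
		requestLatency.WithLabelValues(r.Method, "pods").Observe(time.Since(start).Seconds())
	}(time.Now())
	w.Write([]byte("ok"))
}

func main() {
	bindingsProcessed.Inc()
	http.HandleFunc("/api/v1/pods", handlePods)
	http.Handle("/metrics", promhttp.Handler())
	http.ListenAndServe(":8080", nil)
}
```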

@xiang90
Contributor

xiang90 commented Feb 5, 2015

@ddysher expvar does not use a fixed url.

@thockin @a-robinson For publishing metrics over HTTP, etcd uses https://github.com/codahale/metrics. We looked at other options too, but they weren't suitable for a simple HTTP metrics endpoint.
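
To illustrate the point about the URL: the stdlib handler is registered at /debug/vars on the default mux, but the published variables can be served from any path by walking them with expvar.Do; a minimal sketch with an arbitrary path and metric name follows.

```go
package main

import (
	"expvar"
	"fmt"
	"net/http"
)

// metricsHandler serves all published expvar variables as JSON from an
// arbitrary path, rather than relying on the built-in /debug/vars route.
func metricsHandler(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "application/json; charset=utf-8")
	fmt.Fprint(w, "{")
	first := true
	expvar.Do(func(kv expvar.KeyValue) {
		if !first {
			fmt.Fprint(w, ",")
		}
		first = false
		fmt.Fprintf(w, "%q: %s", kv.Key, kv.Value)
	})
	fmt.Fprint(w, "}")
}

func main() {
	expvar.NewInt("requests_total").Add(1)
	http.HandleFunc("/metrics", metricsHandler) // the path is up to the component
	http.ListenAndServe(":8080", nil)
}
```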

@thockin
Member Author

thockin commented Feb 5, 2015

I don't have a preference up front; I just want something that is a) human-friendly, b) machine-parseable, c) unambiguous, and d) reasonably self-describing.

@a-robinson
Contributor

@xiang90, you do? All I see in etcd's repo is this custom wrapper around expvar. Maybe it was inspired by codahale's library (minus the histogram stuff), but it doesn't look like you're using codahale's stuff directly.

@xiang90
Contributor

xiang90 commented Feb 5, 2015

@a-robinson
https://github.com/coreos/etcd/blob/210/wal/wal.go#L332
https://github.com/coreos/etcd/blob/210/rafthttp/entry_reader.go#L43
https://github.com/coreos/etcd/blob/210/rafthttp/entry_reader.go#L69.

Actually, we first went with go-metrics and found it was overkill for an HTTP endpoint.
Then we tried wrapping expvar so we could go the standard way without introducing a dependency.
Later we wanted to support histograms and clean up our wrapper, and we realized that what we came up with was almost the same as coda's implementation.

@a-robinson
Contributor

Ah, I'm sorry. GitHub hadn't indexed your change yet, so my searches for codahale and for metrics didn't turn that up.

Other than its support for distributions, the codahale library looks worse than directly using expvar. It uses global mutexes for all updates on counters, gauges, and histograms (one mutex for each category), and it doesn't seem to make the interfaces any nicer than expvar's.

I'd propose just using expvar directly for everything other than histograms, for which we could try using the implementation in rcrowley's larger library or codahale's hdrhistogram, with a bit of added logic to export them through expvar.

I'll throw a few simple metrics into the apiserver as a proof of concept.
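
A sketch of what that could look like, using codahale's hdrhistogram and exporting a few percentiles through an expvar.Func; the metric name, bounds, and choice of microseconds are assumptions, not a worked-out proposal:

```go
package main

import (
	"expvar"
	"net/http"
	"sync"
	"time"

	"github.com/codahale/hdrhistogram"
)

// latencyHist wraps an HDR histogram behind a mutex (the histogram itself is
// not goroutine-safe) and publishes selected percentiles via expvar.
type latencyHist struct {
	mu   sync.Mutex
	hist *hdrhistogram.Histogram
}

func newLatencyHist(name string) *latencyHist {
	l := &latencyHist{hist: hdrhistogram.New(1, 60000000, 3)} // 1us .. 60s, 3 sig figs
	expvar.Publish(name, expvar.Func(func() interface{} {
		l.mu.Lock()
		defer l.mu.Unlock()
		return map[string]interface{}{
			"count": l.hist.TotalCount(),
			"p50":   l.hist.ValueAtQuantile(50),
			"p99":   l.hist.ValueAtQuantile(99),
		}
	}))
	return l
}

func (l *latencyHist) Observe(d time.Duration) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.hist.RecordValue(int64(d / time.Microsecond))
}

func main() {
	h := newLatencyHist("apiserver_request_latency_us")
	h.Observe(1500 * time.Microsecond)
	http.ListenAndServe(":8080", nil) // percentiles show up under /debug/vars
}
```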

@xiang90
Contributor

xiang90 commented Feb 5, 2015

It uses global mutexes for all updates on counters, gauges, and histograms (one mutex for each category), and it doesn't seem to make the interfaces any nicer than expvar's.

Sure. We are not worried about the performance of stats at the moment (since it is low-frequency and low-contention).
We need the wrapper mainly because we want to be able to remove a metric and to support duplicate metrics; expvar simply panics in those cases. Other than that, expvar is good enough.

@davidopp
Member

Here are a few other ideas. Maybe we should put this stuff in a doc... (a rough sketch of a couple of the scheduler-related items follows the list)

  • etcd
    • read/write rate
    • is there some kind of queue of reads and/or writes waiting? (could collect this from the "front" of etcd or the "back" of apiserver, since apiserver is etcd's only client)
    • other custom metrics exported by etcd? (e.g. raft propagation lag to replicas, # leader elections per last 1m/15m/hour, # replicas up/down, ... obviously these don't matter until we are running multiple etcd instances)
  • conflicts
    • client-specified resourceVersion out-of-date, detected at API server
    • IIUC we don't use CAS at etcd level yet, so there are no version number conflicts at etcd yet?
  • scheduler
    • time to complete most recent loop of trying to assign all unassigned pods
    • number of unassigned pods (mentioned earlier) - break down between "just arrived and we haven't tried scheduling it yet" from "has been in the queue because there just isn't any machine where it fits", but also have an undifferentiated count from API server viewpoint to detect when the scheduler has deadlocked/livelocked
    • average waiting time between unassigned and assigned - need to exclude pods that we've tried to assign but don't fit
    • assignments considered OK by scheduler but rejected by Kubelet

    • note: can we measure this stuff in the API server rather than the scheduler, so we get "for free" detecting that the scheduler has disappeared/died?
  • end-to-end latency from pod created to pod status == ready (need to filter out pods that were not feasible on the first try)
  • NodeController
    • machines in NodeReady=={True, False, Unknown}

    • compare # of machines registered in cloud provider (that should turn into k8s Nodes) vs. # of k8s Nodes
  • kubelet
    • stats on package pull times
    • we could do various pod-level stats like restarts, container crashes, etc.

all components

  • restarts in the last 1m/15m/60m

  • resource usage and usage growth, as measured by cAdvisor stats
  • any kinds of go-level measurements (number of goroutines or whatever?)
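
As a rough illustration of how a couple of the scheduler-level items above might be expressed (metric names and values are made up, and a real component would surface these through whatever endpoint we settle on):

```go
package main

import (
	"github.com/prometheus/client_golang/prometheus"
)

// Made-up metric names for two of the ideas above.
var (
	unassignedPods = prometheus.NewGauge(prometheus.GaugeOpts{
		Name: "scheduler_unassigned_pods",
		Help: "Pods currently without a node assignment.",
	})
	podStartupLatency = prometheus.NewHistogram(prometheus.HistogramOpts{
		Name:    "pod_startup_duration_seconds",
		Help:    "Time from pod creation to status Ready (feasible pods only).",
		Buckets: prometheus.ExponentialBuckets(0.5, 2, 10), // 0.5s up to ~256s
	})
)

func init() {
	prometheus.MustRegister(unassignedPods, podStartupLatency)
}

func main() {
	unassignedPods.Set(3)          // refreshed on every scheduling loop
	podStartupLatency.Observe(4.2) // recorded when a pod first reports Ready
	// In a real component these would be read from its metrics endpoint
	// rather than set from main.
}
```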

@vmarmol
Contributor

vmarmol commented Apr 28, 2015

Note that because the master does not register itself as a node, the usage of the pods there is not tracked. I think someone was working on fixing that.

@vmarmol
Contributor

vmarmol commented Apr 28, 2015

We measure package pull times in the Kubelet (and latency of Docker operations), but we don't have restarts, crashes, etc. We should add those.

Prometheus also exports some metrics from the Go runtime by default.

@a-robinson
Contributor

+1 on etcd metrics - since it seems to be a common pain point, more visibility would be great

As for what Prometheus automatically exports - resident and virtual memory usage, cpu usage, number of goroutines, number (and max number) of open file descriptors, and process start time, although I don't know how accurate it all is.

@lavalamp
Member

IIUC we don't use CAS at etcd level yet, so there are no version number conflicts at etcd yet?

I'm not sure what this means-- we definitely do have etcd do CAS for us. Every CAS is sent to etcd.

  • assignments considered OK by scheduler but rejected by Kubelet

Good thing to track, but scheduler has no idea about the latter, so it may not be the best place to track it. Instead, how about "rejected by apiserver", and keep a histogram of rejection reasons. This would let us see the double-scheduling rate (to verify that it's very low).

  • restarts in the last 1m/15m/60m

This should already be gettable from the status kubelet writes about the pod that the component runs in. I think we should only talk about things that aren't applicable to all components in this bug, because those things should be implemented by kubernetes on behalf of all pods in the system.

Other ideas:

  • Endpoint & replication manager controllers:
    • current queue length
    • stats on queue length
    • stats on amount of time between {item marked as dirty/inserted into queue} and {begin processing item}
    • stats on amount of time between {begin processing item} and {finish processing item}
    • throughput: {finish processing item} per second

(stats on X means: rolling median, 99th %ile, maybe average)

All of that can be added to the workqueue object-- the interface allows it to collect all that info.
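
A sketch of what such an instrumented queue could look like; this is a hypothetical FIFO with made-up metric names, not the actual workqueue package:

```go
package main

import (
	"sync"
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

var (
	queueDepth = prometheus.NewGauge(prometheus.GaugeOpts{
		Name: "controller_queue_depth",
		Help: "Items waiting to be processed.",
	})
	queueWait = prometheus.NewSummary(prometheus.SummaryOpts{
		Name: "controller_queue_wait_seconds",
		Help: "Time between enqueue and dequeue.",
	})
	workDuration = prometheus.NewSummary(prometheus.SummaryOpts{
		Name: "controller_work_duration_seconds",
		Help: "Time to process a dequeued item.",
	})
)

func init() { prometheus.MustRegister(queueDepth, queueWait, workDuration) }

type item struct {
	key      string
	enqueued time.Time
}

type instrumentedQueue struct {
	mu    sync.Mutex
	items []item
}

// Add enqueues a key and records the new queue depth.
func (q *instrumentedQueue) Add(key string) {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.items = append(q.items, item{key, time.Now()})
	queueDepth.Set(float64(len(q.items)))
}

// Process pops one item, records its time in the queue, and times the work.
func (q *instrumentedQueue) Process(work func(key string)) {
	q.mu.Lock()
	if len(q.items) == 0 {
		q.mu.Unlock()
		return
	}
	it := q.items[0]
	q.items = q.items[1:]
	queueDepth.Set(float64(len(q.items)))
	q.mu.Unlock()

	queueWait.Observe(time.Since(it.enqueued).Seconds())
	start := time.Now()
	work(it.key)
	workDuration.Observe(time.Since(start).Seconds())
}

func main() {
	q := &instrumentedQueue{}
	q.Add("default/my-service")
	q.Process(func(key string) { /* sync endpoints for key */ })
}
```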

@davidopp
Member

Thanks for the feedback. Yeah, sorry about my confusion about etcd index vs. apiserver resourceVersion.

@a-robinson

As for what Prometheus automatically exports - resident and virtual memory usage, cpu usage,
number of goroutines, number (and max number) of open file descriptors, and process start time,
although I don't know how accurate it all is.

Oh that's cool, I thought the only stuff we were exporting was the HTTP handlers you explicitly instrumented (PRs listed earlier in this issue). How do I access the Prometheus stats that are exported? And is every k8s component (api server, scheduler, controller manager, kubelet, etc.) linked with Prometheus, so we're getting these stats for every component?

@lavalamp

Instead, how about "rejected by apiserver", and keep a histogram of rejection reasons. This would let us see the double-scheduling rate (to verify that it's very low).

Yeah sorry, I didn't mean to imply this would be counted by the scheduler (I should have labeled that section "scheduling", not "scheduler"). What I had in mind was something like exporting a count of the number of pods (cluster-wide) that have gone to phase podFailed with one of the message strings listed in handleNotFittingPods(). Does that sound reasonable? BTW, I wasn't clear on what you meant by "rejected by apiserver".

@lavalamp

I think we should only talk about things that aren't applicable to all components in this bug, because
those things should be implemented by kubernetes on behalf of all pods in the system.

I guess it depends on whether you view this bug as being about "requirements" or "implementation." I was hoping this issue could basically be "additional instrumentation needed for 1.0 to make us feel comfortable that users will have enough information to debug problems." In that sense I think it doesn't matter whether it's something we only do in one component or in all components.

@fgrzadkowski
Contributor

@davidopp All of our components (i.e. apiserver, scheduler, controller manager, kubelet) export Prometheus metrics on the /metrics handler. I believe it's enabled everywhere now.
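
For reference, with the current client_golang API the handler registration is a one-liner, and the default registry already includes the Go and process collectors mentioned above (the port is illustrative):

```go
package main

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
	// The default registry ships with the Go and process collectors, so
	// goroutine counts, memory stats, open file descriptors, and process
	// start time are exported without any extra instrumentation.
	http.Handle("/metrics", promhttp.Handler())
	http.ListenAndServe(":8080", nil)
}
```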

@roberthbailey
Contributor

It sounds like this issue might be resolved. @davidopp / @a-robinson can you verify that we have metrics available, and then we can file individual issues if we believe any further metrics are necessary for v1.0?

@wojtek-t
Member

wojtek-t commented May 7, 2015

+1 for @roberthbailey - I think we already have a bunch of different metrics. If we find something missing, we can file another issue.

@davidopp
Member

davidopp commented May 7, 2015

IMO this issue isn't finished. Maybe we can move it out of 1.0, but there is a long list of metrics that we don't have yet that will be useful for debugging production clusters (for example see my earlier comment in this issue:
#1625 (comment)
)

I'd rather not file individual issues right now as that will just explode the number of open issues.

@a-robinson
Contributor

I'd definitely be interested in scheduling latency as well as more visibility into etcd, such as errors broken down by type and number of open watches, but after thinking a little more I'm not convinced anything more is absolutely needed for 1.0.

@roberthbailey roberthbailey modified the milestones: v1.0-post, v1.0 May 12, 2015
@roberthbailey
Contributor

Thanks @a-robinson. Moving this to the v1.0-post milestone so that we can follow up with all of the great ideas for metrics discussed herein.

@davidopp
Member

We did an experiment today and discovered that it actually is possible to access the /metrics endpoint on the master components (previously we had thought not, because the master node does not register as a cluster node).

One way to do this (for the apiserver; the same should work for other master components, but you need to know their port number instead of 8080 -- see pkg/master/ports/ports.go for port numbers):

  1. run "kubectl proxy" (rest of example assumes it runs on port 8001)
  2. in web browser go to
    http://localhost:8001/api/v1beta3/pods
    and find the apiserver pod's selfLink, for example
    /api/v1beta3/namespaces/default/pods/kube-apiserver-kubernetes-master
  3. construct a URL from that selfLink, like so: http://localhost:8001/api/v1beta3/proxy/namespaces/default/pods/kube-apiserver-kubernetes-master:8080/metrics

/cc @dchen1107

@roberthbailey
Contributor

I think that is because today the master kubelet is registered with the cluster. It'll be interesting to see if this still works after I explicitly detach them in #6949.

@a-robinson
Contributor

Do we have any visibility into or tracking of master component crashes/restarts?

@bgrant0607 bgrant0607 removed this from the v1.0-post milestone Jul 24, 2015
@bgrant0607 bgrant0607 added the sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. label Mar 9, 2017
@bgrant0607
Member

I'm sure we could do more here, but closing in favor of more specific issues.

soltysh pushed a commit to soltysh/kubernetes that referenced this issue Aug 2, 2023
OCPBUGS-15866: remove readiness check for cache exclusion