Add a metric exposing number of objects per type #59757

Merged
merged 2 commits into from Feb 24, 2018

Conversation

@gmarek
Member

gmarek commented Feb 12, 2018

Fix #51953

Adds a goroutine that periodically checks the count of objects in etcd and publishes a metric with this data.

APIserver backed by etcdv3 exports a metric showing the number of resources per kind
@@ -284,6 +289,20 @@ func (s *DefaultStorageFactory) NewConfig(groupResource schema.GroupResource) (*
}
glog.V(3).Infof("storing %v in %v, reading as %v from %#v", groupResource, codecConfig.StorageVersion, codecConfig.MemoryVersion, codecConfig.Config)
if storageConfig.Type == storagebackend.StorageTypeETCD3 {

@liggitt

liggitt Feb 12, 2018

Member

really not a fan of leaking the storage implementation here

@gmarek

gmarek Feb 13, 2018

Member

I can just always fill those details here. It's generic enough that it should be possible to implement in an arbitrary backend.

storageConfig.ResourcesToMonitor = make(map[string]string)
}
resourcePrefix := s.ResourcePrefix(groupResource)
path := []string{"/registry"}

@liggitt

liggitt Feb 12, 2018

Member

this sets up a duplicate prefix construction path that doesn't honor the configured etcd prefix (--etcd-prefix), and assumes nesting under the API group name, which isn't a valid assumption

@gmarek

gmarek Feb 13, 2018

Member

The prefix is easily fixable; I'm not sure about the API group name - why isn't it a valid assumption (except for the core/v1 ones)?

@liggitt

liggitt Feb 13, 2018

Member

none of the kube-apiserver types place themselves under group subpaths. see test/integration/etcd/etcd_storage_path_test.go for all the various non-standard key paths used.

I don't think we should set up another place like this where we try to pull together all the etcd prefix + resource prefix config values... if we want to run metrics on storage, why not do it inside generic store and storage, where this information already lives?

@gmarek

gmarek Feb 14, 2018

Member

I was looking into that yesterday, but got pulled into different things before I was able to get to the bottom of this. Short answer is that I have no idea how the storage code is structured.

@gmarek

gmarek Feb 14, 2018

Member

The way I understand it is that Storage (the layer that talks to etcd and the place where monitoring needs to live) is created before the generic.Store(s) and gets injected into the Stores by something (a Decorator?).

So I can either modify the injection logic to also pass some data down to Storage (which has an unpleasant smell to it), or add a simple "Count" function to Storage. I'd rather do the latter, but I don't really want to make it a full-blown API-like request.

@gmarek

gmarek Feb 14, 2018

Member

The issue with the latter is that I have to either provide EXTREMELY inefficient implementations for etcdv2 and the Cacher, or just always return -1/an unimplemented error. Neither is particularly nice...

@liggitt

liggitt Feb 14, 2018

Member

or add a simple "Count" function to Storage. I'd rather do the latter, but I don't really want to make it full blown API-like request

That is the most accurate representation of what this is attempting to do. If we want to pull info out of etcd in a new way, the Storage interface is the correct place to represent that.

I have to either provide EXTREMELY inefficient implementations for etcdv2 and Cacher

Etcd2 returning an unimplemented error doesn't particularly bother me. Cacher could delegate to underlying store (as it does for most things), or add a KeyCount() function to the cache.Store interface
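liggitt's suggestion can be sketched as follows. This is a hypothetical, stdlib-only illustration (the real storage interface in k8s.io/apiserver has many more methods): the interface grows a Count method, and the caching layer delegates to the underlying store as it does for most operations.

```go
package main

import "strings"

// Storage is a hypothetical one-method slice of the real storage interface.
type Storage interface {
	Count(key string) (int64, error)
}

type etcdStore struct{ keys map[string]struct{} }

// Count returns the number of stored keys under the given prefix.
func (s etcdStore) Count(key string) (int64, error) {
	var n int64
	for k := range s.keys {
		if strings.HasPrefix(k, key) {
			n++
		}
	}
	return n, nil
}

// cacher delegates Count to the underlying store rather than answering
// from its own cache, as liggitt suggests above.
type cacher struct{ underlying Storage }

func (c cacher) Count(key string) (int64, error) { return c.underlying.Count(key) }
```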

@gmarek

gmarek Feb 14, 2018

Member

This is what I'll try to send out today, assuming I'll stop getting interrupted...

@@ -155,6 +156,8 @@ func (s *EtcdOptions) AddFlags(fs *pflag.FlagSet) {
fs.DurationVar(&s.StorageConfig.CompactionInterval, "etcd-compaction-interval", s.StorageConfig.CompactionInterval,
"The interval of compaction requests. If 0, the compaction request from apiserver is disabled.")
fs.DurationVar(&s.StorageConfig.CountMetricPollPeriod, "etcd-count-metric-poll-period", 5*time.Second, ""+

@liggitt

liggitt Feb 14, 2018

Member

I don't know how expensive a call this is, but counting every resource type every 5 seconds is more frequent than I expected. who would be a good resource to ask about this?

@gmarek

gmarek Feb 14, 2018

Member

I can put anything you prefer here. As for who would know, I guess someone from the etcd team - maybe @xiang90 still works on that?

@gmarek

gmarek Feb 15, 2018

Member

I increased it to a one-minute interval. It should be good enough, but we'll see how the scale tests behave. @shyamjvs @kubernetes/sig-scalability-misc

objectCounts = prometheus.NewGaugeVec(
prometheus.GaugeOpts{
Name: "apiserver_object_counts",
Help: "Current number of objects stored. Counts only types that have watch-cache enabled.",

@liggitt

liggitt Feb 14, 2018

Member

I don't think the comment is accurate... looks like it would work without the watch cache, right?

@gmarek

gmarek Feb 14, 2018

Member

Right.

@@ -1365,9 +1367,26 @@ func (e *Store) CompleteWithOptions(options *generic.StoreOptions) error {
)
}
if opts.CountMetricPollPeriod > 0 {
e.startObservingCount(e.DefaultQualifiedResource.String(), opts.StorageConfig.Prefix+prefix, opts.CountMetricPollPeriod)

@liggitt

liggitt Feb 14, 2018

Member

don't run forever, make sure we have a stop channel and non-nil Destroy() func that will stop this

@liggitt

liggitt Feb 14, 2018

Member

for example:

```go
stopMetricsCh := make(chan struct{})
destroy := e.DestroyFunc
e.DestroyFunc = func() {
	close(stopMetricsCh)
	if destroy != nil {
		destroy()
	}
}

e.startObservingCount(..., stopMetricsCh)
```

@liggitt

liggitt Feb 14, 2018

Member

don't join opts.StorageConfig.Prefix+prefix here... the storage is responsible for dealing with prepending opts.StorageConfig.Prefix

also, don't assume prefix is the key... call e.KeyRootFunc() to get the key

@gmarek

gmarek Feb 15, 2018

Member

I started to return a destroy function from 'start...', which is functionally equivalent.

return nil
}
func (e *Store) startObservingCount(resourceName, pathPrefix string, period time.Duration) {
glog.V(2).Infof("Monitoring %v bound to %v", resourceName, pathPrefix)
go func() {

@liggitt

liggitt Feb 14, 2018

Member

simplify to go wait.Forever(...?

@gmarek

gmarek Feb 14, 2018

Member

Done.

@@ -85,6 +85,14 @@ type objState struct {
stale bool
}
func (s *store) Count(pathPrefix string) (int64, error) {

@liggitt

liggitt Feb 14, 2018

Member

I'd expect the key parameter to match that passed to GetToList(), etc, and get joined with the pathPrefix already held by store

```go
func (s *store) Count(key string) (int64, error) {
	key = path.Join(s.pathPrefix, key)
	getResp, err := s.client.KV.Get(context.Background(), key, ...
	...
}
```

@gmarek

gmarek Feb 15, 2018

Member

Done.

@liggitt

Member

liggitt commented Feb 14, 2018

@gmarek

Member

gmarek commented Feb 15, 2018

/test pull-kubernetes-kubemark-e2e-gce-big


@gmarek gmarek changed the title from WIP: add a metric exposing number of objects per type to Add a metric exposing number of objects per type Feb 15, 2018

@mbohlool

Member

mbohlool commented Feb 15, 2018

@k8s-ci-robot k8s-ci-robot requested a review from cheftako Feb 15, 2018

@@ -1365,9 +1367,34 @@ func (e *Store) CompleteWithOptions(options *generic.StoreOptions) error {
)
}
if opts.CountMetricPollPeriod > 0 {
stopFunc := e.startObservingCount(e.DefaultQualifiedResource.String(), prefix, opts.CountMetricPollPeriod)

@liggitt

liggitt Feb 16, 2018

Member

don't assume prefix is the key... call e.KeyRootFunc() to get the key

this is still outstanding

@liggitt

liggitt Feb 16, 2018

Member

also, no need to pass DefaultQualifiedResource and the output of KeyRootFunc as args, startObservingCount can derive those internally

@gmarek

gmarek Feb 16, 2018

Member

Should I change the decorator call above as well?

@shyamjvs

Member

shyamjvs commented Feb 16, 2018

I briefly looked at your PR and it looks great. However, it seems like you're getting the metric from storage (correct me if I'm wrong). You may instead want to fetch it from the watch cache directly (which should work for everything except, iirc, CRDs and events?) - I think @smarterclayton was also suggesting this earlier.

I'm not asking for it in this PR - but maybe add a TODO?

@liggitt

Member

liggitt commented Feb 16, 2018

this seems reasonable to me (assuming range checks are cheap). would like an ack on change to Storage interface from @kubernetes/sig-api-machinery-pr-reviews

@gmarek

Member

gmarek commented Feb 16, 2018

/test pull-kubernetes-kubemark-e2e-gce-big

if err != nil {
glog.V(10).Infof("Failed to update storage count metric: %v", err)
}
metrics.UpdateObjectCount(resourceName, count)

@tallclair

tallclair Feb 16, 2018

Member

This is already a polled value, so there's some expectation that it's approximate. Does prometheus have any notion of "last updated"?

resourceName := e.DefaultQualifiedResource.String()
glog.V(2).Infof("Monitoring %v bound to <storage-prefix>/%v", resourceName, prefix)
stopCh := make(chan struct{})
go wait.Until(func() {

@tallclair

tallclair Feb 16, 2018

Member

This should use JitterUntil

@gmarek

gmarek Feb 19, 2018

Member

Done.

resourceName := e.DefaultQualifiedResource.String()
glog.V(2).Infof("Monitoring %v bound to <storage-prefix>/%v", resourceName, prefix)
stopCh := make(chan struct{})
go wait.Until(func() {

@tallclair

tallclair Feb 16, 2018

Member

Do we really want a separate go routine for every resource type? As opposed to a single routine that loops through all resources (possibly waiting between each)?

@gmarek

gmarek Feb 19, 2018

Member

That's the cost of doing it at the store interface level. An alternative would be to "register" types here for the Storage to use, but it would require leaking stuff from the higher to the lower levels of the stack. I didn't like this design.

Or we could have a global registry of things to poll, but I really don't like global variables.

Functionally it's not a big difference (except that polling in a completely unsynchronized way can cause a few calls to be issued at roughly the same moment). If it proves to be an issue, it's easy enough to change to the global-registry approach.
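For illustration, the global-registry alternative gmarek describes (and rejects) might look like this hypothetical, stdlib-only sketch, where a single loop polls every registered counter:

```go
package main

import (
	"errors"
	"sync"
)

// errUnavailable is an example error for the usage below.
var errUnavailable = errors.New("storage unavailable")

// A hypothetical global registry: each store registers a count function,
// and one loop polls them all instead of running a goroutine per type.
var (
	mu       sync.Mutex
	counters = map[string]func() (int64, error){}
)

// registerCounter adds (or replaces) the count function for a resource.
func registerCounter(resource string, count func() (int64, error)) {
	mu.Lock()
	defer mu.Unlock()
	counters[resource] = count
}

// pollOnce runs every registered counter, using -1 for "no data" on
// error, matching the PR's convention.
func pollOnce() map[string]int64 {
	mu.Lock()
	defer mu.Unlock()
	out := map[string]int64{}
	for resource, count := range counters {
		if n, err := count(); err == nil {
			out[resource] = n
		} else {
			out[resource] = -1
		}
	}
	return out
}
```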

glog.V(2).Infof("Monitoring %v bound to <storage-prefix>/%v", resourceName, prefix)
stopCh := make(chan struct{})
go wait.Until(func() {
count, err := e.Storage.Count(prefix)

@tallclair

tallclair Feb 16, 2018

Member

Is there a possibility that this could take a long time? If so, I think this should use a sliding period (on JitterUntil); see above comment.

@gmarek

gmarek Feb 19, 2018

Member

Yup, done that already.

func (e *Store) startObservingCount(period time.Duration) func() {
prefix := e.KeyRootFunc(genericapirequest.NewContext())
resourceName := e.DefaultQualifiedResource.String()
glog.V(2).Infof("Monitoring %v bound to <storage-prefix>/%v", resourceName, prefix)

@sttts

sttts Feb 19, 2018

Contributor

Confusing message. Maybe s/bound/count/ ?

@gmarek

gmarek Feb 19, 2018

Member

Done.

go wait.Until(func() {
count, err := e.Storage.Count(prefix)
if err != nil {
glog.V(10).Infof("Failed to update storage count metric: %v", err)

@sttts

sttts Feb 19, 2018

Contributor

Why can this happen? 10 is pretty high for an error.

@gmarek

gmarek Feb 19, 2018

Member

It may happen if someone is using etcd2, which doesn't have a sane implementation of Count. We can treat it more seriously when we drop etcd2 support :)

@sttts

sttts Feb 19, 2018

Contributor

don't we have a NotImplemented error type?

return nil
}
// startObservingCount starts monitoring given prefix and periodically updating metrics. It return a function to stop collection.
func (e *Store) startObservingCount(period time.Duration) func() {

@sttts

sttts Feb 19, 2018

Contributor

What happens for co-habitation? Will we get two monitors?

@gmarek

gmarek Feb 19, 2018

Member

I don't know the storage code well enough to answer that. It depends on how co-habitation is implemented and whether two Stores for co-habitating resources share the same Storage object, or whether control will enter the `if Storage == nil` branch twice.

Having two monitors is not ideal, but it's also not a disaster IMO.

@sttts

sttts Feb 19, 2018

Contributor

Agree. We can live with that.

return nil
}
// startObservingCount starts monitoring given prefix and periodically updating metrics. It return a function to stop collection.

@sttts

sttts Feb 19, 2018

Contributor

nit: returns.

@gmarek

gmarek Feb 19, 2018

Member

Done.

@@ -586,6 +586,10 @@ func (h *etcdHelper) GuaranteedUpdate(
}
}
func (*etcdHelper) Count(pathPrefix string) (int64, error) {
return 0, fmt.Errorf("Count is unimplemented for etcd2!")

@sttts

sttts Feb 19, 2018

Contributor

is this an error? If we want to distinguish between an error and "not implemented", we should at least have a special pre-defined error type.

@gmarek

gmarek Feb 19, 2018

Member

Agreed, but I guess we'd need to add it to apimachinery/pkg/api/errors for it to be used consistently. I can add it in a separate PR, as I expect a lot of discussion around that.

@sttts

sttts Feb 19, 2018

Contributor

Alternative: convention that -1 means "not implemented"

@gmarek

gmarek Feb 20, 2018

Member

-1 currently means "no data", as per @liggitt's request. Also, we're not writing C, so let's not do such things ;)

@gmarek

Member

gmarek commented Feb 19, 2018

/test pull-kubernetes-kubemark-e2e-gce-big

@gmarek

Member

gmarek commented Feb 20, 2018

@sttts @liggitt - please let me know what's necessary to get this PR in before the freeze. The kubemark-big test passed, so it's not impacting performance too much.

@wojtek-t

/approve no-issue

objectCounts = prometheus.NewGaugeVec(
prometheus.GaugeOpts{
Name: "etcd_object_counts",
Help: "Current number of stored objects split by kind.",

@wojtek-t

wojtek-t Feb 21, 2018

Member

It's not "Current". It's the number of objects at some point within the last minute (or whatever period the user sets). May be worth being explicit...

@gmarek

gmarek Feb 21, 2018

Member

I will change it, but frankly no metric is ever "current" in the strict sense :)

@gmarek

Member

gmarek commented Feb 21, 2018

Ping @sttts @liggitt - I'd like to get this in before the freeze.

@liggitt

Member

liggitt commented Feb 21, 2018

would like an ack from @lavalamp or @deads2k on the Storage interface addition, since it could impact api-machinery storage plans (though it seems a pretty reasonable thing to support to me)

go wait.JitterUntil(func() {
count, err := e.Storage.Count(prefix)
if err != nil {
glog.V(10).Infof("Failed to update storage count metric: %v", err)

@deads2k

deads2k Feb 21, 2018

Contributor

This log level would include client calls. You probably want 5, just before the client calls

@gmarek

gmarek Feb 22, 2018

Member

Done.

@@ -155,6 +156,8 @@ func (s *EtcdOptions) AddFlags(fs *pflag.FlagSet) {
fs.DurationVar(&s.StorageConfig.CompactionInterval, "etcd-compaction-interval", s.StorageConfig.CompactionInterval,
"The interval of compaction requests. If 0, the compaction request from apiserver is disabled.")
fs.DurationVar(&s.StorageConfig.CountMetricPollPeriod, "etcd-count-metric-poll-period", time.Minute, ""+

@deads2k

deads2k Feb 21, 2018

Contributor

instead of time.Minute using s.StorageConfig.CountMetricPollPeriod allows overriding defaults like the other options above

@deads2k

deads2k Feb 21, 2018

Contributor

And you can still set a default in NewEtcdOptions above

@gmarek

gmarek Feb 22, 2018

Member

It's not perfect, but done.

@deads2k

Contributor

deads2k commented Feb 21, 2018

A few comments. It looks good to me otherwise. The etcd2 thing is unfortunate, but I don't think it will be common, and the retry shouldn't hot-loop.

@@ -412,6 +412,15 @@ func (s *store) GetToList(ctx context.Context, key string, resourceVersion strin
return s.versioner.UpdateList(listObj, uint64(getResp.Header.Revision), "")
}
func (s *store) Count(key string) (int64, error) {
key = path.Join(s.pathPrefix, key)
getResp, err := s.client.KV.Get(context.Background(), key, clientv3.WithRange(clientv3.GetPrefixRangeEnd(key)), clientv3.WithCountOnly())

@xiang90

xiang90 Feb 21, 2018

Contributor

we should not issue a large range request to etcd. just to count the keys, we should use the keys-only option with a limit, and maybe also pagination.

@liggitt

liggitt Feb 21, 2018

Member

is WithCountOnly not more efficient than WithKeysOnly?

@xiang90

xiang90 Feb 21, 2018

Contributor

oh. i did not see that count-only is already there. but still, if we know the count is going to be huge, we probably need pagination as well. etcd does not maintain an index for count(*), but scans everything internally.

@xiang90

xiang90 Feb 21, 2018

Contributor

doing a micro-benchmark could be helpful. if counting 1M keys takes <1s, it might be ok.

@wojtek-t

wojtek-t Feb 22, 2018

Member

If we would have 1M keys, we would have other problems than this metric...

@xiang90

xiang90 Feb 22, 2018

Contributor

counting 1M keys is just to make the test result more reliable and safer. you can divide that by 10 to get 100k for an upper bound, etc. periodically blocking etcd for more than 50ms is not good in general (for node kinds there can be 5k, for pods there can be 50k). i have not really looked into this pr, so these are just some random suggestions and thoughts.

@gmarek

gmarek Feb 22, 2018

Member

@xiang90

I can run benchmarks after the freeze, as there's not enough time now. It shouldn't be an issue, as it's a one-word change to disable this functionality.

As for pagination, I'm not sure I follow. I believe the etcd-side 'WithLimit' restricts the response size (i.e. the number of items returned), which shouldn't apply to count-only requests, as they always return a single value. Or is 'countOnly' implemented on the client side, so that 'limit' would effectively cap the returned value at, well, the limit?

Or maybe you meant doing some kind of manual pagination by dividing the range into multiple subranges, querying them separately, and combining the results?
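The manual-subrange reading of xiang90's suggestion can be sketched against any sorted key space. A hypothetical, stdlib-only illustration (a real version would issue bounded, keys-only etcd range requests and resume just after the last returned key):

```go
package main

import "sort"

// countPaginated counts keys in the half-open range [start, end) in pages
// of at most pageSize, resuming each page just after the last key seen.
// With a real backend, each page would be one bounded range request.
func countPaginated(sortedKeys []string, start, end string, pageSize int) int64 {
	var total int64
	cursor := start
	for {
		// Locate this page: keys >= cursor and < end, capped at pageSize.
		i := sort.SearchStrings(sortedKeys, cursor)
		j := sort.SearchStrings(sortedKeys, end)
		n := j - i
		if n <= 0 {
			return total
		}
		if n > pageSize {
			n = pageSize
		}
		total += int64(n)
		// Resume strictly after the last key of this page.
		cursor = sortedKeys[i+n-1] + "\x00"
	}
}
```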

@gmarek

Member

gmarek commented Feb 23, 2018

@liggitt @deads2k - can we get this in and iterate on performance if needed afterwards?

@liggitt

Member

liggitt commented Feb 23, 2018

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm label Feb 23, 2018

@k8s-ci-robot

Contributor

k8s-ci-robot commented Feb 23, 2018

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gmarek, liggitt, wojtek-t


@k8s-merge-robot

Contributor

k8s-merge-robot commented Feb 24, 2018

Automatic merge from submit-queue (batch tested with PRs 57672, 60299, 59757, 60283, 60265). If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-merge-robot k8s-merge-robot merged commit e3e954a into kubernetes:master Feb 24, 2018

13 checks passed

Submit Queue: Queued to run github e2e tests a second time.
cla/linuxfoundation: gmarek authorized
pull-kubernetes-bazel-build: Job succeeded.
pull-kubernetes-bazel-test: Job succeeded.
pull-kubernetes-cross: Skipped
pull-kubernetes-e2e-gce: Job succeeded.
pull-kubernetes-e2e-gce-device-plugin-gpu: Job succeeded.
pull-kubernetes-e2e-gke: Skipped
pull-kubernetes-e2e-kops-aws: Job succeeded.
pull-kubernetes-kubemark-e2e-gce: Job succeeded.
pull-kubernetes-node-e2e: Job succeeded.
pull-kubernetes-unit: Job succeeded.
pull-kubernetes-verify: Job succeeded.
@smarterclayton

Contributor

smarterclayton commented Feb 27, 2018

Note that this will not work correctly if we start separating resources into multiple keys, nor if there are other keys interspersed into a range that can't be accessed by name. I would have really preferred something that just remembered the last full list or checked the watch cache periodically.

@liggitt

Member

liggitt commented Feb 27, 2018

I would have really preferred something that just remembered the last full list or checked the watch cache periodically.

If you feel strongly we can revert

@wojtek-t

Member

wojtek-t commented Feb 27, 2018

I would have really preferred something that just remembered the last full list or checked the watch cache periodically.

The solution with the watch cache was also my first thought, but it requires having the watch cache enabled. And we were already discussing in the past enabling it only for some resources (e.g. for events it's not even enabled now).
The problem with remembering the last list result is that lists may not happen frequently (example: secrets - I think nothing is really listing all secrets).

That said, your argument is also a very good one. Especially since I may try to attack the problem of "making heartbeats cheaper (on the etcd side)" by separating the "heartbeat" part of the object into a separate thing in etcd (leaving the API as is).

I don't have any better suggestion now though...
