
Try yet again to add metrics about LIST handling #104983

Merged
merged 1 commit into kubernetes:master from the list-metrics-take3 branch on Sep 22, 2021

Conversation

@MikeSpreitzer (Member) commented Sep 13, 2021

What type of PR is this?

/kind feature

What this PR does / why we need it:

This PR adds metrics about how LIST requests were handled.
Some LIST requests take a lot more CPU per unit of execution duration than most requests, making concurrency limiters (the max-in-flight filter, the API Priority and Fairness (APF) filter) not very effective at managing the CPU usage of apiservers.
We are modifying the APF filter to recognize that some LIST requests are exceptionally costly, but:

  1. we need data to tune this, and
  2. we want to be able to get useful data from customers in the field.

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

This is an alternative to #104723 and is built on #104981 . That is, review of #104723 suggested doing the last bit of plumbing in a separate PR, which I just created (#104981). The PR at hand includes #104981 and will be re-based onto master once #104981 merges.

Does this PR introduce a user-facing change?

The kube-apiserver's Prometheus metrics have been extended with some that describe the costs of handling LIST requests.  They are as follows.
- *apiserver_cache_list_total*: Counter of LIST requests served from watch cache, broken down by resource_prefix and index_name
- *apiserver_cache_list_fetched_objects_total*: Counter of objects read from watch cache in the course of serving a LIST request, broken down by resource_prefix and index_name
- *apiserver_cache_list_evaluated_objects_total*: Counter of objects tested in the course of serving a LIST request from watch cache, broken down by resource_prefix
- *apiserver_cache_list_returned_objects_total*: Counter of objects returned for a LIST request from watch cache, broken down by resource_prefix
- *apiserver_storage_list_total*: Counter of LIST requests served from etcd, broken down by resource
- *apiserver_storage_list_fetched_objects_total*: Counter of objects read from etcd in the course of serving a LIST request, broken down by resource
- *apiserver_storage_list_evaluated_objects_total*: Counter of objects tested in the course of serving a LIST request from etcd, broken down by resource
- *apiserver_storage_list_returned_objects_total*: Counter of objects returned for a LIST request from etcd, broken down by resource
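As a rough illustration of how these four counters relate for a single LIST served from the watch cache, the following sketch models them in plain Go (the type and function names here are illustrative, not the PR's actual code):

```go
package main

import "fmt"

// listStats models the four counters above for one (resource, index)
// label combination.
type listStats struct {
	requests  int // apiserver_cache_list_total
	fetched   int // apiserver_cache_list_fetched_objects_total
	evaluated int // apiserver_cache_list_evaluated_objects_total
	returned  int // apiserver_cache_list_returned_objects_total
}

// serveList simulates serving one LIST from the watch cache: every fetched
// object is tested against a selector, and only matches are returned.
func (s *listStats) serveList(objects []string, selects func(string) bool) []string {
	s.requests++
	var out []string
	for _, o := range objects {
		s.fetched++
		s.evaluated++ // in the cacher, every fetched object is also evaluated
		if selects(o) {
			out = append(out, o)
		}
	}
	s.returned += len(out)
	return out
}

func main() {
	var s listStats
	pods := []string{"pod-a", "pod-b", "pod-c"}
	got := s.serveList(pods, func(name string) bool { return name != "pod-b" })
	// The gap between fetched and returned shows how much filtering work the
	// server did, which is invisible in the response size alone.
	fmt.Printf("returned=%v requests=%d fetched=%d evaluated=%d returned=%d\n",
		got, s.requests, s.fetched, s.evaluated, s.returned)
}
```

The gap between the fetched and returned counts is exactly the kind of hidden CPU cost the PR description says the APF filter cannot see today.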

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot requested a review from wojtek-t Sep 13, 2021

@k8s-ci-robot requested a review from dgrisonnet Sep 13, 2021
@MikeSpreitzer (Member, Author) commented Sep 14, 2021

To quantify the volume of metrics added, I used local-up-cluster and took a sample.

mspreitz@ubu20:~/go/src/k8s.io/kubernetes$ kubectl get --raw /metrics > /tmp/m.txt

mspreitz@ubu20:~/go/src/k8s.io/kubernetes$ wc /tmp/m.txt 
  12284   26453 1770029 /tmp/m.txt

mspreitz@ubu20:~/go/src/k8s.io/kubernetes$ grep -c apiserver_storage_list_ /tmp/m.txt
204

mspreitz@ubu20:~/go/src/k8s.io/kubernetes$ grep -c apiserver_cache_list_ /tmp/m.txt
198


@wojtek-t (Member) commented Sep 14, 2021

/assign

@dgrisonnet
/assign @jan--f @ehashman


@k8s-ci-robot (Contributor) commented Sep 14, 2021

@wojtek-t: GitHub didn't allow me to assign the following users: jan--f.

Note that only kubernetes members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/assign

@dgrisonnet
/assign @jan--f @ehashman

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.


@@ -716,9 +720,10 @@ func (c *Cacher) List(ctx context.Context, key string, opts storage.ListOptions,
	if listVal.Kind() != reflect.Slice {
		return fmt.Errorf("need a pointer to slice, got %v", listVal.Kind())
	}
	var numEvals int
@wojtek-t (Member) Sep 14, 2021

I'm still not convinced we want to have that for cacher - this will always be equal to the number of fetched.

So I would personally remove that metric.


@@ -724,13 +727,22 @@ func (s *store) List(ctx context.Context, key string, opts storage.ListOptions,
	var lastKey []byte
	var hasMore bool
	var getResp *clientv3.GetResponse
	var numRangeQueries int
@wojtek-t (Member) Sep 14, 2021

Please remove - it's not used anywhere


@MikeSpreitzer force-pushed the list-metrics-take3 branch from fb962d7 to 5759a2f Sep 14, 2021
@MikeSpreitzer (Member, Author) commented Sep 14, 2021

The force-push to 5759a2f fixes the oversights noted above and adds the intended caching of the string form of the schema.GroupResource in the store struct.


@fedebongio (Contributor) commented Sep 14, 2021

/triage accepted


@wojtek-t (Member) left a comment

I'm fine with this PR modulo one minor comment that I added.

But I would like to hear from sig-instrumentation folks before approving.
@dgrisonnet @logicalhan @ehashman - thoughts?

@dgrisonnet (Member) left a comment

One concern from my side regarding the index label.


	listCacheCount = compbasemetrics.NewCounterVec(
		&compbasemetrics.CounterOpts{
			Name:           "apiserver_cache_list_total",
			Help:           "Number of LIST requests served from watch cache",
			StabilityLevel: compbasemetrics.ALPHA,
		},
		[]string{"resource_prefix", "index"},
	)
	listCacheNumFetched = compbasemetrics.NewCounterVec(
		&compbasemetrics.CounterOpts{
			Name:           "apiserver_cache_list_fetched_objects_total",
			Help:           "Number of objects read from watch cache in the course of serving a LIST request",
			StabilityLevel: compbasemetrics.ALPHA,
		},
		[]string{"resource_prefix", "index"},
@dgrisonnet (Member) Sep 15, 2021

What would be the use case for the index label? I can see how these metrics can be useful per resource, but I don't think that having the index used to filter the list is useful to the consumers of these metrics. Also, without the actual value used when matching the index, it is difficult to come up with actual conclusions since based on the value, different objects will be fetched and returned.


@MikeSpreitzer (Member, Author) Sep 15, 2021

The use case is to understand the impact of indexing on the costs of handling LIST requests. Just knowing the index name lets us know which indices (if any) are proving helpful.


@MikeSpreitzer (Member, Author) Sep 15, 2021

I would love to have histograms instead of counters, but that was rejected.


@MikeSpreitzer (Member, Author) Sep 16, 2021

I have added a commit that dares to add one histogram. Perhaps that is not too much to ask?


@dgrisonnet (Member) Sep 16, 2021

Considering the insights you are trying to get from the index label, it makes sense to me to keep it.


Add metrics that illuminate the costs of handling LIST requests.
@MikeSpreitzer force-pushed the list-metrics-take3 branch from 5759a2f to bf42429 Sep 15, 2021
@MikeSpreitzer (Member, Author) commented Sep 15, 2021

The force-push to bf42429 adds the comment suggested by @wojtek-t above.


@MikeSpreitzer (Member, Author) commented Sep 16, 2021

BTW, I just noticed that storage/cacher already had some metrics. I missed them because they are implemented directly in that package rather than in a distinct metrics package.

@MikeSpreitzer (Member, Author) commented Sep 16, 2021

With the latest revision and hack/local-up-cluster.sh:

mspreitz@ubu20:~/go/src/k8s.io/kubernetes$ kubectl get --raw /metrics > /tmp/m.txt
mspreitz@ubu20:~/go/src/k8s.io/kubernetes$ wc /tmp/m.txt
  13659   29160 2000336 /tmp/m.txt
mspreitz@ubu20:~/go/src/k8s.io/kubernetes$ grep -c apiserver_cache_list_ /tmp/m.txt
531
mspreitz@ubu20:~/go/src/k8s.io/kubernetes$ grep -c apiserver_storage_list_ /tmp/m.txt 
204


 * the metric stability policy.
 */
var (
	listCacheNumFetched = compbasemetrics.NewHistogramVec(
@wojtek-t (Member) Sep 16, 2021

This looks a bit strange to me. I mean - not the fact that you want a histogram, but the fact that we have 1 histogram and N counters. It just looks very inconsistent to me.

If you really believe we need a histogram (I'm personally not convinced - I think per-time-interval delta is what we need here which we get from counter), then my question is why counters are enough in other cases.

I think that "either all or nothing" is the way to proceed.


@MikeSpreitzer (Member, Author) Sep 16, 2021

I break usage scenarios down into two broad categories: profiling by developers, and experience by real users in the wild. For the former, it is relatively easy to get the information we want from controlled experiments and count metrics. For metrics collected in the wild, there was only one traffic mix when a problem happens, so it is more difficult to draw conclusions. Not completely impossible, one can compare with metrics from more normal situations. But more difficult. So I do not see it as a black-and-white issue, rather a matter of degree. And there is the contrary consideration of the volume of metrics, which is also a matter of degree. So I see our task here as balancing two matters of degree.

@dgrisonnet (Member) Sep 16, 2021

I am also not convinced that having histograms is meaningful here, since they lack precision compared to what we had before with counters. To me, with these metrics you are interested in knowing exactly how many objects are fetched from the cache over a certain time frame, by resource and index, in order to measure the efficiency of indexing, and that is something you won't know from the histograms.

@MikeSpreitzer (Member, Author) Sep 16, 2021

The histogram gives strictly more information than the two counters it replaced. Those two counters have the same content as the _sum and _count time series in the histogram, and the buckets add a rough breakdown.
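That equivalence can be sketched concretely. The following toy model of a Prometheus histogram's exported series (illustrative names, not the client library's actual API) shows that the _count and _sum series accumulate exactly what the two replaced counters did, while the cumulative buckets add the rough breakdown:

```go
package main

import "fmt"

// histogram is a toy model of a Prometheus histogram's exported series.
type histogram struct {
	bounds  []float64 // upper bounds of the cumulative buckets (+Inf implied)
	buckets []uint64  // buckets[i] counts observations <= bounds[i]
	count   uint64    // same content as the old "LIST requests" counter
	sum     float64   // same content as the old "objects fetched" counter
}

func (h *histogram) observe(v float64) {
	h.count++
	h.sum += v
	for i, b := range h.bounds {
		if v <= b {
			h.buckets[i]++
		}
	}
}

func main() {
	h := &histogram{bounds: []float64{10, 100, 1000}, buckets: make([]uint64, 3)}
	// Each observation is the number of objects fetched while serving one LIST.
	for _, n := range []float64{3, 42, 950, 7} {
		h.observe(n)
	}
	// count and sum reproduce the two counters; buckets add the distribution.
	fmt.Println(h.count, h.sum, h.buckets) // 4 1002 [2 3 4]
}
```

Here count (4) and sum (1002) are the two counters, and the buckets reveal that one request fetched most of the objects, which the counters alone cannot show.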


@wojtek-t (Member) Sep 16, 2021

For metrics collected in the wild, there was only one traffic mix when a problem happens, so it is more difficult to draw conclusions. Not completely impossible, one can compare with metrics from more normal situations. But more difficult. So I do not see it as a black-and-white issue, rather a matter of degree.

Thinking more about this...
My question is: how would one use that additional information from the histogram? One can quickly see whether there were fewer larger lists or more smaller ones. And what does one get from that? How can that information be used?
So I don't think it's just a matter of degree - it's a matter of how to use those metrics.
So unless you have a clear case in mind for how exactly to use that additional information (note that we already have metrics for the number of particular calls, so users would already know "the average size of a request"), I'm actually against this histogram and I would revert to the previous version.

@wojtek-t (Member) Sep 17, 2021

The information in the buckets gives a rough idea of how effective the indices are.

This isn't particularly useful information, because we already have all the data to answer this question.
There are exactly 4 indices:

  • pods by spec.NodeName [we know how many pods are on a given node]
  • secrets by metadata.Name [always exactly 1]
  • configmaps by metadata.Name [always exactly 1]
  • nodes by metadata.Name [always exactly 1]

Users have no way to influence that: you need to update k8s code and recompile to be able to change it. So if someone is doing that, they can also add metrics if they really need them.
Also, the index is a label in your metric.

The buckets also give rough information on how the load is divided among resources.

resource_prefix is a label - so we have that information independently if this is histogram or not.

So I actually don't buy the above argument.

@MikeSpreitzer (Member, Author) Sep 17, 2021

Oh, interesting, I did not know those specifics about the actually existing indices.

On the second part, sorry, I botched what I was trying to say. The idea is that for a given resource, there can be a variety of observations depending on other factors (such as populations in namespaces). The buckets give some rough information about the distribution of that variety.


@wojtek-t (Member) Sep 17, 2021

On the second part, sorry, I botched what I was trying to say. The idea is that for a given resource, there can be a variety of observations depending on other factors (such as populations in namespaces). The buckets give some rough information about the distribution of that variety.

Sure - but the question is how we can use that information. Let's say that instead of knowing that there were 100,000 object fetches, I now know that there were 100 lists of size 100 and 9 of size 10,000. What can I use that information for?

@MikeSpreitzer (Member, Author) Sep 17, 2021

It can get useful in combination with other information. For one example, if I am investigating a situation with a few spikes in LIST latency for a particular resource, that information helps me narrow down my search for the cause(s). For another example, if I am trying to understand the distribution of LIST latencies for a given resource for the sake of tuning our coefficients in API Priority and Fairness, knowing the distribution of numbers fetched helps with that.


@wojtek-t (Member) Sep 20, 2021

Those are somewhat hypothetical examples to me. Metrics do help pin-point the approximate area of a problem, but it's really hard to debug something purely from metrics. And if we pinpoint the latency of LIST requests for a particular resource at a particular point in time, we have logs (and traces in the logs) that would allow us to actually debug it.

For the tuning example - what you described sounds a bit like an attempt to reverse-engineer, from the metrics, the expected characteristics introduced by the code. Sure - we can do that, but is that really what we want?
Also - can we really do it if we have only partial data (only the number of fetched objects, and only for lists served from the cache)? I would be much closer to buying that if all 4 things were histograms [though I'm still not convinced we need that].

@MikeSpreitzer (Member, Author) commented Sep 16, 2021

BTW, I am sure the name of the histogram metric is wrong; I am waiting for someone to tell me what the conventions imply the name should be (or to remind me where to find the relevant conventions).

@MikeSpreitzer force-pushed the list-metrics-take3 branch from 8898d3c to bf42429 Sep 20, 2021
@MikeSpreitzer (Member, Author) commented Sep 20, 2021

The force-push to bf42429 undoes the histogram addition.

@wojtek-t (Member) commented Sep 20, 2021

OK - based on @dgrisonnet's comment above that his only concern was the index label, which was clarified in the comment thread, I'm going to approve, but I'll add a hold to give other people a chance to take another look.
I'm going to unhold on Wednesday morning (my time, so before Wednesday in PST).
@logicalhan @ehashman ^^

/lgtm
/approve

/hold

@k8s-ci-robot (Contributor) commented Sep 20, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: MikeSpreitzer, wojtek-t

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


@dgrisonnet (Member) commented Sep 21, 2021

Looks good from my side 👍

/lgtm


@wojtek-t (Member) commented Sep 22, 2021

With the above lgtm and timeout, I'm unholding.

/hold cancel


@MikeSpreitzer (Member, Author) commented Sep 22, 2021

/retest


@k8s-ci-robot merged commit 5b489e2 into kubernetes:master Sep 22, 2021
15 checks passed
@k8s-ci-robot added this to the v1.23 milestone Sep 22, 2021