Switch from watermarking to counting time in bands #109066
Conversation
Also multiply the sampling period by 10, to reduce the work done for these metrics.
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: MikeSpreitzer. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
  []string{priorityLevel},
  )
  // PriorityLevelConcurrencyObserverPairGenerator creates pairs that observe concurrency for priority levels
- PriorityLevelConcurrencyObserverPairGenerator = NewSampleAndWaterMarkHistogramsPairGenerator(clock.RealClock{}, time.Millisecond,
+ PriorityLevelConcurrencyObserverPairGenerator = NewSampleAndCountHistogramsPairGenerator(clock.RealClock{}, time.Millisecond*10,
NewSampleAndWaterMarkHistograms seems to be no longer used anywhere.
Can we remove it?
Yes.
)

const (
	labelNameLB = "lb"
What is LB supposed to be? For me the first association with LB is "load-balancer", which clearly isn't the case here.
Can we be more explicit?
"lower bound".
I was thinking that if a histogram can use "le" for "Less than or Equal", this could use "lb" for "Lower Bound".
Let's call it "lower_bound"
	}
}

type SampleAndCountObserverGenerator struct {
Shouldn't SampleAndCountObserverGenerator just be an interface? (and sampleAndCountObserverGenerator just its implementation)?
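For illustration, the exported-interface-plus-unexported-implementation pattern suggested here could look like the following stdlib-only sketch. The method and constructor bodies are invented for the example and are not the real flowcontrol API:

```go
package main

import "fmt"

// SampleAndCountObserverGenerator is the exported interface; callers depend
// only on this, never on the concrete type.
type SampleAndCountObserverGenerator interface {
	// Generate is an illustrative method, not the real API.
	Generate(labelValues ...string) string
}

// sampleAndCountObserverGenerator is the unexported implementation.
type sampleAndCountObserverGenerator struct {
	prefix string
}

func (g *sampleAndCountObserverGenerator) Generate(labelValues ...string) string {
	return fmt.Sprintf("%s:%v", g.prefix, labelValues)
}

// NewSampleAndCountObserverGenerator returns the interface type, so the
// concrete struct stays private to the package.
func NewSampleAndCountObserverGenerator(prefix string) SampleAndCountObserverGenerator {
	return &sampleAndCountObserverGenerator{prefix: prefix}
}

func main() {
	g := NewSampleAndCountObserverGenerator("flowcontrol")
	fmt.Println(g.Generate("priorityLevel")) // flowcontrol:[priorityLevel]
}
```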
	labelNameLB = "lb"
)

// NewSampleAndCountHistogramsGenerator makes a new one
nit: please update function name to match the function name below
when, whenInt, acc, wellOrdered := func() (time.Time, int64, sampleAndCountAccumulator, bool) {
	saw.Lock()
	defer saw.Unlock()
	// Moved these variables here to tiptoe around https://github.com/golang/go/issues/43570 for #97685
The mentioned bug seems to be closed.
Can we verify whether we still need this workaround?
	ConstLabels: map[string]string{phase: "executing"},
	StabilityLevel: compbasemetrics.ALPHA,
},
[]float64{0.9, 1},
I was thinking about it, and how about 0.5, 0.9 and 0.99?
Can you explain your thinking?
The virtue of 1 is that it tells us how much time was spent completely saturated. For a priority level with a concurrency limit of 100 or more, that is very different from --- and, I think, more interesting than --- the amount of time with at least 99% utilized.
Maybe 0.9 is pretty boring; it is really unlikely that a lot of time is spent with utilization in [0.9, 1) without this showing up in the samples.
I'm very confused by this, you're replacing a histogram with a counterVec w/ explicit buckets? Why not just reduce the buckets in the existing histogram? The only difference between a counterVec and a histogram is the aggregate metrics you get with a histogram (you get two additional summary metrics).
The reason is that histogram/summary compute quantiles from the reported observations. That is not precisely what we care about, because what we want is to say what percentage of the "real time" we were X% saturated.
@MikeSpreitzer - the value "1" was kind-of special as long as each request occupied 1 seat. Now it's no longer that special, because we may not be able to admit more requests even with occupancy less than 1.
Now - as a cluster operator I would like to be able to use those metrics not just to signal that we're out of capacity, but also to tune things, e.g. for organic growth of the load. I don't have a very strong preference about the exact numbers, but 0.5 is a kind-of useful value, and 0.9 and 0.99 are "stop-gaps".
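The distinction being argued here (seconds of wall-clock time per utilization band, rather than quantiles over periodic samples) can be sketched with plain float64 accumulators standing in for Prometheus counters. The band boundaries below are illustrative, not the ones the PR settles on:

```go
package main

import (
	"fmt"
	"sort"
)

// bandSeconds accumulates wall-clock seconds spent in each utilization band,
// which is what "percentage of real time we were X% saturated" needs; a
// histogram of periodic samples would instead give quantiles of observations.
type bandSeconds struct {
	uppers  []float64 // upper bounds of the bands, e.g. 0.5, 0.9, 0.99, 1
	seconds []float64 // plain float64s stand in for Prometheus counters here
}

func newBandSeconds(uppers []float64) *bandSeconds {
	return &bandSeconds{uppers: uppers, seconds: make([]float64, len(uppers))}
}

// record adds dt seconds to the band containing the given utilization ratio.
func (b *bandSeconds) record(ratio, dt float64) {
	i := sort.SearchFloat64s(b.uppers, ratio) // first upper bound >= ratio
	if i == len(b.uppers) {
		i-- // clamp ratios above the last bound into the top band
	}
	b.seconds[i] += dt
}

func main() {
	b := newBandSeconds([]float64{0.5, 0.9, 0.99, 1})
	b.record(0.3, 7.0)  // 7s at 30% utilization -> band <=0.5
	b.record(0.95, 2.0) // 2s at 95% utilization -> band <=0.99
	b.record(1.0, 1.0)  // 1s fully saturated -> band <=1
	fmt.Println(b.seconds) // [7 0 2 1]
}
```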
- Name: "priority_level_seat_count_watermarks",
  Help: "Watermarks of the number of seats occupied for any stage of execution (but only initial stage for WATCHes)",
  Buckets: []float64{0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1},
+ Name: "priority_level_seat_count_band_secs",
I guess we probably shouldn't be changing the type of the already exposed metric...
We should deprecate the historical one and introduce a new one instead.
@dgrisonnet @logicalhan - for thoughts
Yes, your intuition that we should deprecate the old one and introduce the new one is mostly correct, but since you are renaming this metric, you're effectively deleting the old alpha metric and creating a brand new one. Since there is a memory usage issue with the old one, I'm actually okay with this approach, but definitely we should note this in the release notes, since anyone ingesting the old metric is just going to stop receiving data.
I am not sure I understand. We are replacing the watermark histograms with a few counters, there is no doubt in my mind about changing the type.
	klog.Errorf("Time went backwards from %s to %s for labelValues=%#+v", lastSetS, whenS, saw.labelValues)
}
for acc.lastSetInt < whenInt {
	saw.samples.WithLabelValues(saw.labelValues...).Observe(acc.ratio)
I'm still struggling with this one a bit, namely: what does this metric really give us?
Once we have the counter, we know how much time we actually spent in each of the predefined buckets. So I don't really see how I would use this histogram in addition to the counter above.
The sample histograms have a complete set of buckets, the band counters are focused on just extremely high values.
But the bigger the sampling period we take, the less usable that is (as it may be completely inaccurate).
And additionally, we're not really solving the core problem, because we're still reporting a bunch of metrics here.
Also - I know that it gives us a complete set of buckets - but how will I use it? What do I get from knowing that I spent 20% of the time in the 0.2 bucket instead of the 0.3 bucket?
I guess my point is - if we add 0.5 to our counter (and maybe one more small value like 0.1 or sth), I don't know how I would ever want to use this histogram.
if wellOrdered {
	bucket := findBucket(saw.countBuckets, saw.ratio)
	if saw.lastBucket >= 0 {
		saw.counts.WithLabelValues(saw.countLabelValues[saw.lastBucket]...).Add(dt.Seconds())
This has the problem that we report the time only against a single bucket.
That seems fine as long as we name the bucket in a clear way, e.g. not just the upper end of the bucket, but rather the whole band.
Something like "0.9-1.0" (or sth like that).
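The suggested naming could be generated from the band boundaries themselves. A stdlib-only sketch (the `%g` formatting renders "0.9-1" rather than "0.9-1.0"; the exact formatting is a detail):

```go
package main

import "fmt"

// bandLabels renders each band as "lo-hi" so the label names the whole band,
// not just its upper end, as suggested in the review.
func bandLabels(uppers []float64) []string {
	labels := make([]string, len(uppers))
	lo := 0.0
	for i, hi := range uppers {
		labels[i] = fmt.Sprintf("%g-%g", lo, hi)
		lo = hi
	}
	return labels
}

func main() {
	fmt.Println(bandLabels([]float64{0.5, 0.9, 1})) // [0-0.5 0.5-0.9 0.9-1]
}
```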
is .9 representative of the 90th percentile?
Why not use a summary metric if that's the case?
See my response above - we don't want quantiles from the observations.
@MikeSpreitzer: The following tests failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
Several of the comments are rooted in the replacement of a complete histogram with counters covering just a few bands of possible values. I also find this dissatisfying. Remember that a histogram is just a collection of counters that follow a certain pattern. I wondered if I could get the behavior I want (a histogram with Add instead of Observe) by manipulating actual counters whose names, labels, and semantics follow the same pattern. I was stopped by the following thoughts. A scrape has these
I was not enthused about replicating the logic that keeps a member in a Vec for each combination of labels in use --- efficiently. Actually, this is not a blocker; the sample-and-watermark histograms file is already keeping an object per label combination in use. In a histogram, one observation causes an increment in several counters. Maybe that is not a prohibitive cost? Or maybe I could synthesize that by attacking this at a lower level that allows me to do the sums at gather time rather than Add time.
Actually, for utilization, both 0 and 1 are interesting values to distinguish from all others. So reversing the polarity only changes which one of them is not easy to distinguish. But there is another simple hack. Using utilization buckets closed on the top end (as in histograms today), have a bucket boundary at 0.999999 as well as at 1. The boundary at 1 is not even needed for a normal histogram, because the implicit
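The hack described here can be illustrated with a stdlib-only sketch of "le"-style bucket lookup (closed on the top end, as in Prometheus histograms); the helper name is invented for the example:

```go
package main

import (
	"fmt"
	"sort"
)

// bucketFor returns the "le" upper bound of the histogram bucket that a
// utilization ratio falls into, with buckets closed on the top end.
func bucketFor(uppers []float64, ratio float64) float64 {
	return uppers[sort.SearchFloat64s(uppers, ratio)]
}

func main() {
	// With a boundary at 0.999999 as well as at 1, utilization of exactly 1
	// (fully saturated) lands in its own (0.999999, 1] bucket, separate from
	// merely near-saturated values.
	uppers := []float64{0.9, 0.999999, 1}
	for _, ratio := range []float64{0.95, 0.9999, 1.0} {
		fmt.Printf("ratio %v -> bucket le=%v\n", ratio, bucketFor(uppers, ratio))
	}
}
```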
On second thought, the way to represent accumulated time is obvious --- because the accumulator is not necessarily a
On closer examination, there is no choice. The pattern that @beorn7 showed (https://github.com/MikeSpreitzer/kubernetes/blob/6b31109557c48bf985d2af7b2098c50c5412f360/staging/src/k8s.io/component-base/metrics/prometheusextension/sampling-histogram.go#L173-L183) requires the use of
Also, alarmingly, https://github.com/prometheus/client_golang/blob/8dfa334295e85f9b1e48ce862fae5f337faa6d2f/prometheus/histogram.go#L615-L616 says the
/triage accepted
@MikeSpreitzer: PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
This PR is moot; we are taking a more fundamental whack at the problem in the PR series including #110104.
What type of PR is this?
/kind feature
What this PR does / why we need it:
This PR replaces the use of watermarking with counting time spent in bands of utilization, and also increases the sampling period by a factor of 10. The goal is to reduce the amount of runtime CPU spent on these metrics, as well as reduce the volume of these metrics.
This is hoped to partially address #108272
Which issue(s) this PR fixes:
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: