Switch from watermarking to counting time in bands #109066
Conversation
Also multiply the sampling period by 10, to reduce the work done for these metrics.
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: MikeSpreitzer. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
  []string{priorityLevel},
  )
  // PriorityLevelConcurrencyObserverPairGenerator creates pairs that observe concurrency for priority levels
- PriorityLevelConcurrencyObserverPairGenerator = NewSampleAndWaterMarkHistogramsPairGenerator(clock.RealClock{}, time.Millisecond,
+ PriorityLevelConcurrencyObserverPairGenerator = NewSampleAndCountHistogramsPairGenerator(clock.RealClock{}, time.Millisecond*10,
NewSampleAndWaterMarkHistograms seems to be no longer used anywhere.
Can we remove it?
Yes.
)

const (
	labelNameLB = "lb"
What is LB supposed to be? For me the first association with LB is "load-balancer", which clearly isn't the case here.
Can we be more explicit?
"lower bound".
I was thinking that if a histogram can use "le" for "Less than or Equal", this could use "lb" for "Lower Bound".
Let's call it "lower_bound"
	}
}

type SampleAndCountObserverGenerator struct {
Shouldn't SampleAndCountObserverGenerator just be an interface? (and sampleAndCountObserverGenerator just its implementation)?
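For illustration, the exported-interface-plus-unexported-implementation pattern suggested here could look like the following stdlib-only sketch. The method and constructor bodies are invented for the example and are not the real flowcontrol API:

```go
package main

import "fmt"

// SampleAndCountObserverGenerator is the exported interface; callers depend
// only on this, never on the concrete type.
type SampleAndCountObserverGenerator interface {
	// Generate is an illustrative method, not the real API.
	Generate(labelValues ...string) string
}

// sampleAndCountObserverGenerator is the unexported implementation.
type sampleAndCountObserverGenerator struct {
	prefix string
}

func (g *sampleAndCountObserverGenerator) Generate(labelValues ...string) string {
	return fmt.Sprintf("%s:%v", g.prefix, labelValues)
}

// NewSampleAndCountObserverGenerator returns the interface type, so the
// concrete struct stays private to the package.
func NewSampleAndCountObserverGenerator(prefix string) SampleAndCountObserverGenerator {
	return &sampleAndCountObserverGenerator{prefix: prefix}
}

func main() {
	g := NewSampleAndCountObserverGenerator("flowcontrol")
	fmt.Println(g.Generate("priorityLevel")) // flowcontrol:[priorityLevel]
}
```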
	labelNameLB = "lb"
)

// NewSampleAndCountHistogramsGenerator makes a new one
nit: please update function name to match the function name below
when, whenInt, acc, wellOrdered := func() (time.Time, int64, sampleAndCountAccumulator, bool) {
	saw.Lock()
	defer saw.Unlock()
	// Moved these variables here to tiptoe around https://github.com/golang/go/issues/43570 for #97685
The mentioned bug seems to be closed.
Can we verify whether we still need this workaround?
	ConstLabels: map[string]string{phase: "executing"},
	StabilityLevel: compbasemetrics.ALPHA,
},
[]float64{0.9, 1},
I was thinking about it, and how about 0.5, 0.9 and 0.99?
Can you explain your thinking?
The virtue of 1 is that it tells us how much time was spent completely saturated. For a priority level with a concurrency limit of 100 or more, that is very different from --- and, I think, more interesting than --- the amount of time with at least 99% utilized.
Maybe 0.9 is pretty boring; it is really unlikely that a lot of time is spent with utilization in [0.9, 1) without this showing up in the samples.
I'm very confused by this, you're replacing a histogram with a counterVec w/ explicit buckets? Why not just reduce the buckets in the existing histogram? The only difference between a counterVec and a histogram is the aggregate metrics you get with a histogram (you get two additional summary metrics).
The reason is that histogram/summary compute quantiles from the reported observations. That is not precisely what we care about, because what we want is to say what percentage of the "real time" we were X% saturated.
@MikeSpreitzer - the value "1" was kind-of special as long as each request occupied 1 seat. Now it's no longer that special, because we may not be able to admit more requests even with occupancy less than 1.
Now - as a cluster operator I would like to be able to use those metrics not just to signal that we're out of capacity, but also to tune things, e.g. for organic growth of the load. I don't have a very strong preference about the exact numbers, but 0.5 is a kind-of useful value, and 0.9 and 0.99 are "stop-gaps".
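The distinction being argued here (seconds of wall-clock time per utilization band, rather than quantiles over periodic samples) can be sketched with plain float64 accumulators standing in for Prometheus counters. The band boundaries below are illustrative, not the ones the PR settles on:

```go
package main

import (
	"fmt"
	"sort"
)

// bandSeconds accumulates wall-clock seconds spent in each utilization band,
// which is what "percentage of real time we were X% saturated" needs; a
// histogram of periodic samples would instead give quantiles of observations.
type bandSeconds struct {
	uppers  []float64 // upper bounds of the bands, e.g. 0.5, 0.9, 0.99, 1
	seconds []float64 // plain float64s stand in for Prometheus counters here
}

func newBandSeconds(uppers []float64) *bandSeconds {
	return &bandSeconds{uppers: uppers, seconds: make([]float64, len(uppers))}
}

// record adds dt seconds to the band containing the given utilization ratio.
func (b *bandSeconds) record(ratio, dt float64) {
	i := sort.SearchFloat64s(b.uppers, ratio) // first upper bound >= ratio
	if i == len(b.uppers) {
		i-- // clamp ratios above the last bound into the top band
	}
	b.seconds[i] += dt
}

func main() {
	b := newBandSeconds([]float64{0.5, 0.9, 0.99, 1})
	b.record(0.3, 7.0)  // 7s at 30% utilization -> band <=0.5
	b.record(0.95, 2.0) // 2s at 95% utilization -> band <=0.99
	b.record(1.0, 1.0)  // 1s fully saturated -> band <=1
	fmt.Println(b.seconds) // [7 0 2 1]
}
```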
- Name: "priority_level_seat_count_watermarks",
  Help: "Watermarks of the number of seats occupied for any stage of execution (but only initial stage for WATCHes)",
  Buckets: []float64{0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1},
+ Name: "priority_level_seat_count_band_secs",
I guess we probably shouldn't be changing the type of the already exposed metric...
We should deprecate the historical one and introduce a new one instead.
@dgrisonnet @logicalhan - for thoughts
Yes, your intuition that we should deprecate the old one and introduce the new one is mostly correct, but since you are renaming this metric, you're effectively deleting the old alpha metric and creating a brand new one. Since there is a memory usage issue with the old one, I'm actually okay with this approach, but definitely we should note this in the release notes, since anyone ingesting the old metric is just going to stop receiving data.
I am not sure I understand. We are replacing the watermark histograms with a few counters, there is no doubt in my mind about changing the type.
	klog.Errorf("Time went backwards from %s to %s for labelValues=%#+v", lastSetS, whenS, saw.labelValues)
}
for acc.lastSetInt < whenInt {
	saw.samples.WithLabelValues(saw.labelValues...).Observe(acc.ratio)
I'm still struggling with this one a bit, namely: what does this metric really give us?
Once we have the counter, we know how much time we actually spent in each of the predefined buckets. So I don't really see how I would use this histogram in addition to the counter above.
The sample histograms have a complete set of buckets, the band counters are focused on just extremely high values.
But the bigger the sampling period we take, the less usable that is (as it may be completely inaccurate).
And additionally, we're not really solving the core problem, because we're still reporting a bunch of metrics here.
Also - I know that it gives us a complete set of buckets - but how will I use it? What do I get from knowing that I spent 20% of the time in the 0.2 bucket instead of the 0.3 bucket?
I guess my point is - if we add 0.5 to our counter (and maybe one more small value like 0.1 or sth), I don't know how I would ever want to use this histogram.
if wellOrdered {
	bucket := findBucket(saw.countBuckets, saw.ratio)
	if saw.lastBucket >= 0 {
		saw.counts.WithLabelValues(saw.countLabelValues[saw.lastBucket]...).Add(dt.Seconds())
This has the problem that we report the time only against a single bucket.
That seems fine as long as we name the bucket in a clear way, e.g. not just the upper end of the bucket, but rather the whole band.
Something like "0.9-1.0" (or sth like that).
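The suggested naming could be generated from the band boundaries themselves. A stdlib-only sketch (the `%g` formatting renders "0.9-1" rather than "0.9-1.0"; the exact formatting is a detail):

```go
package main

import "fmt"

// bandLabels renders each band as "lo-hi" so the label names the whole band,
// not just its upper end, as suggested in the review.
func bandLabels(uppers []float64) []string {
	labels := make([]string, len(uppers))
	lo := 0.0
	for i, hi := range uppers {
		labels[i] = fmt.Sprintf("%g-%g", lo, hi)
		lo = hi
	}
	return labels
}

func main() {
	fmt.Println(bandLabels([]float64{0.5, 0.9, 1})) // [0-0.5 0.5-0.9 0.9-1]
}
```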
is .9 representative of the 90th percentile?
Why not use a summary metric if that's the case?
See my response above - we don't want quantiles from the observations.
@MikeSpreitzer: The following tests failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
Several of the comments are rooted in the replacement of a complete histogram with counters covering just a few bands of possible values. I also find this dissatisfying. Remember that a histogram is just a collection of counters that follow a certain pattern. I wondered if I could get the behavior I want (a histogram with Add instead of Observe) by manipulating actual counters whose names, labels, and semantics follow the same pattern. I was stopped by the following thoughts. A scrape has these
I was not enthused about replicating the logic that keeps a member in a Vec for each combination of labels in use --- efficiently. Actually, this is not a blocker; the sample-and-watermark histograms file is already keeping an object per label combination in use. In a histogram, one observation causes an increment in several counters. Maybe that is not a prohibitive cost? Or maybe I could synthesize that by attacking this at a lower level that allows me to do the sums at gather time rather than Add time.
Actually, for utilization, both 0 and 1 are interesting values to distinguish from all others. So reversing the polarity only changes which one of them is not easy to distinguish. But there is another simple hack. Using utilization buckets closed on the top end (as in histograms today), have a bucket boundary at 0.999999 as well as at 1. The boundary at 1 is not even needed for a normal histogram, because the implicit
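The hack described here can be illustrated with a stdlib-only sketch of "le"-style bucket lookup (closed on the top end, as in Prometheus histograms); the helper name is invented for the example:

```go
package main

import (
	"fmt"
	"sort"
)

// bucketFor returns the "le" upper bound of the histogram bucket that a
// utilization ratio falls into, with buckets closed on the top end.
func bucketFor(uppers []float64, ratio float64) float64 {
	return uppers[sort.SearchFloat64s(uppers, ratio)]
}

func main() {
	// With a boundary at 0.999999 as well as at 1, utilization of exactly 1
	// (fully saturated) lands in its own (0.999999, 1] bucket, separate from
	// merely near-saturated values.
	uppers := []float64{0.9, 0.999999, 1}
	for _, ratio := range []float64{0.95, 0.9999, 1.0} {
		fmt.Printf("ratio %v -> bucket le=%v\n", ratio, bucketFor(uppers, ratio))
	}
}
```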
On second thought, the way to represent accumulated time is obvious --- because the accumulator is not necessarily a
On closer examination, there is no choice. The pattern that @beorn7 showed (https://github.com/MikeSpreitzer/kubernetes/blob/6b31109557c48bf985d2af7b2098c50c5412f360/staging/src/k8s.io/component-base/metrics/prometheusextension/sampling-histogram.go#L173-L183) requires the use of
Also, alarmingly, https://github.com/prometheus/client_golang/blob/8dfa334295e85f9b1e48ce862fae5f337faa6d2f/prometheus/histogram.go#L615-L616 says the
/triage accepted
@MikeSpreitzer: PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
This PR is moot; we are taking a more fundamental whack at the problem in the PR series including #110104.
What type of PR is this?
/kind feature
What this PR does / why we need it:
This PR replaces the use of watermarking with counting time spent in bands of utilization, and also increases the sampling period by a factor of 10. The goal is to reduce the amount of runtime CPU spent on these metrics, as well as reduce the volume of these metrics.
This is hoped to partially address #108272
Which issue(s) this PR fixes:
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: