kube-proxy: change buckets used by NetworkProgrammingLatency #80218
Conversation
refs kubernetes/perf-tests#640
We have too fine bucket granularity for lower latencies, at the cost of the higher latencies (7+ minutes). This is causing spikes in the SLI calculated based on that metric. I don't have a strong opinion about the actual values - these seemed to better match our needs. But let's have a discussion about them. Values: 0.015 s, 0.030 s, 0.060 s, 0.120 s, 0.240 s, 0.480 s, 0.960 s, 1.920 s, 3.840 s, 7.680 s, 15.360 s, 30.720 s, 61.440 s, 122.880 s, 245.760 s, 491.520 s, 983.040 s, 1966.080 s, 3932.160 s, 7864.320 s
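For reference, a minimal sketch of how the values listed above can be reproduced with the Prometheus Go client (client_golang); this is illustrative only, not part of the PR:

```go
package main

import (
	"fmt"

	"github.com/prometheus/client_golang/prometheus"
)

func main() {
	// 20 exponential bucket boundaries starting at 15ms with a factor of 2:
	// 0.015, 0.030, 0.060, ..., 7864.320 seconds (the histogram adds an implicit +Inf bucket).
	for _, upperBound := range prometheus.ExponentialBuckets(0.015, 2, 20) {
		fmt.Printf("%.3f s\n", upperBound)
	}
}
```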
/hold
/hold cancel
If we want even less granularity for lower latencies, using 0.02 as a start would give us those buckets: 0.020 s
pkg/proxy/metrics/metrics.go
 // TODO(mm4tt): Reevaluate buckets before 1.14 release.
 // The last bucket will be [0.001s*2^20 ~= 17min, +inf)
-Buckets: prometheus.ExponentialBuckets(0.001, 2, 20),
+Buckets: prometheus.ExponentialBuckets(0.015, 2, 20),
A couple of thoughts from me:
- For the purpose of this SLI, I don't think anything smaller than 100ms is what we really care about (I don't think we will see a non-negligible number of values lower than 100ms). So I would just start with 100ms. I'm wondering if even 100ms is something we should really care about. Maybe 250ms is also enough?
- O(7800s) is also way too high to be something we care about - if programming the network takes more than 2 hours, it's unacceptable by far. I don't think we care about anything bigger than a few minutes - then it's obviously wrong anyway and I don't think it matters whether it was 20m or 2h.
- A factor of 2 is quite significant - I would suggest using something smaller than 2.
So my first suggestion would be:
(0.1, 1.5, 20) [that translates to a (100ms -> ~221s) interval]
Though ideally, I would like an even smaller step than 1.5 (in many cases there is a difference between, say, 6s and 8.5s, which would be treated equally in this case).
I also think that the SLO we will be targeting will be a value smaller than 1m (and it will probably be a somewhat round number like 30s).
So my second proposal is more custom buckets, like we did for api-call latencies:
https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apiserver/pkg/endpoints/metrics/metrics.go#L88
Here probably ranging from [0.1 or maybe even 0.25, to O(1-2m)]
So maybe something like:
[0.25, 0.5, 1, 2, 3, 4, 5, 7.5, 10, 12.5, 15, 20, 25, 30, 35, 40, 45, 50, 60, 90]
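For illustration only, a hedged sketch of how such hand-picked buckets would be wired into a Prometheus histogram; the metric name and help string below are placeholders, not the actual kube-proxy definitions:

```go
package metricsexample

import "github.com/prometheus/client_golang/prometheus"

// Sketch: explicit, hand-picked buckets in the spirit of the api-call latency metric.
var exampleNetworkProgrammingLatency = prometheus.NewHistogram(prometheus.HistogramOpts{
	Name: "example_network_programming_duration_seconds", // placeholder name
	Help: "Example histogram with custom, hand-picked buckets.",
	Buckets: []float64{
		0.25, 0.5, 1, 2, 3, 4, 5, 7.5, 10, 12.5, 15,
		20, 25, 30, 35, 40, 45, 50, 60, 90,
	},
})
```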
Thanks for your insights, Wojtek, especially around numbers.
I gave it a second thought and I agree with you and @mm4tt that exponential buckets with a factor of 2 don't provide much value. For this particular case exponential buckets might not make sense at all. As you mentioned, we don't really care about values below XXXms, and I agree it doesn't matter whether the value is 20m or 2h. However, I think we should have some buckets for values on the order of minutes. In the few samples collected so far there were values above 4 minutes. From a debugging perspective it'd be good to distinguish between broken programming (~minutes) and a broken metric/measurement (~hours).
Going with (0.1, 1.5, 20) or using the same buckets as api-call latency doesn't actually solve the problem that sparked this PR. Neither provides enough resolution for minute-scale latencies. What about adding a few more thresholds at the end, like 120, 180, 240, 360?
You mean after what I suggested:
[0.25, 0.5, 1, 2, 3, 4, 5, 7.5, 10, 12.5, 15, 20, 25, 30, 35, 40, 45, 50, 60, 90]
?
That sounds reasonable to me. The question is whether changing the number of buckets is a breaking change. Do you know?
Yes, I meant adding 120, 180, 240, 360 to what you propose. So in the end buckets would look like:
[0.25, 0.5, 1, 2, 3, 4, 5, 7.5, 10, 12.5, 15, 20, 25, 30, 35, 40, 45, 50, 60, 90, 120, 180, 240, 360]
Sorry for not being precise.
This is a breaking change in a similar way to changing the specification of the buckets. If someone queries specific buckets or labels, or has assumptions about their number baked in, changing this will break their code. However, for usual queries over Prometheus histograms this shouldn't have any effect (e.g. the current implementation of the network programming latency SLI).
Reading through the conversation above, is the suggestion to change from exponential to linear buckets? Also, as I understand it, there is really very little cost to having lots of buckets for a global metric like this...
What I'm suggesting isn't even linear - it's somewhat custom.
> Also, as I understand it, there is really very little cost to having lots of buckets for a global metric like this...
So you're suggesting making even more buckets?
I remember a discussion with Prometheus folks where they said that having more than 30 buckets would cause issues, but now that I recall it, that was for a metric with multiple labels with many potential values.
Yeah, for this counter, as I understand it, it would just be an array of N doubles, and the update operation just a handful of increments, so we might as well have 100 values to get the granularity we need.
Yeah - that sounds reasonable to me (given that we don't have any metrics on that label).
Thinking about it more, I actually think the values in large clusters may exceed 30s, because of long calls to iptables - we've seen those take 20-30s in the past.
I'm still for something a bit custom though, so let's say:
Buckets: []float64{prometheus.LinearBuckets(0.25, 0.25, 20)..., prometheus.LinearBuckets(6, 1, 55)..., prometheus.LinearBuckets(65, 5, 12)...}
(roughly, requires checking what exactly it will generate).
But it would give us really good granularity.
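As a side note, a Go composite literal can spread only one slice, so combining several LinearBuckets ranges like the rough line above needs a small concatenation helper. A hedged sketch (the mergeBuckets helper is illustrative, not code from this PR):

```go
package metricsexample

import "github.com/prometheus/client_golang/prometheus"

// mergeBuckets concatenates several ascending bucket slices into a single slice.
func mergeBuckets(slices ...[]float64) []float64 {
	var result []float64
	for _, s := range slices {
		result = append(result, s...)
	}
	return result
}

// Roughly what the suggestion above expands to:
// 0.25s-5s in 0.25s steps, 6s-60s in 1s steps, 65s-120s in 5s steps.
var suggestedBuckets = mergeBuckets(
	prometheus.LinearBuckets(0.25, 0.25, 20), // 0.25, 0.50, ..., 5.00
	prometheus.LinearBuckets(6, 1, 55),       // 6, 7, ..., 60
	prometheus.LinearBuckets(65, 5, 12),      // 65, 70, ..., 120
)
```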
I've slightly updated what you proposed, Wojtek. I removed some lower values and added more higher ones. While I don't expect we'd ever use them in the regular operation of k8s, they are useful for debugging problems around the metric itself.
prometheus.LinearBuckets(1, 1, 59),   // 1s, 2s, 3s, ... 59s
prometheus.LinearBuckets(60, 5, 12),  // 60s, 65s, 70s, ... 115s
prometheus.LinearBuckets(120, 30, 7), // 2min, 2.5min, 3min, ..., 5min
),
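For context, a hedged sketch of how a latency value would typically be observed into a histogram with buckets like the ones above; the variable and function names here are illustrative placeholders, not kube-proxy's actual code:

```go
package metricsexample

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

// exampleLatencyHistogram stands in for the real NetworkProgrammingLatency metric;
// only a subset of the discussed buckets is shown here.
var exampleLatencyHistogram = prometheus.NewHistogram(prometheus.HistogramOpts{
	Name:    "example_network_programming_duration_seconds", // placeholder
	Help:    "Example histogram; buckets abbreviated for brevity.",
	Buckets: prometheus.LinearBuckets(1, 1, 59), // 1s, 2s, ..., 59s
})

// recordLatency observes the time elapsed since a change was first seen
// until the network rules were programmed.
func recordLatency(changeSeen time.Time) {
	exampleLatencyHistogram.Observe(time.Since(changeSeen).Seconds())
}
```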
I'm fine with this.
@bowei - WDYT?
/priority important-soon
/lgtm At some point we will hopefully need more granularity at the lower end, but one can dream :-)
I would love to :) but I'm afraid we have a bunch of work to do before then... /approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: oxddr, wojtek-t
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
/retest Review the full test history for this PR.
11 similar comments
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
We have too fine bucket granularity for lower latencies, at the cost of the higher latencies (7+ minutes). This is causing spikes in the SLI calculated based on that metric.
refs kubernetes/perf-tests#640
Special notes for your reviewer:
Does this PR introduce a user-facing change?:
/sig scalability