
kube-proxy: change buckets used by NetworkProgrammingLatency #80218

Merged
merged 2 commits into kubernetes:master Aug 4, 2019

Conversation

oxddr
Contributor

@oxddr oxddr commented Jul 16, 2019

What type of PR is this?
/kind cleanup

What this PR does / why we need it:
The bucket granularity is too fine for lower latencies, at the cost of higher latencies (7+ minutes). This is causing spikes in the SLI calculated based on this metric.
refs kubernetes/perf-tests#640

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

NONE

/sig scalability

refs kubernetes/perf-tests#640

We have too fine bucket granularity for lower latencies, at the cost of higher
latencies (7+ minutes). This is causing spikes in the SLI calculated based on
this metric.

I don't have a strong opinion about the actual values - these seemed to better
match our needs. But let's have a discussion about them.

Values:

0.015 s
0.030 s
0.060 s
0.120 s
0.240 s
0.480 s
0.960 s
1.920 s
3.840 s
7.680 s
15.360 s
30.720 s
61.440 s
122.880 s
245.760 s
491.520 s
983.040 s
1966.080 s
3932.160 s
7864.320 s
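The list above is what `prometheus.ExponentialBuckets(0.015, 2, 20)` produces: `count` upper bounds starting at `start`, each multiplied by `factor`. A minimal sketch reproducing the values (the local `exponentialBuckets` helper below is a stand-in mirroring the client_golang function, not an import of it):

```go
package main

import "fmt"

// exponentialBuckets mirrors prometheus.ExponentialBuckets(start, factor, count):
// it returns count bucket upper bounds start, start*factor, start*factor^2, ...
func exponentialBuckets(start, factor float64, count int) []float64 {
	buckets := make([]float64, count)
	for i := range buckets {
		buckets[i] = start
		start *= factor
	}
	return buckets
}

func main() {
	// Reproduces the proposed bucket boundaries, 0.015 s up to ~7864.320 s.
	for _, b := range exponentialBuckets(0.015, 2, 20) {
		fmt.Printf("%.3f s\n", b)
	}
}
```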
@k8s-ci-robot k8s-ci-robot added sig/scalability Categorizes an issue or PR as relevant to SIG Scalability. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-kind Indicates a PR lacks a `kind/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Jul 16, 2019
@oxddr
Contributor Author

oxddr commented Jul 16, 2019

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 16, 2019
@k8s-ci-robot k8s-ci-robot added sig/network Categorizes an issue or PR as relevant to SIG Network. release-note-none Denotes a PR that doesn't merit a release note. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. needs-kind Indicates a PR lacks a `kind/foo` label and requires one. labels Jul 16, 2019
@oxddr
Contributor Author

oxddr commented Jul 17, 2019

/area kube-proxy
/kind cleanup
/assign @wojtek-t
/cc @mm4tt

@oxddr
Contributor Author

oxddr commented Jul 17, 2019

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 17, 2019
@oxddr
Contributor Author

oxddr commented Jul 17, 2019

If we want even less granularity for lower latencies, using 0.02 as a start would give us those buckets:

0.020 s
0.040 s
0.080 s
0.160 s
0.320 s
0.640 s
1.280 s
2.560 s
5.120 s
10.240 s
20.480 s
40.960 s
81.920 s
163.840 s
327.680 s
655.360 s
1310.720 s
2621.440 s
5242.880 s
10485.760 s

@mm4tt
Contributor

mm4tt commented Jul 17, 2019

I'm not sure I follow how this is supposed to help.
Starting with 0.015 instead of 0.001 basically just eliminates the first few buckets and instead adds a few more at the end (but those would be for recording values of 20min+, which would probably never be used / useful).


What's the gain in that?

// TODO(mm4tt): Reevaluate buckets before 1.14 release.
// The last bucket will be [0.001s*2^20 ~= 17min, +inf)
Buckets: prometheus.ExponentialBuckets(0.001, 2, 20),
Buckets: prometheus.ExponentialBuckets(0.015, 2, 20),
Member

Couple of thoughts from me:

  1. For the purpose of this SLI, I don't think anything smaller than 100ms is something we really care about (I don't think we will see a non-negligible amount of values lower than 100ms). So I would just start with 100ms.
    I'm wondering if even 100ms is something we should really care about. Maybe 250ms is also enough?

  2. O(7800s) is also way too high to be something we care about - if programming networking takes more than 2 hours, it's by far unacceptable. I don't think we care about anything bigger than a few minutes - beyond that it's obviously wrong anyway, and I don't think it matters whether it was 20m or 2h.

  3. A factor of 2 is quite a large step - I would suggest using something smaller than 2.

So my first suggestion would be:
(0.1, 1.5, 20) [that translates to the (100ms -> 221s) interval]

Though ideally, I would like an even smaller step than 1.5 (in many cases there is a difference between, say, 6s and 8.5s, which would be treated equally in this case).

I also think the SLO we will be targeting will be a value smaller than 1m (and it will probably be a somewhat round number like 30s).
So my second proposal is more custom buckets, like we did for api-call latencies:
https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apiserver/pkg/endpoints/metrics/metrics.go#L88
Here probably ranging from [0.1 or maybe even 0.25, to O(1-2m)]
So maybe something like:
[0.25, 0.5, 1, 2, 3, 4, 5, 7.5, 10, 12.5, 15, 20, 25, 30, 35, 40, 45, 50, 60, 90]

@freehan @bowei - for thoughts
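The "(100ms -> 221s)" interval claimed for (0.1, 1.5, 20) can be checked with the same exponential-bucket arithmetic; the last boundary is 0.1 * 1.5^19 ≈ 221.7s. A quick sketch (the local `exponentialBuckets` helper mirrors `prometheus.ExponentialBuckets` and is not the actual library code):

```go
package main

import "fmt"

// exponentialBuckets mirrors prometheus.ExponentialBuckets(start, factor, count).
func exponentialBuckets(start, factor float64, count int) []float64 {
	buckets := make([]float64, count)
	for i := range buckets {
		buckets[i] = start
		start *= factor
	}
	return buckets
}

func main() {
	bs := exponentialBuckets(0.1, 1.5, 20)
	// First bucket is 0.1s (100ms); the last is 0.1 * 1.5^19, roughly 221.7s.
	fmt.Printf("first=%.1fs last=%.1fs\n", bs[0], bs[len(bs)-1])
}
```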

Contributor Author

Thanks for your insights, Wojtek, especially around the numbers.

I gave it a second thought and I agree with you and @mm4tt that exponential buckets with a factor of 2 don't provide much value. For this particular case exponential buckets might not make sense at all. As you mentioned, we don't really care about values below XXXms, and I agree it doesn't matter whether the value is 20m or 2h. However, I think we should have some buckets for values on the order of minutes. In a few samples collected so far there were values above 4 minutes. From a debugging perspective it'd be good to distinguish between broken programming (~minutes) and a broken metric/measurement (~hours).

Going with (0.1, 1.5, 20) or reusing the api-call latency buckets doesn't actually solve the problem that sparked this PR: neither provides enough resolution for minute-scale latencies. What about adding a few more thresholds at the end, like 120, 180, 240, 360?

Member

You mean appending to what I suggested:
[0.25, 0.5, 1, 2, 3, 4, 5, 7.5, 10, 12.5, 15, 20, 25, 30, 35, 40, 45, 50, 60, 90]
?

That sounds reasonable to me. The question is whether changing the number of buckets is a breaking change. Do you know?

Contributor Author

Yes, I meant adding 120, 180, 240, 360 to what you proposed. So in the end the buckets would look like:
[0.25, 0.5, 1, 2, 3, 4, 5, 7.5, 10, 12.5, 15, 20, 25, 30, 35, 40, 45, 50, 60, 90, 120, 180, 240, 360]
Sorry for not being precise.

This is a breaking change in a similar way to changing the specification of the buckets: if someone queries specific buckets or labels, or has assumptions about their number baked in, this change will break their code. However, for the usual queries over Prometheus histograms it shouldn't have any effect (e.g. the current implementation of the network programming latency SLI).

Member

Reading through the conversation above, is the suggestion to change from exponential to linear buckets? Also, as I understand it, there is very little cost to having lots of buckets for a global metric like this...

Member

What I'm suggesting isn't even linear - it's somewhat custom.

Also, as I understand it, there is very little cost to having lots of buckets for a global metric like this...

So you're suggesting making even more buckets?
I remember a discussion with Prometheus folks where they said that having more than 30 buckets would cause issues, but now that I recall it, that was for a metric with multiple labels with many potential values.

Member

Yeah, for this counter, as I understand it, it would just be an array of N doubles, and the update operation is just a (handful of) increments, so we might as well have 100 values to get the granularity we need.

Member

Yeah - that sounds reasonable to me (given that we don't have any labels on that metric).
Thinking about it more, I actually think the values in large clusters may exceed 30s, because of long calls to iptables - we've seen those take 20-30s in the past.

I'm still for something a bit custom though, so let's say:

Buckets: []float64{prometheus.LinearBuckets(0.25, 0.25, 20)..., prometheus.LinearBuckets(6, 1, 55)..., prometheus.LinearBuckets(65, 5, 12)...}

(roughly - this requires checking what exactly it will generate), but it would give us really good granularity.

Contributor Author

I've slightly updated what you proposed, Wojtek. I removed some of the lower values and added more of the higher ones. While I don't expect we'd ever use them in the regular operation of k8s, they are useful for debugging problems around the metric itself.

@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jul 26, 2019
@oxddr oxddr changed the title kube-proxy: change buckets used by NetworkProgrammingLatency WIP kube-proxy: change buckets used by NetworkProgrammingLatency Jul 26, 2019
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 26, 2019
prometheus.LinearBuckets(1, 1, 59), // 1s, 2s, 3s, ... 59s
prometheus.LinearBuckets(60, 5, 12), // 60s, 65s, 70s, ... 115s
prometheus.LinearBuckets(120, 30, 7), // 2min, 2.5min, 3min, ..., 5min
),
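The excerpt above shows the final custom buckets as a concatenation of linear ranges (the excerpt omits the preceding lines, including any sub-second slice). A runnable sketch of the idea, where `linearBuckets` is a local stand-in mirroring `prometheus.LinearBuckets` and `merge` is a hypothetical concatenation helper, not the actual kube-proxy code:

```go
package main

import "fmt"

// linearBuckets mirrors prometheus.LinearBuckets(start, width, count):
// count bucket upper bounds start, start+width, start+2*width, ...
func linearBuckets(start, width float64, count int) []float64 {
	buckets := make([]float64, count)
	for i := range buckets {
		buckets[i] = start + float64(i)*width
	}
	return buckets
}

// merge concatenates bucket slices into one sorted boundary list.
func merge(slices ...[]float64) []float64 {
	var out []float64
	for _, s := range slices {
		out = append(out, s...)
	}
	return out
}

func main() {
	buckets := merge(
		linearBuckets(1, 1, 59),   // 1s, 2s, 3s, ... 59s
		linearBuckets(60, 5, 12),  // 60s, 65s, 70s, ... 115s
		linearBuckets(120, 30, 7), // 2min, 2.5min, 3min, ..., 5min
	)
	// 59 + 12 + 7 = 78 boundaries, from 1s up to 300s (5min).
	fmt.Println(len(buckets), buckets[0], buckets[len(buckets)-1])
}
```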
Member

I'm fine with this.

@bowei - WDYT?

@oxddr
Contributor Author

oxddr commented Jul 31, 2019

/priority important-soon

@k8s-ci-robot k8s-ci-robot added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Jul 31, 2019
@oxddr oxddr changed the title WIP kube-proxy: change buckets used by NetworkProgrammingLatency kube-proxy: change buckets used by NetworkProgrammingLatency Jul 31, 2019
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 31, 2019
@bowei
Member

bowei commented Aug 2, 2019

/lgtm

At some point we will hopefully need more granularity at the lower end, but one can dream :-)

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 2, 2019
@wojtek-t
Member

wojtek-t commented Aug 2, 2019

At some point we will hopefully need more granularity at the lower end, but one can dream :-)

I would love to :) but I'm afraid we have a bunch of work to do before then...

/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: oxddr, wojtek-t

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 2, 2019
@fejta-bot

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

11 similar comments

@k8s-ci-robot k8s-ci-robot merged commit 695190d into kubernetes:master Aug 4, 2019
@k8s-ci-robot k8s-ci-robot added this to the v1.16 milestone Aug 4, 2019
@oxddr oxddr deleted the kubeproxy-buckets branch October 3, 2019 09:36