Increase api latency threshold for cluster-scoped list calls #52732

shyamjvs · 2017-09-19T15:31:56Z

A recent change from @smarterclayton (#52237) added scope to apiserver metrics. As a result, our current threshold for list calls is no longer sufficient for all-namespace calls, which are now measured separately from namespaced lists. For example, from our last 5k run:

WARNING Top latency metric: {Resource:pods Subresource: Verb:LIST Scope:cluster Latency:{Perc50:4.498374s Perc90:7.548079s Perc99:8.169389s Perc100:0s} Count:1400}

cc @kubernetes/sig-scalability-misc @kubernetes/sig-api-machinery-misc @wojtek-t

dims · 2017-09-19T23:00:37Z

/test all

dims · 2017-09-20T10:40:40Z

/retest

gmarek · 2017-09-20T11:19:41Z

/lgtm

dims · 2017-09-20T20:07:38Z

/retest

dims · 2017-09-20T20:52:40Z

@liggitt @smarterclayton - Can one of you PTAL?

dims · 2017-09-20T20:55:26Z

/test pull-kubernetes-e2e-gce-gpu

lavalamp · 2017-09-20T22:18:44Z

test/e2e/framework/metrics_util.go

+	// as list response sizes are bigger in general for big clusters. We also use a higher threshold
+	// for list calls with cluster scope (all namespaces).
+	apiListCallLatencyThreshold      time.Duration = 5 * time.Second
+	apiClusterScopeListCallThreshold time.Duration = 10 * time.Second


Do you literally mean cluster scoped (i.e., node objects) or cross-namespace lists? If the latter, please choose a less confusing name.

shyamjvs · 2017-09-21

I meant the former (which includes both non-namespaced and all-namespace calls). I changed the comment to make that clearer.

lavalamp · 2017-09-20T22:20:58Z

test/e2e/framework/metrics_util.go

+			if !isListCall ||
+				!isBigCluster ||
+				(!isClusterScopedCall && latency > apiListCallLatencyThreshold) ||
+				(latency > apiClusterScopeListCallThreshold) {


This is a really confusing logic statement. I suggest one if/switch to set the threshold, and then this if should read if latency > threshold.

shyamjvs · 2017-09-21

Changed it. LG?

smarterclayton · 2017-09-21T14:07:23Z

Yay, more precise metrics! :)

spiffxp · 2017-09-21T16:37:09Z

/approve no-issue
/lgtm
per /lgtm from an approver above

@shyamjvs @gmarek a reminder that if the PR links to an issue (not PR) in the description, you don't need to /approve no-issue

k8s-github-robot · 2017-09-21T16:37:53Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gmarek, shyamjvs, spiffxp

Associated issue requirement bypassed by: spiffxp

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

~~test/OWNERS~~ [gmarek,spiffxp]

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

k8s-github-robot · 2017-09-21T16:57:07Z

/test all [submit-queue is verifying that this PR is safe to merge]

smarterclayton · 2017-09-21T17:34:52Z

/test pull-kubernetes-kubemark-e2e-gce-big

k8s-github-robot · 2017-09-21T17:49:53Z

Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here.

mbohlool · 2017-09-21T20:38:32Z

cc @jpbetz

shyamjvs added the release-note-none Denotes a PR that doesn't merit a release note. label Sep 19, 2017

shyamjvs added this to the v1.8 milestone Sep 19, 2017

shyamjvs assigned gmarek Sep 19, 2017

shyamjvs requested a review from gmarek September 19, 2017 15:31

apelisse added this to Backlog in 1.8 Failing tests Sep 19, 2017

apelisse removed this from Backlog in 1.8 Failing tests Sep 19, 2017

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 20, 2017

lavalamp reviewed Sep 20, 2017

Increase api latency threshold for cluster-scoped list calls

f373645

shyamjvs force-pushed the fix-metrics-perf-tests branch from 6738e88 to f373645 Compare September 21, 2017 11:33

k8s-github-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 21, 2017

shyamjvs mentioned this pull request Sep 21, 2017

delete pods API call latencies shot up on large cluster tests #51899

Closed

k8s-ci-robot assigned spiffxp Sep 21, 2017

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 21, 2017

k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 21, 2017

shyamjvs added the retest-not-required label Sep 21, 2017

k8s-github-robot merged commit 5424861 into kubernetes:master Sep 21, 2017

shyamjvs deleted the fix-metrics-perf-tests branch September 21, 2017 18:34
