
Randomize apiserver timeout #8574

Merged
merged 1 commit on May 22, 2015

Conversation

bprashanth
Contributor

Connections get a random timeout between min/max instead of a hard 5 minute timeout.
@lavalamp @fgrzadkowski @wojtek-t

@wojtek-t
Member

I think the proposal is great, I'm just wondering what the values of "minTimeout" and "maxTimeout" should be. Why did you choose 5 and 8 minutes - was it some guess or was it based on some data?

@bprashanth
Contributor Author

In practice I've seen the connection break at < 5m. This is something I'm currently tracking down.

That aside, I left the lower bound at whatever it currently is. For the upper bound I observed a correlation between slow lists and > 30-50 concurrent lists, so I bucketed 100 nodes, giving each its own minute (didn't experiment more than that). I think anywhere between 6-8 should be OK.

@wojtek-t
Copy link
Member

I asked because my feeling is that maxTimeout = 10 minutes = 2 * minTimeout would be more intuitive (we will be spreading the load "uniformly").
Is there any argument not to use 10 minutes? Is it too long? Or maybe we can use 4 minutes & 8 minutes?

@bprashanth
Contributor Author

Yeah, not sure, but I think this is easily changeable and requires some experimentation, so I can set it to 10 or 4 depending on what other voters prefer. The upper limit is a balance between how paranoid we want to be about dropping watch events vs how likely it is to happen on some set of nodes.

@@ -410,11 +420,12 @@ func (s *APIServer) Run(_ []string) error {
	}

	if secureLocation != "" {
		timeout := time.Duration(MinTimeoutSecs+rand.Intn(MaxTimeoutSecs-MinTimeoutSecs)) * time.Second

Don't we want the same timeout for all HTTP servers within one Kubelet?
I think it definitely makes sense to differentiate it between Kubelets, but maybe within a Kubelet they should be the same?


Why?

@lavalamp
Member

This LGTM.

@lavalamp
Member

These numbers seem fine at distributing load: http://play.golang.org/p/3FSbYtLmLT
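The playground snippet isn't reproduced here; below is a minimal sketch of that kind of check (my own, assuming it simulates reconnect spread), counting how 100 nodes drawing independent timeouts from the 5-8 minute window land in 30-second buckets:

```go
package main

import (
	"fmt"
	"math/rand"
)

const (
	minTimeoutSecs = 300 // 5 minutes
	maxTimeoutSecs = 480 // 8 minutes
)

func main() {
	// 100 nodes each draw an independent timeout; count how many reconnects
	// fall into each 30-second bucket of the 3-minute window.
	buckets := make([]int, (maxTimeoutSecs-minTimeoutSecs)/30)
	for i := 0; i < 100; i++ {
		t := minTimeoutSecs + rand.Intn(maxTimeoutSecs-minTimeoutSecs)
		buckets[(t-minTimeoutSecs)/30]++
	}
	for b, n := range buckets {
		fmt.Printf("%ds-%ds: %d reconnects\n", minTimeoutSecs+b*30, minTimeoutSecs+(b+1)*30, n)
	}
}
```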

@wojtek-t
Member

Sorry - my comment above doesn't make much sense.

However, I'm afraid I've stopped understanding this PR. Basically, what you're doing is setting the timeout at the apiserver level to a random value between 5 and 8 minutes, but it's still the same for all Kubelets. In other words, I think that instead of a burst after 5 minutes, we will have a burst after some random time between 5 and 8 minutes.
Am I missing something?

[What we would like to do is to set up the timeout at the request level in each Kubelet separately].
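For illustration only (not code from this PR, and the package and function names below are hypothetical): the per-request idea described above could wrap the watch handler so that every incoming connection draws its own timeout:

```go
package randtimeout // hypothetical package, for illustration only

import (
	"math/rand"
	"net/http"
	"time"
)

// withRandomTimeout wraps a handler so that every incoming request draws its
// own deadline uniformly from [min, max), instead of sharing one value fixed
// when the server is constructed. This is a sketch of the per-request idea,
// not the change made in this PR.
func withRandomTimeout(h http.Handler, min, max time.Duration) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		timeout := min + time.Duration(rand.Int63n(int64(max-min)))
		// Enforce the deadline with the standard library helper; real watch
		// handlers would need streaming-aware handling instead.
		http.TimeoutHandler(h, timeout, "request timed out").ServeHTTP(w, r)
	})
}
```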

@bprashanth
Contributor Author

Hold on @wojtek-t, I need an associated timeout in each watch server.

@bprashanth
Contributor Author

No joking this time >.<
PTAL (still e2e-ing)

@wojtek-t
Member

Thanks - will take a look later today after my meetings.

const (
	// Maximum duration before timing out read/write requests.
	// Set to a value larger than the timeouts in each watch server.
	ReadWriteTimeoutMins = time.Minute * 60

Remove "Mins" from name, it's a time.Duration, not a count of minutes.

@k8s-bot

k8s-bot commented May 21, 2015

Can one of the admins verify that this patch is reasonable to test? (reply "ok to test", or if you trust the user, reply "add to whitelist")

If this message is too spammy, please complain @ixdy.

@bprashanth
Contributor Author

Nits addressed, still trying to run density on a 100 node cluster (it seems to be stuck polling for fluentd pods in some pre-test phase even on head today, will debug tomorrow), though e2e passed.

@bprashanth
Contributor Author

ok to test

@wojtek-t
Member

LGTM

@lavalamp lavalamp added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 22, 2015
dchen1107 added a commit that referenced this pull request May 22, 2015
@dchen1107 dchen1107 merged commit 9772726 into kubernetes:master May 22, 2015