
Watch shouldn't break every 5 minutes #6513

Closed
wojtek-t opened this issue Apr 7, 2015 · 27 comments
Labels: priority/important-soon, sig/scalability
Milestone: v1.0

Comments

@wojtek-t (Member) commented Apr 7, 2015

Currently, the watch is broken every 5 minutes. It is then "recreated", which sometimes requires listing all elements under watch (if there were more than 1000 "uninteresting" etcd writes since then).

I dug deeper into it and it seems this is not related to etcd itself. What happens is:

  1. a component (e.g. the scheduler) starts a watch by sending an HTTP request to the apiserver
  2. the apiserver initiates a watch on etcd
  3. after 5 minutes, the request to the apiserver finishes (this is the part I don't fully understand - is it a timeout?)
  4. this triggers "canceling" of the request to etcd

cc @fgrzadkowski
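
For illustration, here is a minimal Go sketch (not the actual client code; `listAll`, `watchFrom`, and `errVersionTooOld` are hypothetical stand-ins) of the client-side pattern this behavior forces: reconnect when the watch closes, and fall back to a full re-list when the last-seen resource version has fallen out of etcd's ~1000-event window.

```go
package watchclient

import (
	"errors"
	"log"
	"time"
)

// errVersionTooOld is a hypothetical sentinel for "resource version is outside
// etcd's ~1000-event window", i.e. the watch can no longer be resumed from it.
var errVersionTooOld = errors.New("resource version too old")

// listAll and watchFrom are hypothetical stand-ins for the real client calls.
func listAll() (resourceVersion uint64, err error)     { return 0, nil }  // full LIST: expensive
func watchFrom(rv uint64) (lastSeen uint64, err error) { return rv, nil } // blocks until the watch closes

func watchForever() {
	rv, err := listAll() // the initial LIST establishes a starting resource version
	if err != nil {
		log.Fatal(err)
	}
	for {
		last, err := watchFrom(rv)
		switch {
		case err == nil:
			// The watch closed cleanly (e.g. the ~5 minute apiserver cutoff);
			// resume from the last event that was delivered.
			rv = last
		case errors.Is(err, errVersionTooOld):
			// Too many writes happened since rv, so etcd cannot replay them:
			// fall back to a full re-list.
			if rv, err = listAll(); err != nil {
				log.Printf("re-list failed: %v", err)
				time.Sleep(time.Second)
			}
		default:
			log.Printf("watch error: %v", err)
			time.Sleep(time.Second)
		}
	}
}
```

The re-list branch is the expensive part; the etcd change discussed below (etcd-io/etcd#2048) is about removing the need for it.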

@wojtek-t wojtek-t added this to the v1.0 milestone Apr 7, 2015
@wojtek-t wojtek-t added team/master, priority/important-soon, sig/scalability labels Apr 7, 2015
@wojtek-t wojtek-t self-assigned this Apr 7, 2015
@wojtek-t wojtek-t changed the title from "Watch shouldn't be break every 5 minutes" to "Watch shouldn't break every 5 minutes" Apr 7, 2015
@wojtek-t (Member, Author) commented Apr 7, 2015

I think that step 3 is because of:
https://github.com/GoogleCloudPlatform/kubernetes/blob/master/cmd/kube-apiserver/app/server.go#L386

I think that we should change it so that all "Watch" requests have a much bigger timeout (without changing the timeout of all other GETs).
Is it possible to specify it at the client level by customizing its "transport" part?

What do you think about it?
It shouldn't conflict with #6207, because we're ignoring watches there, right?

cc @brendanburns @davidopp @lavalamp
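
As a rough sketch of the "customize the transport" idea above (an assumption, not the project's client code): with Go's net/http, the overall request timeout can be left unbounded for streaming watch requests while connection setup and time-to-first-header stay bounded.

```go
package watchclient

import (
	"net"
	"net/http"
	"time"
)

// newWatchClient builds an *http.Client suited to long-running watch requests:
// no overall request timeout (which would cut the stream off mid-watch), but
// bounded dial, TLS, and response-header waits so a dead server is still noticed.
func newWatchClient() *http.Client {
	return &http.Client{
		Timeout: 0, // 0 means no whole-request limit; a streaming watch needs this
		Transport: &http.Transport{
			DialContext:           (&net.Dialer{Timeout: 10 * time.Second}).DialContext,
			TLSHandshakeTimeout:   10 * time.Second,
			ResponseHeaderTimeout: 30 * time.Second, // time allowed until the response headers arrive
		},
	}
}
```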

@ghost commented Apr 7, 2015

Yes, I can confirm that we're ignoring watches in #6207:

fs.StringVar(&s.LongRunningRequestRE, "long_running_request_regexp", "[.*\/watch$][^\/proxy.*]", "A regular expression matching long running requests which should be excluded from maximum inflight request handling.")

And yes, I think that watches should similarly be excluded from API server timeouts, but would like @bgrant0607 to comment on what the intended semantics and implementation of long-term watches are. Do they have any timeout? Is there intended to be a limit on the number of concurrent long-term watches open against the API server?
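
For context, a minimal sketch of the mechanism that flag configures (not the actual kube-apiserver implementation): requests whose path matches the long-running regexp bypass the max-in-flight limiter entirely.

```go
package inflight

import (
	"net/http"
	"regexp"
)

// maxInFlightLimit caps concurrent "normal" requests with a counting semaphore,
// but lets requests whose path matches longRunningRE (watches, proxies) through
// without consuming a slot, since they are expected to stay open for a long time.
func maxInFlightLimit(limit int, longRunningRE *regexp.Regexp, next http.Handler) http.Handler {
	sem := make(chan struct{}, limit)
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if longRunningRE.MatchString(r.URL.Path) {
			next.ServeHTTP(w, r) // exempt from the limit
			return
		}
		select {
		case sem <- struct{}{}:
			defer func() { <-sem }()
			next.ServeHTTP(w, r)
		default:
			http.Error(w, "too many requests in flight", http.StatusTooManyRequests)
		}
	})
}
```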

@timothysc (Member) commented:

@wojtek-t could you keep me in the loop on new items created, and possibly label them as perf, because I think we are chasing the same things.

@bgrant0607 (Member) commented:

cc @smarterclayton

@wojtek-t (Member, Author) commented Apr 7, 2015

@timothysc will do

@smarterclayton (Contributor) commented:

I think we need to fix the etcd problem. Working around it on our end is just too painful, and doesn't help us (short of aggregating all watches on our end, which is a larger chunk of work).

My recommendation is to work with upstream etcd to propose the minimal change for etcd-io/etcd#2048 required to fix the issue on our end (a client option that requests that etcd indicate the window has closed and provide the latest etcd index at that time). That fixes the short-term problem in a way that benefits everyone.

@timothysc (Member) commented:

@wojtek-t FWIW, we had seen that when the cluster was in a stalled state, other actions would continue only after a watch was reset by a timeout, but we have not checked whether the recent etcd modifications re: #6059 have had an effect.

@wojtek-t (Member, Author) commented Apr 7, 2015

@smarterclayton I don't understand your comment - I didn't observe any problem at the etcd level; the watch on etcd doesn't break by itself.
The only problem I observe is the "scheduler -> apiserver" HTTP request that is timing out [the etcd issue that you mentioned is only a consequence of our internal request timing out].

@timothysc: sure - I will look into it as soon as the new etcd release is built

@smarterclayton (Contributor) commented:

@wojtek-t the problem you alluded to is being outside the window of watchable events. The disconnect would not be a problem if the client knew to ask for resource version X+10000. The disconnect is only a problem because the client only knows to ask for resource version X.

It is then "recreated", which sometimes requires listing all elements under watch (if there were more than 1000 "uninteresting" etcd writes since then).

This is only a problem because clients don't get an updated resource version after 1000 uninteresting writes. #2048 is intended to fix that problem.
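
To make that concrete, a sketch of what the #2048 behavior could enable on the client side (windowExceededError is an assumed shape, since the etcd change had not been designed yet): if the error carried the current index, the client could jump forward instead of re-listing.

```go
package watchclient

// windowExceededError is an assumed error shape for the etcd-io/etcd#2048 idea:
// when a watcher is too far behind, the server says so and reports the current index.
type windowExceededError struct {
	CurrentIndex uint64
}

func (e *windowExceededError) Error() string { return "watch window exceeded" }

// nextResourceVersion decides where a broken watch should resume from.
// With the proposed error the client jumps straight to CurrentIndex, avoiding
// the expensive full LIST that is required today.
func nextResourceVersion(lastSeen uint64, err error) (rv uint64, needRelist bool) {
	if we, ok := err.(*windowExceededError); ok {
		return we.CurrentIndex, false // resume from the index the server reported
	}
	if err != nil {
		return 0, true // unknown failure: fall back to a full re-list
	}
	return lastSeen, false // clean close: resume from the last delivered event
}
```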

@smarterclayton (Contributor) commented:

The 5 minute timeout is our maximum http.Server timeout. I don't think we should increase this, because it doesn't fix the problem from #2048, which affects every watcher.
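
For reference, the kind of configuration being discussed (an illustrative sketch, not the apiserver's exact code): http.Server's ReadTimeout and WriteTimeout apply to the whole exchange, so a streaming watch response is closed once WriteTimeout elapses even if events are still flowing.

```go
package main

import (
	"net/http"
	"time"
)

func main() {
	srv := &http.Server{
		Addr:    ":8080",
		Handler: http.DefaultServeMux,
		// ReadTimeout and WriteTimeout cover the whole exchange, not idle time:
		// a chunked watch response that is still streaming events is closed
		// once WriteTimeout (here 5 minutes) has elapsed.
		ReadTimeout:  5 * time.Minute,
		WriteTimeout: 5 * time.Minute,
	}
	_ = srv.ListenAndServe()
}
```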

@bgrant0607 (Member) commented:

cc @hchoudh

@wojtek-t (Member, Author) commented Apr 7, 2015

OK - I agree that the issue you mentioned is more important to fix.
However, I think that having a longer timeout for watches (or for some special requests) might be beneficial [or, in other words, having shorter timeouts for other requests].

@wojtek-t wojtek-t removed their assignment Apr 7, 2015
@smarterclayton (Contributor) commented:

On the other hand, dead clients are going to be consuming resources, especially now that we have limit pools. I do not want someone to be able to mount a DOS attack against the API server by opening watches and starving traffic. Having to reestablish their connection periodically is a part of ensuring that clients that exceed their rate limit are throttled (vs being able to open a watch forever).


@davidopp (Member) commented Apr 7, 2015

@smarterclayton can you explain this attack mechanism a bit more? Regardless of the watch timeout, can't someone successfully attack by running a lot of clients that keep renewing? Statistically, some non-attack clients will get through if the attackers keep having to renew, but if the attacker runs enough clients then I think they can keep that number close to zero.

@davidopp (Member) commented Apr 7, 2015

Also, can you send a pointer to the PR/issue that introduced "limit pools"?

@smarterclayton (Contributor) commented:

If you have a rate limiter based on source IP or client credentials, you could open a lot of watches at once and refresh them within your rate window. The longer watches are, the lower the rate that can be given to individual clients. On the other hand, if reestablishing a watch is cheap (because clients update their window), then a lower timeout turns over the connected clients more often.

@davidopp (Member) commented Apr 7, 2015

I see, so this helps if you have a rate limiter based on something that the attacker cannot easily generate a lot of. Do we have this in the system currently?

@smarterclayton (Contributor) commented:

We do not, although I would like to extend Brendan's initial work to that eventually.


@lavalamp (Member) commented Apr 7, 2015

We have a time limit in the apiserver; it could be lengthened. Note there's also a time limit in nginx.

I don't really think that a 5-minute timeout should be a problem. My thinking is that to get a substantial gain here, say 10x, we would have to extend this period to 50 minutes. That seems clearly too long to me.

@lavalamp (Member) commented Apr 7, 2015

Really, to be correct, clients need to re-list occasionally, and they have to be able to handle the watch closing at any time. I think the five-minute forced close is fine because it forces clients to do it right.

@lavalamp (Member) commented Apr 7, 2015

Essentially I think this behavior is by design, not a bug, and shouldn't adversely affect performance.

@timothysc (Member) commented:

@wojtek-t @lavalamp - the lumpiness in the response curve is what our old graph showed:

[image: pod startup rate graph]

It shows overall bursts of up to 30 pods/sec, and our goal was to eliminate the lumps.
What is interesting is the slope difference between the old run here and the new one in #6207.

@davidopp (Member) commented Apr 7, 2015

@lavalamp thinks the system should be able to handle a re-list every 5 min, so this is WAI.

@davidopp davidopp closed this as completed Apr 7, 2015
@smarterclayton (Contributor) commented:

I would clarify that in light of etcd #2048: the system should be able to handle a re-list every five minutes. It should not require all clients to re-list every five minutes. Re-listing is a client responsibility.


@wojtek-t (Member, Author) commented Apr 8, 2015

@smarterclayton @davidopp
I agree that fixing the original issue at the etcd level is a much better (and cleaner) solution. But IIUC, this is not a high priority for etcd and it will not happen any time soon. However, in a 100-node cluster, with each Kubelet having at least a watch on all pods, this may generate significant load.
So I still think that we might need to consider a temporary workaround for it in the near future.

cc @fgrzadkowski

@smarterclayton (Contributor) commented:

If it's a high priority for us, we should help them fix it. I suspect that it's a fairly easy fix once the API details are sorted out (the server can return 204 No Content with the updated etcd index when the watch window is exceeded; the client can recognize that and return an error; we read the error and return a typed error back to our clients, who update their resource watch version).
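
A sketch of what handling that 204 could look like on our side (the use of etcd's X-Etcd-Index header here is an assumption about how the index might be surfaced, not documented etcd behavior for this proposed case):

```go
package etcdwatch

import (
	"fmt"
	"net/http"
	"strconv"
)

// checkWatchWindow inspects an etcd v2 watch response for the proposed behavior:
// a 204 No Content reply meaning "your waitIndex has fallen out of the window",
// with the current index surfaced so the caller can catch up without a full LIST.
func checkWatchWindow(resp *http.Response) (resumeIndex uint64, windowClosed bool, err error) {
	if resp.StatusCode != http.StatusNoContent {
		return 0, false, nil // normal event stream; nothing special to do
	}
	idx, err := strconv.ParseUint(resp.Header.Get("X-Etcd-Index"), 10, 64)
	if err != nil {
		return 0, true, fmt.Errorf("window closed but no usable index: %v", err)
	}
	return idx, true, nil
}
```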


@gatici commented Jul 19, 2019

Dear all,
I am doing a big installation with Helm charts, and it is failing every time with the error below. I am using an RKE cluster. Is it possible to increase the timeout of the watch?
How can I do it?

Thanks,

  • chart: so-4.0.0
    description: 'Release "dev-so" failed post-install: watch closed before UntilWithoutRetry
    timeout'
    revision: 1
    status: FAILED
    updated: Fri Jul 19 06:14:40 2019
