Adding proper timeouts. #10656
Ideally we should push these timeouts down to the specific handlers and not have them at the top level. I also may be supporting http.Hijacker and http.CloseNotifier unnecessarily. We may only need those on connections that aren't subject to timeouts anyway. Supporting them required a combinatorial explosion of types (4 in total). If we can eventually push timeouts down to the individual handlers that need them, and those handlers don't need close notification or hijacking, this can be greatly simplified.
cc @dchen1107 |
GCE e2e build/test passed for commit 8fa775c9fe62d070c097159a7c6279b7426417e6. |
Assigning back to yourself. Assign the oncall when you take "WIP" off. |
Thanks for digging into this one |
Since still WIP, this has missed the 1.0 window. Can go into the next release. |
I have added tests. @dchen1107 Do you think some timeout handling should be done in kubelet too? |
GCE e2e build/test passed for commit 2f788c6bdd809881a1ca7f5ac509d9069cce926d. |
GCE e2e build/test passed for commit 7e5919f26d964dacc4d699439b550a1efdf485ca. |
GCE e2e build/test passed for commit 1d033b9. |
type timeoutHandler struct {
	handler http.Handler
	timeout func(*http.Request) (<-chan time.Time, string)
Document the expectations of this function (in particular, that returning nil is allowed and causes it to terminate early).
Actually returning nil will not cause it to terminate early. A nil return will in effect be no timeout. And this is documented in that last sentence of the public function 7 lines above.
one small nit. otherwise LGTM. |
@@ -65,8 +65,6 @@ func ListenAndServeKubeletServer(host HostInterface, address net.IP, port uint,
 	s := &http.Server{
 		Addr:           net.JoinHostPort(address.String(), strconv.FormatUint(uint64(port), 10)),
 		Handler:        &handler,
-		ReadTimeout:    5 * time.Minute,
-		WriteTimeout:   5 * time.Minute,
this drops the timeouts on the kubelet server, but I don't see how the new TimeoutHandler is getting set up for the kubelet, only the API server... am I missing something?
No. It is not set up. I was under the impression that the API calls were proxied through the API server, so they would handle the timeouts.
longRunningTimeout := func(req *http.Request) (<-chan time.Time, string) {
	// TODO unify this with apiserver.MaxInFlightLimit
	if longRunningRE.MatchString(req.URL.Path) || req.URL.Query().Get("watch") == "true" {
		return nil, ""
does this make long-running requests of unlimited duration?
As long as the client is still connected.
I would also note that Read and Write timeouts on the server object don't actually close connections. They only cause errors on read and write. It's up to the error handling code to close it in that case. The bugs mentioned in this PR were two that actually held connections open for a long time, but didn't close until there was IO interaction to trigger it.
@krousey You mentioned that we might have a few advanced use cases. None of them actually came up over the last year. Can I assume that we do not actually need them? This implementation is too racy. Can we switch to something simpler like https://golang.org/pkg/net/http/#TimeoutHandler or a slightly modified version of that?
@xiang90 do you have specific issues you can reference?
That impl is simpler, but does not actually work with the mix of handlers we have (long running, chunk writing, streaming, etc). From the godoc:
I think any timeout handler that does work with those interfaces is likely to be more complex |
#29001 (comment)
No. The timeout handler can wrap any handlers. You can have a timeout handler per handler. |
Did you read the comment about TimeoutHandler buffering all writes to memory, and not working with Hijacker interfaces? |
Why do you assume I did not? Can you please read #10656 (comment) and my comment at #10656 (comment)? I am actually not sure what you are trying to explain. If you know of a case where we need hijacker or flush, you should probably just post it here. Or if you think the current implementation is not racy, you can just tell me why. If you want to fix the race, go ahead and do it. Thanks.
We use hijack for exec, attach/run, and port forwarding currently.
@ncdc I think hijacking and timeouts cannot work well together. Once we hijack the connection, we have to cancel the timeout handler and set a TCP timeout on the conn. The timeout handler can no longer assume the inner hijacked handler speaks HTTP. Writing HTTP timeout headers once it times out makes no sense after the hijacking. That is part of the reason the standard lib does not support timeout + hijacking, I believe. I did
@xiang90 see kubernetes/pkg/util/httpstream/spdy/upgrade.go Lines 54 to 65 in ef0c9f0
I agree we don't want to timeout a hijacked connection (the current spdy implementation for exec/attach/port-forward has its own idle timeout functionality). |
@ncdc Oh. I missed that one... Was assuming it is a dependency :(. OK. Then that makes sense. The spdy thing will go away once we use HTTP2 by default which does multiplexing on TCP? What is your suggestion for moving forward for now? also /cc @smarterclayton |
We will need time to engineer the HTTP2 replacement, plus we'll have to go through a deprecation period for the spdy implementation. The go HTTP2 implementation allows you to read from/write to the request and response bodies without having to go through any sort of hijacking. We'll have to write our own multiplexing to support multiple "streams" (stdin, stdout, stderr, error, resize, etc), however. |
@ncdc OK. So we still need this. What do you think about cancelling the HTTP timeout once the connection has been hijacked from the timeout writer?
@xiang90 there are a few routes that are currently excluded from the timeout handler, including logs, exec, attach, and portforward:
kubernetes/pkg/apiserver/handlers.go Line 186 in 411c32b
Can we continue to do this? |
@ncdc OK... So that means the timeout handler does not need to support hijacking at all, since the routes that use hijacking are not under the timeout handler?
@xiang90 I think that's correct, but let's make sure @liggitt and @smarterclayton agree too 😄 |
Possible. I also wonder about the watch handling endpoints, which write chunked data and expect it to be flushed to the client, but also need to be able to time out.
Watch uses hijack for web sockets, but only needs to flush for chunked. I
OK. So we do not need a timeout for the spdy upgrade, as @ncdc mentioned. We do not need a timeout for watch, which requires flush, since it should set a timeout on the connection. Then we should probably just use the standard timeout handler (which does not support hijacking and flush for good reasons) instead of the customized one. I can make the change if @smarterclayton @liggitt @ncdc all agree.
@ncdc do you know whether our current timeout is exercised for proxy paths? |
I just want to be careful that we don't regress spdy connections (exec/attach), websocket connections (exec/attach/log/watch), or streaming connections (log, watch, proxy). I also wonder about the perf impact of buffering all responses in memory... some responses can get quite large, especially for controllers that list across all namespaces. |
@liggitt which timeout in particular? for exec/attach/pf? |
@liggitt First, we need to make things correct. Then we can start taking care of performance and optimization. Right now it is broken.
@ncdc if the current timeout handler is being used in proxy paths, or if proxy gets exempted from timeouts |
An attempt to properly fix #9013 and #9180.
cc @bgrant0607 @thockin