Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding proper timeouts. #10656

Merged
merged 1 commit into from
Jul 27, 2015
Merged

Adding proper timeouts. #10656

merged 1 commit into from
Jul 27, 2015

Conversation

krousey
Copy link
Contributor

@krousey krousey commented Jul 2, 2015

An attempt to properly fix #9013 and #9180.

cc @bgrant0607 @thockin

@krousey
Copy link
Contributor Author

krousey commented Jul 2, 2015

Ideally we should push these timeouts down to the specific handlers and not have it at the top level.

I also may be unnecessary supporting http.Hijacker and http.CloseNotifier. We may only need those on connections that aren't subject to timeouts any way. To support that, there was a combinatoric explosion of types creates (4 in total).

If we can eventually push timeouts down to individual handlers that need them, and those specific ones don't need close notification or hijacking, this can greatly simplify.

@krousey
Copy link
Contributor Author

krousey commented Jul 2, 2015

cc @dchen1107

@k8s-bot
Copy link

k8s-bot commented Jul 2, 2015

GCE e2e build/test passed for commit 8fa775c9fe62d070c097159a7c6279b7426417e6.

@zmerlynn
Copy link
Member

zmerlynn commented Jul 2, 2015

Assigning back to yourself. Assign the oncall when you take "WIP" off.

@thockin
Copy link
Member

thockin commented Jul 4, 2015

Thanks for digging into this one

@bgrant0607
Copy link
Member

Since still WIP, this has missed the 1.0 window. Can go into the next release.

@bgrant0607 bgrant0607 added this to the v1.0-post milestone Jul 6, 2015
@krousey
Copy link
Contributor Author

krousey commented Jul 7, 2015

I have added tests.

@dchen1107 Do you think some timeout handling should be done in kubelet too?

@k8s-bot
Copy link

k8s-bot commented Jul 8, 2015

GCE e2e build/test passed for commit 2f788c6bdd809881a1ca7f5ac509d9069cce926d.

@krousey krousey changed the title WIP: Adding proper timeouts. Adding proper timeouts. Jul 10, 2015
@krousey krousey removed their assignment Jul 10, 2015
@k8s-bot
Copy link

k8s-bot commented Jul 10, 2015

GCE e2e build/test passed for commit 7e5919f26d964dacc4d699439b550a1efdf485ca.

@k8s-bot
Copy link

k8s-bot commented Jul 10, 2015

GCE e2e build/test passed for commit 1d033b9.


type timeoutHandler struct {
handler http.Handler
timeout func(*http.Request) (<-chan time.Time, string)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Document the expectations of this function. (in particular that returning nil is allowed and cauases it to early terminate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually returning nil will not cause it to terminate early. A nil return will in effect be no timeout. And this is documented in that last sentence of the public function 7 lines above.

@brendandburns brendandburns self-assigned this Jul 22, 2015
@brendandburns
Copy link
Contributor

one small nit. otherwise LGTM.

@bgrant0607 bgrant0607 removed this from the v1.0-post milestone Jul 24, 2015
@brendandburns brendandburns added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 25, 2015
gmarek added a commit that referenced this pull request Jul 27, 2015
@gmarek gmarek merged commit 00cd52d into kubernetes:master Jul 27, 2015
@@ -65,8 +65,6 @@ func ListenAndServeKubeletServer(host HostInterface, address net.IP, port uint,
s := &http.Server{
Addr: net.JoinHostPort(address.String(), strconv.FormatUint(uint64(port), 10)),
Handler: &handler,
ReadTimeout: 5 * time.Minute,
WriteTimeout: 5 * time.Minute,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this drops the timeouts on the kubelet server, but I don't see how the new TimeoutHandler is getting set up for the kubelet, only the API server... am I missing something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. It is not set up. I was under the impression that the API calls were proxied through the API server, so they would handle the timeouts.

longRunningTimeout := func(req *http.Request) (<-chan time.Time, string) {
// TODO unify this with apiserver.MaxInFlightLimit
if longRunningRE.MatchString(req.URL.Path) || req.URL.Query().Get("watch") == "true" {
return nil, ""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this make long-running requests of unlimited duration?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long as the client is still connected.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also note that Read and Write timeouts on the server object don't actually close connections. They only cause errors on read and write. It's up to the error handling code to close it in that case. The bugs mentioned in this PR were two that actually held connections open for a long time, but didn't close until there was IO interaction to trigger it.

@krousey krousey deleted the timeouts branch August 21, 2015 02:12
@krousey krousey added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label Sep 11, 2015
@xiang90
Copy link
Contributor

xiang90 commented Jul 25, 2016

@krousey You mentioned that we might have a few advanced use case. None of them actually happened over the last year. Can I assume that we actually do not need them? This implementation is too racy. Can we switch to use something simpler like https://golang.org/pkg/net/http/#TimeoutHandler or a slightly modified version of that?

@liggitt
Copy link
Member

liggitt commented Jul 25, 2016

This implementation is too racy

@xiang90 do you have specific issues you can reference?

Can we switch to use something simpler like https://golang.org/pkg/net/http/#TimeoutHandler

That impl is simpler, but does not actually work with the mix of handlers we have (long running, chunk writing, streaming, etc). From the godoc:

TimeoutHandler buffers all Handler writes to memory and does not support the Hijacker or Flusher interfaces.

I think any timeout handler that does work with those interfaces is likely to be more complex

@xiang90
Copy link
Contributor

xiang90 commented Jul 25, 2016

@liggitt

do you have specific issues you can reference?

#29001 (comment)
#29001 (comment)

That impl is simpler, but does not actually work with the mix of handlers

No. The timeout handler can wrap any handlers. You can have a timeout handler per handler.

@liggitt
Copy link
Member

liggitt commented Jul 25, 2016

Did you read the comment about TimeoutHandler buffering all writes to memory, and not working with Hijacker interfaces?

@xiang90
Copy link
Contributor

xiang90 commented Jul 25, 2016

@liggitt

Why you assume I did not?

Can you please read #10656 (comment) and my comment at #10656 (comment)?

I am actually not sure what you are trying to explain. If you know there is a case that we need hijacker or flush, probably you should just post it here. Or if you think the current implement is not racy, you can just tell me why. If you want to fix the race, go ahead and do it. Thanks.

@ncdc
Copy link
Member

ncdc commented Jul 25, 2016

We use hijack for exec, attach/run, and port forwarding currently.
Eventually I'd like to move to an HTTP/2 implementation so hijacking
wouldn't be required any more.

On Sunday, July 24, 2016, Xiang Li notifications@github.com wrote:

@liggitt https://github.com/liggitt

Why you assume I did not?

Can you please read #10656 (comment)
#10656 (comment)
and my comment at #10656 (comment)
#10656 (comment)?

I am actually not sure what you are trying to explain. If you know there
is a case that we need hijacker or flush, probably you should just post it
here. Or if you think the current implement is not racy, you can just tell
me why. If you want to fix the race, go ahead and do it. Thanks.


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
#10656 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAABYuAio9NKZ72LkCYWj9uZt_m9X6FNks5qZCKUgaJpZM4FQR6U
.

@xiang90
Copy link
Contributor

xiang90 commented Jul 25, 2016

@ncdc Hijacking and timeout cannot work well together I think. Once we hijack it, we have to cancel the timeout handler and set TCP timeout on the conn. The timeout handler cannot assume the inner hijacked handler speaks http anymore. Writing HTTP timeout headers once it timeouts makes no sense after the hijacking. That is part of the reason standard lib does not support timeout + hijacking i believe.

I did grep Hijacker -r . and failed to find a case that we really need hijacker. Maybe I was wrong.

@ncdc
Copy link
Member

ncdc commented Jul 25, 2016

@xiang90 see

hijacker, ok := w.(http.Hijacker)
if !ok {
w.WriteHeader(http.StatusInternalServerError)
fmt.Fprintf(w, "unable to upgrade: unable to hijack response")
return nil
}
w.Header().Add(httpstream.HeaderConnection, httpstream.HeaderUpgrade)
w.Header().Add(httpstream.HeaderUpgrade, HeaderSpdy31)
w.WriteHeader(http.StatusSwitchingProtocols)
conn, _, err := hijacker.Hijack()

I agree we don't want to timeout a hijacked connection (the current spdy implementation for exec/attach/port-forward has its own idle timeout functionality).

@xiang90
Copy link
Contributor

xiang90 commented Jul 25, 2016

@ncdc Oh. I missed that one... Was assuming it is a dependency :(. OK. Then that makes sense. The spdy thing will go away once we use HTTP2 by default which does multiplexing on TCP? What is your suggestion for moving forward for now? also /cc @smarterclayton

@ncdc
Copy link
Member

ncdc commented Jul 25, 2016

We will need time to engineer the HTTP2 replacement, plus we'll have to go through a deprecation period for the spdy implementation.

The go HTTP2 implementation allows you to read from/write to the request and response bodies without having to go through any sort of hijacking. We'll have to write our own multiplexing to support multiple "streams" (stdin, stdout, stderr, error, resize, etc), however.

@xiang90
Copy link
Contributor

xiang90 commented Jul 25, 2016

@ncdc OK. So we still need this. How do you think about cancelling http timeout once we hijacked the connection from timeout writer?

@ncdc
Copy link
Member

ncdc commented Jul 25, 2016

@xiang90 there are a few routes that are currently excluded from the timeout handler, including logs, exec, attach, and portforward:

defaultLongRunningRequestRE = "(/|^)((watch|proxy)(/|$)|(logs?|portforward|exec|attach)/?$)"

if longRunningRequestCheck(req) {

if after == nil {

Can we continue to do this?

@xiang90
Copy link
Contributor

xiang90 commented Jul 25, 2016

@ncdc OK... So that means that the timeout handler does not need to support hijacking at all, since the one use hijacking does not under timeout handler?

@ncdc
Copy link
Member

ncdc commented Jul 25, 2016

@xiang90 I think that's correct, but let's make sure @liggitt and @smarterclayton agree too 😄

@liggitt
Copy link
Member

liggitt commented Jul 25, 2016

possible. I also wonder about the watch handling endpoints, which write chunked data and expect it to be flushed to the client, but also be able to timeout.

@smarterclayton
Copy link
Contributor

Watch uses hijack for web sockets, but only needs to flush for chunked. I
thought for watch we set timeouts on the connection ourselves.

On Mon, Jul 25, 2016 at 10:56 AM, Jordan Liggitt notifications@github.com
wrote:

possible. I also wonder about the watch handling endpoints, which write
chunked data and expect it to be flushed to the client, but also be able to
timeout.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#10656 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/ABG_p8dbdvNNTjSFG12Y0kj0aXTLDypXks5qZM6ngaJpZM4FQR6U
.

@xiang90
Copy link
Contributor

xiang90 commented Jul 25, 2016

@liggitt @smarterclayton

OK. So we do not need timeout for spdy upgrade as @ncdc mentioned. We do not need timeout for watch, which requires flush, since it should set timeout on connection.

The we probably should just use the standard timeout handler (which does not support hijacking and flush for good reasons) instead of the customized one.

I can make the change if @smarterclayton @liggitt @ncdc all agree.

@liggitt
Copy link
Member

liggitt commented Jul 25, 2016

@ncdc do you know whether our current timeout is exercised for proxy paths?

@liggitt
Copy link
Member

liggitt commented Jul 25, 2016

I just want to be careful that we don't regress spdy connections (exec/attach), websocket connections (exec/attach/log/watch), or streaming connections (log, watch, proxy). I also wonder about the perf impact of buffering all responses in memory... some responses can get quite large, especially for controllers that list across all namespaces.

@ncdc
Copy link
Member

ncdc commented Jul 25, 2016

do you know whether our current timeout is exercised for proxy paths?

@liggitt which timeout in particular? for exec/attach/pf?

@xiang90
Copy link
Contributor

xiang90 commented Jul 25, 2016

@liggitt First, we need to make things correct. Then we can start taking care about performance, optimization. Now it is broken.

@liggitt
Copy link
Member

liggitt commented Jul 25, 2016

@ncdc if the current timeout handler is being used in proxy paths, or if proxy gets exempted from timeouts

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

active log streaming times out after 5 minutes