net: document that Conn.Read/Write should return error that Is(context.DeadlineExceeded) for deadline exceeded #31449
Comments
Also filed #31490 to report that tls.DialWithDialer() doesn't respect .KeepAlive.
It is documented that we enable keep-alives if that field is unset/zero. The choice not to specify 15s in the docs was deliberate and is documented in the commit message: 5bd7e9c
I believe this is a new bug in 1.12, so a fix should be backported to a 1.12.x release. cc @ianlancetaylor
Do you know what makes this new in 1.12? I'm not seeing it.
Keepalive by default was decided for 1.12 in #23459. That causes deadline handlers in existing code to see non-deadline (i.e. keepalive) errors.
If I understand you correctly, you are saying that the bug has existed for a long time for programs that enable TCP keepalive by setting the
Yes. It's probably rare to use both deadlines and explicit keepalive, so the bug wasn't reported. Now any code with long deadlines (relatively common) will wrongly detect deadline events due to implicit keepalive.
Go 1.13 also enables Keep-Alives by default on the net.Listen side (1abf3aa), so this might be worth fixing now, before exposing a new wave of applications to it.
@costela, your keepalive patch will trigger this bug... @gopherbot please open backport 1.12
Backport issue(s) opened: #33137 (for 1.12). Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases.
I can't seem to find anything in the docs saying that a keepalive timeout should not set This isn't to say the new default behavior wouldn't introduce a potentially breaking change, but I'm not sure changing the error to be
Docs say "A zero value for [deadline] means I/O operations will not time out." Don't worry about it; it's been assigned. Sorry to bother you.
Deadlines and keep-alive errors are deeply different: for example, the former are fully recoverable and the latter aren't. It seems unlikely any code would ever want to handle them the same way. Keep-alives are more akin to the connection dropping, so I don't think marking them as Timeouts makes sense. I'll make this change tomorrow.
@FiloSottile, let's make the Also I opened a 1.12.x backport issue.
I looked into this, and as far as I can tell, there is no way to single out keep-alive errors: they just surface an ETIMEDOUT from the What we might do is isolate deadline errors, which as far as I can tell (although I am no expert on that part of the code) are handled by runtime timers, and make those more uniquely recognizable. For example, by making them an exported error type. We might even decide to make only those have If I'm right, this is not a small fix to a specific error value anymore, and I don't think it's something we should ship at the very end of the freeze. /cc @golang/osp-team
I believe that if a read from a network connection fails due to a deadline, the read will return
Seems a fair bit simpler to change
That seems workable for 1.13.
First, we are only looking at keep-alive errors now, because they are the only ones tied to a Go 1.13 behavior change. We are not touching anything else months into the freeze. Second, I am growing unconvinced we can/should fix this.
The simplest change, removing ETIMEDOUT from I suppose we could special-case ETIMEDOUT from From another perspective, though, keep-alives always returned
Two issues were filed re spurious deadline events in 1.12. Such bugs can be hard to detect unless you check the elapsed time for deadlines! A lot more folks would notice them if they start appearing in server-side apps in 1.13. Let's revert both keep-alive commits for 1.13, and back-port a revert to 1.12. It's reasonable to ask users to explicitly enable keep-alive if required, and many of us already use deadlines as a context-sensitive keep-alive technique.
@networkimprov Can you point us to the two bugs filed against 1.12? I'm not excited about changing the 1.12 behavior at this point.
I note that it would be feasible to handle
#32735 and this one (I encountered spurious deadlines in an app with long-running links to servers.) This bug won't stop the app, but does change the way it was intended to work, with rather variable consequences.
Change https://golang.org/cl/188657 mentions this issue:
For the record, https://golang.org/cl/188657 shows the change to not return
ETIMEDOUT.Timeout() has been true for many releases. It seems quite clear we cannot redefine that now. I was sympathetic to CL 188657 until I searched the Linux kernel sources for ETIMEDOUT, as Filippo mentioned above. It gets returned from an enormous number of places, most of them having nothing to do with TCP, and many of them having nothing to do with networking. So translating ETIMEDOUT to ErrKeepAlive is clearly incorrect in the overwhelming majority of places where it appears in the kernel source. I don't believe we can be more precise here without a tremendous amount of system-specific code. Keepalives have been in previous releases and turned into errors with err.Timeout() == true. I realize they are a little more common now, but even so it does not seem terrible to continue the behavior of the past 13 releases into a 14th. I suggest we leave this alone for Go 1.13. For Go 1.14 maybe the answer is to change our net.Conn implementations to return an error that Is(context.DeadlineExceeded), to allow a more precise check than Timeout.
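The difference between a Timeout() check and a more precise errors.Is check can be sketched as follows. This is an illustration under stated assumptions, not the change proposed in this thread: it uses os.ErrDeadlineExceeded, a sentinel available in Go 1.15 and later, in place of the Is(context.DeadlineExceeded) check suggested above. The keep-alive error is constructed by hand to resemble the one in the issue text, and all function names are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
	"syscall"
	"time"
)

// keepAliveLikeError hand-builds an error shaped like the one reported in
// this issue: ETIMEDOUT wrapped in a SyscallError and a net.OpError.
// It is an approximation for illustration, not a real keep-alive failure.
func keepAliveLikeError() error {
	return &net.OpError{
		Op:  "read",
		Net: "tcp",
		Err: os.NewSyscallError("read", syscall.ETIMEDOUT),
	}
}

// deadlineError provokes a genuine deadline error on an in-memory pipe.
func deadlineError() error {
	c1, c2 := net.Pipe()
	defer c1.Close()
	defer c2.Close()
	c1.SetReadDeadline(time.Now().Add(-time.Second)) // already expired
	_, err := c1.Read(make([]byte, 1))
	return err
}

// isTimeout is the coarse check: true for deadlines and for ETIMEDOUT.
func isTimeout(err error) bool {
	var ne net.Error
	return errors.As(err, &ne) && ne.Timeout()
}

// isDeadline is the precise check (requires Go 1.15 or later).
func isDeadline(err error) bool {
	return errors.Is(err, os.ErrDeadlineExceeded)
}

func main() {
	for name, err := range map[string]error{
		"keep-alive-like": keepAliveLikeError(),
		"deadline":        deadlineError(),
	} {
		fmt.Printf("%s: Timeout=%v Deadline=%v (%v)\n",
			name, isTimeout(err), isDeadline(err), err)
	}
}
```

Both errors pass the Timeout() check, but only the deadline error matches the sentinel, so code that wants to retry only after its own deadline can use the second check.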
Thanks, moving to 1.14.
In my opinion we should cover the issue in the release notes. I think many more people will be helped by turning on keep-alive by default than will be hurt by the confusion around the
A note like: "Code using SetDeadline() should generally disable keep-alive, which is now on by default for all TCP connections, both dialed and accepted" ...? After many years of keep-alives off by default, I don't understand the urgency for default-on before these errors can be discerned from deadline events.
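For readers who want to follow the suggested note, disabling keep-alive explicitly might look like the sketch below. It relies on the documented behavior that a negative KeepAlive disables probes on net.Dialer and, from Go 1.13, on net.ListenConfig. The constructor names newDialer and newListenConfig and the 5-second dial timeout are illustrative assumptions.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// newDialer returns a Dialer with keep-alive probes disabled.
// A negative KeepAlive disables them; zero selects the default.
func newDialer() *net.Dialer {
	return &net.Dialer{Timeout: 5 * time.Second, KeepAlive: -1}
}

// newListenConfig does the same for accepted connections (Go 1.13+).
func newListenConfig() *net.ListenConfig {
	return &net.ListenConfig{KeepAlive: -1}
}

func main() {
	// Listen on an ephemeral loopback port so the example is self-contained.
	ln, err := newListenConfig().Listen(context.Background(), "tcp", "127.0.0.1:0")
	if err != nil {
		fmt.Println("listen:", err)
		return
	}
	defer ln.Close()

	go func() {
		if c, err := ln.Accept(); err == nil {
			c.Close()
		}
	}()

	conn, err := newDialer().Dial("tcp", ln.Addr().String())
	if err != nil {
		fmt.Println("dial:", err)
		return
	}
	defer conn.Close()

	// With keep-alive off, a failed keep-alive probe can no longer be the
	// source of a Timeout()==true error on this connection.
	conn.SetReadDeadline(time.Now().Add(30 * time.Second))
	fmt.Println("connected to", conn.RemoteAddr())
}
```

The trade-off is that a silently dead peer is then detected only when a deadline expires or a write fails.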
Because keep-alive is already on by default for 1.12, and people are using 1.12. Rolling back and then forward again is also confusing.
Change https://golang.org/cl/188819 mentions this issue:
Updates #31449 Change-Id: I4d7075b20cd8171bc792e40b388f4215264a3317 Reviewed-on: https://go-review.googlesource.com/c/go/+/188819 Reviewed-by: Filippo Valsorda <filippo@golang.org>
The text added to the release notes should also live in the docs, and be referenced by net.Dial() & .Listen(), net.Error.Timeout(), and net.Conn.Set*Deadline(). The docs are currently explicit that only deadlines cause "time out". See quote in issue text.
Change https://golang.org/cl/189757 mentions this issue:
I sent a CL to update the docs for
Updates #31449 Change-Id: I76490c5e83eb2f7ba529b387a57ba088428aece5 Reviewed-on: https://go-review.googlesource.com/c/go/+/189757 Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com> Reviewed-by: Filippo Valsorda <filippo@golang.org>
Back in #31449 (comment), I wrote:
We did not get into this, and we are close to the freeze. It seems like we should probably put this off to next cycle. But I will retitle so it is easier to understand what this is about.
This is a bug. A keepalive error is a connection failure, not a deadline event.
net Docs say:
For a TLS connection that's been severed, Conn.Read() returns a net.Error with .Timeout()==true due to KeepAlive failure. (Go 1.12, Linux, amd64)
The Error should give .Timeout()==false to comply with the docs. Code that checks for .Timeout()==true would generally assume that an explicit deadline had passed.
The .Error() string should mention keepalive. It's currently: "read tcp n.n.n.n:p->n.n.n.n:p: read: connection timed out"
Related: net.Dialer.KeepAlive gets 15*time.Second if unset/zero. This isn't documented in package net.
cc @bradfitz