net, os: Set*Deadline() expiration error should be unique, as .Timeout() is true for keepalive, etc #31449

Closed
networkimprov opened this issue Apr 12, 2019 · 59 comments

Comments


@networkimprov networkimprov commented Apr 12, 2019

This is a bug. A keepalive error is a connection failure, not a deadline event.

net Docs say:

A zero value for [deadline] means I/O operations will not time out.

A zero value for [read deadline] means Read will not time out.

KeepAlive specifies the keep-alive period for an active network connection.
If zero, keep-alives are enabled if supported by the protocol and operating system.
Network protocols or operating systems that do not support keep-alives ignore this field.
If negative, keep-alives are disabled.

For a TLS connection that's been severed, Conn.Read() returns a net.Error with .Timeout()==true due to KeepAlive failure. (Go 1.12, Linux, amd64)

The Error should give .Timeout()==false to comply with the docs. Code that checks for .Timeout()==true would generally assume that an explicit deadline had passed.
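A minimal sketch of the check this report is concerned with. It assumes the keep-alive failure reaches the caller as a net.OpError wrapping a "read" syscall error with ETIMEDOUT; that shape, and the helper names looksLikeTimeout and sampleKeepAliveError, are assumptions for illustration, not taken from the net package itself:

```go
package main

import (
	"fmt"
	"net"
	"os"
	"syscall"
)

// looksLikeTimeout is the common pattern: treat any net.Error whose
// Timeout method returns true as "the deadline passed".
func looksLikeTimeout(err error) bool {
	ne, ok := err.(net.Error)
	return ok && ne.Timeout()
}

// sampleKeepAliveError builds an error shaped like the one in this report
// (assumed shape: OpError -> SyscallError -> ETIMEDOUT). No deadline is involved.
func sampleKeepAliveError() error {
	return &net.OpError{
		Op:  "read",
		Net: "tcp",
		Err: os.NewSyscallError("read", syscall.ETIMEDOUT),
	}
}

func main() {
	err := sampleKeepAliveError()
	// The check reports a timeout even though no deadline was ever set.
	fmt.Println(looksLikeTimeout(err), err)
}
```

Code written this way cannot tell an expired deadline from a dead connection, which is the ambiguity described above.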

The .Error() string should mention keepalive. It's currently:
"read tcp n.n.n.n:p->n.n.n.n:p: read: connection timed out"

Related: net.Dialer.KeepAlive gets 15*time.Second if unset/zero. This isn't documented in package net.
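As a hedged illustration of the KeepAlive semantics quoted above, the sketch below sets the field explicitly instead of relying on the undocumented default. The helper names and the 5-second dial timeout are assumptions chosen for the example:

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// dialerWithKeepAlive returns a Dialer with an explicit KeepAlive value.
// Per the docs quoted above: zero leaves keep-alives enabled (if supported),
// a negative value disables them.
func dialerWithKeepAlive(period time.Duration) *net.Dialer {
	return &net.Dialer{
		Timeout:   5 * time.Second, // assumed dial timeout, illustration only
		KeepAlive: period,
	}
}

// keepAliveDisabled reports whether the Dialer asks for keep-alives to be off.
func keepAliveDisabled(d *net.Dialer) bool {
	return d.KeepAlive < 0
}

func main() {
	explicit := dialerWithKeepAlive(30 * time.Second)
	disabled := dialerWithKeepAlive(-1)
	fmt.Println(keepAliveDisabled(explicit), keepAliveDisabled(disabled))
}
```

Disabling keep-alives this way only sidesteps the ambiguous error; it also gives up detection of dead peers.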

cc @bradfitz

Author

@networkimprov networkimprov commented Apr 16, 2019

Also filed #31490 to report that tls.DialWithDialer() doesn't respect .KeepAlive.

Contributor

@CAFxX CAFxX commented Apr 17, 2019

"Since 1.11, net.Dialer.KeepAlive gets 15*time.Second if unset/zero. This isn't documented in package net."

It is documented that we enable keep-alives if that field is unset/zero. The choice of not specifying 15s in the docs was deliberate and is actually documented in the commit message: 5bd7e9c

Author

@networkimprov networkimprov commented Jun 24, 2019

I believe this is a new bug in 1.12, so a fix should be backported to a 1.12.x release.
Also reported in #32735

cc @ianlancetaylor
@gopherbot add release-blocker

Contributor

@ianlancetaylor ianlancetaylor commented Jun 24, 2019

Do you know what makes this new in 1.12? I'm not seeing it.

Author

@networkimprov networkimprov commented Jun 24, 2019

This decided on keepalive by default in 1.12: #23459. That causes deadline handlers in existing code to see non-deadline (i.e. keepalive) errors.

Contributor

@ianlancetaylor ianlancetaylor commented Jun 24, 2019

If I understand you correctly, you are saying that the bug has existed for a long time for programs that enable TCP keepalive by setting the KeepAlive field in net.Dialer, but that it is more likely to occur in 1.12 because now that field is set by default. Is that correct?

Author

@networkimprov networkimprov commented Jun 24, 2019

Yes. It's probably rare to use both deadlines and explicit keepalive, so the bug wasn't reported.

Now any code with long deadlines (relatively common) will wrongly detect deadline events due to implicit keepalive.

Member

@FiloSottile FiloSottile commented Jul 15, 2019

Go 1.13 also enables Keep-Alives by default on the net.Listen side (1abf3aa), so this might be worth fixing now, before exposing a new wave of applications to it.

Author

@networkimprov networkimprov commented Jul 16, 2019

@costela, your keepalive patch will trigger this bug...

@gopherbot please open backport 1.12


@gopherbot gopherbot commented Jul 16, 2019

Backport issue(s) opened: #33137 (for 1.12).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases.

Contributor

@costela costela commented Jul 16, 2019

"The Error should give .Timeout()==false to comply with the docs. Code that checks for .Timeout()==true would generally assume that an explicit deadline had passed."

I can't seem to find anything in the docs saying that a keepalive timeout should not set .Timeout()==true. IMHO this is not necessarily obvious and should be clarified in the docs if it's indeed intended.

This isn't to say the new default behavior wouldn't introduce a potentially breaking change, but I'm not sure changing the error to be .Timeout()==false is the right approach. Just as there might be code depending on .Timeout()==true for detecting deadlines, there might be code depending on the same behavior for explicitly set keepalives. Or am I missing something obvious?

Author

@networkimprov networkimprov commented Jul 16, 2019

Docs say "A zero value for [deadline] means I/O operations will not time out."

Don't worry about it; it's been assigned. Sorry to bother you.

Member

@FiloSottile FiloSottile commented Jul 17, 2019

Deadlines and keep-alive errors are deeply different: the former are fully recoverable, the latter aren't, for example. It seems unlikely any code would ever want to handle them the same way.

Keep-alives are more akin to the connection dropping, so I don't think marking them as Timeouts makes sense. I'll make this change tomorrow.
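The claim that an expired deadline is fully recoverable can be sketched as follows. This uses net.Pipe, an in-memory connection, as a stand-in for a real socket (an assumption made so the example runs without a network); the function name readAfterDeadline is hypothetical:

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"time"
)

// readAfterDeadline lets one read hit its deadline, then clears the
// deadline and reads successfully from the same connection.
func readAfterDeadline() (string, error) {
	client, server := net.Pipe()
	defer client.Close()
	defer server.Close()

	buf := make([]byte, 5)

	// Nothing is written yet, so this read can only end via the deadline.
	if err := client.SetReadDeadline(time.Now().Add(10 * time.Millisecond)); err != nil {
		return "", err
	}
	if _, err := client.Read(buf); err == nil {
		return "", errors.New("expected the first read to hit its deadline")
	}

	// Clear the deadline (zero value); the connection is still usable.
	if err := client.SetReadDeadline(time.Time{}); err != nil {
		return "", err
	}
	go server.Write([]byte("hello"))

	n, err := client.Read(buf)
	return string(buf[:n]), err
}

func main() {
	got, err := readAfterDeadline()
	fmt.Println(got, err)
}
```

A connection that failed its keep-alive probes offers no such second attempt, which is why handling the two cases identically is unlikely to be correct.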

Author

@networkimprov networkimprov commented Jul 17, 2019

@FiloSottile, let's make the .Error() string mention keep-alive. It's currently:
"read tcp n.n.n.n:p->n.n.n.n:p: read: connection timed out"

Also I opened a 1.12.x backport issue.

Member

@FiloSottile FiloSottile commented Jul 24, 2019

I looked into this, and as far as I can tell, there is no way to single out keep-alive errors: they just surface an ETIMEDOUT from the read().

What we might do is isolate deadline errors, which as far as I can tell (although I am no expert on that part of the code) are handled by runtime timers, and make those more uniquely recognizable. For example, by making them an exported error type.

We might even decide to make only those have Timeout() true. Currently Timeout() is true for e == EAGAIN || e == EWOULDBLOCK || e == ETIMEDOUT and for some DNS resolution errors. That line was last touched in 2011, and I don't feel confident about how each of those ids maps to a timeout.

If I'm right, this is not a small fix to a specific error value anymore, and I don't think it's something we should ship at the very end of the freeze. /cc @golang/osp-team
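The errno condition mentioned above can be observed directly through syscall.Errno's Timeout method. A small sketch, assuming a Unix-like platform where these constants are defined; the helper name errnoTimeouts is hypothetical and ECONNRESET is included only as a contrasting value:

```go
package main

import (
	"fmt"
	"syscall"
)

// errnoTimeouts reports what Timeout() returns for a few errno values.
func errnoTimeouts() map[string]bool {
	return map[string]bool{
		"EAGAIN":      syscall.EAGAIN.Timeout(),
		"EWOULDBLOCK": syscall.EWOULDBLOCK.Timeout(),
		"ETIMEDOUT":   syscall.ETIMEDOUT.Timeout(),
		"ECONNRESET":  syscall.ECONNRESET.Timeout(), // contrast: not a timeout
	}
}

func main() {
	for name, isTimeout := range errnoTimeouts() {
		fmt.Printf("%-12s Timeout() = %v\n", name, isTimeout)
	}
}
```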

Contributor

@ianlancetaylor ianlancetaylor commented Jul 24, 2019

I believe that if a read from a network connection fails due to a deadline, the read will return internal/poll.ErrTimeout. I believe that if a read fails due to a keep-alive error, the read will return syscall.ETIMEDOUT.
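If that belief holds, user code cannot name internal/poll.ErrTimeout, but it can look for syscall.ETIMEDOUT in the error chain. A rough sketch, assuming Go 1.13 or later for errors.Is; the sample errors are constructed by hand and their shape is an assumption, and ETIMEDOUT may have causes other than keep-alive:

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
	"syscall"
)

// hasETIMEDOUT reports whether err wraps syscall.ETIMEDOUT anywhere in its chain.
func hasETIMEDOUT(err error) bool {
	return errors.Is(err, syscall.ETIMEDOUT)
}

// sampleReadError builds a read error wrapping the given errno
// (assumed shape: OpError -> SyscallError -> Errno).
func sampleReadError(errno syscall.Errno) error {
	return &net.OpError{Op: "read", Net: "tcp", Err: os.NewSyscallError("read", errno)}
}

// sampleTimedOut and sampleReset are hand-built stand-ins for real failures.
func sampleTimedOut() error { return sampleReadError(syscall.ETIMEDOUT) }
func sampleReset() error    { return sampleReadError(syscall.ECONNRESET) }

func main() {
	fmt.Println(hasETIMEDOUT(sampleTimedOut()), hasETIMEDOUT(sampleReset()))
}
```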

Author

@networkimprov networkimprov commented Jul 24, 2019

Ian, can we make net.OpError.Timeout() check internal/poll.ErrTimeout?

Although we could undo #23378 for 1.13, we still need to fix this in 1.12.x & 1.13 due to #23459.

Contributor

@ianlancetaylor ianlancetaylor commented Jul 24, 2019

Seems a fair bit simpler to change internal/poll.ErrTimeout to not return true for the Timeout method. Or just not implement the Timeout method. Might be a good idea to change the name, of course.


@gopherbot gopherbot commented Aug 9, 2019

Change https://golang.org/cl/189757 mentions this issue: net: document that a keep-alive failure also returns a timeout

Contributor

@ianlancetaylor ianlancetaylor commented Aug 9, 2019

I sent a CL to update the docs for SetDeadline. That seems to be the relevant place. This kind of detail would be out of place in the other locations mentioned.

gopherbot pushed a commit that referenced this issue Aug 11, 2019
Updates #31449

Change-Id: I76490c5e83eb2f7ba529b387a57ba088428aece5
Reviewed-on: https://go-review.googlesource.com/c/go/+/189757
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com>
Reviewed-by: Filippo Valsorda <filippo@golang.org>
Contributor

@rsc rsc commented Oct 9, 2019

Back in #31449 (comment), I wrote:

For Go 1.14 maybe the answer is to change our net.Conn implementations to return an error that Is(context.DeadlineExceeded), to allow a more precise check than Timeout.

We did not get into this, and we are close to the freeze. It seems like we should probably put this off to next cycle. But I will retitle so it is easier to understand what this is about.
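For readers unfamiliar with the mechanism behind "an error that Is(context.DeadlineExceeded)", here is a rough sketch of how an error type can opt in via an Is method. The deadlineError type is hypothetical and is not what the net package does; it only illustrates the idea under discussion:

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// deadlineError is a hypothetical error type, for illustration only.
type deadlineError struct{}

func (deadlineError) Error() string { return "i/o timeout" }
func (deadlineError) Timeout() bool { return true }

// Is lets errors.Is(err, context.DeadlineExceeded) match this error
// without changing the text that Error returns.
func (deadlineError) Is(target error) bool {
	return target == context.DeadlineExceeded
}

func newDeadlineError() error { return deadlineError{} }

// matchesContextDeadline is the more precise check the comment proposes.
func matchesContextDeadline(err error) bool {
	return errors.Is(err, context.DeadlineExceeded)
}

func main() {
	err := fmt.Errorf("read failed: %w", newDeadlineError())
	fmt.Println(matchesContextDeadline(err), err)
}
```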

@rsc rsc changed the title from "net: Conn.Read() returns Error with .Timeout()==true on KeepAlive failure" to "net: document that Conn.Read/Write should return error that Is(context.DeadlineExceeded) for deadline exceeded" Oct 9, 2019
@rsc rsc modified the milestones: Go1.14, Go1.15 Oct 9, 2019
Contributor

@ianlancetaylor ianlancetaylor commented Apr 17, 2020

Above @rsc suggests that we change net.Conn.Read/Write to return context.DeadlineExceeded if they time out due to exceeding a deadline. That will change the error when printed from "i/o timeout" to "context deadline exceeded". I don't think that would be an ideal change, as I think it is confusing to refer to a context when no context is involved.

I suggest that we instead add net.ErrDeadlineExceeded and os.ErrDeadlineExceeded and return those.

Contributor

@ianlancetaylor ianlancetaylor commented Apr 17, 2020

Actually, let's just add os.ErrDeadlineExceeded.

Author

@networkimprov networkimprov commented Apr 17, 2020

Maybe go vet should flag use of err.Timeout() after Set*Deadline(), since that is not a reliable way to check for deadline-expired.

Why is this error related to package os? Aren't deadlines a net.Conn concept?

EDIT: I forgot that deadlines also exist for os.File.


@gopherbot gopherbot commented Apr 17, 2020

Change https://golang.org/cl/228644 mentions this issue: net: return context.DeadlineExceeded if past context deadline

Contributor

@ianlancetaylor ianlancetaylor commented Apr 17, 2020

Any change to "go vet" can be a separate issue. Although I suspect that that change is not feasible as it would make some amount of existing code non-vet-compliant, which we can't really do.

The os package has SetDeadline just as the net package does.

Author

@networkimprov networkimprov commented Apr 17, 2020

Could net.ErrDeadlineExceeded be added as an alias to os.ErrDeadlineExceeded?

Filed #38508 for the vet issue.

Contributor

@ianlancetaylor ianlancetaylor commented Apr 17, 2020

Sure, we could have both net.ErrDeadlineExceeded and os.ErrDeadlineExceeded, but I don't see a reason to do that. The net package already depends on the os package. I'm open to it if there is a good reason for it.

Author

@networkimprov networkimprov commented Apr 17, 2020

It avoids the need to import "os" when calling net.Conn.Set*Deadline(). Your first instinct was to provide it :-)

Almost all code using Set*Deadline() now needs to be updated. It's odd that those changes should require an import unless "os" is already imported.

Contributor

@ianlancetaylor ianlancetaylor commented Apr 17, 2020

My guess is that very little code that uses SetDeadline will need to be updated. The only code that needs to be updated is code that needs to reliably determine whether the connection failed due to an exceeded deadline. Most programs will just see that the connection failed and carry on.

I would like to hear someone else's opinion on this question.


@gopherbot gopherbot commented Apr 17, 2020

Change https://golang.org/cl/228645 mentions this issue: os, net: define and use os.ErrDeadlineExceeded

@networkimprov networkimprov changed the title from "net: document that Conn.Read/Write should return error that Is(context.DeadlineExceeded) for deadline exceeded" to "net, os: Set*Deadline() expiration error should be unique, as .Timeout() is true for keepalive, etc" Apr 18, 2020
Contributor

@rsc rsc commented Apr 22, 2020

Ian's CL seems worth trying. Let's just be ready to roll it back if there are problems.

@gopherbot gopherbot closed this in d422f54 Apr 25, 2020
xujianhai666 added a commit to xujianhai666/go-1 that referenced this issue May 21, 2020
If an I/O operation fails because a deadline was exceeded,
return os.ErrDeadlineExceeded. We used to return poll.ErrTimeout,
an internal error, and told users to check the Timeout method.
However, there are other errors with a Timeout method that returns true,
notably syscall.ETIMEDOUT which is returned for a keep-alive timeout.
Checking errors.Is(err, os.ErrDeadlineExceeded) should permit code
to reliably tell why it failed.

This change does not affect the handling of net.Dialer.Deadline,
nor does it change the handling of net.DialContext when the context
deadline is exceeded. Those cases continue to return an error
reported as "i/o timeout" for which Timeout is true, but that error
is not os.ErrDeadlineExceeded.

Fixes golang#31449

Change-Id: I0323f42e944324c6f2578f00c3ac90c24fe81177
Reviewed-on: https://go-review.googlesource.com/c/go/+/228645
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Filippo Valsorda <filippo@golang.org>
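
The check named in the commit message above, errors.Is(err, os.ErrDeadlineExceeded), can be sketched like this. It assumes Go 1.15 or later and uses net.Pipe as an in-memory stand-in for a real connection; the helper names are hypothetical:

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
	"time"
)

// readWithDeadline sets a short read deadline on one end of an in-memory
// pipe and reads with no writer, so the read can only end via the deadline.
func readWithDeadline() error {
	client, server := net.Pipe()
	defer client.Close()
	defer server.Close()

	if err := client.SetReadDeadline(time.Now().Add(10 * time.Millisecond)); err != nil {
		return err
	}
	_, err := client.Read(make([]byte, 1))
	return err
}

// deadlineExceeded reports whether err was caused by an expired
// Set*Deadline, as opposed to any other error whose Timeout() is true.
func deadlineExceeded(err error) bool {
	return errors.Is(err, os.ErrDeadlineExceeded)
}

func main() {
	err := readWithDeadline()
	fmt.Println(deadlineExceeded(err), err)
}
```

An error that merely wraps syscall.ETIMEDOUT, such as a keep-alive failure, would not satisfy this check, which is the distinction the change was meant to provide.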
@ianlancetaylor ianlancetaylor mentioned this issue Jun 12, 2020

@gopherbot gopherbot commented Jun 24, 2020

Change https://golang.org/cl/239705 mentions this issue: net: consistently document deadline handling

gopherbot pushed a commit that referenced this issue Jun 24, 2020
After CL 228645 some mentions of the Deadline methods referred
to the Timeout method, and some to os.ErrDeadlineExceeded.
Stop referring to the Timeout method, to encourage ErrDeadlineExceeded.

For #31449

Change-Id: I27b8ff34f31798f38b06437546886af8cce98ca4
Reviewed-on: https://go-review.googlesource.com/c/go/+/239705
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Damien Neil <dneil@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>