net: dial to a non-existent address doesn't return an error #8276

Closed
gopherbot opened this issue Jun 24, 2014 · 19 comments

@gopherbot

commented Jun 24, 2014

by coocood:

Before filing a bug, please check whether it has been fixed since the
latest release. Search the issue tracker and check that you're running the
latest version of Go:

Run "go version" and compare against
http://golang.org/doc/devel/release.html  If a newer version of Go exists,
install it and retry what you did to reproduce the problem.

Thanks.

What does 'go version' print?

go version go1.3 linux/amd64

What steps reproduce the problem?
If possible, include a link to a program on play.golang.org.

http://play.golang.org/p/PVBGNsPLN1

What happened?

The returned error is nil when dialing a non-existent address.

What should have happened instead?

The error should not be nil.

Please provide any additional information below.

It happens only on Go 1.3; Go 1.2 doesn't have this issue. I compared the source
code and found that it was caused by a change to the "connect" method in the
"net/fd_unix.go" file. When I replaced this method with the old version and
recompiled Go 1.3, the issue disappeared.
@davecheney

Contributor

commented Jun 24, 2014

Comment 1:

Your example code does not compile. Did you paste the correct version?

Status changed to WaitingForReply.

@gopherbot

Author

commented Jun 24, 2014

Comment 2 by coocood:

I pasted it to play.golang.org from my local file and forgot to import a package.
Here is the correct one:
http://play.golang.org/p/T4FsbNsqkc
@mikioh

Contributor

commented Jun 24, 2014

Comment 3:

interesting, could you please show/tell us:
- the output of the attached tcpinfo.go in the environment where the issue occurs (you
need to run "go get github.com/mikioh/tcp" first),
- the output of all routing information from iproute2 (ip r/n/l/a) or a similar command
(pls don't forget to anonymize your private identifiers such as mac64/eui64 addresses),
- how you verified that the address (100.100.100.100) is non-existent in your
environment, i.e. your ip packet routing domain.

Attachments:

  1. tcpinfo.go (487 bytes)
@gopherbot

Author

commented Jun 24, 2014

Comment 4 by coocood:

The address was picked at random. I've tried other addresses; any non-existent address
will do.
My 'ip r' output is:
fe80::21f1:62c0:1c01:76a1 dev eth0 lladdr a4:1f:72:6b:52:e9 router STALE
10.12.104.1 dev wlan0 lladdr 44:2b:03:7d:15:4a STALE
10.12.113.1 dev eth0 lladdr 44:2b:03:7d:15:53 REACHABLE
The output of tcpinfo.go looks like:
dial tcp 100.100.100.100:10000: i/o timeout
dial tcp 100.100.100.100:10000: i/o timeout
&{State:syn-sent Options:[] PeerOptions:[] RTT:0 RTTVar:250ms RTO:1s ATO:0
LastDataSent:4h31m58.388s LastDataReceived:4h31m58.388s LastAckReceived:4h31m58.388s
CC:0xc20800ec00 SysInfo:0xc208018870}
&{State:syn-sent Options:[] PeerOptions:[] RTT:0 RTTVar:250ms RTO:1s ATO:0
LastDataSent:4h31m58.388s LastDataReceived:4h31m58.388s LastAckReceived:4h31m58.388s
CC:0xc20800ecc0 SysInfo:0xc2080188c0}
&{State:syn-sent Options:[] PeerOptions:[] RTT:0 RTTVar:250ms RTO:1s ATO:0
LastDataSent:4h31m58.388s LastDataReceived:4h31m58.388s LastAckReceived:4h31m58.388s
CC:0xc20800ed80 SysInfo:0xc208018910}
&{State:syn-sent Options:[] PeerOptions:[] RTT:0 RTTVar:250ms RTO:1s ATO:0
LastDataSent:4h31m58.388s LastDataReceived:4h31m58.388s LastAckReceived:4h31m58.388s
CC:0xc20800ee40 SysInfo:0xc208018960}
dial tcp 100.100.100.100:10000: i/o timeout
&{State:syn-sent Options:[] PeerOptions:[] RTT:0 RTTVar:250ms RTO:1s ATO:0
LastDataSent:4h31m59.388s LastDataReceived:4h31m59.388s LastAckReceived:4h31m59.388s
CC:0xc20800ef60 SysInfo:0xc2080189b0}
&{State:syn-sent Options:[] PeerOptions:[] RTT:0 RTTVar:250ms RTO:1s ATO:0
LastDataSent:4h31m59.388s LastDataReceived:4h31m59.388s LastAckReceived:4h31m59.388s
CC:0xc20800f020 SysInfo:0xc208018a00}
dial tcp 100.100.100.100:10000: i/o timeout
&{State:syn-sent Options:[] PeerOptions:[] RTT:0 RTTVar:250ms RTO:1s ATO:0
LastDataSent:4h32m0.388s LastDataReceived:4h32m0.388s LastAckReceived:4h32m0.388s
CC:0xc20800f140 SysInfo:0xc208018a50}
&{State:syn-sent Options:[] PeerOptions:[] RTT:0 RTTVar:250ms RTO:1s ATO:0
LastDataSent:4h32m0.388s LastDataReceived:4h32m0.388s LastAckReceived:4h32m0.388s
CC:0xc20800f200 SysInfo:0xc208018aa0}
dial tcp 100.100.100.100:10000: i/o timeout
&{State:syn-sent Options:[] PeerOptions:[] RTT:0 RTTVar:250ms RTO:1s ATO:0
LastDataSent:4h32m1.392s LastDataReceived:4h32m1.392s LastAckReceived:4h32m1.392s
CC:0xc20800f320 SysInfo:0xc208018af0}
...
@mikioh

Contributor

commented Jun 24, 2014

Comment 5:

nice, pretty interesting. judging from
> State:syn-sent
it looks like something similar to half-tcp hole punching is happening. which version of
linux are you running? also, if possible, pls let us know the details of your network
environment configuration (especially for ip-layer: sysctl states, iptables for nat44,
6rd/nat64/xlat for address translations and link-layer) for repro. i just tried to repro
it on freebsd but failed.
@gopherbot

Author

commented Jun 24, 2014

Comment 6 by coocood:

This issue was first caught in our production environment, which runs CentOS 6.3.
Our application works like a sentinel that detects another application's failure by dialing
its address; if the error is nil, we consider that application alive.
My local machine runs Ubuntu 14.04 and can also reproduce this issue, and I haven't
changed any system variables since installation.
So I guess it has nothing to do with the Linux version or configuration.
@mikioh

Contributor

commented Jun 24, 2014

Comment 7:

thanks, looping in dmitriy.
dmitriy: in short, on linux, it looks like there's some possibility that somewhere in
net.pollDesc.WaitWrite->net.pollDesc.Wait('w')->runtime.runtime_pollWait(..,
'w')->netpollunblock we fail to call epoll_wait. can you identify the location?
@mikioh

Contributor

commented Jun 24, 2014

Comment 8:

i wrote a workaround cl, https://golang.org/cl/105400043/, as you suggested. can
you try this and report back the result? thanks.

Status changed to New.

@gopherbot

Author

commented Jun 24, 2014

Comment 9 by coocood:

I tried this CL, and it fixes this issue.
I hope there will be a 1.3.1 release soon.
Thank you.
@mikioh

Contributor

commented Jun 24, 2014

Comment 10:

glad to hear that, but that's a workaround. the heart of this issue is that on linux we
sometimes fail to call epoll_wait. that means we'll face more disasters when we use
the tcp fastopen protocol or similar stuff.
@gopherbot

Author

commented Jul 3, 2014

Comment 11 by garton:

I have also hit this same issue.
Is the real issue and fix understood now?
If so, I'll stop debugging this.  If not, I'll carry on and post my findings.
@gopherbot

Author

commented Jul 12, 2014

Comment 12 by garton:

It may be obvious already, but in case it helps:
The initial commit that broke this is this:
https://code.google.com/p/go/source/detail?r=5f662f12d550
Reverting the change (against release 1.3) appears to solve the problem for me.  Judging
from the history this possibly re-instates some other issue though.
@mikioh

Contributor

commented Jul 28, 2014

Comment 13:

a fix; https://golang.org/cl/120820043/
though, not tested on dragonfly (async-connect enabled platform) yet.
@gopherbot

Author

commented Jul 28, 2014

Comment 14:

CL https://golang.org/cl/120820043 mentions this issue.
@mikioh

Contributor

commented Jul 29, 2014

Comment 15:

Labels changed: added release-go1.3.1.

@mikioh

Contributor

commented Jul 29, 2014

Comment 16:

This issue was closed by revision c0325f5.

Status changed to Fixed.

@rsc

Contributor

commented Aug 11, 2014

Comment 17:

Merging with 8426 because the same CL claims to fix both.

Status changed to Duplicate.

Merged into issue #8426.

@rsc

Contributor

commented Aug 11, 2014

Comment 18:

Labels changed: added release-none, removed release-go1.3.1.

@adg

Contributor

commented Aug 13, 2014

Comment 19:

This issue was closed by revision 073fc578434b.

Status changed to Fixed.

adg added a commit that referenced this issue May 11, 2015

[release-branch.go1.3] net: prevent spurious on-connect events via epoll on linux

««« CL 120820043 / 06a4b59c1393
net: prevent spurious on-connect events via epoll on linux

On Linux, adding a socket descriptor to epoll instance before getting
the EINPROGRESS return value from connect system call could be a root
cause of spurious on-connect events.

See golang.org/issue/8276, golang.org/issue/8426 for further information.

All credit to Jason Eggleston <jason@eggnet.com>

Fixes #8276.
Fixes #8426.

LGTM=dvyukov
R=dvyukov, golang-codereviews, adg, dave, iant, alex.brainman
CC=golang-codereviews
https://golang.org/cl/120820043
»»»

TBR=r, rsc
CC=golang-codereviews
https://golang.org/cl/128110045

@golang golang locked and limited conversation to collaborators Jun 25, 2016

wheatman added a commit to wheatman/go-akaros that referenced this issue Jun 25, 2018

wheatman added a commit to wheatman/go-akaros that referenced this issue Jul 9, 2018

This issue was closed.
