net: TestDialCancel is flaky on linux/arm64-buildlet #15191

Open
mikioh opened this Issue Apr 8, 2016 · 4 comments

4 participants

@bradfitz bradfitz added this to the Unplanned milestone Apr 8, 2016

@bradfitz

Member

bradfitz commented Apr 8, 2016

Odd:

--- FAIL: TestDialCancel (0.01s)
    dial_test.go:870: dial error after 0 ticks (5 before cancel sent): dial tcp 198.18.0.254:1234: getsockopt: network is unreachable
FAIL
FAIL    net 1.745s

Why would it return ENETUNREACH, but only sometimes? Amusingly, getsockopt's man page (http://linux.die.net/man/2/getsockopt) doesn't even mention this error.

/cc @ianlancetaylor @davecheney @minux

@bradfitz bradfitz changed the title from net: TestDialCancel is flaky on linux/arm64-buidlet to net: TestDialCancel is flaky on linux/arm64-buildlet Apr 8, 2016

@ianlancetaylor

Contributor

ianlancetaylor commented Apr 9, 2016

If you look at the code in netFD.connect in fd_unix.go, you'll see that (most likely) getsockopt is not returning ENETUNREACH. Instead, getsockopt(SO_ERROR) is succeeding in retrieving the error associated with the socket, and that error is ENETUNREACH. The error is really coming from connect, and it means that there is no route to the IP address.
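To make the wrapping concrete, here is a minimal Go sketch that rebuilds an error with the same shape as the one in the log and unwraps it to the underlying errno. It is an illustration, not the code in fd_unix.go: sampleError and isNetUnreachable are hypothetical helper names, the error is constructed by hand rather than produced by a real dial, and errors.Is assumes Go 1.13 or later.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
	"syscall"
)

// sampleError hand-builds an error shaped like the one in the failing log:
// a *net.OpError whose inner error is an *os.SyscallError labelled
// "getsockopt" and carrying ENETUNREACH. (Hypothetical helper.)
func sampleError() error {
	return &net.OpError{
		Op:   "dial",
		Net:  "tcp",
		Addr: &net.TCPAddr{IP: net.ParseIP("198.18.0.254"), Port: 1234},
		Err:  os.NewSyscallError("getsockopt", syscall.ENETUNREACH),
	}
}

// isNetUnreachable reports whether err wraps ENETUNREACH, regardless of
// which syscall name the message happens to be labelled with.
func isNetUnreachable(err error) bool {
	return errors.Is(err, syscall.ENETUNREACH)
}

func main() {
	err := sampleError()
	fmt.Println(err)
	fmt.Println("network unreachable:", isNetUnreachable(err))
}
```

The "getsockopt" label only records which call surfaced the errno; checking the errno itself avoids depending on that label.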

@bradfitz bradfitz added the Testing label Apr 12, 2016

@bradfitz

Member

bradfitz commented Apr 12, 2016

I'm just going to disable this test for now. I think that machine (on Linaro) has different routes than we've normally assumed for tests.

For the record,

root@r2-a25-go1:/home/brad.fitzpatrick# ifconfig 
eth0      Link encap:Ethernet  HWaddr 00:16:3e:0c:9c:8a  
          inet addr:10.20.3.110  Bcast:10.20.255.255  Mask:255.255.0.0
          inet6 addr: fe80::216:3eff:fe0c:9c8a/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:5151248 errors:0 dropped:0 overruns:0 frame:0
          TX packets:996113 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:5114537899 (5.1 GB)  TX bytes:3534764800 (3.5 GB)

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:2065172 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2065172 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:6848213327 (6.8 GB)  TX bytes:6848213327 (6.8 GB)

lxcbr0    Link encap:Ethernet  HWaddr a6:d1:77:00:04:d2  
          inet addr:10.0.3.1  Bcast:0.0.0.0  Mask:255.255.255.0
          inet6 addr: fe80::a4d1:77ff:fe00:4d2/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:7 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:0 (0.0 B)  TX bytes:570 (570.0 B)

root@r2-a25-go1:/home/brad.fitzpatrick# ip route show
default via 10.20.0.1 dev eth0 
10.0.3.0/24 dev lxcbr0  proto kernel  scope link  src 10.0.3.1 
10.20.0.0/16 dev eth0  proto kernel  scope link  src 10.20.3.110 

gopherbot pushed a commit that referenced this issue Apr 12, 2016

net: skip TestDialCancel on linux-arm64-buildlet
These builders (on Linaro) have a different network configuration
which is incompatible with this test. Or so it seems.

Updates #15191

Change-Id: Ibfeacddc98dac1da316e704b5c8491617a13e3bf
Reviewed-on: https://go-review.googlesource.com/21901
Reviewed-by: Matthew Dempsky <mdempsky@google.com>

@paulzhol

Member

paulzhol commented Aug 7, 2017

I've started seeing these as well on freebsd-arm-paulzhol:
https://build.golang.org/log/d89169422b2e1c3f4765d7d9093faa56510e21ec
https://build.golang.org/log/6b83f0f1b1ba97c0efb74fae4bb05280c26b8a22
https://build.golang.org/log/664eb5529a95b2b051d874ebc6d4f0412266984e

I'm not sure why it started appearing now. There have been some changes in the environment (a switch to a buildlet-based builder, an upgrade to FreeBSD 11.1, etc.), but they don't seem to be related.

On my setup I can trace the cause to the router/firewall replying with a TCP RST segment when dialing an address in the 198.18.0.0/15 subnet:

19:13:12.724990 IP 192.168.X.Y.29436 > 198.18.0.254.1234: Flags [S], seq 2301688538, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 157886 ecr 0], length 0
19:13:12.725786 IP 198.18.0.254.1234 > 192.168.X.Y.29436: Flags [R.], seq 0, ack 2301688539, win 0, length 0
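An immediate RST or a missing route makes the dial fail right away instead of hanging until the cancel is sent, which matches the "dial error after 0 ticks" message. The following Go sketch shows one way to tell these outcomes apart. classifyDialError and the sample error variables are hypothetical names built by hand for illustration; they are not part of the net package's tests. Matching context.Canceled with errors.Is assumes a reasonably recent Go release.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
	"os"
	"syscall"
)

// Hand-built sample errors (assumptions for illustration only).
var (
	// What a dial typically reports when the peer or a firewall answers with RST.
	errRST error = &net.OpError{Op: "dial", Net: "tcp",
		Err: os.NewSyscallError("connect", syscall.ECONNREFUSED)}
	// What a dial reports when the kernel has no usable route.
	errNoRoute error = &net.OpError{Op: "dial", Net: "tcp",
		Err: os.NewSyscallError("connect", syscall.ENETUNREACH)}
	// A dial aborted by the caller.
	errCanceled error = &net.OpError{Op: "dial", Net: "tcp", Err: context.Canceled}
)

// classifyDialError maps a dial error to a coarse outcome label.
func classifyDialError(err error) string {
	switch {
	case err == nil:
		return "connected"
	case errors.Is(err, context.Canceled):
		return "canceled"
	case errors.Is(err, syscall.ECONNREFUSED):
		return "refused" // e.g. a TCP RST came back
	case errors.Is(err, syscall.ENETUNREACH), errors.Is(err, syscall.EHOSTUNREACH):
		return "unreachable" // no route to the destination
	}
	var ne net.Error
	if errors.As(err, &ne) && ne.Timeout() {
		return "timeout"
	}
	return "other"
}

func main() {
	for _, err := range []error{errRST, errNoRoute, errCanceled, nil} {
		fmt.Printf("%-12s <- %v\n", classifyDialError(err), err)
	}
}
```

A test that relies on the address silently dropping packets could use a check like this to skip, rather than fail, when the environment answers with "refused" or "unreachable".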

Replying with a RST is the recommended behavior for OpenBSD's pf firewall, according to https://www.openbsd.org/faq/pf/example1.html:

block in quick on egress from <martians> to any
block return out quick on egress from any to <martians>

Packets coming in on the egress interface should be dropped if they appear to be from the list of unroutable addresses we defined. Such packets were likely sent due to misconfiguration, or possibly as part of a spoofing attack. Similarly, our clients should not attempt to connect to such addresses. We'll specify the "return" action to prevent annoying timeouts for users. Note that this can cause problems if you're doing double NAT.

Here <martians> is a table containing 198.18.0.0/15 as well as other non-routable address ranges.
According to the pf.conf manual, the block return rule behaves as follows:

This causes a TCP RST to be returned for TCP packets and an ICMP UNREACHABLE for other types of packets.
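As a quick check that the address dialed by the test falls inside the 198.18.0.0/15 range mentioned above, here is a small Go sketch using net.ParseCIDR. The helper names inBenchmarkRange and mustParseCIDR are hypothetical, and only this one range is checked, not the full <martians> table.

```go
package main

import (
	"fmt"
	"net"
)

// benchmarkNet is the 198.18.0.0/15 range discussed in this thread.
var benchmarkNet = mustParseCIDR("198.18.0.0/15")

// mustParseCIDR parses a CIDR string and panics on malformed input.
func mustParseCIDR(s string) *net.IPNet {
	_, n, err := net.ParseCIDR(s)
	if err != nil {
		panic(err)
	}
	return n
}

// inBenchmarkRange reports whether addr is an IP inside 198.18.0.0/15.
// It returns false for strings that are not valid IP addresses.
func inBenchmarkRange(addr string) bool {
	ip := net.ParseIP(addr)
	return ip != nil && benchmarkNet.Contains(ip)
}

func main() {
	for _, a := range []string{"198.18.0.254", "198.19.255.255", "198.20.0.1", "10.20.3.110"} {
		fmt.Printf("%-15s in 198.18.0.0/15: %v\n", a, inBenchmarkRange(a))
	}
}
```

The /15 prefix covers 198.18.0.0 through 198.19.255.255, so the test's 198.18.0.254 is inside it and would match a pf rule keyed on that table entry.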
