Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net: TestDialerDualStack fails when net.inet.tcp.blackhole=2 #12052

Closed
aperum opened this issue Aug 6, 2015 · 12 comments

Comments

Projects
None yet
5 participants
@aperum
Copy link

commented Aug 6, 2015

$ pkg version -e go
go-1.4.2,1

This is the stock ports version used to bootstrap the build.

$ uname -m
amd64
$ freebsd-version -ku
10.1-RELEASE-p16
10.1-RELEASE-p16

With a freshly cloned ~/golang:

~/golang$ git checkout go1.5rc1
~/golang/src$ CC=clang GOROOT_BOOTSTRAP=/usr/local/go ./all.bash
...
ok mime 0.011s
ok mime/multipart 0.123s
ok mime/quotedprintable 0.342s
panic: test timed out after 3m0s

goroutine 166 [running]:
testing.startAlarm.func1()
/home/aperum/golang/src/testing/testing.go:703 +0x132
created by time.goFunc
/home/aperum/golang/src/time/sleep.go:129 +0x3a

goroutine 1 [chan receive]:
testing.RunTests(0x7986c8, 0x88e4a0, 0x9c, 0x9c, 0x51af01)
/home/aperum/golang/src/testing/testing.go:562 +0x8ad
testing.(*M).Run(0xc820053ef8, 0x80109e658)
/home/aperum/golang/src/testing/testing.go:494 +0x70
net.TestMain(0xc820053ef8)
/home/aperum/golang/src/net/main_test.go:50 +0x2b
main.main()
net/_test/_testmain.go:410 +0x113

goroutine 17 [syscall, locked to thread]:
runtime.goexit()
/home/aperum/golang/src/runtime/asm_amd64.s:1696 +0x1

goroutine 185 [IO wait]:
net.runtime_pollWait(0x80109f7b0, 0x77, 0x4502d0)
/home/aperum/golang/src/runtime/netpoll.go:157 +0x60
net.(_pollDesc).Wait(0xc820107f00, 0x77, 0x0, 0x0)
/home/aperum/golang/src/net/fd_poll_runtime.go:73 +0x3a
net.(_pollDesc).WaitWrite(0xc820107f00, 0x0, 0x0)
/home/aperum/golang/src/net/fd_poll_runtime.go:82 +0x36
net.(_netFD).connect(0xc820107ea0, 0x0, 0x0, 0x80109e728, 0xc82011c240, 0xecd555c6f, 0xe36d93,
0x890fa0, 0x0, 0x0)
/home/aperum/golang/src/net/fd_unix.go:114 +0x1f6
net.(_netFD).dial(0xc820107ea0, 0x80109f928, 0x0, 0x80109f928, 0xc820117470, 0xecd555c6f, 0xe36d93, 0x890fa0, 0x0, 0x0)
/home/aperum/golang/src/net/sock_posix.go:137 +0x351
net.socket(0x720be0, 0x3, 0x2, 0x1, 0x0, 0xc820117400, 0x80109f928, 0x0, 0x80109f928, 0xc820117470, ...)
/home/aperum/golang/src/net/sock_posix.go:89 +0x411
net.internetSocket(0x720be0, 0x3, 0x80109f928, 0x0, 0x80109f928, 0xc820117470, 0xecd555c6f, 0xc800e36d93, 0x890fa0, 0x1, ...)
/home/aperum/golang/src/net/ipsock_posix.go:160 +0x141
net.dialTCP(0x720be0, 0x3, 0x0, 0xc820117470, 0xecd555c6f, 0xc800e36d93, 0x890fa0, 0x2, 0x0, 0x0)
/home/aperum/golang/src/net/tcpsock_posix.go:171 +0x11e
net.dialSingle(0xc8201f4800, 0x80109e590, 0xc820117470, 0xecd555c6f, 0xe36d93, 0x890fa0, 0x0, 0x0, 0x0, 0x0)
/home/aperum/golang/src/net/dial.go:364 +0x3f5
net.dialSerial.func1(0xecd555c6f, 0xe36d93, 0x890fa0, 0x0, 0x0, 0x0, 0x0)
/home/aperum/golang/src/net/dial.go:336 +0x75
net.dial(0x720be0, 0x3, 0x80109e590, 0xc820117470, 0xc8201dba40, 0xecd555c6f, 0xe36d93, 0x890fa0, 0x0, 0x0, ...)
/home/aperum/golang/src/net/fd_unix.go:40 +0x60
net.dialSerial(0xc8201f4800, 0xc82011c220, 0x2, 0x2, 0x0, 0x0, 0x0, 0x0, 0x0)
/home/aperum/golang/src/net/dial.go:338 +0x760
net.(*Dialer).Dial(0xc8201dbf08, 0x720be0, 0x3, 0xc820011c30, 0xf, 0x0, 0x0, 0x0, 0x0)
/home/aperum/golang/src/net/dial.go:232 +0x50f
net.TestDialerDualStack(0xc82016e2d0)
/home/aperum/golang/src/net/dial_test.go:662 +0x660
testing.tRunner(0xc82016e2d0, 0x88e638)
/home/aperum/golang/src/testing/testing.go:456 +0x98
created by testing.RunTests
/home/aperum/golang/src/testing/testing.go:561 +0x86d

goroutine 195 [IO wait]:
net.runtime_pollWait(0x80109f1b0, 0x72, 0xc8200920b0)
/home/aperum/golang/src/runtime/netpoll.go:157 +0x60
net.(_pollDesc).Wait(0xc8200e5790, 0x72, 0x0, 0x0)
/home/aperum/golang/src/net/fd_poll_runtime.go:73 +0x3a
net.(_pollDesc).WaitRead(0xc8200e5790, 0x0, 0x0)
/home/aperum/golang/src/net/fd_poll_runtime.go:78 +0x36
net.(_netFD).accept(0xc8200e5730, 0x0, 0x80105a028, 0xc8201141c0)
/home/aperum/golang/src/net/fd_unix.go:408 +0x27c
net.(_TCPListener).AcceptTCP(0xc8200e0018, 0x722d80, 0x0, 0x0)
/home/aperum/golang/src/net/tcpsock_posix.go:254 +0x4d
net.(_TCPListener).Accept(0xc8200e0018, 0x0, 0x0, 0x0, 0x0)
/home/aperum/golang/src/net/tcpsock_posix.go:264 +0x3d
net.TestDialerDualStack.func2(0xc8200e5650, 0x80109f8f0, 0xc8200e0018)
/home/aperum/golang/src/net/dial_test.go:638 +0x27
net.(_dualStackServer).buildup.func1(0x7989e8, 0xc8200e5650, 0x1)
/home/aperum/golang/src/net/mockserver_test.go:138 +0x77
created by net.(*dualStackServer).buildup
/home/aperum/golang/src/net/mockserver_test.go:140 +0x73
FAIL net 180.011s
ok net/http 6.438s
ok net/http/cgi 0.641s
...

I can reproduce this also using release-branch.go1.5 and on two different machines.

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Aug 6, 2015

Does the failing test pass if you run it directly without a timeout?

go test -test.short net
@aperum

This comment has been minimized.

Copy link
Author

commented Aug 6, 2015

Yes it does with a few warnings i think are clang specific:

~/golang/src$ go test -test.short net

net

/usr/local/go/src/net/cgo_unix.go:53:31: warning: unknown attribute 'gcc_struct' ignored [-Wattributes]
} attribute((packed, gcc_struct)) *a = v;
^
/usr/local/go/src/net/cgo_unix.go:66:31: warning: unknown attribute 'gcc_struct' ignored [-Wattributes]
} attribute((packed, gcc_struct)) *a = v;
^
/usr/local/go/src/net/cgo_unix.go:75:31: warning: unknown attribute 'gcc_struct' ignored [-Wattributes]
} attribute((packed, gcc_struct)) *a = v;
^
/usr/local/go/src/net/cgo_unix.go:86:31: warning: unknown attribute 'gcc_struct' ignored [-Wattributes]
} attribute((packed, gcc_struct)) *a = v;
^
/usr/local/go/src/net/cgo_unix.go:103:31: warning: unknown attribute 'gcc_struct' ignored [-Wattributes]
} attribute((packed, gcc_struct)) *a = v;
^
5 warnings generated.

net

cc: warning: argument unused during compilation: '-pthread'
ok net 76.971s

@aperum

This comment has been minimized.

Copy link
Author

commented Aug 6, 2015

Just to be sure i redid with the correct CC and GOROOT_BOOSTRAP variables then everything seems fine:

~/golang/src$ CC=clang GOROOT_BOOTSTRAP=/usr/local/go go test -test.short net
ok net 76.920s

@ianlancetaylor ianlancetaylor added this to the Go1.6 milestone Aug 6, 2015

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Aug 6, 2015

Marking for 1.6 to make sure the timeouts are long enough.

@mikioh mikioh changed the title go1.5rc1 net test times out on FreeBSD 10.1 amd64 net: test times out on FreeBSD 10.1 amd64 Aug 6, 2015

@mikioh

This comment has been minimized.

Copy link
Contributor

commented Aug 6, 2015

The set of net package test cases in short mode usually takes 2-4s, 10-20s on poor processors. @aperum can you show us the hardware profile (and load average) of your node under the test?

@aperum

This comment has been minimized.

Copy link
Author

commented Aug 10, 2015

Hardware is an i7 with 32GB RAM:

$ sysctl hw.model
hw.model: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
$ sysctl hw.ncpu
hw.ncpu: 8
$ sysctl hw.realmem
hw.realmem: 34359738368

This is a snapshot while building go:

vmstat 20
procs memory page disks faults cpu
r b w avm fre flt re pi po fr sr ad0 ad1 in sy cs us sy id
4 2 0 6077M 8809M 4068 2 4 0 1687 138 0 0 118 1816 2717 6 1 93
2 0 0 6110M 8886M 109482 0 230 0 112048 338 103 101 236 163167 13895 72 10 18
0 0 0 6005M 8913M 40548 0 65 0 42661 233 46 43 105 110300 26903 15 4 82
0 0 0 6005M 8913M 149 0 0 0 162 219 20 20 55 277 987 0 0 100
0 0 0 6014M 8913M 254 0 0 0 267 231 20 20 47 425 903 0 0 100
0 0 0 6005M 8913M 359 0 0 0 412 231 23 23 56 380 1022 0 0 100
0 0 0 6165M 8891M 671 0 0 0 442 226 25 25 83 884 1343 0 0 100
0 0 0 6130M 8897M 307 0 0 0 292 226 23 23 97 618 1196 0 0 100
0 0 0 6103M 8900M 751 0 0 0 605 235 21 21 51 615 1163 0 0 100
0 0 0 5747M 8971M 3105 0 7 0 3683 228 23 23 66 27889 13073 4 0 96

The 100% idle lines are actually while the net tests are running. I also redid the test and made sure nothing is blocked by the firewall.

The process hanging around until the timeout hits seems to be this:

root 71283 0,0 0,0 46212 15896 11 S+ 1:44pm 0:00,33 /tmp/go-build754929443/net/http/_test/http.test -test.short=true -test.timeout=3m0s

Attaching truss to this process shows a lots of repeats of the following:

$ truss -f -s 255 -p 71283 -d
71283: -1439207173.792845586 SIGNAL 17 (SIGSTOP)
71283: 3.275998734 kevent(5,0x0,0,{0x7,EVFILT_READ,EV_CLEAR,0,0x1,0x8012bf9e8},64,0x0) = 1 (0x1)
71283: 3.276068462 clock_gettime(4,{1030936.806560565 }) = 0 (0x0)
71283: 3.276114131 _umtx_op(0xc8204d4110,0x10,0x1,0x0,0x0,0x0) = 0 (0x0)
71283: 3.276195355 kevent(5,0x0,0,{0x7,EVFILT_READ,EV_CLEAR,0,0x1fc1,0x8012bf9e8 0x9,EVFILT_WRITE,EV_CLEAR,0,0x1fc1,0x80
12bfaa8},64,{0.000000000 }) = 2 (0x2)
71283: 3.276228066 _umtx_op(0xaf5340,0x10,0x1,0x0,0x0,0x0) = 0 (0x0)
71283: 3.276268144 _umtx_op(0xc820034a10,0x10,0x1,0x0,0x0,0x7fffff1f8630) = 0 (0x0)
71283: 3.276346202 kevent(5,0x0,0,{},64,{0.000000000 }) = 0 (0x0)
71283: 3.276373134 read(7,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0
0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0
0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
71283: 3.277868526 clock_gettime(4,{1030936.808360401 }) = 0 (0x0)
71283: 3.279335541 write(9,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0
\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0
\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,280338) = 8129 (0x1fc1)
71283: 3.280824488 clock_gettime(0,{1439207175.487974993 }) = 0 (0x0)
71283: 3.280841429 write(9,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0
\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0
\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,272209) ERR#35 'Resource temporarily unavailable'
71283: 3.280940728 nanosleep({0.000020000 }) = 0 (0x0)
71283: 3.280956475 read(7,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0
0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0
0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,5802) = 4033 (0xfc1)
71283: 3.282432206 clock_gettime(4,{1030936.812924339 }) = 0 (0x0)
71283: 3.282460092 read(7,0xc8206ae000,4096) ERR#35 'Resource temporarily unavailable'
71283: 3.282486911 clock_gettime(0,{1439207175.489637741 }) = 0 (0x0)
71283: 3.282536165 nanosleep({0.000020000 }) = 0 (0x0)
71283: 3.282557953 clock_gettime(4,{1030936.813050807 }) = 0 (0x0)

@mikioh

This comment has been minimized.

Copy link
Contributor

commented Aug 11, 2015

@aperum,

Thanks. Can you try the following and let us know which one stalls for fault isolation.

cd $GOROOT/net
go test -c
./net.test -test.v -test.short

and

cd $GOROOT/net/http
go test -c
./http.test -test.v -test.short
@aperum

This comment has been minimized.

Copy link
Author

commented Aug 11, 2015

I used the freshly compiled go tool version to do these tests instead of the bootstrap version:

~/golang/src/net$ GOROOT=/root/golang ~/golang/bin/go test -c
~/golang/src/net$ ./net.test -test.v -test.short
...
--- FAIL: TestDialerDualStack (300.67s)
dial_test.go:630: got 1m15.001998257s; want <= 95ms

~/golang/src/net/http$ GOROOT=/root/golang ~/golang/bin/go test -c
~/golang/src/net/http$ ./http.test -test.v -test.short
...
=== RUN TestOnlyWriteTimeout
2015/08/11 09:24:11 http: panic serving 127.0.0.1:55095: runtime error: invalid memory address or nil pointer dereferenc
e
goroutine 360 [running]:
net/http.(_conn).serve.func1(0xc82041f1e0, 0x801379530, 0xc8205c8208)
/root/golang/src/net/http/server.go:1287 +0xb5
net/http_test.TestOnlyWriteTimeout.func1(0x801379720, 0xc82041f290, 0xc820661340)
/root/golang/src/net/http/serve_test.go:542 +0x2b4
net/http.HandlerFunc.ServeHTTP(0xc8205093a0, 0x801379720, 0xc82041f290, 0xc820661340)
/root/golang/src/net/http/server.go:1422 +0x3a
net/http/httptest.(_waitGroupHandler).ServeHTTP(0xc820509460, 0x801379720, 0xc82041f290, 0xc820661340)
/root/golang/src/net/http/httptest/server.go:200 +0xc2
net/http.serverHandler.ServeHTTP(0xc8204efbc0, 0x801379720, 0xc82041f290, 0xc820661340)
/root/golang/src/net/http/server.go:1862 +0x19e
net/http.(_conn).serve(0xc82041f1e0)
/root/golang/src/net/http/server.go:1361 +0xbee
created by net/http.(_Server).Serve
/root/golang/src/net/http/server.go:1910 +0x3f6
--- FAIL: TestOnlyWriteTimeout (295.93s)
serve_test.go:570: timeout waiting for Get error
...
=== RUN TestTransportClosesBodyOnError
--- FAIL: TestTransportClosesBodyOnError (641.82s)
transport_test.go:2338: timeout waiting for server handler to complete

There are a lot of http tests taking >600 secs. But i think this is a local problem as i can't reproduce them on another machine running the exact same OS version. So the only common failure is the TestDialerDualStack test from the net test suite.

Doing a tcpdump on the very long running http tests reveals countless packets 8128 bytes in length filled with 'a's. But as said on antoher machine the http test suite runs fine without a failure...

I uploaded the full transcript of the net and http tests, sans interface addresses, here:

http://pastebin.com/efAEkFwq
http://pastebin.com/hYXFU2e4

@aperum

This comment has been minimized.

Copy link
Author

commented Aug 11, 2015

Debugging this a bit further i extracted the dialClosedPort function from dial_test.go into a small standalone program and using this and tcpdump i saw that there were no RST packets sent.
This comes down to this setting:

$ sysctl net.inet.tcp.blackhole
net.inet.tcp.blackhole: 2

This usually prevents our SYN flooded servers from responding with a RST flood which is a good thing. Sadly this is a global option and not per interface, so this also happens on the loopback device.
Setting this to zero generates the correct answer to the SYN packets and the net testsuite concludes in just above 2 seconds.

This fixes the original problem. It seems this test was introduced with 1.5 so it never blew up before.

I still get the http test errors but as i only get them on one of two machines i'll consider them a local problem for now.

@mikioh mikioh changed the title net: test times out on FreeBSD 10.1 amd64 net: TestDialerDualStack fails when net.inet.tcp.blackhole=2 Aug 14, 2015

@mikioh mikioh added the Testing label Aug 14, 2015

@mikioh

This comment has been minimized.

Copy link
Contributor

commented Aug 14, 2015

@aperum, Thanks for the investigation. Please open a separate issue for net/http stuff if you have a such weird memory corruption? issue consistently.

/CC @pmarks-net

@pmarks-net

This comment has been minimized.

Copy link
Contributor

commented Aug 14, 2015

In theory, dialClosedPort could detect this configuration and skip all relevant tests, but this is probably working as intended: if you blackhole localhost, you're gonna have a bad time.

@aperum

This comment has been minimized.

Copy link
Author

commented Aug 14, 2015

It's quite common to use this option with hosts facing the internet but i agree it is a pity that this option also applies to loopback devices. But as you said this is more of an OS than a golang issue so i'll close this report, thanks for the debug help!

@aperum aperum closed this Aug 14, 2015

@golang golang locked and limited conversation to collaborators Aug 22, 2016

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
You can’t perform that action at this time.