net: TestTCPSelfConnect reports failure after 34 hours of stress testing. #18290

Open
RLH opened this Issue Dec 12, 2016 · 6 comments

@RLH
Contributor

RLH commented Dec 12, 2016

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

On a
rlh@rlh1:~/work/go/src$ go version
go version devel +d4b46aa Thu Dec 8 01:36:44 2016 +0000 linux/amd64

What operating system and processor architecture are you using (go env)?

rlh@rlh1:~/work/go/src$ go env
GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/usr/local/google/home/rlh/work/code"
GORACE=""
GOROOT="/usr/local/google/home/rlh/work/go"
GOTOOLDIR="/usr/local/google/home/rlh/work/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build725383804=/tmp/go-build -gno-record-gcc-switches"
CXX="g++"
CGO_ENABLED="1"
PKG_CONFIG="pkg-config"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
This is a 6 core 12 thread machine.
vendor_id	: GenuineIntel
cpu family	: 6
model		: 63
model name	: Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz
stepping	: 2
siblings	: 12

What did you do?

I ran ./run.bash in a simple loop using the following.
GOTRACEBACK=crash ./run.bash -no-rebuild

If possible, provide a recipe for reproducing the error.
A complete runnable program is good.
A link on play.golang.org is best.
See above.

What did you expect to see?

Success.

What did you see instead?

After 34 hours and 727 successful runs, I got the following failure.

... snip ...

ok mime/quotedprintable 0.162s
--- FAIL: TestTCPSelfConnect (0.13s)
tcpsock_test.go:661: Dial 127.0.0.1:48322 should fail
tcpsock_test.go:661: Dial 127.0.0.1:48324 should fail
tcpsock_test.go:661: Dial 127.0.0.1:48325 should fail
tcpsock_test.go:661: Dial 127.0.0.1:48326 should fail
tcpsock_test.go:661: Dial 127.0.0.1:48327 should fail
tcpsock_test.go:661: Dial 127.0.0.1:48328 should fail
tcpsock_test.go:661: Dial 127.0.0.1:48329 should fail
FAIL
FAIL net 1.527s
ok net/http 2.550s
ok net/http/cgi 0.495s

... snip ...

Note that port 48323 is missing from the list.
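The gap can be found mechanically. A small sketch, assuming the failure lines keep the `Dial IP:port should fail` shape shown above (the function name `missingPorts` is made up for illustration):

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strconv"
)

// dialLine matches failure lines such as
// "tcpsock_test.go:661: Dial 127.0.0.1:48322 should fail".
var dialLine = regexp.MustCompile(`Dial \d+\.\d+\.\d+\.\d+:(\d+) should fail`)

// missingPorts returns the ports between the lowest and highest reported
// port that do not appear in the log.
func missingPorts(log string) []int {
	var ports []int
	for _, m := range dialLine.FindAllStringSubmatch(log, -1) {
		p, err := strconv.Atoi(m[1])
		if err != nil {
			continue
		}
		ports = append(ports, p)
	}
	sort.Ints(ports)

	var missing []int
	for i := 1; i < len(ports); i++ {
		for p := ports[i-1] + 1; p < ports[i]; p++ {
			missing = append(missing, p)
		}
	}
	return missing
}

func main() {
	log := `tcpsock_test.go:661: Dial 127.0.0.1:48322 should fail
tcpsock_test.go:661: Dial 127.0.0.1:48324 should fail
tcpsock_test.go:661: Dial 127.0.0.1:48325 should fail
tcpsock_test.go:661: Dial 127.0.0.1:48326 should fail
tcpsock_test.go:661: Dial 127.0.0.1:48327 should fail
tcpsock_test.go:661: Dial 127.0.0.1:48328 should fail
tcpsock_test.go:661: Dial 127.0.0.1:48329 should fail`
	fmt.Println(missingPorts(log)) // prints [48323]
}
```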

I have not reproduced it. The only reason this is a concern is that it limits stress testing speculative parts of the GC and runtime.

@bradfitz

Member

bradfitz commented Dec 12, 2016

bradfitz added the Testing label Dec 12, 2016

bradfitz added this to the Go1.9 milestone Dec 12, 2016

@mikioh

Contributor

mikioh commented Jul 7, 2017

Guarding against TCP simultaneous open is a tough job. Let's move the test out of short-mode testing, except when running on the buildbots.
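A rough sketch of what such a gate could look like. The helper `skipInShortMode` is hypothetical, and reading the `GO_BUILDER_NAME` environment variable to detect a buildbot is an assumption made here, not something stated in this thread:

```go
package main

import (
	"fmt"
	"os"
)

// skipInShortMode reports whether a flaky test should be skipped: only in
// short mode, and only when not running on a builder.
// (Hypothetical helper for illustration.)
func skipInShortMode(short bool, builderName string) bool {
	return short && builderName == ""
}

func main() {
	// Assumption: builders identify themselves through GO_BUILDER_NAME.
	builder := os.Getenv("GO_BUILDER_NAME")

	// Inside a real test this would read roughly:
	//   if skipInShortMode(testing.Short(), builder) {
	//       t.Skip("skipping in short mode")
	//   }
	fmt.Println("skip in short mode:", skipInShortMode(true, builder))
}
```

With a gate like this the test would still run in full mode and on the builders, which is where most of the coverage comes from.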

@bradfitz

Member

bradfitz commented Jul 7, 2017

If the test just rarely catches the bug, I'd rather not sweep it under the rug by making it run less often.

Is this not a real bug?

If so, what's the bug?
If not, I'd rather deflake it.

@mikioh

Contributor

mikioh commented Jul 7, 2017

> Is this not a real bug?

I'm confused. The fix (https://codereview.appspot.com/5650071) was approved by you. I'm not sure what you are asking for.

My understanding is that TCP simultaneous open is valid TCP behavior, and that the fix written by @rsc was just an attempt to mitigate an undesirable side effect of that behavior to help some applications. That fix might be a compromise, though. I think it's basically impossible to avoid TCP simultaneous open completely unless we have full control of each TCP FSM and IP control block inside the kernel.

broady modified the milestones: Go1.9Maybe, Go1.9 Jul 17, 2017

ianlancetaylor modified the milestones: Go1.10, Go1.9Maybe Aug 2, 2017

@ianlancetaylor

Contributor

ianlancetaylor commented Dec 8, 2017

I don't see any GNU/Linux failures of TestTCPSelfConnect in the builders at all. I do see a few failures on Solaris; for example: https://build.golang.org/log/5315cb250600796a1ffe1c4c3a8c3f626769b692.

@mikesmitty

mikesmitty commented Aug 24, 2018

Happened to run into this one as well with 1.11rc2 on CentOS 7 x86_64:

--- FAIL: TestTCPSelfConnect (0.13s)
    tcpsock_test.go:665: Dial 127.0.0.1:41378 should fail
    tcpsock_test.go:665: Dial 127.0.0.1:41380 should fail
    tcpsock_test.go:665: Dial 127.0.0.1:41382 should fail
    tcpsock_test.go:665: Dial 127.0.0.1:41384 should fail
    tcpsock_test.go:665: Dial 127.0.0.1:41386 should fail
    tcpsock_test.go:665: Dial 127.0.0.1:41388 should fail
    tcpsock_test.go:665: Dial 127.0.0.1:41390 should fail
    tcpsock_test.go:665: Dial 127.0.0.1:41392 should fail
    tcpsock_test.go:665: Dial 127.0.0.1:41396 should fail
FAIL
FAIL	net	2.418s