net: TestTCPReadWriteAllocs flakiness #8859
i now see the following on darwin w/ tip:

    go test -run=TestTCPReadWriteMallocs -cpu=1,2,4,8,16,32
    --- FAIL: TestTCPReadWriteMallocs (0.02s)
        tcp_test.go:531: Got 1 allocs, want 0
    --- FAIL: TestTCPReadWriteMallocs-2 (0.02s)
        tcp_test.go:531: Got 1 allocs, want 0
    --- FAIL: TestTCPReadWriteMallocs-4 (0.02s)
        tcp_test.go:531: Got 1 allocs, want 0
    --- FAIL: TestTCPReadWriteMallocs-8 (0.02s)
        tcp_test.go:531: Got 1 allocs, want 0
    --- FAIL: TestTCPReadWriteMallocs-16 (0.02s)
        tcp_test.go:531: Got 1 allocs, want 0
    --- FAIL: TestTCPReadWriteMallocs-32 (0.02s)
        tcp_test.go:531: Got 1 allocs, want 0
    FAIL

not tested on all platforms yet, but freebsd and linux work fine.

    % hg id
    920cde0a8b2d+ tip
    % uname -a
    Darwin airborne.local 13.4.0 Darwin Kernel Version 13.4.0: Sun Aug 17 19:50:11 PDT 2014; root:xnu-2422.115.4~1/RELEASE_X86_64 x86_64
This still breaks on my OS X machine.
It is an intermittent failure, but it happens often enough to be frustrating. Anecdotally, it happens more when running all.bash or the full net tests than when running just the lone test. I am in Portugal at the moment with flaky internet connections; it is possible that the failures are correlated with internet problems, although I have been unable to pin it down. I have also been unable to reproduce this with the test converted to a benchmark (I was hoping to use -memprofile to point to the source of the allocations).
I'm running 10.10.2 (14C109) with the pprof-enabling kernel patch.
If you have suggestions for other things to try to diagnose this, I'm happy to help.
Having this test fail, as it does reliably for me, makes working frustrating. Disable it for now, until we can diagnose the issue.

Update issue #8859.

Change-Id: I9dda30d60793e7a51f48f445c78ccb158068cc25
Reviewed-on: https://go-review.googlesource.com/6381
Reviewed-by: Brad Fitzpatrick <firstname.lastname@example.org>
I managed to get a stack trace, and I think I have a diagnosis.
Under some network conditions (I don't understand which), the
It's not obvious to me what the right fix is. Perhaps we should ignore
Incidentally, this has nothing to do with the kernel patch. The patch is extremely narrow in scope and concerned with the handling of profiling signals, which are not even in use when this test fails.
Thanks for the investigation. I now understand that extra factors (e.g., pprof_mac_fix) or environmental conditions affecting the kernel can easily perturb scheduling for uio_vectors and produce EAGAIN notifications. Assuming that the netpoll hot paths sail on a calm sea during the test is therefore pretty naive. Perhaps it would make sense to permit the netpoll hot paths a few allocations in TestReadWriteAllocs.
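To make the "permit a few allocations" idea concrete, here is a minimal sketch of an allocation check with a tolerance, built on testing.AllocsPerRun. The helper name allocsWithinBudget, the budget values, and the stand-in workloads are all hypothetical; in the real test the function under measurement would be the TCP write/read pair, and the right budget is still an open question.

```go
package main

import (
	"fmt"
	"testing"
)

// sink keeps the allocation in allocOne from being optimized away.
var sink []byte

// allocsWithinBudget (hypothetical helper) measures the average number of
// allocations per call to f and reports whether it stays within budget.
func allocsWithinBudget(runs int, budget float64, f func()) (float64, bool) {
	got := testing.AllocsPerRun(runs, f)
	return got, got <= budget
}

// allocOne is a stand-in workload that heap-allocates on every call.
func allocOne() {
	sink = make([]byte, 64)
}

func main() {
	buf := make([]byte, 128)

	// Stand-in for an allocation-free hot path.
	got, ok := allocsWithinBudget(1000, 0, func() { copy(buf, "hello") })
	fmt.Printf("no-alloc path: %v allocs/run, within budget 0: %v\n", got, ok)

	// Stand-in for a path that occasionally has to allocate.
	got, ok = allocsWithinBudget(1000, 2, allocOne)
	fmt.Printf("allocating path: %v allocs/run, within budget 2: %v\n", got, ok)
}
```

A strict budget of 0 reproduces the current behavior of the test; raising it trades sensitivity to real regressions for fewer spurious failures.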
The previously-submitted https://go-review.googlesource.com/#/c/6701 didn't include dragonfly, freebsd, nacl, netbsd, openbsd, or solaris. (or things like darwin/arm or ppc64 or arm64) So do them all.

Note I had to copy the function into tables_nacl.go. I found that preferable to creating a new file just to have suitable build tags. It's likely this function will be mirrored to plan9 and windows later too, each of the 4 with their own policy of which error values are common.

The corresponding x/sys CL for this CL is https://golang.org/cl/8190 but it excludes nacl (not in x/sys) and solaris (already broken).

Update Issue #8859

Change-Id: I91902615692b29b69c905edd9e126a26337294f6
Reviewed-on: https://go-review.googlesource.com/8192
Reviewed-by: Rob Pike <email@example.com>
Run-TryBot: Brad Fitzpatrick <firstname.lastname@example.org>
TryBot-Result: Gobot Gobot <email@example.com>