net: TestTCPReadWriteAllocs flakiness #8859
I now see the following on darwin w/ tip:

    go test -run=TestTCPReadWriteMallocs -cpu=1,2,4,8,16,32
    --- FAIL: TestTCPReadWriteMallocs (0.02s)
        tcp_test.go:531: Got 1 allocs, want 0
    --- FAIL: TestTCPReadWriteMallocs-2 (0.02s)
        tcp_test.go:531: Got 1 allocs, want 0
    --- FAIL: TestTCPReadWriteMallocs-4 (0.02s)
        tcp_test.go:531: Got 1 allocs, want 0
    --- FAIL: TestTCPReadWriteMallocs-8 (0.02s)
        tcp_test.go:531: Got 1 allocs, want 0
    --- FAIL: TestTCPReadWriteMallocs-16 (0.02s)
        tcp_test.go:531: Got 1 allocs, want 0
    --- FAIL: TestTCPReadWriteMallocs-32 (0.02s)
        tcp_test.go:531: Got 1 allocs, want 0
    FAIL

Not tested on all platforms yet, but freebsd and linux work fine.

    % hg id
    920cde0a8b2d+ tip
    % uname -a
    Darwin airborne.local 13.4.0 Darwin Kernel Version 13.4.0: Sun Aug 17 19:50:11 PDT 2014; root:xnu-2422.115.4~1/RELEASE_X86_64 x86_64
This still breaks on my OS X machine.
It is an intermittent failure, but it happens often enough to be frustrating. Anecdotally, it happens more when running all.bash or the full net tests than when running just the lone test. I am in Portugal at the moment with flaky internet connections; it is possible that the failures are correlated with internet problems, although I have been unable to pin it down. I have also been unable to reproduce this with the test converted to a benchmark (I was hoping to use -memprofile to point to the source of the allocations).
I'm running 10.10.2 (14C109) with the pprof-enabling kernel patch.
If you have suggestions for other things to try to diagnose this, I'm happy to help.
I managed to get a stack trace, and I think I have a diagnosis.
Under some network conditions (I don't understand which), the
It's not obvious to me what the right fix is. Perhaps we should ignore
Incidentally, this has nothing to do with the kernel patch. The patch is extremely narrow in scope and concerned with the handling of profiling signals, which are not even in use when this test fails.
Thanks for the investigation. I now understand that additional kernel changes (e.g., pprof_mac_fix) or environmental conditions can easily perturb scheduling for uio_vectors and produce EAGAIN notifications. That means it is pretty naive to assume that the netpoll hot paths sail on a calm sea during the test. Perhaps permitting the netpoll hot paths to allocate a little in TestReadWriteAllocs might make sense.
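To make the idea of tolerating a few allocations concrete, here is a minimal sketch using testing.AllocsPerRun. It is not the code in tcp_test.go: the helper allocsWithin, the budget values, and the stand-in hotPath function are all hypothetical, and a real test would measure the TCP read/write path rather than a bare make call.

```go
package main

import (
	"fmt"
	"testing"
)

// sink forces the buffer allocated below to escape to the heap.
var sink []byte

// allocsWithin is a hypothetical helper: it measures the average number of
// allocations per call of f over the given number of runs, and reports
// whether that average stays within budget.
func allocsWithin(budget float64, runs int, f func()) (float64, bool) {
	got := testing.AllocsPerRun(runs, f)
	return got, got <= budget
}

func main() {
	// Stand-in for a hot path that occasionally allocates; the real test
	// would exercise Read and Write on a TCP connection instead.
	hotPath := func() { sink = make([]byte, 1024) }

	// A strict budget of 0 mirrors "want 0"; a small non-zero budget
	// tolerates the occasional allocation.
	for _, budget := range []float64{0, 1} {
		got, ok := allocsWithin(budget, 100, hotPath)
		fmt.Printf("budget=%v got=%v ok=%v\n", budget, got, ok)
	}
}
```

The trade-off is that a non-zero budget also hides small regressions in the hot path, so the budget would need to stay as tight as the platform allows.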