net: TestVariousDeadlines is flaky #19519

Open
fuzxxl opened this Issue Mar 12, 2017 · 7 comments

fuzxxl commented Mar 12, 2017

What version of Go are you using (go version)?

go version go1.8 freebsd/amd64

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="freebsd"
GOOS="freebsd"
GOPATH="/home/fuz/src/go"
GORACE=""
GOROOT="/home/fuz/go"
GOTOOLDIR="/home/fuz/go/pkg/tool/freebsd_amd64"
GCCGO="gccgo"
CC="clang"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build535249256=/tmp/go-build -gno-record-gcc-switches"
CXX="clang++"
CGO_ENABLED="1"
PKG_CONFIG="pkg-config"
CGO_CFLAGS="-I/home/fuz/include"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-L/home/fuz/lib"

uname -a output:

FreeBSD miso 11.0-RELEASE-p2 FreeBSD 11.0-RELEASE-p2 #0: Mon Oct 24 06:55:27 UTC 2016     root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  amd64

ifconfig output:

em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=4019b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,VLAN_HWTSO>
	ether 34:e6:d7:60:b2:c5
	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
	media: Ethernet autoselect
	status: no carrier
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
	options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
	inet6 ::1 prefixlen 128 
	inet6 fe80::1%lo0 prefixlen 64 scopeid 0x2 
	inet 127.0.0.1 netmask 0xff000000 
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
	groups: lo 
wlan0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	ether 1c:65:9d:0d:70:31
	inet 10.53.43.185 netmask 0xffff0000 broadcast 10.53.255.255 
	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
	media: IEEE 802.11 Wireless Ethernet MCS mode 11na
	status: associated
	ssid clt channel 100 (5500 MHz 11a ht/40+) bssid fc:5b:39:f6:fa:e9
	regdomain 106 indoor ecm authmode WPA2/802.11i privacy ON
	deftxkey UNDEF AES-CCM 2:128-bit AES-CCM 3:128-bit txpower 30 bmiss 7
	mcastrate 6 mgmtrate 6 scanvalid 60 ampdulimit 64k ampdudensity 8
	shortgi wme burst roaming MANUAL bintval 102
	groups: wlan 

sysctl hw.model output:

hw.model: Intel(R) Core(TM) i7-4910MQ CPU @ 2.90GHz

What did you do?

I tried to update my Go 1.6 installation to Go 1.8 by removing the Go installation, downloading the source code from here and running all.bash.

What did you expect to see?

A successful build followed by a successful pass through the test suite.

What did you see instead?

A successful build followed by TestVariousDeadlines failing:

--- FAIL: TestVariousDeadlines (8.01s)
	timeout_test.go:877: 1ns run 1/1
	timeout_test.go:902: for 1ns run 1/1, good client timeout after 53.221µs, reading 0 bytes
	timeout_test.go:916: for 1ns run 1/1, timeout waiting for server to finish writing
FAIL
FAIL	net	9.515s
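For readers unfamiliar with what the log lines describe, here is a minimal sketch of the scenario, not the actual code in timeout_test.go. It assumes a loopback server that writes continuously and a client with a 1ns deadline, which should hit an I/O timeout almost immediately. The name `readWithDeadline` is hypothetical and exists only for this illustration.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"time"
)

// readWithDeadline (hypothetical helper) connects to a loopback server that
// writes continuously, sets a deadline d from now on the client connection,
// and reads until an error occurs. It reports the number of bytes read and
// whether the read ended in a timeout.
func readWithDeadline(d time.Duration) (n int, timedOut bool, err error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, false, err
	}
	defer ln.Close()

	// Server side: accept one connection and write until the peer goes away.
	go func() {
		c, err := ln.Accept()
		if err != nil {
			return
		}
		defer c.Close()
		buf := make([]byte, 32<<10)
		for {
			if _, err := c.Write(buf); err != nil {
				return
			}
		}
	}()

	c, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		return 0, false, err
	}
	defer c.Close()

	if err := c.SetDeadline(time.Now().Add(d)); err != nil {
		return 0, false, err
	}
	buf := make([]byte, 4096)
	for {
		m, rerr := c.Read(buf)
		n += m
		if rerr != nil {
			// A deadline expiry surfaces as a net.Error whose Timeout() is true.
			var ne net.Error
			if errors.As(rerr, &ne) && ne.Timeout() {
				return n, true, nil
			}
			return n, false, rerr
		}
	}
}

func main() {
	n, timedOut, err := readWithDeadline(time.Nanosecond)
	fmt.Println("bytes read:", n, "timed out:", timedOut, "err:", err)
}
```

The sketch covers only the client-side timeout. The failing part of the log is the server side, which did not finish writing within the test's own wait.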

bradfitz added this to the Go1.9 milestone Mar 21, 2017

Contributor

josharian commented Apr 13, 2017

Not just freebsd. Just failed locally on darwin/amd64:

--- FAIL: TestVariousDeadlines (5.03s)
	timeout_test.go:878: 1ns run 1/1
	timeout_test.go:903: for 1ns run 1/1, good client timeout after 38.928µs, reading 0 bytes
	timeout_test.go:913: for 1ns run 1/1, server in 177.208µs wrote 32768: readfrom tcp4 127.0.0.1:59054->127.0.0.1:59055: write tcp4 127.0.0.1:59054->127.0.0.1:59055: write: broken pipe
	timeout_test.go:878: 2ns run 1/1
	timeout_test.go:903: for 2ns run 1/1, good client timeout after 4.776µs, reading 0 bytes
	timeout_test.go:913: for 2ns run 1/1, server in 89.252µs wrote 32768: readfrom tcp4 127.0.0.1:59054->127.0.0.1:59056: write tcp4 127.0.0.1:59054->127.0.0.1:59056: write: broken pipe
	timeout_test.go:878: 5ns run 1/1
	timeout_test.go:903: for 5ns run 1/1, good client timeout after 8.099µs, reading 0 bytes
	timeout_test.go:917: for 5ns run 1/1, timeout waiting for server to finish writing
FAIL
FAIL	net	6.371s

josharian removed the OS-FreeBSD label Apr 13, 2017

josharian changed the title from net: TestVariousDeadlines fails on FreeBSD to net: TestVariousDeadlines is flaky Apr 13, 2017

broady modified the milestones: Go1.9Maybe, Go1.9 Jul 17, 2017

bradfitz modified the milestones: Go1.9Maybe, Go1.10 Jul 20, 2017

ianlancetaylor modified the milestones: Go1.10, Unplanned Jan 3, 2018

josharian modified the milestones: Unplanned, Go1.11 Mar 30, 2018

Contributor

josharian commented Mar 30, 2018

This continues to be one of the more common flakes on the dashboard. Moving the milestone back to 1.11.

Member

mvdan commented Apr 7, 2018

Any hints to reproduce this issue? I just failed to do so on my linux/amd64 laptop, reaching 5000 successful runs with:

stress -p 256 ./net.test -test.run TestVariousDeadlines$ -test.cpu 10

Might be useful to start pasting builder failure logs here, to see if there is a pattern. For example, this might show up only on BSDs.

Contributor

josharian commented Apr 7, 2018

Member

bcmills commented Apr 26, 2018

Here's another repro in the builders (plan9-386):
https://build.golang.org/log/2f56b2bc471c2d8d68f862238d750ffcfc88c34d

gopherbot modified the milestones: Go1.11, Unplanned May 23, 2018

millerresearch commented Aug 6, 2018

The plan9_386 failure is very frequent but slightly different: it is always "timeout (5s) waiting for client to timeout (...) reading", whereas the reported freebsd and darwin failure is "timeout waiting for server to finish writing".

On plan9_386, I have never observed the failure on real hardware, or on qemu running on an otherwise idle server. This weekend I have managed to reproduce it (very intermittently) by simulating a busy server, by running qemu in parallel with many cpu-bound "nice --10" processes. My hypothesis is that it's the measurement of time that's flawed, not the network implementation.

@fuzxxl and @josharian, are your observed failures on real hardware or on virtual machines?
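One rough way to probe the time-measurement hypothesis above is to check how far a measured sleep overshoots its requested duration while the host is under load. This is only a sketch under that assumption, not something taken from the test or the builders. `maxOvershoot` is a hypothetical helper name, and the load itself (for example the CPU-bound processes described above) has to be supplied externally.

```go
package main

import (
	"fmt"
	"time"
)

// maxOvershoot (hypothetical helper) sleeps for d a total of n times and
// returns the largest amount by which a measured sleep exceeded d.
func maxOvershoot(d time.Duration, n int) time.Duration {
	var worst time.Duration
	for i := 0; i < n; i++ {
		start := time.Now()
		time.Sleep(d)
		// time.Since uses the monotonic clock reading carried by start.
		if over := time.Since(start) - d; over > worst {
			worst = over
		}
	}
	return worst
}

func main() {
	fmt.Println("max overshoot:", maxOvershoot(10*time.Millisecond, 100))
}
```

Comparing the result on an idle machine with the result inside a loaded VM would show whether timers there fire much later than requested. That would be consistent with the hypothesis, though it would not prove it.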

fuzxxl commented Aug 6, 2018

@millerresearch The failure was observed on real hardware (a Dell Precision M4800 running FreeBSD 11.0, I think). I am currently unable to reproduce it using Go 1.10 on the same machine, which now runs FreeBSD 11.2.
