net: dial tests are flaky on BSD #15157

Open
bradfitz opened this Issue Apr 6, 2016 · 11 comments

@bradfitz
Member

bradfitz commented Apr 6, 2016

These have been flaking a lot on OpenBSD ...

https://storage.googleapis.com/go-build-log/f99ca413/openbsd-amd64-gce58_83506189.log

--- FAIL: TestDialTimeoutFDLeak (1.89s)
    dial_test.go:164: got 96; want >= 100
--- FAIL: TestDialerDualStackFDLeak (0.80s)
    dial_test.go:213: dial tcp 127.0.0.1:24574: i/o timeout
    dial_test.go:213: dial tcp 127.0.0.1:24574: i/o timeout
    dial_test.go:213: dial tcp 127.0.0.1:24574: i/o timeout
    dial_test.go:213: dial tcp 127.0.0.1:24574: i/o timeout
    dial_test.go:213: dial tcp 127.0.0.1:24574: i/o timeout
    dial_test.go:213: dial tcp 127.0.0.1:24574: i/o timeout
    dial_test.go:213: dial tcp 127.0.0.1:24574: i/o timeout
    dial_test.go:213: dial tcp 127.0.0.1:24574: i/o timeout
    dial_test.go:213: dial tcp 127.0.0.1:24574: i/o timeout
    dial_test.go:213: dial tcp 127.0.0.1:24574: i/o timeout
--- FAIL: TestReadUnixgramWithUnnamedSocket (0.19s)
    unixsock_test.go:60: read unixgram /tmp/go-nettest180765192: i/o timeout
--- FAIL: TestDialTimeoutMaxDuration (0.40s)
    timeout_test.go:140: #0: Dial didn't return in an expected time
FAIL
FAIL    net 12.082s

/cc @mikioh

@bradfitz bradfitz added the OS-OpenBSD label Apr 6, 2016

@bradfitz bradfitz added this to the Unplanned milestone Apr 6, 2016

gopherbot pushed a commit that referenced this issue Apr 6, 2016

net, runtime: skip flaky tests on OpenBSD
Flaky tests are a distraction and cover up real problems.

File bugs instead and mark them as flaky.

This moves the net/http flaky test flagging mechanism to internal/testenv.

Updates #15156
Updates #15157
Updates #15158

Change-Id: I0e561cd2a09c0dec369cd4ed93bc5a2b40233dfe
Reviewed-on: https://go-review.googlesource.com/21614
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>

@mikioh mikioh added the Testing label May 11, 2016

@bradfitz bradfitz changed the title from net: dial tests are flaky on OpenBSD to net: dial tests are flaky on BSD May 11, 2016

@bradfitz bradfitz added the OS-FreeBSD label May 11, 2016

@bradfitz bradfitz modified the milestones: Go1.7Maybe, Unplanned May 11, 2016

@bradfitz

Member

bradfitz commented May 11, 2016

Also on FreeBSD:

https://build.golang.org/log/73446e673c8f780dbfbc11aaab2b4f8e4daefb68

ok      mime/quotedprintable    0.156s
--- FAIL: TestDialerDualStackFDLeak (0.58s)
    dial_test.go:172: dial tcp: i/o timeout
    dial_test.go:172: dial tcp 127.0.0.1:59846: i/o timeout
    dial_test.go:172: dial tcp 127.0.0.1:59846: i/o timeout
    dial_test.go:172: dial tcp 127.0.0.1:59846: i/o timeout
    dial_test.go:172: dial tcp 127.0.0.1:59846: i/o timeout
    dial_test.go:172: dial tcp 127.0.0.1:59846: i/o timeout
    dial_test.go:172: dial tcp 127.0.0.1:59846: i/o timeout
FAIL
FAIL    net 2.256s
2016/05/11 17:45:36 Failed: exit status 1

@mikioh, @mdempsky, can one of you investigate?

@mdempsky

Member

mdempsky commented May 11, 2016

Yeah, I'll look into the OpenBSD failures today.

@gopherbot

gopherbot commented May 19, 2016

CL https://golang.org/cl/23244 mentions this issue.

gopherbot pushed a commit that referenced this issue May 19, 2016

net: deflake TestDialerDualStackFDLeak
Fixes #14717.
Updates #15157.

Change-Id: I7238b4fe39f3670c2dfe09b3a3df51a982f261ed
Reviewed-on: https://go-review.googlesource.com/23244
Run-TryBot: Mikio Hara <mikioh.mikioh@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>

@adg adg modified the milestones: Go1.8, Go1.7Maybe Jul 18, 2016

@rsc rsc modified the milestones: Go1.9, Go1.8 Nov 11, 2016

@gopherbot

gopherbot commented Dec 20, 2016

CL https://golang.org/cl/34656 mentions this issue.

gopherbot pushed a commit that referenced this issue Dec 20, 2016

net: mark TestDialerDualStackFDLeak as flaky on OpenBSD
Updates #15157

Change-Id: Id280705f4382c3b2323f0eed786a400a184614de
Reviewed-on: https://go-review.googlesource.com/34656
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>

@josharian

Contributor

josharian commented Feb 2, 2017

Just saw what looked like a flake on a linux/amd64 race trybot run:

https://storage.googleapis.com/go-build-log/01e274f4/linux-amd64-race_4322dd7e.log

@mundaym

Member

mundaym commented Feb 8, 2017

Another possible flake on a linux/amd64 race trybot run:

https://storage.googleapis.com/go-build-log/f6b2f823/linux-amd64-race_be3b785d.log

@mdempsky

Member

mdempsky commented Feb 17, 2017

I spent a little time looking into the TestDialerDualStackFDLeak flake on OpenBSD last night.

I was able to repro the issue under ktrace, and Go doesn't appear to be doing anything obviously wrong. We make a non-blocking connect call, it returns EINPROGRESS, we register it with kqueue, and then the kevent syscall blocks for 6 seconds before the kernel reports that the connect failed.

The failure is always around 6 seconds, so I suspect it's something TCP timeout related.

Next, I tried to reproduce the issue while running tcpdump on lo0, hoping the capture would hint at which part of the stack is failing, but I have had no success yet.

@bradfitz

Member

bradfitz commented Feb 17, 2017

@mdempsky, thanks for investigating.

If we can't make progress on this, though, the openbsd builders are doing more harm than good with flaky tests.

I think it might be time to slap on a bunch of testenv.SkipFlaky(t, 15157) to all these tests on OpenBSD.
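As a rough illustration of what such a guard looks like, here is a sketch of an OS-gated skip helper. `internal/testenv` cannot be imported outside the standard library, so this is a hypothetical stand-in and not the real `testenv.SkipFlaky`: the `skipper` interface, the `skipFlaky` name, its parameters, and the message text are all assumptions made for this example.

```go
package main

import (
	"fmt"
	"runtime"
)

// skipper is the small subset of testing.TB this sketch needs.
// *testing.T satisfies it, so the helper can be called from a real test.
type skipper interface {
	Helper()
	Skipf(format string, args ...interface{})
}

// skipFlaky skips the calling test on OpenBSD unless flaky tests were
// explicitly enabled. The signature is hypothetical, chosen so the decision
// is easy to exercise without a real test binary.
func skipFlaky(t skipper, goos string, runFlaky bool, issue int) {
	t.Helper()
	if goos == "openbsd" && !runFlaky {
		t.Skipf("skipping known flaky test on %s; see golang.org/issue/%d", goos, issue)
	}
}

// recorder is a stand-in for *testing.T that only remembers the skip message.
type recorder struct{ msg string }

func (r *recorder) Helper() {}

func (r *recorder) Skipf(format string, args ...interface{}) {
	r.msg = fmt.Sprintf(format, args...)
}

func main() {
	var r recorder
	skipFlaky(&r, runtime.GOOS, false, 15157)
	if r.msg != "" {
		fmt.Println(r.msg)
	} else {
		fmt.Println("test would run on", runtime.GOOS)
	}
}
```

In a real test the call would sit at the top of the test function, for example `skipFlaky(t, runtime.GOOS, false, 15157)`, with the boolean wired to whatever opt-in mechanism the project uses.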

@mdempsky

Member

mdempsky commented Feb 17, 2017

@bradfitz For net flakes on OpenBSD, I'm inclined to agree.

@mdempsky

Member

mdempsky commented Apr 12, 2017

Doesn't seem to be limited to BSD: https://storage.googleapis.com/go-build-log/fa1eb023/linux-amd64-race_e3f8fc0c.log

--- FAIL: TestDialTimeoutFDLeak (0.59s)
    dial_test.go:136: got 99; want >= 100

@gopherbot

gopherbot commented Apr 12, 2017

CL https://golang.org/cl/40498 mentions this issue.

gopherbot pushed a commit that referenced this issue Apr 12, 2017

net: delete TestDialTimeoutFDLeak
It's flaky and distracting.

I'm not sure what it's testing, either. It hasn't saved us before.

Somebody can resurrect it if they have time.

Updates #15157

Change-Id: I27bbfe51e09b6259bba0f73d60d03a4d38711951
Reviewed-on: https://go-review.googlesource.com/40498
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>

lparth added a commit to lparth/go that referenced this issue Apr 13, 2017

net: delete TestDialTimeoutFDLeak

@broady broady modified the milestones: Go1.9Maybe, Go1.9 Jul 17, 2017

@bradfitz bradfitz modified the milestones: Go1.9Maybe, Go1.10 Jul 20, 2017

@ianlancetaylor ianlancetaylor modified the milestones: Go1.10, Unplanned Jan 3, 2018
