Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/net/ipv4: flaky on Windows #17655

Closed
bradfitz opened this issue Oct 28, 2016 · 16 comments
Closed

x/net/ipv4: flaky on Windows #17655

bradfitz opened this issue Oct 28, 2016 · 16 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@bradfitz
Copy link
Contributor

@bradfitz bradfitz commented Oct 28, 2016

Trybots on the x/net repo sometimes flake on Windows:

https://storage.googleapis.com/go-build-log/1bd39e79/windows-amd64-gce_2c5af62a.log

ok      golang.org/x/net/bpf    0.064s
ok      golang.org/x/net/context    0.186s
ok      golang.org/x/net/context/ctxhttp    0.166s
?       golang.org/x/net/dict   [no test files]
ok      golang.org/x/net/html   0.298s
ok      golang.org/x/net/html/atom  0.045s
ok      golang.org/x/net/html/charset   0.135s
ok      golang.org/x/net/http2  10.806s
?       golang.org/x/net/http2/h2i  [no test files]
ok      golang.org/x/net/http2/hpack    0.059s
ok      golang.org/x/net/icmp   0.177s
ok      golang.org/x/net/idna   0.044s
?       golang.org/x/net/internal/iana  [no test files]
ok      golang.org/x/net/internal/netreflect    0.037s
?       golang.org/x/net/internal/nettest   [no test files]
ok      golang.org/x/net/internal/timeseries    0.041s
FAIL
FAIL    golang.org/x/net/ipv4   0.059s
ok      golang.org/x/net/ipv6   0.039s
ok      golang.org/x/net/lex/httplex    0.046s
ok      golang.org/x/net/netutil    1.024s
ok      golang.org/x/net/proxy  0.080s
ok      golang.org/x/net/publicsuffix   0.190s
ok      golang.org/x/net/trace  0.120s
ok      golang.org/x/net/webdav 0.180s
ok      golang.org/x/net/webdav/internal/xml    0.079s
ok      golang.org/x/net/websocket  0.115s
ok      golang.org/x/net/xsrftoken  0.031s

But why?

That failure doesn't say much.

/cc @mikioh

@bradfitz bradfitz added this to the Go1.8 milestone Oct 28, 2016
@mikioh
Copy link
Contributor

@mikioh mikioh commented Oct 28, 2016

If the branch is Go 1.6 or below, it's probably because of some leaked goroutine in test cases. The testing package of Go 1.7 or above can capture such flakiness. If it happens with tip, have no clue, but we can guess that the failure happens at an intersection between testee packages and testing package.

@bradfitz
Copy link
Contributor Author

@bradfitz bradfitz commented Oct 28, 2016

It's tip.

@quentinmit quentinmit added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Nov 7, 2016
@mikioh
Copy link
Contributor

@mikioh mikioh commented Nov 16, 2016

Just took a look at testing package and I now feel like this is not an issue of ipv4 package. The interesting FAIL followed by FAIL golang.org/x/net/ipv4 0.059s lines mean that the running test process is killed by an observer; in this case a Windows kernel, though, I have no clue on the root cause.

@mikioh mikioh removed their assignment Nov 16, 2016
@bradfitz
Copy link
Contributor Author

@bradfitz bradfitz commented Dec 2, 2016

@aclements, is this something your tool can grep for?

I'm wondering if it's happening enough to worry about.

@aclements
Copy link
Member

@aclements aclements commented Dec 2, 2016

Unfortunately I don't have the logs for the x repos.

@bradfitz
Copy link
Contributor Author

@bradfitz bradfitz commented Dec 6, 2016

@alexbrainman, are you able to reproduce this flake?

@alexbrainman
Copy link
Member

@alexbrainman alexbrainman commented Dec 7, 2016

are you able to reproduce this flake?

No I cannot. But I can see different error:

c:\dev\src\golang.org\x\net>go version
go version devel +94a4485 Tue Dec 6 22:33:48 2016 +0000 windows/amd64

c:\dev\src\golang.org\x\net>go get -u ./...

c:\dev\src\golang.org\x\net>go test -short ./...
ok      golang.org/x/net/bpf    0.066s
ok      golang.org/x/net/context        0.138s
ok      golang.org/x/net/context/ctxhttp        0.195s
?       golang.org/x/net/dict   [no test files]
ok      golang.org/x/net/html   0.122s
ok      golang.org/x/net/html/atom      0.058s
ok      golang.org/x/net/html/charset   0.051s
ok      golang.org/x/net/http2  9.439s
?       golang.org/x/net/http2/h2i      [no test files]
ok      golang.org/x/net/http2/hpack    0.075s
ok      golang.org/x/net/icmp   0.039s
ok      golang.org/x/net/idna   0.030s
?       golang.org/x/net/internal/iana  [no test files]
ok      golang.org/x/net/internal/netreflect    0.036s
?       golang.org/x/net/internal/nettest       [no test files]
ok      golang.org/x/net/internal/timeseries    0.031s
--- FAIL: TestRawConnMulticastSocketOptions (0.00s)
        multicastsockopt_test.go:88: operation not supported
--- FAIL: TestRawConnUnicastSocketOptions (0.00s)
        unicastsockopt_test.go:101: operation not supported
FAIL
FAIL    golang.org/x/net/ipv4   0.065s
ok      golang.org/x/net/ipv6   0.040s
ok      golang.org/x/net/lex/httplex    0.046s
ok      golang.org/x/net/nettest        0.436s
ok      golang.org/x/net/netutil        1.071s
ok      golang.org/x/net/proxy  0.056s
ok      golang.org/x/net/publicsuffix   0.178s
ok      golang.org/x/net/trace  0.110s
ok      golang.org/x/net/webdav 0.206s
ok      golang.org/x/net/webdav/internal/xml    0.051s
ok      golang.org/x/net/websocket      0.090s
ok      golang.org/x/net/xsrftoken      0.029s

c:\dev\src\golang.org\x\net>

this is running on Windows 7.

Alex

@bradfitz
Copy link
Contributor Author

@bradfitz bradfitz commented Dec 7, 2016

You want to send a CL to t.Skip those if they get an "operation not supported" error?

@bradfitz
Copy link
Contributor Author

@bradfitz bradfitz commented Dec 7, 2016

I can reproduce this with gomote.

$ gomote create windows-amd64-gce
$ export MOTE=user-bradfitz-windows-amd64-gce-0
$ gomote push $MOTE
$ gomote run $MOTE go/src/make.bat
$ tar -C ~/src/ -zc golang.org/x/net | gomote puttar -dir=gopath/src $MOTE
$ gomote run -e 'GOPATH=c:/workdir/gopath' $MOTE go/bin/go test -v -run=. -count=500 golang.org/x/net/ipv4  | tee log.txt

See the attached log.txt, which is a mystery: it has no FAIL mentions, except the end.

And the there's no Exit call in the program, and there's no panic anywhere, and no log.Fatal I can see, no Main, .... I'm not sure where it's failing yet.

@bradfitz
Copy link
Contributor Author

@bradfitz bradfitz commented Dec 7, 2016

@mikioh,

Just took a look at testing package and I now feel like this is not an issue of ipv4 package. The interesting FAIL followed by FAIL golang.org/x/net/ipv4 0.059s lines mean that the running test process is killed by an observer; in this case a Windows kernel, though, I have no clue on the root cause.

Why do you think that?

Maybe there's something in the net or syscall or runtime package that's crashing? Except then we'd expect some sort of stack trace, I assume.

Is there something like strace for Windows, @alexbrainman? If so, probably not a CLI tool, huh?

@bradfitz
Copy link
Contributor Author

@bradfitz bradfitz commented Dec 7, 2016

Update:

If I add:

func TestMain(m *testing.M) {
        fmt.Println("starting test")
        flag.Parse()
        v := m.Run()
        fmt.Println("test Run complete =", v)
        os.Exit(v)
}

Then it fails like:

=== RUN   TestRawConnUnicastSocketOptions
--- SKIP: TestRawConnUnicastSocketOptions (0.00s)
        unicastsockopt_test.go:86: no access to raw socket allowed on windows
FAIL
test Run complete = 1
exit status 1
FAIL    golang.org/x/net/ipv4   4.642s
Error running run: exit status 1

So it's not just getting killed by the kernel. The Go test does indeed intend to fail itself. But why?

@bradfitz
Copy link
Contributor Author

@bradfitz bradfitz commented Dec 7, 2016

And verified that stderr isn't being lost:

        unicastsockopt_test.go:86: no access to raw socket allowed on windows
FAIL
stdout test Run complete = 1
stderr test Run complete = 1
exit status 1
FAIL    golang.org/x/net/ipv4   4.904s

@bradfitz
Copy link
Contributor Author

@bradfitz bradfitz commented Dec 7, 2016

I suspected it was test failures in goroutines, so I audited all 4 in the ipv4 tests. All four looked fine. Modifying the testing package to log a stack on failure, I found:

TEST FAIL:  goroutine 18877 [running]:
testing.(*common).Fail(0xc042923dc0)
        C:/workdir/go/src/testing/testing.go:468 +0xec
testing.(*common).Fail(0xc0428b70a0)
        C:/workdir/go/src/testing/testing.go:458 +0x259
testing.(*common).Error(0xc0428b70a0, 0xc042787fa0, 0x1, 0x1)
        C:/workdir/go/src/testing/testing.go:534 +0x76
golang.org/x/net/ipv4_test.acceptor(0xc0428b70a0, 0x689de0, 0xc04204e1b0, 0xc0426a5440)
        c:/workdir/gopath/src/golang.org/x/net/ipv4/mocktransponder_test.go:17 +0xcd
created by golang.org/x/net/ipv4_test.TestConnUnicastSocketOptions
        c:/workdir/gopath/src/golang.org/x/net/ipv4/unicastsockopt_test.go:34 +0x1a6

But that test looks fine:

        done := make(chan bool)                                                        
        go acceptor(t, ln, done)                                                       
                                                                                       
        c, err := net.Dial("tcp4", ln.Addr().String())                                 
        if err != nil {                                                                
                t.Fatal(err)                                                           
        }                                                                              
        defer c.Close()                                                                
                                                                                       
        testUnicastSocketOptions(t, ipv4.NewConn(c))                                   
                                                                                       
        <-done                                                                         

Where acceptor (which is only used once, and is in a separate file all by itself) is just:

func acceptor(t *testing.T, ln net.Listener, done chan<- bool) {  
        defer func() { done <- true }()  
  
        c, err := ln.Accept()  
        if err != nil {  
                t.Error(err)  
                return  
        }  
        c.Close()  
}  

It's an unbuffered channel, so I'm not sure why that synchronization isn't sufficient to make the message log something.

@bradfitz bradfitz self-assigned this Dec 7, 2016
@bradfitz
Copy link
Contributor Author

@bradfitz bradfitz commented Dec 7, 2016

Well, https://go-review.googlesource.com/34021 fixes the flakes, but I'm not sure why.

@gopherbot
Copy link

@gopherbot gopherbot commented Dec 7, 2016

CL https://golang.org/cl/34021 mentions this issue.

@alexbrainman
Copy link
Member

@alexbrainman alexbrainman commented Dec 7, 2016

Is there something like strace for Windows, @alexbrainman? If so, probably not a CLI tool, huh?

I am not sure what you want. I have used https://technet.microsoft.com/en-us/sysinternals/processmonitor.aspx in old days. Also mksyscall._windows.go has -trace flag.

Alex

@golang golang locked and limited conversation to collaborators Dec 7, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

6 participants