net: TestNotTemporaryRead timeouts on aix/ppc64 #29685
The names used are confusing but I can't see that it matters in this case.
Go expects that calling the
As opposed to what? The Go code will work fine if
The stack traces from the trybots suggest that the
That's the problem. On AIX,
There is nothing similar on AIX. Therefore, once a socket is opened with
Using the C program above, I get the following trace on AIX:
The last
If you look at TestDialContextCancelRace, the connect function explicitly returns EINPROGRESS if no error is returned, which is exactly what happens on AIX. I don't know if it's possible to change how
Anyway, this difference only affects TestNotTemporaryRead at the moment. Moreover, this bug only happens when a machine is very busy. Increasing the sleep duration and adding another sleep in the
On GNU/Linux
Change https://golang.org/cl/158038 mentions this issue:
Well, I've submitted a CL to fix this test. I'll see later if I can provide a more general fix for that issue.
On aix/ppc64, if the server closes before the client calls Accept, this test will fail. Increasing the time before the server closes should resolve this timeout.

Updates #29685

Change-Id: Iebb849d694fc9c37cf216ce1f0b8741249b98016
Reviewed-on: https://go-review.googlesource.com/c/158038
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Another recent timeout in this test, again on the
Yes, it seems the new sleep time is still not enough. This test isn't fully compatible with AIX behavior, which is slightly different if an
Yep, see

    if runtime.GOOS == "aix" {
    	testenv.SkipFlaky(t, 29685)
    }
Change https://golang.org/cl/185717 mentions this issue:
This test sometimes times out when the machine is busy. The reason behind this is still a bit unclear, but it seems to come from the fact that on AIX, once listen is called on a socket, every incoming connection is accepted even before accept is called (which only occurs when a machine is busy). On Linux, a socket is created as a "passive socket", which seems to wait for the accept before allowing incoming connections.

Updates #29685

Change-Id: I41b053b7d5f5b4420b72d6a217be72e41220d769
Reviewed-on: https://go-review.googlesource.com/c/go/+/185717
Run-TryBot: Clément Chigot <clement.chigot@atos.net>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Hi,
I'm trying to resolve timeouts occurring on aix/ppc64 with net.TestNotTemporaryRead.
https://build.golang.org/log/45540cc03c1d37057e8f725d7f2dd431652ddf4c
https://build.golang.org/log/37d60c3b3cd46cf39d118f84e695049d390da40e
...
This timeout occurs because Accept() seems to be stuck in an infinite loop if the server is already closed. It's only a guess, because I can't trigger the bug manually on my local machine. However, similar behavior can easily be reproduced with https://play.golang.org/p/0IXrHf87i-2. It works on linux/amd64 but times out on aix/ppc64. This might not be the root cause of this bug, but a possible workaround is to increase the delay on the server.

However, I have several questions:
- ... Accept() and the server doing Dial()? Is it supposed to be the opposite, or doesn't it matter? This is also the case for some other tests in net_test.go.
- ... accept() is made when the server is already closed (but the port is still being listened on)? Should it succeed or not? On aix/ppc64, the accept syscall returns EAGAIN (because of the O_NONBLOCK flag), whereas on linux/amd64 it succeeds.
I've also discovered that the behavior of connect is slightly different on AIX than on Linux (I don't know about other OSes). I tried the following C code (taken from #6828): accept_after_connect.c.txt. The first connect doesn't return EINPROGRESS as it does on Linux. It doesn't seem to be a bug, as a connection can result from the listen syscall.

Does Go want EINPROGRESS to be returned? (*netFD).connect will wait with netpoll if it is.