Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/http: TestClientTimeout* failures with "missing timeout substring" on windows/arm* #43120

Open
bcmills opened this issue Dec 10, 2020 · 13 comments
Labels
arch-arm arch-arm64 NeedsInvestigation OS-Windows
Milestone

Comments

@bcmills
Copy link
Member

@bcmills bcmills commented Dec 10, 2020

2020-12-09T01:34:53-ae9b442/windows-arm-zx2c4
2020-12-07T17:19:41-7f9a2bc/windows-arm-zx2c4
2020-12-03T02:35:36-da54dfb/windows-arm-zx2c4
2020-12-02T20:17:57-2d0258d/windows-arm-zx2c4

--- FAIL: TestClientTimeout_h1 (0.20s)
    client_test.go:1277: error string = "context deadline exceeded"; missing timeout substring
--- FAIL: TestClientTimeout_h2 (0.20s)
    client_test.go:1277: error string = "context deadline exceeded"; missing timeout substring
FAIL
FAIL	net/http	43.328s

CC @zx2c4 @bradfitz @neild @empijei

@bcmills bcmills added OS-Windows NeedsInvestigation labels Dec 10, 2020
@bcmills bcmills added this to the Backlog milestone Dec 10, 2020
@zx2c4
Copy link
Contributor

@zx2c4 zx2c4 commented Dec 10, 2020

We increased the timeout here: golang/build@b6220de

But I don't know if that change has been deployed yet. @dmitshur - do you know?

@zx2c4
Copy link
Contributor

@zx2c4 zx2c4 commented Dec 10, 2020

Oh, the latest log shows "GO_TEST_TIMEOUT_SCALE=3", so looks like it has been deployed.

So I'm not quite sure...

@dmitshur
Copy link
Contributor

@dmitshur dmitshur commented Dec 10, 2020

Yes, it has. You can find out the deployed version by visiting https://farmer.golang.org, it tells the version at the top:

[...] Uptime 10h21m44s. Version 90d5e7c418c099f6d88cfc0e6269a94fa4da7f0e.

@bcmills
Copy link
Member Author

@bcmills bcmills commented Aug 31, 2021

Looks like it has started occurring on other builders too.

greplogs --dashboard -md -l -E 'missing timeout substring' --since=2020-12-10

2021-08-30T22:07:43-437362c/openbsd-amd64-68
2021-06-30T16:44:19-0fa3265/windows-arm64-10
2021-06-25T21:04:07-5160896/windows-arm64-aws
2021-06-18T16:56:48-57aaa19/windows-arm-zx2c4
2021-06-16T20:38:07-0e67ce3/windows-arm64-aws
2021-06-15T18:42:11-219fe9d/windows-arm64-aws
2021-06-11T16:09:15-e2dc6dd/windows-arm-zx2c4
2021-05-10T15:47:52-a9edda3/openbsd-amd64-68
2021-05-04T23:35:34-137be77/windows-arm-zx2c4
2021-04-30T19:59:10-faff49a/windows-arm-zx2c4
2021-04-13T21:13:12-c19759a/windows-arm-zx2c4
2021-04-12T18:08:47-849dba0/openbsd-amd64-68
2021-04-09T14:33:44-a690a5d/windows-arm-zx2c4
2021-04-07T13:24:10-4520da4/windows-arm-zx2c4
2021-04-07T02:05:55-8d77e45/windows-arm-zx2c4
2021-04-06T22:55:58-b084073/windows-arm-zx2c4
2021-04-01T00:51:23-3304b22/windows-arm-zx2c4
2021-03-30T16:13:36-89b141c/windows-arm-zx2c4
2021-03-29T08:11:13-565e70f/windows-arm-zx2c4
2021-03-29T06:21:31-d10241f/windows-arm-zx2c4
2021-03-22T20:20:58-d9691ff/windows-arm-zx2c4
2021-03-10T19:33:23-ccf9ace/windows-arm-zx2c4
2021-03-08T20:03:01-bd37284/windows-arm-zx2c4
2021-03-05T23:32:34-7205a4f/windows-arm-zx2c4
2021-03-04T23:37:01-96a96a9/windows-arm-zx2c4
2021-03-04T21:47:26-9d3718e/windows-arm-zx2c4
2021-02-24T18:49:08-b7f62da/windows-arm-zx2c4
2021-02-24T12:37:13-26001d1/windows-arm-zx2c4
2021-02-23T18:39:53-c584f42/windows-arm-zx2c4
2021-02-05T22:35:11-b54cd94/windows-arm-zx2c4
2021-01-06T15:02:50-d213170/windows-arm-zx2c4
2020-12-30T22:39:48-0ae2e03/windows-arm-zx2c4
2020-12-16T09:01:04-75e16f5/windows-arm-zx2c4

CC @neild

@bcmills bcmills changed the title net/http: TestClientTimeout* failures with "missing timeout substring" on windows-arm-zx2c4 net/http: TestClientTimeout* failures with "missing timeout substring" Aug 31, 2021
@bcmills
Copy link
Member Author

@bcmills bcmills commented Nov 4, 2021

Curiously, this has still only ever occurred on windows-arm variants and openbsd-amd64-68. I don't know whether we can really discount the two OpenBSD failures, but the apparent specificity to windows-arm is puzzling. (Maybe this is a memory-model synchronization issue, like #48072?)

greplogs --dashboard -md -l -e '(?ms)FAIL: TestClientTimeout.*missing timeout substring' --since=2021-10-01

2021-11-03T18:37:22-cfd016d/windows-arm64-10
2021-10-22T21:26:30-5d414d1/windows-arm64-10
2021-10-14T18:43:59-765c911/windows-arm64-10
2021-10-14T04:18:44-9e4dc6f/windows-arm64-10
2021-10-05T15:47:57-017ffcd/windows-arm64-10

@gopherbot
Copy link

@gopherbot gopherbot commented Nov 4, 2021

Change https://golang.org/cl/361275 mentions this issue: net/http: reduce TestClientTimeout_h{1,2} latency

gopherbot pushed a commit that referenced this issue Nov 4, 2021
The test had been setting an arbitrary 200ms timeout to allow the
server's handler to set up before timing out. That is not only
potentially flaky on slow machines, but also typically much longer
than necessary. Replace the hard-coded timeout with a much shorter
initial timeout, and use exponential backoff to lengthen it if needed.

This allows the test to be run about 20x faster in the typical case,
which may make it easier to reproduce rare failure modes by running
with a higher -count flag.

For #43120

Change-Id: I1e0d0ec99d5a107fff56e3bcc7174d686ec582d1
Reviewed-on: https://go-review.googlesource.com/c/go/+/361275
Trust: Bryan C. Mills <bcmills@google.com>
Run-TryBot: Bryan C. Mills <bcmills@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Damien Neil <dneil@google.com>
@bcmills
Copy link
Member Author

@bcmills bcmills commented Jan 5, 2022

greplogs --dashboard -md -l -e '(?ms)FAIL: TestClientTimeout.*missing timeout substring' --since=2021-11-05

2021-12-29T19:23:11-8a306e2/windows-arm64-10
2021-11-12T23:34:09-fdee1b2/windows-arm64-10

@bcmills
Copy link
Member Author

@bcmills bcmills commented Jan 5, 2022

windows/arm64 is not currently listed as a first class port, so — given the consistency of the failing platform — I think a skip may be in order to reduce windows/arm64 builder flakiness.

I honestly don't know what to make of the couple of openbsd-amd64-68 failures, though.

(CC @bufflig)

@gopherbot
Copy link

@gopherbot gopherbot commented Jan 5, 2022

Change https://golang.org/cl/375635 mentions this issue: net/http: skip TestClientTimeout_h{1,2} on windows/arm and windows/arm64

@bcmills
Copy link
Member Author

@bcmills bcmills commented Jan 5, 2022

Marking as release-blocker until the skip is added, at which point this can be moved back to the backlog.

@bcmills bcmills removed this from the Backlog milestone Jan 5, 2022
@bcmills bcmills added this to the Go1.18 milestone Jan 5, 2022
@bcmills bcmills removed this from the Go1.18 milestone Jan 5, 2022
@bcmills bcmills added this to the Backlog milestone Jan 5, 2022
gopherbot pushed a commit that referenced this issue Jan 5, 2022
These tests are empirically flaky on the windows/arm and windows/arm64
builders, with a consistent (but rare) failure mode.

This change skips the test if that particular failure mode is
encountered on those platforms; the skip can be removed if and when
someone has the time to pin down the root cause.

For #43120

Change-Id: Ie3a9a06bf47e3a907c7b07441acc1494a4631135
Reviewed-on: https://go-review.googlesource.com/c/go/+/375635
Trust: Bryan Mills <bcmills@google.com>
Run-TryBot: Bryan Mills <bcmills@google.com>
Reviewed-by: Damien Neil <dneil@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
@bcmills
Copy link
Member Author

@bcmills bcmills commented Jan 5, 2022

The skip is merged. Back to the Backlog. 😩

@bcmills
Copy link
Member Author

@bcmills bcmills commented Jan 18, 2022

Apparently TestClientTimeout_Headers_h[12] are also affected by the same underlying bug:

--- FAIL: TestClientTimeout_Headers_h2 (0.01s)
    client_test.go:1342: error string = "Get \"https://127.0.0.1:53626\": context deadline exceeded"; missing timeout substring
--- FAIL: TestClientTimeout_Headers_h1 (0.01s)
    client_test.go:1342: error string = "Get \"http://127.0.0.1:53627\": context deadline exceeded"; missing timeout substring
2022/01/17 01:29:00 http: TLS handshake error from 127.0.0.1:53639: read tcp 127.0.0.1:53638->127.0.0.1:53639: wsarecv: An existing connection was forcibly closed by the remote host.
FAIL
FAIL	net/http	6.571s

greplogs --dashboard -md -l -e '(?ms)FAIL: TestClientTimeout.*missing timeout substring' --since=2022-01-06

2022-01-17T09:23:25-897b3da/windows-arm64-10

@bcmills bcmills removed this from the Backlog milestone Jan 18, 2022
@bcmills bcmills added this to the Go1.18 milestone Jan 18, 2022
@gopherbot
Copy link

@gopherbot gopherbot commented Jan 18, 2022

Change https://golang.org/cl/379156 mentions this issue: net/http: skip TestClientTimeout_Headers_h{1,2} on windows/arm and windows/arm64

@bcmills bcmills changed the title net/http: TestClientTimeout* failures with "missing timeout substring" net/http: TestClientTimeout* failures with "missing timeout substring" on windows/arm* Jan 18, 2022
@bcmills bcmills added arch-arm arch-arm64 OS-Windows and removed release-blocker labels Jan 18, 2022
@bcmills bcmills removed this from the Go1.18 milestone Jan 18, 2022
@bcmills bcmills added this to the Backlog milestone Jan 18, 2022
gopherbot pushed a commit that referenced this issue Jan 18, 2022
…ndows/arm64

This extends the skip added in CL 375635 to the "_Headers" variant of
the test, since we have observed similar failures in that variant on
the builders.

For #43120

Change-Id: Ib1c97fbb776b576271629272f3194da77913a941
Reviewed-on: https://go-review.googlesource.com/c/go/+/379156
Trust: Bryan Mills <bcmills@google.com>
Run-TryBot: Bryan Mills <bcmills@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Damien Neil <dneil@google.com>
jproberts pushed a commit to jproberts/go that referenced this issue Jun 21, 2022
These tests are empirically flaky on the windows/arm and windows/arm64
builders, with a consistent (but rare) failure mode.

This change skips the test if that particular failure mode is
encountered on those platforms; the skip can be removed if and when
someone has the time to pin down the root cause.

For golang#43120

Change-Id: Ie3a9a06bf47e3a907c7b07441acc1494a4631135
Reviewed-on: https://go-review.googlesource.com/c/go/+/375635
Trust: Bryan Mills <bcmills@google.com>
Run-TryBot: Bryan Mills <bcmills@google.com>
Reviewed-by: Damien Neil <dneil@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
jproberts pushed a commit to jproberts/go that referenced this issue Jun 21, 2022
…ndows/arm64

This extends the skip added in CL 375635 to the "_Headers" variant of
the test, since we have observed similar failures in that variant on
the builders.

For golang#43120

Change-Id: Ib1c97fbb776b576271629272f3194da77913a941
Reviewed-on: https://go-review.googlesource.com/c/go/+/379156
Trust: Bryan Mills <bcmills@google.com>
Run-TryBot: Bryan Mills <bcmills@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Damien Neil <dneil@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arch-arm arch-arm64 NeedsInvestigation OS-Windows
Projects
None yet
Development

No branches or pull requests

4 participants