x/build,cmd/go: frequent connection timeouts to github.com since 2022-03-04 #52545
Comments
Not that I'm aware of.
As with the other network-related failures, this failure mode is much more common on the longtest builders.
This is causing a huge amount of noise on the build dashboard. At this failure rate, I suspect that if this issue affected end users we'd be hearing about it by now. So my hypothesis is that this is some kind of GCE networking issue that affects the builder hosts.

Marking as release-blocker for Go 1.19. We need to figure out what's happening with the network and either fix it (if it's a GCE or builder problem) or figure out a workaround (if it's a GitHub problem).

(Note that most of these failures come from the longtest builders.)
When these timeouts occur, the tests have consistently been running for 120s or longer.
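A roughly 120-second hang before failure is in the same ballpark as the Linux kernel's default TCP connection-establishment timeout (the SYN is retransmitted with exponential backoff for about two minutes before the connect fails), which would point at packets being dropped somewhere on the path rather than at an application-level deadline firing. As a purely hypothetical diagnostic sketch (not code from the builders or from cmd/go), something like the following could be run on an affected host to time a single connection attempt and log it in a format that lines up with the test logs:

// connprobe.go: hypothetical diagnostic sketch, not part of the Go builders.
// Times a single TCP connection attempt to github.com:443 and logs the result
// with a UTC timestamp so it can be correlated with builder and server logs.
package main

import (
	"log"
	"net"
	"time"
)

func main() {
	log.SetFlags(log.LstdFlags | log.Lmicroseconds | log.LUTC)

	const addr = "github.com:443"
	// Use a dial timeout comfortably longer than the ~120s hangs seen on the
	// builders, so the kernel-level failure is observed rather than our own deadline.
	d := net.Dialer{Timeout: 3 * time.Minute}

	start := time.Now()
	conn, err := d.Dial("tcp", addr)
	elapsed := time.Since(start)
	if err != nil {
		log.Printf("dial %s: failed after %v: %v", addr, elapsed, err)
		return
	}
	defer conn.Close()
	log.Printf("dial %s: succeeded after %v (%v -> %v)", addr, elapsed, conn.LocalAddr(), conn.RemoteAddr())
}

Run in a loop, a probe like this would show whether the hangs cluster in time or affect every connection attempt from the host.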
Here's a theory, inspired by https://askubuntu.com/questions/218728/connection-timeout-when-accessing-github. Maybe this timeout is some kind of rate-limiting countermeasure on the GitHub side, spurred by bursts of traffic from the Go builders running more SlowBots and post-commit CLs. Maybe the 5-month gap from October 2021 to March 2022 is an artifact of the Go development cycle: the Go release freeze started 2021-11-01 and ended 2022-02-28. The last failure before the gap was a couple of weeks before the freeze began, and the first failure after the gap was about a week after it ended. During the freeze, the CL rate is naturally much lower, which might keep the builders below whatever threshold GitHub is using for rate-limiting.
Hi @theojulienne, do you know who might be able to cast an eye on this from the GitHub side?
I filed a GitHub support ticket (#1616480), but their conclusion was that it's probably a network issue on the GCE side. Unfortunately, the GCE network architecture is pretty opaque; I think we need to escalate this on the GCE side to at least get enough routing information to identify where along the route the connection is being dropped.
Change https://go.dev/cl/405714 mentions this issue: |
Go tests don't include timestamps by default, but we would like to have them in order to correlate builder failures with server and network logs. Since many of the Go tests with external network and service dependencies are script tests for the 'go' command, logging timestamps here adds a lot of logging value with one simple and very low-risk change.

For #50541.
For #52490.
For #52545.
For #52851.

Change-Id: If3fa86deb4a216ec6a1abc4e6f4ee9b05030a729
Reviewed-on: https://go-review.googlesource.com/c/go/+/405714
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
Auto-Submit: Bryan Mills <bcmills@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Bryan Mills <bcmills@google.com>
The failures starting 2022-05-11 have timestamps, so we (at least theoretically) now have something we can correlate with network and server logs.
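The underlying change is conceptually simple: wrap the writer that receives test output so that every completed line gets a wall-clock prefix. The sketch below illustrates that general technique only; it is an assumption for illustration, not the actual diff in CL 405714:

// tswriter.go: illustrative sketch (not the code from CL 405714) of prefixing
// each line of test output with a UTC timestamp so failures can be correlated
// with server and network logs.
package main

import (
	"bytes"
	"fmt"
	"io"
	"os"
	"time"
)

// tsWriter wraps an io.Writer and prepends a timestamp to every completed line.
type tsWriter struct {
	w   io.Writer
	buf bytes.Buffer // holds the current partial line until its '\n' arrives
}

func (t *tsWriter) Write(p []byte) (int, error) {
	for _, b := range p {
		t.buf.WriteByte(b)
		if b == '\n' {
			stamp := time.Now().UTC().Format("2006-01-02T15:04:05.000000Z")
			if _, err := fmt.Fprintf(t.w, "%s %s", stamp, t.buf.Bytes()); err != nil {
				return 0, err
			}
			t.buf.Reset()
		}
	}
	return len(p), nil
}

func main() {
	w := &tsWriter{w: os.Stdout}
	// Hypothetical output lines, for demonstration only.
	fmt.Fprintln(w, "fetching https://github.com/... (hypothetical)")
	fmt.Fprintln(w, "connection timed out (hypothetical)")
}

With a prefix on every line, both the moment a fetch stalls and the moment it times out can be matched against GCE flow logs or GitHub-side request logs.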
greplogs --dashboard -md -l -e 'github.com.* Connection timed out' --since=2021-01-01
2022-04-22T19:02:29-808d40d/linux-386-longtest
2022-04-22T15:44:57-1899472/linux-386-longtest
2022-04-22T04:42:23-1e59876/linux-386-longtest
2022-04-22T00:25:08-c9031a4/linux-amd64-longtest
2022-04-22T00:05:54-c510cd9/linux-386-longtest
2022-04-22T00:05:54-c510cd9/linux-amd64-longtest
2022-04-19T15:07:49-caa4631/linux-amd64-longtest
2022-04-15T19:02:54-df08c9a/linux-386-longtest
2022-04-14T18:00:13-dd97871/linux-386-longtest
2022-04-14T18:00:13-dd97871/linux-amd64-longtest
2022-04-13T17:48:12-517781b/linux-386-longtest
2022-04-13T01:15:22-b55a2fb/linux-386-longtest
2022-04-12T22:35:20-fc1d4c1/linux-amd64-longtest
2022-04-12T22:32:01-6183920/linux-386-longtest
2022-04-12T05:46:57-2b31abc/linux-amd64-longtest
2022-04-11T16:31:45-e4e033a/linux-386-longtest
2022-04-11T16:31:43-036b615/linux-amd64-longtest
2022-04-11T16:31:33-494b79f/linux-amd64-longtest
2022-04-08T18:30:53-3a19102/linux-386-longtest
2022-03-15T17:04:57-9b112ce/linux-amd64-longtest
2022-03-07T13:47:51-0e2f1ab/linux-386-longtest
2022-03-04T20:02:41-2b8aa2b/linux-386-longtest
[Note the 5-month gap here! 🤔]
2021-10-08T14:08:12-5b9206f/linux-386-longtest
2021-10-08T14:08:12-5b9206f/linux-amd64-longtest
2021-09-07T21:39:06-dcf3545/linux-386-longtest
2021-09-07T20:37:05-8078355/linux-386-longtest
2021-09-07T20:27:30-23f4f0d/linux-386-longtest
2021-09-07T19:39:04-d92101f/linux-386-longtest
2021-06-22T02:44:43-197a5ee/linux-amd64-longtest
2021-04-02T05:24:14-aebc0b4/linux-amd64-longtest
2021-03-18T14:43:33-e726e2a/linux-amd64-longtest
2021-03-17T17:50:50-8628bf9/linux-amd64-longtest
2021-03-17T17:13:50-0bd308f/linux-amd64-longtest
2021-03-17T16:53:00-70d54df/linux-386-longtest
2021-03-17T16:53:00-70d54df/linux-amd64-longtest
2021-03-17T16:19:21-2f3db22/linux-386-longtest
2021-03-05T18:46:36-51d8d35/linux-amd64-longtest
@golang/release: were there changes to the linux-.*-longtest image or network configuration around 2022-03-04 that might explain the increase in timeouts at that point?
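For context, bucketing the greplogs list above by month makes both the 5-month gap and the jump after 2022-03-04 concrete. Here is a throwaway sketch (a hypothetical helper, not an existing tool) that reads the pasted list on stdin and tallies failures per month:

// tally.go: hypothetical throwaway helper that buckets the greplogs failure
// lines above by month, to make the gap and the post-freeze jump visible.
package main

import (
	"bufio"
	"fmt"
	"os"
	"sort"
	"strings"
)

func main() {
	counts := map[string]int{} // e.g. "2022-04" -> number of failures
	sc := bufio.NewScanner(os.Stdin)
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		// Failure lines look like "2022-04-22T19:02:29-808d40d/linux-386-longtest";
		// skip the command line and the bracketed note.
		if len(line) < 7 || line[4] != '-' {
			continue
		}
		counts[line[:7]]++
	}
	if err := sc.Err(); err != nil {
		fmt.Fprintln(os.Stderr, "read:", err)
		os.Exit(1)
	}
	months := make([]string, 0, len(counts))
	for m := range counts {
		months = append(months, m)
	}
	sort.Strings(months)
	for _, m := range months {
		fmt.Printf("%s %2d\n", m, counts[m])
	}
}

Fed the list above, it prints one line per month: a cluster of failures in 2022-03 and 2022-04, and nothing between 2021-10 and 2022-03.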