net/http: frequent HTTP2 INTERNAL_ERROR errors during module zip download since 2021-10-06 #51323

Open
bcmills opened this issue Feb 22, 2022 · 17 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@bcmills
Member

bcmills commented Feb 22, 2022

#!watchflakes
post <- `golang\.org/[^@]+@v\d+\.\d+\.\d+[a-z0-9-.]*: stream error: stream ID \d+; INTERNAL_ERROR; received from peer`

The builders are seeing a lot of INTERNAL_ERROR results since 2021-10-06, typically for google.golang.org/protobuf but sometimes for other golang.org paths as well.

This may be related to #50541, which saw an uptick in errors from go.googlesource.com in about the same timeframe.

CC @golang/release

../../../../pkg/mod/cloud.google.com/go/datastore@v1.2.0/client.go:25:2: google.golang.org/genproto@v0.0.0-20210726143408-b02e89920bf0: stream error: stream ID 3; INTERNAL_ERROR; received from peer
../../../../pkg/mod/cloud.google.com/go@v0.88.0/internal/trace/trace.go:23:2: google.golang.org/genproto@v0.0.0-20210726143408-b02e89920bf0: stream error: stream ID 3; INTERNAL_ERROR; received from peer
../../../../pkg/mod/google.golang.org/grpc@v1.39.0/status/status.go:34:2: google.golang.org/genproto@v0.0.0-20210726143408-b02e89920bf0: stream error: stream ID 3; INTERNAL_ERROR; received from peer
../../../../pkg/mod/cloud.google.com/go/datastore@v1.2.0/save.go:26:2: google.golang.org/genproto@v0.0.0-20210726143408-b02e89920bf0: stream error: stream ID 3; INTERNAL_ERROR; received from peer
../../../../pkg/mod/cloud.google.com/go@v0.88.0/cloudbuild/apiv1/v2/cloud_build_client.go:33:2: google.golang.org/genproto@v0.0.0-20210726143408-b02e89920bf0: stream error: stream ID 3; INTERNAL_ERROR; received from peer
../../../../pkg/mod/cloud.google.com/go@v0.88.0/longrunning/autogen/operations_client.go:31:2: google.golang.org/genproto@v0.0.0-20210726143408-b02e89920bf0: stream error: stream ID 3; INTERNAL_ERROR; received from peer
../../../../pkg/mod/cloud.google.com/go@v0.88.0/iam/iam.go:30:2: google.golang.org/genproto@v0.0.0-20210726143408-b02e89920bf0: stream error: stream ID 3; INTERNAL_ERROR; received from peer
../../../../pkg/mod/cloud.google.com/go/storage@v1.10.0/iam.go:24:2: google.golang.org/genproto@v0.0.0-20210726143408-b02e89920bf0: stream error: stream ID 3; INTERNAL_ERROR; received from peer

greplogs --dashboard -md -l -e 'golang\.org/[^@]+@v\d+\.\d+\.\d+[a-z0-9-.]*: stream error: stream ID \d+; INTERNAL_ERROR; received from peer' --since=2021-01-01

2022-02-19T16:23:54-3962a08-0261fa6/dragonfly-amd64
2022-02-10T22:36:44-59536be-0a6cf87/linux-amd64-wsl
2022-02-09T18:09:24-9862752-6a70ee2/darwin-arm64-11_0-toothrot
2022-02-09T16:29:28-dad3315-0a6cf87/darwin-arm64-12_0-toothrot
2022-02-09T16:29:28-095f870-0a6cf87/darwin-arm64-12_0-toothrot
2022-02-08T17:01:10-9b156ee-ef06a5f/plan9-amd64-0intro
2022-02-08T16:42:58-ac99473-e2277c8/darwin-arm64-12_0-toothrot
2022-02-03T19:55:40-cd36cc0-896df42/plan9-amd64-0intro
2022-01-25T22:56:45-c20fd7c-6eb58cd/plan9-amd64-0intro
2022-01-24T21:27:33-6944b10-671e115/plan9-amd64-0intro
2022-01-24T21:27:33-5e0467b-671e115/plan9-amd64-0intro
2022-01-14T16:41:18-4bd3f69-b41185c/plan9-386-0intro
2022-01-07T02:32:39-e83268e-2bb7f6b/linux-amd64-wsl
2022-01-05T22:15:55-43762dc-a845a56/darwin-arm64-12_0-toothrot
2022-01-03T23:45:12-be6af36-95b240b/plan9-amd64-0intro
2022-01-02T14:27:43-6944b10-c886143/plan9-amd64-0intro
2021-12-15T20:26:19-be6af36-07ed86c/darwin-arm64-12_0-toothrot
2021-12-14T18:18:45-ecdc095-becaeea/plan9-arm
2021-12-14T01:48:22-6944b10-1afa432/plan9-amd64-0intro
2021-11-10T19:35:55-03971e3-b954f58/darwin-arm64-11_0-toothrot
2021-10-28T14:25:03-103d89b-a3bb28e/darwin-amd64-10_15
2021-10-26T11:58:05-d418f37-1e2820a/openbsd-arm64-jsing
2021-10-14T15:16:55-a66eb64-ad99d88/linux-amd64-wsl
2021-10-14T01:51:22-e13a265-276fb27/windows-arm64-10
2021-10-13T00:11:47-281050f-4fb2e1c/darwin-amd64-11_0
2021-10-11T20:46:14-089bfa5-7023535/darwin-amd64-11_0
2021-10-07T20:37:43-86761ae-ef2ebbe/darwin-arm64-11_0-toothrot
2021-10-07T20:37:43-59d4e92-ef2ebbe/darwin-arm64-11_0-toothrot
2021-10-06T18:44:56-40a54f1-195945a/darwin-amd64-nocgo

@gopherbot gopherbot added this to the Unreleased milestone Feb 22, 2022
@bcmills bcmills added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Feb 22, 2022
@dmitshur
Contributor

From logs, it's also happened for github.com dependencies such as github.com/yuin/goldmark in 2022-02-08T16:42:58-ac99473-e2277c8/darwin-arm64-12_0-toothrot.

Downloading happens with GOPROXY=https://proxy.golang.org set, so perhaps the go.googlesource.com and golang.org servers aren't relevant (they just happen to be where most of the dependencies are coming from).

I've learned that INTERNAL_ERROR refers to an HTTP/2 error code that is described as "Implementation fault", but it's not clear to me if the server or client is at fault. CC @neild.

@dmitshur dmitshur changed the title x/website: frequent HTTP2 INTERNAL_ERROR errors from *.golang.org since 2021-10-06 net/http, x/net/http2, proxy.golang.org: frequent HTTP2 INTERNAL_ERROR errors during module zip download since 2021-10-06 Feb 22, 2022
@olix0r

olix0r commented Mar 21, 2022

Recently, we've encountered this fairly frequently (a few times per day) in GitHub Actions, usually for github.com URLs. For example:

#11 39.00 go mod download: github.com/coreos/etcd@v3.3.13+incompatible: stream error: stream ID 4057; INTERNAL_ERROR; received from peer

@heschi
Contributor

heschi commented Mar 21, 2022

The best explanation I have for this is some kind of infrequent incompatibility between the Go HTTP libraries and the Google front end servers.

@bcmills
Member Author

bcmills commented Mar 30, 2022

It occurs to me that this could be related to #49645 (CC @neild), although the timeline doesn't quite match up because the fix for that landed Nov. 17.

greplogs --dashboard -md -l -e 'stream error: stream ID \d+; INTERNAL_ERROR; received from peer' --since=2022-02-23 --omit=plan9

2022-03-29T15:43:06-e96d8cf-ae9ce82/windows-amd64-longtest
2022-03-23T17:12:00-87a8611-dd211cf/freebsd-arm64-dmgk

@mintunitish

Is there any update on this issue? We are experiencing it too frequently.

@bcmills
Member Author

bcmills commented May 19, 2022

This could possibly be a duplicate of #36759 (reported as a regression Jan. 2020).

@bcmills
Member Author

bcmills commented May 23, 2022

Assigning to @neild (net/http owner) on the theory that this may be reproducing a previously-reported net/http client bug (#36759).

@bcmills bcmills changed the title net/http, x/net/http2, proxy.golang.org: frequent HTTP2 INTERNAL_ERROR errors during module zip download since 2021-10-06 net/http: frequent HTTP2 INTERNAL_ERROR errors during module zip download since 2021-10-06 May 23, 2022
@bcmills bcmills modified the milestones: Unreleased, Go1.19 May 23, 2022
@neild
Contributor

neild commented Jun 9, 2022

stream error: stream ID N; INTERNAL_ERROR; received from peer means an HTTP/2 stream (a request) was closed by the remote endpoint with the error code INTERNAL_ERROR.

There are various other error codes to tell the client that it is misbehaving, such as PROTOCOL_ERROR. INTERNAL_ERROR is supposed to indicate an internal error on the remote side.

So whatever is doing the module zip download is claiming that it's unhealthy. It's not inconceivable that this is somehow the result of a net/http client bug, but if it is, the remote side is doing a very bad job of telling us about it.

Maybe if we can reproduce the failure with GODEBUG=http2debug=2 enabled we might see something that helps indicate the source of the problem, but at the moment my suspicion is that the problem lies on the remote end of the connection.
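
For illustration, here is a minimal sketch of how a caller might recognize this particular failure. It matches on the message text quoted in this issue, which is an assumption made to keep the example self-contained; a typed check would be sturdier wherever the underlying error type is accessible. The function names are hypothetical and not part of net/http.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// looksLikePeerInternalError reports whether msg contains the text seen in
// the failures quoted in this issue. Matching on text is an assumption made
// for this sketch; it is not how net/http classifies errors.
func looksLikePeerInternalError(msg string) bool {
	return strings.Contains(msg, "stream error: stream ID") &&
		strings.Contains(msg, "INTERNAL_ERROR; received from peer")
}

// isPeerInternalError applies the same check to an error value.
// A nil error never matches.
func isPeerInternalError(err error) bool {
	return err != nil && looksLikePeerInternalError(err.Error())
}

func main() {
	// A wrapped error shaped like the ones in the builder logs.
	err := fmt.Errorf("golang.org/x/text@v0.3.7: %w",
		errors.New("stream error: stream ID 9; INTERNAL_ERROR; received from peer"))
	fmt.Println(isPeerInternalError(err))

	// A different HTTP/2 error code should not match.
	fmt.Println(isPeerInternalError(errors.New("stream error: stream ID 9; PROTOCOL_ERROR")))
}
```

A check like this only identifies the symptom; it says nothing about which side of the connection is at fault.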

@neild
Contributor

neild commented Jun 9, 2022

googleapis/google-cloud-go#784 seems like the same issue. From 2017, eventual "fix" was to notice INTERNAL_ERROR and retry the read.

@heschi
Contributor

heschi commented Jun 9, 2022

The other end of the connection here is the GFEs; should we try to escalate to them internally?

@neild
Contributor

neild commented Jun 9, 2022

Yeah, I think so.

The possibilities I see are:

  • The net/http client is conjuring a peer-originated error out of thin air. This seems incredibly unlikely. I can find only one code path that adds "received from peer" and I don't see any way for it to lie.
  • The net/http client is misbehaving (very easy to believe) and the GFE is somehow responding with INTERNAL_ERROR. That's still a GFE bug, even if it's our fault for tickling it.
  • The net/http client is doing everything right, and the problem is entirely at or beyond the GFE.

At the same time, perhaps we should have a retry loop on module downloads. Connections break for various other reasons; doesn't seem like it would hurt to retry when they do.
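
A retry loop of the kind suggested above could look roughly like the following. This is a sketch under stated assumptions, not the go command's implementation: `withRetry`, the doubling backoff, and the simulated error are all hypothetical and chosen only to show the shape.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errSimulatedReset stands in for the failure reported in this issue.
// It is fabricated for the demo and does not come from a real connection.
var errSimulatedReset = errors.New("stream error: stream ID 3; INTERNAL_ERROR; received from peer")

// withRetry calls op up to attempts times, sleeping between failed tries.
// The delay doubles after each failure (an assumed policy for this sketch).
func withRetry(attempts int, baseDelay time.Duration, op func() error) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	for i := 0; i < attempts; i++ {
		if err = op(); err == nil {
			return nil
		}
		if i < attempts-1 {
			time.Sleep(baseDelay << i)
		}
	}
	return fmt.Errorf("after %d attempts: %w", attempts, err)
}

func main() {
	calls := 0
	// The simulated download fails twice and then succeeds.
	err := withRetry(4, 10*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return errSimulatedReset
		}
		return nil
	})
	fmt.Println(calls, err)
}
```

This version retries on any error for brevity. A real caller would probably want to retry only failures it considers transient and give up immediately on the rest.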

@Tomperez98

Any idea how to solve this? Getting the following error when running go install golang.org/x/tools/gopls@latest.

/hpath/to/go/pkg/mod/golang.org/x/tools@v0.1.11-0.20220513164230-dfee1649af67/internal/lsp/source/hover.go:23:2: golang.org/x/text@v0.3.7: stream error: stream ID 9; INTERNAL_ERROR; received from peer

@ianlancetaylor
Contributor

@neild What is the current status here? This issue is currently in the 1.19 milestone. Should it move to 1.20? To Backlog? Thanks.

@neild
Contributor

neild commented Jun 24, 2022

Our current belief is that the root cause is the GFE sending an INTERNAL_ERROR, probably due to the backend behind the GFE breaking the response stream.

This is not a net/http bug; net/http seems to be accurately reporting the error code sent by the GFE. (The error message could perhaps be clearer, but that's a secondary issue.)

Whatever tools are using net/http to fetch large files should be more robust in the handling of stream errors. There are many reasons a large fetch might break mid-stream, and it's unlikely we can address this situation entirely by reducing the number of broken streams. This means we should add some form of retry to module zip downloads.
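
For a large fetch that breaks mid-stream, one possible form of retry is to reopen the source at the offset already received instead of starting over. The sketch below illustrates that idea with in-memory readers. `openAt`, `copyWithResume`, and `flakyReader` are hypothetical names, and whether a given server can actually serve from an offset (for example through HTTP Range requests) is an assumption that would need checking.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
)

// openAt is a hypothetical callback that opens the resource starting at the
// given byte offset. With HTTP this might be a Range request (assumption).
type openAt func(offset int64) (io.ReadCloser, error)

// copyWithResume copies the resource to dst. When a read fails mid-stream it
// reopens at the number of bytes already written, up to maxAttempts opens.
func copyWithResume(dst io.Writer, open openAt, maxAttempts int) (int64, error) {
	var written int64
	var lastErr error
	for attempt := 0; attempt < maxAttempts; attempt++ {
		rc, err := open(written)
		if err != nil {
			lastErr = err
			continue
		}
		n, err := io.Copy(dst, rc)
		rc.Close()
		written += n // keep whatever arrived before the failure
		if err == nil {
			return written, nil
		}
		lastErr = err
	}
	return written, fmt.Errorf("gave up after %d attempts: %w", maxAttempts, lastErr)
}

// flakyReader yields at most limit bytes and then fails with a simulated
// stream error, standing in for a connection that breaks mid-download.
type flakyReader struct {
	r     io.Reader
	limit int
}

func (f *flakyReader) Read(p []byte) (int, error) {
	if f.limit <= 0 {
		return 0, errors.New("stream error: stream ID 3; INTERNAL_ERROR; received from peer")
	}
	if len(p) > f.limit {
		p = p[:f.limit]
	}
	n, err := f.r.Read(p)
	f.limit -= n
	return n, err
}

// runDemo downloads a small payload whose first open breaks after 6 bytes.
func runDemo() (string, int, error) {
	data := []byte("module zip contents")
	opens := 0
	open := func(offset int64) (io.ReadCloser, error) {
		opens++
		rest := bytes.NewReader(data[offset:])
		if opens == 1 {
			return io.NopCloser(&flakyReader{r: rest, limit: 6}), nil
		}
		return io.NopCloser(rest), nil
	}
	var buf bytes.Buffer
	_, err := copyWithResume(&buf, open, 3)
	return buf.String(), opens, err
}

func main() {
	got, opens, err := runDemo()
	fmt.Printf("%q opens=%d err=%v\n", got, opens, err)
}
```

Restarting the whole download from zero is simpler and may be good enough for module zips. Resuming mainly pays off when the files are large compared with how often streams break.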

I think it's okay to move this to the 1.20 milestone.

@ianlancetaylor ianlancetaylor modified the milestones: Go1.19, Go1.20 Jun 24, 2022
@neild neild removed their assignment Jul 18, 2022
@gopherbot

Found new dashboard test flakes for:

#!watchflakes
post <- `golang\.org/[^@]+@v\d+\.\d+\.\d+[a-z0-9-.]*: stream error: stream ID \d+; INTERNAL_ERROR; received from peer`
2022-09-23 16:47 linux-ppc64le-power9osu website@465c4925 go@2b9596cb (log)
internal/tour/fmt.go:16:2: golang.org/x/tools@v0.1.5: stream error: stream ID 2431; INTERNAL_ERROR; received from peer
internal/tour/local.go:22:2: golang.org/x/tools@v0.1.5: stream error: stream ID 2431; INTERNAL_ERROR; received from peer
internal/web/tmpl.go:15:2: golang.org/x/tools@v0.1.5: stream error: stream ID 2431; INTERNAL_ERROR; received from peer

watchflakes

@gopherbot gopherbot modified the milestones: Go1.20, Go1.21 Feb 1, 2023
9 participants