Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/net/http2: Race in handler execution results in zero-byte data frame, causing incompatibility with gRPC #56317

Closed
LINKIWI opened this issue Oct 19, 2022 · 3 comments
Assignees
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@LINKIWI
Copy link

LINKIWI commented Oct 19, 2022

What version of Go are you using (go version)?

$ go version
go version go1.19.2 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE=""
GOENV=""
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH=""
GOPRIVATE=""
GOPROXY=""
GOROOT="/usr/lib/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/lib/go/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.19.2"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/tmp/bug/go.mod"
GOWORK=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build3528461531=/tmp/go-build -gno-record-gcc-switches"

What did you do?

In a proprietary HTTP + gRPC reverse proxy, when issuing unary gRPC calls, I observe intermittent occurrences of the error:

rpc error: code = Internal desc = server closed the stream without sending trailers

The occurrence of this error is not deterministically reproducible, and affects only a small percentage of requests. Usually, a client retry of the RPC alleviates the problem.

See the investigation notes after the survey questions in this issue.

What did you expect to see?

I expect to see no occurrences of this error under regular operation.

What did you see instead?

I see this error affecting 1 - 5% of requests.


Context

I'm working with a proprietary HTTP reverse proxy with built-in support for gRPC over HTTP/2.

Example

The proxy is proprietary, but its core logic is demonstrated below.

package main

import (
	"crypto/tls"
	"net"
	"net/http"
	"net/http/httputil"

	"golang.org/x/net/http2"
	"golang.org/x/net/http2/h2c"
)

func main() {
	tr := &http2.Transport{
		AllowHTTP: true,
		DialTLS: func(network string, addr string, cfg *tls.Config) (net.Conn, error) {
			return net.Dial("tcp", "localhost:7001")
		},
	}

	rp := &httputil.ReverseProxy{
		Transport:     tr,
		FlushInterval: -1,
		Director: func(req *http.Request) {
			req.URL.Scheme = "http"
		},
	}

	handler := h2c.NewHandler(rp, &http2.Server{})

	srv := &http.Server{
		Addr:    "localhost:7000",
		Handler: handler,
	}

	srv.ListenAndServe()
}

Symptom

Clients issuing gRPC calls through the proxy that return gRPC application-level errors intermittently (non-determinstically) observe errors from the grpc-go library server closed the stream without sending trailers.

GODEBUG=http2debug=2 reveals that the issue manifests only when Go's http2.Server writes a HEADERS frame with flag END_HEADERS followed by a zero-byte DATA frame with flag END_STREAM.

The issue does not manifest (i.e. the application-level error is propagated correctly) when Go's http2.Server writes a HEADERS frame with flags END_HEADERS | END_STREAM.

Note that there are no trailers included in this message.

Example trace with no errors (RPC returns successfully)

2022/10/18 00:12:57 http2: Transport encoding header ":method" = "POST"
2022/10/18 00:12:57 http2: Transport encoding header ":path" = "/service/Method"
2022/10/18 00:12:57 http2: Transport encoding header ":scheme" = "http"
2022/10/18 00:12:57 http2: Transport encoding header "te" = "trailers"
2022/10/18 00:12:57 http2: Transport encoding header "grpc-timeout" = "998862u"
2022/10/18 00:12:57 http2: Transport encoding header "content-type" = "application/grpc"
2022/10/18 00:12:57 http2: Transport encoding header "user-agent" = "grpc-go/1.49.0"
2022/10/18 00:12:57 http2: Transport encoding header "x-forwarded-for" = "127.0.0.1"
2022/10/18 00:12:57 http2: Transport encoding header "accept-encoding" = "gzip"
2022/10/18 00:12:57 http2: Framer 0xc000366000: wrote HEADERS flags=END_HEADERS stream=7 len=17
2022/10/18 00:12:57 http2: Framer 0xc000366000: wrote DATA stream=7 len=12 data="\x00\x00\x00\x00\a\n\x05\b\x02\x12\x01a"
2022/10/18 00:12:57 http2: Framer 0xc000366000: wrote DATA flags=END_STREAM stream=7 len=0 data=""
2022/10/18 00:12:57 http2: Framer 0xc000366000: read WINDOW_UPDATE len=4 (conn) incr=12
2022/10/18 00:12:57 http2: Transport received WINDOW_UPDATE len=4 (conn) incr=12
2022/10/18 00:12:57 http2: Framer 0xc000366000: read PING len=8 ping="\x02\x04\x10\x10\t\x0e\a\a"
2022/10/18 00:12:57 http2: Transport received PING len=8 ping="\x02\x04\x10\x10\t\x0e\a\a"
2022/10/18 00:12:57 http2: Framer 0xc000366000: wrote PING flags=ACK len=8 ping="\x02\x04\x10\x10\t\x0e\a\a"
2022/10/18 00:12:57 http2: Framer 0xc000366000: read HEADERS flags=END_STREAM|END_HEADERS stream=7 len=4
2022/10/18 00:12:57 http2: decoded hpack field header field ":status" = "200"
2022/10/18 00:12:57 http2: decoded hpack field header field "content-type" = "application/grpc"
2022/10/18 00:12:57 http2: decoded hpack field header field "grpc-status" = "5"
2022/10/18 00:12:57 http2: decoded hpack field header field "grpc-message" = "open /tmp/a: no such file or directory"
2022/10/18 00:12:57 http2: Transport received HEADERS flags=END_STREAM|END_HEADERS stream=7 len=4

Example trace with error (internal error raised by grpc-go)

2022/10/18 00:12:55 http2: Transport encoding header ":method" = "POST"
2022/10/18 00:12:55 http2: Transport encoding header ":path" = "/service/Method"
2022/10/18 00:12:55 http2: Transport encoding header ":scheme" = "http"
2022/10/18 00:12:55 http2: Transport encoding header "grpc-timeout" = "998585u"
2022/10/18 00:12:55 http2: Transport encoding header "content-type" = "application/grpc"
2022/10/18 00:12:55 http2: Transport encoding header "user-agent" = "grpc-go/1.49.0"
2022/10/18 00:12:55 http2: Transport encoding header "te" = "trailers"
2022/10/18 00:12:55 http2: Transport encoding header "x-forwarded-for" = "127.0.0.1"
2022/10/18 00:12:55 http2: Transport encoding header "accept-encoding" = "gzip"
2022/10/18 00:12:55 http2: Framer 0xc000366000: wrote HEADERS flags=END_HEADERS stream=5 len=18
2022/10/18 00:12:55 http2: Framer 0xc000366000: wrote DATA stream=5 len=12 data="\x00\x00\x00\x00\a\n\x05\b\x02\x12\x01a"
2022/10/18 00:12:55 http2: Framer 0xc000366000: wrote DATA flags=END_STREAM stream=5 len=0 data=""
2022/10/18 00:12:55 http2: Framer 0xc000366000: read WINDOW_UPDATE len=4 (conn) incr=12
2022/10/18 00:12:55 http2: Transport received WINDOW_UPDATE len=4 (conn) incr=12
2022/10/18 00:12:55 http2: Framer 0xc000366000: read PING len=8 ping="\x02\x04\x10\x10\t\x0e\a\a"
2022/10/18 00:12:55 http2: Transport received PING len=8 ping="\x02\x04\x10\x10\t\x0e\a\a"
2022/10/18 00:12:55 http2: Framer 0xc000366000: wrote PING flags=ACK len=8 ping="\x02\x04\x10\x10\t\x0e\a\a"
2022/10/18 00:12:55 http2: Framer 0xc000366000: read HEADERS flags=END_STREAM|END_HEADERS stream=5 len=4
2022/10/18 00:12:55 http2: decoded hpack field header field ":status" = "200"
2022/10/18 00:12:55 http2: decoded hpack field header field "content-type" = "application/grpc"
2022/10/18 00:12:55 http2: decoded hpack field header field "grpc-status" = "5"
2022/10/18 00:12:55 http2: decoded hpack field header field "grpc-message" = "open /tmp/a: no such file or directory"
2022/10/18 00:12:55 http2: Transport received HEADERS flags=END_STREAM|END_HEADERS stream=5 len=4
2022/10/18 00:12:55 http2: server encoding header ":status" = "200"
2022/10/18 00:12:55 http2: server encoding header "content-type" = "application/grpc"
2022/10/18 00:12:55 http2: server encoding header "grpc-message" = "open /tmp/a: no such file or directory"
2022/10/18 00:12:55 http2: server encoding header "grpc-status" = "5"
2022/10/18 00:12:55 http2: server encoding header "date" = "Tue, 18 Oct 2022 07:12:55 GMT"
2022/10/18 00:12:55 http2: Framer 0xc0003900e0: wrote HEADERS flags=END_HEADERS stream=1 len=89
2022/10/18 00:12:55 http2: Framer 0xc0003900e0: wrote DATA flags=END_STREAM stream=1 len=0 data=""

RCA

I believe this is due to a race caused by concurrent http.Handler execution in http2/server.go.

In the case that handler execution completes before headers are written, rws.handlerDone is true and Go includes END_STREAM in the initial HEADERS frame. In the case that handler execution is still in-progress when the first write occurs, the HEADERS frame is written without END_STREAM, and a subsequent write sends a zero-byte data frame with END_STREAM, acting purely as a control message.

Ultimately this causes non-determinism where the specific scenario that unary gRPC methods that return errors quickly are disproportionately affected.

According to gRPC specification, END_STREAM should be included in the last HEADERS frame to indicate termination of the response. In grpc-go, encountering END_STREAM in a data frame is an explicit error case. However, HTTP/2 protocol specification itself doesn't prohibit this.

Proposal

A similar (identical?) issue was identified in nghttp2 (see: nghttp2/nghttp2#588). The submitted fix was to include END_STREAM in the HEADERS payload if the body is empty and no trailers exist. I'm not sure if a similar approach is feasible in http2.Server.

@dr2chase
Copy link
Contributor

@neild can you have a look at this?

@dr2chase dr2chase added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Oct 19, 2022
@bcmills bcmills added this to the Backlog milestone Oct 20, 2022
@LINKIWI
Copy link
Author

LINKIWI commented Oct 22, 2022

I did a bit more investigation into this issue, and found that the original RCA hypothesis is not entirely correct. This bug seems to be a side effect of FlushInterval: -1 in httputil.ReverseProxy configuration, which causes it to immediately and explicitly flush buffered contents to the client during the copyResponse phase.

This results in the possibility that Flush() can be called before the handler has completed execution, which results in the outcome described in the original issue (HEADERS are written without END_STREAM, requiring a subsequent zero-byte DATA frame to flag END_STREAM).

The solution in my case is to simply omit FlushInterval from httputil.ReverseProxy or set its value to zero.

While the behavior is inconvenient, I think all components are WAI:

  • httputil.ReverseProxy is properly respecting FlushInterval.
  • http2.Server is sending frames to the client in accordance with HTTP/2 specification.
  • grpc-go correctly implements the gRPC-over-HTTP/2 protocol specification, despite the inconvenient directive that zero-byte DATA frames with END_STREAM are illegal.

@neild
Copy link
Contributor

neild commented Dec 14, 2022

To recap:

gRPC wants a response to either consist of either a HEADERS frame, DATA frames, and a final HEADERS frame with END_STREAM set; or just a HEADERS frame with END_STREAM set. (It calls this second case "trailers only", but from an HTTP/2 perspective this is headers only. There are no trailers.)

The Go HTTP/2 server will send initial HEADERS with no END_STREAM set if the headers are flushed before data is written.

When httputil.ReverseProxy.FlushInterval is set to -1, it sets a negative-duration timer that will flush the headers. When proxying a zero-length response, if this timer goes off before the ReverseProxy handler returns, the headers are flushed and the response is then terminated with a zero-length DATA frame with END_STREAM set. gRPC considers this a protocol violation.

The fix is to not set FlushInterval.

As @LINKIWI concludes, while this is an unfortunate interaction, everything is working properly: Flushing headers before the handler returns needs to send HEADERS with no END_STREAM (because the server doesn't yet know if data will be written to the stream), and setting a negative ReverseProxy.FlushInterval is supposed to flush the response immediately.

Perhaps ReverseProxy could detect a zero-length response and skip copying (and flushing) the response.

Otherwise, I don't think there's anything that needs doing here.

@neild neild closed this as completed Dec 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

4 participants