-
Notifications
You must be signed in to change notification settings - Fork 58
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix "server closed the stream without sending trailers" error #52
Fix "server closed the stream without sending trailers" error #52
Conversation
9319a0f
to
2814519
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome find! My guess is that there's a race condition somewhere within the ReverseProxy/Transport code making this reproducible only sometimes. We should probably report this upstream.
This could be somewhat related to golang/go#34413. Maybe the backend closes the stream after sending an error. I'm not sure if I should invest more time into a repro scenario. Would be cool to fix something in golang though :) |
Lint fails:
|
@@ -329,11 +331,42 @@ func sendDirectResponse(w http.ResponseWriter, r *http.Request, | |||
// so we can use that. | |||
switch { | |||
case strings.HasPrefix(r.Header.Get(hdrContentType), hdrTypeGrpc): | |||
w.Header().Set("Grpc-Status", strconv.Itoa(int(codes.Internal))) | |||
w.Header().Set("Grpc-Message", errInfo) | |||
w.Header().Set(hdrGrpcStatus, strconv.Itoa(int(codes.Internal))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any link handy to allow me to better understand whcih headers we're setting here and why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not a very good doc, but that's the best I could find: https://github.com/grpc/grpc/blob/master/doc/PROTOCOL-HTTP2.md
The Grpc-Status
is the field that holds the numerical value for the gRPC specific states (e.g. OK
, INTERNAL
, UNKNOWN
, UNIMPLEMENTED
, UNAVAILABLE
and so on). The Grpc-Message
field contains the actual error message in case the gRPC status isn't OK
(0). Those two fields should be sent as a HTTP/2 trailer field and not header field. That's what we tell the golang http2
library by setting the Trailer
field with the name of those two fields.
The Content-Length
and :status
are probably not necessary as they would be set anyway by the proxy. Maybe I should remove them as they are from trying out stuff.
|
||
resp, err := l.next.RoundTrip(req) | ||
if resp != nil && len(resp.Trailer) == 0 { | ||
if len(resp.Header.Values(hdrGrpcStatus)) > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could use a comment stating that the OK
code is zero, so anything beyond that is actually considered an error: https://pkg.go.dev/google.golang.org/grpc/codes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I'll rewrite this to make it a bit more clear. We don't look at the actual value. This just checks "if there is no Trailer
(as in HTTP/2 trailer data field) set but we have a Grpc-Status
Header
field, then, per gRPC convention, we need to set the same value in the Trailer
part of the message". This is the actual fix as the client complains that there are no trailers set when the proxy closes the connection.
@@ -195,6 +213,17 @@ func TestProxyGRPC(t *testing.T) { | |||
runGRPCTest(t, tc) | |||
}) | |||
} | |||
|
|||
for i := 0; i < 20; i++ { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the goal w/ this added iteration of the test case? Given that the iteration index (or no other new information) is actually propagated into the test context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error is only hit every 3 or 4 tries. That's why suspect a race condition in the underlying code. Without the fix we'd never get 20 successful runs, so this just "brute forces" the test to make sure it doesn't flake anymore.
if !haveNPN { | ||
tlsConf.NextProtos = append(tlsConf.NextProtos, "h2") | ||
} | ||
tlsConf.NextProtos = append(tlsConf.NextProtos, "h2-14") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this serve to resolve those issues we observed when we attempted to expose Aperture directly over the load balancer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, maybe, yes. I think we should look into that again.
rpc SayHello (HelloRequest) returns (HelloReply) {} | ||
rpc SayHelloNoAuth (HelloRequest) returns (HelloReply) {} | ||
rpc SayHello (HelloRequest) returns (HelloReply); | ||
rpc SayHelloNoAuth (HelloRequest) returns (HelloReply); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Woah, new proto syntax change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really. You can use the {}
notation if you add additional metadata (for example the REST options) or use a semicolon directly if there is no "body".
Fixed the linter by bumping the golang versions. |
Argh, now there's a data race in the built in library... EDIT: All fixed now. |
636b55d
to
e583f54
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 🌛
This PR adds a test to reproduce the elusive
server closed the stream without sending trailers
error that we first discovered through user feedback in lightninglabs/pool#225.Apparently this is a bug in the
net/http/httputil
package itself, so we can not really fix it but add a workaround.The tl;dr is that sometimes, the HTTP/2 trailer fields aren't sent back to the client correctly when an error occurs on the proxied connection. We add our own HTTP transport implementation that fixes the problem.