x/net/http2: server sends RST_STREAM w/ PROTOCOL_ERROR to clients it incorrectly believes have violated max advertised num streams #42777
cc @fraenkel

I agree with your explanation; however, it is not guaranteed that the above will fix most of the racing that occurs. The problem is that between writeFrame and sending the frameWriteResult, one can receive the readFrame. BTW, this is the reverse of what the client does between the bodyWriter and the readLoop. A testcase would go a long way.

@fraenkel yeah that makes sense - I was thinking along those lines too. The weird thing was that when I made a hacky fix that used a mutex to lock, I still saw some PROTOCOL_ERRORs. But there's also a chance that my hack was wrong, since I'm not as familiar with this part of the code. Did you still need a repro from me, or do you think you have a sufficient understanding of the failure mode here? Happy to try any fixes on our PROD system to see if they fix things too (and I can do a bit of debugging to see what else could be up).

I understand the failure. The trick is coming up with a test case, which may not be so simple.

Quick ping on this - is this issue on the cards any time soon? Still seeing this issue in production.

Change https://golang.org/cl/330909 mentions this issue:
For golang/go#42777

Change-Id: I8d7e35ded3fe508f6c285e84ba768104c3406bdf
Reviewed-on: https://go-review.googlesource.com/c/net/+/330909
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Damien Neil <dneil@google.com>
Trust: Dmitri Shuralyov <dmitshur@golang.org>

@gopherbot please open backport issues for 1.16 and 1.17.

Backport issue(s) opened: #47691 (for 1.16), #47692 (for 1.17). Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases.

For golang/go#42777

Change-Id: I8d7e35ded3fe508f6c285e84ba768104c3406bdf
Reviewed-on: https://go-review.googlesource.com/c/net/+/330909
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Damien Neil <dneil@google.com>
Trust: Dmitri Shuralyov <dmitshur@golang.org>
(cherry picked from commit 60bc85c)
Hi!

Doing backports now. Thanks for the reminder!

Change https://golang.org/cl/346889 mentions this issue:

Change https://golang.org/cl/346890 mentions this issue:
…te before reading the next frame

Updates golang/go#42777
Fixes golang/go#47691
Change-Id: I8d7e35ded3fe508f6c285e84ba768104c3406bdf
Reviewed-on: https://go-review.googlesource.com/c/net/+/330909
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Damien Neil <dneil@google.com>
Trust: Dmitri Shuralyov <dmitshur@golang.org>
(cherry picked from commit 60bc85c)
Reviewed-on: https://go-review.googlesource.com/c/net/+/346889
Trust: Damien Neil <dneil@google.com>
Run-TryBot: Damien Neil <dneil@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>

…te before reading the next frame

Updates golang/go#42777
Fixes golang/go#47692
Change-Id: I8d7e35ded3fe508f6c285e84ba768104c3406bdf
Reviewed-on: https://go-review.googlesource.com/c/net/+/330909
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Damien Neil <dneil@google.com>
Trust: Dmitri Shuralyov <dmitshur@golang.org>
(cherry picked from commit 60bc85c)
Reviewed-on: https://go-review.googlesource.com/c/net/+/346890
Trust: Damien Neil <dneil@google.com>
Run-TryBot: Damien Neil <dneil@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Change https://golang.org/cl/347010 mentions this issue:

For #42777.

Change-Id: I963db8c666e8bcf0fc4f390b359db6408a0f792b
Reviewed-on: https://go-review.googlesource.com/c/go/+/347010
Trust: Damien Neil <dneil@google.com>
Run-TryBot: Damien Neil <dneil@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>

Change https://golang.org/cl/347033 mentions this issue:
If a server sends a stream error of type "protocol error" to a client, that's the server saying "you're speaking http2 wrong". At that point, regardless of whether we're in the right or not (that is, regardless of whether the Transport is bug-free), clearly there's some confusion and one of the two parties is either wrong or confused. There's no point pushing on and trying to use the connection and potentially exacerbating the confusion (as we saw in golang/go#47635).

Instead, make the client "poison" the connection by setting a new "do not reuse" bit on it. Existing streams can finish up but new requests won't pick that connection. Also, mark those requests as retryable, without the caller getting an error.

Given that golang/go#42777 existed, there are HTTP/2 servers in the wild that incorrectly set RST_STREAM PROTOCOL_ERROR codes. But even once those go away, this is still a reasonable fix for preventing a broken connection from being stuck in the connection pool that fails all future requests if a similar bug happens in another HTTP/2 server.

Updates golang/go#47635
Change-Id: I3f89ecd1d3710e49f7219ccb846e016eb269515b
Reviewed-on: https://go-review.googlesource.com/c/net/+/347033
Trust: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Damien Neil <dneil@google.com>
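A minimal sketch of the "do not reuse" idea described in that change. The type and field names below are illustrative only, not the real x/net/http2 internals:

```go
package main

import "fmt"

// clientConn is an illustrative stand-in for a pooled HTTP/2 client
// connection; the real Transport's type and fields differ.
type clientConn struct {
	doNotReuse    bool // set after the server sends a stream PROTOCOL_ERROR
	activeStreams int
}

// canTakeNewRequest shows the idea: a poisoned connection lets its
// in-flight streams finish but is never handed out for new requests.
func (cc *clientConn) canTakeNewRequest() bool {
	return !cc.doNotReuse
}

func main() {
	cc := &clientConn{activeStreams: 3}
	cc.doNotReuse = true // server reported "you're speaking http2 wrong"
	fmt.Println("usable for new requests:", cc.canTakeNewRequest())
	fmt.Println("existing streams still draining:", cc.activeStreams)
}
```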
Change https://golang.org/cl/356978 mentions this issue:

Change https://golang.org/cl/357673 mentions this issue:
… after a stream protocol error

If a server sends a stream error of type "protocol error" to a client, that's the server saying "you're speaking http2 wrong". At that point, regardless of whether we're in the right or not (that is, regardless of whether the Transport is bug-free), clearly there's some confusion and one of the two parties is either wrong or confused. There's no point pushing on and trying to use the connection and potentially exacerbating the confusion (as we saw in golang/go#47635).

Instead, make the client "poison" the connection by setting a new "do not reuse" bit on it. Existing streams can finish up but new requests won't pick that connection. Also, mark those requests as retryable, without the caller getting an error.

Given that golang/go#42777 existed, there are HTTP/2 servers in the wild that incorrectly set RST_STREAM PROTOCOL_ERROR codes. But even once those go away, this is still a reasonable fix for preventing a broken connection from being stuck in the connection pool that fails all future requests if a similar bug happens in another HTTP/2 server.

Updates golang/go#49077
Change-Id: I3f89ecd1d3710e49f7219ccb846e016eb269515b
Reviewed-on: https://go-review.googlesource.com/c/net/+/347033
Trust: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Damien Neil <dneil@google.com>
Reviewed-on: https://go-review.googlesource.com/c/net/+/357673
Trust: Damien Neil <dneil@google.com>
Run-TryBot: Damien Neil <dneil@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>

… after a stream protocol error

If a server sends a stream error of type "protocol error" to a client, that's the server saying "you're speaking http2 wrong". At that point, regardless of whether we're in the right or not (that is, regardless of whether the Transport is bug-free), clearly there's some confusion and one of the two parties is either wrong or confused. There's no point pushing on and trying to use the connection and potentially exacerbating the confusion (as we saw in golang/go#47635).

Instead, make the client "poison" the connection by setting a new "do not reuse" bit on it. Existing streams can finish up but new requests won't pick that connection. Also, mark those requests as retryable, without the caller getting an error.

Given that golang/go#42777 existed, there are HTTP/2 servers in the wild that incorrectly set RST_STREAM PROTOCOL_ERROR codes. But even once those go away, this is still a reasonable fix for preventing a broken connection from being stuck in the connection pool that fails all future requests if a similar bug happens in another HTTP/2 server.

Updates golang/go#49076
Change-Id: I3f89ecd1d3710e49f7219ccb846e016eb269515b
Reviewed-on: https://go-review.googlesource.com/c/net/+/347033
Trust: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Damien Neil <dneil@google.com>
Reviewed-on: https://go-review.googlesource.com/c/net/+/356978
Trust: Damien Neil <dneil@google.com>
Run-TryBot: Damien Neil <dneil@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
What version of Go are you using (`go version`)?

Does this issue reproduce with the latest release?

Yes
What's the issue?

I'm seeing an issue with a grpc-go client talking to an `http2.Server{}`. I've documented the issue in grpc/grpc-go#4057. To summarize the issue:

- A grpc-go client talks to an `http2.Server{}` on the same machine. The client is generating a lot of load on the server. The server is configured to allow a maximum of 320 concurrent requests on a single connection.
- Very occasionally the client sees `RST_STREAM` w/ `PROTOCOL_ERROR` frames from the server.

I put some println statements in the client and server code and isolated the error that the server is returning to be from this line. This code checks `sc.curClientStreams` and rejects the new stream if it believes the client is violating the advertised maximum number of concurrent streams. I added further logging to the server and the client to record the number of active streams and found that this number never creeps above 320 on the client or the server - the grpc-go client believes it is being well behaved, which indicated to me that something is fishy in the `http2.Server` code.

The only place in the `http2.Server` code that decrements `sc.curClientStreams` is here in `sc.closeStream`, which is called from `sc.wroteFrame`, and `sc.wroteFrame` is only ever called in the `sc.serve` loop. Writes to the client can be triggered asynchronously by `sc.writeFrameAsync` (i.e. see here), and once the write is complete the result is sent to `sc.wroteFrameCh`, which is read by the `sc.serve` loop (i.e. see here). The `sc.serve` loop contains a big select statement that reads from `sc.wroteFrameCh`, but also from `sc.readFrameCh`, which receives new frames from clients. Since these channels are read from the same select statement, if both contain a message there is no guarantee which one will be read from first.

I believe what is happening to cause the spurious `PROTOCOL_ERROR`s is the following (a small illustration of the select behaviour follows the list):

1. Something on the server causes `sc.writeFrameAsync` to be spawned in a separate goroutine that writes a frame f1 that ends a stream.
2. `sc.writeFrameAsync` writes the frame f1 to the TCP connection and adds a message to the `sc.wroteFrameCh`.
3. The server's `sc.readFrames` loop reads the next frame from the client, f2, which starts a new stream, and pushes it onto the `sc.readFrameCh` channel.
4. The `sc.serve` loop goes through another iteration, and since both `sc.readFrameCh` and `sc.wroteFrameCh` contain messages, it's undefined which one it processes first. In this case, it processes the message on `sc.readFrameCh` first, which contains frame f2.
5. Because the message on `sc.wroteFrameCh` has not been processed yet, the `sc.curClientStreams` variable has not yet been decremented, and thus the server believes that `sc.curClientStreams` is still set to 320. When `sc.processFrameFromReader` is called, it returns a `PROTOCOL_ERROR`.
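As a standalone illustration of the nondeterminism in step 4 above (this is not the x/net/http2 code itself, just a sketch of Go's select behaviour when two channels are ready):

```go
package main

import "fmt"

func main() {
	// Toy stand-ins for sc.wroteFrameCh and sc.readFrameCh.
	wroteFrameCh := make(chan string, 1)
	readFrameCh := make(chan string, 1)
	wroteFrameCh <- "frame write result for f1 (stream ended)"
	readFrameCh <- "HEADERS frame f2 (client opens a new stream)"

	// When both channels are ready, Go's select chooses a case
	// pseudorandomly, so f2 can be processed before the write result
	// that would have decremented the active-stream count.
	select {
	case res := <-wroteFrameCh:
		fmt.Println("processed write result first:", res)
	case f := <-readFrameCh:
		fmt.Println("processed read frame first:", f)
	}
}
```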
To attempt to fix this, I patched the `http2.Server` code. When I ran this on the server it significantly reduced the occurrences of the `PROTOCOL_ERROR`s I saw: I was seeing 20 every 3 minutes before the patch, and afterwards I observed about 10 over the course of 2 hours.

I believe that this patch solves some of the problem, but not all of it, as the error still persists. Also, I may be totally wrong in my diagnosis above. Does the above reasoning make sense as a possible failure mode?
I'm going to try and get a minimal repro working, but wanted to create this issue first to get folks' thoughts.
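For context, the fix that eventually landed (CL 330909) has the serve loop complete a pending asynchronous write before acting on the next frame read from the client, which addresses exactly the ordering described above. The following is a simplified, self-contained sketch of that idea, with hypothetical names; it is not the actual x/net/http2 source:

```go
package main

import "fmt"

// server is a heavily simplified, hypothetical stand-in for the real
// serverConn bookkeeping; field and method names are illustrative only.
type server struct {
	curClientStreams uint32
	maxClientStreams uint32
	wroteFrameCh     chan struct{} // result of an asynchronous frame write
}

// onClientOpensStream models the check that rejects a new stream when the
// advertised concurrency limit appears to be exceeded.
func (s *server) onClientOpensStream() error {
	// Core idea of the fix: before acting on a frame read from the client,
	// drain any pending write result so that a stream the write just
	// closed is no longer counted against the limit.
	select {
	case <-s.wroteFrameCh:
		s.curClientStreams-- // the written frame ended a stream
	default:
	}
	if s.curClientStreams >= s.maxClientStreams {
		return fmt.Errorf("RST_STREAM with PROTOCOL_ERROR")
	}
	s.curClientStreams++
	return nil
}

func main() {
	s := &server{
		curClientStreams: 320,
		maxClientStreams: 320,
		wroteFrameCh:     make(chan struct{}, 1),
	}
	s.wroteFrameCh <- struct{}{} // an async write just ended one stream
	fmt.Println("error:", s.onClientOpensStream()) // error: <nil>
}
```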