Does this issue reproduce with the latest release?
What operating system and processor architecture are you using (go env)?
What did you do?
We have a proxy server in Go that proxies over HTTP/2 to a third-party server. We're seeing that the third-party server sometimes runs out of flow control tokens and can't send any more data back to the client. There's no clear reproduction path, but it seems to happen when the proxy is under heavy load and seeing an increase in canceled requests due to higher latency.
Looking for potential causes in the http2 package, I noticed that in the case where the transport reads a data frame for a stream that was already canceled, it returns the flow control tokens, but it also increments the flow control counter for the connection:
cc.inflow.add is increasing the flow control window, and the subsequent lines send the increase to the server in a WINDOW_UPDATE frame. There might be a flow control bug here somewhere, but I don't think these lines are it unless I'm misreading something.
Ah, you're right. That does look wrong; we're returning flow control tokens to cc.inflow that were never taken. (I thought we consumed the connection-level flow earlier, but we don't; connection and stream flow are consumed at the same time.) Over time, that'll lead to the transport thinking the server has more tokens than it does, and it'll stop sending window updates.
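To make the drift concrete, here is a minimal simulation of the bookkeeping, not the http2 package's code. The `window` type, its fields, and `discardCanceled` are hypothetical names invented for this sketch. It assumes the receiver returns every discarded byte to the peer immediately, and compares doing so with and without first recording that the peer consumed those bytes.

```go
package main

import "fmt"

// window models connection-level inbound flow control for this sketch.
// Both fields are hypothetical; they are not names from the http2 package.
type window struct {
	recorded int32 // what the receiver believes the peer may still send
	actual   int32 // what the peer may really still send
}

// discardCanceled processes n bytes of data for a stream that was already
// canceled. With takeFirst set, the receiver records the peer's consumption
// before returning the tokens; without it, the tokens are returned without
// ever having been taken.
func (w *window) discardCanceled(n int32, takeFirst bool) {
	// The peer spent n tokens to send the frame.
	w.actual -= n
	if takeFirst {
		// Account for the tokens the peer consumed.
		w.recorded -= n
	}
	// Return the tokens: bump the local record and notify the peer
	// (the WINDOW_UPDATE frame), which restores the peer's real window.
	w.recorded += n
	w.actual += n
}

// drift reports how far the receiver's record has run ahead of reality.
func (w *window) drift() int32 { return w.recorded - w.actual }

func main() {
	withoutTake := window{recorded: 1000, actual: 1000}
	withTake := window{recorded: 1000, actual: 1000}
	for i := 0; i < 5; i++ {
		withoutTake.discardCanceled(100, false)
		withTake.discardCanceled(100, true)
	}
	fmt.Println(withoutTake.drift(), withTake.drift()) // prints "500 0"
}
```

In this model the peer's real window is unchanged by each discarded frame, but the receiver's record grows by the frame length every time. Any refill rule based on the recorded value would therefore fire later and later as canceled streams accumulate.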
Add a new inflow type for tracking inbound flow control.
An inflow tracks both the window sent to the peer, and the
window we are willing to send. Updates are accumulated and
sent in a batch when the unsent window update is large enough.
Large enough is currently defined as at least doubling
the peer's current window.
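The batching rule can be sketched roughly as follows. This is a simplified illustration rather than the type added by the change: `inflowSketch` and its field names are hypothetical, overflow checks are omitted, and "large enough" is modeled only as the pending update being at least as big as the window the peer currently has.

```go
package main

import "fmt"

// inflowSketch is a hypothetical, simplified stand-in for the inflow type
// described above. It ignores overflow of the 31-bit window.
type inflowSketch struct {
	avail  int32 // window already advertised to the peer
	unsent int32 // window we are willing to grant but have not sent yet
}

// take records that the peer consumed n bytes of the advertised window.
// It reports false if the peer sent more than it was allowed to.
func (f *inflowSketch) take(n int32) bool {
	if n < 0 || n > f.avail {
		return false
	}
	f.avail -= n
	return true
}

// add makes n more bytes of window available to grant. It returns the size
// of the window update to send now, or 0 if the update stays batched.
func (f *inflowSketch) add(n int32) int32 {
	f.unsent += n
	// Hold the update until it would at least double the peer's window.
	if f.unsent < f.avail {
		return 0
	}
	send := f.unsent
	f.avail += send
	f.unsent = 0
	return send
}

func main() {
	f := inflowSketch{avail: 1000}

	// The peer sends 400 bytes, leaving it a window of 600.
	f.take(400)
	// 400 pending is less than 600, so nothing is sent yet.
	fmt.Println(f.add(400)) // prints 0

	// The peer sends 500 more bytes, leaving it a window of 100.
	f.take(500)
	// 900 pending is at least 100, so the whole batch is sent.
	fmt.Println(f.add(500)) // prints 900
}
```

Keeping the advertised and unsent amounts in one value means every path that returns window has to go through the same accounting, which is the property the canceled-stream case lacked.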
This change makes both the client and server use the same
algorithm to decide when to send window updates. This should
slightly reduce the rate of updates sent by the client, and
significantly reduce the rate sent by the server.
Fix a client flow control tracking bug: When processing data
for a canceled stream, the record of flow control consumed
by the peer was not updated to account for the discard