x/net/http2: client sends too many pings for gRPC servers since v0.31.0 #70575
Comments
cc: @neild @tombergan
The gRPC algorithm is at https://github.com/grpc/proposal/blob/master/A8-client-side-keepalive.md#server-enforcement
Two hours is certainly...a choice. The default limit seems to be five minutes if there are outstanding requests, which is still an eternity. There doesn't seem to be any reasonable and safe way to health check an unresponsive Java (possibly any) gRPC connection. (Well, other than to send junk requests, such as OPTIONS *. So far as I can tell, those are fine, despite being more expensive than cheap PING frames.) Not sure what the correct thing to do here is.
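For reference, an explicit PING-based health check with x/net/http2 looks roughly like the sketch below. This is only an illustration: the address and TLS setup are placeholder assumptions, and the connection is created by hand via Transport.NewClientConn just to reach ClientConn.Ping. It is exactly this kind of frame that the Java-side enforcement discussed here rate-limits.

```go
package main

import (
	"context"
	"crypto/tls"
	"log"
	"time"

	"golang.org/x/net/http2"
)

func main() {
	// Dial the backend ourselves so we hold the *http2.ClientConn directly.
	// "backend.example:443" is a placeholder address; a real setup would also
	// configure certificates appropriately.
	conn, err := tls.Dial("tcp", "backend.example:443", &tls.Config{NextProtos: []string{"h2"}})
	if err != nil {
		log.Fatal(err)
	}
	tr := &http2.Transport{}
	cc, err := tr.NewClientConn(conn)
	if err != nil {
		log.Fatal(err)
	}

	// Health check: send a PING and wait (bounded) for the ACK.
	// Against a gRPC Java server, doing this too often trips
	// ENHANCE_YOUR_CALM / too_many_pings.
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	if err := cc.Ping(ctx); err != nil {
		log.Printf("connection looks unhealthy: %v", err)
	}
}
```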
I used to run into similar issues with other reverse proxies as well.
Some options:
gRPC Java resets its "too many pings" counter every time it sends headers, data, or trailers. You can send as many pings as you want, as long as you wait for the server to send something after the ping response. Perhaps there's some way we can take advantage of this, but I haven't figured out how yet.
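To make that behavior concrete, here is a rough Go sketch of that style of server-side enforcement. The names are made up and this is a paraphrase of the idea, not gRPC Java's actual code:

```go
// Hypothetical sketch of "too many pings" enforcement as described above;
// not the gRPC Java implementation.
package enforcer

type pingEnforcer struct {
	strikes    int
	maxStrikes int // gRPC Java tolerates only a small number of "bad" pings
}

// onFrameSent is called whenever the server sends HEADERS, DATA, or trailers;
// sending anything forgives earlier pings.
func (e *pingEnforcer) onFrameSent() {
	e.strikes = 0
}

// onPing is called when a client PING arrives. It reports whether the server
// should close the connection with ENHANCE_YOUR_CALM / too_many_pings.
func (e *pingEnforcer) onPing() (tooManyPings bool) {
	e.strikes++
	return e.strikes > e.maxStrikes
}
```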
@neild From what I understand, this change is trying to work around a TCP problem where the application isn't gracefully shut down, or there is a network issue, and requests just time out. Maybe it's not the role of the http2 client to overcome this? gRPC Java closed a similar issue: grpc/grpc-java#8380. To me the standard solution is to decrease the TCP retries in the environment (Elasticsearch suggests this). On a side note, Traefik hasn't bumped x/net yet, and Istio hasn't released the bump (but it is in master). Once they release it, this issue might become more important.
Change https://go.dev/cl/632995 mentions this issue:
@RemiBou Linux kernel keepalive configuration (…) I agree that it is unfortunate that the gRPC "algorithm" and implementations like Java's have made such wild choices around ping handling. I don't see a single clearly right path forward, but given the near-universal adoption of gRPC over other RPC frameworks, and given that gRPC isn't likely to go away anytime soon (nor will implementations change their ping handling and users update to new versions), a single hack to recognize …
Compromise fix/workaround: gRPC resets the too-many-pings count every time it sends a HEADERS or DATA frame. (This is, in my opinion, backwards: it should reset the count when it receives a HEADERS/DATA frame.) https://go.dev/cl/632995 limits us to at most one PING sent per HEADERS/DATA frame received. (This does not apply to keepalive pings; if you enable keepalives on a gRPC connection, the peer will probably close it unless you set a very long keepalive interval. That appears to be working as intended on the gRPC side.)
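A simplified sketch of that client-side gating, for illustration only (made-up names, not the code in CL 632995):

```go
// Illustrative sketch of client-side ping gating; not the actual x/net change.
package pinggate

type pingGate struct {
	framesSinceLastPing int
}

// onFrameReceived is called for each HEADERS or DATA frame received from the server.
func (g *pingGate) onFrameReceived() {
	g.framesSinceLastPing++
}

// maySendPing reports whether another PING may be sent now and, if so, records
// the send, so that at most one PING goes out per HEADERS/DATA frame received.
func (g *pingGate) maySendPing() bool {
	if g.framesSinceLastPing == 0 {
		return false
	}
	g.framesSinceLastPing = 0
	return true
}
```

Gating on frames received from the server lines up with the server's reset-on-send accounting: each frame the client sees is a frame the server sent, which is exactly what forgives an earlier ping.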
@neild As far as we can see, your change fixed our problem. Thanks a lot for the quick answer and fixes 👍
Go version
1.23.3

Output of go env in your module/workspace:

What did you do?
We use Traefik as a library for our service mesh.
What did you see happen?
Since upgrading x/net to v0.31.0, we see this error on an instance deployed as a sidecar of a gRPC Java server:
2024/11/26 10:32:41 httputil: ReverseProxy read error during body copy: http2: server sent GOAWAY and closed the connection; LastStreamID=561, ErrCode=ENHANCE_YOUR_CALM, debug="too_many_pings"
Downgrading x/net to v0.30.0 fixes the issue.
I think it might be due to the new ping added here by @neild:
golang/net@f35fec9#diff-e9bd9b4a514c2960ad85405e4a827d83c2decaaac70e9e0158a59db2d05807a7R3149
You can find the framer logs here (captured with http2debug=2):
framerlog.log
This message is returned by gRPC Java here:
https://github.com/grpc/grpc-java/blob/a79982c7fded85ac6f53896f045e959b6cbd7c43/netty/src/main/java/io/grpc/netty/NettyServerHandler.java#L983
which runs this function: https://github.com/grpc/grpc-java/blob/master/core/src/main/java/io/grpc/internal/KeepAliveEnforcer.java#L57
This basically refuses a ping if one was sent less than two hours before.
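For clarity, the acceptance check behind that behavior can be paraphrased roughly as follows. This is a hedged Go sketch with made-up names, not a translation of KeepAliveEnforcer.java; the two-hour and five-minute figures are the defaults mentioned earlier in this thread:

```go
// Rough paraphrase of the keepalive enforcement check; hypothetical Go names.
package enforcer

import "time"

const (
	minPingIntervalIdle   = 2 * time.Hour   // default with no outstanding calls
	minPingIntervalActive = 5 * time.Minute // default with outstanding calls
)

// pingAcceptable reports whether a client PING arriving at now should be
// accepted, given the time of the last "valid" activity and whether any calls
// are currently in flight.
func pingAcceptable(now, lastValidActivity time.Time, hasActiveCalls bool) bool {
	minInterval := minPingIntervalIdle
	if hasActiveCalls {
		minInterval = minPingIntervalActive
	}
	return now.Sub(lastValidActivity) >= minInterval
}
```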
What did you expect to see?
The stream to be kept open and not reset.