
x/net/http2: client sends too many pings for gRPC servers since v0.31.0 #70575

Closed
RemiBou opened this issue Nov 26, 2024 · 11 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.

Comments

@RemiBou

RemiBou commented Nov 26, 2024

Go version

1.23.3

Output of go env in your module/workspace:

GO111MODULE=''
GOARCH='arm64'
GOBIN=''
GOCACHE='/root/.cache/go-build'
GOENV='/root/.config/go/env'
GOEXE=''
GOEXPERIMENT='boringcrypto'
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/root/.gvm/pkgsets/go1.23.3/global/pkg/mod'
GONOPROXY='none'
GONOSUMDB='...'
GOOS='linux'
GOPATH='/root/.gvm/pkgsets/go1.23.3/global'
GOPRIVATE='...'
GOPROXY='...'
GOROOT='/root/.gvm/gos/go1.23.3'
GOSUMDB='off'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/root/.gvm/gos/go1.23.3/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.23.3'
GODEBUG=''
GOTELEMETRY='local'
GOTELEMETRYDIR='/root/.config/go/telemetry'
GCCGO='gccgo'
GOARM64='v8.0'
AR='ar'
CC='.../aarch64-linux-musl-cross/bin/aarch64-linux-musl-gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='.../go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build1336983329=/tmp/go-build -gno-record-gcc-switches'

What did you do?

We use traefik as a library for our service mesh.

What did you see happen?

Since upgrading x/net to 0.31.0, we have this error appearing on our instance, which is deployed as a sidecar of a gRPC Java server:

2024/11/26 10:32:41 httputil: ReverseProxy read error during body copy: http2: server sent GOAWAY and closed the connection; LastStreamID=561, ErrCode=ENHANCE_YOUR_CALM, debug="too_many_pings"

Downgrading x/net to 0.30.0 fixes the issue.

I think it might be due to the new ping added here:

golang/net@f35fec9#diff-e9bd9b4a514c2960ad85405e4a827d83c2decaaac70e9e0158a59db2d05807a7R3149 by @neild

You can find the framer logs here (http2debug=2)
framerlog.log

This message is returned by gRPC Java here:

https://github.com/grpc/grpc-java/blob/a79982c7fded85ac6f53896f045e959b6cbd7c43/netty/src/main/java/io/grpc/netty/NettyServerHandler.java#L983

which runs this function: https://github.com/grpc/grpc-java/blob/master/core/src/main/java/io/grpc/internal/KeepAliveEnforcer.java#L57

This basically refuses a ping if one was sent less than two hours before.
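
For context, here is a rough Go paraphrase of that enforcement logic, based on reading the linked Java source. The names, defaults, and strike handling are approximations for illustration, not the real grpc-java code:

```go
package main

import (
	"fmt"
	"time"
)

// keepAliveEnforcer approximates grpc-java's KeepAliveEnforcer: a ping only counts
// as "valid" if enough time has passed since the last valid ping, and the window
// is a generous 2 hours when keepalives haven't been explicitly permitted.
type keepAliveEnforcer struct {
	permitWithoutCalls  bool
	hasOutstandingCalls bool
	minTime             time.Duration // configured minimum ping interval (default ~5 minutes)
	lastValidPing       time.Time
	pingStrikes         int
}

const implicitPermitTime = 2 * time.Hour

// pingAcceptable reports whether an incoming PING is tolerated. Too many
// unacceptable pings lead to GOAWAY/ENHANCE_YOUR_CALM with "too_many_pings".
func (e *keepAliveEnforcer) pingAcceptable(now time.Time) bool {
	window := e.minTime
	if !e.hasOutstandingCalls && !e.permitWithoutCalls {
		window = implicitPermitTime
	}
	if now.Sub(e.lastValidPing) >= window {
		e.lastValidPing = now
		return true
	}
	e.pingStrikes++
	return false
}

// resetCounters is called when the server sends HEADERS/DATA/trailers;
// this reset behavior is what the later comments in this thread rely on.
func (e *keepAliveEnforcer) resetCounters(now time.Time) {
	e.pingStrikes = 0
	e.lastValidPing = now
}

func main() {
	start := time.Now()
	e := &keepAliveEnforcer{minTime: 5 * time.Minute, lastValidPing: start}
	fmt.Println(e.pingAcceptable(start.Add(time.Minute)))   // false: inside the 2h implicit window
	fmt.Println(e.pingAcceptable(start.Add(3 * time.Hour))) // true: window elapsed
}
```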

What did you expect to see?

The stream to be kept open and not reset.

@GoVeronicaGo

cc: @neild @tombergan

@GoVeronicaGo GoVeronicaGo added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Nov 26, 2024
@seankhliao seankhliao changed the title net/http2: Too many ping send since x/net 0.31 x/net/http2: client gets too_many_pings errors since v0.31.0 Nov 26, 2024
@seankhliao seankhliao changed the title x/net/http2: client gets too_many_pings errors since v0.31.0 x/net/http2: client sends too many pings for gRPC servers since v0.31.0 Nov 26, 2024
@neild
Contributor

neild commented Nov 26, 2024

Two hours is certainly...a choice.

The default limit seems to be five minutes if there are outstanding requests, which is still an eternity. There doesn't seem to be any reasonable and safe way to health check an unresponsive Java (possibly any) gRPC connection. (Well, other than to send junk requests, such as OPTIONS *. So far as I can tell, those are fine, despite being more expensive than cheap PING frames.)

Not sure what the correct thing to do here is.

@seankhliao
Member

I used to run into similar issues with other reverse proxies as well.
In the past I've disabled enforcement on the server side; in Go that's https://pkg.go.dev/google.golang.org/grpc@v1.68.0/keepalive#EnforcementPolicy (a minimal sketch follows below).
I think that may be a reasonable thing to do if gRPC no longer has a direct mapping of the underlying streams?
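
For reference, a minimal sketch of relaxing that enforcement on a gRPC-Go server; the values are purely illustrative, and the server in this issue is gRPC Java, which exposes similar knobs on its server builder:

```go
package main

import (
	"log"
	"net"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/keepalive"
)

func main() {
	lis, err := net.Listen("tcp", ":50051")
	if err != nil {
		log.Fatal(err)
	}
	// Relax server-side ping enforcement so a proxy's PINGs are not answered
	// with GOAWAY/ENHANCE_YOUR_CALM ("too_many_pings").
	srv := grpc.NewServer(
		grpc.KeepaliveEnforcementPolicy(keepalive.EnforcementPolicy{
			MinTime:             10 * time.Second, // minimum interval between client pings (illustrative)
			PermitWithoutStream: true,             // tolerate pings even with no active RPCs
		}),
	)
	// ...register services here...
	log.Fatal(srv.Serve(lis))
}
```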

@neild
Contributor

neild commented Nov 26, 2024

Some options:

  1. Roll back https://go.dev/cl/617655. I dislike this, because I think that change fixes real problems.
  2. Add a GODEBUG to disable sending PINGs other than when explicitly configured to do so. Inconvenient for users, but something we can do today.
  3. Add a knob controlling whether to send non-ReadIdleTimeout PINGs. (DisablePingFrames?) Nobody likes more knobs, and this would need to go through the proposal process. Maybe something we should do, but not an immediate fix. (The existing ReadIdleTimeout/PingTimeout knobs are sketched right after this list.)
  4. Detect the ENHANCE_YOUR_CALM/"too_many_pings" closure and remember not to send PINGs on connections to that authority. Fiddly. One connection still needs to drop before we stop pinging. Can't say I like this.
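
For context, the explicitly configured pings referenced in options 2 and 3 are the health-check pings a caller opts into via the x/net/http2 Transport's ReadIdleTimeout and PingTimeout fields. A minimal sketch, with illustrative values only:

```go
package main

import (
	"crypto/tls"
	"net/http"
	"time"

	"golang.org/x/net/http2"
)

func main() {
	t := &http2.Transport{
		TLSClientConfig: &tls.Config{MinVersion: tls.VersionTLS12},
		// Health-check PINGs the caller opts into explicitly:
		ReadIdleTimeout: 30 * time.Second, // send a PING if no frames are read for 30s
		PingTimeout:     15 * time.Second, // drop the connection if the PING goes unanswered
	}
	client := &http.Client{Transport: t}
	_ = client // use as usual; note a gRPC server may still reject frequent PINGs
}
```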

gRPC Java resets its "too many pings" counter every time it sends headers, data, or trailers. You can send as many pings as you want, as long as you wait for the server to send something after the ping response. Perhaps there's some way we can take advantage of this, but I haven't figured out how yet.

@RemiBou
Author

RemiBou commented Nov 26, 2024

@neild from what I understood, this change is trying to work around an issue with TCP when the application isn't gracefully shut down, or when there is a network issue and requests are just timing out.

Maybe it's not the role of the http2 client to overcome this? gRPC Java closed a similar problem: grpc/grpc-java#8380. To me the standard solution is to decrease the TCP retries in the environment (Elasticsearch suggests this).

On a side note, Traefik hasn't bumped x/net yet, and Istio hasn't released the bump (but it is in master). Once they release it, this issue might become more important.

@gopherbot
Contributor

Change https://go.dev/cl/632995 mentions this issue: http2: limit number of PINGs bundled with RST_STREAMs

@bobby-stripe

bobby-stripe commented Dec 2, 2024

@RemiBou Linux kernel keepalive configuration (net.ipv4.tcp_keepalive_time, net.ipv4.tcp_retries2 and friends) is typically tuned to take effect in the tens-of-seconds to hundreds-of-seconds range; Go's http2 is in a much better position to detect and avoid bad connections within milliseconds to seconds. I'd personally love to have this as a default.

I agree that it is unfortunate that the gRPC "algorithm" and implementations like Java have made such wild choices around ping handling. I don't see a single clearly right path forward, but given the near-universal adoption of gRPC over other RPC frameworks, and given that gRPC isn't likely going away anytime soon (nor will implementations change their ping handling or users update to new versions), a single hack to recognize ENHANCE_YOUR_CALM responses feels reasonable!

@neild
Contributor

neild commented Dec 2, 2024

Compromise fix/workaround: gRPC resets the too-many-pings count every time it sends a HEADERS or DATA frame. (This is, in my opinion, backwards: it should reset the count when it receives a HEADERS/DATA frame.) https://go.dev/cl/632995 limits us to at most one PING sent per HEADERS/DATA frame received. (This does not apply to keepalive pings: if you enable keepalives on a gRPC connection, the peer will probably close it unless you set a very long keepalive interval. That appears to be working as intended on the gRPC side.)
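
In rough terms, the gating described above amounts to something like the following. This is a simplified sketch for illustration, not the actual x/net/http2 change:

```go
package main

import "fmt"

// pingGate enforces "at most one bundled PING per HEADERS/DATA frame received":
// receiving a frame grants one unit of "ping credit", and sending a bundled
// PING consumes it.
type pingGate struct {
	sawFrameSinceLastPing bool
}

// onHeadersOrData is called whenever a HEADERS or DATA frame arrives from the peer.
func (g *pingGate) onHeadersOrData() { g.sawFrameSinceLastPing = true }

// mayBundlePing reports whether a PING may be bundled with the next RST_STREAM.
func (g *pingGate) mayBundlePing() bool {
	if !g.sawFrameSinceLastPing {
		return false
	}
	g.sawFrameSinceLastPing = false
	return true
}

func main() {
	var g pingGate
	g.onHeadersOrData()
	fmt.Println(g.mayBundlePing()) // true: one frame received since the last PING
	fmt.Println(g.mayBundlePing()) // false: credit already spent
}
```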

@RemiBou
Author

RemiBou commented Dec 9, 2024

@neild as far as we can see, your change fixed our problem. Thanks a lot for the quick answer and fixes 👍
