Contributor

@kssumin kssumin commented Nov 12, 2025

Problem

When bidirectional keep-alive is enabled (both client and server sending keep-alive pings), the server incorrectly terminates connections with a GO_AWAY frame even when the client is compliant with the server's keep-alive policies. This occurs because the KeepAliveEnforcer validates all incoming ping frames—both client-initiated ping requests and ping acknowledgments (ACKs) sent in response to server-initiated pings.

In a bidirectional keep-alive scenario, the client sends ACKs back to the server for each server-initiated ping. These ACKs are legitimate protocol responses, not abusive client behavior. However, the enforcer runs each ACK through the same check as a client-initiated ping, so ACKs that arrive more often than the permitted keep-alive interval count as strikes. The strike counter increments until MAX_PING_STRIKES is exceeded, and the server then shuts the connection down with ENHANCE_YOUR_CALM.

This breaks bidirectional keep-alive entirely—a legitimate use case for detecting network failures from both endpoints.
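
The strike accumulation described above can be sketched with a small model. This is a simplified illustration, not the grpc-java KeepAliveEnforcer: the class name EnforcerModel, the use of seconds as the time unit, the choice to treat connection start as time 0, and the strike limit passed by the caller are all assumptions made for the example.

```java
/**
 * Simplified, hypothetical model of a keep-alive enforcer.
 * It is NOT the grpc-java KeepAliveEnforcer; names and parameters are assumptions.
 */
final class EnforcerModel {
  private final long permitSeconds;   // minimum allowed gap between counted frames
  private final int maxStrikes;       // strikes tolerated before the connection is closed
  private long lastValidPingSeconds;  // connection start is treated as time 0
  private int strikes;

  EnforcerModel(long permitSeconds, int maxStrikes) {
    this.permitSeconds = permitSeconds;
    this.maxStrikes = maxStrikes;
  }

  /** Returns false once the number of too-early frames exceeds maxStrikes. */
  boolean pingAcceptable(long nowSeconds) {
    if (nowSeconds - lastValidPingSeconds >= permitSeconds) {
      lastValidPingSeconds = nowSeconds;
      return true;
    }
    strikes++;
    return strikes <= maxStrikes;
  }

  /**
   * Feeds ACKs that arrive every ackIntervalSeconds into the enforcer, the way
   * a server that validates every ping frame would, and returns the 1-based
   * index of the first rejected ACK, or -1 if none of the first maxAcks is rejected.
   */
  static int ackThatTriggersGoAway(
      long ackIntervalSeconds, long permitSeconds, int maxStrikes, int maxAcks) {
    EnforcerModel enforcer = new EnforcerModel(permitSeconds, maxStrikes);
    for (int i = 1; i <= maxAcks; i++) {
      if (!enforcer.pingAcceptable(i * ackIntervalSeconds)) {
        return i;
      }
    }
    return -1;
  }
}
```

In this model, a server that pings every 10 seconds while permitting client pings only every 300 seconds rejects the third ACK when the strike limit is 2, even though the client never sent a ping of its own.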

Solution

The keep-alive enforcement logic now only validates client-initiated ping requests (ack=false), not ping acknowledgments (ack=true). This allows servers and clients to exchange keep-alive pings independently without either side's pings counting against the other's enforcement limits.

The fix moves the keepAliveEnforcer.pingAcceptable() check inside the !ack conditional block in okhttp/src/main/java/io/grpc/okhttp/OkHttpServerTransport.java:954, ensuring ping ACKs bypass validation entirely.
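
The change in control flow might look roughly like the sketch below. It is not the code from OkHttpServerTransport.java: the class PingFlowSketch, the CountingEnforcer stand-in, and the returned strings are hypothetical names chosen only to show where the check sits relative to the !ack branch.

```java
/** Hypothetical sketch of the ping-handling control flow; not the grpc-java source. */
final class PingFlowSketch {

  /** Stand-in for the enforcer: it only records how often it is consulted. */
  static final class CountingEnforcer {
    int checks;

    boolean pingAcceptable() {
      checks++;
      return true;
    }
  }

  /** Pre-fix shape: every ping frame, ACK or not, is validated. */
  static String handlePingBefore(boolean ack, CountingEnforcer enforcer) {
    if (!enforcer.pingAcceptable()) {
      return "GO_AWAY";
    }
    if (!ack) {
      return "SEND_ACK";
    }
    return "ACK_RECEIVED";
  }

  /** Post-fix shape: only client-initiated pings (ack == false) are validated. */
  static String handlePingAfter(boolean ack, CountingEnforcer enforcer) {
    if (!ack) {
      if (!enforcer.pingAcceptable()) {
        return "GO_AWAY";
      }
      return "SEND_ACK";
    }
    return "ACK_RECEIVED";
  }
}
```

With the check inside the !ack branch, an incoming ACK never reaches the enforcer, so it cannot add a strike.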

Impact

This fix enables bidirectional keep-alive to work correctly. Clients can now respond to server pings without triggering spurious connection shutdowns. Server-side keep-alive enforcement continues to protect against abusive client behavior for actual ping requests.

No breaking changes. This corrects existing behavior to match the expected semantics of the HTTP/2 ping mechanism.

Fixes #12417

When bidirectional keep-alive is enabled (both client and server
sending keep-alive pings), the server incorrectly validates ping
acknowledgments (ACKs) sent in response to server-initiated pings.
This causes the KeepAliveEnforcer strike counter to increment for
legitimate protocol responses, eventually triggering a GO_AWAY with
ENHANCE_YOUR_CALM.

Move the keepAliveEnforcer.pingAcceptable() check inside the !ack
conditional block, so only client-initiated ping requests are
validated. Ping ACKs now bypass enforcement entirely, enabling
bidirectional keep-alive to work correctly.

Fixes grpc#12417

linux-foundation-easycla bot commented Nov 12, 2025

CLA Signed
The committers listed above are authorized under a signed CLA.

  • ✅ login: kssumin / name: mando (13e63bf)

Member

ejona86 commented Nov 12, 2025

> This breaks bidirectional keep-alive entirely

A workaround would be to configure serverBuilder.permitKeepAliveTime() to be no larger than serverBuilder.keepAliveTime().
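
As a configuration fragment, the workaround could look something like the following. It assumes the grpc-java ServerBuilder API is on the classpath; the class name KeepAliveWorkaround and the 60 second and 30 second values are illustrative choices, not recommendations from this thread. The idea is that ACKs arrive about once per keepAliveTime, so a permit interval no larger than that should keep them from counting as strikes on versions without the fix.

```java
import io.grpc.ServerBuilder;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper showing the workaround; the values are placeholders. */
final class KeepAliveWorkaround {

  /**
   * Configures the builder so that the permitted ping interval is no larger
   * than the interval at which the server itself sends pings.
   */
  static <T extends ServerBuilder<T>> T apply(T serverBuilder) {
    return serverBuilder
        .keepAliveTime(60, TimeUnit.SECONDS)         // server sends a ping every 60s
        .permitKeepAliveTime(30, TimeUnit.SECONDS);  // frames 30s or more apart are permitted
  }
}
```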

Member

@ejona86 ejona86 left a comment

I'll note Netty has the same behavior as with this fix.

The Netty ping handler (at the line "if (!keepAliveEnforcer.pingAcceptable()) {") calls pingAcceptable(), and the onPingAckRead() method does not.

@ejona86 ejona86 added the kokoro:run label (Add this label to a PR to tell Kokoro the code is safe and tests can be run) Nov 12, 2025
@grpc-kokoro grpc-kokoro removed the kokoro:run label Nov 12, 2025
@ejona86 ejona86 added the TODO:backport label (PR needs to be backported. Removed after backport complete) Nov 12, 2025
@ejona86 ejona86 merged commit 8fa2c2c into grpc:master Nov 12, 2025
15 of 17 checks passed
@ejona86 ejona86 removed the TODO:backport label Nov 13, 2025

Development

Successfully merging this pull request may close these issues.

Unexpected GO_AWAY with bi-directional keep-alive

3 participants