Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pings being sent too often by KeepAliveManager #3274

Closed
stephenh opened this issue Jul 25, 2017 · 7 comments
Closed

Pings being sent too often by KeepAliveManager #3274

stephenh opened this issue Jul 25, 2017 · 7 comments
Assignees
Milestone

Comments

@stephenh
Copy link
Contributor

Please answer these questions before submitting your issue.

What version of gRPC are you using?

1.5.0

What JVM are you using (java -version)?

1.8

What did you do?

  • Enabled keepAliveTime of 20 seconds on the client, permitKeepAlive of 10 seconds on the server
  • Start the server w/logging in KeepAliveEnforcer
  • Start the client w/logging in KeepAliveManager
  • After ~3 pings, KeepAliveManager sends a ping after < 10 seconds, which is marked invalid by KeepAliveEnforcer
  • This happens a few more times and then more calm is sent

What did you expect to see?

  • Pings sent only every 20 seconds

You can checkout github.com/stephenh/mirror, the latest master has the reproducible error.

Here is the output of some logging that I added to KeepAliveManager:


12:37:54 STATE GOING TO PING_SCHEDULED from onTransportActive <-- 1st schedule
12:37:54 SCHEDULING sendPing in onTransportActive, delay=19545661745 (19.5 seconds)
12:38:06 STATE GOING TO PING_DELAYED from onDataReceived
12:38:07 STATE GOING TO IDLE from onTransportIdle

12:38:08 STATE GOING TO PING_SCHEDULED from onTransportActive <-- 2nd schedule
12:38:08 SCHEDULING sendPing in onTransportActive, delay=19630595429 (19.6 seconds)

12:38:13 RUNNING sendPing, state=PING_SCHEDULED <-- from 1st schedule (~20 seconds later)
12:38:13 STATE GOING TO PING_SENT from sendPing <-- 1st ping at :13
12:38:13 STATE GOING TO PING_SCHEDULED from onDataReceived
12:38:13 SCHEDULING sendPing in onDataReceived, delay=20000000000 <-- 3rd schedule

12:38:27 RUNNING sendPing, state=PING_SCHEDULED <-- from 2nd schedule (~20 seconds later)
12:38:27 STATE GOING TO PING_SENT from sendPing <-- 2nd ping at :27

12:38:28 STATE GOING TO PING_SCHEDULED from onDataReceived
12:38:28 SCHEDULING sendPing in onDataReceived, delay=20000000000

12:38:33 RUNNING sendPing, state=PING_SCHEDULED <-- from 3rd schedule
12:38:33 STATE GOING TO PING_SENT from sendPing <-- 3rd ping at :33, < 10 seconds later, marked invalid
12:38:33 STATE GOING TO PING_SCHEDULED from onDataReceived
12:38:33 SCHEDULING sendPing in onDataReceived, delay=2000000000

You can see the 3rd ping is sent only 6 seconds after the 2nd ping.

My guess is that KeepAliveManager is not handling sendPing being scheduled multiple times, and so should check nextKeepaliveTime instead of just the state.

@stephenh
Copy link
Contributor Author

@dapengzhang0 apologies for the "ping" :-), but you've touched KeepAliveManager and so wanted to get this in front of you (that it's sending pings more often than it should, leading to the server marking the pings as invalid, and shutting down the connection).

@stephenh
Copy link
Contributor Author

FWIW in KeepAliveManager onTransportIdle, I made this change:

     if (state == State.PING_SCHEDULED || state == State.PING_DELAYED) {
+      if (pingFuture != null) {
+        pingFuture.cancel(false);
+      }

And I now have appropriately-spaced pings and am not getting goaways.

Disclaimer I didn't write any tests and there could very likely be a better way of handling this (basically either preventing or correctly handling sendPing getting scheduled twice).

@ejona86
Copy link
Member

ejona86 commented Jul 26, 2017

We purposefully don't cancel the pingFuture on transport IDLE. However, that still points us to the problem.

In onTransportActive() we should only schedule pingFuture if sendPing ran from the previous pingFuture. It looks like that means we should set pingFuture = null when it is run or cancelled.

@ejona86
Copy link
Member

ejona86 commented Jul 26, 2017

Assigning to @dapengzhang0, because I've got a lot on my plate and @dapengzhang0 is already familiar with the KeepAliveManager.

@ejona86 ejona86 added this to the Next milestone Jul 27, 2017
stephenh added a commit to stephenh/grpc-java that referenced this issue Jul 31, 2017
If onTransportActive ran while SendPing was already scheduled, we would
schedule another SendPing, which seems fine, but the server might observe
us sending pings too quickly, and make us GOAWAY.

Fixes grpc#3274.
@stephenh
Copy link
Contributor Author

@ejona86 thanks for the feedback! Here's my 2nd attempt, now with a test that was failing pre-fix.

Not sure if I want to open this can of worms, but using a mock for the scheduler is kind of unfortunate, given the default mock behavior of .schedule is to return null, which means a lot of these unit tests were are executing pingFuture = scheduler.schedule are actually getting back null and so not really testing the production code paths, at least now with the more if pingFuture = or != null tests that are added.

In theory setUp could setup a doReturn on the mock scheduler by default, which is probably a good idea...let me know if I should incorporate that.

@dapengzhang0
Copy link
Member

@stephenh thanks for sending the PR. For testing KeepAliveManager, there is a tracking issue #2868.

stephenh added a commit to stephenh/grpc-java that referenced this issue Aug 1, 2017
If onTransportActive ran while SendPing was already scheduled, we would
schedule another SendPing, which seems fine, but the server might observe
us sending pings too quickly, and make us GOAWAY.

Fixes grpc#3274.
stephenh added a commit to stephenh/grpc-java that referenced this issue Aug 1, 2017
If onTransportActive ran while SendPing was already scheduled, we would
schedule another SendPing, which seems fine, but the server might observe
us sending pings too quickly, and make us GOAWAY.

Fixes grpc#3274.
@stephenh
Copy link
Contributor Author

stephenh commented Aug 1, 2017

@dapengzhang0 nice, thanks for the link to #2868. That FakeClock/stub executor looks really spiffy. I've used similar approaches in the past, but not that sophisticated. Would be great to see that approach/initial implementation flushed out and end up in something like Guava.

dapengzhang0 pushed a commit that referenced this issue Aug 1, 2017
If onTransportActive ran while SendPing was already scheduled, we would
schedule another SendPing, which seems fine, but the server might observe
us sending pings too quickly, and make us GOAWAY.

Fixes #3274.
@ejona86 ejona86 modified the milestones: 1.6, Next Aug 2, 2017
@lock lock bot locked and limited conversation to collaborators Sep 21, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants