-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
core: fix the race between channel shutdown and clientCallImpl start #3287
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like this PR would be better off only handling when deadlineCancellationExecutor rejects execution and use that to cancel the RPC. That seems simpler, doesn't add an additional race, and improves error handling even when the channel isn't terminated.
I would prefer fixes the race instead of mitigating it; I think fixing #2562 may help us in that regard.
|
||
@Override | ||
public boolean cancel(boolean mayInterruptIfRunning) { | ||
return true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should return false
if the future is already cancelled.
@@ -310,11 +330,52 @@ public void run() { | |||
} | |||
} | |||
|
|||
private ScheduledFuture<?> startDeadlineTimer(Deadline deadline) { | |||
private ScheduledFuture<?> startDeadlineTimer(final Deadline deadline) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like we don't use any part of the ScheduledFuture
interface. Maybe just return Future<?>
and then use a pre-existing Future implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the pre-existing Future implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the very least there's FutureTask
and Futures.immediateCancelledFuture()
.
|
||
@Override | ||
public boolean isDone() { | ||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should return true
if the future is cancelled.
} | ||
} | ||
|
||
return new CancelledFuture(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't feel comfortable silently ignoring the deadline. That seems just asking for undiagnosable run-away RPCs. The graceful way to handle this would be to fail the RPC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The RPC would fail automatically?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The RPC is not guaranteed to fail here. You're hoping that the rejection is related to channel termination. But it could have been that the queue was too long. Or it could be that the user shut down their executor before gRPC terminated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now I got what you concerned.
@@ -149,6 +151,26 @@ public void start(final Listener<RespT> observer, Metadata headers) { | |||
checkNotNull(observer, "observer"); | |||
checkNotNull(headers, "headers"); | |||
|
|||
if (deadlineCancellationExecutor == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not excited about having a very different code path during the race vs after (with this case being after). It makes it all the more difficult to find bugs during the race.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm okay with removing this code path.
@@ -554,6 +555,32 @@ public String authority() { | |||
} | |||
} | |||
|
|||
private final class DirectFallbackExecutor implements Executor { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is quite an interesting idea, although coupled with the ClientCallImpl change, it will cause us to use the executor
after shutdown more than previously. Previously we would never use the executor
after shutdown; only the deadlineCancellationExecutor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems previously we also would use the executor
after shutdown: anything inside ClientCallImpl.start() may be happening after shutdown.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought that with FailingClientTransport it would be directly calling back the application, but I guess it does go through the listener which uses the callExecutor. The usages of callExecutor
in start are rarely triggered.
command.run(); | ||
} | ||
}); | ||
} catch (RejectedExecutionException e) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of relying on the rejection, can we set executor = null
when terminated and then check it here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case, we can check terminated
directly, but still a race.
If we check executor =! null
, and run with it, there could be a NPE because of the race.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NPE is not a concern:
Executor saved = executor;
if (saved == null) {
command.run();
} else {
saved.execute(command);
}
I agree there is still a race between checking executor
and running with it, but it is dramatically smaller and controlled by our code instead of the executor.
In my mind the race is calling executor
after the channel has terminated. None of these solutions actually fix that. They just mitigate it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that none of these solutions actually fix the race. But with the rejection check, the only unresolved race is very very benign as follows:
The executor.execute()
is called and throws a RejectedExecutionException
for whatever unlikely reason before the channel is terminated, and channel is terminated when we check it in the catch block, so we don't throw this exception, and call command.run()
directly.
Other cases are handled gracefully.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I disagree. The rejection check continues to allow calls to executor
. That's the race. executor
may be provided by the application and it's not safe to call after termination. Or at least that's been my assumption.
I could believe the argument that we should always use the executor
, even after termination, based on the theory that the application is expecting us to use executor
(so it may be "magical") and it's the application's own fault for making an RPC after shutdown (not termination). But in that case we mustn't use direct if there was a RejectionExecutionException
. I'm not sure how many options we have, but we'd maybe need to propagate the exception to the caller and we never call their listener.
But this seems to be neither of these. I'm not sure what API theory this change is based on.
@@ -535,7 +536,7 @@ public String authority() { | |||
CallOptions callOptions) { | |||
Executor executor = callOptions.getExecutor(); | |||
if (executor == null) { | |||
executor = ManagedChannelImpl.this.executor; | |||
executor = new DirectFallbackExecutor(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's not create this every time.
c13a434
to
351e384
Compare
0727240
to
63d7d2f
Compare
obsolete |
This PR tries to fix the race described and updated in #1981 by introducing a ReadWriteLock.
Tests will be added if the approach is acceptable.