core: fix ConnectivityStateManager is already disabled bug #3288

dapengzhang0 · 2017-07-27T21:31:42Z

Fix a bug found by an internal user:

W/ChannelExecutor: Runnable threw exception in ChannelExecutor
java.lang.IllegalStateException: ConnectivityStateManager is already disabled
    at com.google.common.base.Preconditions.checkState(Preconditions.java:459)
    at io.grpc.internal.ConnectivityStateManager.gotoState(ConnectivityStateManager.java:66)
    at io.grpc.internal.ManagedChannelImpl$5.run(ManagedChannelImpl.java:476)
    at io.grpc.internal.ChannelExecutor.drain(ChannelExecutor.java:72)
    at io.grpc.internal.DelayedClientTransport.shutdown(DelayedClientTransport.java:216)
    at io.grpc.internal.ManagedChannelImpl.shutdown(ManagedChannelImpl.java:480)
    at io.grpc.internal.ManagedChannelImpl.shutdown(ManagedChannelImpl.java:69)

dapengzhang0 · 2017-07-27T21:42:56Z

Jenkins, retest this please

zhangkun83 · 2017-07-27T22:18:39Z

core/src/main/java/io/grpc/internal/ConnectivityStateManager.java

+      // When ConnectivityStateManager is already disabled, then channel shutdown is called.
+      // Keep state being null.
+      return;
+    }


ConnectivityStateManager should not discriminate states based on the assumption of who calls gotoState(), because the manager may be used in other cases where the assumption may not hold.
I think the issue is that ManagedChannelImpl.shutdown() calls gotoState() when the manager is already disabled. To fix this immediate issue, we can add an isDisabled() to the manager and avoid calling gotoState() if it returns true.
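
A minimal sketch of the caller-side guard suggested above. The classes here are simplified stand-ins invented for illustration, not the real io.grpc.internal.ConnectivityStateManager or ManagedChannelImpl; isDisabled() is the hypothetical accessor from the suggestion, and representing "disabled" as a null state is an assumption taken from the diff comment.

```java
final class IsDisabledSketch {
  enum State { IDLE, CONNECTING, READY, TRANSIENT_FAILURE, SHUTDOWN }

  // Simplified stand-in for the state manager (assumption: null means "disabled").
  static final class StateManager {
    State state = State.IDLE;

    void gotoState(State newState) {
      if (newState == null) {
        throw new NullPointerException("newState");
      }
      if (state == null) {
        // Same message as in the reported stack trace.
        throw new IllegalStateException("ConnectivityStateManager is already disabled");
      }
      state = newState;
    }

    void disable() {
      state = null;
    }

    // Hypothetical accessor proposed in the review comment.
    boolean isDisabled() {
      return state == null;
    }

    State getState() {
      return state;
    }
  }

  // Simplified stand-in for the channel: the guard lives in the caller,
  // so the manager itself makes no assumption about who calls gotoState().
  static final class Channel {
    final StateManager stateManager = new StateManager();

    void shutdown() {
      if (!stateManager.isDisabled()) {
        stateManager.gotoState(State.SHUTDOWN);
      }
    }
  }

  public static void main(String[] args) {
    Channel channel = new Channel();
    channel.stateManager.disable();
    channel.shutdown(); // no IllegalStateException thanks to the guard
    System.out.println("disabled=" + channel.stateManager.isDisabled());
  }
}
```

With the guard in the caller, gotoState() keeps its strict precondition for every other user of the manager.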

zhangkun83 · 2017-07-28T00:34:57Z

core/src/test/java/io/grpc/internal/ManagedChannelImplTest.java

+    createChannel(new FakeNameResolverFactory(false), NO_INTERCEPTOR);
+    assertEquals(ConnectivityState.IDLE, channel.getState(false));
+    helper.updatePicker(mockPicker);
+


Because gotoState() is called in ChannelExecutor, which catches all exceptions, this shutdown() will not throw even without the fix.
Perhaps we can add an assert false in ChannelExecutor's exception handling block to actually break the test (assuming all tests run with assertions enabled).

dapengzhang0 · 2017-07-28

Applications may also turn on -ea; then the entire application may fail if any runtime exception is thrown in one of ChannelExecutor's runnables.

dapengzhang0 · 2017-07-28

What if we run unit tests with a -Dtestgrpc=true argument, and in ChannelExecutor's exception handling block we check whether testgrpc is true and, if it is, assert false?
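
The property-gated idea can be sketched roughly as follows. This is not the real ChannelExecutor: the Executor class below is a simplified single-threaded stand-in, the testgrpc property name comes from the comment above, and the sketch throws AssertionError directly instead of using assert false so that the gate does not also depend on -ea.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.logging.Level;
import java.util.logging.Logger;

final class GatedExecutorSketch {
  private static final Logger log = Logger.getLogger(GatedExecutorSketch.class.getName());

  // Simplified stand-in for ChannelExecutor: single-threaded, no locking.
  static final class Executor {
    private final Queue<Runnable> queue = new ArrayDeque<>();

    void executeLater(Runnable runnable) {
      queue.add(runnable);
    }

    void drain() {
      Runnable runnable;
      while ((runnable = queue.poll()) != null) {
        try {
          runnable.run();
        } catch (Throwable t) {
          // Default behavior: catch and log, so the caller of drain() never sees it.
          log.log(Level.WARNING, "Runnable threw exception in ChannelExecutor", t);
          // Hypothetical gate: fail hard only when the JVM was started with -Dtestgrpc=true.
          if (Boolean.getBoolean("testgrpc")) {
            throw new AssertionError("Runnable threw exception in executor", t);
          }
        }
      }
    }
  }

  public static void main(String[] args) {
    Executor executor = new Executor();
    executor.executeLater(() -> {
      throw new IllegalStateException("ConnectivityStateManager is already disabled");
    });
    executor.drain(); // only logs, unless run with -Dtestgrpc=true
  }
}
```

Without the property, drain() returns normally, which is why a test that only calls shutdown() would pass even without the fix.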

zhangkun83 · 2017-07-28

The decision for ChannelExecutor to catch instead of throw is questionable. I did it because ChannelExecutor may run in any thread, e.g., a network thread, and I didn't feel comfortable letting arbitrary code, e.g., a LoadBalancer, throw in a network thread. But maybe it's better to just throw? @ejona86 WDYT?

dapengzhang0 · 2017-07-31

Filed #3293 to track the ChannelExecutor exception handling.

ejona86 · 2017-07-31

I'd much rather defer the question; I don't think the user-visible exception handling behavior should be determined by the unit tests. I do think it tends to make sense to catch and log the exception, since propagating it will probably only cause additional failures, and we know the current stack does not depend on the result of the call (since the execution is not guaranteed to be complete when drain returns).

You could inject an exception handler or some such, though. We do know that none of the tests should cause an exception in the channel executor.
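
One way the injected-handler idea might look, as a sketch under stated assumptions: the Executor class and its Consumer<Throwable> constructor parameter are hypothetical and do not reflect the actual ChannelExecutor API. Production code would pass a handler that logs, while tests would pass one that records failures and then assert that none occurred.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.Consumer;

final class HandlerInjectionSketch {
  // Simplified stand-in for ChannelExecutor with a hypothetical injected handler.
  static final class Executor {
    private final Queue<Runnable> queue = new ArrayDeque<>();
    private final Consumer<Throwable> onError;

    Executor(Consumer<Throwable> onError) {
      this.onError = onError;
    }

    void executeLater(Runnable runnable) {
      queue.add(runnable);
    }

    void drain() {
      Runnable runnable;
      while ((runnable = queue.poll()) != null) {
        try {
          runnable.run();
        } catch (Throwable t) {
          // The executor always catches; what happens next is the handler's choice.
          onError.accept(t);
        }
      }
    }
  }

  public static void main(String[] args) {
    // Production-style handler: log and keep going.
    Executor production = new Executor(t -> System.err.println("Runnable threw: " + t));
    production.executeLater(() -> {
      throw new IllegalStateException("ConnectivityStateManager is already disabled");
    });
    production.drain();

    // Test-style handler: record failures so the test can assert the list is empty.
    List<Throwable> failures = new ArrayList<>();
    Executor underTest = new Executor(failures::add);
    underTest.executeLater(() -> {
      throw new IllegalStateException("ConnectivityStateManager is already disabled");
    });
    underTest.drain();
    System.out.println("recorded failures: " + failures.size());
  }
}
```

This keeps the user-visible behavior (catch and log) independent of how the unit tests detect failures.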

dapengzhang0 · 2017-07-31T17:27:11Z

Jenkins, retest this please

dapengzhang0 changed the title ~~core: "fix ConnectivityStateManager is already disabled" bug~~ core: fix "ConnectivityStateManager is already disabled" bug Jul 27, 2017

core: fix ConnectivityStateManager is already disabled bug

d37a39a

dapengzhang0 force-pushed the fixchannelstateapibug branch from 4d2d151 to d37a39a Compare July 27, 2017 21:34

dapengzhang0 assigned zhangkun83 Jul 27, 2017

zhangkun83 reviewed Jul 27, 2017

incorporate comments

748cb81

dapengzhang0 force-pushed the fixchannelstateapibug branch from ec85979 to 748cb81 Compare July 27, 2017 22:41

dapengzhang0 changed the title ~~core: fix "ConnectivityStateManager is already disabled" bug~~ core: fix ConnectivityStateManager is already disabled bug Jul 27, 2017

zhangkun83 reviewed Jul 28, 2017

dapengzhang0 mentioned this pull request Jul 31, 2017

Go to "permanent error mode" if ChannelExecutor throws exception #3293

Closed

rm getState_loadBalancerDoesNotSupportChannelStateThenChannelShutdown()

1f9a476

zhangkun83 approved these changes Jul 31, 2017

dapengzhang0 merged commit 18970e6 into grpc:master Aug 1, 2017

dapengzhang0 deleted the fixchannelstateapibug branch August 13, 2017 05:14

lock bot locked as resolved and limited conversation to collaborators Jan 20, 2019
