Implement channel-state API in ManagedChannelImpl #2292

zhangkun83 · 2016-09-21T14:41:19Z

The API is added by #2286, but is implemented (will throw exception) in ManagedChannelImpl.

lukaszx0 · 2017-02-10T06:53:25Z

@zhangkun83 I've talked with @ejona86 about this recently. What design do you have in mind?

Simply adding getState() to picker which would return aggregate of picker's subchannels? I'm assuming we also want to keep the "old" API for ManagedChannel ?

class ManagedChannel {
  State getState(boolean tryToConnect);
  void notifyWhenStateChanged(State lastObserved, Runnable callback);
}

zhangkun83 · 2017-02-10T18:10:49Z

@ejona86 has talked with me about that. The idea is that the channel will call SubchannelPicker.getState() only once, in updatePicker(), so that a new state would require a new picker.

It will work in the sense of implementing the channel state API, but I am yet to figure out how it should interact with pickSubchannel().

Conceptually, PickResult maps to the channel state:

withSubchannel -> READY
withError -> TRANSIENT_FAILURE
withNoResult -> IDLE, CONNECTING

Should we allow getState() to disagree with the pick results, e.g., getState() returns TRANSIENT_FAILURE and in the mean time pickSubchannel() returns working subchannels?

I'm assuming we also want to keep the "old" API for ManagedChannel ?

I am not sure what you are referring to.

lukaszx0 · 2017-02-13T04:17:52Z

I am not sure what you are referring to.

I mean that API for ManagedChannel2, the snippet that I've pasted.

It will work in the sense of implementing the channel state API, but I am yet to figure out how it should interact with pickSubchannel().

That's a tricky one. I'm not sure the ManagedChannel state should be tied to the result returned by pickSubchannel or maybe I'm misunderstanding the definition and semantics of channel's state.

In my opinion ManagedChannel state should reflect the state of underlying subchannels, in aggregate. Interaction with pickSubchannel would mean that channel's state could be affected / changed by affinity of a an RPC call (or Metadata).

The idea is that the channel will call SubchannelPicker.getState() only once, in updatePicker(), so that a new state would require a new picker.

I like the idea and with what I wrote above, I think it should be as simple as adding abstract ConnectivityState getState() to the SubchannelPicker and let users arbitrarily set it based on the current state of subchannels managed by LoadBalancer.

I'm not convinced it should interact with pickSubchannel at all - state of the managed channel should already be known and defined at this point.

lukaszx0 · 2017-02-13T04:29:56Z

And since ManagedChannel's state, in my opinion, should be based on the underlying subchannels state, I think we should have a first class API for setting and getting state of Subchannel (something we should do regardless, IMO). Currently the convention is to convey it in Subchannel's attributes.

zhangkun83 · 2017-02-16T01:24:25Z

I mean that API for ManagedChannel2, the snippet that I've pasted.

I was confused by you saying that ManagedChannel interface is "old API". Anyway, we have no intent to change the ManagedChannel API.

I'm not convinced it should interact with pickSubchannel at all - state of the managed channel should already be known and defined at this point.

I mean today's pickSubchannel() already implies the state of the channel to some extent. For example, when there is no READY subchannel, an LB would create a picker that always returns NO_RESULT. If there was a getState() on the picker, it would be unreasonable to return READY. However, there isn't a mechanism to prevent that. The discrepancy between pickSubchannel() and `getState() is possible because they are not orthogonal.

That said, it's not a deal-breaker. We can live with it if we don't have any better options, but I would like to explore other options.

zhangkun83 · 2017-02-16T01:34:14Z

And since ManagedChannel's state, in my opinion, should be based on the underlying subchannels state, I think we should have a first class API for setting and getting state of Subchannel (something we should do regardless, IMO). Currently the convention is to convey it in Subchannel's attributes.

We don't set state to the Subchannel, because Subchannel manages state on its own. A state getter on Subchannel can race with state changes, and it will be non-trivial to make it right and it won't work well with the serialized threading model of LBv2.

lukaszx0 · 2017-02-27T00:36:22Z

The discrepancy between pickSubchannel() and `getState() is possible because they are not orthogonal.

Yes, that's actually big issue and what I was trying to point out. Results of pickSubchannel are nondeterministic in a way that we can't predict what kind of logic users will implement and by nature it is dynamic based on the affinity and call options.

My point is that I think we can't depend on PickResult returned from pickSubchannel() because following scenario is possible:

LB creates 3 subchannels
LB returns SubchannelPicker with 3 healthy subchannels
ManagedChannel should report READY state.
User makes a call with a custom callOptions.
Call fails with PickResult.withNoResult or PickResult.withError depending on the pickSubchannel impl. The reason for that might be because it didn't return any subchannel for given callOptions (for example user could've been targeting the host in a different DC and this DC was not included in the subchannel list because name resolved didn't returned it)

Now my question is if ManagedChannel should transition to TRANSIENT_FAILURE or IDLE because of that? It could but it depends on what's the definition of ManagedChannel's state, which is something that I'm currently confused about because originally assumed that I'll be reflecting the health of its subchannels (so after 4) it shouldn't change the state and ManagedChannel should continue to be READY).

@zhangkun83 thoughts?

zhangkun83 · 2017-02-27T17:44:23Z

@lukaszx0 your confusion is expected. This is because the connectivity state spec is modeled after a single connection or a set of equivalent connections (like in the stock pick-first and round-robin LBs). For those cases the scenario you described should never happen.

If the channel works for some requests but not for the others, the spec doesn't define what state the channel should be in as a whole, and we leave the decision to the LB implementation. Because the channel's state is going to be consumed by the user, e.g., for displaying connectivity state on the GUI or for delaying RPCs until the channel is "READY", you should return whatever state you expect your user to see.

Discussing this with you has allowed me to re-think my concern that pickSubchannel() and getState() may disagree, and now it looks more acceptable to me. getState() represents the state of the channel as a whole, and pickSubchannel() represents the result for a particular request, and in the scenario you described, they can disagree.

lukaszx0 · 2017-02-27T20:05:12Z

This is because the connectivity state spec is modeled after a single connection or a set of equivalent connections (like in the stock pick-first and round-robin LBs). For those cases the scenario you described should never happen.

Yeah, this is right and is a safe assumption for implementations which don't let provide custom LB implementations (I don't know how C or Go deals with it, I might be wrong and they might be as pluggable as Java).

Discussing this with you has allowed me to re-think my concern that pickSubchannel() and getState() may disagree, and now it looks more acceptable to me. getState() represents the state of the channel as a whole, and pickSubchannel() represents the result for a particular request, and in the scenario you described, they can disagree.

Yes! Looks like we're on the same page now 😉 I'll take a stab at it later today and we can continue iterating in PR.

lukaszx0 · 2017-03-06T18:07:48Z

@ejona86 wrote:

At the very least, you would exit idle mode. You could request an InternalChannel from the LB and then obtainActiveTransport(). But it's not clear how this boolean should work when using round robin. I guess that's because it's not clear what the state should be when using round robin.

That is to say, we need to work on the spec to define it.

Soooo is it acceptable to just drop this param? Or do we need to have strict API-compat with C version?

zhangkun83 · 2017-03-06T18:27:59Z

Because round-robin eagerly creates connection, the requestConnection argument in getState() simply doesn't make any difference. However, this argument does make a difference for any lazy-connection LBs. ManagerChannelImpl should probably pass this signal to LoadBalancer, and let LoadBalancer decide what to do.

stephenh · 2017-07-25T11:16:03Z

This hasn't been updated in a few months; any chance it'll be in a pretty-soon-ish upcoming release?

I'm working on an application where I want mosh-style magical-reconnects as the network comes/goes and AFAIU grpc-java would do that out-of-the-box with keep alives enabled on a ManagedChannel + this API implemented to let my app know when the transport goes up/down.

zhangkun83 · 2017-07-25T18:15:31Z

The state plumbing in channel has been implemented by #3019, although no LoadBalancer yet supports it. /cc @dapengzhang0

stephenh · 2017-07-26T02:26:27Z

Nice, thanks @zhangkun83! I don't need load balance support, so I'll start poking around at this.

zhangkun83 · 2017-07-26T15:55:44Z

To clarify, even if you don't need load balancing, you will be using the default PickFirstBalancer, which is yet to support the channel states.

dapengzhang0 · 2017-09-19T17:41:17Z

Resolved by #3019 #3300 #3311 #3327

ManagedChannel now supports the getState/notifyWhenStateChanged API (grpc#2292).

ManagedChannel now supports the getState/notifyWhenStateChanged API (#2292).

ManagedChannel now supports the getState/notifyWhenStateChanged API (grpc#2292).

ManagedChannel now supports the getState/notifyWhenStateChanged API (#2292).

zhangkun83 added the enhancement label Sep 21, 2016

zhangkun83 added this to the 1.1 milestone Sep 21, 2016

zhangkun83 self-assigned this Sep 21, 2016

zhangkun83 mentioned this issue Sep 21, 2016

Channel-state API #28

Closed

ejona86 mentioned this issue Dec 5, 2016

Listen to channel state changes on Android #2484

Closed

zhangkun83 modified the milestones: 1.2, 1.1 Jan 12, 2017

lukaszx0 mentioned this issue Feb 28, 2017

core: implement state API for ManagedChannel #2768

Closed

carl-mastrangelo modified the milestones: Next, 1.2 Mar 16, 2017

ejona86 mentioned this issue Apr 8, 2017

Unimplemented channel state API is not documented as such #2883

Closed

ejona86 mentioned this issue May 15, 2017

ConnectivityStateInfo never update when using RoundRobinLoadBalancerFactory #3005

Closed

stephenh mentioned this issue Jul 25, 2017

Connections cannot be re-established after network loss/recovery #3266

Closed

dapengzhang0 closed this as completed Sep 19, 2017

ejona86 modified the milestones: Next, 1.6 Sep 25, 2017

dcow mentioned this issue Nov 16, 2017

ManagedChannel getState/notifyChanged API docs #3762

Closed

dcow pushed a commit to dcow/grpc-java that referenced this issue Nov 17, 2017

core: update ManagedChannel docs (grpc#3762)

f84fd51

ManagedChannel now supports the getState/notifyWhenStateChanged API (grpc#2292).

dcow mentioned this issue Nov 17, 2017

core: update ManagedChannel docs (#3762) #3768

Merged

ejona86 pushed a commit that referenced this issue Nov 17, 2017

core: update ManagedChannel docs (#3762)

3315d16

ManagedChannel now supports the getState/notifyWhenStateChanged API (#2292).

ejona86 pushed a commit to ejona86/grpc-java that referenced this issue Nov 17, 2017

core: update ManagedChannel docs (grpc#3762)

ec5c5ae

ManagedChannel now supports the getState/notifyWhenStateChanged API (grpc#2292).

ejona86 mentioned this issue Nov 17, 2017

core: update ManagedChannel docs (#3762) (backport) #3775

Merged

ejona86 pushed a commit that referenced this issue Nov 17, 2017

core: update ManagedChannel docs (#3762)

2fa6427

ManagedChannel now supports the getState/notifyWhenStateChanged API (#2292).

lock bot locked as resolved and limited conversation to collaborators Sep 25, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement channel-state API in ManagedChannelImpl #2292

Implement channel-state API in ManagedChannelImpl #2292

zhangkun83 commented Sep 21, 2016

lukaszx0 commented Feb 10, 2017

zhangkun83 commented Feb 10, 2017

lukaszx0 commented Feb 13, 2017

lukaszx0 commented Feb 13, 2017

zhangkun83 commented Feb 16, 2017

zhangkun83 commented Feb 16, 2017

lukaszx0 commented Feb 27, 2017

zhangkun83 commented Feb 27, 2017

lukaszx0 commented Feb 27, 2017

lukaszx0 commented Mar 6, 2017

zhangkun83 commented Mar 6, 2017

stephenh commented Jul 25, 2017

zhangkun83 commented Jul 25, 2017

stephenh commented Jul 26, 2017

zhangkun83 commented Jul 26, 2017

dapengzhang0 commented Sep 19, 2017

Implement channel-state API in ManagedChannelImpl #2292

Implement channel-state API in ManagedChannelImpl #2292

Comments

zhangkun83 commented Sep 21, 2016

lukaszx0 commented Feb 10, 2017

zhangkun83 commented Feb 10, 2017

lukaszx0 commented Feb 13, 2017

lukaszx0 commented Feb 13, 2017

zhangkun83 commented Feb 16, 2017

zhangkun83 commented Feb 16, 2017

lukaszx0 commented Feb 27, 2017

zhangkun83 commented Feb 27, 2017

lukaszx0 commented Feb 27, 2017

lukaszx0 commented Mar 6, 2017

zhangkun83 commented Mar 6, 2017

stephenh commented Jul 25, 2017

zhangkun83 commented Jul 25, 2017

stephenh commented Jul 26, 2017

zhangkun83 commented Jul 26, 2017

dapengzhang0 commented Sep 19, 2017