Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement channel-state API in ManagedChannelImpl #2292

Closed
zhangkun83 opened this issue Sep 21, 2016 · 16 comments
Closed

Implement channel-state API in ManagedChannelImpl #2292

zhangkun83 opened this issue Sep 21, 2016 · 16 comments
Assignees
Milestone

Comments

@zhangkun83
Copy link
Contributor

The API is added by #2286, but is implemented (will throw exception) in ManagedChannelImpl.

@zhangkun83 zhangkun83 added this to the 1.1 milestone Sep 21, 2016
@zhangkun83 zhangkun83 self-assigned this Sep 21, 2016
@zhangkun83 zhangkun83 modified the milestones: 1.2, 1.1 Jan 12, 2017
@lukaszx0
Copy link
Collaborator

@zhangkun83 I've talked with @ejona86 about this recently. What design do you have in mind?

Simply adding getState() to picker which would return aggregate of picker's subchannels? I'm assuming we also want to keep the "old" API for ManagedChannel ?

class ManagedChannel {
  State getState(boolean tryToConnect);
  void notifyWhenStateChanged(State lastObserved, Runnable callback);
}

@zhangkun83
Copy link
Contributor Author

@ejona86 has talked with me about that. The idea is that the channel will call SubchannelPicker.getState() only once, in updatePicker(), so that a new state would require a new picker.

It will work in the sense of implementing the channel state API, but I am yet to figure out how it should interact with pickSubchannel().

Conceptually, PickResult maps to the channel state:

withSubchannel -> READY
withError -> TRANSIENT_FAILURE
withNoResult -> IDLE, CONNECTING

Should we allow getState() to disagree with the pick results, e.g., getState() returns TRANSIENT_FAILURE and in the mean time pickSubchannel() returns working subchannels?

I'm assuming we also want to keep the "old" API for ManagedChannel ?

I am not sure what you are referring to.

@lukaszx0
Copy link
Collaborator

I am not sure what you are referring to.

I mean that API for ManagedChannel2, the snippet that I've pasted.

It will work in the sense of implementing the channel state API, but I am yet to figure out how it should interact with pickSubchannel().

That's a tricky one. I'm not sure the ManagedChannel state should be tied to the result returned by pickSubchannel or maybe I'm misunderstanding the definition and semantics of channel's state.

In my opinion ManagedChannel state should reflect the state of underlying subchannels, in aggregate. Interaction with pickSubchannel would mean that channel's state could be affected / changed by affinity of a an RPC call (or Metadata).

The idea is that the channel will call SubchannelPicker.getState() only once, in updatePicker(), so that a new state would require a new picker.

I like the idea and with what I wrote above, I think it should be as simple as adding abstract ConnectivityState getState() to the SubchannelPicker and let users arbitrarily set it based on the current state of subchannels managed by LoadBalancer.

I'm not convinced it should interact with pickSubchannel at all - state of the managed channel should already be known and defined at this point.

@lukaszx0
Copy link
Collaborator

And since ManagedChannel's state, in my opinion, should be based on the underlying subchannels state, I think we should have a first class API for setting and getting state of Subchannel (something we should do regardless, IMO). Currently the convention is to convey it in Subchannel's attributes.

@zhangkun83
Copy link
Contributor Author

I mean that API for ManagedChannel2, the snippet that I've pasted.

I was confused by you saying that ManagedChannel interface is "old API". Anyway, we have no intent to change the ManagedChannel API.

I'm not convinced it should interact with pickSubchannel at all - state of the managed channel should already be known and defined at this point.

I mean today's pickSubchannel() already implies the state of the channel to some extent. For example, when there is no READY subchannel, an LB would create a picker that always returns NO_RESULT. If there was a getState() on the picker, it would be unreasonable to return READY. However, there isn't a mechanism to prevent that. The discrepancy between pickSubchannel() and `getState() is possible because they are not orthogonal.

That said, it's not a deal-breaker. We can live with it if we don't have any better options, but I would like to explore other options.

@zhangkun83
Copy link
Contributor Author

And since ManagedChannel's state, in my opinion, should be based on the underlying subchannels state, I think we should have a first class API for setting and getting state of Subchannel (something we should do regardless, IMO). Currently the convention is to convey it in Subchannel's attributes.

We don't set state to the Subchannel, because Subchannel manages state on its own. A state getter on Subchannel can race with state changes, and it will be non-trivial to make it right and it won't work well with the serialized threading model of LBv2.

@lukaszx0
Copy link
Collaborator

The discrepancy between pickSubchannel() and `getState() is possible because they are not orthogonal.

Yes, that's actually big issue and what I was trying to point out. Results of pickSubchannel are nondeterministic in a way that we can't predict what kind of logic users will implement and by nature it is dynamic based on the affinity and call options.

My point is that I think we can't depend on PickResult returned from pickSubchannel() because following scenario is possible:

  1. LB creates 3 subchannels
  2. LB returns SubchannelPicker with 3 healthy subchannels
  3. ManagedChannel should report READY state.
  4. User makes a call with a custom callOptions.
  5. Call fails with PickResult.withNoResult or PickResult.withError depending on the pickSubchannel impl. The reason for that might be because it didn't return any subchannel for given callOptions (for example user could've been targeting the host in a different DC and this DC was not included in the subchannel list because name resolved didn't returned it)

Now my question is if ManagedChannel should transition to TRANSIENT_FAILURE or IDLE because of that? It could but it depends on what's the definition of ManagedChannel's state, which is something that I'm currently confused about because originally assumed that I'll be reflecting the health of its subchannels (so after 4) it shouldn't change the state and ManagedChannel should continue to be READY).

@zhangkun83 thoughts?

@zhangkun83
Copy link
Contributor Author

@lukaszx0 your confusion is expected. This is because the connectivity state spec is modeled after a single connection or a set of equivalent connections (like in the stock pick-first and round-robin LBs). For those cases the scenario you described should never happen.

If the channel works for some requests but not for the others, the spec doesn't define what state the channel should be in as a whole, and we leave the decision to the LB implementation. Because the channel's state is going to be consumed by the user, e.g., for displaying connectivity state on the GUI or for delaying RPCs until the channel is "READY", you should return whatever state you expect your user to see.

Discussing this with you has allowed me to re-think my concern that pickSubchannel() and getState() may disagree, and now it looks more acceptable to me. getState() represents the state of the channel as a whole, and pickSubchannel() represents the result for a particular request, and in the scenario you described, they can disagree.

@lukaszx0
Copy link
Collaborator

This is because the connectivity state spec is modeled after a single connection or a set of equivalent connections (like in the stock pick-first and round-robin LBs). For those cases the scenario you described should never happen.

Yeah, this is right and is a safe assumption for implementations which don't let provide custom LB implementations (I don't know how C or Go deals with it, I might be wrong and they might be as pluggable as Java).

Discussing this with you has allowed me to re-think my concern that pickSubchannel() and getState() may disagree, and now it looks more acceptable to me. getState() represents the state of the channel as a whole, and pickSubchannel() represents the result for a particular request, and in the scenario you described, they can disagree.

Yes! Looks like we're on the same page now 😉 I'll take a stab at it later today and we can continue iterating in PR.

@lukaszx0
Copy link
Collaborator

lukaszx0 commented Mar 6, 2017

@ejona86 wrote:

At the very least, you would exit idle mode. You could request an InternalChannel from the LB and then obtainActiveTransport(). But it's not clear how this boolean should work when using round robin. I guess that's because it's not clear what the state should be when using round robin.

That is to say, we need to work on the spec to define it.

Soooo is it acceptable to just drop this param? Or do we need to have strict API-compat with C version?

@zhangkun83
Copy link
Contributor Author

Because round-robin eagerly creates connection, the requestConnection argument in getState() simply doesn't make any difference. However, this argument does make a difference for any lazy-connection LBs. ManagerChannelImpl should probably pass this signal to LoadBalancer, and let LoadBalancer decide what to do.

@stephenh
Copy link
Contributor

This hasn't been updated in a few months; any chance it'll be in a pretty-soon-ish upcoming release?

I'm working on an application where I want mosh-style magical-reconnects as the network comes/goes and AFAIU grpc-java would do that out-of-the-box with keep alives enabled on a ManagedChannel + this API implemented to let my app know when the transport goes up/down.

@zhangkun83
Copy link
Contributor Author

The state plumbing in channel has been implemented by #3019, although no LoadBalancer yet supports it. /cc @dapengzhang0

@stephenh
Copy link
Contributor

Nice, thanks @zhangkun83! I don't need load balance support, so I'll start poking around at this.

@zhangkun83
Copy link
Contributor Author

To clarify, even if you don't need load balancing, you will be using the default PickFirstBalancer, which is yet to support the channel states.

@dapengzhang0
Copy link
Member

Resolved by #3019 #3300 #3311 #3327

@ejona86 ejona86 modified the milestones: Next, 1.6 Sep 25, 2017
dcow pushed a commit to dcow/grpc-java that referenced this issue Nov 17, 2017
ManagedChannel now supports the getState/notifyWhenStateChanged API (grpc#2292).
ejona86 pushed a commit that referenced this issue Nov 17, 2017
ManagedChannel now supports the getState/notifyWhenStateChanged API (#2292).
ejona86 pushed a commit to ejona86/grpc-java that referenced this issue Nov 17, 2017
ManagedChannel now supports the getState/notifyWhenStateChanged API (grpc#2292).
ejona86 pushed a commit that referenced this issue Nov 17, 2017
ManagedChannel now supports the getState/notifyWhenStateChanged API (#2292).
@lock lock bot locked as resolved and limited conversation to collaborators Sep 25, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

6 participants