Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allowing async functions for loadbalancer/picker #1771

Open
davendu opened this issue May 1, 2021 · 6 comments
Open

Allowing async functions for loadbalancer/picker #1771

davendu opened this issue May 1, 2021 · 6 comments

Comments

@davendu
Copy link

davendu commented May 1, 2021

Problem

I'm writing a customized loadbalancer and picker (implementing grpc.experimental.Picker and grpc.experimental.LoadBalancer). In my implementation, the loadbalancer is meant to get load status from a remote endpoint, and picker would always check the remote endpoint to see if the subchannel is in "disabled" state.

For example, in Picker.pick, to get information from my remote endpoint, I need to call the function like this:

const disabled = await remoteClient.GetIfEndpointIsolated(host, port, pickArgs.metadata.get("uid"))

Currently, LoadBalancer.updateAddressList and Picker.pick are all sync functions, and not accepting a Promise<>. Also, the return value can not be set via a callback. This makes it impossible to call the remote endpoint in those two functions.

However, the same way of checking information from remote endpoint works well for the Golang and C++ version since in those languages Promise works in a different way.

Possible solution

Add async to LoadBalancer.updateAddressList and Picker.pick like this:

...
  async pick(pickArgs: PickArgs): Promise<PickResult>;
...

Or add pickAsync as an alternative.

I had thought if callback is acceptable, but I think it would make no differences than using await.

Possible Problems

It might not be a wise idea to make Pick and Loadbalancer costing too much time, considering they are frequently called. However, allowing Promise here does not mean users have to call remote (they can still write a simple picker and choose subchannel without external information`. Additionally, considering customizing those two objects are in experimental state and if you are writing code for them, you should have considerable knowledge of how gRPC call works, users should make their own responsibility if they take the risk to call remote endpoint for additional information for picking.

@murgatroid99
Copy link
Member

I responded to the Gitter message but I got no response so I'll say the same thing here:

First I want to clarify that updateAddressList isn't really synchronous, in the sense that it provides its results to the caller by calling channelControlHelper.updateState, and it is not required to call that function synchronously. In fact, all existing load balancers call that function in response to asynchronous events.

For leaf load balancers (pick first and round robin), those events are generally subchannel state changes.

I suggest also looking at the CDS and EDS load balancers in the grpc-js-xds package. Those both make remote requests and use the asynchronous responses to construct child load balancers, which eventually provide state updates.

@davendu
Copy link
Author

davendu commented Jun 29, 2021

Hi. I did not notified Gitter message, sorry.

For updateAddressList I actually also called updateState in the async callback. I think maybe adding async is a sugar.

For Picker.pick, I'm not sure if it can be asynchronous, since the subchannel is directly returned. Actually, our loadbalancer has a async remoteSelect() method to choose endpoint from a list of {host}:{port}. I was hoping to build subchannels for all endpoints in updateAddressList, and call await remoteSelect in pick(), so I don't have to implement load balancer's logic again.

@murgatroid99

@murgatroid99
Copy link
Member

The picker API was designed for a load balancing implementation in which the list of endpoints may be determined using an asynchronous operation, but the client can synchronously determine which of those resulting subchannels to use. It simply does not support performing an additional asynchronous operation for every request to determine which subchannel to use. That pattern adds a lot of per-request overhead, which we want to minimize.

@davendu
Copy link
Author

davendu commented Jun 29, 2021

Understand, and that's what I was worry about.

However the whole load balancer is not so easy to implement in picker's style, and the per-request async operation seems necessary. Do you suggest any other way to implement this?

@murgatroid99
Copy link
Member

Can you share more details about how your load balancing implementation works, and why you need to perform the last step of load balancing asynchronously at the time the request is made?

@davendu
Copy link
Author

davendu commented Jun 30, 2021

The load balancing framework is kinda complex. It's not designed for gRPC, so it has several components which:

  • Maintain a list of healthy nodes get from state server. The list is maintained by a background timer.
  • Return a node from this list, then receive call status from framework's caller (after call finished).
  • Update local list by the call status. If failed for too many times, mark it locally as unhealthy.
  • If necessary, sync the local list to the state server. If too many failures on same target, state server will remove it.
  • Lots of other stuffs...

So basically, this framework itself is similar to gRPC's own subchannel management mechanism.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants