
A34 weighted_round_robin lb_policy for per endpoint weight from ClusterLoadAssignment response #202

Open
wants to merge 11 commits into base: master

Conversation


@yishuT yishuT commented Aug 17, 2020

No description provided.


linux-foundation-easycla bot commented Aug 17, 2020

CLA Check
The committers are authorized under a signed CLA.

@markdroth
Member

Please fix the commit to include your user ID, as per the EasyCLA check above.

@yishuT
Author

yishuT commented Sep 7, 2020

Hey, I would like to hear an update, even if you think this proposal is far from the roadmap. I personally have some cycles in the coming weeks to align this proposal with the gRPC roadmap.

Member

@markdroth markdroth left a comment

Sorry for taking so long to get to this!

Please let me know if you have any questions about any of this.


The proposal is to carry the [`load_balancing_weight`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L108) of each [`LbEndpoint`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L76) from the [`ClusterLoadAssignment`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint.proto#L34) response to the `lb_policy`, and to implement a `weighted_round_robin` policy whose picker is based on the [earliest deadline first (EDF)](https://en.wikipedia.org/wiki/Earliest_deadline_first_scheduling) scheduling algorithm. Each endpoint will receive a fraction of the traffic routed to its locality equal to its `load_balancing_weight` divided by the sum of the `load_balancing_weight` of all `LbEndpoint`s within the same [`LocalityLbEndpoints`](https://github.com/envoyproxy/envoy/blob/2dcf20f4baf5de71ba1d8afbd76b0681613e13f2/api/envoy/config/endpoint/v3/endpoint_components.proto#L116).

### Overview of `weighted_round_robin` policy
Member

I'd prefer to call this policy something other than "weighted_round_robin", since we are planning to eventually provide a policy with that name that works slightly differently: instead of getting the endpoint weights along with the addresses, the policy will automatically compute the endpoint weights based on utilization data reported by the endpoints.

I can't offhand come up with a really ideal suggestion for a better name here. I can think of the following options, none of which seem quite ideal:

  • address_weighted_round_robin
  • injected_weighted_round_robin
  • weighted_round_robin_with_injected_weights

I'm open to other options here.

Member

How about using "utilization_weighted_round_robin" for the other policy? That policy could potentially be implemented as a wrapper of this policy, FWIW.

Member

That's an interesting idea -- I hadn't thought of wrapping one with the other.

In order to do this, we'd have to use the same WRR algorithm in both cases. @menghanl, you've looked into how our internal WRR algorithm works. How does that compare to the EDF algorithm proposed here?

Contributor

Wrapping one with the other means the wrapper needs to convert the loads into weights and send the new weights, along with the addresses, to the child whenever the load changes?

I would think if we make an EDF scheduler that's capable of handling both situations, the two balancers can reuse this scheduler implementation.

The WRR algorithm is similar to the internal one. I will double check the details.

Author

I have no preference regarding the name. Per the discussion here, I would leave it as weighted_round_robin, and the other policy can build upon it. Or maybe weighted_round_robin_by_edf to indicate the underlying implementation?

Member

@menghanl, did you have a chance to look up how the algorithm compares to the internal WRR algorithm?

I guess for now, let's call this edf_weighted_round_robin.


Comment on lines 37 to 38
// secondary key for the priority queue. Used as a tiebreaker for same deadline to maintain FIFO order.
uint64 order_offset;
Member

How this works needs to be documented. Is the smaller value preferred? What if there is a tie here? Is that illegal? Do we validate that?


- On each call to `Pick`, the EDF picker picks the entry `e` at the top of the queue and returns the subchannel associated with the entry. After that, the picker updates the `deadline` of `e` to `e.deadline + 1/weight` and either pops and pushes the entry back into the queue or performs an increase-key operation.
- `weighted_round_robin` updates the entries in EDF priority queue on any change of subchannel `ConnnectiveState` or endpoint `load_balancing_weight`.
Member

Should that be ConnectivityState?


- If all endpoints have the same `load_balancing_weight`, the `EDF` picker degenerates to a `round_robin` picker. This is easier to reason about and consistent with Envoy.
Member

Do we use the order_offset to determine the order in this case?
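
For concreteness, here is a minimal, self-contained sketch of the EDF queue behavior under discussion, with hypothetical names and assuming a `std::priority_queue`-based implementation (not the proposal's agreed design). It shows `deadline` as the primary key and `order_offset` as the FIFO tiebreaker; when all weights are equal, every pick advances the deadline by the same amount, so `order_offset` alone determines the order and the picker degenerates to round robin:

```
#include <cstdint>
#include <queue>
#include <vector>

struct EdfEntry {
  double deadline;        // primary key: the smallest deadline is picked first
  uint64_t order_offset;  // secondary key: insertion order, breaks deadline ties (FIFO)
  double weight;          // load_balancing_weight of this endpoint
  int subchannel_index;   // stand-in for the subchannel reference
};

// Orders the min-heap: earliest deadline first; on ties, lowest order_offset wins.
struct EdfCompare {
  bool operator()(const EdfEntry& a, const EdfEntry& b) const {
    if (a.deadline != b.deadline) return a.deadline > b.deadline;
    return a.order_offset > b.order_offset;
  }
};

class EdfPicker {
 public:
  void Add(double weight, int subchannel_index) {
    queue_.push({/*deadline=*/1.0 / weight, next_order_offset_++, weight,
                 subchannel_index});
  }

  // Pick: return the subchannel at the top, then push the entry back with
  // its deadline advanced by 1/weight. Heavier endpoints re-surface sooner.
  int Pick() {
    EdfEntry e = queue_.top();
    queue_.pop();
    const int picked = e.subchannel_index;
    e.deadline += 1.0 / e.weight;
    queue_.push(e);
    return picked;
  }

 private:
  std::priority_queue<EdfEntry, std::vector<EdfEntry>, EdfCompare> queue_;
  uint64_t next_order_offset_ = 0;
};
```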

@dfawley
Member

dfawley commented Sep 8, 2020

cc @menghanl


### Overview of `weighted_round_robin` policy

Contributor

This section needs to specify:

  • format of the balancing policy config (will always be empty for this policy, because there's no policy-level config)
  • addresses (with weights in address.attributes)

/*
`Pick` picks the EdfEntry on top of the queue, returns the underlying SubChannel and updates deadline of the EdfEntry.
*/
PickResult Pick(PickArgs args)
Contributor

Pick would be the only public API of the picker. The other "update"s should be internal to the balancing policy.
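
In other words, a hypothetical sketch of the suggested shape, using stub types in place of the proposal's `PickArgs`/`PickResult` (not an agreed-upon API): the picker exposes only `Pick`, and the LB policy handles updates by swapping in a freshly constructed picker rather than mutating this one from the outside:

```
#include <cstdint>

// Stub types standing in for the proposal's PickArgs / PickResult.
struct PickArgs {};
struct PickResult { uint64_t subchannel_id; };

// Hypothetical shape: Pick() is the picker's only public entry point.
// Updating the EDF queue on weight or connectivity changes is the LB
// policy's job, done by building a new picker.
class WeightedRoundRobinPicker {
 public:
  PickResult Pick(PickArgs /*args*/) {
    // ... EDF pick logic as in the sketch earlier in the thread ...
    return PickResult{0};
  }

 private:
  // EDF queue state, touched only by Pick() and the constructor.
};
```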


### On new `ClusterLoadAssignment`

#### endpoints have weight change
Contributor

It would be helpful to add more details about how the change will be handled. E.g., only the weight is updated, but not the deadline, so the entry at the top of the queue stays at the top?

Similarly for the following two updates. This helps keep the implementations consistent across languages.
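
A sketch of one possible interpretation, reusing the hypothetical `EdfEntry` from the earlier sketch: on a weight-only change, the stored weight is replaced but the current deadline is left alone, so the relative queue order is preserved and the new weight only affects future deadline increments:

```
// Hypothetical: apply a weight-only update from a new ClusterLoadAssignment.
// The deadline is deliberately untouched, so the entry at the top of the
// queue stays at the top; the new weight takes effect on the next Pick,
// when the deadline is advanced by 1/new_weight.
void UpdateWeight(EdfEntry& e, double new_weight) {
  e.weight = new_weight;
}
```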

@yishuT
Author

yishuT commented Sep 14, 2020

Thanks for all the feedback. I will post another update to address all of them in the next couple of days. Please feel free to leave any comments in the meantime.

@yishuT
Author

yishuT commented Oct 14, 2020

Soft ping. Any updates?

Member

@markdroth markdroth left a comment

Sorry for the delay on this!

Please let me know if you have any questions. Thanks!


uint64 order_offset;
// `load_balancing_weight` of this endpoint from the latest `ClusterLoadAssignment`.

// `load_balancing_weight` of this endpoint from address attribute of this endpoint.
double weight;
Member

Why is this a double instead of a uint32_t? The endpoint weight in the xDS API is an integer, and integers are usually a lot less error-prone to work with, because there's no rounding error.

Same question for the deadline field. Is there a way we can represent it as an integer instead of a double? For example, can we simply multiply by 1000 and say that we are storing milli-units?

Author

I made weight a double because deadline is a double. I am not sure we can use uint32 for deadline. When updating the deadline of an entry after a pick, one way to implement it (or at least the way Envoy does) is

new_deadline = deadline + 1/weight

If we want to use uint32, we would likely need

new_deadline = deadline + 1000/weight

or similar. This requires the assumption that the per-endpoint weight is never larger than 1000.

Member

If the deadlines and weights are all multiplied by 1000, then there's no requirement that the endpoint weight is never larger than 1000. We are doing exactly the same calculation as we would be doing with doubles, except that we're storing milli-units instead of whole units. The only thing we're losing is the ability to store more than 3 decimal places of precision, and that shouldn't affect the algorithm in a material way.

Author

Hmmm, sorry, I am lost. How do we handle the inverse of the weight, 1/weight? Is anything larger than the numerator rounded to 0?

Member

Okay, sorry, let me try this again.

The idea here is that you can do the actual division using a double, just don't use doubles for storage. I think we don't actually need to store the weight as a double at all; it comes as an integer in xDS, and we can keep it that way. We can then do the calculation as a double but immediately round the result for storage.

I think this can be done using something like this:

int32_t weight = <weight from xDS>;
deadline += std::lround(1000.0 / weight);

Then we continue to use the lowest deadline value as the sorting key.

Does that seem reasonable, or is there something I'm missing about this algorithm? The latter is quite possible, since I haven't grokked it fully -- for example, if deadline is constantly increasing, how does it avoid wrap-around? Is it depending on the double representation to avoid that?
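
A small self-contained illustration of the milli-unit idea, with made-up weights: the deadlines are stored as integers in units of 1/1000, the division still happens in floating point, and only the stored result is rounded. Using a 64-bit integer for the stored deadline also makes wrap-around a non-issue for any realistic number of picks:

```
#include <cmath>
#include <cstdint>
#include <cstdio>

int main() {
  // Hypothetical weights, as they would arrive as integers from xDS.
  const int32_t weight_a = 3, weight_b = 7;

  // Deadlines stored in milli-units (1 pick interval == 1000 units).
  int64_t deadline_a = 0, deadline_b = 0;

  // Each pick chooses the smaller deadline and advances it by
  // round(1000 / weight), i.e. 1/weight in milli-units.
  for (int i = 0; i < 10; ++i) {
    if (deadline_a <= deadline_b) {
      deadline_a += std::lround(1000.0 / weight_a);  // +333 per pick of A
      std::printf("pick A (deadline_a = %lld)\n",
                  static_cast<long long>(deadline_a));
    } else {
      deadline_b += std::lround(1000.0 / weight_b);  // +143 per pick of B
      std::printf("pick B (deadline_b = %lld)\n",
                  static_cast<long long>(deadline_b));
    }
  }
  // Over many picks, B is chosen about 7/3 as often as A; the only loss
  // versus doubles is rounding beyond 3 decimal places.
  return 0;
}
```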


I think we'll know in the next few weeks.


I'm planning to make a more rigorous simulation over the weekend. Hopefully it's something we can just check in and use for other LBs. The main focus is on what the hosts in the pool experience w.r.t. syncing behavior. The perf numbers are more favorable to IWRR, so we'll just need to see if there's a catch or an unwanted change in behavior.

Stay tuned.

Author

Howdy folks! Happy New Year

@tonya11en do we have a final decision that envoy will use iwrr to replace edf?

@tonya11en tonya11en Jan 23, 2021

It looks like we'll be replacing EDF with WRSQ (in review envoyproxy/envoy#14681) for weighted RR among other things. More details on where it'll be used in envoyproxy/envoy#14597.

Weighted Least Request will still use the EDF scheduler.

Author

Gotcha, thanks

N/A

## Open issues (if applicable)
* How to desynchronize clients so that they don't all make synchronized picks?
Member

I don't understand what this is referring to. Can you explain the problem here?

Author

Updated. Please let me know if it's still not clear.

Member

Okay, I understand the problem. I see two possible solutions here:

  • The simple approach is, as you suggest, randomize the starting point in the list whenever we create a new picker, just like round_robin does. This will avoid having all clients hammer the same endpoints at the same time (and also, keep in mind that the control plane could actually send different results to different clients -- each client could see different endpoints or the same endpoints in a different order or with different weights). But it will disrupt the expected scheduling whenever the picker changes.

  • A more complicated solution would be to maintain the current scheduler state in the LB policy and use a mutex to synchronize access to it between the picker and the LB policy. This is more complicated to implement, and the synchronization imposes a performance penalty (because you have to acquire the mutex for every pick), but it should work.
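
A minimal sketch of the first, simpler option, with hypothetical names: when a new picker is built, each entry gets a random initial deadline within its first pick interval, so different clients start at different phases of the same schedule:

```
#include <random>

// Hypothetical: randomize each entry's starting deadline in [0, 1/weight)
// when building a new picker, analogous to round_robin starting at a
// random index. This desynchronizes clients that share the same endpoint
// list and weights.
double InitialDeadline(double weight, std::mt19937& rng) {
  std::uniform_real_distribution<double> dist(0.0, 1.0 / weight);
  return dist(rng);
}
```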

@yishuT
Author

yishuT commented May 14, 2022

@markdroth I want to revisit this proposal. Does the gRPC community have any plan for a per-endpoint-weight LB policy? If there is no other active proposal or plan, I can resume the work this month.

@markdroth
Member

We don't have any other plan to do this, so if it's still something you'd like, please feel free to continue moving this design forward.

@s-matyukevich

I am confused about the status of this proposal. It is not yet merged, but it looks like we are already using endpoint weights, at least in grpc-go.

  • Here the EDS response is parsed and the weights are stored.
  • Next, we use the weight property to configure the weighted round robin LB in the EDS (clusterresolver) load balancer link
  • We also use those weights in the ringhash EDS implementation link

@yishuT
Author

yishuT commented Jan 30, 2023

Hey @s-matyukevich

AFAIK, weights are stored, but the gRPC client does not have weighted round robin implemented in every language. The proposal is to have a unified protocol for the implementation.
