[load balancer] Introducing deterministic aperture load balancing. #22991

conqerAtapple · 2022-09-06T00:04:32Z

Based on the discussions in issue:#17013, this change introduces implementation of deterministic aperture load balancing algorithm. The implementation matches the original finagle implementation (https://github.com/twitter/finagle/tree/develop/finagle-core/src/main/scala/com/twitter/finagle/loadbalancer).

Risk Level: Medium
Testing: Unit/Integration, Mannual
Docs Changes: TBD
Release Notes: TBD
Platform Specific Features: N/A
API Changes: TBD

Based on the discussions in issue:envoyproxy#17013, this change introduces implementation of deterministic aperture load balancing algorithm. The implementation matches the original finagle implementation (`https://github.com/twitter/finagle/tree/develop/finagle-core/src/main/scala/com/twitter/finagle/loadbalancer`). Signed-off-by: Jojy George Varghese <jojy_varghese@apple.com>

Signed-off-by: Jojy George Varghese <jojy_varghese@apple.com>

repokitteh-read-only · 2022-09-06T00:04:40Z

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to (api/envoy/|docs/root/api-docs/).
envoyproxy/api-shepherds assignee is @markdroth
CC @envoyproxy/api-watchers: FYI only for changes made to (api/envoy/|docs/root/api-docs/).

🐱

Caused by: #22991 was opened by conqerAtapple.

see: more, trace.

conqerAtapple · 2022-09-06T00:06:10Z

This is a draft based on #17013.

Signed-off-by: Jojy George Varghese <jojy_varghese@apple.com>

conqerAtapple · 2022-09-06T22:17:15Z

/retest

repokitteh-read-only · 2022-09-06T22:17:20Z

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #22991 (comment) was created by @conqerAtapple.

see: more, trace.

Signed-off-by: Jojy George Varghese <jojy_varghese@apple.com>

conqerAtapple · 2022-09-07T16:40:18Z

@markdroth @zuercher PTAL. This is an initial draft based on discussions in #17013.

Also, looks like the.broken build is unrelated to the PR?

zuercher · 2022-09-09T03:32:50Z

/retest

repokitteh-read-only · 2022-09-09T03:32:54Z

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #22991 (comment) was created by @zuercher.

see: more, trace.

Signed-off-by: Jojy George Varghese <jojy_varghese@apple.com>

conqerAtapple · 2022-09-11T22:26:30Z

/retest

repokitteh-read-only · 2022-09-11T22:26:34Z

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #22991 (comment) was created by @conqerAtapple.

see: more, trace.

conqerAtapple · 2022-09-12T00:44:42Z

@zuercher I think at this point the CI errors are coverage related which is expected as this is WIP/Draft. Could use eyes on the PR. Thanks!

ggreenway · 2022-09-12T16:06:16Z

@zuercher assigning to you as you have expertise in related areas. Feel free to re-assign if needed. But this has sat unassigned for awhile so trying to get it some attention.

zuercher

Took a first pass through this. The general shape seems good, but I think a bunch of the range checking can be elided if we make better use of const values and assertions.

api/envoy/config/cluster/v3/cluster.proto

source/common/upstream/deterministic_aperture_lb.h

source/common/upstream/deterministic_aperture_lb.cc

zuercher · 2022-09-14T15:55:01Z

source/common/upstream/deterministic_aperture_lb.cc

+  if (offset < 0 || (offset >= unit_width_ * ring_size_)) {
+    return absl::nullopt;
+  }
+  return offset / unit_width_;


Would we get better precision if this was just offset * ring_size_ since unit_width_ is 1/ring_size_?

We could. But i think we will have to do a modulus operation also ((offset * ring_size_)%ring_size)

zuercher · 2022-09-14T16:03:36Z

source/common/upstream/deterministic_aperture_lb.h

+    HostConstSharedPtr chooseHost(uint64_t hash, uint32_t attempt) const override;
+
+    //
+    // Utility Ring methods. Note that they are public so that we can add unit tests easily.


I think we normally prefer not to expose methods just for testing. We have cases where we mark a test as a friend class for this kind of direct testing of class internals, but I wonder if it's not sufficient to provide a mocked RNG to test the different paths.

source/common/upstream/ring_hash_lb.h

source/common/upstream/upstream_impl.h

Signed-off-by: Jojy George Varghese <jojy_varghese@apple.com>

conqerAtapple · 2022-09-17T02:22:48Z

@zuercher Thanks for reviewing and thoughtful comment. I hav addressed most of the points you raised in the comments.

Signed-off-by: Jojy George Varghese <jojy_varghese@apple.com>

zuercher

Generally looks good. Have you looked at the code coverage report to see that you have good coverage of the corner cases in weight and tryPickSecond?

source/common/upstream/deterministic_aperture_lb.cc

Signed-off-by: Jojy George Varghese <jojy_varghese@apple.com>

conqerAtapple · 2022-11-23T18:59:56Z

@adisuissa @wbpcode @markdroth would need your advice on docs.

wbpcode · 2022-11-24T01:51:07Z

Thanks for all your great work and patience.

API is more important than the implementation. It would be better to wait for the cc @markdroth's review comments to API changes first.

/wait

wbpcode · 2022-11-24T01:49:06Z

api/envoy/config/cluster/v3/cluster.proto

+    envoy.extensions.load_balancing_policies.deterministic_aperture.v3.DeterministicApertureLbConfig
+        deterministic_aperture_lb_config = 57;


I don't think this is the correct way to use load balancing extension.

See #23472 as a example.

We won't added new field to Cluster for specific load balancing policy. All load balancing extension should be configured by the load_balancing_policy (wrapper of TypedExtensionConfig).

@wbpcode I understand what you are trying to say. Unfortunately currently the only way to get the load balancer configuration in the Factory is via the ClusterInfo object. So we will have to convert the loadBalancingPolicy to DeterministicApertureLbConfig. Is there an example for that?

edit: I think the PR example #23472 helps a lot. Thanks!

@wbpcode I have made the suggested changes (from #23472). doc build still fails :(

Signed-off-by: Jojy George Varghese <jojy_varghese@apple.com>

conqerAtapple · 2022-11-28T17:44:23Z

@wbpcode I think the load balancer now follows the extension pattern. docs build fails for some reason. It is not very obvious what the reason might be.

markdroth · 2022-11-28T19:41:23Z

...oy/extensions/load_balancing_policies/deterministic_aperture/v3/deterministic_aperture.proto

+// [#extension: envoy.load_balancing_policies]
+message DeterministicApertureLbConfig {
+  // Hash ring configuration for the internal ring for endpoints.
+  ring_hash.v3.RingHash ring_config = 1;


Is this new policy ring-hash-specific? If so, maybe instead of implementing this as a separate policy, it should just be a new config field in the existing ring-hash policy?

That having been said, I think a better approach would be to change this such that it is not actually specific to the ring_hash policy in the first place. I haven't looked at the implementation to see how tightly coupled this is with ring_hash, but if the overall goal here is to allow the LB policy to choose from only a subset of the endpoints, then presumably the child policy here could be any policy, even something other than ring_hash. In that case, this field could be replaced with:

config.cluster.v3.LoadBalancingPolicy child_policy = 1;

@markdroth I agree. The policy is not specific to Ring hash but for now that is the policy. Ring was picked as it was a natural/easy fit. We could change the field as you suggested but the implementation would be only for Ring hash for now.

edit: Another option is to have this load balancer's configuration have its own Ring which then internally gets translated to the RingHash's configuration. This way, we keep them separate but reuse the RingHash's implementation internally.

But why is the implementation of this new subsetting policy dependent upon implementation details of the child policy in the first place? It seems to me that this policy should be self-contained: it should determine which subset of hosts the child policy should be looking at and then delegate to the child policy to pick from that set of hosts. It should not care how the child policy is actually doing that pick.

If we agree that this is the long-term approach that we want here, then we should definitely not bake any dependency on ring_hash into the API itself. So I would definitely prefer to see a generic child policy config here.

I'll defer to @wbpcode in terms of the implementation, but in principle, it seems wrong to me to implement this in a way that works only with ring_hash, regardless of the API representation. It seems like this is just unnecessarily breaking abstractions by exposing inner workings of the ring_hash policy, and it is unnecessarily constraining to provide this functionality only in conjunction with the ring_hash policy.

I would really prefer to see this implemented in a way that is completely agnostic to the details of the underlying child policy.

I think this might need some explanation :). The deterministic aperture lb uses a ring to place its endpoints but then overlays the Envoy peers ring on top to split the endpoints equally among the peers. This is well explained here : https://blog.twitter.com/engineering/en_us/topics/infrastructure/2019/daperture-load-balancer.

So in short, the load balancer needs a ring to place its endpoints. We already have a Ring implementation in Envoy so the thought process is to reuse that.

I understand that the algorithm here is using a ring, and I agree that it makes sense to reuse the existing ring implementation in Envoy. The thing I'm objecting to is the way we're accomplishing that reuse. This PR is currently doing that by tying the deterministic_aperature policy to the ring_hash policy, which (a) prevents flexibility by not allowing deterministic_aperature to be used with other child policies, even though in principle its functionality seems orthogonal to that of the child policy, and (b) breaks the abstraction of the ring_hash policy by tying another policy to its internals, which will make it harder to make changes to ring_hash in the future.

I think a cleaner way to structure this would be to refactor the ring implementation out of the ring_hash policy and into its own library, and then use that ring library in both the ring_hash policy and in the deterministic_aperature policy. That way, you can still reuse the existing ring implementation, but nothing in the deterministic_aperature policy will have to reach inside of the ring_hash policy, and it will allow using deterministic_aperature with any child policy.

I don't think we need a separate RingConfig message. I think it would be fine to just copy the relevant fields from the RingHash message to the new DeterministicApertureLbConfig message.

But otherwise, yes, this sounds good!

Thanks @markdroth . Since the changes discussed here implies we change the existing RingHash load balancer to use the new refactored Ring implementation, I am wondering if that will increase the scope of this PR. One way to do this could be to not change the existing RingHash load balancer in this PR but have only DeterministicAperture load balancer use the new the Ring library. Then when we move the RingHash load balancer to the new extension model, change it to use the RIng library. I can give a shot at moving the RingHash load balancer to the extension model in another PR.

I'll let @wbpcode weigh in here, but IMHO, refactoring the ring functionality out of the ring_hash LB policy really has nothing to do with converting the ring_hash LB policy to be an extension. I think if you're going to be moving that functionality into its own library, it makes sense to change ring_hash to use that library in the same PR, rather than just duplicating the functionality and then removing the duplication later.

Got it. Thanks for the clarification 👍

@markdroth I have refactored out the Ring concept. PTAL.
The docs build is still broken unfortunately.

Signed-off-by: Jojy George Varghese <jojy_varghese@apple.com>

markdroth

The overall structure here looks good!

I'll let @wbpcode and the Envoy maintainers review the actual code. I have just one small comment on the API.

markdroth · 2022-12-06T01:02:11Z

...oy/extensions/load_balancing_policies/deterministic_aperture/v3/deterministic_aperture.proto

+// uniform random distribution to select a backend from the range of overlapping backends. This
+// along with P2C ensures uniform load distribution.
+// [#next-free-field: 7]
+message DeterministicApertureLbConfig {


This config still needs a field for the child policy config, or else there's no way to tell it what policy to actually use to pick from amongst the selected subset of endpoints. I think you need to add something like this:

config.cluster.v3.LoadBalancingPolicy child_policy = 7 [(validate.rules).message = {required: true}];

@markdroth Thanks for looking. Not sure what you mean. The DeterministicAperture load balancer policy is itself a LbPolicy of type LOAD_BALANCING_POLICY_CONFIG. The integration test in the PR shows the usage.

I'm confused. I thought that what this policy provided was a way to restrict the LB pick to a subset of the endpoints. Once we do that, we still need to define the policy to be used to pick from that subset of endpoints, and we were going to do that by delegating to a child policy.

It seems to me that we have two separate pieces of functionality here:

Choosing what subset of endpoints to use.

Choosing from among those endpoints for a particular request.

I think it makes sense to be able to configure these two things independently of each other. It seems useful for users to be able to decide whether or not to do the subsetting defined in this policy (item 1) and then independently choose what LB policy to use to pick from among the chosen subset of endpoints (item 2). For example, it should be possible to configure either ring-hash or round-robin for (2).

I think the right way to do this is to delegate to a child policy for (2), so that users can plug in whatever child policy they want.

I'm not sure what algorithm you're actually using for (2) here, but if it's not one of the existing policies, then I think it would make sense to add that as a separate child policy, which can be configured as a child of this one (or could even be used on its own, without the subsetting functionality provided by the deterministic_aperature policy).

@markdroth The policy here is at the same level as other load balancers in Envoy. So deterministic aperture is able to choose host from the subset of endpoints. This subset is derived by the deterministic aperture load balancer by overlapping the ring of endpoints with the ring of participating Envoys. So in short the load balancer here is self contained (like others - aka Maglev, Ring etc). Please let me know if that makes sense.

I understand that this is intended to be used as a top-level LB policy. But that doesn't mean that it can't delegate to another policy for the actual pick from among the filtered subset of hosts.

What is the actual picking algorithm used here? Is there any reason that that algorithm needs to be tied to the subsetting behavior? If not, it seems more flexible to delegate to a child policy.

I understand what you are proposing, but I don't think it makes any sense to create a new interface here. The whole point of an LB policy is to provide the endpoint picking behavior across a set of endpoints, so creating a new interface to do the same thing makes no sense. That would add needless complexity and prevent people from using any of the existing policies as child policies.

To be clear, what I am suggesting here is that the P2C child policy should create its own ring rather than reusing the ring from the parent subsetting policy. There should not be any need to share any state between the parent and child policy except for the information that can be passed through the existing LB policy API.

I am missing something here. The P2C Policy with its own Ring is the current state of the PR. The current PR exposes the P2C Policy + Ring as a load balancing policy which can be used as shown in the integration test.

What I am saying is that there should be two different rings here: one in the parent policy that does subsetting, and one in the child policy that picks the endpoint from the selected subset. You should not need to pass the ring through the LB policy API when creating the child.

The child (P2C) would need access to the parent Ring (to ask what are my subsets). I am not sure in the case of P2C, if it needs a Ring for itself since all it needs is the section of the Ring it can pick from. The parent Ring provides API for child to ask give me the ring subsection(subset) so that I can apply my policy. Some other policy might decide to create its own Ring with the subset information but that depends on that policy.

Let's step back for a moment here. Let's set aside the details of the specific algorithm used in the two policies and instead talk about what each of the two policies is responsible for and how they interact with each other.

The LB policy API basically gives the policy a set of hosts and then asks it to pick from that set of hosts for each request. We want both the parent and the child policy to use that same API.

The goal of the parent policy is to decide what subset of hosts the child should pick from. In effect, the parent policy does not actually do any picking; all it does is filter the set of hosts that it is given so that it passes only a subset of those hosts to the child policy.

The goal of the child policy is to pick from the set of hosts that it is given by its parent. It should not matter to the child whether its parent is Envoy (i.e., it is the top-level policy) or whether its parent is another LB policy that did subsetting before delegating to it; either way, it is just responsible for picking from amongst the hosts it's been given.

I think the important properties here are:

Both of these policies are implemented using the existing LB policy API, not creating some new API to do the same thing.

There should be no communication between the two policies that cannot be expressed via the existing LB policy API.

So getting back to this specific situation, I don't think we want an API for the child to refer back to the ring in the parent policy, because that would tie the child to this parent; the child would no longer be able to be used by itself without the parent. Instead, I think we want the parent policy to build the ring, decide which subset to use, and then construct a list of hosts in that subset. The parent policy will then pass that list of hosts to the child in the same format as would be expected by any LB policy.

If the child policy doesn't actually need a ring for its algorithm, then it doesn't need to construct a ring. It can just do a uniform random distribution pick from among the set of hosts that are passed to it.

KBaichoo · 2023-01-09T16:22:27Z

/wait

For merge conflicts, API changes discussion

zuercher · 2023-02-27T19:45:13Z

/wait

Is this PR still a going concern? Should we close it and wait for a future PR that addresses the API issues?

conqerAtapple · 2023-02-27T21:24:43Z

@zuercher Sorry i got pulled into another task so did not get any time for this PR.

Need understanding of what @markdroth is proposing.

Thanks for the suggestion @markdroth

I am not sure if i completely understand the two layer policy. I am not sure why subsetting LB is part of this LB policy at all. The idea of this algorithm is to provide another LB algorithm (not a subsetting algorithm). This LB algorithm serves the same purpose as of all other core algorithms (example RingHash or RoundRobin). So maybe @markdroth could help me by explaining how RoundRobin algorithm could be implemented (assuming it was not already part of the current core implementations) in terms of his proposal. This will help me grasp the proposal.

github-actions · 2023-03-30T00:02:51Z

This pull request has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

github-actions · 2023-04-06T04:01:25Z

This pull request has been automatically closed because it has not had activity in the last 37 days. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

conqerAtapple added 2 commits September 5, 2022 16:02

format.

2d53ba4

Signed-off-by: Jojy George Varghese <jojy_varghese@apple.com>

repokitteh-read-only bot added the api label Sep 6, 2022

repokitteh-read-only bot assigned markdroth Sep 6, 2022

conqerAtapple added 4 commits September 5, 2022 18:44

Fixed Formatting

1db992d

Signed-off-by: Jojy George Varghese <jojy_varghese@apple.com>

format - spellcheck.

3ca8eb2

Signed-off-by: Jojy George Varghese <jojy_varghese@apple.com>

Fixed test build.

3b76ab6

Signed-off-by: Jojy George Varghese <jojy_varghese@apple.com>

Fixed test build.

0900b07

Signed-off-by: Jojy George Varghese <jojy_varghese@apple.com>

format.

d3f9996

Signed-off-by: Jojy George Varghese <jojy_varghese@apple.com>

conqerAtapple added 2 commits September 9, 2022 09:10

Merge branch 'main' into draft_deterministic_lb

0e72319

Signed-off-by: Jojy George Varghese <jojy_varghese@apple.com>

Reverted back the test iteration count

7b13163

Signed-off-by: Jojy George Varghese <jojy_varghese@apple.com>

ggreenway assigned zuercher Sep 12, 2022

zuercher reviewed Sep 14, 2022

View reviewed changes

Code review changes.

c4df5b1

Signed-off-by: Jojy George Varghese <jojy_varghese@apple.com>

conqerAtapple added 3 commits September 20, 2022 09:23

Added back weight method.

75b7a32

Signed-off-by: Jojy George Varghese <jojy_varghese@apple.com>

format.

59215ef

Signed-off-by: Jojy George Varghese <jojy_varghese@apple.com>

Merge branch 'main' into draft_deterministic_lb

ba41694

Signed-off-by: Jojy George Varghese <jojy_varghese@apple.com>

zuercher reviewed Sep 20, 2022

View reviewed changes

source/common/upstream/deterministic_aperture_lb.cc Outdated Show resolved Hide resolved

source/common/upstream/deterministic_aperture_lb.cc Outdated Show resolved Hide resolved

conqerAtapple added 3 commits November 22, 2022 19:15

Cleanup.

b33c6d8

Signed-off-by: Jojy George Varghese <jojy_varghese@apple.com>

fix build.

cf697ec

Signed-off-by: Jojy George Varghese <jojy_varghese@apple.com>

Build.

feb70d5

Signed-off-by: Jojy George Varghese <jojy_varghese@apple.com>

repokitteh-read-only bot added the waiting label Nov 24, 2022

wbpcode reviewed Nov 24, 2022

View reviewed changes

Moving to extensions template.

e0112fd

Signed-off-by: Jojy George Varghese <jojy_varghese@apple.com>

repokitteh-read-only bot removed the waiting label Nov 24, 2022

conqerAtapple added 4 commits November 24, 2022 22:08

Adding missing files.

33356b6

Signed-off-by: Jojy George Varghese <jojy_varghese@apple.com>

Fix extension type.

20852e8

Signed-off-by: Jojy George Varghese <jojy_varghese@apple.com>

format.

ec8a794

Signed-off-by: Jojy George Varghese <jojy_varghese@apple.com>

Removing ring_hash metadata.

e486c33

Signed-off-by: Jojy George Varghese <jojy_varghese@apple.com>

markdroth reviewed Nov 28, 2022

View reviewed changes

conqerAtapple added 4 commits December 2, 2022 16:30

Refactored out Ring implementation.

e233057

Signed-off-by: Jojy George Varghese <jojy_varghese@apple.com>

fix format.

8b1874e

Signed-off-by: Jojy George Varghese <jojy_varghese@apple.com>

Merge branch 'main' into draft_deterministic_lb

ac1f192

Signed-off-by: Jojy George Varghese <jojy_varghese@apple.com>

Format.

166c5c7

Signed-off-by: Jojy George Varghese <jojy_varghese@apple.com>

markdroth reviewed Dec 6, 2022

View reviewed changes

repokitteh-read-only bot added the waiting label Jan 9, 2023

zuercher removed their assignment Jan 31, 2023

github-actions bot added the stale stalebot believes this issue/PR has not been touched recently label Mar 30, 2023

github-actions bot closed this Apr 6, 2023

s-matyukevich mentioned this pull request Jun 13, 2023

How should we implement subsetting in gRPC? grpc/grpc-go#6370

Closed

		envoy.extensions.load_balancing_policies.deterministic_aperture.v3.DeterministicApertureLbConfig
		deterministic_aperture_lb_config = 57;

[load balancer] Introducing deterministic aperture load balancing. #22991

[load balancer] Introducing deterministic aperture load balancing. #22991

Conversation

conqerAtapple commented Sep 6, 2022

repokitteh-read-only bot commented Sep 6, 2022

conqerAtapple commented Sep 6, 2022

conqerAtapple commented Sep 6, 2022

repokitteh-read-only bot commented Sep 6, 2022

conqerAtapple commented Sep 7, 2022

zuercher commented Sep 9, 2022

repokitteh-read-only bot commented Sep 9, 2022

conqerAtapple commented Sep 11, 2022

repokitteh-read-only bot commented Sep 11, 2022

conqerAtapple commented Sep 12, 2022

ggreenway commented Sep 12, 2022

zuercher left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

conqerAtapple commented Sep 17, 2022

zuercher left a comment

Choose a reason for hiding this comment

conqerAtapple commented Nov 23, 2022

wbpcode commented Nov 24, 2022 • edited

Choose a reason for hiding this comment

conqerAtapple Nov 24, 2022 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

conqerAtapple commented Nov 28, 2022

Choose a reason for hiding this comment

conqerAtapple Nov 28, 2022 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

conqerAtapple Nov 28, 2022 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

markdroth left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

KBaichoo commented Jan 9, 2023

zuercher commented Feb 27, 2023

conqerAtapple commented Feb 27, 2023

github-actions bot commented Mar 30, 2023

github-actions bot commented Apr 6, 2023

wbpcode commented Nov 24, 2022 •

edited

conqerAtapple Nov 24, 2022 •

edited

conqerAtapple Nov 28, 2022 •

edited

conqerAtapple Nov 28, 2022 •

edited