Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[load balancer] Introducing deterministic aperture load balancing. #22991

Closed

Conversation

conqerAtapple
Copy link
Contributor

Based on the discussions in issue:#17013, this change introduces implementation of deterministic aperture load balancing algorithm. The implementation matches the original finagle implementation (https://github.com/twitter/finagle/tree/develop/finagle-core/src/main/scala/com/twitter/finagle/loadbalancer).

Risk Level: Medium
Testing: Unit/Integration, Mannual
Docs Changes: TBD
Release Notes: TBD
Platform Specific Features: N/A
API Changes: TBD

Based on the discussions in issue:envoyproxy#17013, this change
introduces implementation of deterministic aperture load balancing algorithm. The
implementation matches the original finagle implementation
(`https://github.com/twitter/finagle/tree/develop/finagle-core/src/main/scala/com/twitter/finagle/loadbalancer`).

Signed-off-by: Jojy George Varghese <jojy_varghese@apple.com>
Signed-off-by: Jojy George Varghese <jojy_varghese@apple.com>
@repokitteh-read-only
Copy link

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to (api/envoy/|docs/root/api-docs/).
envoyproxy/api-shepherds assignee is @markdroth
CC @envoyproxy/api-watchers: FYI only for changes made to (api/envoy/|docs/root/api-docs/).

🐱

Caused by: #22991 was opened by conqerAtapple.

see: more, trace.

@conqerAtapple
Copy link
Contributor Author

This is a draft based on #17013.

Signed-off-by: Jojy George Varghese <jojy_varghese@apple.com>
Signed-off-by: Jojy George Varghese <jojy_varghese@apple.com>
Signed-off-by: Jojy George Varghese <jojy_varghese@apple.com>
Signed-off-by: Jojy George Varghese <jojy_varghese@apple.com>
@conqerAtapple
Copy link
Contributor Author

/retest

@repokitteh-read-only
Copy link

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #22991 (comment) was created by @conqerAtapple.

see: more, trace.

Signed-off-by: Jojy George Varghese <jojy_varghese@apple.com>
@conqerAtapple
Copy link
Contributor Author

@markdroth @zuercher PTAL. This is an initial draft based on discussions in #17013.

Also, looks like the.broken build is unrelated to the PR?

@zuercher
Copy link
Member

zuercher commented Sep 9, 2022

/retest

@repokitteh-read-only
Copy link

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #22991 (comment) was created by @zuercher.

see: more, trace.

Signed-off-by: Jojy George Varghese <jojy_varghese@apple.com>
Signed-off-by: Jojy George Varghese <jojy_varghese@apple.com>
@conqerAtapple
Copy link
Contributor Author

/retest

@repokitteh-read-only
Copy link

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #22991 (comment) was created by @conqerAtapple.

see: more, trace.

@conqerAtapple
Copy link
Contributor Author

@zuercher I think at this point the CI errors are coverage related which is expected as this is WIP/Draft. Could use eyes on the PR. Thanks!

@ggreenway
Copy link
Contributor

@zuercher assigning to you as you have expertise in related areas. Feel free to re-assign if needed. But this has sat unassigned for awhile so trying to get it some attention.

Copy link
Member

@zuercher zuercher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took a first pass through this. The general shape seems good, but I think a bunch of the range checking can be elided if we make better use of const values and assertions.

api/envoy/config/cluster/v3/cluster.proto Outdated Show resolved Hide resolved
source/common/upstream/deterministic_aperture_lb.h Outdated Show resolved Hide resolved
source/common/upstream/deterministic_aperture_lb.cc Outdated Show resolved Hide resolved
source/common/upstream/deterministic_aperture_lb.cc Outdated Show resolved Hide resolved
if (offset < 0 || (offset >= unit_width_ * ring_size_)) {
return absl::nullopt;
}
return offset / unit_width_;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would we get better precision if this was just offset * ring_size_ since unit_width_ is 1/ring_size_?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could. But i think we will have to do a modulus operation also ((offset * ring_size_)%ring_size)

HostConstSharedPtr chooseHost(uint64_t hash, uint32_t attempt) const override;

//
// Utility Ring methods. Note that they are public so that we can add unit tests easily.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we normally prefer not to expose methods just for testing. We have cases where we mark a test as a friend class for this kind of direct testing of class internals, but I wonder if it's not sufficient to provide a mocked RNG to test the different paths.

source/common/upstream/ring_hash_lb.h Outdated Show resolved Hide resolved
source/common/upstream/upstream_impl.h Outdated Show resolved Hide resolved
Signed-off-by: Jojy George Varghese <jojy_varghese@apple.com>
@conqerAtapple
Copy link
Contributor Author

@zuercher Thanks for reviewing and thoughtful comment. I hav addressed most of the points you raised in the comments.

Signed-off-by: Jojy George Varghese <jojy_varghese@apple.com>
Signed-off-by: Jojy George Varghese <jojy_varghese@apple.com>
Signed-off-by: Jojy George Varghese <jojy_varghese@apple.com>
Copy link
Member

@zuercher zuercher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks good. Have you looked at the code coverage report to see that you have good coverage of the corner cases in weight and tryPickSecond?

source/common/upstream/deterministic_aperture_lb.cc Outdated Show resolved Hide resolved
source/common/upstream/deterministic_aperture_lb.cc Outdated Show resolved Hide resolved
Signed-off-by: Jojy George Varghese <jojy_varghese@apple.com>
Signed-off-by: Jojy George Varghese <jojy_varghese@apple.com>
Signed-off-by: Jojy George Varghese <jojy_varghese@apple.com>
@conqerAtapple
Copy link
Contributor Author

@adisuissa @wbpcode @markdroth would need your advice on docs.

@wbpcode
Copy link
Member

wbpcode commented Nov 24, 2022

Thanks for all your great work and patience.

API is more important than the implementation. It would be better to wait for the cc @markdroth's review comments to API changes first.

/wait

Comment on lines 1072 to 1073
envoy.extensions.load_balancing_policies.deterministic_aperture.v3.DeterministicApertureLbConfig
deterministic_aperture_lb_config = 57;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is the correct way to use load balancing extension.

See #23472 as a example.

We won't added new field to Cluster for specific load balancing policy. All load balancing extension should be configured by the load_balancing_policy (wrapper of TypedExtensionConfig).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wbpcode I understand what you are trying to say. Unfortunately currently the only way to get the load balancer configuration in the Factory is via the ClusterInfo object. So we will have to convert the loadBalancingPolicy to DeterministicApertureLbConfig. Is there an example for that?

edit: I think the PR example #23472 helps a lot. Thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wbpcode I have made the suggested changes (from #23472). doc build still fails :(

Signed-off-by: Jojy George Varghese <jojy_varghese@apple.com>
Signed-off-by: Jojy George Varghese <jojy_varghese@apple.com>
Signed-off-by: Jojy George Varghese <jojy_varghese@apple.com>
Signed-off-by: Jojy George Varghese <jojy_varghese@apple.com>
Signed-off-by: Jojy George Varghese <jojy_varghese@apple.com>
@conqerAtapple
Copy link
Contributor Author

@wbpcode I think the load balancer now follows the extension pattern. docs build fails for some reason. It is not very obvious what the reason might be.

// [#extension: envoy.load_balancing_policies]
message DeterministicApertureLbConfig {
// Hash ring configuration for the internal ring for endpoints.
ring_hash.v3.RingHash ring_config = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this new policy ring-hash-specific? If so, maybe instead of implementing this as a separate policy, it should just be a new config field in the existing ring-hash policy?

That having been said, I think a better approach would be to change this such that it is not actually specific to the ring_hash policy in the first place. I haven't looked at the implementation to see how tightly coupled this is with ring_hash, but if the overall goal here is to allow the LB policy to choose from only a subset of the endpoints, then presumably the child policy here could be any policy, even something other than ring_hash. In that case, this field could be replaced with:

config.cluster.v3.LoadBalancingPolicy child_policy = 1;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@markdroth I agree. The policy is not specific to Ring hash but for now that is the policy. Ring was picked as it was a natural/easy fit. We could change the field as you suggested but the implementation would be only for Ring hash for now.

edit: Another option is to have this load balancer's configuration have its own Ring which then internally gets translated to the RingHash's configuration. This way, we keep them separate but reuse the RingHash's implementation internally.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But why is the implementation of this new subsetting policy dependent upon implementation details of the child policy in the first place? It seems to me that this policy should be self-contained: it should determine which subset of hosts the child policy should be looking at and then delegate to the child policy to pick from that set of hosts. It should not care how the child policy is actually doing that pick.

If we agree that this is the long-term approach that we want here, then we should definitely not bake any dependency on ring_hash into the API itself. So I would definitely prefer to see a generic child policy config here.

I'll defer to @wbpcode in terms of the implementation, but in principle, it seems wrong to me to implement this in a way that works only with ring_hash, regardless of the API representation. It seems like this is just unnecessarily breaking abstractions by exposing inner workings of the ring_hash policy, and it is unnecessarily constraining to provide this functionality only in conjunction with the ring_hash policy.

I would really prefer to see this implemented in a way that is completely agnostic to the details of the underlying child policy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this might need some explanation :). The deterministic aperture lb uses a ring to place its endpoints but then overlays the Envoy peers ring on top to split the endpoints equally among the peers. This is well explained here : https://blog.twitter.com/engineering/en_us/topics/infrastructure/2019/daperture-load-balancer.

So in short, the load balancer needs a ring to place its endpoints. We already have a Ring implementation in Envoy so the thought process is to reuse that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that the algorithm here is using a ring, and I agree that it makes sense to reuse the existing ring implementation in Envoy. The thing I'm objecting to is the way we're accomplishing that reuse. This PR is currently doing that by tying the deterministic_aperature policy to the ring_hash policy, which (a) prevents flexibility by not allowing deterministic_aperature to be used with other child policies, even though in principle its functionality seems orthogonal to that of the child policy, and (b) breaks the abstraction of the ring_hash policy by tying another policy to its internals, which will make it harder to make changes to ring_hash in the future.

I think a cleaner way to structure this would be to refactor the ring implementation out of the ring_hash policy and into its own library, and then use that ring library in both the ring_hash policy and in the deterministic_aperature policy. That way, you can still reuse the existing ring implementation, but nothing in the deterministic_aperature policy will have to reach inside of the ring_hash policy, and it will allow using deterministic_aperature with any child policy.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need a separate RingConfig message. I think it would be fine to just copy the relevant fields from the RingHash message to the new DeterministicApertureLbConfig message.

But otherwise, yes, this sounds good!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @markdroth . Since the changes discussed here implies we change the existing RingHash load balancer to use the new refactored Ring implementation, I am wondering if that will increase the scope of this PR. One way to do this could be to not change the existing RingHash load balancer in this PR but have only DeterministicAperture load balancer use the new the Ring library. Then when we move the RingHash load balancer to the new extension model, change it to use the RIng library. I can give a shot at moving the RingHash load balancer to the extension model in another PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll let @wbpcode weigh in here, but IMHO, refactoring the ring functionality out of the ring_hash LB policy really has nothing to do with converting the ring_hash LB policy to be an extension. I think if you're going to be moving that functionality into its own library, it makes sense to change ring_hash to use that library in the same PR, rather than just duplicating the functionality and then removing the duplication later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Thanks for the clarification 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@markdroth I have refactored out the Ring concept. PTAL.
The docs build is still broken unfortunately.

Signed-off-by: Jojy George Varghese <jojy_varghese@apple.com>
Signed-off-by: Jojy George Varghese <jojy_varghese@apple.com>
Signed-off-by: Jojy George Varghese <jojy_varghese@apple.com>
Signed-off-by: Jojy George Varghese <jojy_varghese@apple.com>
Copy link
Contributor

@markdroth markdroth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The overall structure here looks good!

I'll let @wbpcode and the Envoy maintainers review the actual code. I have just one small comment on the API.

// uniform random distribution to select a backend from the range of overlapping backends. This
// along with P2C ensures uniform load distribution.
// [#next-free-field: 7]
message DeterministicApertureLbConfig {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This config still needs a field for the child policy config, or else there's no way to tell it what policy to actually use to pick from amongst the selected subset of endpoints. I think you need to add something like this:

config.cluster.v3.LoadBalancingPolicy child_policy = 7
    [(validate.rules).message = {required: true}];

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@markdroth Thanks for looking. Not sure what you mean. The DeterministicAperture load balancer policy is itself a LbPolicy of type LOAD_BALANCING_POLICY_CONFIG. The integration test in the PR shows the usage.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused. I thought that what this policy provided was a way to restrict the LB pick to a subset of the endpoints. Once we do that, we still need to define the policy to be used to pick from that subset of endpoints, and we were going to do that by delegating to a child policy.

It seems to me that we have two separate pieces of functionality here:

  1. Choosing what subset of endpoints to use.
  2. Choosing from among those endpoints for a particular request.

I think it makes sense to be able to configure these two things independently of each other. It seems useful for users to be able to decide whether or not to do the subsetting defined in this policy (item 1) and then independently choose what LB policy to use to pick from among the chosen subset of endpoints (item 2). For example, it should be possible to configure either ring-hash or round-robin for (2).

I think the right way to do this is to delegate to a child policy for (2), so that users can plug in whatever child policy they want.

I'm not sure what algorithm you're actually using for (2) here, but if it's not one of the existing policies, then I think it would make sense to add that as a separate child policy, which can be configured as a child of this one (or could even be used on its own, without the subsetting functionality provided by the deterministic_aperature policy).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@markdroth The policy here is at the same level as other load balancers in Envoy. So deterministic aperture is able to choose host from the subset of endpoints. This subset is derived by the deterministic aperture load balancer by overlapping the ring of endpoints with the ring of participating Envoys. So in short the load balancer here is self contained (like others - aka Maglev, Ring etc). Please let me know if that makes sense.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that this is intended to be used as a top-level LB policy. But that doesn't mean that it can't delegate to another policy for the actual pick from among the filtered subset of hosts.

What is the actual picking algorithm used here? Is there any reason that that algorithm needs to be tied to the subsetting behavior? If not, it seems more flexible to delegate to a child policy.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand what you are proposing, but I don't think it makes any sense to create a new interface here. The whole point of an LB policy is to provide the endpoint picking behavior across a set of endpoints, so creating a new interface to do the same thing makes no sense. That would add needless complexity and prevent people from using any of the existing policies as child policies.

To be clear, what I am suggesting here is that the P2C child policy should create its own ring rather than reusing the ring from the parent subsetting policy. There should not be any need to share any state between the parent and child policy except for the information that can be passed through the existing LB policy API.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am missing something here. The P2C Policy with its own Ring is the current state of the PR. The current PR exposes the P2C Policy + Ring as a load balancing policy which can be used as shown in the integration test.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I am saying is that there should be two different rings here: one in the parent policy that does subsetting, and one in the child policy that picks the endpoint from the selected subset. You should not need to pass the ring through the LB policy API when creating the child.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The child (P2C) would need access to the parent Ring (to ask what are my subsets). I am not sure in the case of P2C, if it needs a Ring for itself since all it needs is the section of the Ring it can pick from. The parent Ring provides API for child to ask give me the ring subsection(subset) so that I can apply my policy. Some other policy might decide to create its own Ring with the subset information but that depends on that policy.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's step back for a moment here. Let's set aside the details of the specific algorithm used in the two policies and instead talk about what each of the two policies is responsible for and how they interact with each other.

The LB policy API basically gives the policy a set of hosts and then asks it to pick from that set of hosts for each request. We want both the parent and the child policy to use that same API.

The goal of the parent policy is to decide what subset of hosts the child should pick from. In effect, the parent policy does not actually do any picking; all it does is filter the set of hosts that it is given so that it passes only a subset of those hosts to the child policy.

The goal of the child policy is to pick from the set of hosts that it is given by its parent. It should not matter to the child whether its parent is Envoy (i.e., it is the top-level policy) or whether its parent is another LB policy that did subsetting before delegating to it; either way, it is just responsible for picking from amongst the hosts it's been given.

I think the important properties here are:

  1. Both of these policies are implemented using the existing LB policy API, not creating some new API to do the same thing.
  2. There should be no communication between the two policies that cannot be expressed via the existing LB policy API.

So getting back to this specific situation, I don't think we want an API for the child to refer back to the ring in the parent policy, because that would tie the child to this parent; the child would no longer be able to be used by itself without the parent. Instead, I think we want the parent policy to build the ring, decide which subset to use, and then construct a list of hosts in that subset. The parent policy will then pass that list of hosts to the child in the same format as would be expected by any LB policy.

If the child policy doesn't actually need a ring for its algorithm, then it doesn't need to construct a ring. It can just do a uniform random distribution pick from among the set of hosts that are passed to it.

@KBaichoo
Copy link
Contributor

KBaichoo commented Jan 9, 2023

/wait

For merge conflicts, API changes discussion

@zuercher zuercher removed their assignment Jan 31, 2023
@zuercher
Copy link
Member

/wait

Is this PR still a going concern? Should we close it and wait for a future PR that addresses the API issues?

@conqerAtapple
Copy link
Contributor Author

@zuercher Sorry i got pulled into another task so did not get any time for this PR.

Need understanding of what @markdroth is proposing.

Thanks for the suggestion @markdroth

I am not sure if i completely understand the two layer policy. I am not sure why subsetting LB is part of this LB policy at all. The idea of this algorithm is to provide another LB algorithm (not a subsetting algorithm). This LB algorithm serves the same purpose as of all other core algorithms (example RingHash or RoundRobin). So maybe @markdroth could help me by explaining how RoundRobin algorithm could be implemented (assuming it was not already part of the current core implementations) in terms of his proposal. This will help me grasp the proposal.

@github-actions
Copy link

This pull request has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@github-actions github-actions bot added the stale stalebot believes this issue/PR has not been touched recently label Mar 30, 2023
@github-actions
Copy link

github-actions bot commented Apr 6, 2023

This pull request has been automatically closed because it has not had activity in the last 37 days. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api stale stalebot believes this issue/PR has not been touched recently waiting
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants