
LB: introduce randomization in locality LB scheduler initialization#32075

Merged
adisuissa merged 6 commits into envoyproxy:main from adisuissa:locality_sched_plumb_seed
Feb 6, 2024

Conversation

@adisuissa
Contributor

Commit Message: LB: introduce randomization in locality LB scheduler initialization
Additional Description:
Following up on the randomized picking introduced in #31592, this PR plumbs a seed into the locality-LB.
Prior to this PR, whenever a new EDS assignment arrived, the locality pick order was reset to a fixed starting point.
For example, with 2 localities, locality_A with weight 99 and locality_B with weight 1, an EDS update (one that doesn't change the weights) could restart the "pick list" from the beginning.
This may skew the traffic distribution, especially across a fleet of Envoys that receive periodic assignment updates while serving very low QPS.

The change ensures that the locality pick order starts from a randomized point.

Risk Level: Medium - a bug fix that may impact current traffic flows.
Testing: Added tests, and updated old ones.
Docs Changes: None.
Release Notes: Added.
Platform Specific Features: N/A.
Runtime guard: Added envoy.reloadable_features.edf_lb_locality_scheduler_init_fix
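To make the idea concrete, here is a minimal, self-contained sketch of the mechanism the PR describes. It is an assumed shape, not Envoy's actual `EdfScheduler` code: a weighted (EDF-style) picker whose constructor performs `seed % num_entries` warm-up picks, so instances built with different seeds start at different offsets in the pick order instead of all restarting from the same fixed position.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>
#include <vector>

// Illustrative sketch (assumed shape, not Envoy's actual EdfScheduler):
// without a seed, every rebuild of the weighted schedule starts from the
// same fixed position. Performing `seed % num_entries` warm-up picks at
// construction time desynchronizes instances with different seeds.
struct Entry {
  double weight;
  int id;
  double deadline = 0.0;  // virtual time of this entry's next selection
};

class SeededEdfSketch {
 public:
  SeededEdfSketch(std::vector<Entry> entries, uint64_t seed) {
    for (Entry& e : entries) {
      e.deadline = 1.0 / e.weight;
      heap_.push(e);
    }
    // Desynchronize: discard a seed-dependent number of initial picks.
    const uint64_t pre_picks = entries.empty() ? 0 : seed % entries.size();
    for (uint64_t i = 0; i < pre_picks; ++i) pick();
  }

  int pick() {
    Entry e = heap_.top();
    heap_.pop();
    e.deadline += 1.0 / e.weight;  // reschedule proportionally to weight
    heap_.push(e);
    return e.id;
  }

 private:
  struct LaterDeadline {
    bool operator()(const Entry& a, const Entry& b) const {
      return a.deadline > b.deadline;  // min-heap on deadline
    }
  };
  std::priority_queue<Entry, std::vector<Entry>, LaterDeadline> heap_;
};
```

Two such instances fed the same entries but different seeds begin their pick sequences at different points, which is exactly the desynchronization the fleet-wide skew fix relies on.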

Signed-off-by: Adi Suissa-Peleg <adip@google.com>
@repokitteh-read-only

CC @envoyproxy/runtime-guard-changes: FYI only for changes made to (source/common/runtime/runtime_features.cc).

Caused by: #32075 was opened by adisuissa.

@adisuissa
Contributor Author

Assigning load-balancing relevant maintainers.
This PR is mostly plumbing (unfortunately needed in many places of the code) and tests.
/assign @wbpcode @nezdolik @htuch

locality_entries.emplace_back(std::make_shared<LocalityEntry>(i, effective_weight));
}
}
// If not all effective weights were zero, create the scheduler.
Member

Do we still want to create a scheduler even if all weights are equal?

Contributor Author

AFAICT this was the behavior prior to this PR, and I haven't modified that.

This comment (and code) is looking at the case where all weights are zero. The end-result of this method (both prior to this change and after this change) is that if all weights are zero, the locality_scheduler will be null.
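The behavior described here can be illustrated with a small standalone sketch (assumed shape and hypothetical names, not Envoy's actual types): zero-weight localities are never added as entries, so when every effective weight is zero the entry list stays empty and the scheduler pointer remains null.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Minimal illustration (assumed shape, not Envoy's actual types):
// zero-weight localities are skipped, so if all weights are zero the
// entry list is empty and no scheduler is created.
struct SchedulerStub {
  std::size_t num_entries;
};

std::unique_ptr<SchedulerStub> buildScheduler(const std::vector<uint32_t>& weights) {
  std::vector<uint32_t> entries;
  for (uint32_t w : weights) {
    if (w > 0) entries.push_back(w);  // skip zero-weight localities
  }
  if (entries.empty()) return nullptr;  // all weights zero -> null scheduler
  return std::make_unique<SchedulerStub>(SchedulerStub{entries.size()});
}
```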

Signed-off-by: Adi Suissa-Peleg <adip@google.com>
…b_seed

Signed-off-by: Adi Suissa-Peleg <adip@google.com>
hosts_added, hosts_removed, absl::nullopt, absl::nullopt);
priority_set_.updateHosts(
0, HostSetImpl::partitionHosts(hosts_, hosts_per_locality_), {}, hosts_added, hosts_removed,
server_context_.api().randomGenerator().random(), absl::nullopt, absl::nullopt);
Member

I checked randomGenerator().random() briefly; it does not seem to rely on any properties that would increase the chance of duplicate seeds across a large Envoy fleet (e.g. by using the current time in the seed).

Contributor Author

I think it is quite challenging to sync the time across a fleet, especially if we are thinking in the context of EDS updates that are being sent to many Envoys (and each triggers the update at a slightly different time).
BTW I'm also basing this code on the comment that already exists in:

// Seed to allow us to desynchronize load balancers across a fleet. If we don't
// do this, multiple Envoys that receive an update at the same time (or even
// multiple load balancers on the same host) will send requests to
// backends in roughly lock step, causing significant imbalance and potential
// overload.
const uint64_t seed_;

locality_scheduler = std::make_unique<EdfScheduler<LocalityEntry>>(
EdfScheduler<LocalityEntry>::createWithPicks(
locality_entries,
[](const LocalityEntry& entry) { return entry.effective_weight_; }, seed));
Member

do we need to ensure that seed > 0?

Contributor Author

seed is defined as a uint64_t in the input of the current function, and as a uint32_t in the input of createWithPicks(), so both are unsigned and therefore non-negative. The implicit cast seems fine to me, as it just determines the number of initial picks, which would probably be bad anyway if it were more than 10^6.
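The narrowing behavior being discussed can be demonstrated with a tiny sketch (assumed semantics and hypothetical function name, not Envoy's exact signatures): a uint64_t seed passed into a uint32_t parameter is reduced modulo 2^32 by the implicit conversion, and the scheduler then bounds the warm-up count further, so a huge seed never translates into a huge number of initial picks.

```cpp
#include <cassert>
#include <cstdint>

// Sketch of the narrowing (assumed semantics, hypothetical name):
// the uint64_t -> uint32_t conversion happens at the call site and is
// well-defined for unsigned types (reduction modulo 2^32); a further
// `% num_entries` bounds the number of warm-up picks.
inline uint32_t effectivePicks(uint32_t seed, uint32_t num_entries) {
  return num_entries == 0 ? 0 : seed % num_entries;
}
```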

Member

I was curious whether a 0 value (which is a valid uint64_t) can lead to unpredictable results.

Contributor Author

This was validated in this test.

Member

@nezdolik nezdolik left a comment

if edf_lb_locality_scheduler_init_fix flag is now on by default, we need to test both code branches in the tests

…b_seed

Signed-off-by: Adi Suissa-Peleg <adip@google.com>
Signed-off-by: Adi Suissa-Peleg <adip@google.com>
@adisuissa
Contributor Author

if edf_lb_locality_scheduler_init_fix flag is now on by default, we need to test both code branches in the tests

In some cases we can do a parameterized test that works with both. I think the question is what is the added value of this, and in this case I'm not sure the benefit outweighs the cost.
I'm not opposing doing this, just looking for the reasons to increase test time.

@adisuissa
Contributor Author

Thanks for the detailed review and comments @nezdolik!

@nezdolik
Member

if edf_lb_locality_scheduler_init_fix flag is now on by default, we need to test both code branches in the tests

In some cases we can do a parameterized test that works with both. I think the question is what is the added value of this, and in this case I'm not sure the benefit outweighs the cost. I'm not opposing doing this, just looking for the reasons to increase test time.

Would not executing code from both if branches affect coverage?

nezdolik
nezdolik previously approved these changes Jan 30, 2024
@adisuissa
Contributor Author

if edf_lb_locality_scheduler_init_fix flag is now on by default, we need to test both code branches in the tests

In some cases we can do a parameterized test that works with both. I think the question is what is the added value of this, and in this case I'm not sure the benefit outweighs the cost. I'm not opposing doing this, just looking for the reasons to increase test time.

Would not executing code from both if branches affect coverage?

Yes. I added a specific test that sets the runtime-flag to false, to validate that it works as expected, and that will cover the old-path code.
If you think there are other tests that may benefit from that, let me know which.

@nezdolik
Member

if edf_lb_locality_scheduler_init_fix flag is now on by default, we need to test both code branches in the tests

In some cases we can do a parameterized test that works with both. I think the question is what is the added value of this, and in this case I'm not sure the benefit outweighs the cost. I'm not opposing doing this, just looking for the reasons to increase test time.

Would not executing code from both if branches affect coverage?

Yes. I added a specific test that sets the runtime-flag to false, to validate that it works as expected, and that will cover the old-path code. If you think there are other tests that may benefit from that, let me know which.

np, sorry I missed that test, there was a lot of affected test code :)

// If not all effective weights were zero, create the scheduler.
if (!locality_entries.empty()) {
locality_scheduler = std::make_unique<EdfScheduler<LocalityEntry>>(
EdfScheduler<LocalityEntry>::createWithPicks(
Member

@wbpcode wbpcode Feb 1, 2024

This is LGTM overall.

I have only one question. The createWithPicks will use the seed % 429496729 as the pre picks number.
And because the seed is a random value from random() method, so it's possible we may do 429496728 times pre-picks when we update the hosts.

Do we acutally evalute the possible performance impact at the worst case? If the users update the host set frequently, will this bring huge performance burden?

(PS: the host scheduler will use seed % hosts.size() as the pre picks number, I think this should be some similar things? like we can use seed % locality_entries.size() or std::max(8, seed % locality_entries.size()) here?

Contributor Author

This is LGTM overall.

I have only one question. createWithPicks will use seed % 429496729 as the number of pre-picks. And because the seed is a random value from the random() method, it's possible we may do 429496728 pre-picks when we update the hosts.

Have we actually evaluated the possible performance impact in the worst case? If users update the host set frequently, will this bring a huge performance burden?

While conceptually the number of pre-picks could be large, the code is bounded by the number of entries/weights (N; see here), and the operation has O(N*log(N)) complexity, which is the same as it is now (because the localities are added one after another to the priority queue).
A proof of the bounded number of iterations can be found in the doc linked from this comment.

(PS: the host scheduler will use seed % hosts.size() as the number of pre-picks; shouldn't this be similar? E.g. we could use seed % locality_entries.size() or std::max(8, seed % locality_entries.size()) here?)

Yes, the next PR will be to fix the wrong initialization there as well.
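The bounded-cost argument in this thread can be sketched as follows. This is a hypothetical reconstruction of the idea, not Envoy's actual createWithPicks code: simulating `picks` individual EDF selections would cost O(picks * log N), but because EDF hands entry i roughly picks * w_i / W of those selections (W = total weight), each entry's post-pick deadline can be derived in closed form from the virtual time T = picks / W, so the work depends only on the number of entries N.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical reconstruction of the closed-form initialization idea
// (not Envoy's actual code): derive each entry's deadline after
// `picks` selections directly, instead of simulating every pick.
struct InitEntry {
  double weight;
  double deadline;         // virtual time of the entry's next selection
  uint64_t implied_picks;  // selections the entry would have received
};

std::vector<InitEntry> initWithPicks(const std::vector<double>& weights,
                                     uint64_t picks) {
  double total = 0.0;
  for (double w : weights) total += w;
  const double t = static_cast<double>(picks) / total;  // virtual time after `picks`
  std::vector<InitEntry> out;
  for (double w : weights) {
    const uint64_t got = static_cast<uint64_t>(std::floor(t * w));
    // The (got + 1)-th selection of this entry happens at (got + 1) / w.
    out.push_back({w, static_cast<double>(got + 1) / w, got});
  }
  return out;
}
```

Because each per-entry count is floored, the total implied picks can undershoot the requested count by at most N, which is why the cost stays bounded regardless of how large the random seed is.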

Member

@wbpcode wbpcode Feb 1, 2024

I actually don't completely get it. I will read the code in more detail tomorrow. Will unblock the PR first.

Contributor Author

feel free to reach out directly and I can try to give a guided explanation.

Member

I got it. Sorry for the previous negligence.

I have to say, it's awesome 👍.

Member

@wbpcode wbpcode left a comment

LGTM overall. I marked it as request changes because there is an important question to be checked first.

wbpcode
wbpcode previously approved these changes Feb 1, 2024
htuch
htuch previously approved these changes Feb 2, 2024
…b_seed

Signed-off-by: Adi Suissa-Peleg <adip@google.com>
@adisuissa adisuissa dismissed stale reviews from htuch, wbpcode, and nezdolik via 1e82534 February 5, 2024 13:56
@adisuissa
Contributor Author

/retest

Member

@wbpcode wbpcode left a comment

LGTM. Thanks.
