load_balancing: Add Load Aware Locality LB Policy #43784

Open

jukie wants to merge 35 commits into envoyproxy:main from jukie:load-aware-locality-lb
Conversation

@jukie
Contributor

@jukie jukie commented Mar 5, 2026

Commit Message: load balancing: add load-aware locality-picking LB policy

Additional Description: New LB policy (envoy.load_balancing_policies.load_aware_locality) that uses ORCA utilization data to choose localities based on real-time headroom, as proposed in #43665.

Risk Level: low (new extension behind a new config proto, core changes are additive)

Testing: Unit tests for config validation, weight computation (EWMA, variance threshold, probe percentage, all overloaded fallback, topology changes), per-locality host partitioning, and weighted random selection. Integration test with a multi-locality cluster verifying traffic shifts in response to ORCA utilization reports.

Docs Changes: proto comments serve as initial documentation.

Release Notes: added new extension envoy.load_balancing_policies.load_aware_locality.

Platform Specific Features: n/a
Fixes #43665
API Considerations:

  • New proto LoadAwareLocality in envoy/extensions/load_balancing_policies/load_aware_locality/v3/.
    • endpoint_picking_policy field accepts any endpoint-picking child policy.
    • metric_names_for_computing_utilization is declared but not yet honored (documented in proto comments as TODO).
  • New virtual method orcaUtilization() on HostDescription interface.

AI was used during implementation and for writing tests, but I fully understand the changes here.

@repokitteh-read-only

As a reminder, PRs marked as draft will not be automatically assigned reviewers,
or be handled by maintainer-oncall triage.

Please mark your PR as ready when you want it to be reviewed!

🐱

Caused by: #43784 was opened by jukie.

see: more, trace.

Signed-off-by: jukie <10012479+jukie@users.noreply.github.com>
@jukie jukie force-pushed the load-aware-locality-lb branch from 796eb56 to 8509dc8 on March 5, 2026 05:51
jukie added 2 commits March 4, 2026 23:27
Signed-off-by: jukie <10012479+jukie@users.noreply.github.com>
Signed-off-by: jukie <10012479+jukie@users.noreply.github.com>
@jukie jukie force-pushed the load-aware-locality-lb branch from e82f25f to 83f0b03 on March 5, 2026 07:29
@jukie
Contributor Author

jukie commented Mar 5, 2026

This approach removes the OrcaWeightManager extraction (#43695) as a prerequisite: the policy reads utilization directly via a new lightweight per-host OrcaUtilizationStore rather than reusing CSWRR's weight management machinery.

How it works:

  • A main-thread timer periodically reads host->orcaUtilization() from healthy hosts, averages per locality, applies EWMA smoothing, and computes capacity-weighted headroom (healthy_hosts × (1 - utilization)).
  • The resulting routing weights are pushed to workers via TLS.
  • Workers select a locality by weighted random and delegate endpoint selection to per-locality child LB instances (e.g., round_robin, client_side_weighted_round_robin).
LoadAwareLocalityLoadBalancer (main thread, ThreadAwareLoadBalancer)
  |
  |-- weight_update_timer_ (plain Event::Timer)
  |     Fires periodically to recompute locality routing weights.
  |
  |-- on timer callback:
  |     computeLocalityRoutingWeights()
  |       Reads host->orcaUtilization() from each healthy host, averages
  |       per locality, applies EWMA smoothing, computes capacity weighted
  |       by healthy host count (healthy_hosts * headroom), checks variance
  |       threshold for local-zone preference, applies probe percentage,
  |       publishes immutable snapshot.
  |
  +-- WorkerLocalLbFactory (shared across all workers)
        |
        |-- child_thread_aware_lb_ (single shared child ThreadAwareLoadBalancer)
        |     Created and initialized on the main thread. Workers call
        |     factory()->create() from it to build per-locality worker LBs.
        |
        |-- tls_ (ThreadLocal::TypedSlot<ThreadLocalShim>)
        |     Main thread pushes RoutingWeightsSnapshot to workers via
        |     runOnAllThreads(). Workers read from their TLS slot with
        |     zero synchronization.
        |
        +-- create() --> WorkerLocalLb (one per worker thread)
              |
              |-- selectLocality() [weighted random by capacity]
              |     Weighted random across localities proportional to
              |     host-count-weighted headroom.
              |
              +-- per_locality_[] (one PerLocalityState per locality)
                    |-- PrioritySetImpl (hosts for this locality only)
                    +-- LoadBalancer (worker-local child instance, e.g., RoundRobin)
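
The weight computation above can be sketched in isolation. This is a hypothetical standalone sketch, not the actual Envoy code; `LocalityState`, `computeRoutingWeight`, and the `alpha` parameter are illustrative names:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Per-locality smoothing state (illustrative; not Envoy's real types).
struct LocalityState {
  std::vector<double> host_utilization; // latest ORCA utilization per healthy host
  double ewma = 0.0;                    // smoothed locality utilization
  bool initialized = false;
};

// Routing weight = healthy_hosts * (1 - smoothed_utilization), i.e. capacity
// weighted by healthy host count. `alpha` is the EWMA smoothing factor in (0, 1].
double computeRoutingWeight(LocalityState& s, double alpha) {
  double sum = 0.0;
  for (double u : s.host_utilization) {
    sum += u;
  }
  const double avg = s.host_utilization.empty() ? 0.0 : sum / s.host_utilization.size();
  // EWMA smoothing; the first observation seeds the average directly.
  s.ewma = s.initialized ? alpha * avg + (1.0 - alpha) * s.ewma : avg;
  s.initialized = true;
  const double headroom = std::max(1.0 - s.ewma, 0.0);
  return static_cast<double>(s.host_utilization.size()) * headroom;
}
```

For example, a locality with three healthy hosts each reporting 0.5 utilization gets weight 3 × 0.5 = 1.5; as smoothed utilization rises, the weight decays toward zero.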

Design decisions:

  • Locality selection and endpoint selection are fully decoupled — any endpoint-picking child policy works, including CSWRR (no lbPolicyData() conflicts since utilization flows through a separate OrcaUtilizationStore channel).
  • Local-zone preference: when local utilization is within a configurable variance threshold of the remote average, the policy routes 100% of traffic locally.
  • Probe percentage: ensures a minimum fraction of traffic reaches remote localities to keep ORCA data fresh even in all-local mode. If out-of-band reporting is added later, it could serve as an alternative that avoids cross-zone probe traffic entirely.
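
A rough sketch of how the variance threshold and probe percentage could interact at pick time (hypothetical and simplified; when the probe fires, this sketch falls through to the ordinary weighted pick rather than forcing a remote locality):

```cpp
#include <cassert>
#include <random>
#include <vector>

// Picks a locality index; index 0 is the local locality. Illustrative only.
size_t selectLocality(const std::vector<double>& weights, double local_utilization,
                      double remote_avg_utilization, double variance_threshold,
                      double probe_fraction, std::mt19937& rng) {
  std::uniform_real_distribution<double> unit(0.0, 1.0);
  // Local-zone preference: if local load is within the variance threshold of
  // the remote average, route locally, except for a probe fraction of picks
  // that still consider remote localities to keep ORCA data fresh.
  if (local_utilization <= remote_avg_utilization + variance_threshold &&
      unit(rng) >= probe_fraction) {
    return 0;
  }
  // Weighted random across localities, proportional to headroom-based weight.
  double total = 0.0;
  for (double w : weights) {
    total += w;
  }
  double r = unit(rng) * total;
  for (size_t i = 0; i < weights.size(); ++i) {
    if (weights[i] > 0.0) {
      r -= weights[i];
      if (r <= 0.0) {
        return i;
      }
    }
  }
  return 0; // all weights zero: fall back to local
}
```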

jukie added 2 commits March 5, 2026 01:36
Signed-off-by: jukie <10012479+jukie@users.noreply.github.com>
Signed-off-by: jukie <10012479+jukie@users.noreply.github.com>
jukie added 2 commits March 5, 2026 08:33
Signed-off-by: jukie <10012479+jukie@users.noreply.github.com>
Signed-off-by: jukie <10012479+jukie@users.noreply.github.com>
@repokitteh-read-only

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to (api/envoy/|docs/root/api-docs/).
envoyproxy/api-shepherds assignee is @wbpcode
CC @envoyproxy/api-watchers: FYI only for changes made to (api/envoy/|docs/root/api-docs/).

🐱

Caused by: #43784 was ready_for_review by jukie.

see: more, trace.

Signed-off-by: jukie <10012479+jukie@users.noreply.github.com>
Contributor

@markdroth markdroth left a comment


This approach looks really good from an xDS API perspective!

I'll let one of the Envoy maintainers review the implementation.

@markdroth markdroth self-assigned this Mar 5, 2026
Signed-off-by: jukie <10012479+jukie@users.noreply.github.com>
@markdroth
Contributor

/lgtm api

@repokitteh-read-only repokitteh-read-only bot removed the api label Mar 5, 2026
jukie and others added 3 commits March 6, 2026 08:45
Signed-off-by: jukie <10012479+jukie@users.noreply.github.com>
Signed-off-by: Isaac Wilson <isaac.wilson514@gmail.com>
Signed-off-by: jukie <10012479+jukie@users.noreply.github.com>
@jukie
Contributor Author

jukie commented Mar 6, 2026

/retest

@agrawroh
Member

agrawroh commented Mar 6, 2026

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a new load balancing policy, load_aware_locality, which distributes traffic between localities based on real-time utilization data from ORCA reports. The implementation is robust, featuring EWMA smoothing for utilization metrics, a variance threshold to prefer the local locality when load is balanced, and a configurable probe percentage to ensure telemetry freshness. The code is well-structured, with clear separation between main-thread weight computation and per-worker load balancing logic. It also includes comprehensive unit and integration tests that cover a wide range of scenarios, including dynamic load shifts and various configuration parameters. The changes are additive and well-contained within the new extension. Overall, this is a high-quality contribution that adds significant new functionality to Envoy's load balancing capabilities.

Note: Security Review did not run due to the size of the PR.

@RyanTheOptimist RyanTheOptimist removed the "deps" label (Approval required for changes to Envoy's external dependencies) Mar 9, 2026
jukie added 3 commits March 10, 2026 22:48
Signed-off-by: jukie <10012479+jukie@users.noreply.github.com>
Signed-off-by: jukie <10012479+jukie@users.noreply.github.com>
…icies

Signed-off-by: jukie <10012479+jukie@users.noreply.github.com>
@jukie
Contributor Author

jukie commented Mar 12, 2026

/retest

@phlax
Member

phlax commented Mar 12, 2026

needs main merge - unrelated go problem is fixed there

jukie added 4 commits March 12, 2026 07:44
Signed-off-by: jukie <10012479+jukie@users.noreply.github.com>
Signed-off-by: jukie <10012479+jukie@users.noreply.github.com>
Signed-off-by: jukie <10012479+jukie@users.noreply.github.com>
return absl::InvalidArgumentError(
    absl::StrCat("Unsupported endpoint picking policy for load_aware_locality: ",
                 endpoint_picking_policy_factory->name(),
                 ". Child policies must support locality-scoped worker instantiation."));
Contributor


I'm not super familiar with Envoy's implementation, so I don't understand why we have this restriction, but I will note that it seems sub-optimal. In principle, the child policy and the parent policy should have exactly the same API, so delegation should be possible to any child policy. If that's not the case, then maybe we need some changes in Envoy's LB policy API to make that possible.

Contributor Author

@jukie jukie Mar 12, 2026


This new locality policy only works with child policies whose logic can be preserved on a locality-scoped set of hosts. This is similar to the limitations with combining weighted clusters with consistent hashing (#21675). When the host set is split before the child policy runs, policies like ring_hash and maglev lose their expected behavior because they're building their hash structures from an incomplete view of the host set.

If you'd prefer that choice be up to the user I can remove this restriction and document it as a limitation.

Contributor


load_aware_locality only works with child policies whose logic can be preserved on a locality-scoped set of hosts. Policies that depend on the full cluster's host set (such as the hashing-based policies rejected here) don't fit that model

I don't understand what that means, since I'm not that familiar with the Envoy LB policy API. But in principle, I would expect that the parent policy and child policy should both accept the list of endpoints in the same structure, since they are both implementing the same LB policy API. Given that, it's not clear to me what restrictions we would have here: I would expect that the parent policy would essentially just filter the list of endpoints it passes down to the child policy, and the child policy would just pick from among that filtered set of endpoints. What am I missing?

Contributor Author

@jukie jukie Mar 12, 2026


I would expect that the parent policy would essentially just filter the list of endpoints it passes down to the child policy

That's exactly what happens here, but the problem is policies like ring_hash or maglev, which depend on building a hash structure over the full cluster host set. Once the parent narrows that to a single locality, the child is no longer implementing the same policy.
To really support that, the parent would need to implement something like locality-aware hashing so a given hash key maps consistently to the same locality. That isn't necessarily impossible in the future, but it is a bit at odds with the core goal of this policy, which is to route based on load. I can add that on top if you feel strongly, but I personally feel that would be better suited as a locality-aware hashing LB policy.
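
A toy illustration of this point (hypothetical code, deliberately simplified, not Envoy's ring_hash): the same request hash can land on a different host once the ring is built from a locality-filtered subset.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Deterministic toy hash so the example is reproducible.
size_t toyHash(const std::string& s) {
  size_t h = 0;
  for (char c : s) {
    h = h * 131 + static_cast<unsigned char>(c);
  }
  return h;
}

// Minimal consistent-hash pick: each host gets one ring position, and a
// request maps to the first position at or after its hash (wrapping around).
std::string pick(const std::vector<std::string>& hosts, size_t request_hash) {
  std::map<size_t, std::string> ring;
  for (const auto& h : hosts) {
    ring.emplace(toyHash(h), h);
  }
  auto it = ring.lower_bound(request_hash);
  return it == ring.end() ? ring.begin()->second : it->second;
}
```

A request whose hash maps to a zone-b host in the full ring necessarily maps elsewhere when the parent has filtered the ring down to zone-a hosts, so the child is no longer computing the same function.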

Contributor Author

@jukie jukie Mar 12, 2026


Again, this is similar to other limitations that exist in Envoy (a good list is in envoyproxy/gateway#5307 (comment)) so documenting it as a known limitation but still accepting these child policies could be reasonable as well. I'm open to changing this.

Contributor


If I'm understanding right, you're saying that the child's behavior won't be exactly the same underneath the locality-picking policy, because of things like the fact that ring_hash won't have the same set of hosts in its ring and will therefore not wind up picking the same endpoint for the same request hash?

If so, I think that's fine and is not something we should be disallowing. It's certainly something that people need to understand if they choose to configure those policies underneath a locality-picking policy, but I don't think we should go out of our way to disallow that.

I also think this hard-coded allow list is going to wind up being brittle, because no one will ever remember to update it when we add new LB policies over time.

Contributor Author


Thanks for the feedback, updated.

jukie added 8 commits March 12, 2026 09:58
After a host removal tears down a locality's child LB, the routing
snapshot may still assign weight to that locality, causing chooseHost
to return nullptr despite other localities having healthy hosts.

Add pickLocalityLb helper that falls back to any locality with a
usable child LB. Also propagate empty-delta priority updates to
child LBs.
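
The fallback described in this commit message can be sketched as follows (hypothetical and simplified; `PerLocalityState` and the helper signature are illustrative):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Worker-local per-locality state (illustrative).
struct PerLocalityState {
  bool has_child_lb = false; // false once a host removal tore down the child LB
};

// Prefer the locality chosen by the weighted pick; if its child LB is gone,
// fall back to any locality that still has a usable child LB. Returns -1 if
// none exists (chooseHost would then return nullptr).
int pickLocalityLb(const std::vector<PerLocalityState>& localities, size_t picked) {
  if (picked < localities.size() && localities[picked].has_child_lb) {
    return static_cast<int>(picked);
  }
  for (size_t i = 0; i < localities.size(); ++i) {
    if (localities[i].has_child_lb) {
      return static_cast<int>(i);
    }
  }
  return -1;
}
```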

Signed-off-by: jukie <10012479+jukie@users.noreply.github.com>
Signed-off-by: jukie <10012479+jukie@users.noreply.github.com>
Signed-off-by: jukie <10012479+jukie@users.noreply.github.com>
Signed-off-by: jukie <10012479+jukie@users.noreply.github.com>
… too much

Signed-off-by: jukie <10012479+jukie@users.noreply.github.com>
Signed-off-by: jukie <10012479+jukie@users.noreply.github.com>
Signed-off-by: jukie <10012479+jukie@users.noreply.github.com>
Signed-off-by: jukie <10012479+jukie@users.noreply.github.com>
@jukie jukie requested a review from markdroth March 12, 2026 22:58
Signed-off-by: jukie <10012479+jukie@users.noreply.github.com>
Comment on lines +79 to +105
class OrcaUtilizationStore {
public:
  double get() const { return value_.load(std::memory_order_relaxed) / kScale; }

  // Set utilization with a monotonic timestamp (milliseconds since epoch).
  void set(double utilization, int64_t monotonic_time_ms) {
    // Reject non-finite values: std::clamp and the uint32 cast have undefined behavior for them.
    if (!std::isfinite(utilization)) {
      return;
    }
    // Clamp to [0, 1]. The fixed-point uint32 encoding cannot represent values outside this range:
    // negative values would wrap around on cast, and values above 1.0 would overflow kScale.
    // ORCA also defines utilization in [0, 1].
    utilization = std::clamp(utilization, 0.0, 1.0);
    value_.store(static_cast<uint32_t>(utilization * kScale), std::memory_order_relaxed);
    last_update_time_ms_.store(monotonic_time_ms, std::memory_order_release);
  }

  // Returns the monotonic timestamp (ms since epoch) of the last set() call,
  // or 0 if set() has never been called.
  int64_t lastUpdateTimeMs() const { return last_update_time_ms_.load(std::memory_order_acquire); }

private:
  static constexpr double kScale = 10000.0;
  std::atomic<uint32_t> value_{0};
  std::atomic<int64_t> last_update_time_ms_{0};
};
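
To see the fixed-point encoding in action, here is a trimmed standalone copy of the class with a usage sketch (behavior as documented in the snippet above; the values are arbitrary):

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cmath>
#include <cstdint>

// Trimmed copy of the store above, for a self-contained demo.
class OrcaUtilizationStore {
public:
  double get() const { return value_.load(std::memory_order_relaxed) / kScale; }

  void set(double utilization, int64_t monotonic_time_ms) {
    if (!std::isfinite(utilization)) {
      return; // NaN/Inf rejected: neither value nor timestamp is updated.
    }
    utilization = std::clamp(utilization, 0.0, 1.0);
    value_.store(static_cast<uint32_t>(utilization * kScale), std::memory_order_relaxed);
    last_update_time_ms_.store(monotonic_time_ms, std::memory_order_release);
  }

  int64_t lastUpdateTimeMs() const { return last_update_time_ms_.load(std::memory_order_acquire); }

private:
  static constexpr double kScale = 10000.0;
  std::atomic<uint32_t> value_{0};
  std::atomic<int64_t> last_update_time_ms_{0};
};
```

Round-trips are accurate to 1/10000 of utilization, out-of-range values clamp to [0, 1], and a NaN leaves both the value and the timestamp untouched.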
Member


Is this necessary here? I think we may prefer the solution in client_side_weighted_round_robin? That is to say, the LB will implement the specific HostLbPolicyData implementation and will compute the necessary utilization?

The HostLbPolicyData is designed to do this task.

Comment on lines +402 to +407
  /**
   * @return true if this LB policy reads per-host ORCA load report data.
   *         When true, the router will parse ORCA load reports from upstream
   *         responses.
   */
  virtual bool requiresOrcaLoadReports() const { return false; }
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can move requiresOrcaLoadReports to the LoadBalancerConfig, and add another absl::optional<double> onOrcaLoadReport(const Upstream::OrcaLoadReport& report); there to calculate the utilization based on the configuration.

Then, combined with the new OrcaUtilizationStore, we might get a better solution than the previous HostLbPolicyData?

Contributor Author


That makes sense, it would allow the LoadBalancerConfig to control which ORCA fields get extracted instead of the current hardcoded logic. I'll take a look at this.

I initially took a stab at extracting HostLbPolicyData into a central class with #43695 but I'll re-think this approach.

@wbpcode
Member

wbpcode commented Mar 17, 2026

cc @jukie recently we have had lots of new LB policies. Orz. See #43588, where LbPolicyData is also used.

I think we could also enhance the current LbPolicyData mechanism to support multiple LbPolicyData entries on a single host. 🤔

@paul-r-gall
Contributor

@jukie I agree with @wbpcode's recommendation for enhancing LbPolicyData to have multiple on a single host.

@jukie
Contributor Author

jukie commented Mar 17, 2026

Sounds good, thanks for the feedback!

@jukie
Contributor Author

jukie commented Mar 17, 2026

@paul-r-gall @wbpcode starting on the multi-entry approach in #43995 if you could take a look please. Let me know if you'd rather see that in this PR.

/*/extensions/load_balancing_policies/client_side_weighted_round_robin @wbpcode @adisuissa @efimki
/*/extensions/load_balancing_policies/override_host @yanavlasov @tonya11en
/*/extensions/load_balancing_policies/wrr_locality @wbpcode @adisuissa @efimki
/*/extensions/load_balancing_policies/load_aware_locality @wbpcode @adisuissa @efimki @jukie
Member


You could remove me from the owner list, because we may not have enough bandwidth to sponsor this new extension. 😞

Development

Successfully merging this pull request may close these issues.

Proposal: ORCA-driven locality routing for zone-aware load balancing