
Cluster-level hash policy for sticky routing #23060

Open
agrawroh opened this issue Sep 10, 2022 · 5 comments
Labels
area/cluster, area/load balancing, enhancement, help wanted

Comments

@agrawroh
Contributor

agrawroh commented Sep 10, 2022

Description

Currently, the hashing policy is defined on a per-route basis [Link] using hash_policy.
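For reference, a minimal sketch of what that looks like today (the cluster and header names are just placeholders):

```yaml
# Existing per-route mechanism; "some_backend" and "x-session-id" are placeholders.
route_config:
  virtual_hosts:
  - name: backend
    domains: ["*"]
    routes:
    - match: { prefix: "/" }
      route:
        cluster: some_backend
        hash_policy:            # only configurable here, per route
        - header:
            header_name: x-session-id
```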

We have a use case where we want to do sticky routing for all incoming traffic to the external ExtAuthZ and RateLimit services, but there is no good way to achieve this today.

We could benefit a lot from consistent-hash load balancing such as Ring Hash or Maglev: hashing on one of the HTTP headers would let us leverage the per-replica in-memory cache in these upstream services.
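To make the gap concrete, here is a rough sketch of the setup (names, addresses, and ports are placeholders): the ext_authz cluster can already use a consistent-hash LB, but there is nowhere to declare which header to hash on.

```yaml
# Sketch only; cluster name, address, and port are placeholders.
clusters:
- name: ext_authz_service
  type: STRICT_DNS
  lb_policy: RING_HASH          # or MAGLEV
  load_assignment:
    cluster_name: ext_authz_service
    endpoints:
    - lb_endpoints:
      - endpoint:
          address:
            socket_address: { address: ext-authz.internal, port_value: 9000 }

http_filters:
- name: envoy.filters.http.ext_authz
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.ext_authz.v3.ExtAuthz
    grpc_service:
      envoy_grpc:
        cluster_name: ext_authz_service   # no hash_policy equivalent available here
```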

Would it be possible to support an LB hash policy at the cluster level (i.e. for all routes)?

@agrawroh agrawroh added the enhancement and triage labels on Sep 10, 2022
@snowp snowp added the help wanted label and removed the triage label on Sep 12, 2022
@htuch
Member

htuch commented Sep 12, 2022

Yeah, there is support in the AsyncClient interface (via RequestOptions), but it isn't configurable in a uniform way for things like ext_authz. One option would be to add this to the GrpcService.EnvoyGrpc config. That would avoid any major changes such as having to mix routing logic into the ClusterManager. Does this work?
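To make that concrete, a hypothetical sketch of what such a knob might look like if it were added (this field does not exist today; it is only an illustration of the proposal):

```yaml
# Hypothetical -- "hash_policy" under envoy_grpc does not exist in the current API.
grpc_service:
  envoy_grpc:
    cluster_name: ext_authz_service
    hash_policy:                 # proposed, mirroring RouteAction.HashPolicy
    - header:
        header_name: x-session-id
```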

If not, I think the idea of having some cluster-wide control might have merit, but is a deeper discussion that would require @envoyproxy/api-shepherds and @mattklein123 to weigh in.

@agrawroh
Contributor Author

@htuch Thanks for chiming in. There was a similar request for mirroring traffic.
Would it be possible to identify some of the things that we currently only have on routes that would also make sense at the cluster level, and then think more about where they would best live?

@htuch
Member

htuch commented Sep 13, 2022

Yeah, there are others, e.g. fault injection. One thing I can offer here is a workaround - you can loop the ext_authz traffic back through a listener bound to localhost and have that apply a standard route table before it hits the real backend cluster. This is a total kludge, but if it helps your use case it might be worth considering.
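Roughly, the kludge looks like this (all names and ports are placeholders, and details such as the HTTP/2 settings needed for gRPC are omitted):

```yaml
# ext_authz points at a local loopback cluster ...
clusters:
- name: ext_authz_loopback
  type: STATIC
  load_assignment:
    cluster_name: ext_authz_loopback
    endpoints:
    - lb_endpoints:
      - endpoint:
          address:
            socket_address: { address: 127.0.0.1, port_value: 10000 }

# ... which is served by a local listener that applies a normal route table,
# including hash_policy, before forwarding to the real Ring Hash / Maglev cluster.
listeners:
- name: ext_authz_loopback_listener
  address:
    socket_address: { address: 127.0.0.1, port_value: 10000 }
  filter_chains:
  - filters:
    - name: envoy.filters.network.http_connection_manager
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
        stat_prefix: ext_authz_loopback
        http_filters:
        - name: envoy.filters.http.router
          typed_config:
            "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
        route_config:
          virtual_hosts:
          - name: ext_authz_loopback_vhost
            domains: ["*"]
            routes:
            - match: { prefix: "/" }
              route:
                cluster: real_ext_authz_cluster   # lb_policy: RING_HASH or MAGLEV
                hash_policy:
                - header:
                    header_name: x-session-id
```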

I think we should leave this issue open to gauge wider interest. The API design here would require some careful thought.

@agrawroh
Contributor Author

Thanks, @htuch! That's exactly what we are doing right now for mirroring & splitting the traffic :)

One more question, if you know the answer off the top of your head: if we have hash_policy defined on the routes, and the clusters to which the traffic is being mirrored/split use a RING_HASH or MAGLEV LB policy, would that give us sticky routing? Or would the hash_policy be ignored and the split be effectively random?

@htuch
Member

htuch commented Sep 13, 2022

In both cases Envoy is using an independent per-cluster HTTP async client with its own pseudo-config, so I strongly suspect the answer is that it will ignore the original route hash policy.
