
Cross instances local rate limit filter #34230

Open
wbpcode opened this issue May 17, 2024 · 11 comments
Labels
area/ratelimit enhancement Feature requests. Not bugs or questions.

Comments

@wbpcode
Member

wbpcode commented May 17, 2024

Title: Cross instances local rate limit filter

Description:

Local rate limiting is more stable and has no additional dependencies. It is basically our first choice for rate limiting.

The only shortcoming of local rate limiting is that the token bucket configuration applies independently to each Envoy instance.

This means the number of Envoy replicas affects the effective total throughput of the limiter.

This is unfriendly to users who don't know the technical details, and the replica count may change dynamically.

But Envoy can actually know its own replica count, via the local cluster.

So I think it's possible to let all Envoy instances (in the same gateway cluster, or in the same service of a mesh) share a token bucket. Every instance would be pre-allocated part of the bucket by a specific algorithm (for example, even allocation), and when the membership of the local cluster changes, we re-execute the algorithm.
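The even-allocation idea above can be sketched in a few lines. This is a hypothetical illustration, not Envoy's actual API; the function name and fallback behavior are assumptions:

```python
def per_instance_tokens(total_tokens_per_fill: int, replica_count: int) -> float:
    """Tokens one instance refills per interval, under even allocation of a
    cluster-wide bucket across all replicas of the local cluster."""
    if replica_count <= 0:  # no local-cluster info: fall back to the full budget
        return float(total_tokens_per_fill)
    return total_tokens_per_fill / replica_count

# A gateway configured for a 100 tokens/s total budget, running 4 Envoy replicas:
print(per_instance_tokens(100, 4))  # each replica refills 25.0 tokens per second
```

When membership changes, re-running this calculation with the new replica count keeps the cluster-wide total stable.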


@wbpcode wbpcode added enhancement Feature requests. Not bugs or questions. triage Issue requires triage labels May 17, 2024
@ravenblackx ravenblackx added area/ratelimit and removed triage Issue requires triage labels May 17, 2024
@ravenblackx
Contributor

I'm assuming this doesn't need anyone pinged for triage since wbpcode filed it and is the person I would ping. :)

@wbpcode
Member Author

wbpcode commented May 18, 2024

A possible path to reach this target:

  1. Extend the cluster manager to expose an additional method that accepts a callback for watching membership changes of the local cluster.
  2. Create a singleton (managed by the singleton manager of the server context), on demand, that watches membership changes of the local cluster and calculates the token share of the current Envoy instance.
  3. Use that token share when the rate limit filter refills the token bucket.

If no local cluster is provided, the token share will stay 1.0 forever and nothing changes.
When all local rate limit filters are disabled or unused, the singleton will be destroyed automatically and the watcher callback unregistered.
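The steps above can be sketched roughly as follows. All class and method names here are illustrative assumptions, not Envoy's real C++ interfaces:

```python
class TokenShareProvider:
    """Hypothetical singleton: watches local-cluster membership (step 2)."""

    def __init__(self):
        self.share = 1.0  # no local cluster provided: share stays 1.0 forever

    def on_membership_change(self, host_count: int) -> None:
        # even allocation across the hosts of the local cluster
        self.share = 1.0 / host_count if host_count > 0 else 1.0


class LocalRateLimitFilter:
    """Hypothetical filter: scales its refill by the shared ratio (step 3)."""

    def __init__(self, tokens_per_fill: int, provider: TokenShareProvider):
        self.tokens_per_fill = tokens_per_fill
        self.provider = provider
        self.tokens = 0.0

    def refill(self) -> None:
        # apply this instance's share to the configured refill amount
        self.tokens += self.tokens_per_fill * self.provider.share


provider = TokenShareProvider()
flt = LocalRateLimitFilter(tokens_per_fill=100, provider=provider)
provider.on_membership_change(4)  # the local cluster now has 4 Envoys
flt.refill()
print(flt.tokens)  # 25.0: this instance refills a quarter of the budget
```

The membership-change callback (step 1) would drive `on_membership_change` whenever the local cluster's host set is updated.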

@juanmolle
Contributor

The first question I need to ask is: what is the local cluster, and how can it be used to know the number of replicas? Is the number of replicas the only information that can be retrieved, or could other information be shared between replicas? This info could probably be useful for other filters or custom Wasm filters, right?

@ramaraochavali
Contributor

The local rate limit filter is intended to protect a single instance of a service, i.e. to bound how much one Envoy instance can process. I think constantly changing the limit based on membership count (during HPA or any scale-up/scale-down operations) would be very confusing and hard to reason about. Curious why you don't use global rate limiting if you need such behaviour?

@wbpcode
Member Author

wbpcode commented May 21, 2024

cc @ramaraochavali Global rate limiting introduces additional dependencies (a rate limit server, Redis) and latency, and may not work properly if it is overloaded.

We also use the local rate limit in gateway mode, where it's hard to say the local limit protects only one instance of a service. And we only enable it for users who know about it and require it. So I believe it won't confuse anyone.

I think constantly changing limit based on membership count (during HPA or any scale up/scale down operations) would be very confusing and hard to reason about.

From the other side, I think it's also confusing for users who want a total limit (as in gateway mode) when that total limit changes because of HPA or other scale-up/down operations. This new feature will provide an option to let the local rate limit work with a stable total limit across the whole Envoy cluster/service.

@wbpcode
Member Author

wbpcode commented May 21, 2024

what is the local cluster and how can be used to know the number of replicas?

The local cluster is a special cluster that contains the Envoy instance itself. See the local cluster name in https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/bootstrap/v3/bootstrap.proto#envoy-v3-api-msg-config-bootstrap-v3-clustermanager

I have prepared a PR; you can check it if you are interested.

@ramaraochavali
Contributor

From the other side, I think it's also confusing for users who want a total limit (like in gateway mode) but the total limit will be changed because the HPA or any scale up/down operations.

Are you saying the total limit would be changed during HPA by operators based on the number of nodes configured for gateway?

@wbpcode
Member Author

wbpcode commented May 21, 2024

@ramaraochavali I mean that when the local rate limit is used and, for example, 100 tokens per second is configured, the total limit is 100 * the number of Envoy instances.

But the number of Envoy instances changes at runtime because of HPA or similar. So the total limit also changes. But in gateway mode, users will in most cases expect a stable total limit, regardless of the number of Envoy instances.
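The arithmetic above is easy to see concretely. A small illustration (the replica counts are made up for the example):

```python
per_instance_limit = 100  # tokens/s configured identically on every Envoy

# The effective cluster-wide limit as HPA scales the gateway:
totals = {replicas: per_instance_limit * replicas for replicas in (2, 4, 8)}
print(totals)  # {2: 200, 4: 400, 8: 800}
```

The total limit drifts from 200 to 800 req/s even though the filter configuration never changed, which is the surprise this proposal aims to eliminate.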

@ramaraochavali
Contributor

But in the gateway mode, the users will expect a stable total limit in most cases regardless of the number of the Envoy instances.

I see. So when a new node comes up or a node goes down, the current Envoy instance's limit may go up/down, causing a few in-flight requests to fail (because there is another node in the cluster) that would otherwise have passed if membership had not changed. We have always used local rate limit as a per-Envoy-instance service protection mechanism, so I'm trying to understand more about the use case.

@ldb

ldb commented May 21, 2024

I would also be very interested. We are currently exploring ways to implement rate limiting that is aware of the number of Envoy instances while reducing both the extra dependencies and the extra calls made during request processing.
Global rate limiting has to call out to the rate limiting service, which in turn calls out to Redis or memcached (in the reference implementation, at least), which can increase request latency a lot.

Implementing a shared local rate limiting approach would be a good fit here.

As I understand it, a shared token bucket would (/could) also mean that tokens could be used by a different instance in the local cluster? That would mitigate the problem @ramaraochavali mentioned, where scaling during requests could fail requests that would otherwise have passed.

@wbpcode
Member Author

wbpcode commented May 21, 2024

As I understand, a shared token bucket would (/could) also mean that tokens can be used from a different instance in the local cluster? That would mitigate the problem @ramaraochavali mentioned where scaling during requests could fail requests that would otherwise have passed.

Nope. Envoy cannot actually share data or messages with other instances. We can only compute a share/percentage based on the membership and apply that share to the token buckets.
