
Cross instances local rate limit filter #34230

Open
wbpcode opened this issue May 17, 2024 · 11 comments
Labels
area/ratelimit enhancement Feature requests. Not bugs or questions.

Comments

@wbpcode
Member

wbpcode commented May 17, 2024

Title: Cross instances local rate limit filter

Description:

Local rate limiting is more stable and has no additional dependencies. It is basically our first choice for rate limiting.

The only shortcoming of local rate limiting is that the token bucket configuration applies independently to each Envoy instance.

This means the number of Envoy replicas affects the effective total throughput of the limiter.

This is unfriendly to users who don't know the technical details, and the replica count may change dynamically.

But Envoy can actually know its own replica count, via the local cluster.

So I think it's possible to let all Envoy instances (in the same gateway cluster, or in the same service of a mesh) share a token bucket. Every instance would be pre-allocated part of the bucket by a specific algorithm (for example, even allocation), and when the membership of the local cluster changes, we re-execute the algorithm.
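The even-allocation idea above can be sketched in a few lines. This is a hypothetical illustration, not Envoy's actual API; the function name and fallback behavior are assumptions:

```python
def per_instance_tokens(total_tokens_per_fill: int, replica_count: int) -> float:
    """Tokens one instance refills per interval, under even allocation of a
    cluster-wide bucket across all replicas of the local cluster."""
    if replica_count <= 0:  # no local-cluster info: fall back to the full budget
        return float(total_tokens_per_fill)
    return total_tokens_per_fill / replica_count

# A gateway configured for a 100 tokens/s total budget, running 4 Envoy replicas:
print(per_instance_tokens(100, 4))  # each replica refills 25.0 tokens per second
```

When membership changes, re-running this calculation with the new replica count keeps the cluster-wide total stable.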


@wbpcode wbpcode added enhancement Feature requests. Not bugs or questions. triage Issue requires triage labels May 17, 2024
@ravenblackx ravenblackx added area/ratelimit and removed triage Issue requires triage labels May 17, 2024
@ravenblackx
Contributor

I'm assuming this doesn't need anyone pinged for triage since wbpcode filed it and is the person I would ping. :)

@wbpcode
Member Author

wbpcode commented May 18, 2024

A possible path to reach this target:

  1. Extend the cluster manager to expose an additional method that accepts a callback for watching membership changes of the local cluster.
  2. Create a singleton (managed by the singleton manager of the server context), on demand, that watches membership changes of the local cluster and calculates the token share of the current Envoy instance.
  3. Use that token share when the rate limit filter refills the token bucket.

If no local cluster is provided, the token share will stay 1.0 forever and nothing changes.
When all local rate limit filters are disabled or unused, the singleton will be destroyed automatically and the watcher callback unregistered.
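The steps above can be sketched roughly as follows. All class and method names here are illustrative assumptions, not Envoy's real C++ interfaces:

```python
class TokenShareProvider:
    """Hypothetical singleton: watches local-cluster membership (step 2)."""

    def __init__(self):
        self.share = 1.0  # no local cluster provided: share stays 1.0 forever

    def on_membership_change(self, host_count: int) -> None:
        # even allocation across the hosts of the local cluster
        self.share = 1.0 / host_count if host_count > 0 else 1.0


class LocalRateLimitFilter:
    """Hypothetical filter: scales its refill by the shared ratio (step 3)."""

    def __init__(self, tokens_per_fill: int, provider: TokenShareProvider):
        self.tokens_per_fill = tokens_per_fill
        self.provider = provider
        self.tokens = 0.0

    def refill(self) -> None:
        # apply this instance's share to the configured refill amount
        self.tokens += self.tokens_per_fill * self.provider.share


provider = TokenShareProvider()
flt = LocalRateLimitFilter(tokens_per_fill=100, provider=provider)
provider.on_membership_change(4)  # the local cluster now has 4 Envoys
flt.refill()
print(flt.tokens)  # 25.0: this instance refills a quarter of the budget
```

The membership-change callback (step 1) would drive `on_membership_change` whenever the local cluster's host set is updated.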

@juanmolle
Contributor

The first question I need to ask is: what is the local cluster, and how can it be used to know the number of replicas? Is the number of replicas the only information that can be retrieved, or could other information be shared between replicas? This info could probably be useful for other filters or custom Wasm filters, right?

@ramaraochavali
Contributor

The local rate limit filter is intended to protect a single instance of a service, i.e. to bound how much one Envoy instance can process. I think constantly changing the limit based on membership count (during HPA or any scale-up/scale-down operations) would be very confusing and hard to reason about. Curious why you don't use global rate limiting if you need such behaviour?

@wbpcode
Member Author

wbpcode commented May 21, 2024

cc @ramaraochavali Global rate limiting introduces additional dependencies (a rate limit server, Redis) and latency, and may not work properly if it is overloaded.

We also use the local rate limit in gateway mode, where it's hard to say the local limit protects only one instance of a service. And we only enable it for users who know about it and require it. So I believe it won't confuse anyone.

I think constantly changing limit based on membership count (during HPA or any scale up/scale down operations) would be very confusing and hard to reason about.

From the other side, I think it's also confusing for users who want a total limit (as in gateway mode) when that total limit changes because of HPA or other scale-up/down operations. This new feature will provide an option to let the local rate limit work with a stable total limit across the whole Envoy cluster/service.

@wbpcode
Member Author

wbpcode commented May 21, 2024

what is the local cluster and how can be used to know the number of replicas?

The local cluster is a special cluster that contains the Envoy instance itself. See the local cluster name in https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/bootstrap/v3/bootstrap.proto#envoy-v3-api-msg-config-bootstrap-v3-clustermanager

I have prepared a PR; you can check it if you are interested.

@ramaraochavali
Contributor

From the other side, I think it's also confusing for users who want a total limit (like in gateway mode) but the total limit will be changed because the HPA or any scale up/down operations.

Are you saying the total limit would be changed during HPA by operators based on the number of nodes configured for gateway?

@wbpcode
Member Author

wbpcode commented May 21, 2024

@ramaraochavali I mean that when the local rate limit is used and, for example, 100 tokens per second is configured, the total limit is 100 * the number of Envoy instances.

But the number of Envoy instances changes at runtime because of HPA or similar. So the total limit also changes. But in gateway mode, users will in most cases expect a stable total limit, regardless of the number of Envoy instances.
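The arithmetic above is easy to see concretely. A small illustration (the replica counts are made up for the example):

```python
per_instance_limit = 100  # tokens/s configured identically on every Envoy

# The effective cluster-wide limit as HPA scales the gateway:
totals = {replicas: per_instance_limit * replicas for replicas in (2, 4, 8)}
print(totals)  # {2: 200, 4: 400, 8: 800}
```

The total limit drifts from 200 to 800 req/s even though the filter configuration never changed, which is the surprise this proposal aims to eliminate.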

@ramaraochavali
Contributor

But in the gateway mode, the users will expect a stable total limit in most cases regardless of the number of the Envoy instances.

I see. So when a new node comes up or a node goes down, the current Envoy instance's limit may go up/down, causing a few in-flight requests to fail (because there is another node in the cluster) that would otherwise have passed if membership had not changed. We have always used local rate limit as a per-Envoy-instance service protection mechanism, so I'm trying to understand more about the use case.

@ldb

ldb commented May 21, 2024

I would also be very interested. We are currently exploring ways to implement rate limiting that is aware of the number of Envoy instances while reducing both the extra dependencies and the extra calls made during request processing.
Global rate limiting has to call out to the rate limiting service, which in turn calls out to Redis or memcached (in the reference implementation, at least), which can increase request latency a lot.

Implementing a shared local rate limiting approach would be a good fit here.

As I understand it, a shared token bucket would (/could) also mean that tokens could be used by a different instance in the local cluster? That would mitigate the problem @ramaraochavali mentioned, where scaling during requests could fail requests that would otherwise have passed.

@wbpcode
Member Author

wbpcode commented May 21, 2024

As I understand, a shared token bucket would (/could) also mean that tokens can be used from a different instance in the local cluster? That would mitigate the problem @ramaraochavali mentioned where scaling during requests could fail requests that would otherwise have passed.

Nope. Envoy cannot actually share data or messages with other instances. We can only compute a share/percentage based on the membership and apply that share to the token buckets.
