
How many rate limiters are too many? #3981

Closed
dhempy opened this issue Sep 26, 2018 · 6 comments

Comments

@dhempy

dhempy commented Sep 26, 2018

Sidekiq Enterprise version:
sidekiq-ent (1.5.4)

We use Sidekiq::Limiter.concurrent() to good effect in several places. I'm about to implement a single-concurrency worker to ensure no duplicates get imported into a system in a race. (Detecting duplicates is complicated and cannot be implemented in the database). I started out with:

  Sidekiq::Limiter.concurrent('Importer', 1, wait_timeout: 0, lock_timeout: 60)

This is fine, except we're importing a LOT of records, and I'd like to take advantage of the many servers in our farm to process non-duplicate candidates.
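The call above would typically sit inside a job's `perform`, wrapping the import step with `within_limit`. A minimal sketch (`ImportJob` and `import!` are hypothetical names, not from this thread; in a real app the job would also `include Sidekiq::Job`):

```ruby
# Sketch: serialize all imports behind a single cluster-wide concurrent limiter.
class ImportJob
  # include Sidekiq::Job  # enable inside a real Sidekiq app

  def perform(record_id)
    limiter = Sidekiq::Limiter.concurrent(
      'Importer', 1,      # at most one holder at a time, across all servers
      wait_timeout: 0,    # fail fast rather than block the worker thread
      lock_timeout: 60    # auto-release if the holder dies mid-import
    )
    limiter.within_limit do
      import!(record_id)  # hypothetical import step
    end
  end
end
```

With `wait_timeout: 0`, a job that cannot take the lock raises `Sidekiq::Limiter::OverLimit` and is rescheduled rather than tying up a thread.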

There are a dozen fields that contribute to uniqueness, with some fuzzy matches. However, there is a key, "purchase_order_id", which distinguishes 99.9% of the records coming in at any one time. So what I'd like to do is put that in the limiter name, similar to the Stripe user example in the Sidekiq docs:

  Sidekiq::Limiter.concurrent("Importer-#{purchase_order_id}", 1, wait_timeout: 0, lock_timeout: 60)

So, my question is...how many is too many? If we had thousands of these simultaneously processing, would that cause any problem?

If we churned through millions per hour, would that gum up Redis over time?

If either of those is an issue, do you have a recommended threshold?

@mperham
Collaborator

mperham commented Sep 26, 2018

I'm unsure as I don't have a lot of data on limiter scale.

There are a few things you can do to minimize risk:

  1. Use a separate Redis instance for limiters.
  2. Don't visit the Limiter page in the Web UI or things will go badly. I don't believe it implements paging.
  3. Set the TTL to aggressively expire unused rate limiters: ttl: 24.hours
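Point 3 above translates to passing `ttl:` when building each per-name limiter. A sketch (the helper name is hypothetical; `86_400` is 24 hours in plain seconds, since `24.hours` needs ActiveSupport):

```ruby
# Sketch: per-purchase-order limiter whose Redis keys expire after a day
# of disuse, so abandoned limiters don't accumulate.
def importer_limiter(purchase_order_id)
  Sidekiq::Limiter.concurrent(
    "Importer-#{purchase_order_id}", 1,
    wait_timeout: 0,
    lock_timeout: 60,
    ttl: 86_400  # aggressively expire unused limiter keys after 24 hours
  )
end
```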

I would think you can use 1000s of limiters simultaneously. I would try it in staging and let me know what you learn about scale. Heavy usage will not gum anything up over time.

@dhempy
Author

dhempy commented Sep 26, 2018

Thanks, Mike. I missed the TTL option. I'll use ttl: 1.day, for sure.

On further thought, I think we're going to use the last 3 digits of the ID for the rate limiter name. This will avoid races between potentially matching records, and sidestep any limiter-count overload entirely. We're only looking to improve parallelism -- not implement guaranteed uniqueness via the rate limiter. This approach should give us both.
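The last-3-digits idea caps the limiter population at 1000 names while still spreading work across the farm. A sketch of how that name might be derived (the helper name is hypothetical):

```ruby
# Sketch: bucket limiters by the last three digits of the ID, so at most
# 1000 distinct limiter names ever exist, regardless of record volume.
def importer_limiter_name(purchase_order_id)
  "Importer-#{purchase_order_id.to_i % 1000}"
end

importer_limiter_name(4_567_123)  # => "Importer-123"
```

Two records with the same last three digits serialize behind the same limiter even when their full IDs differ, which is the accepted trade-off: this improves parallelism but is not a uniqueness guarantee.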

If we generate any interesting data, I'll share it with you.

@mperham mperham closed this as completed Oct 3, 2018
@raivil

raivil commented Apr 3, 2020

Hey,

Do you have more data on this issue?
I'm facing a similar situation: the Redis instance dedicated to rate limiters is using almost all of its memory.
The app uses a concurrent limiter of size 1 to implement distributed mutexes.

We're considering reducing some of the locks' TTLs to less than 24.hours, but before doing that I'd like to understand why it's not recommended to go below 24 hours, as written at https://github.com/mperham/sidekiq/wiki/Ent-Rate-Limiting#ttl

thanks.

@mperham
Collaborator

mperham commented Apr 3, 2020

I would wonder why you are running out of memory on Redis. You must be running a very small Redis instance or making many, many millions of limiters.

I recommend 24 hours because it should never, ever go lower than the max amount of time that the limiter might be held. If you have a long-running job which takes the limiter, it might be held for hours. If your jobs only hold the limiter for seconds or minutes, you can drop the TTL to a few hours.
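That rule of thumb could be sketched as a helper: the TTL must comfortably exceed the longest time a limiter can be held. The helper name, safety factor, and two-hour floor below are my assumptions, not values from this thread:

```ruby
# Sketch of the TTL rule of thumb: pick a TTL that is a safe multiple of the
# longest time any job holds the limiter, and never drop below a few hours.
def limiter_ttl(max_hold_seconds, safety_factor: 4, floor: 2 * 3600)
  [max_hold_seconds * safety_factor, floor].max
end

limiter_ttl(60)        # jobs hold the lock ~1 minute => 7200 (the floor)
limiter_ttl(4 * 3600)  # jobs can hold it 4 hours     => 57600 (16 hours)
```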

@locofocos

Thanks for the info. I came here from a search after reading https://github.com/mperham/sidekiq/wiki/Ent-Rate-Limiting#ttl which recommended 24 hours.

Just as another example, we're doing exactly what you mentioned: running millions of limiters 😆 We have multi-tenant ecommerce software that uses a concurrent limiter. We use a limiter name based on the full product ID, I think to avoid concurrent updates for that product in other systems. Over the past 90 days, we've used Sidekiq::Limiter.concurrent for about 3.8 million records. Because Sidekiq generates Redis keys for both lmtr-c-<name> and lmtr-cfree-<name> for each limiter, that put us at over 7 million Redis keys!

As someone a little new to Sidekiq, it took me a while to figure out where the Redis keys were coming from. One thing that would have helped me figure this out sooner: if there were Sidekiq documentation to the effect of

If your Redis instance is using too much memory / has too many keys, use an RDB analyzer to grab the unique keys. If you see key names like lmtr-cfree-asdf and lmtr-c-asdf, those are created when you call Sidekiq::Limiter.concurrent('asdf', ...).
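Mapping those key names back to the limiter that created them is a simple string match. A sketch (the helper name is hypothetical; the two prefixes are the ones reported in the comment above):

```ruby
# Sketch: recover the limiter name from a concurrent-limiter Redis key.
# Sidekiq Enterprise's concurrent limiter creates "lmtr-c-<name>" and
# "lmtr-cfree-<name>" key pairs, per the observation above.
def limiter_name_from_key(key)
  case key
  when /\Almtr-cfree-(.+)\z/ then Regexp.last_match(1)
  when /\Almtr-c-(.+)\z/     then Regexp.last_match(1)
  end
end

limiter_name_from_key("lmtr-c-asdf")      # => "asdf"
limiter_name_from_key("lmtr-cfree-asdf")  # => "asdf"
```

Running an RDB analysis and grouping keys through a helper like this quickly shows which limiter names dominate the keyspace.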

Thanks again for the tips!

@mperham
Collaborator

mperham commented Mar 12, 2021

@locofocos The wiki is publicly editable. This is great content that you are welcome to add to the page. 😎
