Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Points-based Limiter #5757

Closed
Tracked by #5769
kirillplatonov opened this issue Jan 31, 2023 · 11 comments
Closed
Tracked by #5769

Points-based Limiter #5757

kirillplatonov opened this issue Jan 31, 2023 · 11 comments
Milestone

Comments

@kirillplatonov
Copy link

The existing leaky bucket limiter supports request-based rate limiting. Some APIs like provide points-based limiters. For example Shopify GraphQL API:
We're given 1000 points max. Every API request has its cost and returns the remaining points in response. The bucket is refilled at a rate of X points per second.

It would be great if Sidekiq Enterprise could support this use case as well.

References:
https://shopify.dev/api/usage/rate-limits#graphql-admin-api-rate-limits

@mperham
Copy link
Collaborator

mperham commented Jan 31, 2023

If Shopify wanted to contribute an official limiter I would consider it but this is pretty hyper-specialized for me to build and support long-term. I generally don't build things which are only used for one site. Are there any other points-based limiters elsewhere? Can we build a standard limiter that works with all of those sites, esp if those sites add response headers which mutate the limiter logic somehow (for example, a hypothetical X-Limiter-Points-Consumed: 10.75)?

@kirillplatonov
Copy link
Author

Point-based limits seem to be pretty common for GraphQL-based APIs. It's used to limit the complexity of queries made during a certain period.

I found a few other GraphQL APIs that use a similar approach to rate limits:
https://docs.github.com/en/graphql/overview/resource-limitations#rate-limit
https://developers.linear.app/docs/graphql/working-with-the-graphql-api/rate-limiting#api-request-complexity-limits
https://developer.monday.com/api-reference/docs/rate-limits

What makes it hyper-specialized on Shopify is the combination of point limits with a leaky bucket. I guess having just a standard points-based limiter that limits the number of points for a certain period should be enough for most cases. The logic for fetching remaining points should definitely be part of the application code as some APIs return it in headers and some in the response body. The app could just write back the remaining number to Sidekiq. For example:

limiter = Sidekiq::Limiter.points("shopify-graphql-#{user_id}", 1000, 20) # 1000 points every 20 seconds
limiter.within_limit do
  # make graphql call and fetch remaining points from response
  limiter.remaining(850)
end

Alternatively, we can pass to limiter how many points has been consumed:

limiter = Sidekiq::Limiter.points("shopify-graphql-#{user_id}", 1000, 20) # 1000 points every 20 seconds
limiter.within_limit do
  # make graphql call and fetch consumed points from response
  limiter.points_consumed(150)
end

@kirillplatonov kirillplatonov changed the title Points-based Leaky Bucket Limiter Points-based Limiter Feb 1, 2023
@mperham
Copy link
Collaborator

mperham commented Feb 1, 2023

I think the points aspect is perfectly normal, points are just drops in the leaky bucket. You get back X drops per second or minutes.

The novel twist is that each call dynamically consumes M points which you don't know until the call is finished. The tricky aspect is estimating the point usage before the call and how to fetch the actual points consumed after the call that is specialized to each service provider. As far as I can tell, none of the links you gave (thanks!) share much of any logic.

I could provide a generic points limiter which has two additions:

limiter = Sidekiq::Limiter.points("foo", 5000, :minute)

estimate = # you estimate how many points this call will make
limiter.within_limit(estimate: 300) do |handle|
  # do work

  actual = # you calculate or get the actual points value consumed
  handle.finalize(actual)
end

Something like that might work but I'd need to build a prototype. It may not be worth it to implement the finalize bit as there is an open question about points and windows of time. Might be better to just implement an estimate and let them rate limit you if your estimate is off.

@kirillplatonov
Copy link
Author

I think having estimate with rate limit error handling should be enough for most cases. finalize will add additional precision and will allow making more calls if the actual cost is much less than the estimate. But that will make a difference only at a large scale.

@mperham
Copy link
Collaborator

mperham commented Feb 13, 2023

Sure. This is low priority for me so don't expect it soon.

@th-ad
Copy link

th-ad commented Feb 14, 2023

I put together a rough implementation based on this conversation and the existing enterprise limiters.

@mperham What would be the best way to send it to you?

@mperham
Copy link
Collaborator

mperham commented Feb 14, 2023

@th-ad Cool. You can send it to mike@contribsys.com

@mperham
Copy link
Collaborator

mperham commented Feb 15, 2023

Thanks @th-ad, I have this working now: it calls successfully four times and raises on the fifth call, as expected.

  def test_typical_flow
    limiter = Sidekiq::Limiter.points(:shopify, 1000, 20, wait_timeout: 0)
    count = 0
    e = assert_raises Sidekiq::Limiter::OverLimit do
      5.times do
        limiter.within_limit(estimate: 300) do |helper|
          count += 1
          helper.finalize(200)
        end
      end
    end
    assert_equal 4, count
    assert_equal "shopify: need 300 points, have 200", e.message
  end

Is finalize the right method name to report the actual points used?

@kirillplatonov
Copy link
Author

points_used might be a good option as it's more clear from the name what it does

@mperham mperham mentioned this issue Feb 15, 2023
10 tasks
@mperham mperham added this to the 7.1 milestone Feb 21, 2023
@mperham
Copy link
Collaborator

mperham commented Feb 22, 2023

I'm finalizing the API design and implementation. The happy path is great but there's one big unanswered question: what do we do if the remote side raises an error?

We've already consumed the estimated points locally but we don't know if they've been consumed on the remote side. If I have 1000 points and I make a call with an estimate of 200 points and that call raises an error, do I still have 1000 points, 800 or other?

  • What if you get a 503? 401?
  • Which error types should consume the points?
  • Which error types should reclaim the points?
  • Should the caller be responsible for calling points_used(0) to reclaim the estimated points?

I'm inclined to say that this "unhappy path" bookkeeping is the responsibility of the caller to handle. What do you think?

@kirillplatonov
Copy link
Author

kirillplatonov commented Feb 22, 2023

I'm inclined to say that this "unhappy path" bookkeeping is the responsibility of the caller to handle. What do you think?

I agree with it. Handling errors on the limiter's level doesn't make much sense. Implementations of Graphql clients vary a lot and some clients don't even raise Ruby exceptions in case of errors so there's nothing to catch.

I think being able to report points_used(0) to reclaim points should be enough. Some types of errors won't consume points and would be great to have a way to return them back.

@mperham mperham closed this as completed Mar 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants