
Conversation

@constanca-m
Contributor

Issue is #619.

Comment on lines 206 to 207
current := r.addRequests(uniqueKey, hits)
delay := time.Duration(resp.GetResetTime()-createdAt+int64(current)*cfg.ThrottleInterval.Milliseconds()) * time.Millisecond
Contributor Author


@vigneshshanmugam Can I ask your opinion on this logic before I do the same for the local rate limiter?

This bug was added to GA goals: #619

This implementation has some things that worry me, but I don't know if it makes sense to be worried about this:

  1. What if we receive tons of requests? The delay will keep increasing for each request, and then we will have many processes on hold. I think it might make sense to cap the delay at a fixed time, and after that time each request starts getting rejected, regardless of the strategy. WDYT?
  2. What if we receive requests like this:
    • We get a request that takes 10 tokens
    • We then get a request that takes 4 tokens
    • Both are delayed. Should the 4-token request take priority, and thus get a shorter delay, or should the 10-token request have priority since it arrived first?

Member


I don't think you need to do this. The issue here is that we are allowing all the clients to retry and send data simultaneously once we are past the reset time, which is wrong. It assumes the reset time guarantees the availability of tokens (requests/bytes, depending on the strategy).

The ideal way to fix this is basically to loop, retry after the delay, and re-check the rate limits each time. This respects all the configuration, like strategy/algorithm, while keeping the race condition in check.

Rough pseudocode would be:

for {
    // Re-check the rate limit on every iteration.
    resp := getCurrentLimits()
    if resp.IsUnderLimit {
        // Not limited anymore, we can proceed.
        return nil
    }

    // We shouldn't use createdAt here: it's flawed because it is based on when the
    // request was made and doesn't take retries into account.
    delay := time.Duration(resp.GetResetTime()-time.Now().UnixMilli()) * time.Millisecond
    // same wait code as before, then loop and re-check
}

Hope this helps. Let me know if you want more details.
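
A slightly more concrete version of that sketch, for reference. It assumes the gubernator status constant and accessors that show up later in this thread; getCurrentLimits remains a placeholder for whatever issues the actual rate-limit request:

for {
    // Re-check the limits on every retry instead of trusting a single response.
    resp, err := getCurrentLimits() // placeholder name, not the real helper
    if err != nil {
        return err
    }
    if resp.GetStatus() == gubernator.Status_UNDER_LIMIT {
        // Tokens are available again, let the request through.
        return nil
    }

    // Compute the delay from the current time rather than createdAt,
    // so repeated retries stay aligned with the reset window.
    delay := time.Duration(resp.GetResetTime()-time.Now().UnixMilli()) * time.Millisecond
    if delay < 0 {
        delay = 0
    }
    time.Sleep(delay)
}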

Contributor Author


Thanks! This helped. I have updated the code; it should be working correctly now.

@constanca-m constanca-m marked this pull request as ready for review August 18, 2025 12:26
@constanca-m constanca-m requested a review from a team as a code owner August 18, 2025 12:26
if err := makeRateLimitRequest(); err != nil {
    return err
}
if resp.GetStatus() == gubernator.Status_UNDER_LIMIT {
Member


This feels incorrect: we are using the old response, but we need to get the new response after the retry.

Contributor Author


It's the new response; it changed in the makeRateLimitRequest() call two lines above.

Member


Ack, it was quite confusing with the diff. Can we change the function to return resp, err so it's easier to reason about this? Thanks.

Contributor Author


Sure thing, I have changed the code
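
For context, the refactor being asked for could look roughly like this sketch. Only makeRateLimitRequest and the gubernator status check come from the diff above; everything else is illustrative:

// makeRateLimitRequest now returns the fresh response, so the caller always
// checks the result of the latest retry rather than an older one.
resp, err := makeRateLimitRequest()
if err != nil {
    return err
}
if resp.GetStatus() == gubernator.Status_UNDER_LIMIT {
    // The retry succeeded: we are back under the limit.
    return nil
}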

@constanca-m constanca-m merged commit 1dc8ec4 into elastic:main Aug 26, 2025
13 checks passed
@constanca-m constanca-m deleted the ratelimiter-reset branch August 26, 2025 05:05