Retry on 429 is not granular enough #34

Closed
jefferai opened this issue Apr 26, 2018 · 10 comments

@jefferai

I think #32 should be backed out.

As far as I'm aware, you can't configure per-status-code backoff strategies. The problem is that with a 429, further retry attempts may keep counting against your rate limit. So unlike a 503, where you may want to keep trying once a second, retrying once a second on a 429 may mean that you are never able to successfully make a call.

At the very least, if #32 is kept, I think there needs to be a way to configure a separate backoff strategy for it, and perhaps an additional boolean to control its behavior.
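
For illustration only, here is a rough sketch of the kind of configuration being asked for, assuming pester's client exposes a MaxRetries field and a Backoff field with the signature func(retry int) time.Duration; the Backoff429 field is purely hypothetical and does not exist in the library:

```go
package main

import (
	"time"

	"github.com/sethgrid/pester"
)

func main() {
	client := pester.New()
	client.MaxRetries = 5

	// Strategy for transient failures such as 503: retry fairly quickly.
	client.Backoff = func(retry int) time.Duration {
		return time.Second
	}

	// Hypothetical per-status strategy for 429: back off far more slowly,
	// since further requests may keep counting against the rate limit.
	// Nothing like Backoff429 exists in pester today; this only sketches
	// the shape of the configuration this issue asks for.
	// client.Backoff429 = func(retry int) time.Duration {
	//	return time.Duration(retry) * time.Minute
	// }

	_, _ = client.Get("https://example.com/api")
}
```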

@pritamsarkar007 commented Apr 26, 2018

@jefferai The problem you are describing can happen with 503 as well, since 503 is another HTTP code widely used for rate limiting. For example, AWS APIs send 503 if your call gets rate limited, and nginx's default rate-limit status code is 503 as well.

@jefferai (Author)

From the RFC (https://tools.ietf.org/html/rfc6585#page-3):

   Note that this specification does not define how the origin server
   identifies the user, nor how it counts requests.

When using 429 it is explicitly up to the implementation to decide whether or not requests count even if they are not satisfied by the backend service. A server may very well count further requests against quota, because a client that persists in sending too many requests while the backend refuses to perform the (potentially expensive) request handling is one that will likely keep sending too many requests once it is allowed back in, resulting in further 429s and potentially heavy use of server resources in the meantime. Such behavior may also be indicative of an unfixed error in the client, or of an attempted DDoS or brute-force attack.

It is a totally reasonable implementation to keep 429ing until the client's request rate drops below an allowed threshold, and only then allow the client back in. Pester doesn't allow per-code backoff strategies, so you can't handle this scenario without also changing strategies for 5xx codes.
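
To make that scenario concrete, here is a minimal, purely illustrative server-side sketch (plain Go, nothing to do with pester) of a sliding-window limiter in which rejected requests still count toward the window, so a client that keeps retrying a 429 at a steady rate never drops back under the threshold:

```go
package main

import (
	"net/http"
	"sync"
	"time"
)

type limiter struct {
	mu     sync.Mutex
	hits   []time.Time
	window time.Duration
	limit  int
}

// allow records the request and reports whether it is within the limit.
// Rejected requests are still recorded, so a persistent client can stay
// over the threshold indefinitely.
func (l *limiter) allow() bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	cutoff := time.Now().Add(-l.window)
	kept := l.hits[:0]
	for _, t := range l.hits {
		if t.After(cutoff) {
			kept = append(kept, t)
		}
	}
	l.hits = append(kept, time.Now())
	return len(l.hits) <= l.limit
}

func main() {
	lim := &limiter{window: time.Minute, limit: 60}
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		if !lim.allow() {
			http.Error(w, "too many requests", http.StatusTooManyRequests)
			return
		}
		w.Write([]byte("ok"))
	})
	http.ListenAndServe(":8080", nil)
}
```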

Contrast this to 503 (https://tools.ietf.org/html/rfc7231#page-63):

   The 503 (Service Unavailable) status code indicates that the server
   is currently unable to handle the request due to a temporary overload
   or scheduled maintenance, which will likely be alleviated after some
   delay.  The server MAY send a Retry-After header field
   (Section 7.1.3) to suggest an appropriate amount of time for the
   client to wait before retrying the request.

Using 503 for rate limiting is not disallowed, but it is not explicitly defined for rate limiting. A 503 may indicate a temporary blip or a failover to a different server, so retrying at a relatively healthy clip may well prove fruitful, because nothing is telling you "the problem is that you're trying too fast." For the same reason, a server using 503 likely won't count additional requests against the client: it hasn't told the client that request rate is the problem, so it isn't reasonable to expect the client to slow down.
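
As an aside, a client that does retry on 503 can honor Retry-After when the server sends it; a minimal sketch using only the standard library (Retry-After may also be an HTTP-date, which this sketch does not handle):

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
	"time"
)

// retryAfter returns the delay suggested by a Retry-After header when it is
// given as a number of seconds, falling back to a default otherwise.
func retryAfter(resp *http.Response, fallback time.Duration) time.Duration {
	if secs, err := strconv.Atoi(resp.Header.Get("Retry-After")); err == nil && secs > 0 {
		return time.Duration(secs) * time.Second
	}
	return fallback
}

func main() {
	resp, err := http.Get("https://example.com/api")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	if resp.StatusCode == http.StatusServiceUnavailable {
		wait := retryAfter(resp, time.Second)
		fmt.Printf("got 503, waiting %v before retrying\n", wait)
		time.Sleep(wait)
		// ...retry the request here
	}
}
```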

@sethgrid (Owner)

@jefferai thanks for the feedback! You bring up a good point I did not consider. Have you seen a case "in the wild," or is this a preemptive concern?

As for a potential fix, we could put 429 retries behind a boolean that defaults to false, and add a client method to allow setting it to true. I'm a bit swamped right now. I could maybe write that up over the weekend. PRs are welcome.
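
A rough sketch of what that could look like from the caller's side; the RetryOnHTTP429 field name and the default of false are assumptions about the eventual API, not something the library exposes as of this comment:

```go
package main

import "github.com/sethgrid/pester"

func main() {
	client := pester.New()
	client.MaxRetries = 5

	// Hypothetical opt-in flag: would default to false, so 429 responses
	// are returned to the caller immediately instead of being retried.
	client.RetryOnHTTP429 = true

	_, _ = client.Get("https://example.com/api")
}
```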

@pritamsarkar007 thoughts?

@jefferai (Author)

Hi Seth,

I haven't seen it in the wild, but take that as a reflection of my ignorance of both the broad spectrum of software that includes rate limiting and the particulars of how rate limiting is implemented.

That said, it's something I've been thinking about because the project I work on, which is security related, may gain a rate limiting feature, and it would likely behave in the way I described. There are a number of beneficial security properties, against both external threats and accidental misconfiguration, that come from keeping misbehaving clients blocked until you have good confidence that they are behaving correctly.

@pritamsarkar007

@sethgrid I can make that change. @jefferai does that address your concern (adding a flag that controls whether the library will retry 429 responses)?
Also, for 429, should we always use exponential backoff? @jefferai @sethgrid thoughts?
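
For reference, under the assumed BackoffStrategy signature (func(retry int) time.Duration), an exponential-with-jitter strategy could look like the sketch below; whether it should be forced for 429 is the open question:

```go
package main

import (
	"math/rand"
	"time"

	"github.com/sethgrid/pester"
)

func main() {
	client := pester.New()
	client.MaxRetries = 4

	// Exponential backoff with jitter: the base delay roughly doubles on
	// each attempt, and a random component spreads retries out to avoid
	// synchronized bursts from many clients.
	client.Backoff = func(retry int) time.Duration {
		base := time.Duration(1<<uint(retry)) * time.Second
		return base + time.Duration(rand.Int63n(int64(time.Second)))
	}

	_, _ = client.Get("https://example.com/api")
}
```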

@jefferai (Author)

Yes, that would address my concern. I would also suggest that the default behavior be off.

@sethgrid (Owner) commented Apr 27, 2018 via email

@pritamsarkar007

@sethgrid I will get it done over the weekend.

@pritamsarkar007

@jefferai @sethgrid PR submitted

@sethgrid (Owner)

@jefferai @pritamsarkar007 #35 merged
