Retry on 429 is not granular enough #34

Closed
jefferai opened this issue Apr 26, 2018 · 10 comments

@jefferai

I think #32 should be backed out.

As far as I'm aware, you can't configure per-status-code backoff strategies. The problem is that with a 429, further retry attempts may keep counting against your rate limit. So unlike a 503, where you may want to keep trying once a second, retrying once a second on a 429 may mean that you are never able to successfully make a call.

At the very least, if #32 is kept, I think there needs to be a way to configure a separate backoff strategy for it, and perhaps an additional boolean to control its behavior.
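
For illustration only, here is a rough sketch of the kind of configuration being asked for, assuming pester's client exposes a MaxRetries field and a Backoff field with the signature func(retry int) time.Duration; the Backoff429 field is purely hypothetical and does not exist in the library:

```go
package main

import (
	"time"

	"github.com/sethgrid/pester"
)

func main() {
	client := pester.New()
	client.MaxRetries = 5

	// Strategy for transient failures such as 503: retry fairly quickly.
	client.Backoff = func(retry int) time.Duration {
		return time.Second
	}

	// Hypothetical per-status strategy for 429: back off far more slowly,
	// since further requests may keep counting against the rate limit.
	// Nothing like Backoff429 exists in pester today; this only sketches
	// the shape of the configuration this issue asks for.
	// client.Backoff429 = func(retry int) time.Duration {
	//	return time.Duration(retry) * time.Minute
	// }

	_, _ = client.Get("https://example.com/api")
}
```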

@pritamsarkar007 commented Apr 26, 2018

@jefferai The problem you are describing can happen with 503 as well, since 503 is another HTTP code widely used for rate limiting. For example, AWS APIs send 503 if your call gets rate limited, and nginx's default rate-limit status code is 503 as well.

@jefferai (Author)

From the RFC (https://tools.ietf.org/html/rfc6585#page-3):

   Note that this specification does not define how the origin server
   identifies the user, nor how it counts requests.

When using 429 it is explicitly up to the implementation to decide whether or not requests count even if they are not satisfied by the backend service. A server may very well count further requests against quota, because a client that persists in sending too many requests while the backend refuses to perform the (potentially expensive) request handling is one that will likely keep sending too many requests once it is allowed back in, resulting in further 429s and potentially heavy use of server resources in the meantime. Such behavior may also be indicative of an unfixed error in the client, or of an attempted DDoS or brute-force attack.

It is a totally reasonable implementation to keep 429ing until the client's request rate drops below an allowed threshold, and only then allow the client back in. Pester doesn't allow per-code backoff strategies, so you can't handle this scenario without also changing strategies for 5xx codes.
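
To make that scenario concrete, here is a minimal, purely illustrative server-side sketch (plain Go, nothing to do with pester) of a sliding-window limiter in which rejected requests still count toward the window, so a client that keeps retrying a 429 at a steady rate never drops back under the threshold:

```go
package main

import (
	"net/http"
	"sync"
	"time"
)

type limiter struct {
	mu     sync.Mutex
	hits   []time.Time
	window time.Duration
	limit  int
}

// allow records the request and reports whether it is within the limit.
// Rejected requests are still recorded, so a persistent client can stay
// over the threshold indefinitely.
func (l *limiter) allow() bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	cutoff := time.Now().Add(-l.window)
	kept := l.hits[:0]
	for _, t := range l.hits {
		if t.After(cutoff) {
			kept = append(kept, t)
		}
	}
	l.hits = append(kept, time.Now())
	return len(l.hits) <= l.limit
}

func main() {
	lim := &limiter{window: time.Minute, limit: 60}
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		if !lim.allow() {
			http.Error(w, "too many requests", http.StatusTooManyRequests)
			return
		}
		w.Write([]byte("ok"))
	})
	http.ListenAndServe(":8080", nil)
}
```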

Contrast this to 503 (https://tools.ietf.org/html/rfc7231#page-63):

   The 503 (Service Unavailable) status code indicates that the server
   is currently unable to handle the request due to a temporary overload
   or scheduled maintenance, which will likely be alleviated after some
   delay.  The server MAY send a Retry-After header field
   (Section 7.1.3) to suggest an appropriate amount of time for the
   client to wait before retrying the request.

Using 503 for rate limiting is not disallowed, but it is not explicitly defined for rate limiting. A 503 may indicate a temporary blip or a failover to a different server, so retrying at a relatively healthy clip may well prove fruitful, because nothing is telling you "the problem is that you're trying too fast." For the same reason, a server using 503 likely won't count additional requests against the client: it hasn't told the client that request rate is the problem, so it isn't reasonable to expect the client to slow down.
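
As an aside, a client that does retry on 503 can honor Retry-After when the server sends it; a minimal sketch using only the standard library (Retry-After may also be an HTTP-date, which this sketch does not handle):

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
	"time"
)

// retryAfter returns the delay suggested by a Retry-After header when it is
// given as a number of seconds, falling back to a default otherwise.
func retryAfter(resp *http.Response, fallback time.Duration) time.Duration {
	if secs, err := strconv.Atoi(resp.Header.Get("Retry-After")); err == nil && secs > 0 {
		return time.Duration(secs) * time.Second
	}
	return fallback
}

func main() {
	resp, err := http.Get("https://example.com/api")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	if resp.StatusCode == http.StatusServiceUnavailable {
		wait := retryAfter(resp, time.Second)
		fmt.Printf("got 503, waiting %v before retrying\n", wait)
		time.Sleep(wait)
		// ...retry the request here
	}
}
```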

@sethgrid (Owner)

@jefferai thanks for the feedback! You bring up a good point I did not consider. Have you seen a case "in the wild," or is this a preemptive concern?

As for a potential fix, we could put 429 retries behind a boolean that defaults to false, and add a client method to allow setting it to true. I'm a bit swamped right now. I could maybe write that up over the weekend. PRs are welcome.
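
A rough sketch of what that could look like from the caller's side; the RetryOnHTTP429 field name and the default of false are assumptions about the eventual API, not something the library exposes as of this comment:

```go
package main

import "github.com/sethgrid/pester"

func main() {
	client := pester.New()
	client.MaxRetries = 5

	// Hypothetical opt-in flag: would default to false, so 429 responses
	// are returned to the caller immediately instead of being retried.
	client.RetryOnHTTP429 = true

	_, _ = client.Get("https://example.com/api")
}
```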

@pritamsarkar007 thoughts?

@jefferai (Author)

Hi Seth,

I haven't seen it in the wild, but take that as a reflection of my ignorance of both the broad spectrum of software that includes rate limiting and the particulars of how rate limiting is implemented.

That said, it's something I've been thinking about because the project I work on, which is security related, may gain a rate limiting feature, and it would likely behave in the way I described. There are a number of beneficial security properties, against both external threats and accidental misconfiguration, that come from keeping misbehaving clients blocked until you have good confidence that they are behaving correctly.

@pritamsarkar007

@sethgrid I can make that change. @jefferai does that address your concern (adding a flag that controls whether the library will retry 429 responses)?
Also, for 429, should we always use exponential backoff? @jefferai @sethgrid thoughts?
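
For reference, under the assumed BackoffStrategy signature (func(retry int) time.Duration), an exponential-with-jitter strategy could look like the sketch below; whether it should be forced for 429 is the open question:

```go
package main

import (
	"math/rand"
	"time"

	"github.com/sethgrid/pester"
)

func main() {
	client := pester.New()
	client.MaxRetries = 4

	// Exponential backoff with jitter: the base delay roughly doubles on
	// each attempt, and a random component spreads retries out to avoid
	// synchronized bursts from many clients.
	client.Backoff = func(retry int) time.Duration {
		base := time.Duration(1<<uint(retry)) * time.Second
		return base + time.Duration(rand.Int63n(int64(time.Second)))
	}

	_, _ = client.Get("https://example.com/api")
}
```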

@jefferai (Author)

Yes, that would address my concern. I would also suggest that the default behavior be off.

@sethgrid (Owner) commented Apr 27, 2018 via email

@pritamsarkar007

@sethgrid I will get it done over the weekend.

@pritamsarkar007

@jefferai @sethgrid PR submitted

@sethgrid (Owner)

@jefferai @pritamsarkar007 #35 merged
