Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Throttle individual API requests #976

Open
mholt opened this issue Sep 28, 2019 · 13 comments
Open

Throttle individual API requests #976

mholt opened this issue Sep 28, 2019 · 13 comments
Labels

Comments

@mholt
Copy link
Contributor

mholt commented Sep 28, 2019

With support for concurrent challenges (i.e. running multiple calls to Certificate.Obtain() or Certificate.Renew() at the same time -- different certs of course) I think it would be a good idea to make sure that lego doesn't hammer any the CA.

For example, if a client (user of lego) needs to obtain 100 certificates, it should be able to call Obtain() 100 times at once, but lego should be sure to space out the individual ACME API requests by some time.

So, a configurable rate limit such as MinAPIRequestInterval would be useful.

The only alternative right now is for the user/importer of lego to do their own throttling. But this is not very effective... The difference between what I am proposing and just leaving it up to the user of lego do their own rate limiting is that rate limiting implemented into lego would apply directly to individual ACME API calls, which lego hides away from the user. The only thing the user can throttle themselves is the number of issuances at once, which is less useful because issuances are comprised of more than just a single API call. In other words, Certificate.Obtain() does not correspond to fixed amount of load on CA. This detail is hidden from the user, and so the rate limit needs to be applied within lego.

For example: setting MinAPIRequestInterval to 2 * time.Second would allow a user to call Obtain() 100 times all at once, but lego would make sure that only 1 request goes to each endpoint once every 2 seconds. (I think it is better for the rate limit to apply per endpoint instead of all endpoints in the same bucket.)

This would speed up certificate issuances while respecting the CA's infrastructure.

What do you think?

@dmke
Copy link
Member

dmke commented Oct 11, 2019

I'm in favour of this: I was recently rate-limited because I needed to renew a bunch of certificates (N domains × M subdomains × Y ACME API calls; where N is monotonically increasing, M constant, and Y out of my control). Upsy-daisy :-)

On this particular host, N is controlled by customers adding new domains at any point in time.

My current mitigations are:

  1. Create a separate SAN for each domain, so I can spread out the renewal across a longer time period (this creates N bursts of M×Y ACME calls).
  2. Switch to wildcard certifcates (which eliminates M).

(1) already happens, apart from the spreading-out over time. (2) is work-in-progress.

@mholt: Do you think it's necessary to rate-limit calls to challenge-provider endpoints as well? Cloudflare for example, limits requests to 1200/5 min. This is large enough for Lego (which currently requires at most 3 requests/domain, but other providers might need more requests and/or enforce draconian limits.

@ldez ldez added the area/lib label Oct 11, 2019
@ldez
Copy link
Member

ldez commented Oct 11, 2019

I think that the proposal of @mholt will not solve your issue @dmke because it's a throttling on API calls, not a way to spreading-out over time to allow to overflow the LE rate limits (like "Certificates per Registered Domain (50 per week)" or "maximum of 300 New Orders per account per 3 hours" or "Duplicate Certificate limit of 5 per week", etc.).

https://letsencrypt.org/fr/docs/rate-limits/

@ldez
Copy link
Member

ldez commented Oct 11, 2019

This proposal can be linked to the rqs/s limit of LE:

The “new-reg”, “new-authz” and “new-cert” endpoints have an Overall Requests limit of 20 per second. The “/directory” endpoint and the “/acme” directory & subdirectories have an Overall Requests limit of 40 requests per second.

https://letsencrypt.org/fr/docs/rate-limits/

@mholt
Copy link
Contributor Author

mholt commented Oct 11, 2019

@ldez

As a lib, for me, the app using the lib must manage that because:

  • I don't want that lego to have a state
  • the app can create multiple client and introduce concurrent access
  • Global mutable variables are never a good idea.

I've explained why the app cannot do this. Lego hides away the individual ACME calls, so even if the app is managing concurrent Obtain or Renew calls, it cannot throttle the requests to individual API endpoints.

I think that the proposal of @mholt will not solve your issue @dmke because it's a throttling on API calls, not a way to spreading-out over time to allow to overflow the LE rate limits (like "Certificates per Registered Domain (50 per week)" or "maximum of 300 New Orders per account per 3 hours" or "Duplicate Certificate limit of 5 per week", etc.).

Yes, I believe that's correct. My proposal is for throttling requests to individual API endpoints, regardless of how many concurrent Renew or Obtain are happening, or how frequently they are called. In other words, rate limits that have nothing to do with the actual certificate details.

This proposal can be linked to the rqs/s limit of LE:

Yep! Let's Encrypt has per-endpoint rate limits, and calls to Renew and Obtain don't necessarily correspond to the underlying API calls.

@mholt
Copy link
Contributor Author

mholt commented Oct 16, 2019

I am going to work on a PR that does the following:

  • Defines a global throttle per ACME endpoint (map[string]TokenBucket or something)
  • All ACME calls honor the throttle (if any) associated with its endpoint

The throttles will be global, so we'll probably have a mutex to protect the throttles map.

Some rate limits are per-account (e.g. Let's Encrypt's "300 new orders per account per 3 hours" rate limit) so perhaps we'd augment the endpoint with the account name in some way to make that a unique key.

I will try to submit a PR this weekend. Let me know if you have any objections that would prevent it from being accepted!

@ldez
Copy link
Member

ldez commented Oct 16, 2019

  • no global variables

@mholt
Copy link
Contributor Author

mholt commented Oct 16, 2019

Well, what do you suggest then? The rate limits apply globally, whether we like it or not.

@ldez
Copy link
Member

ldez commented Oct 16, 2019

The variables must be linked to the client.

@mholt
Copy link
Contributor Author

mholt commented Oct 16, 2019

That's no good though, because multiple clients can be created, thus multiplying the limit for throttling. The point here is to prevent abuse (both accidental and intentional) of CA resources.

Global variables have a valid purpose, and this is one of them.

@ldez
Copy link
Member

ldez commented Oct 16, 2019

It's possible to use multiple clients with multiple CAs.

You can provide a rate limit strategy/manager to the client and re-use it between clients.

@mholt
Copy link
Contributor Author

mholt commented Oct 16, 2019

Hmm, I suppose, however this still relies on the user doing it correctly. One major reason for this change is to prevent accidentally hammering the CA. Besides, that rate limiting manager still has to be stored somewhere, and if it's not global, then more than one could exist, which again defeats the throttling.

I'm not sure we can get around the need for a global variable to do this correctly.

@jsha
Copy link
Contributor

jsha commented Oct 17, 2019

It seems like net/http provides a good example here: There is a default HTTP client that is used in most cases, but you have the option of instantiating your own HTTP client that has different properties (and a different connection pool). By analogy, you could define a default limit manager that is used by most lego client instances, but allow the user to override the limit manager on their lego client, for instance if they are using a non-default CA.

@mholt
Copy link
Contributor Author

mholt commented Oct 17, 2019

That could work, but the default rate limiter (if one is not provided) still needs to be a singular instance somewhere (this is not a bad thing).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

No branches or pull requests

4 participants