Rancher resulting in large volumes of API requests to DNSimple #5996

Closed
weppos opened this issue Sep 18, 2016 · 8 comments

@weppos

weppos commented Sep 18, 2016

At DNSimple we have seen a few occurrences in the last months of very large spikes of requests to our API that were caused by customers using Rancher.

The latest one is happening this week: a customer is currently sending about 1,100,000 requests per hour (all of them throttled, of course). The customer previously confirmed they are using Rancher; we got in touch with them a few months ago, when the issue first happened.

I already reported the issue to Jan Bruder and we had some email exchange. Unfortunately, I failed to follow up due to a few priorities at DNSimple, hence I'm opening the ticket here so that we have a shared place to keep track of the issue.

This is a response an affected customer sent back to us:

Thanks for reaching out. We do have a cloud orchestration platform (rancher) which uses your API. And sometimes our services flip flop causing too many requests for DNS updates. It looks like there is no exponential back off for this service but there seem to have been a few releases trying to address throttling (for Route 53 but hopefully they apply to DNSimple) so I’ll update and hopefully it will address the problem.

Jan replied to me:

I also have an idea how to reduce the overall ListRecords queries. We could also add a token-bucket rate-limiter to the DNSimple provider similar to what we already do for Route 53 (see: https://github.com/rancher/external-dns/blob/master/providers/route53/route53.go#L23).

I see the DNSimple integration was updated (thanks @janeczku), but it lacks the bucket-based rate limiter. Can you please add it?
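For anyone following along, here is a minimal sketch of what a token-bucket limiter does. It is written in Python for brevity and is not the Route 53 provider code linked above; the class name `TokenBucket`, its parameters, and the injectable `clock` are all hypothetical, chosen only for illustration.

```python
import time


class TokenBucket:
    """Hypothetical token-bucket limiter; not the Rancher implementation."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)          # tokens added per second
        self.capacity = float(capacity)  # maximum burst size
        self.clock = clock               # injectable so it can be tested
        self.tokens = float(capacity)    # start with a full bucket
        self.updated = clock()

    def _refill(self):
        # Add tokens for the time elapsed since the last update, up to capacity.
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now

    def try_acquire(self, n=1):
        """Take n tokens if available; return False instead of blocking."""
        self._refill()
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False

    def wait_time(self, n=1):
        """Seconds until n tokens will be available (0.0 if available now)."""
        self._refill()
        missing = n - self.tokens
        return max(0.0, missing / self.rate)
```

A provider would call `try_acquire()` before each API request and sleep for `wait_time()` when it returns `False`. Note that a limiter like this only caps the rate of a single process; it knows nothing about other instances sharing the same API token.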

These customers are currently sending over 24 million API requests per day. Sadly, if the issue persists, we'll likely have to start banning those requests instead of just rate limiting them.

I also want to take the chance to thank Jan for the initial follow up.

jmatsushita added a commit to iilab/external-dns that referenced this issue Sep 18, 2016
@jmatsushita

jmatsushita commented Sep 18, 2016

We're one of the DNSimple customers affected, so I had a go at applying the Route53 rate limiter code to DNSimple.

Let me know what you think. Here's the PR rancher/external-dns#45

@jmatsushita

This might help for Rancher environments with a large number of changes, but when the DNSimple API starts answering 429 Too Many Requests, the service exits (or returns a bad health check). By default this triggers the re-creation of all the external-dns containers every time the health check fails, i.e. every 2 secs. So I ended up making health checks less frequent, since the frequent health check is likely to be the culprit in most cases.

@will-chan
Contributor

@janeczku can you look into adding this?

@janeczku
Contributor

janeczku commented Oct 12, 2016

@weppos @jmatsushita I have updated the DNSimple catalog entry to use external-lb:v0.5.0. That means that health checks will now query the /users endpoint of the DNSimple API instead of sending expensive calls to list the records. I also bumped the health check interval to 10 secs. This should significantly lower the overall number of API calls in normal operation. The case where an unhealthy Rancher environment (flapping service) results in a higher number of API calls will be addressed in the next version.
rancher/community-catalog#310

I am inclined to implement a backoff retrier to handle HTTP code 429 responses instead of a bucket rate limiter. This is better suited to limit global call rates in cases where a user has multiple instances of the DNSimple service running concurrently. On the other hand this should probably be implemented in the DNSimple API library. Thoughts @weppos @jmatsushita ?
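To make the proposal concrete, a backoff retrier could look roughly like the sketch below. It is Python rather than the provider's own code, and every name in it (`Throttled`, `backoff_delays`, `call_with_backoff`, and the default delays) is an assumption made for illustration. The caller-supplied `fn` is expected to raise `Throttled` when the API answers 429.

```python
import random
import time


class Throttled(Exception):
    """Raised by the caller-supplied function when the API answers 429."""


def backoff_delays(base=1.0, factor=2.0, cap=60.0, attempts=5):
    """Yield capped exponential delays: base, base*factor, ... up to cap."""
    delay = base
    for _ in range(attempts):
        yield min(delay, cap)
        delay *= factor


def call_with_backoff(fn, sleep=time.sleep, jitter=random.random, **kwargs):
    """Call fn(); on Throttled, sleep and retry with exponential backoff.

    Full jitter (a random fraction of each delay) spreads out retries from
    several instances that were throttled at the same moment.
    """
    for delay in backoff_delays(**kwargs):
        try:
            return fn()
        except Throttled:
            sleep(delay * jitter())
    return fn()  # final attempt; let Throttled propagate to the caller
```

Because each instance reacts to the 429 responses it actually receives, this approach needs no coordination between instances, which is the advantage over a per-process bucket mentioned above.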

@weppos
Author

weppos commented Oct 12, 2016

@janeczku the DNSimple API client is currently mostly a toolkit around the API. Even the v2 toolkit (AFAIK rancher is still connecting to the v1 API using the old client that is under my personal account) doesn't implement rate limiting right now.

One of the reasons we don't implement rate limiting is that each customer may have different needs. Our rate limiting is mostly per hourly period, hence there may be different applicable approaches (throttle per second, per minute, bucket based, etc.) depending on your application.

Most API clients generally don't provide rate limiting handling, leaving the choice to the final user. For this reason, I don't think we'll ever add rate-limiting to our client.

If it helps, when we throttle a request we also return the time when the limit will be lifted. In a distributed environment, you can reuse such information along with the 429 HTTP status code to lock further requests until the time has passed.

Is it something that would work for you?
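A rough sketch of that idea, in Python for brevity: a small gate that records the announced reset time from a 429 response and tells callers how long to hold off. The class `ThrottleGate` is hypothetical. It also assumes that the reset time arrives as a Unix timestamp in an `X-RateLimit-Reset` header, as in the curl output further down this thread; whether a throttled response carries exactly that header is an assumption here.

```python
import time


class ThrottleGate:
    """Blocks outgoing calls until a server-announced reset time has passed."""

    def __init__(self, clock=time.time):
        self.clock = clock
        self.locked_until = 0.0  # Unix timestamp; 0 means not locked

    def record_response(self, status, headers):
        """Lock the gate when a 429 response carries a reset timestamp."""
        if status != 429:
            return
        reset = headers.get("X-RateLimit-Reset")  # assumed header name
        if reset is not None:
            self.locked_until = max(self.locked_until, float(reset))

    def seconds_until_open(self):
        """Return how long callers should wait before the next request."""
        return max(0.0, self.locked_until - self.clock())
```

In a distributed setup, `locked_until` would have to live in some store shared by all instances rather than in process memory; that part is left out of the sketch.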

@janeczku
Contributor

@weppos Sure, once we have the v2 API client library in the provider we can implement some backoff based on the rate limit information in the response.

@janeczku
Contributor

rancher/external-dns#49

@galal-hussein
Contributor

The new template was tested and merged to the community catalog:

$ curl -H "Authorization: Bearer xxxxxxxxxxxxxx" -I "https://api.dnsimple.com/v2/whoami"
HTTP/1.1 200 OK
Server: nginx
Date: Mon, 14 Nov 2016 21:59:06 GMT
Connection: keep-alive
X-RateLimit-Limit: 2400
X-RateLimit-Remaining: 2399
X-RateLimit-Reset: 1479164346
Cache-Control: no-cache
X-Request-Id: 10e815b7-be7c-4289-bb49-3037064c37fa
X-Runtime: 0.036459
X-Content-Type-Options: nosniff
X-Download-Options: noopen
X-Frame-Options: DENY
X-Permitted-Cross-Domain-Policies: none
X-XSS-Protection: 1; mode=block
Strict-Transport-Security: max-age=31536000

After adding multiple services:

$ curl -H "Authorization: Bearer xxxxxxxxxxxxxxx" -I "https://api.dnsimple.com/v2/whoami"
HTTP/1.1 200 OK
Server: nginx
Date: Mon, 14 Nov 2016 22:22:05 GMT
Connection: keep-alive
X-RateLimit-Limit: 2400
X-RateLimit-Remaining: 2395
X-RateLimit-Reset: 1479164346
Cache-Control: no-cache
X-Request-Id: 1eaf7a4f-bcf4-4a41-94e5-0e411420830e
X-Runtime: 0.009921
X-Content-Type-Options: nosniff
X-Download-Options: noopen
X-Frame-Options: DENY
X-Permitted-Cross-Domain-Policies: none
X-XSS-Protection: 1; mode=block
Strict-Transport-Security: max-age=31536000
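
The two samples can be compared mechanically. The snippet below is only an illustrative sketch in Python; the helper names `parse_rate_limit` and `calls_used` are made up for this example. It reads the `X-RateLimit-*` headers from each response and reports how much of the hourly quota had been consumed at each point.

```python
def parse_rate_limit(raw_headers):
    """Extract the X-RateLimit-* values from raw response header text."""
    values = {}
    prefix = "x-ratelimit-"
    for line in raw_headers.splitlines():
        name, sep, value = line.partition(":")
        name = name.strip().lower()
        if sep and name.startswith(prefix):
            values[name[len(prefix):]] = int(value.strip())
    return values


def calls_used(sample):
    """Number of calls already counted against the current window."""
    return sample["limit"] - sample["remaining"]


# Header values copied from the two curl responses above.
before = parse_rate_limit(
    "X-RateLimit-Limit: 2400\n"
    "X-RateLimit-Remaining: 2399\n"
    "X-RateLimit-Reset: 1479164346\n"
)
after = parse_rate_limit(
    "X-RateLimit-Limit: 2400\n"
    "X-RateLimit-Remaining: 2395\n"
    "X-RateLimit-Reset: 1479164346\n"
)

print(calls_used(before), calls_used(after))  # prints: 1 5
```

Both samples share the same reset timestamp, so they fall in the same rate-limit window: only 4 more calls were counted between them, and that figure includes the second curl probe itself.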
