Rancher resulting in large volumes of API requests to DNSimple #5996
Comments
We're one of the affected DNSimple customers, so I had a go at applying the Route53 rate limiter code to DNSimple. Let me know what you think. Here's the PR: rancher/external-dns#45
This might help for Rancher environments with a large number of changes, but when the DNSimple API starts answering 429 Too Many Requests, external-dns exits (or returns a bad health check). By default this triggers the re-creation of all the external-dns containers every time the health check fails, i.e. every 2 seconds. So I ended up making the health checks less frequent, since the frequent health-check restarts are likely to be the culprit in most cases.
@janeczku can you look into adding this?
@weppos @jmatsushita I have updated the DNSimple catalog entry to use [...]. I am inclined to implement a backoff retrier that handles HTTP 429 responses instead of a bucket rate limiter. A backoff retrier is better suited to limiting the global call rate when a user has multiple instances of the DNSimple service running concurrently. On the other hand, this should probably be implemented in the DNSimple API library. Thoughts @weppos @jmatsushita?
@janeczku the DNSimple API client is currently mostly a toolkit around the API. Even the v2 toolkit (AFAIK Rancher is still connecting to the v1 API using the old client that is under my personal account) doesn't implement rate limiting right now. One of the reasons we don't implement rate limiting is that each customer may have different needs. Our rate limits mostly apply per hour, hence there may be different applicable approaches (throttle per second, per minute, bucket-based, etc.) depending on your application. Most API clients generally don't handle rate limiting, leaving the choice to the final user. For this reason, I don't think we'll ever add rate limiting to our client. If it helps, when we throttle a request we also return the time when the limit will be lifted. In a distributed environment, you can reuse that information along with the 429 HTTP status code to lock further requests until the time has passed. Is that something that would work for you?
@weppos Sure, once we have the v2 API client library in the provider we can implement some backoff based on the rate limit information in the response.
The new template has been tested and merged into the community catalog:
After adding multiple services:
At DNSimple we have seen a few occurrences in recent months of very large spikes of requests to our API that were caused by customers using Rancher.
The latest one occurred this week: a customer is currently sending about 1,100,000 requests per hour (of course all of them are throttled), and the customer previously confirmed they are using Rancher (we got in touch with them a few months ago, when the issue happened the first time).
I already reported the issue to Jan Bruder and we had some email exchange. Unfortunately, I failed to follow up due to a few priorities at DNSimple, hence I'm opening the ticket here so that we have a shared place to keep track of the issue.
This is a response an affected customer sent back to us:
Jan replied to me:
I see the DNSimple integration was updated (thanks @janeczku), but it lacks the bucket-based rate limiter. Can you please add it?
These customers are currently sending over 24 million API requests per day. Sadly, if the issue persists, we'll likely have to start banning those requests instead of just rate limiting them.
I also want to take the chance to thank Jan for the initial follow-up.