Why does external-dns poll? Polling causes too many API requests #484

azuretek · 2018-03-07T03:03:02Z

Is there a reason external-dns is polling? Why not watch the event stream and trigger updates that way? There's no reason to poll on an interval if you can just watch for changes. It would drastically reduce the number of API requests and also be a lot quicker to reflect changes as services and ingresses are deployed.

ideahitme · 2018-03-07T10:50:41Z

At some stage it might make sense to integrate "watch" capabilities, but polling is probably required anyway. For example, if External-DNS has not been running for a while, the services and ingresses created during that period should be handled as well. I am not entirely sure how well Kubernetes handles watching, but a year or so ago I found the API to be buggy.

The problem with "watching" is that we cannot simply make an API call to DNS provider on every single event, because those calls usually cost money and are normally rate limited. So with "watching" we would have to do some aggregation and batching.

We could allow configuring the polling interval to reduce the number of API calls; however, I don't believe "watching" is a better solution to the "problem", especially in big clusters with lots of ingresses and services.

azuretek · 2018-03-07T23:48:28Z

I'm not seeing in the code where the polling is necessary, you can watch the event stream and just append changes as they come in and call submitChanges on the interval that's specified. You're already "batching" the way you described, it's just happening on a set interval.

The main improvement is that you eliminate API calls altogether until a change actually needs to be made.

If you're concerned about a fresh pod not being aware of changes that happened since starting you can do one initial poll to get the current state and then update as necessary.
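The approach described above (accumulate watch events, then submit on the existing interval) could look roughly like the following. This is a minimal Python sketch, not external-dns code: `ChangeBuffer`, `on_event`, `flush`, and `run_once` are hypothetical names, and `submit_changes` only stands in for the submitChanges call mentioned above.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeBuffer:
    """Collects watch events and hands them over as one batch per interval."""
    pending: dict = field(default_factory=dict)

    def on_event(self, name, desired_target):
        # Later events for the same name overwrite earlier ones, so only the
        # latest desired state per record is kept. None could mean "deleted".
        self.pending[name] = desired_target

    def flush(self):
        # Called once per interval; returns the batch and resets the buffer.
        batch, self.pending = self.pending, {}
        return batch


def run_once(buffer, submit_changes):
    """One interval tick: call the provider only if something changed."""
    batch = buffer.flush()
    if not batch:
        return False  # nothing changed, so no provider API call is made
    submit_changes(batch)
    return True
```

An initial full poll at startup would seed the state before the first tick; that part is left out of the sketch.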

I can contribute the code changes necessary to make this happen if that's a concern.

Just to clarify my issue and why I think this is a major problem: in our environment we use AWS and have several clusters where external-dns is configured. We have lots of domains, so every time external-dns polls we make at least zones*clusters queries to the AWS API (5 clusters with 20 zones = 100 API calls every minute), even when nothing has changed. This is causing us to hit limits with the AWS API, and the only resolution is either to reduce the number of domains managed by external-dns (requiring an external service to create CNAMEs for us) or to poll less often, which directly impacts how quickly we can deploy.
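The zones*clusters arithmetic can be checked with a few lines of Python. This is only an illustration: the function name is made up, and it assumes exactly one list call per zone per cluster per polling cycle, which is a lower bound once pagination is involved.

```python
def idle_calls_per_minute(clusters, zones, interval_seconds):
    """Provider read calls per minute when nothing has changed.

    Assumes one list call per zone per cluster per polling cycle; real
    counts may be higher because of pagination.
    """
    calls_per_cycle = clusters * zones
    return calls_per_cycle * 60.0 / interval_seconds


print(idle_calls_per_minute(clusters=5, zones=20, interval_seconds=60))   # 100.0
print(idle_calls_per_minute(clusters=5, zones=20, interval_seconds=300))  # 20.0
```

Stretching the interval from one minute to five cuts the idle load to a fifth, at the cost of slower propagation of changes.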

ideahitme · 2018-03-08T10:20:06Z

I don't believe it is as simple as you described. With the concepts of ownership and multi-target records, you have to maintain information such as who owns the record and whether it can be modified, either in memory (a cache) or by making the DNS provider GET call. You want to avoid the latter, but with in-memory storage you might as well diff against the previous change to see whether an update is required. I would make this optional and not recommended for use anyway. However, I would love to see a proposal on how to use "watch" first, with a proper description of how external-dns will operate and preserve all the features it currently has.

hjacobs · 2018-03-08T10:44:24Z

Are we even talking about the same thing? Are we talking about polling the Kubernetes API or the AWS API? @azuretek mentions hitting the rate limits of AWS. Maybe we should identify the actual problem first before discussing potential solutions or improvements? Is the problem "External DNS hits AWS API rate limits"?

ideahitme · 2018-03-08T10:50:04Z

@hjacobs I think he means using Kubernetes API events to watch for changes and only then making the AWS API call, otherwise staying idle.

Currently the problem is we fetch the list of records from AWS even if no changes are required and this is the API call we want to prevent. However External DNS is smart enough not to "post" changes to AWS API if no changes were detected.

External DNS hitting AWS API rate limits is a problem, but I think it should be addressed in other ways, e.g. by caching results (#178).

prydie · 2018-05-09T13:20:15Z

How about having the controller trigger off informers watching Service/Ingress with the informer resync periods set to --interval? Then couple that with fronting the registry with a TTL cache (#178) so fetching records from the provider would occur once per --interval as it does currently.

The resync period/TTL cache would ensure that we maintained the current functionality (i.e. always ensuring state is reconciled between the provider and the cluster at least once per --interval) but would greatly improve the latency of changes in cluster being reflected in the provider.

API rate limits could be handled by exposing --cache-ttl flag or similar.

Related: #14
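A TTL cache in front of the record fetch, as proposed above, might be sketched like this in Python. Everything here is illustrative: `CachedRecords`, `fetch`, and `invalidate` are hypothetical names, and the injectable `clock` exists only to make the behaviour easy to test.

```python
import time


class CachedRecords:
    """Wraps a records-fetching function with a time-to-live cache."""

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._value = None
        self._expires_at = None  # None means nothing is cached yet

    def records(self):
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            # Cache miss or expired entry: the only place where the
            # provider is actually contacted.
            self._value = self._fetch()
            self._expires_at = now + self._ttl
        return self._value

    def invalidate(self):
        # Call after applying changes so the next read sees fresh state.
        self._expires_at = None
```

With the TTL set to the same value as --interval, provider reads would happen at most once per interval no matter how many informer events trigger a reconcile in between.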

jhohertz · 2018-05-11T15:46:27Z

I've run into this when running in an AWS account with a large number of Route53 zones. For whatever reason, it polls zones even if there are no ingress/service/etc manifests referencing that zone. Is there any way (besides filtering on domain name param) to optimise things such that it doesn't look at zones not relevant to anything configured inside kubernetes?

(In my case the account had 250+ zones... and with no filter, despite the cluster coming up with maybe a half-dozen records on just a single zone, all 249 other zones are getting scanned, confirmed by looking at CloudTrail logs, resulting in the API throttling so badly it sometimes took 10-20 minutes before external-dns could get any records provisioned.)

For the moment I've worked around it by specifying a whitelist of domains that can get managed by external-dns to keep how much it's scanning to a minimum.
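The effect of such a whitelist can be illustrated by selecting zones before any records are listed. The Python below is a simplified sketch under stated assumptions, not the matching logic external-dns actually uses: `zone_matches` and `zones_to_scan` are hypothetical helpers, and a zone is kept only if it equals a filter or is a subdomain of one.

```python
from typing import List


def zone_matches(zone_name: str, domain_filters: List[str]) -> bool:
    """Return True if the zone should be scanned.

    An empty filter list matches every zone. Comparing whole labels keeps
    "notexample.org" from matching the filter "example.org".
    """
    if not domain_filters:
        return True
    zone = zone_name.rstrip(".").lower()
    for raw in domain_filters:
        wanted = raw.strip(".").lower()
        if zone == wanted or zone.endswith("." + wanted):
            return True
    return False


def zones_to_scan(all_zones: List[str], domain_filters: List[str]) -> List[str]:
    # Only the returned zones would have their records listed.
    return [z for z in all_zones if zone_matches(z, domain_filters)]
```

For an account with 250 zones and a single filtered domain, only the matching zones would be walked on each cycle.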

2rs2ts · 2018-06-08T17:21:41Z

Some things to add to this thread:

Watching on k8s events and batching seems fine but those aren't your only events, yeah? What happens if a record gets modified outside of external-dns' scope? A regular poll as @prydie suggests would still be wise.

@jhohertz to your point I thought that was unintuitive too but external-dns has to delete records too. That said, whitelisting domains is the way to go and that's what we do. We include all our public domains, and then only the private domains for the VPC we're running external-dns in, for each VPC.

Just ranting here, but honestly the problem here is with Amazon's APIs, which I understand we can't easily change... ideally they would give you the ability to post to an SNS topic or something like that when Route53 calls are made so we could watch on AWS events the same as we can on K8s events.

Adds a new flag --aws-api-retries which allows overriding the number of retries that API calls will attempt before giving up. This somewhat mitigates the issues discussed in kubernetes-sigs#484 by allowing the current sync attempt to complete vs. failing and starting anew. Defaults to 3, which is what the aws-sdk-go defaults to where not specified. Signed-off-by: Joe Hohertz <joe@viafoura.com>

Evesy · 2019-02-27T11:35:28Z

We're seeing similar things with the Cloudflare provider.

Our account has approximately 10,000 zones, which means (with the maximum pagination allowed) that's 200 API calls to return solely the zones. --domain-filter dictates that we're only actually interested in two of those zones, and in those zones there are only about 75-100 pages of records.

Cloudflare limits 1200 requests per 5 minutes, which with external-dns' default interval of 1m gives room for about 240 requests a minute; based on the above, that means we're hitting the limit (the issue is exacerbated if you reuse client credentials on more than one cluster running external-dns). Polling less often is certainly a workaround, but of course it does mean provisioning of services is impacted.

Would it work to restructure things so that --domain-filter is applied at the time records/zones are queried in the provider, so only said zones are looked at, rather than just being used to filter records after they have been retrieved from the provider? Or are there other considerations needed?
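The request budget above can be worked through in Python. This is a back-of-the-envelope sketch: the page size of 50 is inferred from 10,000 zones needing 200 calls, and the "filtered" case assumes the two zones are already known so that no full zone listing is needed.

```python
import math


def pages_needed(items, per_page):
    """Number of paginated API calls needed to list `items` entries."""
    return math.ceil(items / per_page)


# 50 zones per page is an assumption derived from the figures above.
zone_list_calls = pages_needed(10_000, 50)
budget_per_minute = 1200 / 5  # limit spread evenly over a 1m interval

for record_pages in (75, 100):
    unfiltered = zone_list_calls + record_pages  # list all zones, then records
    filtered = record_pages                      # zones known up front (assumed)
    print(record_pages, unfiltered, filtered, unfiltered > budget_per_minute)
```

Under these assumptions the unfiltered cycle needs 275 to 300 calls against a budget of 240, while applying the filter at query time would stay well under it.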

Raffo · 2019-02-27T11:37:01Z

Do you confirm that this is happening with the latest version released (v0.5.11)?

Evesy · 2019-02-27T12:00:35Z

Correct, 0.5.11

jlamillan · 2019-02-27T17:34:32Z

@Evesy it won't solve your problem completely, but we've been using the new --events flag introduced in this pull-request as a way to significantly reduce the number of regular poll calls to our provider while actually improving our provisioning time by combining --events with a long --interval. In our scenario, out-of-band DNS changes are unlikely, so this has been working well for us.

rtkgjacobs · 2019-04-09T12:48:34Z

There are several key problems from looking things over and testing on larger AWS deployments:

Larger DNS hosted zones and low intervals often hit API transaction limits within AWS. Having event-driven updates and using a longer --interval= time is ideal to reduce that impact; glad to see it's being worked on in part.
Larger hosted zones mean that each operation above can still hit a rate limit. Ideally, could a sleep() between each API call (or groups of them) during a 'scrape' of hosted zone contents be optionally used to make that operation more 'gentle' with AWS? For example, add a setting like --aws-operation-throttle_ops=25/s for larger deployments to cap API calls per second.
Filters applied should constrain what API calls / queries are performed, so if I say I only want .domain.com, don't walk over other hosted zones serving other domains.
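An operations-per-second cap like the one suggested in the second point could be sketched as follows in Python. Nothing here exists in external-dns: `Throttle` is a hypothetical helper, and the `clock` and `sleep` parameters are injectable only so the spacing can be tested without real delays.

```python
import time


class Throttle:
    """Spaces calls so that at most `ops_per_second` start per second."""

    def __init__(self, ops_per_second, clock=time.monotonic, sleep=time.sleep):
        self._min_gap = 1.0 / ops_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # earliest start time of the next call

    def wait(self):
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            # Too early: pause until the previous call's gap has elapsed.
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self._min_gap
```

Calling `wait()` before every provider request with `Throttle(25)` would spread a large zone scrape out at 25 requests per second instead of sending it as one burst.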

fraenkel · 2019-04-24T20:22:25Z

In our environment, we too are hitting rate limits on AWS. I have already increased our AWS retries to 10, although now I am considering 13 with a much longer interval. We have added the --events support to combat the longer interval, but that too can be rate limited, which puts us back in the same situation.
There are a few different features that I am thinking about:

A separate retry interval on incomplete loops. With a larger interval, we cannot wait hours for a retry; there should be a separate backoff for this type of situation.
Caching through the plan/apply process would reduce the total call count by 2, and in the best case by 3. This is where I see a quick win for something simple to implement. I would have liked to use the cache support, but that has issues in the face of failures, so I am going to avoid that. I realize this creates two "caching" solutions, but I view one as safer than the other.
Handling multiple k8s clusters. This would also help greatly, but it is the largest change, and even I don't want to go down this path yet.
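The first idea, a retry delay that is independent of the main interval, might look like this in Python. It is a sketch under assumptions: `next_delay` is a hypothetical helper, and the 5-second base and doubling factor are arbitrary illustrative values.

```python
def next_delay(interval, consecutive_failures, base=5.0, factor=2.0):
    """Seconds to wait before the next sync loop.

    After a complete loop the normal interval applies. After incomplete
    loops the delay starts small and grows exponentially, but never
    exceeds the normal interval.
    """
    if consecutive_failures == 0:
        return interval
    backoff = base * factor ** (consecutive_failures - 1)
    return min(backoff, interval)


print(next_delay(3600, 0))  # 3600
print(next_delay(3600, 1))  # 5.0
print(next_delay(3600, 3))  # 20.0
```

With a one-hour interval, a failed loop would be retried after seconds rather than after the full hour.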

tsuna · 2019-04-25T01:28:04Z

In our case we settled for one AWS account per cluster. Putting even just two k8s clusters on the same AWS account easily triggers the default rate limit. Thankfully we don't have that many so it's manageable this way. It also provides us with greater isolation and accounting across clusters so it's not like we did this solely for external-dns, but just saying...

fejta-bot · 2019-07-24T02:06:03Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

tbarrella · 2019-07-26T06:37:14Z

/remove-lifecycle stale

fejta-bot · 2019-10-24T07:25:51Z

/lifecycle stale

george-angel · 2019-10-25T07:28:54Z

/remove-lifecycle stale

fejta-bot · 2020-01-23T07:59:06Z

/lifecycle stale

k8s-triage-robot · 2021-12-31T23:59:49Z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

george-angel · 2022-01-02T09:07:09Z

/remove-lifecycle stale

k8s-triage-robot · 2022-04-02T09:23:28Z

/lifecycle stale

george-angel · 2022-04-02T23:06:02Z

/remove-lifecycle stale

k8s-triage-robot · 2022-07-01T23:43:03Z

/lifecycle stale

ghostsquad · 2022-07-02T01:29:44Z

/remove-lifecycle stale

darkpixel · 2022-09-06T23:08:42Z

Probably not the best solution for everyone, but I ended up working around this by spinning up two $5/mo VPS instances at DigitalOcean in two different regions.

Installed powerdns with a sqlite3 backend, enabled the webserver, set an API key, and reconfigured external-dns.

It synced around 350 domains in ~2 seconds. Goodbye provider rate-limits.

k8s-triage-robot · 2022-12-05T23:15:58Z

/lifecycle stale

george-angel · 2022-12-06T04:14:51Z

/remove-lifecycle stale

k8s-triage-robot · 2023-03-06T04:24:47Z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle stale
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

ghostsquad · 2023-03-06T04:43:21Z

/remove-lifecycle stale

k8s-triage-robot · 2023-06-04T04:46:40Z

/lifecycle stale

george-angel · 2023-06-05T08:36:19Z

/remove-lifecycle stale

szuecs · 2023-06-26T10:46:10Z

Please create a proposal that outlines the problem (I see the problem, but this is needed to clarify it) and how a solution would work for all the cases outlined here.
Please also link this issue so that we can dive into the context here.
Right now it's hard to suggest anything useful from random comments in this issue.
Everyone who is waiting here can help to make the proposal happen and with a proposal this will get more traction to an implementation.

Thanks for understanding the maintainers need some help here.

darkpixel · 2023-06-26T16:19:30Z

Proposal:
Add a feature to external-dns that allows you to limit the number of API requests in a timeframe (e.g. no more than 200 per minute): external-dns would "sleep" when it hits the limit and then continue on.

It might be easier to allow users to configure a "sleep between API calls" setting. Sleeping for 0.5 seconds between API calls would cap it at ~120 API calls per minute.
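The "no more than N requests per timeframe" variant could be sketched as a sliding-window limiter in Python. This is illustrative only: `WindowLimiter` and `acquire` are hypothetical names, and the `clock` and `sleep` parameters are injectable so the logic can be exercised without waiting.

```python
import time
from collections import deque


class WindowLimiter:
    """Allows at most `max_calls` calls in any `window_seconds` span."""

    def __init__(self, max_calls, window_seconds,
                 clock=time.monotonic, sleep=time.sleep):
        self._max = max_calls
        self._window = window_seconds
        self._clock = clock
        self._sleep = sleep
        self._stamps = deque()  # start times of recent calls, oldest first

    def acquire(self):
        now = self._clock()
        # Forget calls that have already left the window.
        while self._stamps and now - self._stamps[0] >= self._window:
            self._stamps.popleft()
        if len(self._stamps) >= self._max:
            # Budget used up: sleep until the oldest call leaves the window.
            self._sleep(self._window - (now - self._stamps[0]))
            self._stamps.popleft()
            now = self._clock()
        self._stamps.append(now)
```

`WindowLimiter(200, 60)` corresponds to the 200-per-minute example, and the fixed 0.5-second pause is the special case `WindowLimiter(1, 0.5)`, which allows at most 120 calls per minute.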

Example:
DigitalOcean has some arbitrary limit that comes out to ~250 requests per minute. I have about 80 domains that external-dns manages. There's one API request per zone, then apparently one API request to get all the DNS records from said zone, then more API requests for updating/removing entries. Hitting the limit causes records to not be updated anymore.

Workarounds:

Set up my own nameservers running PowerDNS and set them to have no API limits.
Spend extra money to spin up multiple kubernetes clusters under different accounts and keep each account under ~50 domain names to avoid rate-limiting

Stono · 2024-03-08T16:56:00Z

+1 to this, the api clients need to be able to back off when getting 429s, currently the pod will just crash with:

time="2024-03-08T16:33:58Z" level=fatal msg="googleapi: Error 429: Resource has been exhausted (e.g. check quota)., rateLimitExceeded"

Which then causes it to come back up and sync again immediately which somewhat exacerbates the issue 🤷
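Backing off on a 429 instead of exiting could be sketched like this in Python. It is a generic illustration, not the behaviour of any provider client: `RateLimited` is a hypothetical exception standing in for an HTTP 429 response, and `call_with_backoff` with its defaults is made up for the example.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical error type standing in for an HTTP 429 response."""


def call_with_backoff(call, max_attempts=5, base=1.0, cap=60.0,
                      sleep=time.sleep, rng=random.random):
    """Retry `call` on RateLimited with capped exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller decide what to do
            # Randomising the delay keeps several clients sharing one
            # quota from retrying in lockstep.
            delay = min(cap, base * 2 ** attempt) * rng()
            sleep(delay)
```

Retrying inside the process avoids the restart-and-resync loop described above, where every crash immediately triggers another full sync against an already exhausted quota.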

linki mentioned this issue May 16, 2018

Add support for NodePort services #559

Merged

jlamillan mentioned this issue Aug 24, 2018

Add --watchers flag to allow controller to respond automatically to Ingress or Service updates #687

Merged

jhohertz mentioned this issue Jan 17, 2019

Adds a new flag --aws-api-retries which allows overriding the #858

Merged

fraenkel mentioned this issue Apr 25, 2019

Cache the endpoints on the controller loop #1001

Merged

lou-lan pushed a commit to lou-lan/external-dns that referenced this issue May 11, 2022

Add OS/Arch information in the krew version CLI output to help with d…

58a5184

…iagnostics. (kubernetes-sigs#484)

szuecs closed this as completed Jun 26, 2023
