external-dns calls aws api "ListResourceRecordSets" too frequently #905

yuanlinios · 2019-02-19T10:04:37Z

I am piloting AWS EKS with external-DNS. There is 1 private hosted zone with around 700 pre-existing records.

It works OK functionally. However, in CloudTrail I can see that external-dns calls "ListHostedZones" once every minute, which I can understand, but it also issues 8 or 9 "ListResourceRecordSets" calls every minute.

Is this expected? This API call frequency is too high for me. Is it possible to increase the time interval?

spender0 · 2019-02-20T12:27:56Z

Same thing. v0.5.11
With interval=1m it does a sequence of API requests every 1 minute. The sequence mostly consists of ListResourceRecordSets requests.
The problem is that it sends them without any pause between requests and exceeds the AWS global Route 53 API rate limit, which is 5 requests per second.
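
To illustrate the kind of pause between requests that is missing, here is a minimal Python sketch of a pacer that spaces calls to stay under a fixed rate. It is not external-dns code (external-dns is written in Go); the `Pacer` class and its parameters are hypothetical, and the 5 requests per second figure is the limit quoted above.

```python
import time
from typing import Callable, Optional


class Pacer:
    """Spaces out calls so that at most `max_per_second` start per second.

    Hypothetical helper for illustration only; not part of external-dns.
    """

    def __init__(
        self,
        max_per_second: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if max_per_second <= 0:
            raise ValueError("max_per_second must be positive")
        self._min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed: Optional[float] = None

    def wait(self) -> None:
        """Block until the next request is allowed to start."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        # Reserve the slot for the request that is about to be sent.
        self._next_allowed = now + self._min_interval
```

Calling `pacer.wait()` before each list request would spread a burst of requests over time instead of sending them back to back. The clock and sleep functions are injectable so the behaviour can be checked without real delays.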

tewing-riffyn · 2019-02-26T02:43:32Z

I am experiencing a similar problem. Cloudtrail shows a high volume of external-DNS upserts. This is causing "rate exceeded" messages when I use other tools like Terraform.

I am running 8 kubernetes clusters within the same AWS account. Each cluster is running a separate instance of external-dns and is updating a private zone and a public zone.

njuettner · 2019-02-26T10:35:11Z

As long as you don't use any ProviderSpecific property other than target-health, it shouldn't behave differently than it does with < v0.5.9.

see: https://github.com/kubernetes-incubator/external-dns/blob/master/plan/plan.go#L188

We introduced a bug in v0.5.10 when we merged ProviderSpecific, however this should be fixed in the latest version.

One thing you could try out is to test if v0.5.9 has the same problems.

wallentx · 2019-02-27T19:51:49Z

Not seeing any issues with v0.5.9, but had problems with v0.5.11 and v0.5.10. Running with:
args:
- '--source=service'
- '--source=ingress'
- '--provider=aws'
- '--registry=txt'
- '--txt-owner-id=k8s-external-dns'

xanonid · 2019-03-05T14:17:08Z

Duplicate issue: #891

fraenkel · 2019-04-05T20:46:13Z

So I don't believe this is a duplicate issue but more an issue of how the code is currently structured and the interaction between Controller and Registry.
Let me walk through what I am talking about.

I am ignoring the cache because in the worst case it won't matter.

The Controller calls Registry.Records(), which calls provider.Records().
p.Records() calls p.Zones() and route53.ListResourceRecordSetsPages().
p.Zones() calls route53.ListHostedZonesPages.

So if we count thus far, we have made z (# of zone pages) + r (# of resourcerecord pages) calls.

Eventually the Controller will call Registry.ApplyChanges() which calls provider.ApplyChanges()
Now the problem ensues.
p.ApplyChanges does 3 calls to p.newChanges() and then calls p.submitChanges()
p.newChanges() calls p.Records() and calls p.newChange()
p.newChange() will call p.Zones() for Alias records.
p.submitChanges() will call p.Zones() and route53.ChangeResourceRecordSets()

For newChanges(), we have z + r + z calls.
For submitChanges(), we have z calls.

For a single pass, Records + ApplyChanges, we have (z + r) + 3 * (2z + r) + z = 8z + 4r calls.
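
The count above can be checked with a small simulation. This is a Python sketch that only mirrors the call chain described in this comment; it is not the external-dns code itself, and the class and function names are hypothetical. It models the worst case, where every newChange() triggers one more Zones() lookup per newChanges() call.

```python
from dataclasses import dataclass


@dataclass
class CallCounter:
    """Counts simulated Route 53 list calls for one controller pass."""

    zone_pages: int    # z: number of ListHostedZones pages
    record_pages: int  # r: number of ListResourceRecordSets pages
    calls: int = 0

    def zones(self) -> None:
        self.calls += self.zone_pages

    def records(self) -> None:
        # p.Records() calls p.Zones() and then lists the record sets.
        self.zones()
        self.calls += self.record_pages

    def new_changes(self) -> None:
        # p.newChanges() calls p.Records(), then p.newChange(), which
        # calls p.Zones() again for alias records (worst case).
        self.records()
        self.zones()

    def submit_changes(self) -> None:
        self.zones()

    def apply_changes(self) -> None:
        for _ in range(3):
            self.new_changes()
        self.submit_changes()


def calls_per_pass(z: int, r: int) -> int:
    """Total list calls for Records() followed by ApplyChanges()."""
    counter = CallCounter(zone_pages=z, record_pages=r)
    counter.records()        # Controller -> Registry.Records()
    counter.apply_changes()  # Controller -> Registry.ApplyChanges()
    return counter.calls
```

For any z and r, `calls_per_pass(z, r)` equals `8 * z + 4 * r`, so even a single zone page and a single record page cost 12 list calls per pass in this model.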

Just to prove I wasn't crazy, I ran the simple TestAWSApplyChanges and saw 4 zone calls and 3 record calls. It's not the worst case, but it's not good.

The ideal would be to do 1 call for both zones and resources. With minimal effort it should be possible to do 1 zone and 1 resource call just within ApplyChanges.
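
One way to get there is to fetch each list once per pass and reuse the result. The Python sketch below shows that idea only; it is an assumption about the general approach, not a description of how external-dns implements it, and `OncePerPass` is a hypothetical name.

```python
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class OncePerPass(Generic[T]):
    """Runs `fetch` at most once and reuses the result until reset.

    Hypothetical helper for illustration; create one per controller pass
    (or call reset() between passes) so stale data is not carried over.
    """

    def __init__(self, fetch: Callable[[], T]) -> None:
        self._fetch = fetch
        self._value: Optional[T] = None
        self._loaded = False

    def get(self) -> T:
        if not self._loaded:
            self._value = self._fetch()
            self._loaded = True
        return self._value  # type: ignore[return-value]

    def reset(self) -> None:
        """Forget the cached result so the next get() fetches again."""
        self._value = None
        self._loaded = False
```

If Zones() and Records() each read through a wrapper like this inside ApplyChanges, the repeated lookups collapse to one zone listing and one record listing, no matter how many helpers ask for them.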

tewing-riffyn · 2019-04-05T21:16:12Z

@fraenkel - good write-up. Yes, refactoring some of the duplicate calls would help greatly.

I solved my issue by setting zoneIdFilters within the helm values. External-dns was spending a lot of work evaluating other subdomain zones only to determine they weren't authoritative for the record it was manipulating. The problem was compounded because I have 8 clusters running in the same AWS account, each with its own external-dns controller.

If external-dns were more efficient about evaluating the zones I wouldn't need to do this.

fraenkel · 2019-04-05T21:38:34Z

I will put together a PR which I believe can reduce this to the bare minimum. Shouldn't take long.

fejta-bot · 2019-07-04T22:37:07Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

so0k · 2019-07-18T09:06:23Z

I think this can be closed?

DTTerastar · 2019-07-31T16:35:12Z

I am seeing rate throttling errors as well. It'd be nice if it retried in a delayed backoff loop to work around them.
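
A delayed backoff loop of the kind suggested here could look roughly like the Python sketch below. It uses exponential backoff with full jitter, which is one common choice and not something external-dns is confirmed to do. `ThrottledError` and `call_with_backoff` are hypothetical names, and the default delays are illustrative assumptions.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ThrottledError(Exception):
    """Hypothetical stand-in for a "rate exceeded" API error."""


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on ThrottledError with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        try:
            return fn()
        except ThrottledError:
            attempt += 1
            if attempt >= max_attempts:
                raise  # out of attempts: surface the throttling error
            # Upper bound doubles each retry, capped at max_delay;
            # the actual delay is random within that bound (full jitter).
            cap = min(max_delay, base_delay * (2 ** (attempt - 1)))
            sleep(random.uniform(0, cap))
```

The jitter matters when several controllers share one AWS account, because it keeps them from retrying in lockstep and hitting the limit again at the same moment.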

joeharrison714 · 2019-08-09T20:04:58Z

@njuettner I had this issue so I tested out v0.5.9 and it instantly worked. I don't think the bug you mentioned was fixed.

fejta-bot · 2019-09-08T20:39:37Z

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

fejta-bot · 2019-10-08T21:25:23Z

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

k8s-ci-robot · 2019-10-08T21:25:30Z

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

helgi · 2019-11-07T20:58:21Z

/reopen

k8s-ci-robot · 2019-11-07T20:58:28Z

@helgi: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

burdzwastaken · 2019-11-07T21:43:37Z

This fix in the latest release resolved this issue for us.

yuanlinios changed the title ~~external-dns calsl aws api "ListResourceRecordSets" too frequently~~ external-dns calls aws api "ListResourceRecordSets" too frequently Feb 19, 2019

fraenkel mentioned this issue Apr 6, 2019

Streamline AWS ApplyChanges #966

Merged

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 4, 2019

k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Sep 8, 2019

k8s-ci-robot closed this as completed Oct 8, 2019
