
Some new TXT records are not being cleaned up, causing an "InvalidChangeBatch" error #3186

Open
born4new opened this issue Nov 23, 2022 · 22 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@born4new

What happened:

After deleting some ingress resources, it seems that the new TXT record is not cleaned up, while the other two DNS entries (the A record and the legacy TXT record) are. When searching for DNS records in AWS Route 53, this is what we see:

Searching for <our-dns-name>.

[]

Searching for a-<our-dns-name>.

    {
        "Name": "a-<our-dns-name>.",
        "Type": "TXT",
        "TTL": 300,
        "ResourceRecords": [
            {
                "Value": "\"heritage=external-dns,external-dns/owner=<our-owner-string>,external-dns/resource=ingress/<our-ingress>\""
            }
        ]
    }
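For reference, the ownership value in that leftover TXT record is a comma-separated list of key=value labels. A minimal sketch of parsing it for inspection (a hypothetical helper; not part of external-dns itself):

```python
def parse_ownership_txt(value):
    """Parse an external-dns ownership TXT value into a dict of labels.

    The record value is a quoted, comma-separated list of key=value pairs,
    e.g. "heritage=external-dns,external-dns/owner=...".
    """
    pairs = value.strip('"').split(",")
    return dict(pair.split("=", 1) for pair in pairs)


labels = parse_ownership_txt(
    '"heritage=external-dns,external-dns/owner=my-owner,'
    'external-dns/resource=ingress/my-ingress"'
)
# labels["external-dns/owner"] is the owner ID that external-dns matches
# against its --txt-owner-id flag when deciding whether it may touch
# the record.
```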

This later causes an issue when we redeploy the application, as external-dns tries to create all three DNS entries (the A record, the legacy TXT, and the new TXT):

time="2022-11-23T14:06:14Z" level=info msg="Desired change: CREATE a-<our-dns-name> TXT [Id: /hostedzone/<redacted>]"
time="2022-11-23T14:06:14Z" level=info msg="Desired change: CREATE <our-dns-name> A [Id: /hostedzone/<redacted>]"
time="2022-11-23T14:06:14Z" level=info msg="Desired change: CREATE <our-dns-name> TXT [Id: /hostedzone/<redacted>]"
time="2022-11-23T14:06:14Z" level=error msg="Failure in zone <our-dns-zone>. [Id: /hostedzone/<redacted>]"
time="2022-11-23T14:06:14Z" level=error msg="InvalidChangeBatch: [Tried to create resource record set [name='a-<our-dns-name>.', type='TXT'] but it already exists]\n\tstatus code: 400, request id: d65bc8e2-4055-4d9f-8412-4653debd76ff"

What you expected to happen:

The new TXT record should be cleaned up in the first place; alternatively, external-dns could replace the TXT record if it already exists, or offer an option to do so.
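Route 53's ChangeResourceRecordSets API already supports an UPSERT action, which creates the record if absent and replaces it otherwise. A sketch of what the "replace if it already exists" idea could look like as a change entry (hand-built here for illustration; the record name is a placeholder, and this is not external-dns's actual code path):

```python
def build_txt_upsert(name, value, ttl=300):
    """Build a Route 53 change entry that creates-or-replaces a TXT record.

    Using UPSERT instead of CREATE would avoid the InvalidChangeBatch
    error that CREATE raises when a leftover record already exists.
    """
    return {
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": name,
            "Type": "TXT",
            "TTL": ttl,
            "ResourceRecords": [{"Value": value}],
        },
    }


# The resulting dict has the shape boto3's route53 client accepts, e.g.:
#   client.change_resource_record_sets(
#       HostedZoneId="...",
#       ChangeBatch={"Changes": [build_txt_upsert(...)]},
#   )
change = build_txt_upsert("a-example.com.", '"heritage=external-dns"')
```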

How to reproduce it (as minimally and precisely as possible):

I do not know how to reproduce this issue easily, but I'm more than happy to provide as much debugging info as needed.

Anything else we need to know?:

N/A

Environment:

  • External-DNS version (use external-dns --version): 0.13.1
  • DNS provider: AWS
  • Others:
@born4new born4new added the kind/bug Categorizes issue or PR as related to a bug. label Nov 23, 2022
@rymai

rymai commented Nov 23, 2022

This definitely looks similar to #3007, #2421, and #2793.

@benjimin

@born4new does setting --aws-batch-change-size=1 resolve your problem? (i.e., is it purely the batching that is broken?)

@born4new
Author

born4new commented Dec 6, 2022

does setting --aws-batch-change-size=1 resolve your problem?

We haven't specifically tried a size of 1, but we have tried a few values (e.g. 20, 200, 1000), none of them helped.

The fix for us was to go back to an external-dns version below 0.12.0, so that external-dns wouldn't be aware of the newly introduced TXT record. This seems to indicate a problem in the way the new TXT records are cleaned up...

@JonathanLachapelle

We are facing the exact same issue.

@JonathanLachapelle

Does it happen on all records, or just sometimes?

@born4new
Author

Does it happen on all records, or just sometimes?

@JonathanLachapelle It was happening on some records only.

@xavidop

xavidop commented Dec 14, 2022

We faced the same issue today.
We are using AWS Route 53 and our external-dns version is 0.12.2:

{"level":"error","msg":"InvalidChangeBatch: [Tried to create resource record set [name='cname-runtime-api-dev-amy.development.voiceflow.com.', type='TXT'] but it already exists]\n\tstatus code: 400, request id: 4db33c47-f34f-4a36-8d60-b2cb0750578d","time":"2022-12-14T11:11:20Z"}

@ArturChe

I have faced the same issue after updating external-dns from version 0.12.0 to 0.13.1. Instead of syncing with the previously created TXT record graylog.<domain>, it tries to create cname-graylog.<domain>, which fails with the output below:

time="2022-12-14T11:49:54Z" level=error msg="InvalidChangeBatch: [The request contains an invalid set of changes for a resource record set 'TXT cname-graylog.<domain>.', The request contains an invalid set of changes for a resource record set 'TXT cname-mongodb.<domain>.', The request contains an invalid set of changes for a resource record set 'TXT cname-tcp.graylog.<domain>.']\n\tstatus code: 400, request id: <Id>"
time="2022-12-14T11:49:54Z" level=info msg="Desired change: CREATE cname-graylog.<domain> TXT [Id: /hostedzone/<hostedzone>]"
...

@IKohli09

IKohli09 commented Jan 26, 2023

I have faced the same issue.
I brought up a new cluster with external-dns chart version 6.12.1, which uses image 0.13.1, but it errors out with InvalidChangeBatch when trying to create the cname-<domain> entry.

Also, when I switch back to version 0.11.0, it keeps deleting and recreating the Route 53 records instead of updating them, even though I am using --upsert-policy.

Desired change: CREATE 123.dev.cloud A
Desired change: CREATE 123.dev.cloud TXT
Applying provider record filter for domains
Desired change: CREATE 123.dev.cloud A
Desired change: CREATE 123.dev.cloud TXT

It's a huge blocker.

@liad5h

liad5h commented Feb 1, 2023

We are experiencing the same issue with version 0.13.1 and Kubernetes 1.21 or higher.
In our case, when the issue happens, external-dns stops processing requests until we go to AWS and manually remove the leftovers.

logs:

time="2023-02-01T12:55:01Z" level=error msg="Failure in zone qa.controlup.com. [Id: /hostedzone/XXXXXXXXXX]"
time="2023-02-01T12:55:01Z" level=error msg="InvalidChangeBatch: [Tried to create resource record set [name='cname-x.com.', type='TXT'] but it already exists]\n\tstatus code: 400, request id: 8b8e55e1-efe0-452d-96da-af65ff122fca"
time="2023-02-01T12:55:01Z" level=error msg="failed to submit all changes for the following zones: [/hostedzone/XXXXXXXXXX]"

@msvticket

I'm having the same problem with version 0.13.1 and --aws-batch-change-size=100. I tried --aws-batch-change-size=1 and started to get warnings like

time="2023-02-08T15:32:34Z" level=warning msg="Total changes for xxx.yyy.zzz exceeds max batch size of 1, total changes: 2"

and the errors as described above kept coming.

So I tried --aws-batch-change-size=2 and that has actually resolved the problem for me.

@jbilliau-rcd

Same problem as well. I wish there were a "force-overwrite" option to tell external-dns to simply overwrite records; we have multiple clusters that hit this error and are seemingly stuck. The worst part is that valid, new ingresses never have their DNS records created, since they get batched up with these bogus retries.

@martinohmann

martinohmann commented Feb 16, 2023

We're facing the same issue with v0.13.2 and the suggested batch size changes do not work:

  • With --aws-batch-change-size=1: it tries to create the already existing TXT record, which fails. It does not even attempt to create the A record, presumably because the first batch change within the sync interval failed. This does not resolve itself eventually and continues like this in every sync interval.
  • With --aws-batch-change-size=2: it tries to create the A record and the already existing TXT record in a batch and this fails. Same behaviour as above, it's stuck.

The only option we have is to either manually create the A record, or to delete the existing TXT records so that external-dns can properly recreate everything.
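The manual cleanup described above can be approximated by scanning the zone for new-format ownership TXT records whose target record is gone. A sketch operating on record sets already fetched from Route 53 (the function name and the "cname-" prefix are illustrative assumptions; the actual prefix depends on the record type and any --txt-prefix setting):

```python
def find_orphaned_txt(records, prefix="cname-"):
    """Return prefixed ownership TXT records whose target record is gone.

    `records` is a list of Route 53 ResourceRecordSet-style dicts. A TXT
    record named `<prefix><name>` is considered orphaned when no non-TXT
    record exists for `<name>` itself, so it is a candidate for deletion
    before letting external-dns recreate everything.
    """
    live_names = {r["Name"] for r in records if r["Type"] != "TXT"}
    return [
        r
        for r in records
        if r["Type"] == "TXT"
        and r["Name"].startswith(prefix)
        and r["Name"][len(prefix):] not in live_names
    ]


zone = [
    {"Name": "cname-host.example.com.", "Type": "TXT"},  # leftover ownership record
    {"Name": "host.example.com.", "Type": "TXT"},        # legacy ownership record
]
# No A/CNAME record for host.example.com. remains, so the new-format
# TXT record is reported as orphaned.
orphans = find_orphaned_txt(zone)
```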

The expected behaviour would be to not attempt to create the TXT records again (if anything, it should upsert existing records).

Update: from what I can see, there's already a change in master which might partially fix this (7dd84a5), but it's still unreleased.

@cyril94440

Same problem here...

@Kulagin-G

Kulagin-G commented Jun 8, 2023

The same problem occurs after updating external-dns from 0.10.2 to 0.13.4.

Some details about the environment:

  1. Provider: aws
  2. EKS: 1.24.0

Details about the issue:

At the start we have 3 records:

  • (A) - alias for the LB, host.example.com
  • (TXT) - old-style TXT for backward compatibility, host.example.com
  • (TXT) - new-style TXT, cname-host.example.com
  1. Test - Removing the new-style TXT cname-host.example.com
    Result: looks OK, the record was restored.
    time="2023-06-08T13:05:04Z" level=info msg="Desired change: CREATE cname-host.example.com. TXT [Id: /hostedzone/xxx]"

  2. Test - Removing the old-style TXT host.example.com
    Result: looks OK, the record was restored.
    time="2023-06-08T13:07:05Z" level=debug msg="Adding host.example.com. [Id: /hostedzone/xxx]"

  3. Test - Removing both the old-style TXT and the new-style TXT
    Result: the records were not restored, and there were no errors or attempts in the logs.

  4. Test - Removing the alias host.example.com and both TXT records
    Result: OK, all 3 records were restored.
    time="2023-06-08T13:18:18Z" level=debug msg="Adding host.example.com. to zone xxx. [Id: /hostedzone/xxx]"
    time="2023-06-08T13:18:18Z" level=debug msg="Adding host.example.com. to zone xxx. [Id: /hostedzone/xxx]"
    time="2023-06-08T13:18:18Z" level=debug msg="Adding cname-host.example.com. to zone xxx. [Id: /hostedzone/Z010946512D3RO332W8MB]"
    time="2023-06-08T13:18:19Z" level=info msg="Desired change: CREATE host.example.com TXT [Id: /hostedzone/xxx]"
    time="2023-06-08T13:18:19Z" level=info msg="Desired change: CREATE host.example.com A [Id: /hostedzone/xxx]"
    time="2023-06-08T13:18:19Z" level=info msg="Desired change: CREATE cname-host.example.com TXT [Id: /hostedzone/xxx]"

  5. Test - Removing the alias host.example.com only
    Result: failure, the alias was not restored.
    time="2023-06-08T13:22:23Z" level=error msg="Failure in zone xxx. [Id: /hostedzone/Z010946512D3RO332W8MB] when submitting change batch: InvalidChangeBatch: [Tried to create resource record set [name='cname-host.example.com.', type='TXT'] but it already exists, Tried to create resource record set [name='host.example.com.', type='TXT'] but it already exists]\n\tstatus code: 400, request id: xxx"
    time="2023-06-08T13:22:24Z" level=error msg="failed to submit all changes for the following zones: [/hostedzone/xxx]"

I guess a force override wouldn't lead to "Rate exceeded" issues with the AWS API, because losing an alias record is a very rare case, for us at least.
Still, the current behavior is quite uncomfortable and unexpected; I want to be 100% sure that all our records will be restored automatically if anything goes wrong.

Additionally, it's odd that I don't see any logs at all for test 3.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 22, 2024
@ddieulivol

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 22, 2024
@k8s-triage-robot


/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 21, 2024
@CameronMackenzie99

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 2, 2024
@rookiehelm

I'm seeing this issue when installing v0.14.1 on a brand new EKS 1.25.

@sileyang-sf

Same issue happened in our EKS cluster in version 1.26.

@rookiehelm

rookiehelm commented May 14, 2024

Hi guys, I was able to resolve my errors. A couple of pointers that helped:

  • First, the external-dns repo has various branches tagged with release versions, but the release versions don't correspond directly to the image versions hosted on GCR.
  • My issue was resolved after I used the following image: registry.k8s.io/external-dns/external-dns:v0.14.1. Also follow the instructions from the branch tagged v0.14.1 (not master or some other branch).
  • In my case, my cluster was set up using Terraform scripts, as I needed to deploy Kubeflow. I had accidentally configured the IRSA using the 'eksctl' command, which was incorrect. The docs suggest creating the service account directly via 'kubectl'. Be careful here: I had to manually delete the previous SA and re-create it using the right commands. After that, everything worked fine.
  • I also needed to configure the 'ingress-nginx' controller first (not later), as 'external-dns' needs to work with the load balancer when creating the records (correct me if I'm wrong here).
