
GCP error 412: Precondition not met for 'entity.change.deletions[0]', conditionNotMet #467

Closed
errordeveloper opened this issue Feb 15, 2018 · 11 comments
Labels: kind/support, provider/google

Comments

@errordeveloper

I am seeing this in the logs:

"Error 412: Precondition not met for 'entity.change.deletions[0]', conditionNotMet"

I am trying to use a dirty zone. It worked well in some initial tests, but somehow I eventually started seeing this error. I am going to work around the problem by clearing my zone, but it would be good to understand what this error really means and why it works sometimes and not others.

errordeveloper (Author) commented Feb 15, 2018

I thought I should provide more background on my use-case.

I have a managed zone in GCP, it's called training.weave.works.

I spin up a few clusters, each called training-user-<N>, and each has one service that sets the following annotations:

external-dns.alpha.kubernetes.io/hostname: "training-user-<N>.training.weave.works"
external-dns.alpha.kubernetes.io/ttl: "5"

So for each cluster I have a DNS record that points at the service inside that cluster.
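For illustration, the service in each cluster is declared roughly like this (a sketch: the service name, selector and port are placeholders, not my exact manifest):

    apiVersion: v1
    kind: Service
    metadata:
      name: training-app
      annotations:
        external-dns.alpha.kubernetes.io/hostname: "training-user-<N>.training.weave.works"
        external-dns.alpha.kubernetes.io/ttl: "5"
    spec:
      type: LoadBalancer
      selector:
        app: training-app
      ports:
      - port: 80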

I have configured external-dns like this:

          - name: external-dns
            image: registry.opensource.zalan.do/teapot/external-dns:v0.4.8
            args:
            - --source=service
            - --source=ingress
            - --policy=upsert-only
            - --provider=google
            - --registry=txt
            - --txt-owner-id=dx-training-external-dns
            - --domain-filter=training.weave.works
            - --google-project=dx-training

I wonder whether I should try tweaking --policy, --domain-filter or --txt-owner-id to more specifically assign each controller to its own subset of records?
E.g., I suppose I could use --domain-filter="training-user-0.training.weave.works" and set the policy to delete anything under that, but should I then add a subdomain (like app.training-user-0.training.weave.works), or is that not essential and is it okay to make the filter this narrow?

Besides, it'd be good to understand why that error happens in the first place, because it didn't occur in my earlier tests.

linki (Member) commented Feb 22, 2018

Currently a single ExternalDNS instance is designed to manage a single Kubernetes cluster, similar to e.g. an autoscaler, ingress-controller etc.

Therefore, for each of your training clusters you'll want to deploy a dedicated ExternalDNS instance. In a simple world each cluster would have its own dedicated subdomain and you'd use --domain-filter so that every attempt to declare a DNS name outside of this domain is ignored. The whole subdomain would be managed by ExternalDNS, hence there'd be no conflicts.

If multiple clusters share the same DNS namespace, the different ExternalDNS instances need to coordinate with each other a bit. This is achieved in two ways:

  • --domain-filter, which instructs ExternalDNS to ignore desired DNS names that do not end in a particular suffix
  • --txt-owner-id, which is a view on a DNS domain that hides any existing DNS records that don't belong to this particular instance of ExternalDNS. The goal is that multiple ExternalDNS instances can happily sync their records in the very same DNS domain without removing each other's records. (A multi-tenant DNS zone where ExternalDNS is the tenant, if you will; an example ownership record is sketched right after this list.)
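For example, next to an A record for training-user-0.training.weave.works the TXT registry keeps a companion record that encodes the owner, roughly like this (only a sketch; the exact record name depends on registry settings such as --txt-prefix):

    training-user-0.training.weave.works.  TXT  "heritage=external-dns,external-dns/owner=training-user-0"

Only the instance whose --txt-owner-id matches the owner value will ever modify or delete that A record.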

What I would suggest:

  • For each cluster deploy a dedicated ExternalDNS instance in that cluster
  • For each instance use a different value for --txt-owner-id, such as training-user-<N>
  • For each instance use --domain-filter=training.weave.works (as you did) so that ExternalDNS ignores any annotations stating something else, e.g. bad.prod.weave.works.

With that setup users of cluster training-user-<N> could still create services with annotations that instruct its ExternalDNS instance to create, e.g. training-user-<N+1>.training.weave.works. However, the --txt-owner-id at least ensures that either cluster <N> or <N+1> would manage that record but never both.

If you want to ensure that even those cases are not possible you could use a different --domain-filter for each ExternalDNS instance. The domain filter is a simple suffix match, so you could use --domain-filter=-<N>.training.weave.works for cluster <N> and so on. Since this looks a little odd you may also consider giving each cluster a full subdomain so your domain filter looks more like --domain-filter=".cluster-<N>.training.weave.works".
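Putting these suggestions together, the container args for the ExternalDNS instance in cluster <N> could look something like this (a sketch based on your config above, using the stricter per-cluster filter; adjust as needed):

          - name: external-dns
            image: registry.opensource.zalan.do/teapot/external-dns:v0.4.8
            args:
            - --source=service
            - --source=ingress
            - --policy=upsert-only
            - --provider=google
            - --registry=txt
            - --txt-owner-id=training-user-<N>
            - --domain-filter=training-user-<N>.training.weave.works
            - --google-project=dx-training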

Finally, it looks to me like your clusters are short-lived, which raises the question of cleanup. If you just terminate your cluster, your DNS records will survive and will still be owned by that cluster's ExternalDNS instance, so you will never be able to reuse them in another cluster (they are claimed, and you just terminated the only instance that can unclaim them, short of manual cleanup of course).

Either delete all Services and Ingresses from your cluster and wait for ExternalDNS to do another synchronization before you terminate it, or manually delete all records that belong to this particular --txt-owner-id after you have terminated the cluster, to unclaim them.

Regarding the error: afaik, this precondition error occurs when you try to delete a DNS record that doesn't exist. I believe that multiple concurrent ExternalDNS instances make conflicting changes because they share the same --txt-owner-id but, since they manage different clusters, see different Services:

  • ExternalDNS instance <N> constantly creates training-user-<N> and drops training-user-<N+1>
  • ExternalDNS instance <N+1> constantly creates training-user-<N+1> and drops training-user-<N>

Using different values for --txt-owner-id solves that issue.

On a side note, you can also have DNS names generated automatically, without adding annotations, by using the --fqdn-template feature.
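A minimal sketch (the template value is only an example, not something from your setup):

            - --fqdn-template={{.Name}}.training.weave.works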

@errordeveloper What you are trying looks interesting. Please let us know about your progress. 😃

linki added the kind/support label Feb 22, 2018
errordeveloper (Author) commented Feb 23, 2018 via email

@dereulenspiegel (Contributor)

Unfortunately I am currently running into the same issue. I am using the current master as of today. external-dns had already created some A records from ingress resources. I then added an annotation to one of those ingress resources specifying its TTL, but external-dns is not able to update the records in Google Cloud DNS. I get the following log messages:

{"level":"info","msg":"Change zone: my-zone","time":"2018-03-06T14:33:25Z"}
{"level":"info","msg":"Del records: api.my.zone. A [37.137.52.2 35.90.152.2] 300","time":"2018-03-06T14:33:25Z"}
{"level":"info","msg":"Del records: external-dnsapi.my.zone. TXT [\"heritage=external-dns,external-dns/owner=external-dns,external-dns/resource=ingress/default/api-ingress\"] 300","time":"2018-03-06T14:33:25Z"}
{"level":"info","msg":"Add records: api.my.zone. A [37.137.52.2 35.90.152.2] 60","time":"2018-03-06T14:33:25Z"}
{"level":"info","msg":"Add records: external-dnsapi.my.zone. TXT [\"heritage=external-dns,external-dns/owner=external-dns,external-dns/resource=ingress/default/api-ingress\"] 300","time":"2018-03-06T14:33:25Z"}
{"level":"error","msg":"googleapi: Error 412: Precondition not met for 'entity.change.deletions[0]', conditionNotMet","time":"2018-03-06T14:33:26Z"}

My first thought was that it tried to specify the wrong (new) TTL when deleting the records, but as the info log shows, the TTL is correctly the old one.
Is this a known problem? And if not, any tips on how I can see the raw requests against the Cloud DNS API? At least a quick look didn't reveal an easy way to print the requests to the log.

@errordeveloper (Author)

I just realised that for me this error didn't appear until I tweaked the TTL down to the lowest possible value (5s, IIRC). Perhaps this is a more general issue to do with low TTLs? I also noticed that the TTL doesn't get applied to TXT records, which could be related, but I don't know.

@dereulenspiegel (Contributor)

I think the change in TTL is the problem, not the length of the TTL. The records were probably first created without the TTL annotation, and the annotation was then added later to modify the TTL. At least that is what I was doing.
After that, updates of the records are no longer possible. In fact all updates fail, because they are batched together in a single DNS change request, and since the delete portion fails, the creates are never executed either (which is probably very sane behaviour on the part of the Cloud DNS backend).
But right now I don't really have an explanation for why this is happening. Records you want to delete need to match the existing records exactly, and looking at the implementation this should be the case.

@clement-buchart

Just encountered this: the issue is that when deleting a record after a change of TTL via annotations, external-dns tries to delete a record with the newly specified TTL, so GCP can't find it and throws an error (since the existing record has the previous TTL).
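In other words: Cloud DNS requires every entry in a change's deletions list to match an existing record set exactly, TTL included. Sketching the changes.create body under this diagnosis (shown as YAML for readability; name and addresses borrowed from the logs earlier in this thread, where the zone actually holds the record with TTL 300):

    deletions:
    - name: api.my.zone.
      type: A
      ttl: 60                  # does not match the TTL of the record actually in the zone (300)
      rrdatas: ["37.137.52.2", "35.90.152.2"]
    additions:
    - name: api.my.zone.
      type: A
      ttl: 60
      rrdatas: ["37.137.52.2", "35.90.152.2"]

Because the deletions entry doesn't exactly match any existing record set, the precondition fails with the 412 and, since the change is applied atomically, the additions are never executed either.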

errordeveloper added a commit to errordeveloper/k9c that referenced this issue Apr 23, 2018
damomurf commented May 9, 2018

I'm seeing this issue even after completely cleaning out all A and TXT records and having external-dns recreate them. As soon as it has finished creating the new ones with my annotated 60s TTL, it fails again with the "Precondition not met" error and refuses to do anything more. I've had to remove the TTL annotations to move forward.

Evesy (Contributor) commented May 30, 2018

Just to confirm the above, I've experienced a similar issue with records not being cleaned up when the TTL has been specified via an annotation:

external-dns.alpha.kubernetes.io/ttl: "30"

After removing the service, the record it then tries to delete has a TTL of 300:

time="2018-05-30T21:14:25Z" level=info msg="Del records: record.mydomain.tld. A [10.193.96.17] 300"

Version: 0.5.1

Args:

        - --source=ingress
        - --source=service
        - --domain-filter=mydomain.tld
        - --provider=google
        - --policy=sync
        - --google-project=my-project
        - --registry=txt
        - --txt-owner-id=kubernetes
        - --log-level=debug

@damaestro

I can confirm that when I remove the external-dns.alpha.kubernetes.io/ttl annotation I'm not seeing this issue on external-dns-0.6.0 and external-dns-0.6.1.

When I change the TTL manually, this does not cause an issue until an update needs to happen.
When I set external-dns.alpha.kubernetes.io/ttl, the record is unable to be updated.
When I leave everything default, updates work correctly.

tclift mentioned this issue Jun 29, 2018
@ffilippopoulos

Same issue here. From what I can tell, even though the record is updated correctly in the first place, external-dns then tries to delete a record with the default TTL (300), which doesn't exist:

time="2018-07-31T12:10:18Z" level=info msg="Change zone: my-dns-zone"
time="2018-07-31T12:10:18Z" level=info msg="Del records: record.mydomain.tld. A [10.22.22.7] 300"
time="2018-07-31T12:10:18Z" level=info msg="Add records: record.mydomain.tld. A [10.22.22.7] 30"
time="2018-07-31T12:10:19Z" level=error msg="googleapi: Error 412: Precondition not met for 'entity.change.deletions[0]', conditionNotMet"

That shouldn't happen, since a record with the correct TTL is already there.
