
GCP error 412: Precondition not met for 'entity.change.deletions[0]', conditionNotMet #467

Closed
errordeveloper opened this issue Feb 15, 2018 · 11 comments
Labels: kind/support, provider/google

Comments

@errordeveloper

I am seeing this in the logs:

"Error 412: Precondition not met for 'entity.change.deletions[0]', conditionNotMet"

I am trying to use a dirty zone. It worked well in some initial tests, but somehow I eventually started seeing this error. I am going to work around the problem by clearing my zone, but it would be good to understand what this error really means and why it works sometimes and not others.

errordeveloper (Author) commented Feb 15, 2018

I thought I should provide more background on my use-case.

I have a managed zone in GCP, it's called training.weave.works.

I spin up a few clusters, each called training-user-<N>, and each has one service that sets the following annotations:

external-dns.alpha.kubernetes.io/hostname: "training-user-<N>.training.weave.works"
external-dns.alpha.kubernetes.io/ttl: "5"

So for each cluster I have a DNS record that points at the service inside that cluster.
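For illustration, the service in each cluster is declared roughly like this (a sketch: the service name, selector and port are placeholders, not my exact manifest):

    apiVersion: v1
    kind: Service
    metadata:
      name: training-app
      annotations:
        external-dns.alpha.kubernetes.io/hostname: "training-user-<N>.training.weave.works"
        external-dns.alpha.kubernetes.io/ttl: "5"
    spec:
      type: LoadBalancer
      selector:
        app: training-app
      ports:
      - port: 80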

I have configured external-dns like this:

          - name: external-dns
            image: registry.opensource.zalan.do/teapot/external-dns:v0.4.8
            args:
            - --source=service
            - --source=ingress
            - --policy=upsert-only
            - --provider=google
            - --registry=txt
            - --txt-owner-id=dx-training-external-dns
            - --domain-filter=training.weave.works
            - --google-project=dx-training

I wonder whether I should try tweaking --policy, --domain-filter or --txt-owner-id to more specifically assign each controller to its own subset of records?
E.g., I suppose I could use --domain-filter="training-user-0.training.weave.works" and set the policy to delete anything under that, but should I then add a subdomain (like app.training-user-0.training.weave.works), or is that not essential and is it okay to make the filter this narrow?

Besides, it'd be good to understand why that error happens in the first place, because it didn't occur in my earlier tests.

linki (Member) commented Feb 22, 2018

Currently a single ExternalDNS instance is designed to manage a single Kubernetes cluster, similar to e.g. an autoscaler, ingress-controller etc.

Therefore, for each of your training clusters you'll want to deploy a dedicated ExternalDNS instance. In a simple world each cluster would have its own dedicated subdomain and you'd use --domain-filter so that every attempt to declare a DNS name outside of this domain is ignored. The whole subdomain would be managed by ExternalDNS, hence there'd be no conflicts.

If multiple clusters share the same DNS namespace, the different ExternalDNS instances need to coordinate with each other a bit. This is achieved in two ways:

  • --domain-filter, which instructs ExternalDNS to ignore desired DNS names that do not end in a particular suffix
  • --txt-owner-id, which is a view on a DNS domain that hides any existing DNS records that don't belong to this particular instance of ExternalDNS. The goal is that multiple ExternalDNS instances can happily sync their records in the very same DNS domain without removing each other's records. (A multi-tenant DNS zone where ExternalDNS is the tenant, if you will; an example ownership record is sketched right after this list.)
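For example, next to an A record for training-user-0.training.weave.works the TXT registry keeps a companion record that encodes the owner, roughly like this (only a sketch; the exact record name depends on registry settings such as --txt-prefix):

    training-user-0.training.weave.works.  TXT  "heritage=external-dns,external-dns/owner=training-user-0"

Only the instance whose --txt-owner-id matches the owner value will ever modify or delete that A record.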

What I would suggest:

  • For each cluster deploy a dedicated ExternalDNS instance in that cluster
  • For each instance use a different value for --txt-owner-id, such as training-user-<N>
  • For each instance use --domain-filter=training.weave.works (as you did) so that ExternalDNS ignores any annotations stating something else, e.g. bad.prod.weave.works.

With that setup users of cluster training-user-<N> could still create services with annotations that instruct its ExternalDNS instance to create, e.g. training-user-<N+1>.training.weave.works. However, the --txt-owner-id at least ensures that either cluster <N> or <N+1> would manage that record but never both.

If you want to ensure that even those cases are not possible you could use a different --domain-filter for each ExternalDNS instance. The domain filter is a simple suffix match, so you could use --domain-filter=-<N>.training.weave.works for cluster <N> and so on. Since this looks a little odd you may also consider giving each cluster a full subdomain so your domain filter looks more like --domain-filter=".cluster-<N>.training.weave.works".
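Putting these suggestions together, the container args for the ExternalDNS instance in cluster <N> could look something like this (a sketch based on your config above, using the stricter per-cluster filter; adjust as needed):

          - name: external-dns
            image: registry.opensource.zalan.do/teapot/external-dns:v0.4.8
            args:
            - --source=service
            - --source=ingress
            - --policy=upsert-only
            - --provider=google
            - --registry=txt
            - --txt-owner-id=training-user-<N>
            - --domain-filter=training-user-<N>.training.weave.works
            - --google-project=dx-training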

Finally, it looks to me like your clusters are short-lived, which raises the question of cleanup. If you just terminate your cluster, your DNS records will survive and will still be owned by that cluster's ExternalDNS instance, so you will never be able to reuse them in another cluster (they are claimed, and you just terminated the only instance that can unclaim them, short of manual cleanup of course).

Either delete all Services and Ingresses from your cluster and wait for ExternalDNS to do another synchronization before you terminate it, or manually delete all records that belong to this particular --txt-owner-id after you have terminated the cluster, to unclaim them.

Regarding the error: afaik, this precondition error occurs when you try to delete a DNS record that doesn't exist. I believe that multiple concurrent ExternalDNS instances make conflicting changes because they share the same --txt-owner-id but, since they manage different clusters, see different Services:

  • ExternalDNS instance <N> constantly creates training-user-<N> and drops training-user-<N+1>
  • ExternalDNS instance <N+1> constantly creates training-user-<N+1> and drops training-user-<N>

Using different values for --txt-owner-id solves that issue.

On a side note, you can also have DNS names generated automatically, without adding annotations, by using the --fqdn-template feature.
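A minimal sketch (the template value is only an example, not something from your setup):

            - --fqdn-template={{.Name}}.training.weave.works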

@errordeveloper What you are trying looks interesting. Please let us know about your progress. 😃

linki added the kind/support label Feb 22, 2018
errordeveloper (Author) commented Feb 23, 2018 via email

@dereulenspiegel (Contributor)

Unfortunately I am currently running into the same issue. I am using the current master as of today. external-dns had already created some A records from ingress resources. I then added an annotation to one of those ingress resources specifying its TTL, but external-dns is not able to update the records in Google Cloud DNS. I get the following log messages:

{"level":"info","msg":"Change zone: my-zone","time":"2018-03-06T14:33:25Z"}
{"level":"info","msg":"Del records: api.my.zone. A [37.137.52.2 35.90.152.2] 300","time":"2018-03-06T14:33:25Z"}
{"level":"info","msg":"Del records: external-dnsapi.my.zone. TXT [\"heritage=external-dns,external-dns/owner=external-dns,external-dns/resource=ingress/default/api-ingress\"] 300","time":"2018-03-06T14:33:25Z"}
{"level":"info","msg":"Add records: api.my.zone. A [37.137.52.2 35.90.152.2] 60","time":"2018-03-06T14:33:25Z"}
{"level":"info","msg":"Add records: external-dnsapi.my.zone. TXT [\"heritage=external-dns,external-dns/owner=external-dns,external-dns/resource=ingress/default/api-ingress\"] 300","time":"2018-03-06T14:33:25Z"}
{"level":"error","msg":"googleapi: Error 412: Precondition not met for 'entity.change.deletions[0]', conditionNotMet","time":"2018-03-06T14:33:26Z"}

My first thought was that it tried to specify the wrong (new) TTL when deleting the records, but as the info log shows, the TTL is correctly the old one.
Is this a known problem? And if not, any tips on how I can see the raw requests against the Cloud DNS API? At least a quick look didn't reveal an easy way to print the requests to the log.

@errordeveloper (Author)

I just realised that for me this error didn't appear until I tweaked the TTL down to the lowest possible value (5s, IIRC). Perhaps this is a more general issue to do with low TTLs? I also noticed that the TTL doesn't get applied to TXT records, which could be related, but I don't know.

@dereulenspiegel (Contributor)

I think the change in TTL is the problem, not the length of the TTL. The records were probably first created without the TTL annotation, and the annotation was then added later to modify the TTL. At least that is what I was doing.
After that, updates of the records are no longer possible. In fact all updates fail, because they are batched together in a single DNS change request, and since the delete portion fails, the creates are never executed either (which is probably very sane behaviour on the part of the Cloud DNS backend).
But right now I don't really have an explanation for why this is happening. Records you want to delete need to match the existing records exactly, and looking at the implementation this should be the case.

@clement-buchart

Just encountered this: the issue is that when deleting a record after a change of TTL via annotations, external-dns tries to delete a record with the newly specified TTL, so GCP can't find it and throws an error (since the existing record has the previous TTL).
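In other words: Cloud DNS requires every entry in a change's deletions list to match an existing record set exactly, TTL included. Sketching the changes.create body under this diagnosis (shown as YAML for readability; name and addresses borrowed from the logs earlier in this thread, where the zone actually holds the record with TTL 300):

    deletions:
    - name: api.my.zone.
      type: A
      ttl: 60                  # does not match the TTL of the record actually in the zone (300)
      rrdatas: ["37.137.52.2", "35.90.152.2"]
    additions:
    - name: api.my.zone.
      type: A
      ttl: 60
      rrdatas: ["37.137.52.2", "35.90.152.2"]

Because the deletions entry doesn't exactly match any existing record set, the precondition fails with the 412 and, since the change is applied atomically, the additions are never executed either.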

errordeveloper added a commit to errordeveloper/k9c that referenced this issue Apr 23, 2018
damomurf commented May 9, 2018

I'm seeing this issue even after completely cleaning out all A and TXT records and having external-dns recreate them. As soon as it has finished creating the new ones with my annotated 60s TTL, it fails again with the "Precondition not met" error and refuses to do anything more. I've had to remove the TTL annotations to move forward.

Evesy (Contributor) commented May 30, 2018

Just to confirm the above, I've experienced a similar issue with records not being cleaned up when the TTL has been specified via an annotation:

external-dns.alpha.kubernetes.io/ttl: "30"

After removing the service, the record it then tries to delete has a TTL of 300:

time="2018-05-30T21:14:25Z" level=info msg="Del records: record.mydomain.tld. A [10.193.96.17] 300"

Version: 0.5.1

Args:

        - --source=ingress
        - --source=service
        - --domain-filter=mydomain.tld
        - --provider=google
        - --policy=sync
        - --google-project=my-project
        - --registry=txt
        - --txt-owner-id=kubernetes
        - --log-level=debug

@damaestro

I can confirm that when I remove the external-dns.alpha.kubernetes.io/ttl annotation I'm not seeing this issue on external-dns-0.6.0 and external-dns-0.6.1.

When I change the TTL manually, this does not cause an issue until an update needs to happen.
When I set external-dns.alpha.kubernetes.io/ttl, the record is unable to be updated.
When I leave everything default, updates work correctly.

tclift mentioned this issue Jun 29, 2018
@ffilippopoulos

Same issue here. From what I can tell, even though the record is updated correctly in the first place, external-dns then tries to delete a record with the default TTL (300), which doesn't exist:

time="2018-07-31T12:10:18Z" level=info msg="Change zone: my-dns-zone"
time="2018-07-31T12:10:18Z" level=info msg="Del records: record.mydomain.tld. A [10.22.22.7] 300"
time="2018-07-31T12:10:18Z" level=info msg="Add records: record.mydomain.tld. A [10.22.22.7] 30"
time="2018-07-31T12:10:19Z" level=error msg="googleapi: Error 412: Precondition not met for 'entity.change.deletions[0]', conditionNotMet"

That shouldn't happen, since a record with the correct TTL is already there.
