Cert-manager can't find GoogleCloud subdomain. #1507

Closed
m22r opened this issue Mar 28, 2019 · 14 comments
Labels
area/acme/dns01 Indicates a PR modifies ACME DNS01 provider code triage/support Indicates an issue that is a support question.

Comments

@m22r

m22r commented Mar 28, 2019

Describe the bug:
Cert-manager could not find GoogleCloud subdomain.

I have a zone a.foobar.com, which is managed in Cloud DNS.
I want to create SSL certificates for x.a.foobar.com and y.a.foobar.com,
but cert-manager attempts to find the domain foobar.com instead.

Expected behaviour:
Cert-manager should attempt to find the domain a.foobar.com.

Steps to reproduce the bug:

kubectl logs cert-manager-54f65df574-mvmmf --namespace=cert-manager

E0328 04:39:16.988310       1 controller.go:208] challenges controller: Re-queuing item "default/dev-superset-tls-591878805-1" due to error processing: No matching GoogleCloud domain found for domain foobar.com.
E0328 04:39:17.062437       1 controller.go:208] challenges controller: Re-queuing item "default/dev-superset-tls-591878805-0" due to error processing: No matching GoogleCloud domain found for domain foobar.com.
I0328 05:09:16.988625       1 controller.go:206] challenges controller: syncing item 'default/dev-superset-tls-591878805-1'
I0328 05:09:16.988826       1 logger.go:103] Calling Discover
I0328 05:09:17.062725       1 controller.go:206] challenges controller: syncing item 'default/dev-superset-tls-591878805-0'
I0328 05:09:17.062857       1 logger.go:103] Calling Discover
I0328 05:09:17.178510       1 dns.go:89] Presenting DNS01 challenge for domain "x.a.foobar.com"
I0328 05:09:17.181176       1 dns.go:89] Presenting DNS01 challenge for domain "y.a.foobar.com"
E0328 05:09:18.445717       1 controller.go:208] challenges controller: Re-queuing item "default/dev-superset-tls-591878805-1" due to error processing: No matching GoogleCloud domain found for domain foobar.com.
E0328 05:09:18.506470       1 controller.go:208] challenges controller: Re-queuing item "default/dev-superset-tls-591878805-0" due to error processing: No matching GoogleCloud domain found for domain foobar.com.

Anything else we need to know?:
foobar.com. is managed in route53
a.foobar.com is managed in clouddns

Environment details:

  • Kubernetes version (e.g. v1.10.2): v1.11.7-gke.12
  • Cloud-provider/provisioner (e.g. GKE, kops AWS, etc): GKE
  • cert-manager version (e.g. v0.4.0): v0.7
  • Install method (e.g. helm or static manifests): static manifests

/kind bug

@jetstack-bot jetstack-bot added the kind/bug Categorizes issue or PR as related to a bug. label Mar 28, 2019
@m22r
Author

m22r commented Mar 28, 2019

I don't know why, but I can get a wildcard certificate for *.a.foobar.com and a.foobar.com.

@smoll

smoll commented Apr 11, 2019

Same issue here, for a Cloud DNS subdomain delegated to another Cloud DNS zone. I am able to e.g. dig an A record in the subdomain so the delegate NS record is definitely set up properly.

This appears to be the same issue described at the bottom of #728 - is this a regression? I would try v0.4.1 but it doesn't appear to be hosted on the jetstack helm repo any more:

$ helm search -l jetstack/cert-manager
NAME                 	CHART VERSION 	APP VERSION   	DESCRIPTION
jetstack/cert-manager	v0.7.0        	v0.7.0        	A Helm chart for cert-manager
jetstack/cert-manager	v0.7.0-beta.0 	v0.7.0-beta.0 	A Helm chart for cert-manager
jetstack/cert-manager	v0.7.0-alpha.1	v0.7.0-alpha.0	A Helm chart for cert-manager
jetstack/cert-manager	v0.6.0        	v0.6.0        	A Helm chart for cert-manager
jetstack/cert-manager	v0.5.2        	v0.5.2        	A Helm chart for cert-manager

Edit: perhaps it took some time for the records to propagate, but it looks like it's working for me now. Not sure what else it could be, the only other thing I changed was adding the following flags:
--dns01-recursive-nameservers=8.8.8.8:53,8.8.4.4:53 --dns01-recursive-nameservers-only=true because I'm on a split-horizon DNS setup. Now my other issue is that cert-manager tries to update the wrong zone ID, but that's an unrelated problem...

@wbaumann

wbaumann commented May 5, 2019

I also encountered the same issue when using a Cloud DNS domain that delegates to another Cloud DNS subdomain. In my case, I was able to get everything working with only this flag: --set extraArgs={--dns01-recursive-nameservers-only=true}. Here's an example of the complete helm command:

helm upgrade --install \
  --wait \
  --version v0.7.2 \
  --set extraArgs={--dns01-recursive-nameservers-only=true} \
  --namespace cert-manager \
  "cert-manager" \
  jetstack/cert-manager

Hope this helps anyone else that encounters this!

@justingrayston

Interestingly, this worked fine in the first project I used it in, project-1.example.com. Then, when I used the same parent domain for a second project, project-2.example.com, I got this error in project 2.

I am still debugging and making sure I haven't missed something. I can't think why this would change anything.

@justingrayston

I need to try and recreate this, but deleting the zone in the other project and creating a new subdomain (to get around any DNS caching) worked as expected.

@rjanovski

rjanovski commented Aug 4, 2019

I have the same issue with route53:

  • foobar.com. is managed elsewhere (not by me)
  • a.foobar.com is managed in my route53

cert-manager's log:

I0804 10:01:26.673949       1 dns.go:101] Presenting DNS01 challenge for domain "a.foobar.com"
E0804 10:01:26.763464       1 controller.go:215] cert-manager/controller/challenges "msg"="re-queuing item  due to error processing" "error"="Failed to determine Route 53 hosted zone ID: Zone foobar.com. not found in Route 53 for domain _acme-challenge.a.foobar.com." "key"="prod/a-foobar-com-cert-3307869370-0"

challenge status:

Status:
  Presented:   false
  Processing:  true
  Reason:      Failed to determine Route 53 hosted zone ID: Zone foobar.com. not found in Route 53 for domain _acme-challenge.a.foobar.com.
  State:       pending
Events:
  Type     Reason        Age                  From          Message
  ----     ------        ----                 ----          -------
  Normal   Started       3m16s                cert-manager  Challenge scheduled for processing
  Warning  PresentError  50s (x6 over 3m16s)  cert-manager  Error presenting challenge: Failed to determine Route 53 hosted zone ID: Zone foobar.com. not found in Route 53 for domain _acme-challenge.a.foobar.com.

cert-manager v0.8.1

@Freyert
Contributor

Freyert commented Oct 30, 2019

So cert-manager doesn't take your FQDN blindly and try to manage the zone for you with CloudDNS. What it does is it takes your FQDN and then searches from left to right for a SOA record.

So if you have a root domain cars.com hosted in Route53, with an NS record pointing to cool.cars.com hosted in CloudDNS, both have SOA records. Cert manager is searching for the SOA of cool.cars.com first, but for some reason cert-manager skips it and sees cars.com SOA and then stops. This definitely worked in the past from what I've seen.

It's this block of code which causes the issues:

https://github.com/jetstack/cert-manager/blame/79711c5e3454b846fd661ecc2b5788a8efb7a920/pkg/issuer/acme/dns/util/wait.go#L313-L349

  1. It splits your domain.
  2. It searches through the domains left to right.
  3. Tries to find an SOA record.
  4. If an SOA record appears try to find the managed zone and if we see it everything is OK.
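
The four steps above can be sketched roughly as follows. This is a minimal Python illustration of the idea, not cert-manager's actual Go code: candidate_zones, find_zone and the has_soa callback are hypothetical names, and the DNS lookup is injected as a plain function so the sketch runs without network access.

```python
from typing import Callable, List, Optional


def candidate_zones(fqdn: str) -> List[str]:
    """Return successively shorter suffixes of fqdn, most specific first."""
    labels = fqdn.rstrip(".").split(".")
    return [".".join(labels[i:]) + "." for i in range(len(labels))]


def find_zone(fqdn: str, has_soa: Callable[[str], bool]) -> Optional[str]:
    """Walk the name left to right and return the first suffix with an SOA."""
    for zone in candidate_zones(fqdn):
        if has_soa(zone):
            return zone
    return None


# Hypothetical resolver answers: only these names return an SOA record.
soa_names = {"cool.cars.com.", "cars.com."}
print(find_zone("_acme-challenge.red.cool.cars.com", soa_names.__contains__))
# prints cool.cars.com.
```

If the lookup for cool.cars.com. does not return an SOA (for example because the delegation is broken or not yet visible), the walk in this sketch falls through to cars.com., which would match the symptom reported in the logs above.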

So why does this code skip the first SOA? Well in our case it was because we had multiple zones we were delegating and I put my NS record of interest in the wrong one.

For example, we have zones cars.com, cool.cars.com, and really.cool.cars.com. cool.cars.com has an NS record in cars.com which makes it an authority for cool.cars.com. I incorrectly put the NS record for really.cool.cars.com in cars.com. This creates a conflict because:

  • cars.com is an SOA for really.cool.cars.com
  • cool.cars.com is an SOA for cool.cars.com

So when we were querying for red.really.cool.cars.com we would sometimes get an SOA record starting at cool.cars.com (which didn't have our NS record to really.cool.cars.com) and sometimes we would get the SOA for really.cool.cars.com correctly.

The way to correct this was to move the really.cool.cars.com NS record from cars.com to cool.cars.com.

cars.com -> NS -> cool.cars.com -> really.cool.cars.com

when before we had

cars.com -> NS -> really.cool.cars.com
                |-> NS -> cool.cars.com

The way to debug this is to just dig really.cool.cars.com several times and see if you get the same SOA record.
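
That repeated check can be automated with a small script. The following is a sketch under assumptions: it shells out to the dig binary (assumed to be installed) with the standard `+short SOA` query, and the helper names dig_soa and distinct_soa_answers are made up for illustration. The query function is a parameter so it can be swapped for another resolver.

```python
import subprocess
from typing import Callable, Set


def dig_soa(name: str) -> str:
    """Ask the system's dig for the SOA of name; return the short answer."""
    result = subprocess.run(
        ["dig", "+short", "SOA", name],
        capture_output=True, text=True, check=False,
    )
    return result.stdout.strip()


def distinct_soa_answers(name: str, attempts: int = 5,
                         query: Callable[[str], str] = dig_soa) -> Set[str]:
    """Query name several times and collect the distinct SOA answers seen."""
    return {query(name) for _ in range(attempts)}


# Example (needs network access and dig):
# answers = distinct_soa_answers("really.cool.cars.com")
# if len(answers) > 1:
#     print("inconsistent SOA answers:", answers)
```

More than one distinct answer across attempts suggests that different nameservers disagree about who is authoritative for the zone. Note that `dig +short SOA` only prints a record when the queried name is itself a zone apex, so query the zone name rather than a host inside it.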

@justingrayston

Just on my 403 issue, there is an internal tracking bug at Google; I am just trying to get the exact replication steps. It seems that you can only use a service account once: if you use the same service account but with a new key, the Cloud DNS API won't accept the call. If you give the SA a different name, or clear all its keys, it seems to work.

I can check your code in a bit, but if you could let me know which version of the API you are using that would be a great help.

@arianitu

I'm running into this as well, but trying to generate a wildcard domain:

So I have two zones in Google Cloud:

stage.mydomain.com
mydomain.com

I can generate a wildcard fine for:

*.mydomain.com

but it fails for:

*.stage.mydomain.com

I think it's adding the entry in the wrong zone or something? Is there a way to fix this?

@arianitu

Adding the TXT record to the correct zone generates the cert successfully, but I had to move it manually.

Maybe I don't need two zones for the same domain? I have a zone for the domain and then also a zone for the subdomain. Maybe I need 1 zone for both the domain and the subdomain.

@moriyoshi

moriyoshi commented Jan 8, 2020

https://github.com/jetstack/cert-manager/blob/master/pkg/issuer/acme/dns/util/wait.go#L306

If CertManager were to do the traversal on the zones which are in a premature state, it might well resolve to a wrong FQDN and would cache it until the process terminates. So when you got the strange "No matching GoogleCloud domain found for domain", try deleting the cert-manager pod and waiting for it to respawn.

IMO, the cache implementation should have got a TTL for each entry.
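
To illustrate the suggestion, here is a minimal sketch of a per-entry TTL cache for discovered zones. It is not cert-manager's implementation; the class name TTLCache, its methods, and the injectable clock are all assumptions made for the example.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Maps an FQDN to its discovered zone, forgetting entries after ttl seconds."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}

    def get(self, fqdn: str) -> Optional[str]:
        entry = self._entries.get(fqdn)
        if entry is None:
            return None
        zone, stored_at = entry
        if self._clock() - stored_at >= self._ttl:
            # Entry is stale: drop it so the caller re-runs zone discovery.
            del self._entries[fqdn]
            return None
        return zone

    def put(self, fqdn: str, zone: str) -> None:
        self._entries[fqdn] = (zone, self._clock())
```

With a cache like this, a zone resolved while the delegation was still incomplete would be looked up again once its entry expires, instead of persisting until the pod is restarted.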

@pabloli84

https://github.com/jetstack/cert-manager/blob/master/pkg/issuer/acme/dns/util/wait.go#L306

If CertManager were to do the traversal on the zones which are in a premature state, it might well resolve to a wrong FQDN and would cache it until the process terminates. So when you got the strange "No matching GoogleCloud domain found for domain", try deleting the cert-manager pod and waiting for it to respawn.

IMO, the cache implementation should have got a TTL for each entry.

Thanks! Restarting the cert-manager pod solved my issue.
In my case, I created a new DNS zone in one GCP project, then deployed cert-manager, and only after that added the delegation to that zone in another project. So cert-manager kept throwing this error until I restarted it.

@munnerz munnerz closed this as completed Apr 23, 2020
@munnerz munnerz added triage/support Indicates an issue that is a support question. and removed kind/bug Categorizes issue or PR as related to a bug. labels Apr 23, 2020
@munnerz
Member

munnerz commented Apr 23, 2020

As @Freyert points out very well in #1507 (comment), I think this issue can be resolved by properly configuring your DNS hierarchy to point to the correct zone. I don't think there's an inherent issue in our resolution logic, rather it is working as intended.

/area acme/dns01

@thicolares

In my case, I have a domain registered with Google Domains, and I fixed this issue by moving DNS resolution for it from Google Domains to Cloud DNS: https://cloud.google.com/dns/docs/tutorials/create-domain-tutorial#set-up-domain
