Zone apex Subject alternative name not working with wildcards #2211

Closed
3 tasks done
wirepatch opened this issue Jun 14, 2024 · 10 comments

Comments

@wirepatch

wirepatch commented Jun 14, 2024

Welcome

  • Yes, I'm using a binary release within 2 latest releases.
  • Yes, I've searched similar issues on GitHub and didn't find any.
  • Yes, I've included all information below (version, config, etc).

What did you expect to see?

The issue has also been reported at vancluever/terraform-provider-acme#419. The maintainer referred me here, so this replicates my original bug report.

I was creating a wildcard certificate that also covers the zone apex, using Terraform:

resource "acme_certificate" "certificate" {
  account_key_pem           = acme_registration.registration.account_key_pem
  common_name               = "*.goik.sdi.hdm-stuttgart.cloud"
  subject_alternative_names = ["goik.sdi.hdm-stuttgart.cloud"]

  dns_challenge {
    provider = "rfc2136"
     ...
  }
  depends_on = [acme_registration.registration]
}

I expected a valid certificate to be created.

What did you see instead?

Certificate generation fails. When the subject_alternative_names = ... entry is omitted, everything works fine. See my detailed BIND name server log analysis at vancluever/terraform-provider-acme#419

How do you use lego?

Through Terraform ACME provider

Reproduction steps

  1. Defining above mentioned resource "acme_certificate" "certificate" {...}.

  2. Executing terraform apply

The challenge fails because the wildcard and the apex are not handled as two separate zones, as described at vancluever/terraform-provider-acme#419 .

Version of lego

Sorry, but I'm using the latest ACME Terraform provider and am unable to execute the lego binary directly.

Logs

DNS Bind 9 logs:

... updating zone 'goik.sdi.hdm-stuttgart.cloud/IN': deleting rrset at '_acme-challenge.goik.sdi.hdm-stuttgart.cloud' TXT
... updating zone 'goik.sdi.hdm-stuttgart.cloud/IN': adding an RR at '_acme-challenge.goik.sdi.hdm-stuttgart.cloud' TXT "9xRJx_tyhCOIY-17tpZQZOi608d8yZMd03xJgQA6Gio"
... updating zone 'goik.sdi.hdm-stuttgart.cloud/IN': deleting rrset at '_acme-challenge.goik.sdi.hdm-stuttgart.cloud' TXT
... updating zone 'goik.sdi.hdm-stuttgart.cloud/IN': adding an RR at '_acme-challenge.goik.sdi.hdm-stuttgart.cloud' TXT "a4GltWzN7vA4QgXs_55dpetl5x5nt2aHsYhTYoKMSvQ"

Terraform execution result:

...
acme_certificate.certificate: Still creating... [1m10s elapsed]
acme_certificate.certificate: Still creating... [1m20s elapsed]
╷
│ Error: error creating certificate: error: one or more domains had a problem:
│ [*.goik.sdi.hdm-stuttgart.cloud] propagation: time limit exceeded: last error: NS ns1.goik.sdi.hdm-stuttgart.cloud. did not return the expected TXT record [fqdn: _acme-challenge.goik.sdi.hdm-stuttgart.cloud., value: 9xRJx_tyhCOIY-17tpZQZOi608d8yZMd03xJgQA6Gio]: a4GltWzN7vA4QgXs_55dpetl5x5nt2aHsYhTYoKMSvQ
│
...

Go environment (if applicable)

$ go version && go env
# paste output here
@ldez
Member

ldez commented Jun 14, 2024

Hello,

did not return the expected TXT record [fqdn: _acme-challenge.goik.sdi.hdm-stuttgart.cloud., value: 9xRJx_tyhCOIY-17tpZQZOi608d8yZMd03xJgQA6Gio]: a4GltWzN7vA4QgXs_55dpetl5x5nt2aHsYhTYoKMSvQ

This feels like a DNS propagation issue: some TXT records are absent when Let's Encrypt checks the records.

The problem is not directly related to SAN (FYI CN is considered deprecated) but to the need to create and propagate several TXT records.

For me, it's neither a lego problem nor terraform-provider-acme problem but something related to your DNS: the propagation seems very slow.

@ldez
Member

ldez commented Jun 14, 2024

Wait a minute 🤔 Your DNS logs are unexpected.

EDIT: I was surprised by the DNS logs: I thought that the 4 logs were at the same time.
But there is no timestamp for the logs.

@ldez
Member

ldez commented Jun 14, 2024

The rfc2136 implementation is sequential, so lego will try to handle the challenge domain by domain, not at the same time.
So this is not related to the availability of several TXT records, but purely to the DNS propagation:

  • the first TXT record is created, the challenge happens, and then the TXT record is removed.
  • the second TXT record is created, the challenge happens, and then the TXT record is removed.

But when LE asks for the second TXT record, the first TXT record is still there, because the previous actions (deletion, creation) have not finished propagating.

So same conclusion, a DNS propagation issue, the propagation seems very slow.
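
The "propagation: time limit exceeded: last error: ..." message in the Terraform output has the shape of a polling loop that gives up after a deadline. A minimal sketch of such a loop follows; the function name, parameters, and the injectable clock are assumptions for illustration, not lego's actual code:

```python
import time


def wait_for(check, timeout, interval, clock=time.monotonic, sleep=time.sleep):
    """Poll check() until it succeeds or the time limit is exceeded.

    check() is a hypothetical callable returning (ok, last_error).
    clock and sleep are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout
    while True:
        ok, last_error = check()
        if ok:
            return
        if clock() >= deadline:
            # Report the most recent failure, as the error in the log does.
            raise TimeoutError(f"time limit exceeded: last error: {last_error}")
        sleep(interval)
```

If the record being polled for has been replaced by a different value, no amount of waiting helps: the check keeps failing until the deadline, which would match the run of "Still creating..." lines before the error.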


More details:

  common_name                =  "*.goik.sdi.hdm-stuttgart.cloud"
  subject_alternative_names  =  ["goik.sdi.hdm-stuttgart.cloud"]

A wildcard domain and the "base domain" will request the creation of TXT records with the same name:

Domain                              TXT record name
*.goik.sdi.hdm-stuttgart.cloud      _acme-challenge.goik.sdi.hdm-stuttgart.cloud.
goik.sdi.hdm-stuttgart.cloud        _acme-challenge.goik.sdi.hdm-stuttgart.cloud.

This is different from two non-wildcard domains:

Domain                              TXT record name
goik.sdi.hdm-stuttgart.cloud        _acme-challenge.goik.sdi.hdm-stuttgart.cloud.
www.goik.sdi.hdm-stuttgart.cloud    _acme-challenge.www.goik.sdi.hdm-stuttgart.cloud.

In this context (wildcard plus "base domain"), the propagation delay matters because of this name overlap.
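
The name overlap follows from how the DNS-01 challenge derives the record name: the "*." prefix of a wildcard identifier is dropped before "_acme-challenge." is prepended (RFC 8555, section 8.4). A minimal sketch, with a hypothetical helper name:

```python
def challenge_record_name(domain):
    """Return the FQDN of the DNS-01 TXT record for a certificate domain.

    Hypothetical helper for illustration: a wildcard prefix is removed,
    so "*.example.org" and "example.org" map to the same record name.
    """
    base = domain[2:] if domain.startswith("*.") else domain
    return f"_acme-challenge.{base.rstrip('.')}."


wildcard = challenge_record_name("*.goik.sdi.hdm-stuttgart.cloud")
apex = challenge_record_name("goik.sdi.hdm-stuttgart.cloud")
www = challenge_record_name("www.goik.sdi.hdm-stuttgart.cloud")

print(wildcard == apex)  # the wildcard and the apex share one record name
print(www == apex)       # a regular subdomain gets its own record name
```

Two different tokens therefore have to live under the same name, either as two TXT records at once or one after the other.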

@wirepatch
Author

wirepatch commented Jun 14, 2024

EDIT: I was surprised by the DNS logs: I thought that the 4 logs were at the same time.
But there is no timestamp for the logs.

The DNS updates on the server side happen within one second. Complete log without truncation:

Jun 14 19:21:28 sdiservice named[28361]: client @0x7f234acbd168 217.245.243.187#48172/key goik.key: updating zone 'goik.sdi.hdm-stuttgart.cloud/IN': deleting rrset at '_acme-challenge.goik.sdi.hdm-stuttgart.cloud' TXT
Jun 14 19:21:28 sdiservice named[28361]: client @0x7f234acbd168 217.245.243.187#48172/key goik.key: updating zone 'goik.sdi.hdm-stuttgart.cloud/IN': adding an RR at '_acme-challenge.goik.sdi.hdm-stuttgart.cloud' TXT "JcFY2gug0IP9SAbOYCA6lrxbgilQr-YjpcVZiPDu9d0"
Jun 14 19:21:28 sdiservice named[28361]: client @0x7f234926d168 217.245.243.187#35971/key goik.key: updating zone 'goik.sdi.hdm-stuttgart.cloud/IN': deleting rrset at '_acme-challenge.goik.sdi.hdm-stuttgart.cloud' TXT
Jun 14 19:21:28 sdiservice named[28361]: client @0x7f234926d168 217.245.243.187#35971/key goik.key: updating zone 'goik.sdi.hdm-stuttgart.cloud/IN': adding an RR at '_acme-challenge.goik.sdi.hdm-stuttgart.cloud' TXT "mCiuV5VbdfCmT4CkdyvQFh5whtFRbDTqEK1DeARbv7s"

I do understand your conclusion about propagation times. But when using dig @8.8.8.8 ... the entries are visible quite instantaneous after the above bind log entries show up. I'd say within two seconds at max. And dig only shows the second TXT entry value from above. terraform apply then continues for more than a minute until finally failing.
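
The manual dig check can be scripted so that it compares against one specific expected value rather than just showing whatever is currently published. This is a rough sketch that assumes dig is installed; the helper names are made up for illustration:

```python
import shlex
import subprocess


def dig_command(fqdn, resolver):
    """Build a dig invocation that prints only the TXT record data."""
    return ["dig", "+short", "TXT", fqdn, f"@{resolver}"]


def parse_txt_values(dig_output):
    """Turn `dig +short TXT` output into a list of TXT values.

    Each output line is one record; a record split into several quoted
    chunks is joined back into a single string.
    """
    values = []
    for line in dig_output.splitlines():
        line = line.strip()
        if line:
            values.append("".join(shlex.split(line)))
    return values


def txt_visible(fqdn, expected, resolver):
    """Return True if `expected` is among the TXT values the resolver returns."""
    result = subprocess.run(
        dig_command(fqdn, resolver), capture_output=True, text=True, check=True
    )
    return expected in parse_txt_values(result.stdout)
```

For example, txt_visible("_acme-challenge.goik.sdi.hdm-stuttgart.cloud.", "JcFY2gug0IP9SAbOYCA6lrxbgilQr-YjpcVZiPDu9d0", "8.8.8.8") would ask whether the first value from the log is visible, which is the question the failing check was asking.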

@ldez
Member

ldez commented Jun 14, 2024

The DNS updates on the server side happen within one second.

This doesn't change my conclusion, because the log message says "did not return the expected TXT record".

This can only change something if the Terraform provider tries to overcome the sequential behavior, but I don't think so.

But when using dig @8.8.8.8 ... the entries are visible quite instantaneous after the above bind log entries show up. I'd say within two seconds at max.

  • 2 seconds is not super slow, but not super fast either (if the DNS changes happen within one second).
  • LE uses its own set of DNS resolvers to check the propagation (we don't know this list of resolvers).

This tool can help to check the propagation: https://unboundtest.com/


FYI, I don't know how the Terraform provider works, I just know how lego works.

For example, I don't know if you are using a custom DNS resolver: https://go-acme.github.io/lego/usage/cli/options/#dns-resolvers-and-challenge-verification

@wirepatch
Author

Thx for your swift reply and the detailed explanation. I consider myself a Terraform user. I'll try your DNS resolver hint, since I am indeed using a delegation to a custom DNS server for the zone in question.

@vancluever
Contributor

@wirepatch just FYI for next time: when submitting an issue here (as mentioned in the referral doc), you'll want to replicate the issue with the lego CLI since, as @ldez mentioned, they don't work on the TF provider. It's important that any reproductions are done in the tools they are responsible for; this helps rule out issues with the provider as well. Most TF configurations can be replicated with the CLI.

@ldez thanks for the help on this! Looking over this deeper and looking at your replies here, funny enough, I wonder if I found the issue. We have a wrapper provider for the DNS providers that allows folks to configure multiple providers, but it does not implement sequential. Do you think that might be the culprit? Sounds like in order to implement this properly we'd have to probe through our wrapper and make an opinionated decision on whether or not parallel solve was possible depending on the results from all providers in the set. What do you think?

@wirepatch
Author

Thx for the lego CLI link. I'm not sure however if all Terraform based scenarios are indeed easy to replicate with respect to timing issues: I tried a workaround handling the wildcard and apex zone separately forcing their respective certificate creations in sequence using depends_on:

resource "acme_certificate" "certificateWild" {
  ...
  common_name   =  "*.goik.sdi.hdm-stuttgart.cloud"

  dns_challenge {
    provider = "rfc2136"
    ...
  }
  depends_on  =  [acme_registration.registration]
}

resource "acme_certificate" "certificateApex" {
  ...
  common_name = "goik.sdi.hdm-stuttgart.cloud"
  dns_challenge {...}
  depends_on    =  [acme_certificate.certificateWild]
}

To my surprise, this doesn't work either, most likely because of timing / TTL issues.

Being just a Terraform user, I may lack the deeper (DNS) knowledge required for this topic, but I am happy to follow your test proposals. You probably have more than enough resources for testing already. However, if you feel so inclined, I can send the required BIND HMAC keys for testing my particular Hetzner setup and make the logs accessible as well.

@ldez
Member

ldez commented Jun 16, 2024

We have a wrapper provider for the DNS providers that allows folks to configure multiple providers, but it does not implement sequential. Do you think that might be the culprit?

@vancluever Based on the error and the DNS logs, that could be the problem: the sequential behavior exists for providers that don't support multiple TXT records for the same domain (that is, the wildcard + base domain case).

Those kinds of providers can only manage one DNS record at a time for a domain.
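
To see why the ordering matters for such a provider, here is a toy model. It is not lego's or the Terraform provider's code, and every name in it is hypothetical. The zone replaces the whole RRset on each update, mirroring the "deleting rrset ... adding an RR" pairs in the BIND log:

```python
class ReplacingZone:
    """Toy zone that keeps only one TXT value per name (delete rrset, then add)."""

    def __init__(self):
        self.txt = {}

    def present(self, fqdn, value):
        self.txt[fqdn] = value  # any previous value at this name is lost

    def cleanup(self, fqdn):
        self.txt.pop(fqdn, None)

    def has(self, fqdn, value):
        return self.txt.get(fqdn) == value


def solve_parallel(zone, challenges):
    """Present every record first, then check them all."""
    for fqdn, value in challenges:
        zone.present(fqdn, value)
    results = [zone.has(fqdn, value) for fqdn, value in challenges]
    for fqdn, _ in challenges:
        zone.cleanup(fqdn)
    return results


def solve_sequential(zone, challenges):
    """Present, check, and clean up one record at a time."""
    results = []
    for fqdn, value in challenges:
        zone.present(fqdn, value)
        results.append(zone.has(fqdn, value))
        zone.cleanup(fqdn)
    return results


NAME = "_acme-challenge.goik.sdi.hdm-stuttgart.cloud."
CHALLENGES = [
    (NAME, "9xRJx_tyhCOIY-17tpZQZOi608d8yZMd03xJgQA6Gio"),  # wildcard
    (NAME, "a4GltWzN7vA4QgXs_55dpetl5x5nt2aHsYhTYoKMSvQ"),  # base domain
]

print(solve_parallel(ReplacingZone(), CHALLENGES))    # [False, True]
print(solve_sequential(ReplacingZone(), CHALLENGES))  # [True, True]
```

In the parallel run the first value is overwritten before it is checked, which is the same pattern as the reported error: the first value was expected, the second one was returned.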

Sounds like in order to implement this properly we'd have to probe through our wrapper and make an opinionated decision on whether or not parallel solve was possible depending on the results from all providers in the set. What do you think?

You should either apply the "sequential behavior" on the wrapper (but you will slow down all the providers) or handle 2 clients (one for sequential, one for parallel).
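
The first option could look roughly like the sketch below. It is only an illustration with made-up class and attribute names, not the Terraform provider's wrapper or lego's interface: the wrapper reports itself as sequential as soon as any wrapped provider is.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Provider:
    """Hypothetical DNS provider; sequential_interval is None if it can solve in parallel."""
    name: str
    sequential_interval: Optional[float] = None  # seconds to wait between challenges


@dataclass
class MultiProvider:
    """Hypothetical wrapper around several providers."""
    providers: List[Provider]

    def sequential_interval(self) -> Optional[float]:
        # If any wrapped provider needs sequential solving, the whole wrapper
        # becomes sequential and uses the longest requested interval.
        intervals = [
            p.sequential_interval
            for p in self.providers
            if p.sequential_interval is not None
        ]
        return max(intervals) if intervals else None


mixed = MultiProvider([Provider("rfc2136", 60.0), Provider("other")])
parallel_only = MultiProvider([Provider("other")])
print(mixed.sequential_interval())          # 60.0
print(parallel_only.sequential_interval())  # None
```

The trade-off is the one stated above: providers that could run in parallel are slowed down whenever they share a wrapper with a sequential one.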

@vancluever
Contributor

@ldez thanks!

You should either apply the "sequential behavior" on the wrapper (but you will slow down all the providers) or handle 2 clients (one for sequential, one for parallel).

Yeah, I don't think it's a big deal to apply to the whole wrapper mainly because I'm pretty sure the multi-provider scenario is an edge case. So if for some reason one provider is sequential and the other is parallel, I don't think it's a huge deal if both become sequential.

I think there's enough information here to rule out lego at this time too, so feel free to close this and I'll handle it over on the provider side. Thanks again! 🙂

@ldez ldez closed this as completed Jun 16, 2024