Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor DNS check #116

Merged
merged 2 commits into from Feb 11, 2016
Merged

Refactor DNS check #116

merged 2 commits into from Feb 11, 2016

Conversation

janeczku
Copy link
Contributor

@janeczku janeczku commented Feb 9, 2016

Replaces #96

  • Gets a list of all authoritative nameservers by looking up the NS RRs for the root domain (zone apex)
  • Verifies that the expected TXT record exists on all nameservers before sending off the challenge to ACME server

* Gets a list of all authoritative nameservers by looking up the NS RRs for the root domain (zone apex)
* Verifies that the expected TXT record exists on all nameservers before sending off the challenge to ACME server
@janeczku janeczku mentioned this pull request Feb 9, 2016
// checkDnsPropagation checks if the expected TXT record has been propagated to
// all authoritative nameservers. If not it waits and retries for some time.
func checkDnsPropagation(domain, fqdn, value string) error {
authoritativeNss, err := lookupNameservers(toFqdn(domain))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be lookupNameservers(toFqdn(fqdn)) instead.

The discussion in a previous thread still applies to the current commit. I'm commenting again so it appears here. [ cc: @janeczku ]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Validation domain (fqdn) and domain are owned by the same DNS authority, hence they share the same set of nameservers. The reason we are using the domain to lookup the authoritative NS is that we expect the domain to exist, while the fqdn record may not yet have propagated in the DNS system.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is legal to delegate the validation domain to a different set of nameservers, in which case lookupNameservers(toFqdn(domain)) and lookupNameservers(toFqdn(fqdn)) will have different results.

It is also legal to complete a DNS-01 challenge by publishing a TXT record for fqdn even if the public authoritative nameservers have no records at all for domain (e.g. split horizon DNS), as I mentioned in a second thread on the previous PR. It's not correct to require domain to exist.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree. The ACME spec could not be more clearer:

The client constructs the validation domain name by prepending the label
"_acme-challenge" to the domain name being validated, then provisions a TXT
record with the digest value under that name."

Where does it tell you to delegate the validation domain to another zone?

Also, doing so would undermine the whole objective of the DNS challenge, which is to establish proof of ownership of the certificate domain by the client.

"The client must demonstrate to the server both (1) that it holds the private key of the account key pair, and (2) that it has authority over the identifier being claimed."

If the validation domain is by itself a root domain (delegated to a different zone), then by provisioning a TXT record under it, you do not prove authority over the (parent) certificate domain, which may be owned by a third party. (e.g. _acme-challenge.co.uk. -> you, but co.uk. -> ns1.nic.uk)

Lastly, why are we even having this academic discussion? Lego solves DNS challenge by provisioning a TXT record under the certificate domain. No zone delegation is happening, which is why we don't need to handle such case in the DNS check (irrespective of whether that would be technically legal).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is also legal to complete a DNS-01 challenge by publishing a TXT record for fqdn even if the public authoritative nameservers have no records at all for domain (e.g. split horizon DNS), as I mentioned in a second thread on the previous PR. It's not correct to require domain to exist.

We do not require the certificate domain to exist. E.g, if the certificate domain www.example.com has no records (NXDOMAIN) the DNS check will still pass.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ACME spec could not be more clearer:

The client constructs the validation domain name by prepending the label "_acme-challenge" to the domain name being validated, then provisions a TXT record with the digest value under that name."

Exactly. You don't provision records on the domain name being validated, you provision records for the validation domain name. The DNS-01 section doesn't concern the domain name at all – only the validation domain name, fqdn in this context. DNS-01 doesn't make any requirements of the domain name, only of the validation domain name.

Where does it tell you to delegate the validation domain to another zone?

Where does it tell you I can't delegate it?

Also, doing so would undermine the whole objective of the DNS challenge, which is to establish proof of ownership of the certificate domain by the client.

The DNS-01 challenge does not require asserting control over the certificate domain. If that were the goal, it could require the TXT record to be published on the certificate domain instead. (It's already specified to ignore non-matching responses, so there's no conflict even if you've already got TXT records on that name.) Why use a separate validation domain name at all?

There are legitimate use cases for delegating _acme-challenge – as I mentioned in the previous thread, the situation where it is impractical to modify bar.com from a cron job because of grumpy change control officers (and the compliance reasons behind their grumpiness). It is desirable in such situations to delegate _acme-challenge.foo.bar.com out of the bar.com zone.

If the validation domain is by itself a root domain (delegated to a different zone), then by provisioning a TXT record under it, you do not prove authority over the (parent) certificate domain, which may be owned by a third party. (e.g. _acme-challenge.co.uk. -> you, but co.uk. -> ns1.nic.uk)

DNS-01 does not concern itself with the certificate domain name, it concerns itself with the validation domain name. If you can control the validation domain name, regardless of which zone or nameservers you use to do it, that is 100% sufficient to satisfy a DNS-01 challenge. Delegation is a part of DNS, and I maintain that it is useful to delegate ACME challenges.

As for _acme-challenge.co.uk, please see the complex history of underscores in DNS. (Or just try registering that domain name.) It is not an accident that _acme-challenge contains an underscore.

Lastly, why are we even having this academic discussion? Lego solves DNS challenge by provisioning a TXT record under the certificate domain. No zone delegation is happening, which is why we don't need to handle such case in the DNS check (irrespective of whether that would be technically legal).

It provisions a TXT record under the validation domain. RFC2136_ZONE=_acme-challenge.foo.bar.com lego --dns=rfc2136 can easily create records in a delegated zone, and of course there's --dns=manual.

We're having this discussion because delegation is a situation where this check would fail while boulder would succeed, and I consider that a bug in this check.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does it tell you to delegate the validation domain to another zone?

Where does it tell you I can't delegate it?

A TXT record can not co-exist with a NS record for the same name if the name is a subdomain and not the root domain. Hence the spec describing you to create a TXT record under the validation domain by implication means that you should not create a NS record there which would be necessary for your scenario of delegation.

But again, this discussion is out of scope for this PR as Lego's DNS providers don't delegate the validation domain anywhere.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A TXT record can not co-exist with a NS record for the same name if the name is a subdomain and not the root domain.

It can and does. (Edit: Apologies, I read your statement wrong. You're correct that it can't exist as a sibling to the IN NS delegation in the parent zone, but as I illustrate, that's not at issue.)

Scenario: issuing a certificate for is-delegated.willglynn.com.

willglynn.com IN NS ns1.willglynn.com, which has willglynn.com IN SOA ns1.willglynn.com. Within the willglynn.com zone, is-delegated.willglynn.com does not exist, but _acme-challenge.is-delegated.willglynn.com IN NS ns1.lerfjhax.com.

ns1.lerfjhax.com has a zone _acme-challenge.is-delegated.willglynn.com IN SOA ns1.lerfjhax.com. Within that zone, _acme-challenge.is-delegated.willglynn.com IN NS ns1.lerfjhax.com, agreeing with the delegation, and there exists an _acme-challenge.is-delegated.willglynn.com IN TXT record as well.

Asking lookupNameservers(toFqdn(domain)) about the validation domain returns a referral, since ns1.willglynn.com knows it is not an authority for the validation domain:

$ dig txt _acme-challenge.is-delegated.willglynn.com @ns1.willglynn.com

; <<>> DiG 9.8.3-P1 <<>> txt _acme-challenge.is-delegated.willglynn.com @ns1.willglynn.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 988
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;_acme-challenge.is-delegated.willglynn.com. IN TXT

;; AUTHORITY SECTION:
_acme-challenge.is-delegated.willglynn.com. 300 IN NS ns1.lerfjhax.com.

;; Query time: 62 msec
;; SERVER: 205.251.198.58#53(205.251.198.58)
;; WHEN: Tue Feb  9 13:25:06 2016
;; MSG SIZE  rcvd: 87

Asking the nameserver which is actually an authority for the validation domain name – i.e. lookupNameservers(toFqdn(fqdn)) – returns the TXT record:

$ dig txt _acme-challenge.is-delegated.willglynn.com @ns1.lerfjhax.com

; <<>> DiG 9.8.3-P1 <<>> txt _acme-challenge.is-delegated.willglynn.com @ns1.lerfjhax.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 24113
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 1, ADDITIONAL: 2
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;_acme-challenge.is-delegated.willglynn.com. IN TXT

;; ANSWER SECTION:
_acme-challenge.is-delegated.willglynn.com. 60 IN TXT "1Mvyj8Fkgch0YMz0PyTk83hrPIHrYgR-pbvN3KuevTw"

;; AUTHORITY SECTION:
_acme-challenge.is-delegated.willglynn.com. 259200 IN NS ns1.lerfjhax.com.

;; ADDITIONAL SECTION:
ns1.lerfjhax.com.   86400   IN  A   208.79.90.50
ns1.lerfjhax.com.   86400   IN  AAAA    2607:f2f8:a428::2

;; Query time: 76 msec
;; SERVER: 2607:f2f8:a428::2#53(2607:f2f8:a428::2)
;; WHEN: Tue Feb  9 13:25:08 2016
;; MSG SIZE  rcvd: 187

And yes, this works:

2016/02/09 13:25:14 [INFO][is-delegated.willglynn.com] The server validated our request
2016/02/09 13:25:14 [INFO] acme: You can now remove this TXT record from your DNS zone:
2016/02/09 13:25:14 [INFO] acme: _acme-challenge.is-delegated.willglynn.com. 120 IN TXT "..."
2016/02/09 13:25:14 [INFO][is-delegated.willglynn.com] acme: Validations succeeded; requesting certificates
2016/02/09 13:25:14 [INFO] acme: Requesting issuer cert from https://acme-staging.api.letsencrypt.org/acme/issuer-cert
2016/02/09 13:25:14 [INFO][is-delegated.willglynn.com] Server responded with a certificate.

But again, this discussion is out of scope for this PR as Lego's DNS providers don't delegate the validation domain anywhere.

…unless you delegate the validation domain ahead of time and only need lego to create the TXT record in the delegated zone. As I said, RFC2136_ZONE=_acme-challenge.foo.bar.com lego --dns=rfc2136 can do that right now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

…unless you delegate the validation domain ahead of time and only need lego to create the TXT record in the delegated zone. As I said, RFC2136_ZONE=_acme-challenge.foo.bar.com lego --dns=rfc2136 can do that right now.

You are right, thats a valid (if edgy) scenario.

@willglynn
Copy link
Contributor

I like this strategy. Unfortunately, it is insufficient to mimic the behavior of typical recursive resolvers:

$ dig txt _acme-challenge.is-a-cname.willglynn.com @8.8.8.8

; <<>> DiG 9.8.3-P1 <<>> txt _acme-challenge.is-a-cname.willglynn.com @8.8.8.8
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 42356
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;_acme-challenge.is-a-cname.willglynn.com. IN TXT

;; ANSWER SECTION:
_acme-challenge.is-a-cname.willglynn.com. 59 IN CNAME come-and-find-me.lerfjhax.com.
come-and-find-me.lerfjhax.com. 59 IN    TXT "ENTER MY HALL OF MIRRORS AND TREMBLE, MORTAL"

;; Query time: 114 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Tue Feb  9 08:44:37 2016
;; MSG SIZE  rcvd: 155

This particular example is rather contrived, but still, this approach isn't 100% equivalent to how boulder will be performing lookups.

@janeczku
Copy link
Contributor Author

janeczku commented Feb 9, 2016

This particular example is rather contrived, but still, this approach isn't 100% equivalent to how boulder will be performing lookups.

The example is not only contrived. You're breaking the ACME spec when you create a CNAME record instead of a TXT record at the validation domain.

The DNS challenge requires the client to provision a TXT record containing a designated value under a specific validation domain name.

https://ietf-wg-acme.github.io/acme/#dns

@willglynn
Copy link
Contributor

The spec describes validation as:

  1. Compute the SHA-256 digest of the key authorization
  2. Query for TXT records under the validation domain name
  3. Verify that the contents of one of the TXT records matches the digest value

I read that as "query for _acme-challenge.… IN TXT and see if one of the TXT records matches the digest". Recursive resolvers follow CNAMEs and return TXT records for those CNAMEs in the answer section to the query.

boulder follows this reading of the spec:

$ lego --server=https://acme-staging.api.letsencrypt.org/directory -d is-a-cname.willglynn.com -m … --dns=manual -x http-01 -x tls-sni-01 run
2016/02/09 10:57:13 [INFO][is-a-cname.willglynn.com] acme: Obtaining bundled SAN certificate
2016/02/09 10:57:13 [INFO][is-a-cname.willglynn.com] acme: Could not find solver for: http-01
2016/02/09 10:57:13 [INFO][is-a-cname.willglynn.com] acme: Trying to solve DNS-01
2016/02/09 10:57:13 [INFO] acme: Please create the following TXT record in your DNS zone:
2016/02/09 10:57:13 [INFO] acme: _acme-challenge.is-a-cname.willglynn.com. 120 IN TXT "ptnyX8Lgo8Qq4AdE-IrEy38jk0xyWT8nJXrfbt7epLM"
2016/02/09 10:57:13 [INFO] acme: Press 'Enter' when you are done

<switch terminals>
$ dig txt _acme-challenge.is-a-cname.willglynn.com @8.8.8.8

; <<>> DiG 9.8.3-P1 <<>> txt _acme-challenge.is-a-cname.willglynn.com @8.8.8.8
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 20282
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;_acme-challenge.is-a-cname.willglynn.com. IN TXT

;; ANSWER SECTION:
_acme-challenge.is-a-cname.willglynn.com. 59 IN CNAME come-and-find-me.lerfjhax.com.
come-and-find-me.lerfjhax.com. 59 IN    TXT "ENTER MY HALL OF MIRRORS AND TREMBLE, MORTAL"
come-and-find-me.lerfjhax.com. 59 IN    TXT "ptnyX8Lgo8Qq4AdE-IrEy38jk0xyWT8nJXrfbt7epLM"

;; Query time: 97 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Tue Feb  9 10:57:39 2016
;; MSG SIZE  rcvd: 211

<switch back, press enter>
2016/02/09 10:57:44 [INFO][is-a-cname.willglynn.com] The server validated our request
2016/02/09 10:57:44 [INFO] acme: You can now remove this TXT record from your DNS zone:
2016/02/09 10:57:44 [INFO] acme: _acme-challenge.is-a-cname.willglynn.com. 120 IN TXT "..."
2016/02/09 10:57:44 [INFO][is-a-cname.willglynn.com] acme: Validations succeeded; requesting certificates
2016/02/09 10:57:45 [INFO] acme: Requesting issuer cert from https://acme-staging.api.letsencrypt.org/acme/issuer-cert
2016/02/09 10:57:45 [INFO][is-a-cname.willglynn.com] Server responded with a certificate.

If the spec intends to require the TXT record to be provisioned directly on the validation name, we should clarify the language in the spec and fix boulder to also require that.

@janeczku
Copy link
Contributor Author

janeczku commented Feb 9, 2016

I agree, there is something to be wished for in terms of clarity of the spec. E.g. the description as it relates to the client behavior is very specific:

For example, if the domain name being validated is “example.com”, then the client would provision the following DNS record:
_acme-challenge.example.com. 300 IN TXT "gfj9Xq...Rg85nM"

I don't see how one could read that as "... the client may alternatively create a CNAME record under the validation domain and a TXT record in the target zone."

But apparently the consensus indeed seems to be that CNAME'ing the validation domain is legal: https://www.ietf.org/mail-archive/web/acme/current/msg00827.html

Anyhow, seeing that currently none of the Lego DNS providers supports this edge case, i think there is a point to be made that we can always factor in support for this in the DNS check later when necessary.

@willglynn
Copy link
Contributor

I don't see how one could read that as "... the client may alternatively create a CNAME record under the validation domain and a TXT record in the target zone."

I read the text as normative – suggestions, really – and then worked backwards from how the validation section is written.

Anyhow, seeing that currently none of the current Lego DNS providers supports this edge case, i think there is a point to be made that we can always factor in support for this in the DNS check later when necessary.

Well, the manual provider would permit this today.

We could accommodate this case with a flag that says "instead of checking and proceeding automatically, wait for manual confirmation", or by making the automatic check recursive (i.e. following CNAMEs up to a limited depth in the pursuit of TXT records).

@janeczku
Copy link
Contributor Author

Will,

I have pushed a commit to the PR that adds support for the scenarios discussed yesterday:

  • Validation domain is a CNAME
  • Validation domain is delegated to another nameserver.

I tested checkDnsPropagation(fqdn, value) against your testing records:

_acme-challenge.is-delegated.willglynn.com PASS
_acme-challenge.is-a-cname.willglynn.com PASS

Please let me know if you are 👍 with the PR as it is so we can move forward and ask @xenolf to land this.

@willglynn
Copy link
Contributor

👍

I think this is good to merge. It works better than the current code and it addresses the two cases I had in mind as clear sources of trouble.

This check still makes some assumptions that might be problematic in conceivable situations, like that the set of nameservers doesn't change throughout the operation, but as far as I can tell so far, such issues will shake themselves out just by retrying the command. That's a big improvement over a check that will never succeed and which requires recompiling lego to bypass.

I've been hacking on a way to test code like this (DNS fixtures modeling a constellation of authoritative servers which you can update on command), but it's not ready yet, and in any case seems like the subject of a separate PR.

xenolf added a commit that referenced this pull request Feb 11, 2016
@xenolf xenolf merged commit ba64faa into go-acme:master Feb 11, 2016
@janeczku
Copy link
Contributor Author

😋 🎉 Yipeeeeh

I've been hacking on a way to test code like this (DNS fixtures modeling a constellation of authoritative servers which you can update on command), but it's not ready yet, and in any case seems like the subject of a separate PR.

Looking forward to check that out once it's ready.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants