From Erik Nygren #50

moonshiner · 2023-04-01T00:53:52Z

A number of comments and suggestions:

APEX domains, and hostnames vs domains

You define APEX but don't then reference it. This is an important topic to cover in considerably more detail, however. In particular, some systems want to validate an apex domain while others want to validate each particular hostname. It is critical that the validation record and its contents are unambiguous as to which of these is the case.

As an example, ACME has separate mechanisms for wildcard certs (eg, "*.example.com") vs individual names (eg, "bar.example.com").
This is likely to apply across the board to these systems: sometimes they want to validate usage for a domain and sometimes for just specific names.

For the individual hostname case, it is important to clarify that the challenge should be "_foo-challenge.bar.example.com".

For the whole domain case this could be "_foo-wildcard-challenge.example.com" or have an attribute in the TXT token (eg, wildcard=true). ACME (RFC 8555, Section 8.4) doesn't seem to have this differentiation, which seems unfortunate, unless I'm misreading.
I'd think that it should be unambiguous to domain admins whether a challenge is for just the "example.com" name, for "*.example.com", or for "example.com, *.example.com, *.*.example.com, etc".
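
To make the hostname vs whole-domain distinction concrete, here is a minimal sketch of how a service might derive an unambiguous record name from the requested scope. The function name, the `scope` values, and the default `foo` service label are assumptions for illustration; only the two label shapes come from the examples above.

```python
def challenge_name(name: str, scope: str, service: str = "foo") -> str:
    """Return the owner name where the validation TXT record would be placed.

    scope is "host" (only this exact hostname) or "wildcard"
    (the name and the names below it). Both values are hypothetical.
    """
    name = name.rstrip(".").lower()
    if scope == "host":
        return f"_{service}-challenge.{name}"
    if scope == "wildcard":
        return f"_{service}-wildcard-challenge.{name}"
    raise ValueError(f"unknown scope: {scope!r}")


print(challenge_name("bar.example.com", "host"))
print(challenge_name("example.com", "wildcard"))
```

Because the scope is encoded in the label itself, a DNS admin can tell from the record name alone what they are granting.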

What it means to validate a hostname or a domain or a wildcard set of hostnames may vary widely per application, and we may want to talk more about the security considerations here.

Ambiguities about whether a given verification token grants powers over a specific hostname or an entire domain also introduce security challenges that we may wish to talk about in Security Considerations. DNS domain administrators need to be able to understand the consequences of adding particular challenge entries to their domain, especially in cases like a multi-tenant Enterprise environment.

Public suffixes

We may wish to encourage (or require) validating against Public Suffix lists (eg, https://publicsuffix.org/), in the absence of a more general DBOUND solution. At a minimum we should discuss this in security considerations.

One Security Consideration is that services operating a public suffix should take extreme care about when they allow underscore labels to be created within a shared domain. As an example, if a service provider allows "_foo-challenge.publicsuffix.example" to be registered as a domain (for a DNS registrar) or to be created as a CNAME or TXT record (eg, for a dynamic DNS provider or cloud provider) then this might grant unintended powers over all of "publicsuffix.example".

We may also want to (encourage? require?) confirming that a user isn't trying to place a validation token on a public suffix. ACME has this as a "CA Policy Consideration" (Section 10.5 of rfc8555). There are some legitimate use-cases here, but caution (and perhaps extra validation?) is needed.
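
A rough sketch of the kind of check this implies: refuse a challenge whose validated name is itself a public suffix. The suffix set here is a tiny hand-written sample, not the real list, and the sketch ignores the wildcard and exception rules that the real Public Suffix List contains. All function names are hypothetical.

```python
# Hand-written sample for illustration only; a real check would load the
# full list published at https://publicsuffix.org/ and apply its
# wildcard and exception rules, which this sketch does not do.
SAMPLE_PUBLIC_SUFFIXES = {"com", "example", "publicsuffix.example"}


def is_public_suffix(name: str, suffixes=SAMPLE_PUBLIC_SUFFIXES) -> bool:
    """Exact-match lookup of a name against the sample suffix set."""
    return name.rstrip(".").lower() in suffixes


def challenge_target_allowed(challenge_name: str, suffixes=SAMPLE_PUBLIC_SUFFIXES) -> bool:
    """Return False when the name being validated is a public suffix."""
    labels = challenge_name.rstrip(".").lower().split(".")
    if not labels[0].startswith("_"):
        raise ValueError("expected an underscore-prefixed challenge label")
    validated_name = ".".join(labels[1:])
    return not is_public_suffix(validated_name, suffixes)


print(challenge_target_allowed("_foo-challenge.publicsuffix.example"))
print(challenge_target_allowed("_foo-challenge.customer.publicsuffix.example"))
```

A policy that allows the legitimate use-cases could replace the hard `False` with a step that triggers extra validation.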

(For the Appendix, another example would be the PSL itself. Per https://github.com/publicsuffix/list/wiki/Guidelines
it uses "_psl.alphaexample.com TXT publicsuffix/list#100" for validation.)

SaaS/PaaS/intermediary provider cases (eg, CDNs)

A common use-case is delegating control to an intermediary. For example, indicating that a SaaS provider or CDN may manage certificates for "foo.example.com". One way to handle this is to CNAME the challenge to that intermediary and have the intermediary return the TXT record. For example, you might have:

_acme-challenge.foo.example.com. IN CNAME ${TOKENA}.intermediate-provider.example.
${TOKENA}.intermediate-provider.example. IN TXT ${TOKENB}

This allows intermediate-provider.example to keep updating TOKENB
for each renewal. (It's not reasonable for the intermediate provider to tell their customer
to go back and update _acme-challenge.foo.example.com every three months.)
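
As a sketch of why this works, the snippet below models the two records in an in-memory table and follows the CNAME to the TXT value. The table layout, the `resolve_txt` helper, and the literal token values are assumptions standing in for a real resolver and real tokens.

```python
# In-memory stand-in for DNS: (owner name, record type) -> record data.
ZONE = {
    ("_acme-challenge.foo.example.com.", "CNAME"): "tokena.intermediate-provider.example.",
    ("tokena.intermediate-provider.example.", "TXT"): "tokenb-first-issuance",
}


def resolve_txt(name, zone, max_hops=8):
    """Follow CNAMEs from name until a TXT record is found, or return None."""
    for _ in range(max_hops):
        if (name, "TXT") in zone:
            return zone[(name, "TXT")]
        if (name, "CNAME") in zone:
            name = zone[(name, "CNAME")]
            continue
        return None
    raise RuntimeError("CNAME chain too long")


print(resolve_txt("_acme-challenge.foo.example.com.", ZONE))

# At renewal the provider changes only its own TXT record;
# the customer's CNAME stays as it was.
ZONE[("tokena.intermediate-provider.example.", "TXT")] = "tokenb-renewal"
print(resolve_txt("_acme-challenge.foo.example.com.", ZONE))
```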

This is often going to be done alongside delegating the hostname
to the intermediate provider. For example, there will likely also be a CNAME of:

foo.example.com. IN CNAME foo-example-com.cdn.example.

The separate CNAMEs (ie, these being distinct labels) are important because
certificate issuance and validation need to happen before actually moving the hostname over.

This is a case where the CNAME for "_acme-challenge.foo.example.com" generally needs to be persistent
for frequent/periodic renewals.

Of critical importance is that TOKENA is also secure and has enough entropy
and is tied to the particular customer account that provisioned foo.example.com.

In the draft we probably want to talk about this as cases where there is a CNAME to the TXT record,
and that the target of the CNAME needs to itself always have a token with adequate cryptographic entropy.
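
One way an intermediary could meet both requirements (enough entropy in TOKENA, and a tie to the provisioning account) is sketched below. The class, its method names, and the account identifier format are hypothetical; the 128-bit size is an assumption chosen to match the entropy figure suggested elsewhere in these comments.

```python
import secrets


class DelegationRegistry:
    """Hypothetical provider-side record of which account owns which CNAME target."""

    def __init__(self, provider_zone="intermediate-provider.example."):
        self.provider_zone = provider_zone
        self._by_label = {}  # TOKENA label -> (account_id, hostname)

    def provision(self, account_id, hostname):
        """Create an unguessable CNAME target for one account and hostname."""
        label = secrets.token_hex(16)  # 16 random bytes = 128 bits
        self._by_label[label] = (account_id, hostname.lower())
        return f"{label}.{self.provider_zone}"

    def owner_of(self, cname_target):
        """Return (account_id, hostname) for a target, or None if unknown."""
        label = cname_target.split(".", 1)[0]
        return self._by_label.get(label)


registry = DelegationRegistry()
target = registry.provision("acct-1", "foo.example.com")
print(target)
print(registry.owner_of(target))
```

Because the label is random rather than derived from the hostname, another customer cannot predict the target and claim it first.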

We might mention in A.1.4 on Time-bound checking that cert renewals are a case where
persistence is required, at least of a CNAME to a provider who may be managing the renewals.

3a) Leveraging ACME challenges for other purposes

A related question worth considering: when is it acceptable to leverage an ACME challenge for other purposes? For example, if a domain is being moved onto a CDN that is going to get a certificate for the domain prior to the migration, but the CDN also wants to validate that it is authorized to have the domain transferred to it, when can the ACME challenge be leveraged for both purposes?
I'm not sure we need to go into this, but perhaps it should be discussed.

Multi-provider / multi-CDN setups

A related and messier corner case is multi-provider / multi-CDN setups.
For example, "foo.example.com" may CNAME to one of three different CDNs.
Each one of these needs to be able to manage a certificate and renew it every three months.
This likely applies to some of these other cases as well. I don't have good answers (ACME doesn't
handle this terribly well today), but it is worth some thought as to how to handle it.

Token format / construction

It seems like the actual token contents should have more flexibility.
I don't think we want a "MUST" on that particular construct. It may be worth
a MUST that there is at least 128 bits of secure entropy, and that the token is
either base64 or hex encoded. But there may be a need to use other
constructs in the future (eg, not SHA256). Giving the current example
as a MAY seems reasonable.

There may be reasons for other constructs that embed state within the token.
For example: "HMAC-SHA256(private_key, label+account+domain)" may be appropriate in some cases,
although it has enough security considerations that I'm not sure we want to include it.
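
For illustration, the two kinds of construct discussed here could look like the sketch below: a random token with 128 bits of entropy in base64 or hex, and an HMAC-SHA256 token that embeds state. The function names are hypothetical. Using unpadded URL-safe base64, and length-prefixing the fields instead of plain concatenation, are my assumptions and not something the draft specifies.

```python
import base64
import hashlib
import hmac
import secrets


def random_token_b64() -> str:
    """128 bits from the OS CSPRNG, URL-safe base64 without padding."""
    raw = secrets.token_bytes(16)
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def random_token_hex() -> str:
    """128 bits from the OS CSPRNG, hex encoded."""
    return secrets.token_hex(16)


def stateful_token(private_key: bytes, label: str, account: str, domain: str) -> str:
    """HMAC-SHA256 over label, account, and domain.

    Each field is length-prefixed so that ("ab", "c") and ("a", "bc")
    produce different inputs; plain concatenation would not guarantee that.
    """
    parts = [s.encode("utf-8") for s in (label, account, domain)]
    message = b"".join(len(p).to_bytes(2, "big") + p for p in parts)
    return hmac.new(private_key, message, hashlib.sha256).hexdigest()


print(random_token_b64())
print(random_token_hex())
print(stateful_token(b"demo-key-not-for-production", "_foo-challenge", "acct-1", "example.com"))
```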

Binding tokens to requests

We should have a note on the critical importance of binding the token to the requesting account and to the requested name.
At a minimum this should be in Security Considerations, but it may also need to be normative.
Usage here typically follows a flow of:
a) user/account requests a token for a given $name from a service provider
b) user/account has their DNS admin put their token in for _challenge.$name
c) service validates that _challenge.$name has $token and then grants access to the user/account

There are chains of custody and linkages here that need to hold, and they are exploitable if they break down.
For example, if steps (a) and (c) aren't explicitly linked then a different user on a different account
could potentially jump in at step c and grab access. There may be other corner cases here,
and it may be worth some more detailed formal analysis to be able to express what properties are critical
for safety.
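
The linkage between steps (a) and (c) can be sketched as follows. The `Validator` class and the `dns_lookup` callable (a stand-in for a TXT query) are hypothetical. The point is only that the expected token is stored under both the account and the name, so a different account cannot complete step (c).

```python
import hmac
import secrets


class Validator:
    """Hypothetical service side of the (a)/(b)/(c) flow."""

    def __init__(self):
        self._pending = {}  # (account, name) -> expected token

    def request_token(self, account: str, name: str) -> str:
        """Step (a): issue a token bound to this account and this name."""
        token = secrets.token_hex(16)
        self._pending[(account, name.lower())] = token
        return token

    def validate(self, account: str, name: str, dns_lookup) -> bool:
        """Step (c): succeed only for the account that requested the token."""
        expected = self._pending.get((account, name.lower()))
        if expected is None:
            return False
        published = dns_lookup(f"_challenge.{name.lower()}")
        if published is None:
            return False
        return hmac.compare_digest(published.encode(), expected.encode())


validator = Validator()
token = validator.request_token("alice", "www.example.com")
fake_dns = {"_challenge.www.example.com": token}  # step (b), done by the DNS admin

print(validator.validate("alice", "www.example.com", fake_dns.get))
print(validator.validate("mallory", "www.example.com", fake_dns.get))
```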

There may be a related Security Consideration that I'm not sure how to handle, where a MitM-style attack could jump in before step (a). For example, if the user is phished into talking to a different service provider than they thought they were talking to. (I'm not sure this needs to be discussed, but it is a risk.)

TTL recommendations

We should provide some TTL recommendations for the TXT record, and perhaps also provide
a warning on long SOA (negative caching) TTLs.

This seems like a case where we'd want to recommend using short TTLs on the TXT record
to allow recovering from misconfigurations. These shouldn't be polled frequently so cacheability
is unlikely to be an issue, but if there's a typo and the TTL is long then there may not be a way
to recover since the validator may have the bad entry cached for the TTL.

A long SOA TTL (ie, negative caching TTL) could also cause issues.
Once the service provider issues the challenge the validator may start polling for its presence.
The first attempts are likely to get an NXDOMAIN, and if the NXDOMAIN is cached too long
this could cause user confusion and/or delay the validation.
(I'm not sure it's reasonable to suggest that validators bound the maximum NXDOMAIN caching time?)
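
For reference, RFC 2308 sets the negative caching TTL to the minimum of the SOA record's own TTL and the SOA MINIMUM field. The sketch below computes that value and optionally applies a validator-side cap of the sort wondered about above. The function name and the `validator_cap` parameter are hypothetical, and the numbers are illustrative.

```python
from typing import Optional


def negative_cache_ttl(soa_ttl: int, soa_minimum: int,
                       validator_cap: Optional[int] = None) -> int:
    """Seconds an NXDOMAIN may be cached.

    Per RFC 2308 this is min(SOA TTL, SOA MINIMUM). validator_cap is a
    hypothetical extra bound a validator might apply to its own cache.
    """
    ttl = min(soa_ttl, soa_minimum)
    if validator_cap is not None:
        ttl = min(ttl, validator_cap)
    return ttl


# A zone with a one-day SOA TTL and a one-hour MINIMUM field.
print(negative_cache_ttl(86400, 3600))      # 3600
print(negative_cache_ttl(86400, 3600, 60))  # 60
```

With the uncapped value, a validator that polls just before the TXT record is published could keep seeing NXDOMAIN for up to an hour.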

Policy constraints as a variant

Within ACME, challenge tokens exist as only one part of the validation process.
They act as an explicit "allow this particular name to be issued this particular cert based on a CSR".
There is also another safeguard, however, which is the CAA record. That acts as a policy-based constraint.

As we are generalizing the challenges, it may be worth considering generalizing the policy-based constraints.
For example, in an enterprise environment "example.com" may wish to limit the use of _foo-challenge
under their domain so that bar.quux.example.com can't put in "_foo-challenge.bar.quux.example.com".
(More concretely, example.com may wish to limit the CDNs and/or SaaS providers that can be used
within their domain.)

This is almost certainly substantial scope creep for this draft, but without it domain admins
may be unable to apply policies or manage the sort of risks managed with CAA records.

This might be as simple as allowing the definition of _foo-constraint.example.com as a TXT record,
with whoever defines _foo-challenge also defining the format of _foo-constraint.
As part of validating _foo-challenge.bar.quux.example.com, validators should look for
_foo-constraint.example.com and _foo-constraint.quux.example.com and _foo-constraint.bar.quux.example.com
and implement their constraints when present.
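
The lookup order described above can be sketched as a walk from the apex down to the hostname. The function name is hypothetical, and how a validator learns the apex (passed in here as an argument) is an assumption left open by this comment.

```python
def constraint_names(hostname: str, apex: str, service: str = "foo"):
    """List the constraint record names to check, from the apex down to hostname."""
    host_labels = hostname.rstrip(".").lower().split(".")
    apex_labels = apex.rstrip(".").lower().split(".")
    if host_labels[-len(apex_labels):] != apex_labels:
        raise ValueError(f"{hostname!r} is not at or below {apex!r}")
    names = []
    # Start with just the apex labels, then add one more label each step.
    for start in range(len(host_labels) - len(apex_labels), -1, -1):
        names.append(f"_{service}-constraint." + ".".join(host_labels[start:]))
    return names


for name in constraint_names("bar.quux.example.com", "example.com"):
    print(name)
```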

Registry of labels?

I hate to ask it, but is there a need for a registry of _foo-challenge labels?
It seems like there could be potential security and operational risks
of multiple entities starting to use "_foo-challenge" for unrelated purposes.

Security review

Given that domain verification is often used as part of security systems,
it seems like it would be worth getting some additional security review,
such as bringing this to SAAG?

Thanks again for working on this much needed draft!

moonshiner · 2023-07-01T20:26:01Z

For APEX, I suggest referencing RFC 8499 (https://www.rfc-editor.org/rfc/rfc8499.html#section-7)
and discussing that best practice is to not put records in the zone apex.

For the registry of labels - I feel it's too early for this, but I can be convinced otherwise.