x/crypto/acme/autocert: Manager should support DNS-01 verification #23198

Open
winteraz opened this issue Dec 20, 2017 · 31 comments

@winteraz commented Dec 20, 2017

What did you do?

I've tried to set up autocert behind a firewall.

What did you expect to see?

HTTPS working flawlessly (using Let's Encrypt infrastructure).

What did you see instead?

Verification failed due to the firewall.

I believe dns-01 should be built into Manager. It could have a function field (e.g. SetTXT) which, if set, the Manager uses to create the TXT records required for DNS verification.

@gopherbot added this to the Unreleased milestone Dec 20, 2017
@bradfitz (Contributor) commented Dec 20, 2017

Who terminates your TLS? What's your setup look like?

We're really trying to keep the autocert package simple and move any complexity into the acme package. Generally our answer for people with non-standard setups is to have them use the acme package directly, rather than try to make the autocert package be all things to all people.

If we did support dns-01, I'd rather it be automatic. What does your firewall allow in & out?

@bradfitz (Contributor) commented Dec 20, 2017

/cc @x1ddos

@winteraz (Author) commented Dec 20, 2017

A function field in Manager (e.g. type DNSUpdater func(k, v string) error, which updates the DNS TXT record with key/name k and value v) should suffice and allow the setup to be fully automatic.

Depending on the DNS provider the client/developer will provide the appropriate DNSUpdater function.
The changes would be minimal, and I believe all of them would be in the verify method. dns-01 verification already seems to be implemented in the acme package: https://github.com/golang/crypto/blob/master/acme/autocert/autocert.go#L491
I don't think it will make it significantly more complex than it already is.

@bradfitz (Contributor) commented Dec 20, 2017

@winteraz, I'm not asking about solutions. We try to describe & understand the problem before jumping to solutions. Could you answer my questions above?

@winteraz (Author) commented Dec 20, 2017

My server has all incoming traffic restricted to a limited set of IPs, so Let's Encrypt can't reach it for the tls-sni challenge verification. Outgoing traffic has no restriction. I used to allow incoming traffic temporarily (as long as it took to set up SSL), but this requires manual intervention and defeats the purpose of ACME.

@winteraz (Author) commented Dec 20, 2017

This may not be the most popular setup, but firewalls are not that uncommon. The main issue is that Let's Encrypt doesn't advertise its IP addresses, and they actually forbid whitelisting practices. A simple Google search for "Letsencrypt firewall" shows the issue is quite common, and their response is to use dns-01.

@winteraz (Author) commented Dec 20, 2017

It's also worth noting that the upcoming wildcard certificates will be available only via dns-01:

We intend to support wildcard certificates in January 2018 as part of the ACMEv2 endpoint. Wildcard issuance will require base domain validation using DNS-01 challenges.

https://letsencrypt.org/docs/faq/

@x1ddos commented Dec 22, 2017

The changes would be minimal and I believe all of them should be in the verify method. The dns-01 verification seems to be already developed in the acme package. https://github.com/golang/crypto/blob/master/acme/autocert/autocert.go#L491
I don't think it will make it significantly more complex than it already is.

I actually doubt the changes would be minimal. The tls-sni verification is almost instant, whereas anything DNS related may take hours. It's quite a different flow, i.e. out-of-band.

@winteraz (Author) commented Dec 27, 2017

@x1ddos below are the changes required. I've been using the fork for several days. I still stand by my statement that the changes are minimal. I'm using Route 53, and the DNS verification is almost instant (i.e. it takes a few seconds).
golang/crypto@master...winteraz:master

@x1ddos commented Jan 12, 2018

Given that "tls-sni-xx" is probably gone for good, "http-01" in #21890 will become the default. Maybe we should reconsider and also add "dns-01" in case something happens to "http-01" too.

@rusenask commented Jan 12, 2018

Hello, I have been looking into this as well. I agree with @x1ddos that the DNS challenge might be too slow for some (possibly most) cases, but this recent issue with tls-sni demonstrates that we need multiple options, as a lot of applications are now helpless.

@mpx (Contributor) commented Jan 12, 2018

I think it's a little early to say tls-sni is "probably gone for good", it's going to take a while before anyone knows how this is going to work out.

There are active discussions looking at how tls-sni might be fixed. Given the benefits of using SNI, I suspect it's more likely there will be a replacement -- how long it takes is another question.

@x1ddos commented Jan 12, 2018

I think it's a little early to say tls-sni is "probably gone for good"

From https://community.letsencrypt.org/t/2018-01-11-update-regarding-acme-tls-sni-and-shared-hosting-infrastructure/50188:

The ACME TLS-SNI-01 validation method will remain disabled permanently for new accounts by default. Since the same problems apply to TLS-SNI-02, TLS-SNI-02 will remain disabled in our upcoming ACMEv2 API endpoint.

Mitigations for Existing TLS-SNI Users

Our recommendation for users is to begin a migration to the HTTP-01 or DNS-01 validation methods. We are working to provide a reasonable amount of migration time for as many users as possible, while maintaining our commitment to security.

@billinghamj commented Jan 12, 2018

Compatibility will need to be maintained for some period of time to allow for renewals etc., but it is clearly permanently deprecated from the perspective of new users.

@mpx (Contributor) commented Jan 12, 2018

The bottom of that announcement also says:

ACME Protocol Updates

We will engage with the IETF ACME working group to decide the future of TLS-SNI validation and remediations to the discovered problems.

tls-sni-01/02 won't be back. The announcement and the discussions I linked indicate that people haven't given up on using SNI yet. It's too early to say how it will turn out.

@mdempsky (Member) commented May 16, 2018

Who terminates your TLS? What's your setup look like?

I have a server on my home network for controlling my lights. Currently it's a simple web form that runs over HTTP with a bare IP address. I'm interested in changing it into a Progressive Web App, which requires serving over HTTPS, hence looking at acme/autocert to facilitate handling fetching/renewing TLS certificates.

The server doesn't have a public IP address, so it's not trivial to arrange for it to handle HTTP/HTTPS requests itself. However, it is relatively easy for me to arrange the server to have authorization to modify my Route 53 DNS records.

It seems like if I could provide my own challenge responder logic, that would be the easiest way to reuse the rest of autocert's logic. Open to alternative suggestions though.

@keegancsmith (Contributor) commented May 17, 2018

@mdempsky another way to solve your issue with less work on your end is to use Caddy as a TLS terminating proxy. It supports challenges via Route 53 DNS.

@mdempsky (Member) commented May 17, 2018

@keegancsmith Thanks for the tip about Caddy. That does seem like a better solution for my use case. I'll look into it.

@immesys commented May 25, 2018

The learning curve for using the acme package for DNS challenges is pretty steep compared to autocert (e.g. there are no examples in the docs). It would be nice to either make it more obvious how to use the acme package for DNS challenges, or make autocert support DNS challenges. I am not really concerned about the time it takes.

FYI my use case is a service on kubernetes that provides a GRPC API, but not on port 443. I can't listen on port 80 or 443 as there are other services doing that.

@bluecmd commented Sep 25, 2018

+1. I'm writing a BMC firmware to run on low-powered ARM CPUs that will most likely not have publicly addressable IP addresses. Doing DNS-01 for these devices is what I'm going to implement with or without autocert. I'd be pleased if I don't have to implement the plumbing myself, and I'd rather use the nice Manager interface of autocert.

@x1ddos commented Oct 2, 2018

Autocert supports both http-01 and tls-alpn challenges. So, that's already more than 1.

It's unclear how to handle dns-01 at the moment. It is a very different flow. The way autocert works is it requests issuance of a new cert during the first inflight request. As you all know, DNS propagation may take hours for a CA server to see, unlike HTTP requests for http-01 and tls-alpn challenges where hostname resolution is expected to be within milliseconds.

We could of course do something like what's proposed in winteraz/crypto@b97c106, adding a clean-up function, but it needs implementations for various DNS servers/providers. Maybe hypothetical x/crypto/acme/autocert/dns/{gcp,aws,do,etc} packages could provide some initial implementations.

I'm afraid people will start enabling dns-01 and expecting it to work as fast as the other challenges, which it most likely won't. Maybe it works today specifically with Let's Encrypt but that's just their particular implementation.

For the time being, an alternative could be to run a separate process that renews the certs, say, in a recurring cron job, and let devices use them. Here's an example for dns-01 with the lower-level acme.Client:

package main

import (
	"context"
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"encoding/pem"
	"errors"
	"log"
	"os"
	"time"

	"golang.org/x/crypto/acme"
)

// Note: Authorize and CreateCert implement the pre-RFC 8555 (ACMEv1) flow.
// Let's Encrypt has since retired ACMEv1; RFC 8555 CAs use AuthorizeOrder,
// WaitOrder and CreateOrderCert instead.

func main() {
	ctx := context.Background()
	client := newClient(ctx)

	// Authorize all domains provided in the cmd line args.
	for _, domain := range os.Args[1:] {
		authz, err := client.Authorize(ctx, domain)
		if err != nil {
			log.Fatal(err)
		}
		if authz.Status == acme.StatusValid {
			// Already authorized.
			continue
		}

		// Pick the DNS challenge, if any.
		var chal *acme.Challenge
		for _, c := range authz.Challenges {
			if c.Type == "dns-01" {
				chal = c
				break
			}
		}
		if chal == nil {
			log.Fatalf("no dns-01 challenge for %q", domain)
		}

		// Fulfill the challenge.
		val, err := client.DNS01ChallengeRecord(chal.Token)
		if err != nil {
			log.Fatalf("dns-01 token for %q: %v", domain, err)
		}
		// Provision a TXT record containing val under "_acme-challenge."+domain.
		if err := updateMyDNS(ctx, domain, val); err != nil {
			log.Fatalf("DNS update for %q: %v", domain, err)
		}
		// Let CA know we're ready. But are we? Is DNS propagated yet?
		if _, err := client.Accept(ctx, chal); err != nil {
			log.Fatalf("dns-01 accept for %q: %v", domain, err)
		}
		// Wait for the CA to validate.
		if _, err := client.WaitAuthorization(ctx, authz.URI); err != nil {
			log.Fatalf("authorization for %q failed: %v", domain, err)
		}
	}

	// All authorizations are granted. Request the certificate.
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		log.Fatal(err)
	}
	req := &x509.CertificateRequest{
		DNSNames: os.Args[1:],
	}
	csr, err := x509.CreateCertificateRequest(rand.Reader, req, key)
	if err != nil {
		log.Fatal(err)
	}
	crt, _, err := client.CreateCert(ctx, csr, 90*24*time.Hour, true /* inc. chain */)
	if err != nil {
		log.Fatal(err)
	}

	// Store the key and the cert chain in PEM format (file names are arbitrary).
	keyDER, err := x509.MarshalECPrivateKey(key)
	if err != nil {
		log.Fatal(err)
	}
	keyPEM := pem.EncodeToMemory(&pem.Block{Type: "EC PRIVATE KEY", Bytes: keyDER})
	if err := os.WriteFile("key.pem", keyPEM, 0600); err != nil {
		log.Fatal(err)
	}
	var crtPEM []byte
	for _, der := range crt {
		crtPEM = append(crtPEM, pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der})...)
	}
	if err := os.WriteFile("cert.pem", crtPEM, 0644); err != nil {
		log.Fatal(err)
	}
}

func newClient(ctx context.Context) *acme.Client {
	akey, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		log.Fatal(err)
	}
	client := &acme.Client{Key: akey}
	if _, err := client.Register(ctx, &acme.Account{}, acme.AcceptTOS); err != nil {
		log.Fatal(err)
	}
	return client
}

// updateMyDNS is a placeholder. TODO: Implement; this depends on your DNS
// hosting. It must provision a TXT record containing val under the
// "_acme-challenge."+domain name.
func updateMyDNS(ctx context.Context, domain, val string) error {
	return errors.New("updateMyDNS: not implemented")
}

@bluecmd commented Oct 2, 2018

Thanks @x1ddos for your input! I'm curious, what do you base the slowness of DNS on? It's certainly true that a lot of DNS providers are slow, but that's not inherent in the system.

In my scenario the domain will be owned by the server in question, and it will use a very low TTL time. Likewise the SOA of the domain in question will have a low negative cache TTL.

I agree that this will not be a very common setup, but I don't agree that DNS will always be slow.

Anecdotal evidence on another DNS-01 setup: I'm using cert-manager and DNS validation already for some other things, and it takes about 30 seconds to do the DNS validation and get the certificate. I could probably push that lower if I wanted to as well.

@x1ddos commented Oct 2, 2018

Ah well, your setup with very low TTL time is not that common indeed. :)
The very first requests will be hanging there until the cert is issued. Say, you have 1 QPS. With cert issuance taking 30sec there will be 30 requests waiting for the TLS handshake by the time it's ready, unless they timeout earlier.

Another bit I'm thinking about is the challenge order. At the moment, autocert will try tls-alpn-01 then http-01 if enabled and the tls-alpn fails. Suppose we add dns-01. Should it be preferred over the others, exclusive, or maybe we'll also need a way for autocert package users to indicate the order in which challenges are to be selected, if at all.

Just thinking out loud. Ideas are very welcome.

@rusenask commented Oct 2, 2018

I had a similar need and got DNS validation with Cloudflare as the provider working with autocert's GetCertificate() function. Have a look at https://github.com/rsc/letsencrypt/blob/master/lets.go, a great example for inspiration. It already uses the right library (github.com/xenolf/lego/acme), which supports tons of providers. However nicely it fits together, I don't believe it should be merged into x/crypto/acme/autocert :)

Regarding DNS performance: it seems very fast but subsequent requests for different subdomains will likely fail (at least they always fail for me due to a slower cleanup). It's great for wildcards though.

@bradfitz (Contributor) commented Oct 2, 2018

I think it's fine to keep autocert small and opinionated, focused on just TLS-ALPN. That it supports http-01 is really just a historical accident.

@rgooch commented Jan 29, 2020

I recently tried out autocert, but it turns out not to be viable for us, as we run an internal service. If we had split-horizon DNS, that would be one path to viability, but migrating to that would be a long, risky process. It's not on the horizon [sic].

Using public IP addresses for our services with security groups to allow access only from company networks is also challenging, as it is difficult to identify all the public NAT addresses that are elastically assigned to our VPCs. I don't want to deploy a solution that is likely to generate an ongoing trickle of support tickets to open up access (leaving aside the limitations on the size of security groups).

It's been 1.5 years since the last comment. Has any thinking changed on the scope of autocert since then? If nothing is likely to change, then sadly I'll implement a certificate manager which is pluggable, giving users the choice of which authentication method they want to use and the ability to easily add their own. For me, this also ties into other limitations with autocert around safely performing the ACME transaction concurrently across different instances of a web service. See issue #36818 for more information.

@rgooch commented Feb 11, 2020

So, since this doesn't seem to be going forward, I've written a certificate manager. It supports the dns-01 and http-01 challenge types. I've written plugins for the http-01 challenge and the dns-01 challenge with AWS Route 53. It wouldn't be hard for someone to write a dns-01 challenge responder for another DNS service.

Not yet implemented are the plugins for distributing certs+keys and ACME transaction locking, but since the code adds a random jitter for ACME attempts, you can probably get away with running multiple instances with the code as-is. I've already deployed this since it's a huuuge improvement over what we had (no automation, <60 days to go before certificates start expiring). I plan on using AWS Secrets Manager for both distributing certs+keys and for transaction locking when I write a plugin. I may also implement a plugin using etcd for this. This would provide a vendor-neutral solution for those who are willing to set up etcd.

A preview of the code is available here: https://github.com/rgooch/golib/tree/certmon-preview/pkg/crypto/certmanager

@4n3w: I gather from your thumbs-up that you may be interested in this?

@torrentkino commented Feb 17, 2020

(Quoting @x1ddos's Oct 2, 2018 comment above in full, including the dns-01 example with the lower-level acme.Client.)

Hello,

I am actually using this. Is there an ACMEv2 example for this specific case available?

Kind regards
Aiko

@rgooch commented Feb 17, 2020

Regarding DNS propagation: while in theory it can take hours for records to propagate, it often takes less than a minute. For example, AWS Route 53 has a 1 minute SLA. If you create TXT records with a sub-minute TTL, it works pretty well. The approach seems to be: if it doesn't work for everyone, we won't give it to anyone. That's not how I approach things. This is one of the reasons I decided to write my own certificate manager.

Since autocert performs the ACME transaction at the start of the TLS connection, the overall experience is more vulnerable to delays in obtaining the certificate. The code I wrote starts the renewal process in a goroutine as soon as the programme starts, so it tends not to suffer from latencies. While I could have written a cron job to do this, it's simpler and more robust to build this into the code. For a cron job, one could use this: https://github.com/acmesh-official/acme.sh

The code I wrote (only) supports ACME v2: https://github.com/rgooch/golib/tree/certmon-preview/pkg/crypto/certmanager

@torrentkino commented Feb 18, 2020

Hey,

I extracted the DNS-01 part from here:
https://github.com/golang/crypto/blob/master/acme/internal/acmeprobe/prober.go

And the POC worked on my very first try. Wow, I didn't expect that, since things have become more complicated.

I wrote a broker that enables internal servers to interact with our PowerDNS servers. Each server gets a token that is associated with one FQHN. No server interacts with the PowerDNS API directly, for security reasons. I also make sure that all DNS slaves are in sync before starting the handshake with Let's Encrypt. It has worked smoothly for the last two years.

Bye
Aiko

@rgooch commented Mar 19, 2020

For whoever is interested, the certmanager package I wrote is checked in, including the more advanced features of ACME transaction locking and certificate+key distribution using AWS Secrets Manager. Other Locker and Storer backends are welcome. Both dns-01 and http-01 challenges are supported. We're running small clusters of servers in Production and are quite happy with it.
Code: https://github.com/Cloud-Foundations/golib/tree/master/pkg/crypto/certmanager
API GoDoc: https://godoc.org/github.com/Cloud-Foundations/golib/pkg/crypto/certmanager
