apiserver: implement LetsEncrypt support #6345

Merged
merged 1 commit into from Sep 29, 2016

Conversation

Projects
None yet
6 participants
Owner

rogpeppe commented Sep 28, 2016

This enables a controller to be contacted via a real
domain name - it will obtain an official HTTPS certificate
from the letsencrypt service.

It's not easy to test this without doing it for real, so to QA:

  • bootstrap a controller with api-port=443
  • set the IP address associated with a domain you control to the public
    IP address of the new controller.
  • edit controllers.yaml to change the addresses associated
    with the controller to :443.
  • use the controller.

Note that as is, this is open to the DoS vulnerability described
in https://godoc.org/golang.org/x/crypto/acme/autocert#Manager
under the HostPolicy field. We really want to whitelist a single
domain, but that involves adding controller configuration, so we
reserve that for a future PR.

axw approved these changes Sep 29, 2016

This is awesome, and so simple. I just tested it out on Azure, and it works very nicely.

LGTM with a few small things.

apiserver/apiserver.go
- srv: srv,
- cert: cert,
- certChanged: certChanged,
+// getCertificate returns the certificate appropriate for a client
@axw

axw Sep 29, 2016

Member

this isn't used. delete?

@rogpeppe

rogpeppe Sep 29, 2016

Owner

good catch; deleted.

provider/ec2/environ.go
@@ -1435,6 +1435,11 @@ func (e *environ) setUpGroups(controllerUUID, machineId string, apiPort int) ([]
ToPort: apiPort,
SourceIPs: []string{"0.0.0.0/0"},
}, {
+ Protocol: "tcp",
@axw

axw Sep 29, 2016

Member

If you set api-port=443, then the rule above should suffice...?

@jameinel

jameinel Sep 29, 2016

Owner

Why do we have to use 443? Are we not allowed to serve valid traffic from a nonstandard port? I realize you probably can't just browse to the web page by the DNSName but you can always go to https://foobar.com:17070 can't you?
I guess we want to be serving the GUI on :443 so people get it easily?

I think Andrew's point is valid, that either we just change API port everywhere to default to 443, or we serve only the gui on 443, in which case we don't have to touch API port, we can always expose 443, etc.

Note that if we do need this change, then we need to touch far more than just AWS, since it will affect all of our clouds.

Digging into this, it seems the ACME protocol wants to use 443?

@rogpeppe

rogpeppe Sep 29, 2016

Owner

I thought I'd already removed it...

state/allcollections.go
+ // This collection holds information cached by autocert certificate
+ // acquisition.
+ autocertCacheC: {
+ global: true,
@axw

axw Sep 29, 2016

Member

rawAccess: true

@rogpeppe

rogpeppe Sep 29, 2016

Owner

done (BTW, tangential remark - the "modelUserLastConnectionC" collection below doesn't seem
to belong in this section as it's not global).

@rogpeppe

rogpeppe Sep 29, 2016

Owner

I'm slightly surprised that I didn't get a panic trying to get the WriteCollection without this.

@@ -0,0 +1,70 @@
+package state
@axw

axw Sep 29, 2016

Member

copyright header

@@ -0,0 +1,44 @@
+package state_test
@axw

axw Sep 29, 2016

Member

copyright header

+ c.Assert(err, jc.ErrorIsNil)
+ err = cache.Put(ctx, "b", []byte("bval"))
+ c.Assert(err, jc.ErrorIsNil)
+
@axw

axw Sep 29, 2016

Member

Perhaps also an explicit test that shows what happens when you specify an existing key? I'm assuming it's OK to overwrite a cache entry?

Member

frankban commented Sep 29, 2016

This is great Roger, thank you! 👍 with a couple of minor changes.

Have we tested how all of this works if you bootstrap multiple times, and have different servers in between? I don't have a great understanding of how Let's Encrypt keeps things up to date. Does it allow as many signed certificates as you like, as long as the DNS record currently points to the record you are trying to create?
https://letsencrypt.org/docs/rate-limits/ says
The main limit is Certificates per Registered Domain (20 per week).

That seems fine for a real site, slightly worried about testing.
Trying to look over the ACME implementation, it appears that crypto/acme starts a 'auto-renew' 1-week before expiration goroutine. However, AFAICT that goroutine's lifetime doesn't appear to be tied to anything. I do see a 'stopRenew' function, but I didn't see anything in the original 'go s.renew()' that would clearly trigger that.
In fact, the only places I could find that call 'stopRenew' are in the autocert test suite itself, which means that any tests we have that might trigger a manager to run, will end up with renewal goroutines that we can't trigger to stop (at least that I can find).

I didn't find anything concrete that would indicate that we shouldn't use Let's Encrypt, and the ACME protocol seems a nice way to implement automated signed certs. (If you can get DNS to point to your website such that Letsencrypt gets to your server, then you get a signed cert.)

Do we know the lag times on setting up a new host, and updating your DNS records and the time it will take for Letsencrypt to try and find you? I suppose LE could be nearly synchronous (when we set up the server, we immediately ask them to contact us back, and then they do a DNS lookup and then try to contact us there.)

apiserver/apiserver.go
+func (srv *Server) getCertificate(clientHello *tls.ClientHelloInfo) (*tls.Certificate, error) {
+ logger.Infof("getCertificate - clientHello.ServerName: %q", clientHello.ServerName)
+ cert, shouldUse := srv.localCertificate(clientHello.ServerName)
+ if shouldUse {
@jameinel

jameinel Sep 29, 2016

Owner

So if 'shouldUse' is true, we return cert, nil, but if shouldUse is false, we return cert,nil?
Also, 'shouldUse' doesn't seem to quite be enough information. If we shouldn't use it, then why would we get a cert at all. Is it something like "valid server name" or something like that?

@rogpeppe

rogpeppe Sep 29, 2016

Owner

The idea is that we always fall back to some certificate even when we can't obtain a letsencrypt certificate, so you'll get the same failure mode as previously in that case.

@jameinel

jameinel Sep 29, 2016

Owner

It might be clearer if we just did:

if !shouldUse {
logger.Log()
}
return cert, nil

Rather than making it look like we do something different based on shouldUse, just make it clear that all we're doing is adding logging.

provider/ec2/environ.go
@@ -1435,6 +1435,11 @@ func (e *environ) setUpGroups(controllerUUID, machineId string, apiPort int) ([]
ToPort: apiPort,
SourceIPs: []string{"0.0.0.0/0"},
}, {
+ Protocol: "tcp",
@axw

axw Sep 29, 2016

Member

If you set api-port=443, then the rule above should suffice...?

@jameinel

jameinel Sep 29, 2016

Owner

Why do we have to use 443? Are we not allowed to serve valid traffic from a nonstandard port? I realize you probably can't just browse to the web page by the DNSName but you can always go to https://foobar.com:17070 can't you?
I guess we want to be serving the GUI on :443 so people get it easily?

I think Andrew's point is valid, that either we just change API port everywhere to default to 443, or we serve only the gui on 443, in which case we don't have to touch API port, we can always expose 443, etc.

Note that if we do need this change, then we need to touch far more than just AWS, since it will affect all of our clouds.

Digging into this, it seems the ACME protocol wants to use 443?

@rogpeppe

rogpeppe Sep 29, 2016

Owner

I thought I'd already removed it...

👍

apiserver/apiserver.go
+ }
+ tlsConfig := utils.SecureTLSConfig()
+ tlsConfig.GetCertificate = func(clientHello *tls.ClientHelloInfo) (*tls.Certificate, error) {
+ cert, shouldUse := srv.localCertificate(clientHello.ServerName)
@frankban

frankban Sep 29, 2016

Member

Could you add a comment here describing from where we are trying to get the cert before using autocert? The logic is clear here, but the meaning is not.

apiserver/apiserver.go
- srv: srv,
- cert: cert,
- certChanged: certChanged,
+// getCertificate returns the certificate appropriate for a client
@axw

axw Sep 29, 2016

Member

this isn't used. delete?

@rogpeppe

rogpeppe Sep 29, 2016

Owner

good catch; deleted.

+}
+
+// Delete implements autocert.Cache.Delete.
+func (cache autocertCache) Delete(ctx context.Context, name string) error {
@frankban

frankban Sep 29, 2016

Member

// The operation succeeds if the document is not found?

@rogpeppe

rogpeppe Sep 29, 2016

Owner

That's part of the autocert.Cache.Delete contract, so I don't think we
need to repeat it here.

state/autocertcache_test.go
+ c.Assert(err, gc.Equals, autocert.ErrCacheMiss)
+ c.Assert(data, gc.IsNil)
+
+ err = cache.Delete(ctx, "b")
@frankban

frankban Sep 29, 2016

Member

Try to delete another time? I kind of feel that these could be different tests, like one for a notFound, one for fetting the cache, one for overriding a name, one for deleting, even twice. Take it or leave it.

@rogpeppe

rogpeppe Sep 29, 2016

Owner

Yeah, I agree - it started off too simple to justify that, but I've now factored it into separate tests.

Owner

rogpeppe commented Sep 29, 2016

Why do we have to use 443? Are we not allowed to serve valid traffic from a nonstandard port? I realize you probably can't just browse to the web page by the DNSName but you can always go to https://foobar.com:17070 can't you?
I guess we want to be serving the GUI on :443 so people get it easily?

I think Andrew's point is valid, that either we just change API port everywhere to default to 443, or we serve only the gui on 443, in which case we don't have to touch API port, we can always expose 443, etc.

Note that if we do need this change, then we need to touch far more than just AWS, since it will affect all of our clouds.

Digging into this, it seems the ACME protocol wants to use 443?

Yes, you can't get a certificate unless you listen on 443 (I believe it's because 443 is a "privileged port" and it's not uncommon to be able to listen on other ports without having control of the machine).

BTW I couldn't reply directly to this comment because I had deleted the code in question.

Have we tested how all of this works if you bootstrap multiple times, and have different servers in between? I don't have a great understanding of how Let's Encrypt keeps things up to date. Does it allow as many signed certificates as you like, as long as the DNS record currently points to the record you are trying to create?
https://letsencrypt.org/docs/rate-limits/ says
The main limit is Certificates per Registered Domain (20 per week).

That seems fine for a real site, slightly worried about testing.

I've added a TODO to make it possible to use a different URL for obtaining the certificate. This would allow use of the letsencrypt staging URL as well as potentially other cert providers should they emerge.

In fact, the only places I could find that call 'stopRenew' are in the autocert test suite itself, which means that any tests we have that might trigger a manager to run, will end up with renewal goroutines that we can't trigger to stop (at least that I can find).

Yes, this appears to be the case. And even stopRenew doesn't actually wait for any existing renewals to complete. I don't think it's a problem for us in practice though, as even if we trigger a manager to run, it won't be able to obtain a certificate, let alone need to renew one. I'll see if I can get some synchronous-stop changes approved upstream in the autocert repo.

Do we know the lag times on setting up a new host, and updating your DNS records and the time it will take for Letsencrypt to try and find you? I suppose LE could be nearly synchronous (when we set up the server, we immediately ask them to contact us back, and then they do a DNS lookup and then try to contact us there.)

It seems to be pretty much immediate the first time we ask for the certificate.

Owner

jameinel commented Sep 29, 2016

It seems to be pretty much immediate the first time we ask for the certificate

I would guess the minimum is still how long it takes for DNS to update. That's probably heavily dependent on your DNS records and who you paid for the DNS, etc. I thought minimums were usually in the hour timeframe, and often the default is 24hrs for an update. But if it is a service that is going to be long running, neither of those is going to be a dealbreaker.

LGTM overall

Owner

rogpeppe commented Sep 29, 2016

I would guess the minimum is still how long it takes for DNS to update. That's probably heavily dependent on your DNS records and who you paid for the DNS, etc. I thought minimums were usually in the hour timeframe, and often the default is 24hrs for an update. But if it is a service that is going to be long running, neither of those is going to be a dealbreaker.

No, it's independent of DNS - the DNS needs to have been updated before you use letsencrypt. (the DNS records don't contain certificates) Letsencrypt is just verifying that it can talk to the service that the DNS record corresponds to.

Owner

rogpeppe commented Sep 29, 2016

$$merge$$

Contributor

jujubot commented Sep 29, 2016

Status: merge request accepted. Url: http://juju-ci.vapour.ws:8080/job/github-merge-juju

Contributor

jujubot commented Sep 29, 2016

Build failed: Tests failed
build url: http://juju-ci.vapour.ws:8080/job/github-merge-juju/9366

Owner

rogpeppe commented Sep 29, 2016

$$merge$$

Contributor

jujubot commented Sep 29, 2016

Status: merge request accepted. Url: http://juju-ci.vapour.ws:8080/job/github-merge-juju

Contributor

jujubot commented Sep 29, 2016

Build failed: Tests failed
build url: http://juju-ci.vapour.ws:8080/job/github-merge-juju/9368

apiserver: implement LetsEncrypt support
This enables a controller to be contacted via a real
domain name - it will obtain an official HTTPS certificate
from the letsencrypt service.

It's not easy to test this without doing it for real, so to QA:

- bootstrap a controller with api-port=443
- set the IP address associated with a domain you control to the public
IP address of the new controller.
- edit controllers.yaml to change the addresses associated
with the controller to <yourdomain>:443.
- use the controller.

Note that as is, this is open to the DoS vulnerability described
in https://godoc.org/golang.org/x/crypto/acme/autocert#Manager
under the HostPolicy field. We really want to whitelist a single
domain, but that involves adding controller configuration, so we
reserve that for a future PR.
Owner

rogpeppe commented Sep 29, 2016

$$merge$$

Contributor

jujubot commented Sep 29, 2016

Status: merge request accepted. Url: http://juju-ci.vapour.ws:8080/job/github-merge-juju

@jujubot jujubot merged commit 40a4d79 into juju:master Sep 29, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment