Join GitHub today
GitHub is home to over 20 million developers working together to host and review code, manage projects, and build software together.
apiserver: implement LetsEncrypt support #6345
Conversation
axw
approved these changes
Sep 29, 2016
This is awesome, and so simple. I just tested it out on Azure, and it works very nicely.
LGTM with a few small things.
| - srv: srv, | ||
| - cert: cert, | ||
| - certChanged: certChanged, | ||
| +// getCertificate returns the certificate appropriate for a client |
| @@ -1435,6 +1435,11 @@ func (e *environ) setUpGroups(controllerUUID, machineId string, apiPort int) ([] | ||
| ToPort: apiPort, | ||
| SourceIPs: []string{"0.0.0.0/0"}, | ||
| }, { | ||
| + Protocol: "tcp", |
jameinel
Sep 29, 2016
•
Owner
Why do we have to use 443? Are we not allowed to serve valid traffic from a nonstandard port? I realize you probably can't just browse to the web page by the DNSName but you can always go to https://foobar.com:17070 can't you?
I guess we want to be serving the GUI on :443 so people get it easily?
I think Andrew's point is valid, that either we just change API port everywhere to default to 443, or we serve only the gui on 443, in which case we don't have to touch API port, we can always expose 443, etc.
Note that if we do need this change, then we need to touch far more than just AWS, since it will affect all of our clouds.
Digging into this, it seems the ACME protocol wants to use 443?
| + // This collection holds information cached by autocert certificate | ||
| + // acquisition. | ||
| + autocertCacheC: { | ||
| + global: true, |
rogpeppe
Sep 29, 2016
Owner
done (BTW, tangential remark - the "modelUserLastConnectionC" collection below doesn't seem
to belong in this section as it's not global).
rogpeppe
Sep 29, 2016
Owner
I'm slightly surprised that I didn't get a panic trying to get the WriteCollection without this.
| @@ -0,0 +1,70 @@ | ||
| +package state |
| @@ -0,0 +1,44 @@ | ||
| +package state_test |
| + c.Assert(err, jc.ErrorIsNil) | ||
| + err = cache.Put(ctx, "b", []byte("bval")) | ||
| + c.Assert(err, jc.ErrorIsNil) | ||
| + |
axw
Sep 29, 2016
Member
Perhaps also an explicit test that shows what happens when you specify an existing key? I'm assuming it's OK to overwrite a cache entry?
|
This is great Roger, thank you! |
jameinel
reviewed
Sep 29, 2016
Have we tested how all of this works if you bootstrap multiple times, and have different servers in between? I don't have a great understanding of how Let's Encrypt keeps things up to date. Does it allow as many signed certificates as you like, as long as the DNS record currently points to the record you are trying to create?
https://letsencrypt.org/docs/rate-limits/ says
The main limit is Certificates per Registered Domain (20 per week).
That seems fine for a real site, slightly worried about testing.
Trying to look over the ACME implementation, it appears that crypto/acme starts a 'auto-renew' 1-week before expiration goroutine. However, AFAICT that goroutine's lifetime doesn't appear to be tied to anything. I do see a 'stopRenew' function, but I didn't see anything in the original 'go s.renew()' that would clearly trigger that.
In fact, the only places I could find that call 'stopRenew' are in the autocert test suite itself, which means that any tests we have that might trigger a manager to run, will end up with renewal goroutines that we can't trigger to stop (at least that I can find).
I didn't find anything concrete that would indicate that we shouldn't use Let's Encrypt, and the ACME protocol seems a nice way to implement automated signed certs. (If you can get DNS to point to your website such that Letsencrypt gets to your server, then you get a signed cert.)
Do we know the lag times on setting up a new host, and updating your DNS records and the time it will take for Letsencrypt to try and find you? I suppose LE could be nearly synchronous (when we set up the server, we immediately ask them to contact us back, and then they do a DNS lookup and then try to contact us there.)
| +func (srv *Server) getCertificate(clientHello *tls.ClientHelloInfo) (*tls.Certificate, error) { | ||
| + logger.Infof("getCertificate - clientHello.ServerName: %q", clientHello.ServerName) | ||
| + cert, shouldUse := srv.localCertificate(clientHello.ServerName) | ||
| + if shouldUse { |
jameinel
Sep 29, 2016
•
Owner
So if 'shouldUse' is true, we return cert, nil, but if shouldUse is false, we return cert,nil?
Also, 'shouldUse' doesn't seem to quite be enough information. If we shouldn't use it, then why would we get a cert at all. Is it something like "valid server name" or something like that?
rogpeppe
Sep 29, 2016
Owner
The idea is that we always fall back to some certificate even when we can't obtain a letsencrypt certificate, so you'll get the same failure mode as previously in that case.
jameinel
Sep 29, 2016
Owner
It might be clearer if we just did:
if !shouldUse {
logger.Log()
}
return cert, nil
Rather than making it look like we do something different based on shouldUse, just make it clear that all we're doing is adding logging.
| @@ -1435,6 +1435,11 @@ func (e *environ) setUpGroups(controllerUUID, machineId string, apiPort int) ([] | ||
| ToPort: apiPort, | ||
| SourceIPs: []string{"0.0.0.0/0"}, | ||
| }, { | ||
| + Protocol: "tcp", |
jameinel
Sep 29, 2016
•
Owner
Why do we have to use 443? Are we not allowed to serve valid traffic from a nonstandard port? I realize you probably can't just browse to the web page by the DNSName but you can always go to https://foobar.com:17070 can't you?
I guess we want to be serving the GUI on :443 so people get it easily?
I think Andrew's point is valid, that either we just change API port everywhere to default to 443, or we serve only the gui on 443, in which case we don't have to touch API port, we can always expose 443, etc.
Note that if we do need this change, then we need to touch far more than just AWS, since it will affect all of our clouds.
Digging into this, it seems the ACME protocol wants to use 443?
| + } | ||
| + tlsConfig := utils.SecureTLSConfig() | ||
| + tlsConfig.GetCertificate = func(clientHello *tls.ClientHelloInfo) (*tls.Certificate, error) { | ||
| + cert, shouldUse := srv.localCertificate(clientHello.ServerName) |
frankban
Sep 29, 2016
Member
Could you add a comment here describing from where we are trying to get the cert before using autocert? The logic is clear here, but the meaning is not.
| - srv: srv, | ||
| - cert: cert, | ||
| - certChanged: certChanged, | ||
| +// getCertificate returns the certificate appropriate for a client |
| +} | ||
| + | ||
| +// Delete implements autocert.Cache.Delete. | ||
| +func (cache autocertCache) Delete(ctx context.Context, name string) error { |
rogpeppe
Sep 29, 2016
Owner
That's part of the autocert.Cache.Delete contract, so I don't think we
need to repeat it here.
| + c.Assert(err, gc.Equals, autocert.ErrCacheMiss) | ||
| + c.Assert(data, gc.IsNil) | ||
| + | ||
| + err = cache.Delete(ctx, "b") |
frankban
Sep 29, 2016
Member
Try to delete another time? I kind of feel that these could be different tests, like one for a notFound, one for fetting the cache, one for overriding a name, one for deleting, even twice. Take it or leave it.
rogpeppe
Sep 29, 2016
Owner
Yeah, I agree - it started off too simple to justify that, but I've now factored it into separate tests.
Yes, you can't get a certificate unless you listen on 443 (I believe it's because 443 is a "privileged port" and it's not uncommon to be able to listen on other ports without having control of the machine). BTW I couldn't reply directly to this comment because I had deleted the code in question.
I've added a TODO to make it possible to use a different URL for obtaining the certificate. This would allow use of the letsencrypt staging URL as well as potentially other cert providers should they emerge.
Yes, this appears to be the case. And even stopRenew doesn't actually wait for any existing renewals to complete. I don't think it's a problem for us in practice though, as even if we trigger a manager to run, it won't be able to obtain a certificate, let alone need to renew one. I'll see if I can get some synchronous-stop changes approved upstream in the autocert repo.
It seems to be pretty much immediate the first time we ask for the certificate. |
I would guess the minimum is still how long it takes for DNS to update. That's probably heavily dependent on your DNS records and who you paid for the DNS, etc. I thought minimums were usually in the hour timeframe, and often the default is 24hrs for an update. But if it is a service that is going to be long running, neither of those is going to be a dealbreaker. |
No, it's independent of DNS - the DNS needs to have been updated before you use letsencrypt. (the DNS records don't contain certificates) Letsencrypt is just verifying that it can talk to the service that the DNS record corresponds to. |
|
$$merge$$ |
|
Status: merge request accepted. Url: http://juju-ci.vapour.ws:8080/job/github-merge-juju |
|
Build failed: Tests failed |
|
$$merge$$ |
|
Status: merge request accepted. Url: http://juju-ci.vapour.ws:8080/job/github-merge-juju |
|
Build failed: Tests failed |
|
$$merge$$ |
|
Status: merge request accepted. Url: http://juju-ci.vapour.ws:8080/job/github-merge-juju |
rogpeppe commentedSep 28, 2016
This enables a controller to be contacted via a real
domain name - it will obtain an official HTTPS certificate
from the letsencrypt service.
It's not easy to test this without doing it for real, so to QA:
IP address of the new controller.
with the controller to :443.
Note that as is, this is open to the DoS vulnerability described
in https://godoc.org/golang.org/x/crypto/acme/autocert#Manager
under the HostPolicy field. We really want to whitelist a single
domain, but that involves adding controller configuration, so we
reserve that for a future PR.