
proposal: x/crypto/acme/autocert cluster compatibility #42361

Closed
mvasi90 opened this issue Nov 3, 2020 · 7 comments

Comments

@mvasi90 commented Nov 3, 2020

I'm planning to adapt x/crypto/acme/autocert to work in a cluster with multiple nodes.

Known issues of the current autocert

  • Creates a new certificate for each subdomain
    This forces you to use an external client to request certificates for other subdomains (mail., smtp., imap., etc.).

    The X.509 v3 SAN (Subject Alternative Name) field is needed in order to avoid external clients and CA rate limits. (Wildcards aren't a desirable option.)

  • Cache stored in a file
    This doesn't work in a cluster with multiple nodes: each node has its own storage.

  • Decentralized business logic
    In a cluster, one node requests the renewal of a certificate while the CA server performs the domain verification against another node (internal dynamic DNS, floating IPs, etc.). This doesn't work either.

My approach:

  • Use the existing database (encrypted on disk, with TLS connections between master, slaves, and clients) and store all needed credentials, not only the cache (certificates, keys, etc.).

I have the following code for testing (imports added for completeness; createRouter and the hosts type are defined elsewhere in my code):

package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"os"
	"time"

	"golang.org/x/crypto/acme/autocert"
)

func main() {
	routerTest := createRouter() // createRouter and hosts are defined elsewhere
	hs := make(hosts)
	hs["test.com"] = routerTest
	hs["www.test.com"] = routerTest
	hs["sub1.test.com"] = routerTest
	m := &autocert.Manager{
		//Cache:  TODO
		Email:      "my@email.com",
		Prompt:     autocert.AcceptTOS,
		HostPolicy: autocert.HostWhitelist("test.com", "www.test.com", "sub1.test.com"),
	}
	tlsConfig := m.TLSConfig()
	tlsConfig.PreferServerCipherSuites = true
	tlsConfig.MinVersion = tls.VersionTLS12
	s := &http.Server{
		Handler:        hs,
		Addr:           ":443",
		TLSConfig:      tlsConfig,
		ReadTimeout:    10 * time.Second,
		WriteTimeout:   15 * time.Second,
		MaxHeaderBytes: 1 << 20,
		IdleTimeout:    time.Minute,
	}

	go http.ListenAndServe(":80", m.HTTPHandler(nil))
	if err := s.ListenAndServeTLS("", ""); err != http.ErrServerClosed {
		fmt.Println("tls:", err)
		os.Exit(1) // could be log.Fatal(), but this runs as a systemd service and the date would be logged twice
	}
}

What happens now?

Every time a client connects, the ClientHelloInfo is passed to the following method in order to get the corresponding certificate:

func (m *Manager) GetCertificate(hello *tls.ClientHelloInfo) (*tls.Certificate, error)

The next step is to check whether the client supports the ACME TLS-ALPN challenge protocol (if so, the stored challenge certificate is used. But is it secure? Any client can advertise ALPN support and obtain information while the authorization flow is active):

...
if wantsTokenCert(hello) {
	m.challengeMu.RLock()
	defer m.challengeMu.RUnlock()
	if cert := m.certTokens[name]; cert != nil {
		return cert, nil
	}
	if cert, err := m.cacheGet(ctx, certKey{domain: name, isToken: true}); err == nil {
		return cert, nil
	}
	// TODO: cache error results?
	return nil, fmt.Errorf("acme/autocert: no token cert for %q", name)
}
...

If the client does not request the ACME ALPN protocol, then it is treated as a regular domain. (What is a regular domain? Is it expected to be any client other than the CA server?)

...
// regular domain
ck := certKey{
	domain: strings.TrimSuffix(name, "."), // golang.org/issue/18114
	isRSA:  !supportsECDSA(hello),
}
cert, err := m.cert(ctx, ck)
if err == nil {
	return cert, nil
}
if err != ErrCacheMiss {
	return nil, err
}

// first-time
if err := m.hostPolicy()(ctx, name); err != nil {
	return nil, err
}
cert, err = m.createCert(ctx, ck)
if err != nil {
	return nil, err
}
m.cachePut(ctx, ck, cert)
return cert, nil

My proposal (and my current implementation) is:

  • Create a CacheDB (similar to DirCache)
    Store cache information in the database (centralized for all nodes). This is easy.
  • Store more data in the database
    Is any additional data needed between the nodes?
    • certState ?
    • httpTokens ?
    • certTokens ?
    • ...

The main improvement will be the use of the SAN field to add additional subdomains:

  • Split the domain name (from ClientHello) and remove the subdomain (if any). If the base domain exists in the whitelist, add all related subdomains (if any) when generating the CSR.
  • Store the certificate and key into the database and use it in all nodes.
@toothrot toothrot changed the title [Proposal] x/crypto/acme/autocert cluster compatibility Proposal: x/crypto/acme/autocert cluster compatibility Nov 3, 2020
@toothrot (Contributor) commented Nov 3, 2020

I'm not sure if DbCache is useful enough on its own, or if people will end up writing their own implementation using the Cache interface regardless, as I assume the nuances of each application's database are too specific to generalize in this package.

To get around thundering-herd scenarios of cache expiration, I've implemented probabilistic early expiration in a GCS cache in the past with the current API, which is helpful in scenarios with multiple frontends sharing a certificate.

To me, it feels like this proposal is doing too much, and that solving grouped CSR subdomain requests should be separate from any discussion over whether a new DbCache interface is added.

/cc @bradfitz @x1ddos

@gopherbot gopherbot added this to the Proposal milestone Nov 3, 2020
@gopherbot gopherbot added the Proposal label Nov 3, 2020
@mvasi90 (Author) commented Nov 3, 2020

> I'm not sure if DbCache is useful enough on its own, or if people will end up writing their own implementation using the Cache interface regardless, as I assume the nuances of each application's database are too specific to generalize in this package.

Yes, you are right; I thought of that. I mention it as my own implementation for my current setup. I know it cannot be generalized; it is not generic — everyone has their own system, with or without a database.

Using DNS verification (TXT record) is not a solution for everyone: not all providers offer APIs to manipulate DNS records.
Using synchronized storage between nodes wouldn't be a solution either (it is neither efficient nor stable when a node goes down).

> To get around thundering-herd scenarios of cache expiration, I've implemented probabilistic early expiration in a GCS cache in the past with the current API, which is helpful in scenarios with multiple frontends sharing a certificate.

This is not a problem for me. I can synchronize the certificates across all nodes (pub/sub), or just use a 24-hour timer task to check whether the certificate has been updated (or is being updated; the first node will update it, for example using priority values).

> To me, it feels like this proposal is doing too much, and that solving grouped CSR subdomain requests should be separate from any discussion over whether a new DbCache interface is added.

The DbCache is the least important part for me; it is easy to implement one on any storage model.
Beyond the cached certificates themselves, I don't know what other data should be shared between the nodes. Both autocert and acme keep data in the internal memory of the Go process. For instance: Client.kid (the account key ID) is stored in memory but not in the cache.

@ianlancetaylor ianlancetaylor added this to Incoming in Proposals Nov 11, 2020
@rsc rsc changed the title Proposal: x/crypto/acme/autocert cluster compatibility proposal: x/crypto/acme/autocert cluster compatibility Nov 18, 2020
@rsc (Contributor) commented Nov 18, 2020

Talked with @bradfitz a bit about this.
It sounds like we could add a Manager.ChallengeCache field of type Cache?
But it also sounds like we could just start using the existing Manager.Cache field?

Can anyone think of problems with just reusing the existing cache for holding challenges?

@rsc rsc moved this from Incoming to Active in Proposals Nov 18, 2020
@mvasi90 (Author) commented Dec 5, 2020

Today I'm looking at the cache implementation and design. It is poorly designed.

  • The Cache interface should not live in the same file as DirCache.
  • DirCache should be in dircache.go instead of cache.go.
  • autocert.go has several wrappers such as cacheGet() and cachePut() between it and the cache, but it already prepares the key for DirCache using the suffixes "+token", "+http-01", and "+rsa". This should be done inside each cache implementation. In other words, the cache functions should receive parameters, and each cache implementation (DirCache, DBCache, etc.) should treat them however it wants. For instance: DirCache prepares the "+token" file suffix, while a DBCache sets a token column to 1.
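As an illustration of moving that key naming into the implementation, a DBCache could parse the string keys back into structured fields and map them to columns. parseCertKey is a hypothetical helper written for this sketch; the suffix names follow the ones mentioned above, and how autocert combines them is not modeled here.

```go
package main

import (
	"fmt"
	"strings"
)

// certKey is the structured form a DBCache could map to columns
// (domain TEXT, is_token BOOL, is_rsa BOOL, is_http BOOL) instead of
// encoding everything into a file name.
type certKey struct {
	domain  string
	isToken bool
	isRSA   bool
	isHTTP  bool
}

// parseCertKey turns string keys such as "example.com+token",
// "example.com+rsa", or "example.com+http-01" back into fields.
func parseCertKey(key string) certKey {
	ck := certKey{domain: key}
	for _, suffix := range []string{"+token", "+rsa", "+http-01"} {
		if strings.HasSuffix(ck.domain, suffix) {
			ck.domain = strings.TrimSuffix(ck.domain, suffix)
			switch suffix {
			case "+token":
				ck.isToken = true
			case "+rsa":
				ck.isRSA = true
			case "+http-01":
				ck.isHTTP = true
			}
		}
	}
	return ck
}

func main() {
	fmt.Printf("%+v\n", parseCertKey("test.com+token"))
	fmt.Printf("%+v\n", parseCertKey("test.com+rsa"))
	fmt.Printf("%+v\n", parseCertKey("test.com"))
}
```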

For this reason, any modification of the cache interface in autocert "will keep accumulating patches until the wheel becomes oval".
In my opinion, autocert should be rewritten from scratch, but I'm not going to do it because I'm too busy.

Instead, I'm going to organize the code a bit by separating the cache so that it is generic, and then implement the missing pieces to make it compatible with future cache implementations. This is frustrating because autocert should already be designed that way; the cache is an interface.
I don't want to be cruel, but there should be a filter to select developers and not let just anyone put their fingers in.

Over time, you end up wasting more time solving problems than you would developing libraries from scratch.

@ianlancetaylor (Contributor) commented Dec 6, 2020

@mvasi90 Thanks, but please do not put patches in GitHub issues. Please put them into Gerrit or GitHub pull requests instead, as described at https://golang.org/doc/contribute.html. That will check that the code has the necessary copyright agreement. Thanks.

@mvasi90 (Author) commented Dec 6, 2020

@ianlancetaylor, sorry for the disturbance.

Take this as a guide: the less personal data the third-party algorithms record, the better. The less they know about me, the better.

My collaboration ends here.

@ianlancetaylor (Contributor) commented Dec 6, 2020

If you are unable to sign a copyright license agreement, then unfortunately we are unable to accept your contribution. Closing this issue.

@rsc rsc moved this from Active to Declined in Proposals Dec 9, 2020