-
Notifications
You must be signed in to change notification settings - Fork 691
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add config option to specify server name for inter-node TLS verification #1507
Comments
Thanks for the suggestion, sounds reasonable. To be clear, it only applies to inter-node comms? |
And should both ends use |
Yes, because that's the only time rqlite needs to validate server certificate Common Name/SANs. i.e. it only establishes outgoing connections for inter-node comms. Of course this does raise the tangential question of how cert verification is done for mTLS, both for the raft and http ports. I imagine the cert DN is entirely ignored and only the issuer chain is validated? mTLS for client connections is typically done because you don't want to manage passwords (to avoid coordination between client and server for rotations), but rqlite currently lacks a means of mapping client cert DNs (or at least the CN within the DN) to an internal rqlite username. That would be required to do away with passwords. (Alternatively, with sufficient motivation, one could shim a reverse proxy like nginx in front of rqlite to take care of that and inject the HTTP Basic Auth header based on client's DN.) Barring that, AFAICT mTLS is only a minor extra layer of security on top of passwords, unless you wanted to go with a separate CA per cluster, which fine for personal use or small companies, but can be high friction for larger organizations with overbearing infosec policies that tend not to appreciate ad hoc PKIs. (My own is learning to live with it thanks to Istio. :))
You mean is the ServerName field in the Config struct also used for client cert verification in the mTLS case? I'm afraid I don't know offhand how Go works here. The name "ServerName" would be overloaded if so (the proper client/server-agnostic term would be Subject DN, CN or SAN) , but I don't know this aspect of Go well enough to know if it is. I'm curious though, will do some Googling. |
I have done this with openssl though, some years ago, and I know it allows a custom verification callback function that can be used to apply some custom logic to validate the peer DN (for both client and server cert checks). I have to imagine Go provides a similar facility, or else at least provides some equivalent to ServerName (or, like I said, overloads it) for verifying the the CN portion of the client cert DN. |
Ah! And so does Go: https://pkg.go.dev/crypto/tls#example-Config-VerifyConnection |
The VerifyConnection callback receives tls.ConnectionState for both client and server scenarios, and the ServerName field comment says "it's available both on the server and on the client side." I don't really understand this. It's not clear to me what ServerName would be in the client cert verification context. But the PeerCertificates field would unambiguously make the subject DN available. |
I'm going to support the narrow use case of telling nodes what server name to expect in a cert they receive from another node. |
@otoolep I'm having some difficulties validating the new feature. First, without
Anyway, onto the more relevant part, now with
Looking at the code, in rqlited/main.go: dialerTLSConfig, err = rtls.CreateClientConfig(cfg.NodeX509Cert, cfg.NodeX509Key,
cfg.NodeVerifyServerName, cfg.NodeX509CACert, cfg.NoNodeVerify) But the signature of CreateClientConfig() is func CreateClientConfig(certFile, keyFile, caCertFile, serverName string, noverify bool) (*tls.Config, error) { So the server name is being passed in place of caCertFile. These two issues are probably related, now that I glance at the code. BTW if it's better for you, I am happy to test directly out of master or a topic branch, before you merge or release. |
I have time this afternoon, I can work on a PR. |
Silly error. |
Isn't it always. :) I can take the fix-var-order branch out for the spin. |
Thanks, fixed in 8.13.2. For larger changes going forward I might have you build master to avoid the churn. |
Let me know if you find any other issues. |
Unfortunately I did. I submitted PR #1513. |
This is the problem with my lack of testing in this area. Unit testing obviously isn't sufficient. I'm going to add more extensive testing first, then fix it. I'll take a look at the PR you sent up -- thanks. |
Yeah, I think I found the issue. It's due to the awkward way the code was creating TLS configs for internode comms. I'm separating it out here: #1517 |
Ah, excellent. So indeed the listener's tls.Config was being used for outgoing peer connections. That does seem to fully explain the mystery. |
This should finally be working now -- end-to-end testing in place. |
For highly dynamic environments like Kubernetes, configuring inter-node TLS is heavily complicated by the fact that all IP addresses for the cluster need to be decided in advance. This can be especially difficult in K8s clusters that don't support custom IP pool reservations (via a CNI driver such as Calico).
But, at present, failing to do this will result in connectivity issues:
This can be avoided by
-node-no-verify
, but in addition to skipping the server name check, it also skips verification of the certificate chain, which isn't desirable in this case.One not entirely uncommon solution is to provide a custom server name that the client uses for cert verification instead of the address it connected to. This way, we can issue a certificate with a fixed Common Name such as
rqlite.db.svc.cluster.local
, use that on all nodes in the cluster, and then have rqlite verify the cert contains that name instead of an IP SAN. (The name itself becomes arbitrary but still admin-controlled.)This approach could also enable the option to automate certificate provisioning/rotation with Let's Encrypt. (Although, that would require rqlite to support dynamic reloading of cert files on disk, which I don't believe it does. At least, it doesn't support that for the user auth file in my testing.)
So I propose a new flag
-node-verify-server-name=<name>
as a means to override the server name for cert verification.This value can then be passed to https://pkg.go.dev/crypto/tls#Config.ServerName along with the other options in:
rqlite/rtls/config.go
Line 118 in 395498b
The text was updated successfully, but these errors were encountered: