New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
automatic mTLS #87
automatic mTLS #87
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regardless of our discussion below, this is really well done. Thanks @briankassouf on chiming in too I'd love to make sure the Vault team has a look here.
First, thinking about security: This adds confidentiality as a core part of the security model for go-plugin, with the caveat that our threat model assumes that data passed via environment variables is safe. Env vars are generally only readable by the process owner and root. I think confidentiality actually doesn't really matter, because our connections are loopback (or Unix domain socket) by default which have the same threat model (process owner + root sniffable). Though, I suppose having encrypted traffic happen anywhere is always a Good Idea(tm). 😄
MTLS also provides identity, again with the same assumption that environment variables are safe. I can't think of a way to actively exploit this today, but in theory: 1.) plugin launches and listens on port 1234 2.) plugin crashes somehow 3.) attacker binds to port 1234 4.) attacker MITMs. I guess? Its a crazy race that would likely require starting/restarting the host application a bunch to make possible. In any case, MTLS (w/ the assumption env vars are safe in this PR) would protect against this since the host would recognize the listening port as tampered.
Just thinking out loud, might be worth adding "why" somewhere ^ and documenting it.
Next, I want to talk through the behavior of this setting:
1.) First, if the plugin explicitly has TLS configured, then it will use that. You do note in your docs that "TLSConfig must be unset" but I think its worth explaining why. This makes sense to me.
2.) Next, if the host fails to parse the cert, then we log-only the errors and still attempt to connect. In the scenario that a TLS cert is served, I believe the client Start
should hard fail if the cert parsing fails.
3.) AutoMTLS
relies on the plugin process being well behaved, listening to the env var, and serving a cert. If the plugin chooses to ignore the env var (or doesn't support it) the host attempts to connect insecurely. My question: should AutoMTLS
require MTLS or should we introduce a RequireMTLS
option to require it?
I don't want to overcomplicate anything but it seems that if we ignore these questions now, we'll create implicit "backwards compatibility" issues in the future if we choose to be a bit harder about it.
I would love a link to the relevant RFC before chiming in about the security model. |
Hey @jefferai, I think this particular change is small enough that it doesn't have a relevant RFC. Maybe we can hop in a Zoom and talk about it quickly and if we feel we need one we can do that? |
I think some documentation of the security model, whether in an RFC or as part of the comments in the code, would be useful for making sure it's solid. For instance, as someone not well versed in go-plugin, I do not know whether more than one connection can be successfully made from a (forked, or otherwise external) process back to the originating process... but if so, the certificate should really be invalidated after the first connection. |
Thanks @mitchellh for the thorough review. I didn't mean for you to spend so much time on this while it was still a WIP, I haven't even had time to complete the details and review it myself! I do need to update and extend the documentation around this if we accept the change into go-plugin. To quickly touch on some of your points, The primary threat Terraform is aiming to prevent is an unauthorized process connecting to a plugin, abusing it to make authenticated api calls. This doesn't provide much more protection on unix in the current default configuration, since if the attacking process has access to the unix socket, it usually has privileges to access whatever credentials were used by terraform in the first place. On windows however the connection is always over loopback, which provides a more access to the plugin. We're not relying on the safety of environment variables, as only the the public certificate is passed in via the environment. Neither client nor server expose the generated private key. The identities of the client and server are trusted via the environment and stdout, by the fact that one process execs the other. If the new process environment or subprocess stdout could be manipulated externally, the attacker likely has access to the secrets anyway. We can't revoke the cert immediately, as multiple connections can be made. There no point in revoking anything on a coarser level, as the private keys are generated on the fly for each session. I definitely intend to further test and review the compatibility here, which is partly why I made this opt-in in the WIP rather than completely automatic (which was also something I wanted to review, as it may be possible to make this completely transparent when no TLS config is supplied, similar to how we currently setup unverified TLS for grpc now). We're hopefully past the major Terraform alpha hurdles so I should be able to clean this up next week. |
This sounds like it's only one-way verification -- the plugin validates the private key of the incoming connection. While you're relying on the fact that the information was received via stdout (right?) for the plugin to feel like it is the correct public key to use for verification, this isn't mutual TLS in the normally-understood sense, where each side is validating the other's certificate via normal verification logic as part of the TLS handshake. Although this is likely sufficient for go-plugin purposes, I would avoid claiming the MTLS moniker for this scheme as I think it will cause a lot of confusion (as it did for me). If you want to take it a step further, you could instead do a multi-way handshake using Diffie-Hellman to generate a shared encryption key to then allow each side to generate a certificate and gain knowledge of each other's public keys (there are still some TOFU-like issues here but they are basically the same as in the current proposal). This would then allow for actual mutual TLS verification. There are some libraries in Vault's tree to do this; they're not standardized at this point but it could be a good use-case for doing so. |
Hi @jefferai, just to make sure we're on the same page, here's the current setup process from a high level:
As far as I can tell this still falls under what is commonly called mTLS, only with the certificate data shared via a side-channel vs a trusted intermediary, though I have no problem calling the option something else if it is confusing. Now I didn't think of doing a Diffie-Hellman exchange over this env+stdout "channel", which does sound nicer at first, however does it actually gain anything? You still have the same TOFU issue, and IIUC we would then have the added overhead of generating matching certs for the other side based on the shared private key (we still need to use the standard TLS machinery for gRPC), which seems less flexible in the long run. |
8b89768
to
65ac2d3
Compare
65ac2d3
to
db07072
Compare
This adds optional tls authentication to the client server protocol. One-time use keys are generated at startup, and the public certs are shared via the subprocess environment and stdout.
Dialing in grpc is asynchronous, and blocking blocks RPC calls not the dailing. Also, when blocking a Dial in grpc, even with a timeout, we lose the dial error that were causing us to block.
Verify that we can connect and call methods using both protocols in the test fixtures.
db07072
to
ff43c1c
Compare
Hi, I tried to better describe the overall usage in the AutoMTLS docs. If the option name or any terminology isn't quite right, I'm still open to changing it of course. I mentioned it in an earlier revision, but this ended up not really changing the handshake protocol, since there was an |
Hi @jbardin Thanks for the more detailed steps. I think this part:
Is what I was missing before. Previously when you'd said |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm good with this with the recent changes. Looks like a great change, and fairly innocuous since its opt-in anyways.
Proof of concept automatic mTLS.
In Terraform we wanted to ensure that the plugin only accepts connections from the calling client. While we could setup the client authentication in Terraform itself (which is our primary concern), it would be possible to authenticate the plugin server with some modification to go-plugin itself.
What this proposal does is generate certs on the fly, passing the cert to the client in the exec environment, and passing the generated cert from the server back over the stdout protocol.
The code here hasn't been fully tested, as I just adapted the working Terraform code to compile here, but gives the rough idea to see if others think this is useful.