automatic mTLS #87

jbardin · 2018-11-07T23:36:58Z

Proof of concept automatic mTLS.

In Terraform we wanted to ensure that the plugin only accepts connections from the calling client. While we could setup the client authentication in Terraform itself (which is our primary concern), it would be possible to authenticate the plugin server with some modification to go-plugin itself.

What this proposal does is generate certs on the fly, passing the cert to the client in the exec environment, and passing the generated cert from the server back over the stdout protocol.

The code here hasn't been fully tested, as I just adapted the working Terraform code to compile here, but gives the rough idea to see if others think this is useful.

server.go

mitchellh

Regardless of our discussion below, this is really well done. Thanks @briankassouf on chiming in too I'd love to make sure the Vault team has a look here.

First, thinking about security: This adds confidentiality as a core part of the security model for go-plugin, with the caveat that our threat model assumes that data passed via environment variables is safe. Env vars are generally only readable by the process owner and root. I think confidentiality actually doesn't really matter, because our connections are loopback (or Unix domain socket) by default which have the same threat model (process owner + root sniffable). Though, I suppose having encrypted traffic happen anywhere is always a Good Idea(tm). 😄

MTLS also provides identity, again with the same assumption that environment variables are safe. I can't think of a way to actively exploit this today, but in theory: 1.) plugin launches and listens on port 1234 2.) plugin crashes somehow 3.) attacker binds to port 1234 4.) attacker MITMs. I guess? Its a crazy race that would likely require starting/restarting the host application a bunch to make possible. In any case, MTLS (w/ the assumption env vars are safe in this PR) would protect against this since the host would recognize the listening port as tampered.

Just thinking out loud, might be worth adding "why" somewhere ^ and documenting it.

Next, I want to talk through the behavior of this setting:

1.) First, if the plugin explicitly has TLS configured, then it will use that. You do note in your docs that "TLSConfig must be unset" but I think its worth explaining why. This makes sense to me.

2.) Next, if the host fails to parse the cert, then we log-only the errors and still attempt to connect. In the scenario that a TLS cert is served, I believe the client Start should hard fail if the cert parsing fails.

3.) AutoMTLS relies on the plugin process being well behaved, listening to the env var, and serving a cert. If the plugin chooses to ignore the env var (or doesn't support it) the host attempts to connect insecurely. My question: should AutoMTLS require MTLS or should we introduce a RequireMTLS option to require it?

I don't want to overcomplicate anything but it seems that if we ignore these questions now, we'll create implicit "backwards compatibility" issues in the future if we choose to be a bit harder about it.

mtls.go

client.go

jefferai · 2018-11-17T01:08:36Z

I would love a link to the relevant RFC before chiming in about the security model.

mitchellh · 2018-11-17T02:57:30Z

Hey @jefferai, I think this particular change is small enough that it doesn't have a relevant RFC. Maybe we can hop in a Zoom and talk about it quickly and if we feel we need one we can do that?

jefferai · 2018-11-17T04:20:55Z

I think some documentation of the security model, whether in an RFC or as part of the comments in the code, would be useful for making sure it's solid. For instance, as someone not well versed in go-plugin, I do not know whether more than one connection can be successfully made from a (forked, or otherwise external) process back to the originating process... but if so, the certificate should really be invalidated after the first connection.

jbardin · 2018-11-17T15:12:00Z

Thanks @mitchellh for the thorough review. I didn't mean for you to spend so much time on this while it was still a WIP, I haven't even had time to complete the details and review it myself! I do need to update and extend the documentation around this if we accept the change into go-plugin.

To quickly touch on some of your points,

The primary threat Terraform is aiming to prevent is an unauthorized process connecting to a plugin, abusing it to make authenticated api calls. This doesn't provide much more protection on unix in the current default configuration, since if the attacking process has access to the unix socket, it usually has privileges to access whatever credentials were used by terraform in the first place. On windows however the connection is always over loopback, which provides a more access to the plugin.

We're not relying on the safety of environment variables, as only the the public certificate is passed in via the environment. Neither client nor server expose the generated private key. The identities of the client and server are trusted via the environment and stdout, by the fact that one process execs the other. If the new process environment or subprocess stdout could be manipulated externally, the attacker likely has access to the secrets anyway.

We can't revoke the cert immediately, as multiple connections can be made. There no point in revoking anything on a coarser level, as the private keys are generated on the fly for each session.

I definitely intend to further test and review the compatibility here, which is partly why I made this opt-in in the WIP rather than completely automatic (which was also something I wanted to review, as it may be possible to make this completely transparent when no TLS config is supplied, similar to how we currently setup unverified TLS for grpc now). We're hopefully past the major Terraform alpha hurdles so I should be able to clean this up next week.

jefferai · 2018-11-17T16:35:59Z

We're not relying on the safety of environment variables, as only the the public certificate is passed in via the environment. Neither client nor server expose the generated private key.

This sounds like it's only one-way verification -- the plugin validates the private key of the incoming connection. While you're relying on the fact that the information was received via stdout (right?) for the plugin to feel like it is the correct public key to use for verification, this isn't mutual TLS in the normally-understood sense, where each side is validating the other's certificate via normal verification logic as part of the TLS handshake. Although this is likely sufficient for go-plugin purposes, I would avoid claiming the MTLS moniker for this scheme as I think it will cause a lot of confusion (as it did for me).

If you want to take it a step further, you could instead do a multi-way handshake using Diffie-Hellman to generate a shared encryption key to then allow each side to generate a certificate and gain knowledge of each other's public keys (there are still some TOFU-like issues here but they are basically the same as in the current proposal). This would then allow for actual mutual TLS verification. There are some libraries in Vault's tree to do this; they're not standardized at this point but it could be a good use-case for doing so.

jbardin · 2018-11-19T15:16:29Z

Hi @jefferai, just to make sure we're on the same page, here's the current setup process from a high level:

Client generates private key and CA cert, which will be used for client authentication
Client starts the server with public cert in the process environment
Server generates a private key and CA cert, which is used for the TLS connection. The public cert is written back to the client over stdout.
Server uses the client cert for the ClientCA
Client uses the server cert for the RootCA.

As far as I can tell this still falls under what is commonly called mTLS, only with the certificate data shared via a side-channel vs a trusted intermediary, though I have no problem calling the option something else if it is confusing.

Now I didn't think of doing a Diffie-Hellman exchange over this env+stdout "channel", which does sound nicer at first, however does it actually gain anything? You still have the same TOFU issue, and IIUC we would then have the added overhead of generating matching certs for the other side based on the shared private key (we still need to use the standard TLS machinery for gRPC), which seems less flexible in the long run.

This adds optional tls authentication to the client server protocol. One-time use keys are generated at startup, and the public certs are shared via the subprocess environment and stdout.

Dialing in grpc is asynchronous, and blocking blocks RPC calls not the dailing. Also, when blocking a Dial in grpc, even with a timeout, we lose the dial error that were causing us to block.

Verify that we can connect and call methods using both protocols in the test fixtures.

jbardin · 2018-11-30T15:40:48Z

Hi,

I tried to better describe the overall usage in the AutoMTLS docs. If the option name or any terminology isn't quite right, I'm still open to changing it of course.

I mentioned it in an earlier revision, but this ended up not really changing the handshake protocol, since there was an extra field which was never used, and has been replaced with the public cert data.

jefferai · 2018-11-30T21:50:34Z

Hi @jbardin

Thanks for the more detailed steps. I think this part:

The public cert is written back to the client over stdout.

Is what I was missing before. Previously when you'd said as only the the public certificate is passed in via the environment I'd thought that meant that one public cert was being sent, via the environment, so you were only doing one-way verification.

mitchellh

I'm good with this with the recent changes. Looks like a great change, and fairly innocuous since its opt-in anyways.

update the generated pb code

97278d5

jbardin requested review from mitchellh, calvn and briankassouf November 7, 2018 23:36

briankassouf reviewed Nov 8, 2018

View reviewed changes

server.go Outdated Show resolved Hide resolved

mitchellh reviewed Nov 17, 2018

View reviewed changes

mtls.go Outdated Show resolved Hide resolved

client.go Outdated Show resolved Hide resolved

jbardin force-pushed the jbardin/mtls branch from 8b89768 to 65ac2d3 Compare November 28, 2018 23:47

jbardin changed the title ~~[WIP] automatic mTLS~~ automatic mTLS Nov 28, 2018

jbardin force-pushed the jbardin/mtls branch from 65ac2d3 to db07072 Compare November 30, 2018 15:30

jbardin added 3 commits November 30, 2018 10:33

Add automatic mTLS negotiation

904034a

This adds optional tls authentication to the client server protocol. One-time use keys are generated at startup, and the public certs are shared via the subprocess environment and stdout.

remove WithBlock from grpc_client

7505a0d

Dialing in grpc is asynchronous, and blocking blocks RPC calls not the dailing. Also, when blocking a Dial in grpc, even with a timeout, we lose the dial error that were causing us to block.

add tests from mTLS in grpc and netrpc

ff43c1c

Verify that we can connect and call methods using both protocols in the test fixtures.

jbardin force-pushed the jbardin/mtls branch from db07072 to ff43c1c Compare November 30, 2018 15:33

mitchellh approved these changes Dec 11, 2018

View reviewed changes

jbardin merged commit f4c3476 into master Dec 11, 2018

jbardin deleted the jbardin/mtls branch December 11, 2018 20:14

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

automatic mTLS #87

automatic mTLS #87

jbardin commented Nov 7, 2018

mitchellh left a comment

jefferai commented Nov 17, 2018

mitchellh commented Nov 17, 2018

jefferai commented Nov 17, 2018

jbardin commented Nov 17, 2018

jefferai commented Nov 17, 2018

jbardin commented Nov 19, 2018

jbardin commented Nov 30, 2018

jefferai commented Nov 30, 2018

mitchellh left a comment

automatic mTLS #87

automatic mTLS #87

Conversation

jbardin commented Nov 7, 2018

mitchellh left a comment

Choose a reason for hiding this comment

jefferai commented Nov 17, 2018

mitchellh commented Nov 17, 2018

jefferai commented Nov 17, 2018

jbardin commented Nov 17, 2018

jefferai commented Nov 17, 2018

jbardin commented Nov 19, 2018

jbardin commented Nov 30, 2018

jefferai commented Nov 30, 2018

mitchellh left a comment

Choose a reason for hiding this comment