Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

automatic mTLS #87

Merged
merged 4 commits into from Dec 11, 2018
Merged

automatic mTLS #87

merged 4 commits into from Dec 11, 2018

Conversation

jbardin
Copy link
Member

@jbardin jbardin commented Nov 7, 2018

Proof of concept automatic mTLS.

In Terraform we wanted to ensure that the plugin only accepts connections from the calling client. While we could setup the client authentication in Terraform itself (which is our primary concern), it would be possible to authenticate the plugin server with some modification to go-plugin itself.

What this proposal does is generate certs on the fly, passing the cert to the client in the exec environment, and passing the generated cert from the server back over the stdout protocol.

The code here hasn't been fully tested, as I just adapted the working Terraform code to compile here, but gives the rough idea to see if others think this is useful.

server.go Outdated Show resolved Hide resolved
Copy link
Contributor

@mitchellh mitchellh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regardless of our discussion below, this is really well done. Thanks @briankassouf on chiming in too I'd love to make sure the Vault team has a look here.

First, thinking about security: This adds confidentiality as a core part of the security model for go-plugin, with the caveat that our threat model assumes that data passed via environment variables is safe. Env vars are generally only readable by the process owner and root. I think confidentiality actually doesn't really matter, because our connections are loopback (or Unix domain socket) by default which have the same threat model (process owner + root sniffable). Though, I suppose having encrypted traffic happen anywhere is always a Good Idea(tm). 😄

MTLS also provides identity, again with the same assumption that environment variables are safe. I can't think of a way to actively exploit this today, but in theory: 1.) plugin launches and listens on port 1234 2.) plugin crashes somehow 3.) attacker binds to port 1234 4.) attacker MITMs. I guess? Its a crazy race that would likely require starting/restarting the host application a bunch to make possible. In any case, MTLS (w/ the assumption env vars are safe in this PR) would protect against this since the host would recognize the listening port as tampered.

Just thinking out loud, might be worth adding "why" somewhere ^ and documenting it.

Next, I want to talk through the behavior of this setting:

1.) First, if the plugin explicitly has TLS configured, then it will use that. You do note in your docs that "TLSConfig must be unset" but I think its worth explaining why. This makes sense to me.

2.) Next, if the host fails to parse the cert, then we log-only the errors and still attempt to connect. In the scenario that a TLS cert is served, I believe the client Start should hard fail if the cert parsing fails.

3.) AutoMTLS relies on the plugin process being well behaved, listening to the env var, and serving a cert. If the plugin chooses to ignore the env var (or doesn't support it) the host attempts to connect insecurely. My question: should AutoMTLS require MTLS or should we introduce a RequireMTLS option to require it?

I don't want to overcomplicate anything but it seems that if we ignore these questions now, we'll create implicit "backwards compatibility" issues in the future if we choose to be a bit harder about it.

mtls.go Outdated Show resolved Hide resolved
client.go Outdated Show resolved Hide resolved
@jefferai
Copy link
Member

I would love a link to the relevant RFC before chiming in about the security model.

@mitchellh
Copy link
Contributor

Hey @jefferai, I think this particular change is small enough that it doesn't have a relevant RFC. Maybe we can hop in a Zoom and talk about it quickly and if we feel we need one we can do that?

@jefferai
Copy link
Member

I think some documentation of the security model, whether in an RFC or as part of the comments in the code, would be useful for making sure it's solid. For instance, as someone not well versed in go-plugin, I do not know whether more than one connection can be successfully made from a (forked, or otherwise external) process back to the originating process... but if so, the certificate should really be invalidated after the first connection.

@jbardin
Copy link
Member Author

jbardin commented Nov 17, 2018

Thanks @mitchellh for the thorough review. I didn't mean for you to spend so much time on this while it was still a WIP, I haven't even had time to complete the details and review it myself! I do need to update and extend the documentation around this if we accept the change into go-plugin.

To quickly touch on some of your points,

The primary threat Terraform is aiming to prevent is an unauthorized process connecting to a plugin, abusing it to make authenticated api calls. This doesn't provide much more protection on unix in the current default configuration, since if the attacking process has access to the unix socket, it usually has privileges to access whatever credentials were used by terraform in the first place. On windows however the connection is always over loopback, which provides a more access to the plugin.

We're not relying on the safety of environment variables, as only the the public certificate is passed in via the environment. Neither client nor server expose the generated private key. The identities of the client and server are trusted via the environment and stdout, by the fact that one process execs the other. If the new process environment or subprocess stdout could be manipulated externally, the attacker likely has access to the secrets anyway.

We can't revoke the cert immediately, as multiple connections can be made. There no point in revoking anything on a coarser level, as the private keys are generated on the fly for each session.

I definitely intend to further test and review the compatibility here, which is partly why I made this opt-in in the WIP rather than completely automatic (which was also something I wanted to review, as it may be possible to make this completely transparent when no TLS config is supplied, similar to how we currently setup unverified TLS for grpc now). We're hopefully past the major Terraform alpha hurdles so I should be able to clean this up next week.

@jefferai
Copy link
Member

We're not relying on the safety of environment variables, as only the the public certificate is passed in via the environment. Neither client nor server expose the generated private key.

This sounds like it's only one-way verification -- the plugin validates the private key of the incoming connection. While you're relying on the fact that the information was received via stdout (right?) for the plugin to feel like it is the correct public key to use for verification, this isn't mutual TLS in the normally-understood sense, where each side is validating the other's certificate via normal verification logic as part of the TLS handshake. Although this is likely sufficient for go-plugin purposes, I would avoid claiming the MTLS moniker for this scheme as I think it will cause a lot of confusion (as it did for me).

If you want to take it a step further, you could instead do a multi-way handshake using Diffie-Hellman to generate a shared encryption key to then allow each side to generate a certificate and gain knowledge of each other's public keys (there are still some TOFU-like issues here but they are basically the same as in the current proposal). This would then allow for actual mutual TLS verification. There are some libraries in Vault's tree to do this; they're not standardized at this point but it could be a good use-case for doing so.

@jbardin
Copy link
Member Author

jbardin commented Nov 19, 2018

Hi @jefferai, just to make sure we're on the same page, here's the current setup process from a high level:

  1. Client generates private key and CA cert, which will be used for client authentication
  2. Client starts the server with public cert in the process environment
  3. Server generates a private key and CA cert, which is used for the TLS connection. The public cert is written back to the client over stdout.
  4. Server uses the client cert for the ClientCA
  5. Client uses the server cert for the RootCA.

As far as I can tell this still falls under what is commonly called mTLS, only with the certificate data shared via a side-channel vs a trusted intermediary, though I have no problem calling the option something else if it is confusing.

Now I didn't think of doing a Diffie-Hellman exchange over this env+stdout "channel", which does sound nicer at first, however does it actually gain anything? You still have the same TOFU issue, and IIUC we would then have the added overhead of generating matching certs for the other side based on the shared private key (we still need to use the standard TLS machinery for gRPC), which seems less flexible in the long run.

@jbardin jbardin changed the title [WIP] automatic mTLS automatic mTLS Nov 28, 2018
This adds optional tls authentication to the client server protocol.
One-time use keys are generated at startup, and the public certs are
shared via the subprocess environment and stdout.
Dialing in grpc is asynchronous, and blocking blocks RPC calls not the
dailing. Also, when blocking a Dial in grpc, even with a timeout, we
lose the dial error that were causing us to block.
Verify that we can connect and call methods using both protocols in the
test fixtures.
@jbardin
Copy link
Member Author

jbardin commented Nov 30, 2018

Hi,

I tried to better describe the overall usage in the AutoMTLS docs. If the option name or any terminology isn't quite right, I'm still open to changing it of course.

I mentioned it in an earlier revision, but this ended up not really changing the handshake protocol, since there was an extra field which was never used, and has been replaced with the public cert data.

@jefferai
Copy link
Member

Hi @jbardin

Thanks for the more detailed steps. I think this part:

The public cert is written back to the client over stdout.

Is what I was missing before. Previously when you'd said as only the the public certificate is passed in via the environment I'd thought that meant that one public cert was being sent, via the environment, so you were only doing one-way verification.

Copy link
Contributor

@mitchellh mitchellh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm good with this with the recent changes. Looks like a great change, and fairly innocuous since its opt-in anyways.

@jbardin jbardin merged commit f4c3476 into master Dec 11, 2018
@jbardin jbardin deleted the jbardin/mtls branch December 11, 2018 20:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants