auto_encrypt and verify_incoming #6811

Merged 3 commits on Nov 27, 2019

Member

@hanshasselberg hanshasselberg commented Nov 19, 2019

After some discussion around auto_encrypt in combination with verify_incoming, we decided to do the following:

  • relax requirement to enable verify_incoming on the server, so that migration is possible
  • improve log output to better explain why verify_incoming on clients still needs manual CA and cert setup, even together with auto_encrypt
  • explain verify_incoming on Consul clients in docs
  • fix learn docs https://github.com/hashicorp/learn/pull/848

Fixes #6398, #6127.

@hanshasselberg hanshasselberg self-assigned this Nov 19, 2019
@hanshasselberg hanshasselberg added the theme/tls Using TLS (Transport Layer Security) or mTLS (mutual TLS) to secure communication label Nov 19, 2019
@hanshasselberg
Member Author

This PR will also serve as a collection of use cases for verify_incoming on Consul clients. Let me first clarify what it means to set verify_incoming on clients:

Turning on verify_incoming on Consul clients protects the HTTPS endpoint by ensuring that the certificate a 3rd party tool presents to the HTTPS endpoint was created by the CA that the Consul client was set up with. If the UI is served, the same checks are performed.

We are wondering in which cases you think it is necessary to turn that on? We would like to get a fresh view on that and make sure it is actually being used before we continue to fully support that.

What are the use cases to turn on verify_incoming for clients?

@hanshasselberg hanshasselberg changed the base branch from master to release/1.6.x November 19, 2019 11:20
Member

@banks banks left a comment

Changes LGTM here. Did you already update the docs with that quoted part about what verify_incoming does on a client? That's really clear and should be in docs if it's not already.

Also you seemed to be asking for feedback about use-cases here - did you mean to post that in an issue instead?

return fmt.Errorf("VerifyIncoming set, and no CA certificate provided!")
errMsg := "VerifyIncoming set, and no CA certificate provided!"
if config.AutoEncryptTLS {
errMsg += autoEncryptMsg
Member

Nice!
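
For readers without the diff open, the reviewed change can be sketched roughly as follows. This is a simplified sketch, not the merged code: the `tlsConfig` struct, the `checkIncoming` helper, and the wording of `autoEncryptMsg` are assumptions made for illustration. Only the base error string and the `AutoEncryptTLS` field name come from the excerpt above.

```go
package main

import (
	"errors"
	"fmt"
)

// tlsConfig is a hypothetical stand-in for the real configuration struct.
type tlsConfig struct {
	VerifyIncoming bool
	AutoEncryptTLS bool
	CAFile         string
}

// autoEncryptMsg uses assumed wording; the actual text is in the PR diff.
const autoEncryptMsg = " With auto_encrypt, verify_incoming on clients still needs manual CA and cert setup."

// checkIncoming returns an error when incoming verification is requested
// without a CA, and appends a hint when auto_encrypt is also enabled.
func checkIncoming(c tlsConfig) error {
	if !c.VerifyIncoming || c.CAFile != "" {
		return nil
	}
	errMsg := "VerifyIncoming set, and no CA certificate provided!"
	if c.AutoEncryptTLS {
		errMsg += autoEncryptMsg
	}
	return errors.New(errMsg)
}

func main() {
	fmt.Println(checkIncoming(tlsConfig{VerifyIncoming: true, AutoEncryptTLS: true}))
}
```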

@hanshasselberg
Member Author

Thanks for the review @banks. I added the quoted part to the docs.

Also you seemed to be asking for feedback about use-cases here - did you mean to post that in an issue instead?

I put it here because there were multiple issues around this and I linked to this PR from them, so that there is a central place for it. It has been a couple of days now and nobody replied. Why do you think it would be better to put it into a separate issue?

@hanshasselberg hanshasselberg merged commit bbc4726 into release/1.6.x Nov 27, 2019
@hanshasselberg hanshasselberg deleted the auto_encrypt_fixes branch November 27, 2019 10:06
@otto-dev

otto-dev commented Nov 29, 2019

What are the use cases to turn on verify_incoming for clients?

Is this with auto_encrypt enabled?

The primary security benefit of verify_incoming depends on the attacker's inability to lay hands on a valid certificate. If both verify_incoming and auto_encrypt are enabled, it's like encrypting a file and storing the password right next to it, because the server acts as a dispenser of such certificates.

Secondly, as I understand it (?), verify_incoming ensures that all communication is encrypted, which prevents MITM and eavesdropping of sensitive information.

Regarding the first point: What is the situation if gossip encryption is enabled? Will this lock an attacker out from obtaining a certificate via auto_encrypt?

If so, then verify_incoming (even with auto_encrypt, as long as gossip is encrypted) is an easy and reliable way to prevent (UI) access by untrusted parties in the same network, while allowing access for people that have a certificate. <- use case (This is how I use it)

ACL can be used for a similar purpose, but unlike TLS it lives in application-level logic, and hence is more vulnerable to security issues if there is a bug, or a timing attack, in the implementation. I trust that it's impossible to forge a valid certificate except if you obtain one. I don't trust ACL 100% on its own. The first defense is TLS; the second, more fine-tuned one is ACL.

@hanshasselberg
Member Author

If both are enabled, it's like encrypting a file, and storing the password right next to it.

Can you explain what you mean by that?

What is the situation if gossip encryption is enabled?

Yes, the RPC endpoint works independently from gossip encryption.

If not, then verify_incoming is an easy and reliable way to prevent (UI) access by third parties in the same network, while allowing access for people that have a certificate.

Yes, that's true. Securing the HTTPS endpoint is the use case. I was wondering though if it wouldn't be preferable to set up a proxy which terminates TLS and provides auth for the Consul UI.

My thinking is that ideally by reducing the number of ways to fiddle with Consul security, the more secure it becomes.

@otto-dev

otto-dev commented Dec 3, 2019

My argumentation above got lost in transmission (sorry, my bad), so I hope to be able to clarify.

If both are enabled, it's like encrypting a file, and storing the password right next to it.

can you explain what you mean by that?

Think of the certificate as a password, and the ability to access the cluster as the encrypted file. The analogy is, you need a valid certificate (password) to operate on the cluster (decrypt the file). auto_encrypt gives an attacker a route to obtain certificates (the password) as far as I can see. With that in mind, I hope the original paragraph above, and the point I'm making should be clear (otherwise let me know).

What is the situation if gossip encryption is enabled?

Yes, the RPC endpoint works independently from gossip encryption.

Is this answer in response to the question "Will this lock an attacker out from obtaining a certificate via auto_encrypt"? For this question to make sense, I think it's necessary to first understand the point I was making in the first paragraph.

To sum up my previous argument: On its own, verify_incoming is an effective and reliable first defense to prevent unauthorized access (I consider it indispensable). On a different note, the security of verify_incoming is compromised by auto_encrypt because it gives an attacker a route to obtain certificates. Can this be prevented by enabling gossip encryption, thereby restoring the security benefit of verify_incoming?

It's not clear to me what the relationship is in your considerations between auto_encrypt and verify_incoming, so I'm addressing all cases, and hence my first question in previous post.

PS:

My thinking is that ideally by reducing the number of ways to fiddle with Consul security, the more secure it becomes.

Only if the defaults are secure, meaning verify_incoming is enabled, and auto_encrypt disabled.

@banks
Member

banks commented Dec 3, 2019

@otto-dev thanks for explaining!

I understand your point - you are right that auto_encrypt in some sense bypasses the “additional security” you get from mutual TLS as it reduces to anyone with a valid ACL token being able to get a certificate too. This is why we propose here that it’s not really useful to have mutual client auth for HTTPS (verify_incoming) and auto_encrypt together.

I think you get this but to be clear for others:

The primary security benefit of verify_incoming depends on the attacker's inability to lay hands on a valid certificate.

Correct. With auto_encrypt you are intentionally reducing the security to only require a valid ACL token and still benefit from having a client certificate. That client certificate no longer increases trust beyond the ACL token alone, but there are other reasons you might want that certificate as the arbiter of a node’s identity: for example, certificates are easier to rotate and allow the client to serve an encrypted API without having to manage that certificate manually.

We also have future plans that will allow that certificate identity to be obtained using something even more secure than an ACL token which will increase the security of bootstrapping as well as convenience (#6457). We also want to open the door for a future where agent gossip trust/encryption can be bootstrapped via mTLS rather than a symmetric pre-shared key so having client agents able to automatically get TLS certificates from some other secret that’s easier to distribute/provision is a prerequisite for some of these improvements.

Secondly, as I understand it (?), verify_incoming ensures that all communication is encrypted, which prevents MITM and eavesdropping of sensitive information.

That’s true - verify_incoming does enforce TLS on both servers and clients. But there are other ways to do that too. To force only client -> server RPC to be encrypted you can use verify_incoming_rpc on servers. To disallow non-encrypted API traffic you can choose not to listen on the http port at all (setting it to -1) so the only option is encrypted access, but still without requiring mutual auth.

Can this be prevented by enabling gossip encryption, thereby restoring the security benefit of verify_incoming?

No. Gossip encryption can’t really protect against anything related to ACL or auto encrypt, at least not fully - they are separate systems. While gossip encryption could prevent you from starting a regular Consul agent on an unauthorised node, it can’t stop an attacker from hitting Consul server API endpoints directly and so accessing anything they need, registering services directly in the catalog etc.

Only if the defaults are secure, meaning verify_incoming is enabled, and auto_encrypt disabled.

Auto encrypt is disabled by default. We can’t really default verify_incoming to true since then agents would fail to start without manual TLS configured.

In general I think @i0rek was trying to say that “we don’t want to add support for both auto_encrypt AND verify_incoming together as it isn’t a useful combination so it’s simpler not to support it and simpler is better” which I think you are agreeing with in your assessment that using both is not a security improvement.

It’s not clear to me what the relationship is in your considerations between auto_encrypt and verify_incoming, so I’m addressing all cases, and hence my first question in previous post.

Yeah it’s certainly taken some discussion to tease out what makes sense and doesn’t here and we could certainly make this clearer in docs. My take is that the following are all valid/useful configurations with different threat models and degrees of complexity in configuring:

  • auto_encrypt disabled, manual TLS configured for consul clients and servers, and all downstream consumers of consul API and verify_incoming enabled.
    • this is the belt-and-braces setup you prefer and it certainly has strongest assurance of trust.
    • It’s also the most complicated to set up and maintain, especially if you rotate those certificates regularly.
  • manual server TLS (that’s the most important part in terms of overall cluster security) with verify_outgoing set on all clients and servers and verify_server_hostname set, which is critical for server security.
    • this is the simplest thing that ensures encrypted connections for all client to server traffic without needing client certificates. Clients all need the CA certificate to validate server identity though.
    • the subtlety in this mode is that a malicious or misconfigured client is still able to connect to servers without TLS. How significant a problem this is depends on your threat model and confidence in the efficacy of your config management/auditing etc. The worst that could happen here would be that a compromised node could force the local agent to connect without TLS and leak its credentials to the network, but note that those were credentials that the attacker already had access to on the compromised host and so could have exfiltrated or leaked any other way they choose even if servers refused non-TLS connections.
  • manual server TLS as above but with auto_encrypt enabled and verify_incoming_rpc = true.
    • now all client to server RPC traffic MUST be TLS and must have a valid certificate, which is distributed automatically based on clients having trusted ACL tokens distributed securely out-of-band.
    • opens the way for TLS-based identity throughout the cluster without the complexity of managing TLS certificate for every node manually.
    • TLS certs can be rotated automatically
    • future plans to allow more secure bootstrapping of that identity that doesn’t rely on an ACL and can be locked down per-machine more easily.
    • clients can now expose their API locally using HTTPS (without client authentication) which means that multiple workloads on the same machine can’t sniff each other’s ACL tokens. This is an important and necessary part of the threat model when working in scheduler environments where multiple untrusted workloads share an agent on the local host.

@otto-dev

otto-dev commented Dec 4, 2019

Thanks @banks, this addresses my concerns correctly and clears up the areas that I was unsure about. I also fully agree with your assessment.

In regard to the second point in the list of configurations, I would like to add something:

The worst that could happen here would be that a compromised node could force the local agent to connect without TLS and leak its credentials to the network, but note that those were credentials that the attacker already had access to

If the node is fully compromised, not even manual TLS would protect any further, because then the attacker could do whatever he wants, including accessing the node's certificate.

The real threat lies in the fact that ACL is the only "defense mechanism" left against attackers, and I think ACL should not be trusted on its own.

ACL is very application-level. Ask whoever implemented the token validation: "Did you secure the token validation against timing attacks (and side-channel attacks)?"

[If you haven't heard about those, I recommend checking them out, they are real.]

Here is an example of what a timing attack looks like.

In 2003, Boneh and Brumley demonstrated a practical network-based timing attack on SSL-enabled web servers, based on a different vulnerability having to do with the use of RSA with Chinese remainder theorem optimizations. The actual network distance was small in their experiments, but the attack successfully recovered a server private key in a matter of hours.

I expect the ACL to be vulnerable to at least timing attacks, because token based authentication is notorious for this. (With all respect to the devs)

I remember an article where a professional's job was to test token-based authentication implementations, I think OAuth or similar, and he found that all of 20+ implementations were vulnerable to timing attacks in the way they compared the tokens. It leaves me thinking that if I were really set on breaking into someone's cluster with auto_encrypt enabled, there is probably a good chance that I could, especially if I can set up a node in the same shared datacenter.

So my problem is not if it's possible to obtain a token, my problem is that I doubt that ACL alone can be trusted 100%. Not only because of timing-attacks, but also because it's as much subject to bugs as the rest of the application code. Timing attacks are just one such example, high on the list. (again, no disrespect to the devs - it's token based auth that's the basis for my skepticism)
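
To make the timing-attack concern concrete, here is a generic sketch in Go of the difference between an early-exit comparison and a constant-time one. It says nothing about how Consul actually validates ACL tokens; the function names `naiveEqual` and `constantTimeEqual` are hypothetical and the token strings are made up.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"fmt"
)

// naiveEqual returns as soon as a byte differs, so its running time can
// depend on how long the matching prefix is. That is the kind of leak a
// timing attack tries to measure.
func naiveEqual(a, b string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := 0; i < len(a); i++ {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// constantTimeEqual hashes both inputs to a fixed length and compares the
// digests with subtle.ConstantTimeCompare, which inspects every byte
// regardless of where the first difference is.
func constantTimeEqual(a, b string) bool {
	ha := sha256.Sum256([]byte(a))
	hb := sha256.Sum256([]byte(b))
	return subtle.ConstantTimeCompare(ha[:], hb[:]) == 1
}

func main() {
	stored := "made-up-token-value"
	fmt.Println(naiveEqual(stored, "made-up-token-valuX"))
	fmt.Println(constantTimeEqual(stored, "made-up-token-value"))
}
```

Both functions give the same answers; only the second avoids revealing, through response time, how much of a guessed token was correct.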

@banks
Member

banks commented Dec 4, 2019 via email

@otto-dev

otto-dev commented Dec 4, 2019

We have a security team who actively researches these things

All right, maybe it's reasonable to trust it then (to a large extent). It's only my default position not to.

Finally, I think you understand all my concerns and have the same or better understanding of them, plus you have knowledge of the development internals, so I think I have nothing more to add. I agree with your list of "valid/useful configurations" as far as I looked into it.

When I jumped in I wasn't sure what the agenda is, and I mainly wanted to make sure that I can continue to rely on verify_incoming as a first line of defense.

@hanshasselberg
Member Author

Thank you for jumping in @otto-dev!

hanshasselberg added a commit that referenced this pull request Dec 6, 2019
* relax requirements for auto_encrypt on server
* better error message when auto_encrypt and verify_incoming on
* docs: explain verify_incoming on Consul clients.
hanshasselberg added a commit that referenced this pull request Dec 6, 2019
* relax requirements for auto_encrypt on server
* better error message when auto_encrypt and verify_incoming on
* docs: explain verify_incoming on Consul clients.
@ghost

ghost commented Jan 25, 2020

Hey there,

This issue has been automatically locked because it is closed and there hasn't been any activity for at least 30 days.

If you are still experiencing problems, or still have questions, feel free to open a new one 👍.

@ghost ghost locked and limited conversation to collaborators Jan 25, 2020