auto_encrypt and verify_incoming #6811

hanshasselberg · 2019-11-19T10:29:14Z

After some discussion around auto_encrypt in combination with verify_incoming, we decided to do the following:

- Relax the requirement to enable verify_incoming on the server, so that migration is possible.
- Improve the log output to better explain why verify_incoming on clients still needs manual CA and cert setup, even together with auto_encrypt.
- Explain verify_incoming on Consul clients in the docs.
- Fix the learn docs: https://github.com/hashicorp/learn/pull/848

hanshasselberg · 2019-11-19T11:09:01Z

This PR also serves as a collection of use cases for verify_incoming on Consul clients. Let me first clarify what it means to set verify_incoming on clients:

Turning on verify_incoming on Consul clients protects the HTTPS endpoint by ensuring that the certificate presented by a third-party tool to the HTTPS endpoint was created by the CA that the Consul client was set up with. If the UI is served, the same checks are performed.

We are wondering in which cases you think it is necessary to turn that on? We would like to get a fresh view on that and make sure it is actually being used before we continue to fully support that.

What are the use cases to turn on `verify_incoming` for clients?

banks

Changes LGTM here. Did you already update the docs with that quoted part about what verify_incoming does on a client? That's really clear and should be in docs if it's not already.

Also you seemed to be asking for feedback about use-cases here - did you mean to post that in an issue instead?

banks · 2019-11-26T15:21:40Z

tlsutil/config.go

-			return fmt.Errorf("VerifyIncoming set, and no CA certificate provided!")
+			errMsg := "VerifyIncoming set, and no CA certificate provided!"
+			if config.AutoEncryptTLS {
+				errMsg += autoEncryptMsg
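
For readers following the diff fragment above, this is a self-contained sketch of the shape of that check. Only `AutoEncryptTLS`, `autoEncryptMsg`, `errMsg`, and the base error string come from the diff; the `Config` struct, the `CAFile` field, the function name `checkVerifyIncoming`, and the wording of `autoEncryptMsg` are assumptions for illustration and may differ from what was merged.

```go
package main

import (
	"errors"
	"fmt"
)

// Config is a hypothetical, trimmed-down stand-in for the TLS settings.
type Config struct {
	VerifyIncoming bool
	AutoEncryptTLS bool
	CAFile         string // assumed field name for the manually configured CA
}

// autoEncryptMsg uses illustrative wording, not necessarily the merged text.
const autoEncryptMsg = " AutoEncrypt only secures the connection between client and server and doesn't affect incoming connections on the client."

// checkVerifyIncoming returns an error when incoming verification is
// requested without a CA, and appends a hint when auto-encrypt is on.
func checkVerifyIncoming(config Config) error {
	if config.VerifyIncoming && config.CAFile == "" {
		errMsg := "VerifyIncoming set, and no CA certificate provided!"
		if config.AutoEncryptTLS {
			errMsg += autoEncryptMsg
		}
		return errors.New(errMsg)
	}
	return nil
}

func main() {
	fmt.Println(checkVerifyIncoming(Config{VerifyIncoming: true, AutoEncryptTLS: true}))
	fmt.Println(checkVerifyIncoming(Config{VerifyIncoming: true, CAFile: "ca.pem"}))
}
```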


hanshasselberg · 2019-11-27T09:37:01Z

Thanks for the review @banks. I added the quoted part to the docs.

Also you seemed to be asking for feedback about use-cases here - did you mean to post that in an issue instead?

I put it here because there were multiple issues around this and I linked to this PR from them, so that there is a central place for it. It has been a couple of days now and nobody replied. Why do you think it would be better to put it into a separate issue?

otto-dev · 2019-11-29T00:22:35Z

What are the use cases to turn on verify_incoming for clients?

Is this with auto_encrypt enabled?

The primary security benefit of verify_incoming depends on the attacker's inability to lay hands on a valid certificate. If both verify_incoming and auto_encrypt are enabled, it's like encrypting a file and storing the password right next to it, because the server acts as a dispenser of such certificates.

Secondly, as I understand it (?), verify_incoming ensures that all communication is encrypted, which prevents MITM and eavesdropping of sensitive information.

Regarding the first point: What is the situation if gossip encryption is enabled? Will this lock an attacker out from obtaining a certificate via auto_encrypt?

If so, then verify_incoming (even with auto_encrypt, as long as gossip is encrypted) is an easy and reliable way to prevent (UI) access by untrusted parties in the same network, while allowing access for people that have a certificate. <- use case (This is how I use it)

ACL can be used for a similar purpose, but unlike TLS it is on the application level logic, and hence more vulnerable to security issues if there is a bug, or a timing attack, in the implementation. I trust that it's impossible to forge a valid certificate except if you obtain one. I don't trust ACL 100% on its own. First defense is TLS, second, more fine tuned is ACL.

hanshasselberg · 2019-12-02T21:26:53Z

If both are enabled, it's like encrypting a file, and storing the password right next to it.

can you explain what you mean by that?

What is the situation if gossip encryption is enabled?

Yes, the RPC endpoint works independently from gossip encryption.

If not, then verify_incoming is an easy and reliable way to prevent (UI) access by third parties in the same network, while allowing access for people that have a certificate.

Yes, that's true. Securing the HTTPS endpoint is the use case. I was wondering though if it wouldn't be preferable to set up a proxy which terminates TLS and provides auth for the Consul UI.

My thinking is that ideally by reducing the number of ways to fiddle with Consul security, the more secure it becomes.

otto-dev · 2019-12-03T00:10:58Z

My argumentation above got lost in transmission (sorry, my bad), so I hope to be able to clarify.

If both are enabled, it's like encrypting a file, and storing the password right next to it.

can you explain what you mean by that?

Think of the certificate as a password, and the ability to access the cluster as the encrypted file. The analogy is, you need a valid certificate (password) to operate on the cluster (decrypt the file). auto_encrypt gives an attacker a route to obtain certificates (the password) as far as I can see. With that in mind, I hope the original paragraph above, and the point I'm making should be clear (otherwise let me know).

What is the situation if gossip encryption is enabled?

Yes, the RPC endpoint works independently from gossip encryption.

Is this answer in respect to the question "Will this lock an attacker out from obtaining a certificate via auto_encrypt"? For this question to make sense, I think it's necessary to first understand the point I was making in the first paragraph.

To sum up my previous argument: On its own, verify_incoming is an effective and reliable first defense to prevent unauthorized access (I consider it indispensable). On a different note, the security of verify_incoming is compromised by auto_encrypt because it gives an attacker a route to obtain certificates. Can this be prevented by enabling gossip encryption, thereby restoring the security benefit of verify_incoming?

It's not clear to me what the relationship is in your considerations between auto_encrypt and verify_incoming, so I'm addressing all cases, and hence my first question in previous post.

PS:

My thinking is that ideally by reducing the number of ways to fiddle with Consul security, the more secure it becomes.

Only if the defaults are secure, meaning verify_incoming is enabled, and auto_encrypt disabled.

banks · 2019-12-03T13:05:33Z

@otto-dev thanks for explaining!

I understand your point - you are right that auto_encrypt in some sense bypasses the “additional security” you get from mutual TLS as it reduces to anyone with a valid ACL token being able to get a certificate too. This is why we propose here that it’s not really useful to have mutual client auth for HTTPS (verify_incoming) and auto_encrypt together.

I think you get this but to be clear for others:

The primary security benefit of verify_incoming depends on the attackers inability to lay hands on a valid certificate.

Correct. With auto_encrypt you are intentionally reducing the security to only require a valid ACL token and still benefit from having a client certificate. That client certificate no longer increases trust beyond the ACL token alone, but there are other reasons you might want that certificate as the arbiter of a node’s identity though, for example they are easier to rotate and allow the client to serve an encrypted API without having to manage that certificate manually.

We also have future plans that will allow that certificate identity to be obtained using something even more secure than an ACL token which will increase the security of bootstrapping as well as convenience (#6457). We also want to open the door for a future where agent gossip trust/encryption can be bootstrapped via mTLS rather than a symmetric pre-shared key so having client agents able to automatically get TLS certificates from some other secret that’s easier to distribute/provision is a prerequisite for some of these improvements.

Secondly, as I understand it (?), verify_incoming ensures that all communication is encrypted, which prevents MITM and eavesdropping of sensitive information.

That’s true - verify_incoming does enforce TLS on both servers and clients. But there are other ways to do that too. To force only client -> server RPC to be encrypted you can use verify_incoming_rpc on servers. To disallow non-encrypted API traffic you can choose not to listen on the HTTP port at all (setting it to -1) so the only option is encrypted access, but still without requiring mutual auth.

Can this be prevented by enabling gossip encryption, thereby restoring the security benefit of verify_incoming ?

No. Gossip encryption can’t really protect against anything related to ACL or auto encrypt, at least not fully - they are separate systems. While gossip encryption could prevent you from starting a regular Consul agent on an unauthorised node, it can’t stop an attacker from hitting Consul server API endpoints directly and so accessing anything they need, registering services directly in the catalog etc.

Only if the defaults are secure, meaning verify_incoming is enabled, and auto_encrypt disabled.

Auto encrypt is disabled by default. We can’t really default verify_incoming to true since then agents would fail to start without manual TLS configured.

In general I think @i0rek was trying to say that “we don’t want to add support for both auto_encrypt AND verify_incoming together as it isn’t a useful combination so it’s simpler not to support it and simpler is better” which I think you are agreeing with in your assessment that using both is not a security improvement.

It’s not clear to me what the relationship is in your considerations between auto_encrypt and verify_incoming, so I’m addressing all cases, and hence my first question in previous post.

Yeah it’s certainly taken some discussion to tease out what makes sense and doesn’t here and we could certainly make this clearer in docs. My take is that the following are all valid/useful configurations with different threat models and degrees of complexity in configuring:

1. auto_encrypt disabled, manual TLS configured for Consul clients and servers and all downstream consumers of the Consul API, and verify_incoming enabled.
   - This is the belt-and-braces setup you prefer, and it certainly has the strongest assurance of trust.
   - It’s also the most complicated to set up and maintain, especially if you rotate those certificates regularly.
2. Manual server TLS (that’s the most important part in terms of overall cluster security) with verify_outgoing set on all clients and servers, and validate_server_hostname set, which is critical for server security.
   - This is the simplest thing that ensures encrypted connections for all client to server traffic without needing client certificates. Clients all need the CA certificate to validate server identity though.
   - The subtlety in this mode is that a malicious or misconfigured client is still able to connect to servers without TLS. How significant a problem this is depends on your threat model and confidence in the efficacy of your config management/auditing etc. The worst that could happen here would be that a compromised node could force the local agent to connect without TLS and leak its credentials to the network, but note that those were credentials that the attacker already had access to on the compromised host and so could have exfiltrated or leaked any other way they choose even if servers didn’t accept TLS.
3. Manual server TLS as above but with auto_encrypt enabled and verify_incoming_rpc = true.
   - Now all client to server RPC traffic MUST be TLS and must have a valid certificate, which is distributed automatically based on clients having trusted ACL tokens distributed securely out-of-band.
   - Opens the way for TLS-based identity throughout the cluster without the complexity of managing a TLS certificate for every node manually.
   - TLS certs can be rotated automatically.
   - Future plans to allow more secure bootstrapping of that identity that doesn’t rely on an ACL and can be locked down per-machine more easily.
   - Clients can now expose their API locally using HTTPS (without client authentication), which means that multiple workloads on the same machine can’t sniff each other’s ACL tokens. This is an important and necessary part of the threat model when working in scheduler environments where multiple untrusted workloads share an agent on the local host.

otto-dev · 2019-12-04T02:06:29Z

Thanks @banks, this addresses my concerns correctly and clears up the areas that I was unsure about. I also fully agree with your assessment.

In regard to the second point in the list of configurations, I would like to add something:

The worst that could happen here would be that a compromised node could force the local agent to connect without TLS and leak it’s credentials to the network, but note that those were credentials that the attacker already had access to

If the node is fully compromised, not even manual TLS would protect any further, because then the attacker could do whatever he wants, including accessing the node's certificate.

The real threat lies in the fact that ACL is the only "defense mechanism" left against attackers, and I think ACL should not be trusted on its own.

ACL is very application-level. Ask whoever implemented the token validation: "Did you secure the token validation against timing attacks (and side-channel attacks)?"

[If you haven't heard about those, I recommend to check it out, they are real.]

Here is an example of what a timing attack looks like.

In 2003, Boneh and Brumley demonstrated a practical network-based timing attack on SSL-enabled web servers, based on a different vulnerability having to do with the use of RSA with Chinese remainder theorem optimizations. The actual network distance was small in their experiments, but the attack successfully recovered a server private key in a matter of hours.

https://en.wikipedia.org/wiki/Timing_attack#Examples

I expect the ACL to be vulnerable to at least timing attacks, because token based authentication is notorious for this. (With all respect to the devs)

I remember an article where a professional's job was to test token-based authentication implementations, I think OAuth or similar, and he found that all of 20+ implementations were vulnerable to timing attacks in the way they compared the tokens. It leaves me thinking that if I were really set on breaking into someone's cluster with auto_encrypt enabled, there is probably a good chance that I could, especially if I can set up a node in the same shared datacenter.

So my problem is not if it's possible to obtain a token, my problem is that I doubt that ACL alone can be trusted 100%. Not only because of timing-attacks, but also because it's as much subject to bugs as the rest of the application code. Timing attacks are just one such example, high on the list. (again, no disrespect to the devs - it's token based auth that's the basis for my skepticism)
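
As an illustration of the token-comparison concern described above, here is a minimal Go sketch of a comparison that does not return early on the first mismatching byte. It is not Consul's ACL code; the function name `tokensEqual` is hypothetical, and only `crypto/subtle.ConstantTimeCompare` and `crypto/sha256.Sum256` are real standard-library calls. Hashing both inputs first gives the comparison equal-length operands, since `ConstantTimeCompare` returns immediately when the lengths differ.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"fmt"
)

// tokensEqual compares two secrets without short-circuiting on the first
// differing byte. Both values are hashed so the compared slices always
// have the same length.
func tokensEqual(presented, stored string) bool {
	p := sha256.Sum256([]byte(presented))
	s := sha256.Sum256([]byte(stored))
	return subtle.ConstantTimeCompare(p[:], s[:]) == 1
}

func main() {
	stored := "example-token-value" // placeholder, not a real token
	fmt.Println(tokensEqual("example-token-value", stored))
	fmt.Println(tokensEqual("example-token-valuX", stored))
}
```

This only addresses the comparison step itself; it says nothing about other timing sources such as lookups or caching around it.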

banks · 2019-12-04T12:01:04Z

For sure, timing and side channel attacks are a concern for all token-based systems. In Consul's case it goes beyond subtleties like constant time comparison functions, since ACL token resolution may involve RPCs (possibly even over the WAN to another DC), policy lookups and caching, which all make the timing story way more nuanced than just token comparison. We have a security team who actively researches these things, so I of course can't claim it's perfectly secure, as nothing is, but we do at some level have to trust it!

If we don't 100% trust ACLs then a lot of our threat model disintegrates - even if you have manual agent TLS, if ACL is vulnerable then an attacker can get access to anything they want from any compromised host in your DC anyway, since every host has a connected agent on it that will happily relay the attacker's payloads with forged ACL etc. to the servers. So I'm not disagreeing as such - manual TLS still offers some additional defense for direct server RPC access, but it's very important to consider the whole system threat model, since in practice if every host has a manually assigned TLS cert on it anyway then the attacker already has all they need to defeat that defense and you are only relying on ACL for any authentication that matters still.


otto-dev · 2019-12-04T22:33:55Z

We have a security team who actively researches these things

All right, maybe it's reasonable to trust it then (to a large extent). It's only my default position not to.

Finally, I think you understand all my concerns and have the same or better understanding of them, plus you have knowledge of the development internals, so I think I have nothing more to add. I agree with your list of "valid/useful configurations" as far as I looked into it.

When I jumped in I wasn't sure what the agenda is, and I mainly wanted to make sure that I can continue to rely on verify_incoming as a first line of defense.

hanshasselberg · 2019-12-05T09:13:27Z

Thank you for jumping in @otto-dev!

* relax requirements for auto_encrypt on server
* better error message when auto_encrypt and verify_incoming on
* docs: explain verify_incoming on Consul clients.

hanshasselberg self-assigned this Nov 19, 2019

hanshasselberg added the theme/tls Using TLS (Transport Layer Security) or mTLS (mutual TLS) to secure communication label Nov 19, 2019

hanshasselberg changed the base branch from master to release/1.6.x November 19, 2019 11:20

hanshasselberg added 2 commits November 19, 2019 12:22

relax requirements for auto_encrypt on server

ece875d

better error message

d0cb36f

hanshasselberg force-pushed the auto_encrypt_fixes branch from a06410c to d0cb36f Compare November 19, 2019 11:23

This was referenced Nov 19, 2019

Client agent not starting when auto_encrypt.tls enabled #6398

Closed

documentation: auto encrypt on an existing Consul datacenter not possible? #6127

Closed

hanshasselberg requested a review from a team November 21, 2019 21:58

banks approved these changes Nov 26, 2019

docs: explain verify_incoming on Consul clients.

3403a9f

hanshasselberg merged commit bbc4726 into release/1.6.x Nov 27, 2019

hanshasselberg deleted the auto_encrypt_fixes branch November 27, 2019 10:06

kjdelisle mentioned this pull request Nov 28, 2019

connect: allow use of static certificates #6848

Closed

hanshasselberg added a commit that referenced this pull request Dec 6, 2019

tls: auto_encrypt and verify_incoming (#6811)

0beaba8

* relax requirements for auto_encrypt on server
* better error message when auto_encrypt and verify_incoming on
* docs: explain verify_incoming on Consul clients.

ghost locked and limited conversation to collaborators Jan 25, 2020
