Add documentation & flexibility to the way Consul Connect Leaf Certificate CNs are calculated #8170

gugalnikov · 2020-06-23T08:30:15Z

Feature Description

Context & Background

Currently, the CN on a Consul Connect Leaf Certificate requested through the /agent/connect endpoint follows a quite unique and non-standard syntax which isn't present anywhere else in the system.

One can only find a rationale for this by looking closely at the code and some of the comments attached to it (as well as by tracing back PRs, commits, etc.): https://github.com/hashicorp/consul/blob/master/agent/connect/common_names.go

The problem with this approach is that the CN cannot really be inferred at runtime (for dynamic configuration purposes), which can be quite limiting, besides leaving the consumer with a proprietary, opinionated implementation as the only available option.

*** It is understood that the SNI name (from the service discovery chain) is not used here because of the 64-character CN limit in the X.509 spec.

Suggested Features

First and foremost, these internals should be documented and explained in detail, probably here: https://www.consul.io/docs/connect/connect-internals
The documentation for the Connect API should also link to this very relevant information.
There should be an API endpoint under /agent/connect which allows you to easily and reliably get the calculated CN for a specific service name.
Besides having this as the "default" behaviour, the API should also allow consumers to provide a specific CN (or maybe SAN?) in the CSR. This optional parameter could be validated using regular expressions, etc., to make sure it is constructed correctly and complies with what Connect internals require.
Another idea off the top of my head is to keep this behaviour when the internal CA is being used, but allow greater flexibility (and decoupling) in the CSR when Vault has been configured as the CA; Vault can then perform proper validations, enforce certification chains and deal with any additional security concerns.
If the "cluster id" (the first 8 characters of the trust domain UUID) is absolutely necessary for internal functionality, then this value should also be available through the API (and visible to consul-template, for example).
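The regex validation suggested above could look roughly like this Go sketch. It assumes the default CN layout <service>.svc.<namespace>.<cluster id>.consul with DNS-label-safe names and an 8-hex-character cluster id. validateRequestedCN is hypothetical; it checks both the shape of the CN and that the CN names only the service, namespace and cluster the caller's ACL token is already authorized for, since a shape check alone would not stop a workload from requesting another service's name.

```go
package main

import (
	"fmt"
	"regexp"
)

// cnPattern captures service, namespace and cluster id from the assumed layout.
var cnPattern = regexp.MustCompile(`^([a-z0-9-]+)\.svc\.([a-z0-9-]+)\.([0-9a-f]{8})\.consul$`)

// validateRequestedCN (hypothetical) accepts a client-supplied CN only if it is
// well formed AND names the identity the caller's ACL token already allows.
func validateRequestedCN(cn, authorizedService, namespace, clusterID string) error {
	if len(cn) > 64 {
		return fmt.Errorf("CN %q exceeds 64 characters", cn)
	}
	m := cnPattern.FindStringSubmatch(cn)
	if m == nil {
		return fmt.Errorf("CN %q does not match the expected layout", cn)
	}
	if m[1] != authorizedService || m[2] != namespace || m[3] != clusterID {
		return fmt.Errorf("CN %q names an identity the token is not authorized for", cn)
	}
	return nil
}
```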

Use Case(s)

This is extremely relevant for any application which does native integration with Consul Connect and relies on the certificate's CN for establishing mutual TLS.

One very obvious example would be Java, where mutual TLS is delegated to keystores / truststores containing the certification chain. CN is a key element in these cases, and one cannot expect every application to be fully SPIFFE / SPIRE compliant just yet.

This is the specific use case which put us in this conundrum:

https://github.com/gugalnikov/presto-consul-connect

Presto is a Java-based distributed SQL application (with a complex internal architecture) which we are integrating natively with Consul Connect. A contribution was made recently to add pluggable certificate authenticators to the tool for this purpose:

https://prestosql.io/docs/current/develop/certificate-authenticator.html

The plugin works quite well and Consul Connect can secure both internal and external communication to the Presto coordinator and workers, but the aforementioned situation with Leaf Certificate Common Names is really limiting our flexibility when it comes to dynamic provisioning, scaling, etc.

banks · 2020-06-23T16:49:11Z

Thanks for the issue @gugalnikov!

To share some background - Connect treats the Common Name in certificates purely as a human-readable administrative tool. We encode all that information just to make it easy for humans who happen to be looking at certificates for example in Vault or an AWS CA listing to be able to work out what that certificate is and where it was generated. Cluster ID is included to ensure organisations with multiple distinct Consul clusters and a shared CA have clear indication in the name listed in CA interfaces/APIs.

As far as actual identity in the mesh goes, Connect completely ignores the Common Name and relies only on the URI SAN. We wouldn't want to make URI SANs user-configurable because it would break the security model of Connect: we are careful to encode only those facts that can be trusted because they can be encoded into the ACL tokens given out to workloads in the first place. If we allowed users to add other fields or change the identity to something we can't verify via ACLs, then Connect's entire security model is bypassed; in effect, anyone can forge any certificate they like.

It's not currently a goal of Connect to support traditional web-PKI verification of hostnames - we instead validate the certificate chain is trusted and then validate the SPIFFE URI against the intentions specified. It sounds like the crux of this request is to be able to also use Connect-generated certificates as more traditional Web PKI certificates that can be validated by traditional TLS clients against the hostname of the server presenting them etc.

That's potentially possible, but it would need careful thought for the same reason as above: web-PKI hostname validation is only meaningful if the hostnames requested in CSRs are actually validated in some way. If any workload can request any CN or DNS SAN, then hostname verification provides zero security and you might as well just ask clients to skip it.

One interesting exception is that we already allow Consul client agents to generate certs via Connect when using auto_encrypt. As part of that, we allow the client agent to add additional SANs for localhost, 127.0.0.1 and any other optional hostnames the node might be reachable on, so that regular web hostname verification works when that cert is presented as the server cert for a TLS connection to the agent's API. The threat model there is very different from our service workloads, though: only operators able to configure the client agent can change those values, and a malicious user can only "spoof" the identity of their local agent, which they already have access to anyway, so there is not much risk.
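To illustrate why ordinary hostname verification works for those agent certs, here is a Go sketch of a certificate template with that SAN layout. agentCertTemplate and its parameters are hypothetical stand-ins for what auto_encrypt configures; the template is only used for name matching via x509.Certificate.VerifyHostname and is never signed.

```go
package main

import (
	"crypto/x509"
	"net"
)

// agentCertTemplate (hypothetical) mirrors the described SAN layout:
// localhost and 127.0.0.1, plus operator-supplied extra names and IPs.
func agentCertTemplate(extraDNS []string, extraIPs []net.IP) *x509.Certificate {
	return &x509.Certificate{
		DNSNames:    append([]string{"localhost"}, extraDNS...),
		IPAddresses: append([]net.IP{net.ParseIP("127.0.0.1")}, extraIPs...),
	}
}
```

Usage: agentCertTemplate([]string{"node1.example.internal"}, nil).VerifyHostname("127.0.0.1") returns nil, while any name not listed (for example a service name) is rejected; the node name is illustrative.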

So with that background in mind, what do you hope to achieve in security threat modelling terms by using Consul Connect certificates with external TLS systems that don't support SPIFFE? Would making the format configurable but still only allowing values that are enforceable via ACL rules be sufficient for your case? If not, how would Consul validate the Common Names chosen to prevent malicious users being able to forge whatever identity they want (at least as far as external clients that are using CN as an identity are concerned)?

gugalnikov · 2020-06-23T21:04:52Z

Hi, thanks for the very comprehensive answer @banks, I believe making the format configurable but still only allowing values that are enforceable via ACL rules would be quite sufficient for our use case. I understand that this cannot be completely open due to security concerns and it's also not our intention at all to break or bypass connect's trust model.

What we want to achieve is the following: from our point of view, Connect Native integration allows us to plug in this kind of distributed application while delegating trust and service identity to a sophisticated and scalable security model. We're interested in applying this model not only to external communication but also to internal communication, because even though these technologies are key for us (e.g. Presto, Kafka, etc.), we don't want to drop silos or mystery boxes (security-wise) into a system that is entirely based on a Consul Connect architecture. So, when you have to go through a Java keystore to establish mutual TLS, the CN becomes relevant; it doesn't have to be completely customizable, just a bit more deterministic. This is why we use the leaf certificate and CA roots to build the JKSs, and once we have a proper SSL handshake we of course validate the SPIFFE URI and hit the authorization endpoint to check intentions. Our implementation intends to follow the pattern described here: https://www.consul.io/docs/connect/native, and the documentation page describing the Connect leaf certificate says: "This certificate is used as a server certificate for accepting inbound connections and is also used as the client certificate for establishing outbound connections to other services", so we figured this is a proper service identity certificate we can leverage for the purposes described above.

banks · 2020-06-25T12:54:58Z

Thanks Arturo. In order for Java's regular TLS stack to validate the connections, the CN (or a DNS SAN) would need to match the hostname being used to resolve the service. Can you share how that is working for you right now? Are you using Consul DNS or something else for addressing? If we added an additional DNS SAN for the Consul DNS name of the service, would that work for you? I suspect most modern clients will support DNS SAN over the common name during validation? FYI, the trust domain part is fixed for a cluster's lifetime and can be queried from https://www.consul.io/api-docs/connect/ca#list-ca-root-certificates (see the TrustDomain field).


gugalnikov · 2020-06-25T13:23:52Z

Hi, yes, adding an additional DNS SAN for the Consul DNS name of the service would be brilliant; I actually have been taking a look at the updated standards for X.509 hostname verification (https://tools.ietf.org/search/rfc6125), and DNS SAN seems to be now the preferred option as opposed to CN.
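A quick Go check of that point: since Go 1.15, crypto/x509 no longer treats the CN as a hostname by default, so only a DNS SAN can satisfy VerifyHostname. certFor is a hypothetical helper that builds unsigned stubs; VerifyHostname only compares names and does no signature checks, and the names used are illustrative.

```go
package main

import "crypto/x509"

// certFor (hypothetical) returns an unsigned certificate stub with the given
// CN and DNS SANs; only the name fields matter for VerifyHostname.
func certFor(cn string, dnsSANs ...string) *x509.Certificate {
	c := &x509.Certificate{DNSNames: dnsSANs}
	c.Subject.CommonName = cn
	return c
}
```

With a CN-only stub, VerifyHostname("web.service.consul") fails even though the CN matches; adding the same name as a DNS SAN makes it pass.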

We are using Consul DNS for addressing. We've also been able to curl the trust domain from https://www.consul.io/api-docs/connect/ca#list-ca-root-certificates as you pointed out, but the resulting code is not the neatest, as it requires extracting the first 8 characters, etc., and we also haven't been able to fetch this information using consul-template.
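A minimal Go sketch of the "first 8 characters" step, working on the JSON body returned by GET /v1/connect/ca/roots (for example from http://127.0.0.1:8500, assuming a local agent on the default port). It relies only on the TrustDomain field; clusterIDFromRoots is a hypothetical helper, not part of Consul's API client, and the sample trust domain in the usage is made up.

```go
package main

import (
	"encoding/json"
	"errors"
	"strings"
)

// caRootsResponse models only the field needed from /v1/connect/ca/roots.
type caRootsResponse struct {
	TrustDomain string `json:"TrustDomain"`
}

// clusterIDFromRoots (hypothetical) returns the first 8 characters of the
// trust domain UUID, i.e. the "cluster id" that appears in leaf CNs.
func clusterIDFromRoots(body []byte) (string, error) {
	var resp caRootsResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return "", err
	}
	// TrustDomain looks like "<uuid>.consul"; keep the UUID part.
	uuid := strings.SplitN(resp.TrustDomain, ".", 2)[0]
	if len(uuid) < 8 {
		return "", errors.New("unexpected TrustDomain format: " + resp.TrustDomain)
	}
	return uuid[:8], nil
}
```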

quinndiggity · 2020-09-07T05:04:40Z

Please, please, please enable a path to a deterministic cluster id; it would be ideal if one could take either of the following paths from an empty, unbootstrapped Consul cluster:

connect = {
  enabled     = true
  ca_provider = "consul"
  ca_config   = {
    private_key = "...CONSUL_CA_KEY_CONTENTS..."
    root_cert   = "...CONSUL_CA_CRT_CONTENTS..."    # <----- with a proper pathlen:0, etc. to minimize risk
  }
}

or:

connect = {
  enabled     = true
  ca_provider = "vault"
  ca_config   = {
    address               = "https://...:8200/"
    token                 = "..."
    root_pki_path         = "pki_consul_root/"         # <---- set ahead of time with restricted pathlen and any other compliance necessities, via `vault write pki_consul_root/config/ca pem_bundle="@root.vault.bundle.pem"`
    intermediate_pki_path = "pki_consul_intermediate/" # <---- set ahead of time with restricted pathlen and any other compliance necessities, via `vault write pki_consul_intermediate/config/ca pem_bundle="@intermediate.vault.bundle.pem"`
  }
}

quinndiggity · 2020-09-07T05:11:05Z

Currently there are a lot of hoops to jump through in order to have a proper air-gapped root CA (or one in an HSM) with pathlen:2, followed by Vault as Consul's PKI root with pathlen:1 and the intermediate with pathlen:0.
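For reference, the hierarchy described above expressed as Go x509 certificate templates. This is a sketch: caTemplate and the CA names are hypothetical, and key generation and signing are omitted. The one non-obvious detail is that Go only encodes pathlen:0 when MaxPathLenZero is set; MaxPathLen == 0 on its own means "no constraint".

```go
package main

import (
	"crypto/x509"
	"crypto/x509/pkix"
)

// caTemplate (hypothetical) returns a CA template with an explicit
// pathLenConstraint. MaxPathLenZero must be true to encode pathlen:0.
func caTemplate(cn string, pathLen int) *x509.Certificate {
	return &x509.Certificate{
		Subject:               pkix.Name{CommonName: cn},
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign | x509.KeyUsageCRLSign,
		MaxPathLen:            pathLen,
		MaxPathLenZero:        pathLen == 0,
	}
}
```

Usage: caTemplate("Offline Root", 2) for the air-gapped root, caTemplate("Vault Consul Root", 1) for Vault's root mount, and caTemplate("Consul Intermediate", 0) for the intermediate.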

I still haven't found a way to generate client TLS certs when using auto_encrypt and Consul's built-in CA (specifically when it is not set statically with a cert/key generated and signed outside Consul), which makes it impossible to have TLS enabled anywhere and everywhere.

apollo13 · 2021-01-19T20:25:25Z

Hi @banks,

"If we added an additional DNS SAN for the Consul DNS name of the service would that work for you? I suspect most modern clients will support DNS SAN over the common name during validation?"

This would be massively helpful; people are currently working on supporting Consul Connect directly in Traefik (Gufran/traefik#1). Our current limitation is that Go does not provide any option to disable only hostname validation, short of disabling all TLS validation (which includes CA validation). What we can easily do is provide a server name such as service, or service.connect.consul for that matter.

The other option in Go would be to disable the built-in validation and roll your own, but rolling your own crypto validation code invites problems (and is often not possible). What most systems do allow, though, is supplying a hostname for SNI. This would be a great addition while reducing the complexity of native Connect TLS clients.

apollo13 · 2021-02-09T13:47:38Z

@banks: Since you reacted with a thumbs up… ;) The current Connect integration in Traefik is (kind of) blocked by traefik/traefik#7826, because the Traefik team would prefer an approach that does not involve completely overriding the cert validation process. The currently suggested approach would not work for us, since we still wouldn't know how to construct the SPIFFE host before getting the peer certificate (at least, that is how Consul upstream does it).

Did you get a chance to discuss internally whether a DNS SAN would be okay and, if so, whether there is any chance of a timeline? I imagine the required changes are rather minimal?

jsosulska added theme/connect Anything related to Consul Connect, Service Mesh, Side Car Proxies type/enhancement Proposed improvement or new feature labels Jun 23, 2020

jsosulska added the theme/certificates Related to creating, distributing, and rotating certificates in Consul label Jul 2, 2020

quinndiggity mentioned this issue Sep 8, 2020

Envoy Proxy breaks when enabling Consul TLS #7926

Closed
