Add documentation & flexibility to the way Consul Connect Leaf Certificate CNs are calculated #8170

Open
gugalnikov opened this issue Jun 23, 2020 · 8 comments
Labels
theme/certificates Related to creating, distributing, and rotating certificates in Consul theme/connect Anything related to Consul Connect, Service Mesh, Side Car Proxies type/enhancement Proposed improvement or new feature

Comments

@gugalnikov

Feature Description

Context & Background

Currently, the CN on a Consul Connect Leaf Certificate requested through the /agent/connect endpoint follows a quite unique and non-standard syntax which isn't present anywhere else in the system.

One can only find a rationale for this by looking closely at the code and some of the comments attached to it (as well as by tracing back PRs, commits, etc.): https://github.com/hashicorp/consul/blob/master/agent/connect/common_names.go

The problem with this approach is that the CN cannot really be inferred at runtime (for dynamic config purposes), which can be quite limiting. It also leaves the consumer with a proprietary / opinionated implementation as the only available option.

*** It is understood that the SNI name (from the service discovery chain) is not used as the CN because of the 64-character limit on the CN in the X.509 spec.

Suggested Features
  • First and foremost, these internals should be documented and explained in detail, probably here: https://www.consul.io/docs/connect/connect-internals
  • The documentation for connect API should also link you to this very relevant information
  • There should be an API endpoint under /agent/connect which allows you to easily and reliably get the calculated CN for a specific service name
  • Besides having this as a "default" behaviour, the API should also allow consumers to provide a specific CN (or maybe SAN?) to the CSR. This optional parameter can be validated using regular expressions, etc. to make sure it is constructed adequately and in compliance with what connect internals require
  • Another idea off the top of my head is to keep this functionality when internal CA is being used, but allow greater flexibility (and decoupling) on the CSR when Vault has been configured as the CA; then Vault can perform proper validations, enforce certification chains and deal with any additional security concerns.
  • If the "cluster id" (first 8 digits of trust domain UUID) is absolutely necessary for internal functionality, then this value should also be available through the API (and visible for consul template for example).
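For readers who need to predict the CN today, here is a minimal Go sketch of what such a calculation might look like. It is not a Consul API: `assumedServiceCN` is a hypothetical helper, and both the CN shape (`<service>.svc.<first 8 characters of the trust domain>.consul`) and the sanitization rule are assumptions based on one reading of the linked common_names.go. They may not match every Consul version, so treat the output as a guess to verify against a real leaf certificate.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// ASSUMPTION: characters outside a-z and 0-9 are dropped from the service name.
var invalidChars = regexp.MustCompile(`[^a-z0-9]`)

// truncate cuts s down to at most n bytes.
func truncate(s string, n int) string {
	if len(s) > n {
		return s[:n]
	}
	return s
}

// assumedServiceCN is a hypothetical helper, not part of Consul.
// ASSUMPTION: the CN has the shape
// <service>.svc.<first 8 characters of the trust domain>.consul
func assumedServiceCN(service, trustDomain string) string {
	svc := invalidChars.ReplaceAllString(strings.ToLower(service), "")
	// 64 is the X.509 CN limit; 20 bytes are reserved for ".svc.",
	// the 8-character cluster ID and ".consul".
	return fmt.Sprintf("%s.svc.%s.consul", truncate(svc, 64-20), truncate(trustDomain, 8))
}

func main() {
	// The trust domain below is a made-up example value.
	fmt.Println(assumedServiceCN("web", "11111111-2222-3333-4444-555555555555.consul"))
}
```

The point of the sketch is only that the calculation needs two inputs, the service name and the trust domain, which is why exposing both through the API would make the CN predictable.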

Use Case(s)

This is extremely relevant for any application which does native integration with Consul Connect and relies on the certificate's CN for establishing mutual TLS.

One very obvious example would be Java, where mutual TLS is delegated to keystores / truststores containing the certification chain. CN is a key element in these cases, and one cannot expect every application to be fully SPIFFE / SPIRE compliant just yet.

This is the specific use case which put us in this conundrum:

https://github.com/gugalnikov/presto-consul-connect

Presto is a Java-based distributed SQL application (with a complex internal architecture) which we are integrating natively with Consul Connect. A contribution was made recently to add pluggable certificate authenticators to the tool for this purpose:

https://prestosql.io/docs/current/develop/certificate-authenticator.html

The plugin works quite well and Consul Connect can secure both internal and external communication to the Presto coordinator and workers, but the aforementioned situation with Leaf Certificate Common Names is really limiting our flexibility when it comes to dynamic provisioning, scaling, etc.

@jsosulska jsosulska added theme/connect Anything related to Consul Connect, Service Mesh, Side Car Proxies type/enhancement Proposed improvement or new feature labels Jun 23, 2020
@banks
Member

banks commented Jun 23, 2020

Thanks for the issue @gugalnikov!

To share some background - Connect treats the Common Name in certificates purely as a human-readable administrative tool. We encode all that information just to make it easy for humans who happen to be looking at certificates for example in Vault or an AWS CA listing to be able to work out what that certificate is and where it was generated. Cluster ID is included to ensure organisations with multiple distinct Consul clusters and a shared CA have clear indication in the name listed in CA interfaces/APIs.

As far as actual identity in the mesh is concerned, Connect completely ignores the Common Name and relies only on the URI SAN. We wouldn't want to make URI SANs user-configurable because it would break the security model of Connect: we are careful to encode only those facts that can be trusted because they can be encoded into the ACL tokens given out to workloads in the first place. If we allowed users to add other fields or change the identity to something we can't verify via ACLs, then Connect's entire security model is bypassed. In effect, anyone could forge any certificate they like.
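To illustrate identity living in the URI SAN rather than the CN, here is a minimal Go sketch that reads the SPIFFE ID from a certificate and never looks at the Common Name. `spiffeID` and `sampleCert` are hypothetical helpers, and every value in the sample certificate is made up for the demo; this is not Consul's own verification code.

```go
package main

import (
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"net/url"
)

// spiffeID returns the first spiffe:// URI SAN on the certificate.
// The Common Name is deliberately never consulted.
func spiffeID(cert *x509.Certificate) (string, bool) {
	for _, u := range cert.URIs {
		if u != nil && u.Scheme == "spiffe" {
			return u.String(), true
		}
	}
	return "", false
}

// sampleCert builds an in-memory certificate for the demo.
// ASSUMPTION: all names and IDs here are invented example values.
func sampleCert() *x509.Certificate {
	return &x509.Certificate{
		Subject: pkix.Name{CommonName: "web.svc.11111111.consul"},
		URIs: []*url.URL{{
			Scheme: "spiffe",
			Host:   "11111111-2222-3333-4444-555555555555.consul",
			Path:   "/ns/default/dc/dc1/svc/web",
		}},
	}
}

func main() {
	id, ok := spiffeID(sampleCert())
	fmt.Println(id, ok)
}
```

A certificate with a CN but no spiffe:// URI SAN yields no identity at all in this sketch, which mirrors the point that the CN carries no trust.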

It's not currently a goal of Connect to support traditional web-PKI verification of hostnames - we instead validate the certificate chain is trusted and then validate the SPIFFE URI against the intentions specified. It sounds like the crux of this request is to be able to also use Connect-generated certificates as more traditional Web PKI certificates that can be validated by traditional TLS clients against the hostname of the server presenting them etc.

That's potentially possible, but it would need careful thought for the same reason as above: web-PKI hostname validation is only meaningful if the certificate hostnames requested in CSRs are actually validated in some way. If any workload can request any CN or DNS SAN to be added, then there is zero security in using hostname verification and you might as well just ask clients to skip hostname verification.

One interesting exception here is that we already allow Consul Client Agents to generate certs via Connect when using auto_encrypt. As part of that, we allow the client agent to add an additional DNS SAN for localhost, 127.0.0.1 and any other optional hostnames that the node might be reachable on, so that regular web hostname verification will work when that cert is presented as the server cert for a TLS connection to the agent's API. The threat model there is very different to our service workloads though: only operators able to configure the client agent can change those values, and a malicious user can only "spoof" the identity of their local agent, which they already have access to anyway, so there is not so much risk.

So with that background in mind, what do you hope to achieve in security threat modelling terms by using Consul Connect certificates with external TLS systems that don't support SPIFFE? Would making the format configurable but still only allowing values that are enforceable via ACL rules be sufficient for your case? If not, how would Consul validate the Common Names chosen to prevent malicious users being able to forge whatever identity they want (at least as far as external clients that are using CN as an identity are concerned)?

@gugalnikov
Author

Hi, thanks for the very comprehensive answer @banks, I believe making the format configurable but still only allowing values that are enforceable via ACL rules would be quite sufficient for our use case. I understand that this cannot be completely open due to security concerns and it's also not our intention at all to break or bypass connect's trust model.

What we want to achieve is the following: from our point of view, Connect native integration allows us to plug in this kind of distributed application while delegating trust and service identity to a sophisticated and scalable security model. We're interested in applying this model not only to external communication but also to internal communication, because even though these technologies are key for us (e.g. Presto, Kafka, etc.), we don't really want to drop silos or mystery boxes (security-wise) into our system, which is all based on a Consul Connect architecture.

So, when you have to go through a Java keystore to establish mutual TLS, the CN becomes relevant. That doesn't mean it has to be completely customizable, just a bit more deterministic in nature. This is why we use the leaf certificate and CA roots to build the JKSs, and once we have a proper SSL handshake, we of course validate the SPIFFE URI and hit the authorization endpoint to check on intentions.

Our implementation intends to follow the pattern described here: https://www.consul.io/docs/connect/native , and the documentation page describing the Connect leaf certificate says: "This certificate is used as a server certificate for accepting inbound connections and is also used as the client certificate for establishing outbound connections to other services", so we figured this is a proper service identity certificate we can actually leverage for the purposes described above.

@banks
Member

banks commented Jun 25, 2020 via email

@gugalnikov
Author

Hi, yes, adding an additional DNS SAN for the Consul DNS name of the service would be brilliant. I have actually been taking a look at the updated standards for X.509 hostname verification (https://tools.ietf.org/search/rfc6125), and DNS SAN now seems to be the preferred option as opposed to CN.
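As a small illustration of that preference, the Go sketch below runs the standard library's hostname check against an in-memory certificate that has both a CN and a DNS SAN. It assumes Go 1.17 or later, where the CN is not considered for hostname matching. `sampleLeaf` and `matches` are hypothetical helpers, and the names are made-up examples that follow the usual `<service>.service.consul` Consul DNS pattern.

```go
package main

import (
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
)

// sampleLeaf builds an in-memory certificate for the demo.
// ASSUMPTION: both names are invented example values.
func sampleLeaf() *x509.Certificate {
	return &x509.Certificate{
		Subject:  pkix.Name{CommonName: "web.svc.11111111.consul"},
		DNSNames: []string{"web.service.consul"},
	}
}

// matches reports whether Go's standard hostname check accepts name.
func matches(cert *x509.Certificate, name string) bool {
	return cert.VerifyHostname(name) == nil
}

func main() {
	leaf := sampleLeaf()
	fmt.Println("DNS SAN accepted:", matches(leaf, "web.service.consul"))
	fmt.Println("CN accepted:", matches(leaf, "web.svc.11111111.consul"))
}
```

With such a client, a DNS SAN is the only thing that makes hostname verification succeed, regardless of how the CN is formatted.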

We are using Consul DNS for addressing. We've also been able to curl the trust domain from https://www.consul.io/api-docs/connect/ca#list-ca-root-certificates as you pointed out, but the resulting code is not the neatest, as it requires cutting out the first 8 characters, etc. We also haven't been able to fetch this information using consul template.
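For reference, the "cut out the first 8 characters" step can be sketched in Go roughly as follows. This assumes the response body of the list-CA-roots endpoint has already been fetched and that it carries a top-level `TrustDomain` field. `clusterIDFromRoots` is a hypothetical helper, and the trust domain in `main` is a made-up value.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// caRoots models only the field this sketch needs from the CA roots response.
type caRoots struct {
	TrustDomain string `json:"TrustDomain"`
}

// clusterIDFromRoots extracts the first 8 characters of the trust domain,
// which this thread refers to as the "cluster id".
func clusterIDFromRoots(body []byte) (string, error) {
	var r caRoots
	if err := json.Unmarshal(body, &r); err != nil {
		return "", err
	}
	if len(r.TrustDomain) < 8 {
		return "", errors.New("trust domain shorter than 8 characters")
	}
	return r.TrustDomain[:8], nil
}

func main() {
	// ASSUMPTION: a trimmed, invented example of the endpoint's JSON body.
	body := []byte(`{"TrustDomain":"11111111-2222-3333-4444-555555555555.consul"}`)
	id, err := clusterIDFromRoots(body)
	fmt.Println(id, err)
}
```

This still requires every consumer to know the 8-character convention, which is the part an API field or template function would remove.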

@jsosulska jsosulska added the theme/certificates Related to creating, distributing, and rotating certificates in Consul label Jul 2, 2020
@quinndiggity

Please, please, please enable a path to a deterministic cluster id; it would be ideal if one could take either of the following paths from an empty, unbootstrapped Consul cluster:

connect = {
  enabled = true
  ca_provider = "consul"
  ca_config   = {
    private_key        = "...CONSUL_CA_KEY_CONTENTS..."
    root_cert          = "...CONSUL_CA_CRT_CONTENTS..."    # <----- with a proper pathlen:0, etc to minimize risk
  }
}

or:

connect = {
  enabled = true
  ca_provider = "vault"
  ca_config   = {
    address               = "https://...:8200/"
    token                 = "..."
    root_pki_path         = "pki_consul_root/"                    # <---- with this set ahead of time with restricted pathlen, and any other compliance necessaries, etc, via `vault write pki_consul_root/config/ca pem_bundle="@root.vault.bundle.pem"`
    intermediate_pki_path = "pki_consul_intermediate/" # <---- with this set ahead of time with restricted pathlen, and any other compliance necessaries, etc, via `vault write pki_consul_intermediate/config/ca pem_bundle="@intermediate.vault.bundle.pem"`
  }
}

@quinndiggity

Currently there are a lot of hoops to jump through in order to have a proper air-gapped root CA (or one in an HSM) with pathlen:2, with Vault then acting as Consul's PKI root with pathlen:1 and the intermediate with pathlen:0.

I still haven't found a way to generate client TLS certs when using auto_encrypt and Consul's built-in CA (specifically when it is not set statically with a cert/key generated and signed outside Consul), which makes it impossible to have TLS enabled everywhere.

@apollo13
Contributor

Hi @banks,

> If we added an additional DNS SAN for the Consul DNS name of the service would that work for you? I suspect most modern clients will support DNS SAN over the common name during validation?

This would be massively helpful; currently people are working on supporting Consul Connect directly in traefik ( Gufran/traefik#1 ). Our current limitation is that Go does not provide any option to simply disable hostname validation short of disabling all TLS validation (which includes CA validation). What we can easily do is provide a server name à la service or service.connect.consul for that matter.

The other option in Go would be to disable the built-in validation and roll your own, but rolling your own crypto validation code begs for problems (and is often not possible). What is possible for most systems, though, is supplying a hostname for SNI. This would be a great addition while reducing the complexity of native Connect TLS clients.
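For context, the "roll your own" route in Go usually looks something like the sketch below: set `InsecureSkipVerify` and re-implement chain validation in `VerifyPeerCertificate`, leaving the hostname out. This is a minimal illustration under stated assumptions, not traefik's or Consul's code. `chainOnlyVerifier`, `clientConfig`, `newCert` and `demo` are hypothetical names, the certificates are throwaway ones generated in-process, and a real Connect client would additionally have to check the SPIFFE URI and intentions.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/tls"
	"crypto/x509"
	"crypto/x509/pkix"
	"errors"
	"fmt"
	"math/big"
	"time"
)

// chainOnlyVerifier returns a tls.Config.VerifyPeerCertificate callback that
// validates the peer chain against roots but performs no hostname matching.
func chainOnlyVerifier(roots *x509.CertPool) func([][]byte, [][]*x509.Certificate) error {
	return func(rawCerts [][]byte, _ [][]*x509.Certificate) error {
		if len(rawCerts) == 0 {
			return errors.New("peer presented no certificate")
		}
		inter := x509.NewCertPool()
		var leaf *x509.Certificate
		for i, raw := range rawCerts {
			c, err := x509.ParseCertificate(raw)
			if err != nil {
				return err
			}
			if i == 0 {
				leaf = c
			} else {
				inter.AddCert(c)
			}
		}
		// DNSName is left empty, which is what skips the hostname check.
		_, err := leaf.Verify(x509.VerifyOptions{Roots: roots, Intermediates: inter})
		return err
	}
}

// clientConfig wires the verifier into a TLS client config.
func clientConfig(roots *x509.CertPool) *tls.Config {
	return &tls.Config{
		InsecureSkipVerify:    true, // turns off built-in chain AND hostname checks
		VerifyPeerCertificate: chainOnlyVerifier(roots),
	}
}

// newCert is a demo-only helper: a throwaway cert, self-signed when parent is nil.
func newCert(cn string, isCA bool, parent *x509.Certificate, parentKey *ecdsa.PrivateKey) (*x509.Certificate, *ecdsa.PrivateKey) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		panic(err)
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(time.Now().UnixNano()),
		Subject:               pkix.Name{CommonName: cn},
		NotBefore:             time.Now().Add(-time.Minute),
		NotAfter:              time.Now().Add(time.Hour),
		IsCA:                  isCA,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageDigitalSignature,
	}
	if isCA {
		tmpl.KeyUsage |= x509.KeyUsageCertSign
	}
	if parent == nil {
		parent, parentKey = tmpl, key
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, parent, &key.PublicKey, parentKey)
	if err != nil {
		panic(err)
	}
	cert, err := x509.ParseCertificate(der)
	if err != nil {
		panic(err)
	}
	return cert, key
}

// demo verifies one leaf against its issuing CA and against an unrelated CA.
func demo() (trusted, untrusted error) {
	ca, caKey := newCert("demo-ca", true, nil, nil)
	other, _ := newCert("other-ca", true, nil, nil)
	leaf, _ := newCert("web.svc.11111111.consul", false, ca, caKey)
	good, bad := x509.NewCertPool(), x509.NewCertPool()
	good.AddCert(ca)
	bad.AddCert(other)
	chain := [][]byte{leaf.Raw}
	return clientConfig(good).VerifyPeerCertificate(chain, nil),
		clientConfig(bad).VerifyPeerCertificate(chain, nil)
}

func main() {
	trusted, untrusted := demo()
	fmt.Println("issuing CA:", trusted, "| unrelated CA:", untrusted)
}
```

The amount of security-sensitive code needed just to drop the hostname check is the argument for a DNS SAN: with one, the client could keep Go's default verification and only set `ServerName`.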

@apollo13
Contributor

apollo13 commented Feb 9, 2021

@banks: Since you reacted with a thumbs up… ;) The current connect integration to traefik is blocked (kinda) by traefik/traefik#7826 because the traefik team would prefer a way that does not involve overriding the cert validation process completely. The current suggested approach would not work for us since we still wouldn't know how to construct the SPIFFE host before getting the peer certificate (at least that is how consul upstream does it).

Did you get any chance to discuss internally if a DNS SAN would be okay and if yes is there any chance to get a timeline? I imagine that the required changes are rather minimal?


5 participants