Add documentation & flexibility to the way Consul Connect Leaf Certificate CNs are calculated #8170
Comments
Thanks for the issue @gugalnikov! To share some background - Connect treats the Common Name in certificates purely as a human-readable administrative tool. We encode all that information just to make it easy for humans who happen to be looking at certificates, for example in Vault or an AWS CA listing, to work out what that certificate is and where it was generated. Cluster ID is included so that organisations with multiple distinct Consul clusters and a shared CA have a clear indication in the name listed in CA interfaces/APIs.

As far as actual identity in the mesh goes, Connect completely ignores the Common Name and relies only on the URI SAN. We wouldn't want to make URI SANs user-configurable because it would break the security model of Connect - we are careful to only encode those facts that can be trusted because they can be encoded into the ACL tokens given out to workloads in the first place. If we allowed users to add other fields or change the identity to something we can't verify via ACLs, then Connect's entire security model is bypassed - in effect, anyone can forge any certificate they like.

It's not currently a goal of Connect to support traditional web-PKI verification of hostnames - we instead validate that the certificate chain is trusted and then validate the SPIFFE URI against the intentions specified.

It sounds like the crux of this request is to be able to also use Connect-generated certificates as more traditional Web PKI certificates that can be validated by traditional TLS clients against the hostname of the server presenting them. That's potentially possible, but it would need careful thought for the same reason as above - web-PKI hostname validation is only meaningful if the certificate hostnames requested in CSRs are actually validated in some way. If any workload can request any CN or DNS SAN to be added, then there is zero security in using hostname verification and you might as well just ask clients to skip hostname verification.

One interesting exception here is that we already allow Consul Client Agents to generate certs via Connect when using

So with that background in mind, what do you hope to achieve in security threat modelling terms by using Consul Connect certificates with external TLS systems that don't support SPIFFE? Would making the format configurable but still only allowing values that are enforceable via ACL rules be sufficient for your case? If not, how would Consul validate the Common Names chosen to prevent malicious users being able to forge whatever identity they want (at least as far as external clients that are using CN as an identity are concerned)?
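To make the point about identity living in the URI SAN concrete, here is a minimal Python sketch of how a Connect-native application might read a service identity from that URI rather than from the CN. It assumes the documented service URI layout `spiffe://<trust-domain>/ns/<namespace>/dc/<datacenter>/svc/<service>`; the `ServiceIdentity` type and the `parse_connect_spiffe_uri` function are hypothetical names for illustration, not part of Consul.

```python
from typing import NamedTuple
from urllib.parse import urlparse


class ServiceIdentity(NamedTuple):
    """Hypothetical container for the fields encoded in a Connect service URI."""
    trust_domain: str
    namespace: str
    datacenter: str
    service: str


def parse_connect_spiffe_uri(uri: str) -> ServiceIdentity:
    """Split a service SPIFFE URI into its parts.

    Assumes the layout spiffe://<trust-domain>/ns/<ns>/dc/<dc>/svc/<service>.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "spiffe" or not parsed.netloc:
        raise ValueError(f"not a SPIFFE URI: {uri!r}")

    parts = parsed.path.strip("/").split("/")
    # Expect exactly: ns/<ns>/dc/<dc>/svc/<service>
    if len(parts) != 6 or parts[0::2] != ["ns", "dc", "svc"]:
        raise ValueError(f"unexpected SPIFFE path: {parsed.path!r}")
    if not all(parts[1::2]):
        raise ValueError(f"empty path segment in: {parsed.path!r}")

    return ServiceIdentity(parsed.netloc, parts[1], parts[3], parts[5])
```

A caller would run this only after the certificate chain has been validated against the Connect CA roots, and would then check the resulting service name against intentions; the parsing alone proves nothing about trust.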
Hi, thanks for the very comprehensive answer @banks, I believe making the format configurable but still only allowing values that are enforceable via ACL rules would be quite sufficient for our use case. I understand that this cannot be completely open due to security concerns and it's also not our intention at all to break or bypass connect's trust model. What we want to achieve is the following: in our point of view, connect native integration allows us to plug-in this kind of distributed application while delegating trust & service identity to a sophisticated and scalable security model; we're interested in applying this model not only to external communication but also internal, because even though these technologies are key for us (eg. Presto, Kafka, etc.), we don't really want to drop silos or mystery boxes (security-wise) into our system which is all based on a Consul Connect architecture. So, when you have to go through a Java keystore to establish mutual TLS, then the CN becomes relevant, which doesn't mean it has to be completely customizable but a bit more deterministic in nature. This is why we use the leaf certificate and CA roots to build the JKSs, and once we are able to have a proper SSL handshake, then we of course validate the SPIFFE URI and hit the authorization endpoint to check on intentions. Our implementation intends to follow the pattern described here: https://www.consul.io/docs/connect/native , and from the documentation page describing connect leaf certificate: "This certificate is used as a server certificate for accepting inbound connections and is also used as the client certificate for establishing outbound connections to other services" , so we figured this is a proper service identity certificate we can actually leverage for the purposes described above. |
Thanks Arturo,
In order for Java's regular TLS stack to validate the connections, the CN
(or a DNS SAN) would need to match the hostname being used to resolve the
service.
Can you share how that is working for you right now? Are you using Consul
DNS or something else for addressing?
If we added an additional DNS SAN for the Consul DNS name of the service
would that work for you? I suspect most modern clients will support DNS SAN
over the common name during validation?
FYI the trust domain part is fixed for a cluster's lifetime and possible to
query from
https://www.consul.io/api-docs/connect/ca#list-ca-root-certificates (see
TrustDomain field).
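As a rough illustration of the lookup mentioned above, the following Python sketch reads the `TrustDomain` field from the CA roots response. It assumes a local agent reachable at `http://127.0.0.1:8500` with no ACL token required; both function names are hypothetical, and only the `/v1/connect/ca/roots` path and the `TrustDomain` field come from the API docs linked above.

```python
import json
from urllib.request import urlopen


def trust_domain_from_roots(payload: str) -> str:
    """Extract TrustDomain from the JSON body of the CA roots endpoint."""
    roots = json.loads(payload)
    trust_domain = roots.get("TrustDomain")
    if not trust_domain:
        raise ValueError("TrustDomain missing from CA roots response")
    return trust_domain


def fetch_trust_domain(agent_addr: str = "http://127.0.0.1:8500") -> str:
    """Query a Consul agent for the cluster trust domain.

    Assumption: the agent address is the default local one and the
    endpoint is readable without a token; adjust for your setup.
    """
    with urlopen(f"{agent_addr}/v1/connect/ca/roots") as resp:
        return trust_domain_from_roots(resp.read().decode("utf-8"))
```

Because the trust domain is fixed for the cluster's lifetime, a client could fetch it once at startup and cache it.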
Hi, yes, adding an additional DNS SAN for the Consul DNS name of the service would be brilliant; I have actually been looking at the updated standards for X.509 hostname verification (https://tools.ietf.org/search/rfc6125), and DNS SAN now seems to be the preferred option as opposed to CN. We are using Consul DNS for addressing. We've also been able to curl the trust domain from https://www.consul.io/api-docs/connect/ca#list-ca-root-certificates as you pointed out, but the resulting code is not the neatest, as it requires cutting down the first 8 characters, etc., and we also haven't been able to fetch this information using consul-template.
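The preference for DNS SANs over the CN can be sketched as follows. This is a simplified Python illustration of the RFC 6125 rule that the CN is only consulted when the certificate carries no DNS SAN at all; it is not a replacement for a real TLS library's verifier. The function names are hypothetical, and the names used in the usage note are made-up examples.

```python
from typing import Optional, Sequence


def _name_matches(pattern: str, hostname: str) -> bool:
    """Case-insensitive match; '*' is honoured only as the whole left-most label."""
    p_labels = pattern.lower().rstrip(".").split(".")
    h_labels = hostname.lower().rstrip(".").split(".")
    if len(p_labels) != len(h_labels) or not h_labels[0]:
        return False
    if p_labels[0] == "*":
        return p_labels[1:] == h_labels[1:]
    return p_labels == h_labels


def hostname_matches_cert(
    hostname: str,
    dns_sans: Sequence[str],
    common_name: Optional[str] = None,
) -> bool:
    """Check a hostname against certificate names, preferring DNS SANs.

    If any DNS SAN is present the CN is ignored entirely; the CN is
    used only as a fallback for certificates without DNS SANs.
    """
    if dns_sans:
        return any(_name_matches(san, hostname) for san in dns_sans)
    return common_name is not None and _name_matches(common_name, hostname)
```

Under this rule, a leaf certificate with a DNS SAN of `web.service.consul` would validate for a client dialing that Consul DNS name regardless of what the CN contains, which is why an added DNS SAN would cover the use case without changing the CN format.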
Please, please, please enable a path to a deterministic cluster id; it would be ideal if one could take either of the following paths from an empty, unbootstrapped Consul cluster:
or:
|
Currently there are a lot of hoops to jump through in order to have a proper air gapped root CA (or in a HSM) with I still haven't found a way to generate client tls certs if using |
Hi @banks,
This would be massively helpful; currently people are working on supporting Consul Connect directly in traefik ( Gufran/traefik#1 ). Our current limitation is that Go does not provide any option to simply disable hostname validation short of disabling all TLS validation (which includes CA validation). What we can easily do is provide a servername ala The other option in Go would be to disable the built-in validation and roll your own, but rolling your own crypto validation code begs for problems (and is often not possible). What is possible for most systems, though, is supplying a hostname for SNI. This would be a great addition while reducing the complexity of native Connect TLS clients. |
@banks: Since you reacted with a thumbs up… ;) The current connect integration to traefik is blocked (kinda) by traefik/traefik#7826 because the traefik team would prefer a way that does not involve overriding the cert validation process completely. The current suggested approach would not work for us since we still wouldn't know how to construct the SPIFFE host before getting the peer certificate (at least that is how consul upstream does it). Did you get any chance to discuss internally if a DNS SAN would be okay and if yes is there any chance to get a timeline? I imagine that the required changes are rather minimal? |
Feature Description
Context & Background
Currently, the CN on a Consul Connect Leaf Certificate requested through the /agent/connect endpoint follows a quite unique and non-standard syntax which isn't present anywhere else in the system.
One can only find a rationale for this by looking closely at the code and some of the comments attached to it (as well as by tracing back PRs, commits, etc.): https://github.com/hashicorp/consul/blob/master/agent/connect/common_names.go
The problem with this approach is that the CN cannot really be inferred at runtime (for dynamic config purposes), which can be quite limiting; it also leaves the consumer with a proprietary / opinionated implementation as the only available option.
*** It is understood that SNI (service discovery chain) is not being used here because of a 64-character constraint in the X509 spec
Suggested Features
Use Case(s)
This is extremely relevant for any application which does native integration with Consul Connect and relies on the certificate's CN for establishing mutual TLS.
One very obvious example would be Java, where mutual TLS is delegated to keystores / truststores containing the certification chain. CN is a key element in these cases, and one cannot expect every application to be fully SPIFFE / SPIRE compliant just yet.
This is the specific use case which put us in this conundrum:
https://github.com/gugalnikov/presto-consul-connect
Presto is a Java-based distributed SQL application (with a complex internal architecture) which we are integrating natively with Consul Connect. A contribution was made recently to add pluggable certificate authenticators to the tool for this purpose:
https://prestosql.io/docs/current/develop/certificate-authenticator.html
The plugin works quite well and Consul Connect can secure both internal and external communication to the Presto coordinator and workers, but the aforementioned situation with Leaf Certificate Common Names is really limiting our flexibility when it comes to dynamic provisioning, scaling, etc.