Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GEP-1897 Explicit Backend TLS Connection Configuration (was TLS from Gateway to Backend...) Update with API details #2113
GEP-1897 Explicit Backend TLS Connection Configuration (was TLS from Gateway to Backend...) Update with API details #2113
Changes from 1 commit
79deab7
7eb7dd6
31dce71
5fd598f
5d84379
a2b2dbc
b38b787
1d73d28
24dff1f
2c04f4e
ae5d524
b261560
79f818b
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we might need to at least list another option here. If we're saying that the Application Developer is the only one that can/should configure this policy, we're basically saying that the Application Developer is responsible for both:
For the validation to really be meaningful and useful, 1 and 2 feel like they should be done by different people. I get that in reality they're often configured by the same person, but I'd like to suggest that we should at least describe that potential problem here and recommend that another persona, such as the Cluster Operator may want to be involved here.
This whole line of thinking has made me wonder if any users would actually want to configure a set of trusted backend CAs per Gateway instead of per Service. I'm less sure of that one and am only raising this in case someone else has also had similar thoughts/interests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the important part here is to consider whose intent we are representing in this API. I suggest something like this:
The important part here is that the application running inside the Pods expects connections to look a certain way - the TLSConnectionPolicy object is about having a way for the application developer to explicitly describe that expectation.
We can use this way of looking at things to make calls about what should be included in the TLSConnectionPolicy.
While the Gateway as a TLS client does need to know if it should pass a client certificate, that should be configured on the Gateway, by the Gateway owner, so that the client and server have their chain of custody properly managed. (see https://github.com/kubernetes-sigs/gateway-api/pull/2113/files#r1268891063 for some more thoughts on this from me).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@robscott I have added to the API to describe an idea that if the backend requires TLS and the Gateway hasn't been configured for it, then it should be possible to use system trusted certs. I think this addresses your concern. I don't feel it is a problem for 1 and 2 to be done by the same person.
@youngnick I agree with what you've written here and can update the API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first two are already there:
That TLS should be attempted at all -> the TLSConfigPolicy is there for the targetRef.
What CA to use -> TrustedCACertRefs
I will add the TLS versions and SAN names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think one of the great things about this being a separate resource is that multiple personas can be granted access to this resource. In some cases that might be the application developer, and in others, that might be another role like cluster operator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm working on an update for this that ties in with updates for #2113 (comment) and #2113 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll be adding something like this to explain a future approach.
One idea is to use two types: ApplicationTLSConnectionPolicy,
and GatewayTLSConnectionPolicy, where the application developer is responsible for the former, the cluster operator is responsible for the latter, and the cluster operator may configure whether certain settings may be overridden by application developers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is "enabled"? this is a property of the verification. So does enabled mean we should verify it (we ought to by default or it's the equivalent of -k in curl).
If so what are we verifying? It's not just a bool.
Browsers handle this by asserting the DNS name matches the certificate, but a Kubernetes service doesn't really have as strict of a mapping.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think what "enabled" means is that a SAN (multi-domain) certificate will be accepted when true. When false, a certificate containing SANs will be rejected right away. The verification is always done (unless disabled otherwise with a different setting).
Regarding what we are verifying it's a good question: in Kubernetes (or service meshes like Istio) certificates may contain SPIFFE ids but the client needs to be provided with a mapping (service to valid SPIFFE id) to authorize the server.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SAN is the only non-deprecated way to encode a DNS name as far as I know (CN was deprecated 20 years ago). So it's not a question of if we verify San or not - we MUST.
The question is what SAN do we verify. Tou don't have to get into spife even, the pod could have any certificate..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the simplest case would be to assume svc.ns.svc.cluster.local but I am not sure that is sufficient, or even a common usage
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that SAN certs have been around for long and CN-only certs have been deprecated so this setting is not needed/useful.
Regarding "what SAN do we verify": may be that needs to be an additional field here with defaults for inferred names like "svc.ns.svc.cluster.local"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The HTTPRoute hostname is the public internet facing hostname, not the internal service. So I'm a route from example.com->example-svc, I would Not expect the workload to have a cert for example.com
it could be example-svc.ns.svc.cluster.local maybe - but there is no standard
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't that very service-mesh specific to not expect the pod to have a cert for example.com? Alternatively, with SANs, it could list example.com and example-svc.ns.svc.cluster.local.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not related to service mesh in any way. I wouldn't expect them to have a cert for example.com since the Gateway itself has that. And it's not clear why you would serve the same cert in 2 places?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be fair they could have any cert - even google.com (with their own ca, of course). So anything is possible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The internal workload on the pod can serve requests either from a Gateway or another internal workload (the mesh scenario) and in both cases is likely to serve the same certificate (unless SNI is used?) which is more likely to be example-svc and not example.com.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be better as a list that's keyed off the PortNumber
Then it's possible to have different TLS configurations for different ports in a single Service. Unless you were specifically looking to have those be distinct policies?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The guidelines are only one target ref per policy attachment. Maybe this also depends on whether we create a new type of target ref that includes port or ports.
x-ref https://github.com/kubernetes-sigs/gateway-api/pull/2113/files/5d84379d2aa570958ce6dbd3fb8cf187447e9f95#r1231594914
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To close this one out, we're adding
sectionName
to Policy TargetRef - which will allow you to target the port'sname
field.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed in the meeting, can we also add the SNI to use ? SNI is used in virtually all https connections on the internet, and allows a backend to select what cert to provide.
If we do add the SNI - could we also modify the AllowedSAN to state "if no AllowedSAN is provided, the SNI will be used to check" ( since this is also the most common scenario on the internet )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From what I understand, the SNI has to be singleton on the TLSConnectionPolicy, and is sent by the Gateway in this process, in order for the backend application to choose which of several different certificates it would provide. I don't think this is use case we need to support, as we want the TLSConnectionPolicy to be able to serve several endpoints.
i did not hear alot of support for adding SNI at the meeting. i'll open a slack discussion and find out if there is widespread support, and whether TLSConnectionPolicy is the place to support it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs more specificity on what we are supposed to do with it. Is it a client certificate? Is it a CA? Either? Both?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It depends, so it can be both. It is one or more reference to a certificate. When there is a chain of certificates, one or more can be CA/IntermediateCA.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
afaik a
kubernetes.io/tls
forces the secret to containtls.key
andtls.crt
, but thetls.key
is not required for TLS originationThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@arkodg do you have a suggestion or change for the
Support:
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we could try and relax this constraint, i.e. a
kubernetes.io/tls
is valid if atls.key
ORtls.crt
key is set.cc @robscott
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or the CA Cert can be inputted as a ConfigMap (local to the namespace), and in the case of mTLS, the client's key and cert can be inputted as a
kubernetes.io/tls
SecretThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Elsewhere I commented that we should allow/support mTLS to be originated from the gateway so I like the idea of
kubernetes.io/tls
SecretThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, I just realized an mTLS client requires 3 things: a private key, identity cert (public key) and a trust store to validate peer (server) cert. If
kubernetes.io/tls
containstls.key
andtls.crt
then we need one more thing (may be the Cluster trust bundle @howardjohn mentioned?) for full mTLS client side cert config.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@robscott
We could change this to ConfigMap for CA Certs now and add kubernetes.io/tls Secret to a different PolicyAttachment, like mTLSConnectionPolicy I mentioned earlier.
It feels out of scope for this proposal, but I can open another discussion or issue to develop a different kind of resource for certs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, that makes sense to me - a well-defined field in a ConfigMap for now with a documented plan to move to a new resource type longer term.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to ClusterTrustBundle in the long-term, but well-formatted configmap for now makes sense
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made some changes for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realize this could be done through implementation specific things, but is it possible to have an option to use the system trusted certificates? This is how most TLS configurations i have seen work - I don't have to tell chrome, curl, etc to use
/etc/ssl/certs/ca-certificates.crt
for example.This is important since these do not exist as Secrets. Rather, they are... somewhere... Every client is going to have some, but they may be in different places (there are like 8 different well-known filepaths in https://golang.org/src/crypto/x509/root_linux.go for example, so systems do not use files. etc).
One option is to make this the default if the list is unset.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a minor nit, I think implementations will essentially apply this policy to anywhere they're mapping a BackendRef that matches the combination of the TLSConnectionPolicy TargetRef and Port. This is not great, but maybe something like this would be a bit clearer, very open to improvements/changes though:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pending a decision on removing Port.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
x-ref #2113 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the target of this policy is a
Service
and as far as I know theService
will have atargetPort
(or defaulted to service port?). Can we just use that instead so this port is not needed? Otherwise how does the implementation work when thetargetPort
in theService
and thePort
here are different?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In Gateway API, a port reference is traditionally part of a Service reference (BackendRef is best example) and it refers to the Port the Service is listening on, not the targetPort (port the Service backends are listening on). The reason this is needed is that it's fairly common for Kubernetes Services to listen on multiple ports and protocols.
So given the following example, this would be referring to port 80, and specifically backendRefs that are targeting this Service on port 80.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. So in the case where a Kubernetes service listens on a single port, the
Port
field here (on line 243) has to match the service port. Put another way, the port can be inferred in such a case. Just confirming my understandingThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we care about letting policy authors use port names and not just numbers?
Similar to how K8s Service
ports.targetPort
isIntOrString
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's decide on whether we need a new object that includes Port, then we can decide on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can
ResolvedRefs
condition be also used here? since the policy addresses external references - service, secretsThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds reasonable, unless there are privacy issues. As an implementor, what kind of condition reasons would you think should be there? Should there be a general validation on the certificate so it could have an invalid cert reason? Or does this store results of an analysis of reference grants between the policy and targets?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think most implementations would only look at this if they were implementing a Service that was targeted by one of these policies, so I don't think
ResolvedRefs
would be relevant in that case, but I agree that it likely would be very useful for the secret reference(s) that will be included.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one I will pass on unless there is further interest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does an implementation do for "not supported"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my understanding, these are invalid configurations, so they wouldn't pass validation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how can they fail validation? The configuration and the route are separate objects
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is true, I didn't think it through. So for "not supported", the intention really is that this is not supported, at least not in this proposal, where we're only covering HTTPRoute.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like something that would benefit from both nested conditions (as suggested in https://github.com/kubernetes-sigs/gateway-api/pull/2113/files#r1239031972) and the SupportedFeatures GEP from @LiorLieberman. We can say that every implementation that claims supports for this policy needs to populate status on the policy describing which Routes it is being implemented for (and maybe also which Routes it is not being implemented for).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this needs to take mesh clients into account (or rather, another table for mesh). Should they be following TLSConnectionPolicy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, the mesh case is not covered in this proposal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should be accepting new APIs into this repo that don't take mesh (now a core use case of the project) into account. I am not talking about handling "mesh transport", but we can't just have no answer for what mesh client that is implementing the API should do when they API is there. Even if it's saying "should always ignore it"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I spelled it out in Longer Term Goals that service mesh use cases are a worthy goal, but it's a goal that may need a different GEP for proper attention. I hope we don't get into a habit where we approve/merge a preliminary GEP with clearly stated goals and then request to change the goals later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a small clarification to longer term goals to say that implementations that support those 3 use-cases should ignore TLSConnectionPolicy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should probably be the same as HTTP?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not proposing any other changes in behavior except for HTTPRoute, so I'm not in a hurry to make GRPCRoute have the same behavior as HTTPRoute and be changed as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That feels fine for provisional but GRPCRoute should probably be supported for implementable. Not blocking right now though IMO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would request to wait until GRPCRoute itself is no longer experimental. https://gateway-api.sigs.k8s.io/api-types/grpcroute/
Is that acceptable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this may actually be pretty straightforward and identical to HTTP as @howardjohn suggested. @gnossen can correct me here, but I think we should expect the same config from TLS from Gateway -> Backend to apply equally to HTTPRoute and GRPCRoute.
As far as the experimental status of GRPCRoute, it's hard to get right. We've gotten into situations in upstream Kubernetes where we let in a bunch of incompatible alpha features and it became quite a pain to graduate any of them past alpha. In this case with GRPCRoute being fairly stable at this point, I'd rather cover it here unless it ends up adding significant complexity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will mark it "implementation-dependent" intstead of "Yes" in the "Connect to backend with TLS?" column.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @robscott . Disregarding any issues of experimental state, I think the values for GRPCRoute should be the same as for HTTPRoute in this table.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Application can reject certificates with SANs" doesn't make sense (unless its mTLS).
The application is the one with a certificate, it wouldn't reject its own certificate.
The client is verifying the SAN in the application's certificate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about if I replace the last sentence with:
"The backend owner is the application developer, and the route owner will have to collaborate with the application developer to provide the appropriate configuration for TLS. The implementation would need to take the certificate provided by the application and verify that it satisfies the requirements of the route-as-client, including SAN information. Sometimes the backend owner and route owner are the same entity."