New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GEP-1897 Explicit Backend TLS Connection Configuration (was TLS from Gateway to Backend...) Update with API details #2113
GEP-1897 Explicit Backend TLS Connection Configuration (was TLS from Gateway to Backend...) Update with API details #2113
Conversation
Hi @candita. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
geps/gep-1897.md
Outdated
// | ||
// This field is required for any TLSConnectionPolicyConfig. | ||
// | ||
// Support: Core - A single reference to a Kubernetes Secret of type kubernetes.io/tls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
afaik a kubernetes.io/tls
forces the secret to contain tls.key
and tls.crt
, but the tls.key
is not required for TLS origination
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@arkodg do you have a suggestion or change for the Support:
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we could try and relax this constraint, i.e. a kubernetes.io/tls
is valid if a tls.key
OR tls.crt
key is set.
cc @robscott
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or the CA Cert can be inputted as a ConfigMap (local to the namespace), and in the case of mTLS, the client's key and cert can be inputted as a kubernetes.io/tls
Secret
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Elsewhere I commented that we should allow/support mTLS to be originated from the gateway so I like the idea of kubernetes.io/tls
Secret
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, I just realized an mTLS client requires 3 things: a private key, identity cert (public key) and a trust store to validate peer (server) cert. If kubernetes.io/tls
contains tls.key
and tls.crt
then we need one more thing (may be the Cluster trust bundle @howardjohn mentioned?) for full mTLS client side cert config.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea of using something other than secret to story a CA Cert since that's not really a secret. One of the simplest paths forward would be to use a ConfigMap for CA Cert and a kubernetes.io/tls Secret for the client key and cert for mTLS.
We could change this to ConfigMap for CA Certs now and add kubernetes.io/tls Secret to a different PolicyAttachment, like mTLSConnectionPolicy I mentioned earlier.
Alternatively, I'd chatted with @enj in the past about developing a different kind of resource under the scope of Gateway API that was meant for storing certs. This could potentially help with the issue we have today where most Ingress/Gateway controllers are deployed with read access to all secrets in the cluster by default. I think the ongoing ReferenceGrant work in sig-auth may make this idea irrelevant longer term, but thought it was at least mentioning in this context.
It feels out of scope for this proposal, but I can open another discussion or issue to develop a different kind of resource for certs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, that makes sense to me - a well-defined field in a ConfigMap for now with a documented plan to move to a new resource type longer term.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to ClusterTrustBundle in the long-term, but well-formatted configmap for now makes sense
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made some changes for this.
On line 19 (I could not comment on that line), I see an incomplete sentence: "In terms of the Gateway API personas, only the application developer persona in this solution." Should it be "In terms of the Gateway API personas, only the application developer persona is considered in this solution." ? |
@sanjaypujare I changed it to "In terms of the Gateway API personas, only the application developer persona applies in this solution." |
@@ -16,7 +16,7 @@ so in order to drive resolution this GEP focuses only on this single piece of fu | |||
1. The solution must satisfy the following use case: the backend pod has its own |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking of a use-case where a backend (pod/service) is also receiving mesh traffic (in addition to traffic from the gateway) and is configured to only accept mTLS connections. Can we enhance the TLSConnectionPolicy
struct to add:
- a mode field (to specify mTLS or TLS)
- a field similar to
TrustedCACertRefs
- sayClientCertRefs
- which the client will use to provide client cert in case of mTLS
So that this API can be directly used for the mesh-case and also cases where the Gateway could or would like to provide a client-cert with an mTLS connection?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think mTLS is intentionally out of scope for this phase of the GEP, but we want to ensure we have room to add it in the future. So it's probably worth sketching out some possible future extensions for mTLS here, but I don't want to block on deciding on a solution here, just as long as we agree that we've left room for a future solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally want to keep mesh transport out of scope forever, not just short term. Note that is not necessarily the same as "mtls", though.
Mesh transport is NOT a service level concern though, its an infrastructural concern. You would not configure ipsec per service, for example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The mesh case is also intentionally out of scope for this tightly-scoped proposal. In the case we'd need to do something for mTLS, I would rather propose a new mTLSConnectionPolicy
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess a new mTLSConnectionPolicy type does not sound that bad after all although a single type like TLSConnectionPolicy supporting mTLS is preferable in which case we should make sure it's possible to add it later.
I disagree that mTLS is not a service level concern so should be left out. mTLS = TLS + client certificate. If TLS is a service level concern, then so is mTLS. Also mTLS provides client identity which is used in authz and other(?) mesh configs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @howardjohn that mesh transport is an infrastructure concern; in fact, most meshes go out of their way to make the mTLS transparent to the application. Ideally, the application developer doesn't necessarily know or care that their application has mTLS configured. IMO the only use-case for a mTLS policy in Gateway API would be application-enforced mTLS which is certainly a use-case, but not in scope for this GEP
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the issue here is one of term overloading and us again not being able to be explicit enough.
@sanjaypujare appears to be using mTLS
to mean "a connection between a Gateway and backend where the Gateway verifies the backend and the backend verifies the Gateway", or in other words a mutual TLS handshake where the certificate management is done by the owner of the service, whereas I think @keithmattix and @howardjohn are more thinking about the Service Mesh use of mTLS
to mean "a mutual auth handshake where the certificate management is performed silently and transparently by infrastructure". In the latter case, it's important that Gateway API constructs are not required, because if they are, then the certificate management is no longer transparent.
The important part here is that this GEP is not about magic infrastructure that automatically upgrades connections to TLS on your behalf, it's about an app owner being able to say that their app expects TLS and has no magic to make it work transparently.
This is why, in the Listener case, I've pushed very hard for us to call similar functionality "client certificate configuration" rather than "mTLS", because the use of the acronym can sometimes, depending on who's using it, carry a lot of extra meaning that others may not know is there.
Now, can I see a use for Client Certificate settings in the future in TLSConnectionPolicy? Possibly, but I think it's important that we keep in mind personas here. The whole idea of having mutual authentication of certificates is that it's expected that the two parties spoke independently to the Certificate Authority, and took some steps to prove that they are who they say they are, and then the Certificate Authority provides a method for the two parties to check that the CA that they both trust says the other is okay.
Having the service or backend owner provide the client certificate details completely bypasses this - at that point, there is very little gain in using TLS at all.
For Gateway API to handle the "service/backend owner is doing their own TLS", and "service or backend owner wants to validate the clients connecting to it", it's on the service or backend owner to provide a method for the Gateway owner to retrieve a certificate, and on us, as Gateway API designers, to provide a place on the Gateway to configure the client keypair that should be used (since it's both a private and a public key that are required in the client certificate case).
So, what's the summary of this rambling diatribe?
- We've excluded "mTLS" from this precisely because the fact that different people can use it to mean different things means we end up having discussions exactly like this one, over and over again.
- This GEP is about explicit TLS configuration, not the implicit, automatic TLS configuration you get with a Service Mesh or similar infrastructure tooling. Perhaps we should update some of the preamble to this effect if this makes sense to people other than me, and we can avoid yet another round of this discussion.
- Using an explicitly configured client certificate for the Gateway end of a TLS connection is a reasonable use case, but that should be configured on the Gateway, not as part of the TLSConnectionPolicy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nicely put. I'll add some of this to the preamble.
geps/gep-1897.md
Outdated
// if the connection should be TLS, the targetRef’s certificate | ||
// should be validated by the certs in TrustedCACertRefs, and a | ||
// status delivered in the response for validation failures. | ||
Port PortNumber `json:port,omitempty` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the target of this policy is a Service
and as far as I know the Service
will have a targetPort
(or defaulted to service port?). Can we just use that instead so this port is not needed? Otherwise how does the implementation work when the targetPort
in the Service
and the Port
here are different?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In Gateway API, a port reference is traditionally part of a Service reference (BackendRef is best example) and it refers to the Port the Service is listening on, not the targetPort (port the Service backends are listening on). The reason this is needed is that it's fairly common for Kubernetes Services to listen on multiple ports and protocols.
So given the following example, this would be referring to port 80, and specifically backendRefs that are targeting this Service on port 80.
apiVersion: v1
kind: Service
metadata:
name: my-service
spec:
selector:
app.kubernetes.io/name: MyApp
ports:
- protocol: TCP
port: 80
targetPort: 9376
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...it's fairly common for Kubernetes Services to listen on multiple ports and protocols.
Got it. So in the case where a Kubernetes service listens on a single port, the Port
field here (on line 243) has to match the service port. Put another way, the port can be inferred in such a case. Just confirming my understanding
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all the work on this @candita! Haven't quite made it through everything, but took an initial pass.
/ok-to-test
@@ -16,7 +16,7 @@ so in order to drive resolution this GEP focuses only on this single piece of fu | |||
1. The solution must satisfy the following use case: the backend pod has its own |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think mTLS is intentionally out of scope for this phase of the GEP, but we want to ensure we have room to add it in the future. So it's probably worth sketching out some possible future extensions for mTLS here, but I don't want to block on deciding on a solution here, just as long as we agree that we've left room for a future solution.
geps/gep-1897.md
Outdated
// if the connection should be TLS, the targetRef’s certificate | ||
// should be validated by the certs in TrustedCACertRefs, and a | ||
// status delivered in the response for validation failures. | ||
Port PortNumber `json:port,omitempty` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In Gateway API, a port reference is traditionally part of a Service reference (BackendRef is best example) and it refers to the Port the Service is listening on, not the targetPort (port the Service backends are listening on). The reason this is needed is that it's fairly common for Kubernetes Services to listen on multiple ports and protocols.
So given the following example, this would be referring to port 80, and specifically backendRefs that are targeting this Service on port 80.
apiVersion: v1
kind: Service
metadata:
name: my-service
spec:
selector:
app.kubernetes.io/name: MyApp
ports:
- protocol: TCP
port: 80
targetPort: 9376
geps/gep-1897.md
Outdated
// Port is the network port that the implementation watches to | ||
// know if the connection should be TLS and the targetRef’s | ||
// certificate should be validated by the certs in TrustedCACertRefs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a minor nit, I think implementations will essentially apply this policy to anywhere they're mapping a BackendRef that matches the combination of the TLSConnectionPolicy TargetRef and Port. This is not great, but maybe something like this would be a bit clearer, very open to improvements/changes though:
// Port is the network port that the implementation watches to | |
// know if the connection should be TLS and the targetRef’s | |
// certificate should be validated by the certs in TrustedCACertRefs | |
// Port is the network port of the target. When a target matches a BackendRef, | |
// this Policy should apply, resulting in the certificate served by the backend | |
// being validated by the certs in TrustedCACertRefs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pending a decision on removing Port.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
x-ref #2113 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
@@ -16,7 +16,7 @@ so in order to drive resolution this GEP focuses only on this single piece of fu | |||
1. The solution must satisfy the following use case: the backend pod has its own |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally want to keep mesh transport out of scope forever, not just short term. Note that is not necessarily the same as "mtls", though.
Mesh transport is NOT a service level concern though, its an infrastructural concern. You would not configure ipsec per service, for example.
geps/gep-1897.md
Outdated
// | ||
// This field is required for any TLSConnectionPolicyConfig. | ||
// | ||
// Support: Core - A single reference to a Kubernetes Secret of type kubernetes.io/tls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://github.com/kubernetes/enhancements/tree/master/keps/sig-auth/3257-trust-anchor-sets#clustertrustbundle-object ? Its in 1.27 I think, alpha though. Its also cluster scoped which is good or bad depending on how you look at it..
geps/gep-1897.md
Outdated
type TLSConnectionPolicyConfig struct { | ||
// TrustedCACertRefs contains one or more references to | ||
// Kubernetes objects that contain TLS certificates, which are | ||
// used to establish a TLS handshake between the gateway and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs more specificity on what we are supposed to do with it. Is it a client certificate? Is it a CA? Either? Both?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It depends, so it can be both. It is one or more reference to a certificate. When there is a chain of certificates, one or more can be CA/IntermediateCA.
geps/gep-1897.md
Outdated
| TLSRoute | Listener Mode: Passthrough | Yes | No | | ||
| TLSRoute | Listener Mode: Terminate | Yes | Not supported | | ||
| TLSRoute | Listener Mode: Passthrough | No | No | | ||
| TLSRoute | Listener Mode: Terminate | No | No | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this needs to take mesh clients into account (or rather, another table for mesh). Should they be following TLSConnectionPolicy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, the mesh case is not covered in this proposal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should be accepting new APIs into this repo that don't take mesh (now a core use case of the project) into account. I am not talking about handling "mesh transport", but we can't just have no answer for what mesh client that is implementing the API should do when they API is there. Even if it's saying "should always ignore it"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I spelled it out in Longer Term Goals that service mesh use cases are a worthy goal, but it's a goal that may need a different GEP for proper attention. I hope we don't get into a habit where we approve/merge a preliminary GEP with clearly stated goals and then request to change the goals later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a small clarification to longer term goals to say that implementations that support those 3 use-cases should ignore TLSConnectionPolicy?
geps/gep-1897.md
Outdated
This table describes the effect that a TLSConnectionPolicy has on a Route. There are only two cases where the | ||
TLSConnectionPolicy will signal a Route to connect to a backend using TLS, an HTTPRoute with a backend that is targeted | ||
by a TLSConnectionPolicy, either with or without listener TLS configured. (There are a few other cases where it may be | ||
possible, but is purposely marked “not supported” due to a desire for less confusion on the assigned purpose of each of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does an implementation do for "not supported"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my understanding, these are invalid configurations, so they wouldn't pass validation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how can they fail validation? The configuration and the route are separate objects
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is true, I didn't think it through. So for "not supported", the intention really is that this is not supported, at least not in this proposal, where we're only covering HTTPRoute.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like something that would benefit from both nested conditions (as suggested in https://github.com/kubernetes-sigs/gateway-api/pull/2113/files#r1239031972) and the SupportedFeatures GEP from @LiorLieberman. We can say that every implementation that claims supports for this policy needs to populate status on the policy describing which Routes it is being implemented for (and maybe also which Routes it is not being implemented for).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about,
Every implementation that claims supports for TLSConnectionPolicy should document for which Routes it is being implemented.
geps/gep-1897.md
Outdated
| UDPRoute | No listener TLS | N/A | No | | ||
| UDPRoute | Listener TLS | N/A | No | | ||
| UDPRoute | No listener TLS | N/A | No | | ||
| GRPCRoute | N/A | N/A | No | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should probably be the same as HTTP?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not proposing any other changes in behavior except for HTTPRoute, so I'm not in a hurry to make GRPCRoute have the same behavior as HTTPRoute and be changed as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That feels fine for provisional but GRPCRoute should probably be supported for implementable. Not blocking right now though IMO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would request to wait until GRPCRoute itself is no longer experimental. https://gateway-api.sigs.k8s.io/api-types/grpcroute/
Is that acceptable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this may actually be pretty straightforward and identical to HTTP as @howardjohn suggested. @gnossen can correct me here, but I think we should expect the same config from TLS from Gateway -> Backend to apply equally to HTTPRoute and GRPCRoute.
As far as the experimental status of GRPCRoute, it's hard to get right. We've gotten into situations in upstream Kubernetes where we let in a bunch of incompatible alpha features and it became quite a pain to graduate any of them past alpha. In this case with GRPCRoute being fairly stable at this point, I'd rather cover it here unless it ends up adding significant complexity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will mark it "implementation-dependent" intstead of "Yes" in the "Connect to backend with TLS?" column.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @robscott . Disregarding any issues of experimental state, I think the values for GRPCRoute should be the same as for HTTPRoute in this table.
geps/gep-1897.md
Outdated
these are currently implementation-dependent, with the following recommended defaults: | ||
- Server Name Indication: enables passing of the server name through server name indication, in the TLS transaction, to | ||
assist with selection of certificates when several hosts share the same IP address. (default to enabled) | ||
- Subject Alternative Name certificates: enable the use of a single certificate that can serve multiple domains. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is "enabled"? this is a property of the verification. So does enabled mean we should verify it (we ought to by default or it's the equivalent of -k in curl).
If so what are we verifying? It's not just a bool.
Browsers handle this by asserting the DNS name matches the certificate, but a Kubernetes service doesn't really have as strict of a mapping.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think what "enabled" means is that a SAN (multi-domain) certificate will be accepted when true. When false, a certificate containing SANs will be rejected right away. The verification is always done (unless disabled otherwise with a different setting).
Regarding what we are verifying it's a good question: in Kubernetes (or service meshes like Istio) certificates may contain SPIFFE ids but the client needs to be provided with a mapping (service to valid SPIFFE id) to authorize the server.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SAN is the only non-deprecated way to encode a DNS name as far as I know (CN was deprecated 20 years ago). So it's not a question of if we verify San or not - we MUST.
The question is what SAN do we verify. Tou don't have to get into spife even, the pod could have any certificate..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the simplest case would be to assume svc.ns.svc.cluster.local but I am not sure that is sufficient, or even a common usage
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that SAN certs have been around for long and CN-only certs have been deprecated so this setting is not needed/useful.
Regarding "what SAN do we verify": may be that needs to be an additional field here with defaults for inferred names like "svc.ns.svc.cluster.local"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The HTTPRoute hostname is the public internet facing hostname, not the internal service. So I'm a route from example.com->example-svc, I would Not expect the workload to have a cert for example.com
it could be example-svc.ns.svc.cluster.local maybe - but there is no standard
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't that very service-mesh specific to not expect the pod to have a cert for example.com? Alternatively, with SANs, it could list example.com and example-svc.ns.svc.cluster.local.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not related to service mesh in any way. I wouldn't expect them to have a cert for example.com since the Gateway itself has that. And it's not clear why you would serve the same cert in 2 places?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be fair they could have any cert - even google.com (with their own ca, of course). So anything is possible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The internal workload on the pod can serve requests either from a Gateway or another internal workload (the mesh scenario) and in both cases is likely to serve the same certificate (unless SNI is used?) which is more likely to be example-svc and not example.com.
Co-authored-by: Rob Scott <rob.scott87@gmail.com>
Co-authored-by: Rob Scott <rob.scott87@gmail.com>
12a6b24
to
5d84379
Compare
// | ||
// * “Accepted” | ||
// | ||
// Possible reasons for this condition to be False are: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can ResolvedRefs
condition be also used here? since the policy addresses external references - service, secrets
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds reasonable, unless there are privacy issues. As an implementor, what kind of condition reasons would you think should be there? Should there be a general validation on the certificate so it could have an invalid cert reason? Or does this store results of an analysis of reference grants between the policy and targets?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think most implementations would only look at this if they were implementing a Service that was targeted by one of these policies, so I don't think ResolvedRefs
would be relevant in that case, but I agree that it likely would be very useful for the secret reference(s) that will be included.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one I will pass on unless there is further interest.
geps/gep-1897.md
Outdated
TargetRef gatewayv1a2.PolicyTargetReference `json:"targetRef"` | ||
|
||
// TLS contains TLS connection policy configuration. | ||
TLS *TLSConnectionPolicyConfig `json:”tls”` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be better as a list that's keyed off the PortNumber
Then it's possible to have different TLS configurations for different ports in a single Service. Unless you were specifically looking to have those be distinct policies?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The guidelines are only one target ref per policy attachment. Maybe this also depends on whether we create a new type of target ref that includes port or ports.
x-ref https://github.com/kubernetes-sigs/gateway-api/pull/2113/files/5d84379d2aa570958ce6dbd3fb8cf187447e9f95#r1231594914
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To close this one out, we're adding sectionName
to Policy TargetRef - which will allow you to target the port's name
field.
geps/gep-1897.md
Outdated
// if the connection should be TLS, the targetRef’s certificate | ||
// should be validated by the certs in TrustedCACertRefs, and a | ||
// status delivered in the response for validation failures. | ||
Port PortNumber `json:port,omitempty` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we care about letting policy authors use port names and not just numbers?
Similar to how K8s Service ports.targetPort
is IntOrString
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's decide on whether we need a new object that includes Port, then we can decide on this.
The goal here is to start with the simplest possible API and also ensure portability as broadly as possible. As I mentioned in my 4th point, this is going to be the first API we've introduced where partial support is unlikely to suffice. Implementations will likely either need to fully implement the resource or consider the backend invalid. I think our migration path is also a bit simpler than what Contour would be dealing with. In a future world we might have something like this:
|
@robscott sounds good - |
Rob and I have spoken about this, and I agree with these points - except that I think that the default value for |
It looks like the existing |
I'm going to rename the Edit - I changed this a little, adding CertRefs and CACertRefs in the place of trustedCACertRefs. Otherwise, I adopted this suggestion. |
@robscott @youngnick @costinm, I Just pushed requested changes, PTAL when you get a chance.
|
geps/gep-1897.md
Outdated
The names of the fields were chosen to facilitate discussion, but may be substituted without blocking acceptance of the | ||
content of the API change. | ||
|
||
The `TrustedCA` contains the `CertRefs` and `CACertRefs` fields. CertRefs is a slice of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know, naming is hard, but I think it's important that the field currently called CACertRefs
does not end with Refs
- that is reserved in the API conventions for structs that contain a list of references to objects.
How about instead of CACertRefs
, we have UseExtraCerts
- I suggest this because it has two possible values (YAML shown for ease of typing):
trustedCAs.useExtraCerts: ""
(or unspecified) - don't use any extra certificates for Trusted CAs.trustedCAs.useExtraCerts: "system"
- use the system certificates as extra trusted CAs.
This name implies that it's possible to use CertRefs
with UseExtraCerts
- which I think has undefined behavior at this time.
Other names I can think of:
AdditionalCerts
(YAMLtrustedCAs.additionalCerts
) Also implies a merging behavior betweenCertRefs
andAdditionalCerts
.AlternativeCerts
(YAMLtrustedCAs.alternativeCerts
) Implies that you can only choose eitherCertRefs
orAdditionalCerts
- they are mutually exclusive.
Whatever name we choose, we need to ensure that the behavior when you set both is defined. ("That's a syntax error" is okay, as is "the lists must be merged" or other possible answers, we just need to pick one, which we can also mark as temporary - given that this is Provisional
.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I've struggled with naming on this one as well. Of course I'd originally proposed wellKnownRoots
but obviously that has it's flaws, and not all systems are necessarily going to be using "well-known" values. So maybe we should just narrow down the scope of this field a bit. It seems very unlikely that Kubernetes or Gateway API will ever try to maintain a common set of certificates that should be used for this purpose, so we can likely say that whatever values are supported here other than none/empty, are going to be at least somewhat implementation-specific.
I think we should use the following guidelines when choosing a name here:
- Avoid a name that locks us into whether/how this combines with
certRefs
. My personal opinion is that we should start with validation that ensures that onlycertRefs
or this field can be set, and we can loosen that later if needed. I think this would rule outadditional
,alternative
, orextra
. - Avoid a boolean value so we're not conflicting with the relevant Kubernetes API Conventions.
- Building on 2, allow for some theoretical room for expansion. I think the most likely value here would be domain-prefixed values if for some reasons implementations want to support predefined sets of CAs. Even this seems highly unlikely at this point, but want to at least leave room for it so we're not boxed in years from now.
With all that said, I'm struggling to actually find a good name that meets this criteria. Some more ideas:
A. trustedCAs.systemCerts: "System"
B. trustedCAs.predefinedCerts: "System"
C. trustedCAs.standardCerts: "System"
As far as whether the empty value should be explicit ("None") or just empty (nil pointer or ""), I tend to lean towards a default of "None" so there's absolutely no ambiguity here. If we'd rather avoid "None", I think I'd lean towards a nil pointer instead of "", but open to what others think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@youngnick I changed the name of CACertRefs
, didn't realize the Refs suffix had a semantic association. Should I also change the name of CertRefs
, which a slice of ConfigMapNames, or is that fitting?
@robscott and I had this conversation last night and I originally proposed alternativeCerts
(amongst others) but I get Rob's point about preferring to use a name that doesn't lock us into how this field combines with certRefs
. I chose standardCerts
. certRefs
and standardCerts
are intended to be mutually exclusive in the first round.
Regarding using a nil pointer instead of "" for standardCerts
, I'd rather leave it "" so that the field is visible when yaml is examined. If we don't normally do that for our users, I can make it a pointer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I missed that CertRefs
is a slice of strings that are ConfigMap names. Yes, that should be a []ObjectReference
instead, with a default of Configmap as the Kind (which puts "", meaning "core" as the Group). That way, you can specify a list of certificates in Configmaps like this:
...
certRefs:
- name: CAcert1
- name: CAcert2
...
or, you can also specify other objects like Secrets if you want. Anything not a ConfigMap should be Implementation Specific support in the initial version though. This lets us change that much more easily lately. (Practically, to be able to set the defaults, you'll need to create another type like SecretObjectReference
- probably ConfigmapObjectReference
).
Using a nil pointer or "" as the default value makes no difference for display - because the field is optional and will have omitempty
set, then a nil pointer (for a *string
type) or an empty string (for a string
type) will both be not necessary in input YAMLs, and hidden in output YAMLs. The only way you'll see the standardCerts
field (which I agree is a good name) is if it's set to system
. However, if you do set standardCerts: system
, then the CertRefs
field also must be empty in this version, so it will also be hidden on output.
So you'll only ever see:
...
certRefs:
- name: CAcert1
...
or
...
standardCerts: "system"
...
Which I think is what we want.
However, there's also an interaction with Enum
validation, in that we can't specify ""
as an Enum value. So, I recommend changing standardCerts
to a *string
, because then we can have an Enum with only one value (system
).
This lets us add further Enum values later, while keeping the above behavior.
So, my requested changes here are:
CertRefs
should be a[]ConfigmapObjectReference
, where that is an ObjectReference struct that defaults to Configmap.standardCerts
should be a string pointer field with Enum validation, and one valid value:system
(orSystem
if that's what we prefer - personally, I prefer these constants to be all lower-case, it makes validation easier).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@youngnick we discussed offline, and I made the requested change to CertRefs
and changed StandardCerts
like this:
- standardCerts should be a
StandardCertType
pointer field with Enum validation, and one valid value: System. An informal review showed that alias string values usually start with a capital letter in this API, particularly condition types and reasons, so I used System rather than system.
geps/gep-1897.md
Outdated
// If the CertRefs is empty, then the system trusted | ||
// certificates should be used. If there are none, or the | ||
// implementation doesn't define system trusted certificates, | ||
// then a TLS connection must fail. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should remove this because this behavior is now handled by the other field (CACertRefs
here, but as I said above, I think it needs a new name).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a user could accidentally try to use System certs but the system doesn't have them, so I think it should stay in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Github, for being so slow I duplicated my comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have one concern about field naming, and a small fix to the struct details, but aside from that, this LGTM.
@youngnick @robscott I made the updates you suggesed and added a couple of CEL validations to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all the work on this @candita! A couple tiny nits and a suggestion to move back to a backend focused policy name but otherwise I think this is good to go.
/label tide/merge-method-squash |
🎉 Thanks for all the work on this @candita! /lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: candita, keithmattix, robscott The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
🎉 Thanks for all your work on this @candita! |
What type of PR is this?
/kind gep
What this PR does / why we need it:
This PR adds the API details for GEP-1897 TLS from Gateway to Backend for ingress. GEP-1897 has already had the questions answered - what do we want to do, and why do we want to do this. This update attempts to answer - how do we do this.
Which issue(s) this PR fixes:
Fixes #1897
Does this PR introduce a user-facing change?: