GEP-1897 Explicit Backend TLS Connection Configuration (was TLS from Gateway to Backend...) Update with API details #2113

candita · 2023-06-14T01:22:52Z

What type of PR is this?

/kind gep

What this PR does / why we need it:

This PR adds the API details for GEP-1897 (TLS from Gateway to Backend for ingress). GEP-1897 has already answered the questions of what we want to do and why we want to do it. This update attempts to answer how we do it.

Which issue(s) this PR fixes:

Fixes #1897

Does this PR introduce a user-facing change?:

NONE

k8s-ci-robot · 2023-06-14T01:23:02Z

Hi @candita. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

arkodg · 2023-06-14T02:18:26Z

geps/gep-1897.md

+    // 
+    // This field is required for any TLSConnectionPolicyConfig.
+    // 
+    // Support: Core - A single reference to a Kubernetes Secret of type kubernetes.io/tls


AFAIK a Secret of type kubernetes.io/tls must contain both tls.key and tls.crt, but tls.key is not required for TLS origination.

@arkodg do you have a suggestion or change for the Support:?

maybe we could try and relax this constraint, i.e. a kubernetes.io/tls is valid if a tls.key OR tls.crt key is set.
cc @robscott

or the CA Cert can be inputted as a ConfigMap (local to the namespace), and in the case of mTLS, the client's key and cert can be inputted as a kubernetes.io/tls Secret

Elsewhere I commented that we should allow/support mTLS to be originated from the gateway so I like the idea of kubernetes.io/tls Secret

BTW, I just realized an mTLS client requires 3 things: a private key, an identity cert (public key), and a trust store to validate the peer (server) cert. If kubernetes.io/tls contains tls.key and tls.crt, then we need one more thing (maybe the cluster trust bundle @howardjohn mentioned?) for full mTLS client-side cert config.

@robscott

I like the idea of using something other than a Secret to store a CA cert, since that's not really a secret. One of the simplest paths forward would be to use a ConfigMap for the CA cert and a kubernetes.io/tls Secret for the client key and cert for mTLS.

We could change this to ConfigMap for CA Certs now and add kubernetes.io/tls Secret to a different PolicyAttachment, like mTLSConnectionPolicy I mentioned earlier.

Alternatively, I'd chatted with @enj in the past about developing a different kind of resource under the scope of Gateway API that was meant for storing certs. This could potentially help with the issue we have today where most Ingress/Gateway controllers are deployed with read access to all secrets in the cluster by default. I think the ongoing ReferenceGrant work in sig-auth may make this idea irrelevant longer term, but thought it was at least worth mentioning in this context.

It feels out of scope for this proposal, but I can open another discussion or issue to develop a different kind of resource for certs.

Yep, that makes sense to me - a well-defined field in a ConfigMap for now with a documented plan to move to a new resource type longer term.

+1 to ClusterTrustBundle in the long-term, but well-formatted configmap for now makes sense

Made some changes for this.
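As an illustration of "a well-formatted ConfigMap", the sketch below reads a single key from the ConfigMap's data and requires it to hold at least one PEM-encoded certificate. The key name `ca.crt` and the helper names are assumptions; the thread does not settle on a key name. `selfSignedCAPEM` exists only to make the example runnable.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"errors"
	"fmt"
	"math/big"
	"time"
)

// caBundleKey is a hypothetical well-known key in the ConfigMap's data.
const caBundleKey = "ca.crt"

// loadCABundle turns ConfigMap data (modeled as map[string]string, like
// ConfigMap.Data) into a certificate pool, rejecting missing or non-PEM data.
func loadCABundle(data map[string]string) (*x509.CertPool, error) {
	bundle, ok := data[caBundleKey]
	if !ok {
		return nil, fmt.Errorf("ConfigMap has no %q key", caBundleKey)
	}
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM([]byte(bundle)) {
		return nil, errors.New("no PEM-encoded certificates found in CA bundle")
	}
	return pool, nil
}

// selfSignedCAPEM generates a throwaway CA certificate for the demo only.
func selfSignedCAPEM() (string, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return "", err
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: "demo-ca"},
		NotBefore:             time.Now().Add(-time.Hour),
		NotAfter:              time.Now().Add(time.Hour),
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return "", err
	}
	return string(pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der})), nil
}

func main() {
	caPEM, err := selfSignedCAPEM()
	if err != nil {
		panic(err)
	}
	_, err = loadCABundle(map[string]string{caBundleKey: caPEM})
	fmt.Println("valid bundle accepted:", err == nil)
	_, err = loadCABundle(map[string]string{caBundleKey: "not a cert"})
	fmt.Println("garbage rejected:", err != nil)
}
```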


sanjaypujare · 2023-06-14T17:26:00Z

On line 19 (I could not comment on that line), I see an incomplete sentence: "In terms of the Gateway API personas, only the application developer persona in this solution."

Should it be "In terms of the Gateway API personas, only the application developer persona is considered in this solution." ?

candita · 2023-06-15T14:31:10Z

On line 19 (I could not comment on that line), I see an incomplete sentence: "In terms of the Gateway API personas, only the application developer persona in this solution."

Should it be "In terms of the Gateway API personas, only the application developer persona is considered in this solution." ?

@sanjaypujare I changed it to "In terms of the Gateway API personas, only the application developer persona applies in this solution."

sanjaypujare · 2023-06-15T21:38:54Z

geps/gep-1897.md

@@ -16,7 +16,7 @@ so in order to drive resolution this GEP focuses only on this single piece of fu
 1. The solution must satisfy the following use case: the backend pod has its own


I was thinking of a use-case where a backend (pod/service) is also receiving mesh traffic (in addition to traffic from the gateway) and is configured to only accept mTLS connections. Can we enhance the TLSConnectionPolicy struct to add:

a mode field (to specify mTLS or TLS)

a field similar to TrustedCACertRefs - say ClientCertRefs - which the client will use to provide client cert in case of mTLS

So that this API can be directly used for the mesh-case and also cases where the Gateway could or would like to provide a client-cert with an mTLS connection?

I think mTLS is intentionally out of scope for this phase of the GEP, but we want to ensure we have room to add it in the future. So it's probably worth sketching out some possible future extensions for mTLS here, but I don't want to block on deciding on a solution here, just as long as we agree that we've left room for a future solution.

I personally want to keep mesh transport out of scope forever, not just short term. Note that is not necessarily the same as "mtls", though.

Mesh transport is NOT a service-level concern, though; it's an infrastructural concern. You would not configure IPsec per service, for example.

The mesh case is also intentionally out of scope for this tightly-scoped proposal. In the case we'd need to do something for mTLS, I would rather propose a new mTLSConnectionPolicy.

I guess a new mTLSConnectionPolicy type does not sound that bad after all, although a single type like TLSConnectionPolicy supporting mTLS is preferable, in which case we should make sure it's possible to add it later.

I disagree that mTLS is not a service level concern so should be left out. mTLS = TLS + client certificate. If TLS is a service level concern, then so is mTLS. Also mTLS provides client identity which is used in authz and other(?) mesh configs.

I agree with @howardjohn that mesh transport is an infrastructure concern; in fact, most meshes go out of their way to make the mTLS transparent to the application. Ideally, the application developer doesn't necessarily know or care that their application has mTLS configured. IMO the only use case for an mTLS policy in Gateway API would be application-enforced mTLS, which is certainly a use case, but not in scope for this GEP.

I think the issue here is one of term overloading and us again not being able to be explicit enough.

@sanjaypujare appears to be using mTLS to mean "a connection between a Gateway and backend where the Gateway verifies the backend and the backend verifies the Gateway", or in other words a mutual TLS handshake where the certificate management is done by the owner of the service, whereas I think @keithmattix and @howardjohn are more thinking about the Service Mesh use of mTLS to mean "a mutual auth handshake where the certificate management is performed silently and transparently by infrastructure". In the latter case, it's important that Gateway API constructs are not required, because if they are, then the certificate management is no longer transparent.

The important part here is that this GEP is not about magic infrastructure that automatically upgrades connections to TLS on your behalf, it's about an app owner being able to say that their app expects TLS and has no magic to make it work transparently.

This is why, in the Listener case, I've pushed very hard for us to call similar functionality "client certificate configuration" rather than "mTLS", because the use of the acronym can sometimes, depending on who's using it, carry a lot of extra meaning that others may not know is there.

Now, can I see a use for Client Certificate settings in the future in TLSConnectionPolicy? Possibly, but I think it's important that we keep in mind personas here. The whole idea of having mutual authentication of certificates is that it's expected that the two parties spoke independently to the Certificate Authority, and took some steps to prove that they are who they say they are, and then the Certificate Authority provides a method for the two parties to check that the CA that they both trust says the other is okay.

Having the service or backend owner provide the client certificate details completely bypasses this - at that point, there is very little gain in using TLS at all.

For Gateway API to handle the "service/backend owner is doing their own TLS", and "service or backend owner wants to validate the clients connecting to it", it's on the service or backend owner to provide a method for the Gateway owner to retrieve a certificate, and on us, as Gateway API designers, to provide a place on the Gateway to configure the client keypair that should be used (since it's both a private and a public key that are required in the client certificate case).

So, what's the summary of this rambling diatribe?

We've excluded "mTLS" from this precisely because the fact that different people can use it to mean different things means we end up having discussions exactly like this one, over and over again.

This GEP is about explicit TLS configuration, not the implicit, automatic TLS configuration you get with a Service Mesh or similar infrastructure tooling. Perhaps we should update some of the preamble to this effect if this makes sense to people other than me, and we can avoid yet another round of this discussion.

Using an explicitly configured client certificate for the Gateway end of a TLS connection is a reasonable use case, but that should be configured on the Gateway, not as part of the TLSConnectionPolicy.

Nicely put. I'll add some of this to the preamble.

sanjaypujare · 2023-06-15T21:55:16Z

geps/gep-1897.md

+    // if the connection should be TLS, the targetRef’s certificate
+    // should be validated by the certs in TrustedCACertRefs, and a
+    // status delivered in the response for validation failures.
+    Port PortNumber `json:port,omitempty`


So the target of this policy is a Service and as far as I know the Service will have a targetPort (or defaulted to service port?). Can we just use that instead so this port is not needed? Otherwise how does the implementation work when the targetPort in the Service and the Port here are different?

In Gateway API, a port reference is traditionally part of a Service reference (BackendRef is best example) and it refers to the Port the Service is listening on, not the targetPort (port the Service backends are listening on). The reason this is needed is that it's fairly common for Kubernetes Services to listen on multiple ports and protocols.

So given the following example, this would be referring to port 80, and specifically backendRefs that are targeting this Service on port 80.

apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  selector:
    app.kubernetes.io/name: MyApp
  ports:
    - protocol: TCP
      port: 80
      targetPort: 9376
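Put as code, the policy's port matches the Service `port` used by a backendRef (80 in the example above), never the `targetPort` (9376). A minimal sketch, assuming hypothetical pared-down `BackendRef` and `PolicyTarget` structs rather than the real Gateway API types:

```go
package main

import "fmt"

// BackendRef is a hypothetical, pared-down stand-in for a Route backendRef.
type BackendRef struct {
	Namespace, Name string
	Port            int32 // the Service port, e.g. 80
}

// PolicyTarget is a hypothetical stand-in for the policy's targetRef + Port.
type PolicyTarget struct {
	Namespace, Name string
	Port            int32
}

// policyApplies reports whether the policy governs traffic sent to ref.
// Matching uses the Service port the backendRef names; the Service's
// targetPort is never consulted.
func policyApplies(p PolicyTarget, ref BackendRef) bool {
	return p.Namespace == ref.Namespace && p.Name == ref.Name && p.Port == ref.Port
}

func main() {
	policy := PolicyTarget{Namespace: "default", Name: "my-service", Port: 80}
	fmt.Println(policyApplies(policy, BackendRef{Namespace: "default", Name: "my-service", Port: 80}))   // true
	fmt.Println(policyApplies(policy, BackendRef{Namespace: "default", Name: "my-service", Port: 9376})) // false: 9376 is the targetPort
}
```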

...it's fairly common for Kubernetes Services to listen on multiple ports and protocols.

Got it. So in the case where a Kubernetes Service listens on a single port, the Port field here (on line 243) has to match the Service port. Put another way, the port can be inferred in such a case. Just confirming my understanding.
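The inference described here could be sketched as follows, assuming the policy's port is optional (0 meaning unset) and is inferred only when the Service exposes exactly one port. `resolvePolicyPort` is hypothetical, and this is one reading of the question rather than behavior the GEP specifies.

```go
package main

import (
	"errors"
	"fmt"
)

// resolvePolicyPort returns the Service port a policy applies to.
// servicePorts holds the spec.ports[].port values of the target Service.
func resolvePolicyPort(policyPort int32, servicePorts []int32) (int32, error) {
	if policyPort != 0 {
		// An explicit port must be one the Service actually exposes.
		for _, p := range servicePorts {
			if p == policyPort {
				return policyPort, nil
			}
		}
		return 0, fmt.Errorf("port %d is not exposed by the Service", policyPort)
	}
	if len(servicePorts) == 1 {
		return servicePorts[0], nil // unambiguous: infer the only port
	}
	return 0, errors.New("port must be set when the Service exposes zero or several ports")
}

func main() {
	fmt.Println(resolvePolicyPort(0, []int32{80}))
	fmt.Println(resolvePolicyPort(0, []int32{80, 443}))
}
```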

robscott

Thanks for all the work on this @candita! Haven't quite made it through everything, but took an initial pass.

/ok-to-test

robscott · 2023-06-15T22:12:41Z

geps/gep-1897.md


robscott · 2023-06-15T22:22:17Z

geps/gep-1897.md


robscott · 2023-06-15T22:30:47Z

geps/gep-1897.md

+    // Port is the network port that the implementation watches to
+    // know if the connection should be TLS and the targetRef’s
+    // certificate should be validated by the certs in TrustedCACertRefs


As a minor nit, I think implementations will essentially apply this policy to anywhere they're mapping a BackendRef that matches the combination of the TLSConnectionPolicy TargetRef and Port. This is not great, but maybe something like this would be a bit clearer, very open to improvements/changes though:

Suggested change:

-    // Port is the network port that the implementation watches to
-    // know if the connection should be TLS and the targetRef’s
-    // certificate should be validated by the certs in TrustedCACertRefs
+    // Port is the network port of the target. When a target matches a BackendRef,
+    // this Policy should apply, resulting in the certificate served by the backend
+    // being validated by the certs in TrustedCACertRefs.

Pending a decision on removing Port.

x-ref #2113 (comment)


howardjohn

Thanks!

howardjohn · 2023-06-15T22:48:52Z

geps/gep-1897.md


howardjohn · 2023-06-15T22:54:28Z

geps/gep-1897.md

+    // 
+    // This field is required for any TLSConnectionPolicyConfig.
+    // 
+    // Support: Core - A single reference to a Kubernetes Secret of type kubernetes.io/tls


https://github.com/kubernetes/enhancements/tree/master/keps/sig-auth/3257-trust-anchor-sets#clustertrustbundle-object ? It's in 1.27 I think, alpha though. It's also cluster-scoped, which is good or bad depending on how you look at it.


howardjohn · 2023-06-15T22:57:09Z

geps/gep-1897.md

+type TLSConnectionPolicyConfig struct {
+    // TrustedCACertRefs contains one or more references to
+    // Kubernetes objects that contain TLS certificates, which are
+    // used to establish a TLS handshake between the gateway and


This needs more specificity on what we are supposed to do with it. Is it a client certificate? Is it a CA? Either? Both?

It depends, so it can be both. It is one or more references to a certificate. When there is a chain of certificates, one or more can be a CA or intermediate CA.

howardjohn · 2023-06-15T22:59:52Z

geps/gep-1897.md

+| TLSRoute   | Listener Mode: Passthrough | Yes                                           | No                            |
+| TLSRoute   | Listener Mode: Terminate   | Yes                                           | Not supported                 |
+| TLSRoute   | Listener Mode: Passthrough | No                                            | No                            |
+| TLSRoute   | Listener Mode: Terminate   | No                                            | No                            |


I think this needs to take mesh clients into account (or rather, another table for mesh). Should they be following TLSConnectionPolicy?

No, the mesh case is not covered in this proposal.

I don't think we should be accepting new APIs into this repo that don't take mesh (now a core use case of the project) into account. I am not talking about handling "mesh transport", but we can't just have no answer for what a mesh client implementing the API should do when the API is present, even if the answer is "always ignore it".

I spelled it out in Longer Term Goals that service mesh use cases are a worthy goal, but it's a goal that may need a different GEP for proper attention. I hope we don't get into a habit where we approve/merge a preliminary GEP with clearly stated goals and then request to change the goals later.

Can we add a small clarification to longer term goals to say that implementations that support those 3 use-cases should ignore TLSConnectionPolicy?

howardjohn · 2023-06-15T23:00:36Z

geps/gep-1897.md

+This table describes the effect that a TLSConnectionPolicy has on a Route.  There are only two cases where the
+TLSConnectionPolicy will signal a Route to connect to a backend using TLS, an HTTPRoute with a backend that is targeted
+by a TLSConnectionPolicy, either with or without listener TLS configured.  (There are a few other cases where it may be
+possible, but is purposely marked “not supported” due to a desire for less confusion on the assigned purpose of each of


What does an implementation do for "not supported"?

In my understanding, these are invalid configurations, so they wouldn't pass validation.

How can they fail validation? The configuration and the route are separate objects.

That is true, I didn't think it through. So for "not supported", the intention really is that this is not supported, at least not in this proposal, where we're only covering HTTPRoute.

This seems like something that would benefit from both nested conditions (as suggested in https://github.com/kubernetes-sigs/gateway-api/pull/2113/files#r1239031972) and the SupportedFeatures GEP from @LiorLieberman. We can say that every implementation that claims support for this policy needs to populate status on the policy describing which Routes it is being implemented for (and maybe also which Routes it is not being implemented for).

How about,

Every implementation that claims support for TLSConnectionPolicy should document for which Routes it is being implemented.

howardjohn · 2023-06-15T23:01:12Z

geps/gep-1897.md

+| UDPRoute   | No listener TLS            | N/A                                           | No                            |
+| UDPRoute   | Listener TLS               | N/A                                           | No                            |
+| UDPRoute   | No listener TLS            | N/A                                           | No                            |
+| GRPCRoute  | N/A                        | N/A                                           | No                            |


should probably be the same as HTTP?

I'm not proposing any other changes in behavior except for HTTPRoute, so I'm not in a hurry to make GRPCRoute have the same behavior as HTTPRoute and be changed as well.

That feels fine for provisional but GRPCRoute should probably be supported for implementable. Not blocking right now though IMO

I would ask that we wait until GRPCRoute itself is no longer experimental. https://gateway-api.sigs.k8s.io/api-types/grpcroute/

Is that acceptable?

I think this may actually be pretty straightforward and identical to HTTP as @howardjohn suggested. @gnossen can correct me here, but I think we should expect the same config from TLS from Gateway -> Backend to apply equally to HTTPRoute and GRPCRoute.

As for the experimental status of GRPCRoute, it's hard to get right. We've gotten into situations in upstream Kubernetes where we let in a bunch of incompatible alpha features and it became quite a pain to graduate any of them past alpha. In this case, with GRPCRoute being fairly stable at this point, I'd rather cover it here unless it ends up adding significant complexity.

I will mark it "implementation-dependent" instead of "Yes" in the "Connect to backend with TLS?" column.

I agree with @robscott . Disregarding any issues of experimental state, I think the values for GRPCRoute should be the same as for HTTPRoute in this table.
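The table rows discussed in this thread can be summarized as a decision function. This is a hypothetical sketch: it encodes GRPCRoute as implementation-dependent, following candita's proposed change even though others argue it should match HTTPRoute, and it treats rows not quoted here (for example TCPRoute) as "No".

```go
package main

import "fmt"

// Outcome mirrors the "Connect to backend with TLS?" column.
type Outcome string

const (
	Yes                     Outcome = "Yes"
	No                      Outcome = "No"
	NotSupported            Outcome = "Not supported"
	ImplementationDependent Outcome = "Implementation-dependent"
)

// connectWithTLS encodes the rows discussed in this thread. listenerMode is
// "Passthrough" or "Terminate" for TLSRoute and ignored otherwise; targeted
// reports whether the backend is targeted by a TLSConnectionPolicy.
func connectWithTLS(routeKind, listenerMode string, targeted bool) Outcome {
	if !targeted {
		return No
	}
	switch routeKind {
	case "HTTPRoute":
		return Yes // with or without listener TLS
	case "GRPCRoute":
		return ImplementationDependent
	case "TLSRoute":
		if listenerMode == "Terminate" {
			return NotSupported
		}
		return No
	default:
		return No // e.g. UDPRoute and rows not quoted in this thread
	}
}

func main() {
	fmt.Println(connectWithTLS("HTTPRoute", "", true))
	fmt.Println(connectWithTLS("TLSRoute", "Terminate", true))
}
```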


howardjohn · 2023-06-16T00:48:14Z

geps/gep-1897.md

+these are currently implementation-dependent, with the following recommended defaults:
+- Server Name Indication: enables passing of the server name through server name indication, in the TLS transaction, to
+assist with selection of certificates when several hosts share the same IP address. (default to enabled)
+- Subject Alternative Name certificates: enable the use of a single certificate that can serve multiple domains.


What is "enabled"? This is a property of the verification. So does enabled mean we should verify it? (We ought to by default, or it's the equivalent of -k in curl.)

If so what are we verifying? It's not just a bool.

Browsers handle this by asserting the DNS name matches the certificate, but a Kubernetes service doesn't really have as strict of a mapping.

I think what "enabled" means is that a SAN (multi-domain) certificate will be accepted when true. When false, a certificate containing SANs will be rejected right away. The verification is always done (unless disabled otherwise with a different setting).

Regarding what we are verifying it's a good question: in Kubernetes (or service meshes like Istio) certificates may contain SPIFFE ids but the client needs to be provided with a mapping (service to valid SPIFFE id) to authorize the server.

SAN is the only non-deprecated way to encode a DNS name as far as I know (CN was deprecated 20 years ago). So it's not a question of whether we verify the SAN or not - we MUST.

The question is which SAN we verify. You don't even have to get into SPIFFE; the pod could have any certificate.

The simplest case would be to assume svc.ns.svc.cluster.local, but I am not sure that is sufficient, or even common usage.

I agree that SAN certs have been around for long and CN-only certs have been deprecated so this setting is not needed/useful.

Regarding "what SAN do we verify": maybe that needs to be an additional field here, with defaults for inferred names like "svc.ns.svc.cluster.local".

The HTTPRoute hostname is the public, internet-facing hostname, not the internal service. So if I have a route from example.com -> example-svc, I would not expect the workload to have a cert for example.com.

It could be example-svc.ns.svc.cluster.local, maybe, but there is no standard.

Isn't that very service-mesh specific to not expect the pod to have a cert for example.com? Alternatively, with SANs, it could list example.com and example-svc.ns.svc.cluster.local.

It is not related to service mesh in any way. I wouldn't expect them to have a cert for example.com since the Gateway itself has that. And it's not clear why you would serve the same cert in 2 places?

To be fair they could have any cert - even google.com (with their own ca, of course). So anything is possible

The internal workload on the pod can serve requests either from a Gateway or another internal workload (the mesh scenario) and in both cases is likely to serve the same certificate (unless SNI is used?) which is more likely to be example-svc and not example.com.

Co-authored-by: Rob Scott <rob.scott87@gmail.com>


pleshakov · 2023-06-20T19:46:40Z

geps/gep-1897.md

+    //
+    // * “Accepted” 
+    // 
+    // Possible reasons for this condition to be False are:


Can the ResolvedRefs condition also be used here, since the policy references external objects (Services, Secrets)?

Sounds reasonable, unless there are privacy issues. As an implementor, what kind of condition reasons would you think should be there? Should there be a general validation on the certificate so it could have an invalid cert reason? Or does this store results of an analysis of reference grants between the policy and targets?

I think most implementations would only look at this if they were implementing a Service that was targeted by one of these policies, so I don't think ResolvedRefs would be relevant in that case, but I agree that it likely would be very useful for the secret reference(s) that will be included.

This one I will pass on unless there is further interest.

dprotaso · 2023-06-24T00:43:50Z

geps/gep-1897.md

+    TargetRef gatewayv1a2.PolicyTargetReference `json:"targetRef"`
+
+    // TLS contains TLS connection policy configuration.
+    TLS *TLSConnectionPolicyConfig `json:”tls”`


This might be better as a list that's keyed off the PortNumber

Then it's possible to have different TLS configurations for different ports in a single Service. Unless you were specifically looking to have those be distinct policies?

The guidelines allow only one target ref per policy attachment. Maybe this also depends on whether we create a new type of target ref that includes a port or ports.
x-ref https://github.com/kubernetes-sigs/gateway-api/pull/2113/files/5d84379d2aa570958ce6dbd3fb8cf187447e9f95#r1231594914

To close this one out, we're adding sectionName to Policy TargetRef - which will allow you to target the port's name field.

dprotaso · 2023-06-24T00:46:37Z

geps/gep-1897.md

+    // if the connection should be TLS, the targetRef’s certificate
+    // should be validated by the certs in TrustedCACertRefs, and a
+    // status delivered in the response for validation failures.
+    Port PortNumber `json:port,omitempty`


Do we care about letting policy authors use port names and not just numbers?

Similar to how K8s Service ports.targetPort is IntOrString ?

Let's decide on whether we need a new object that includes Port, then we can decide on this.

robscott · 2023-08-28T20:32:33Z

Why not have a list of valid hostnames and use the first entry for SNI? This would save you from needing to do a migration in the future (e.g. projectcontour/contour#5520).

The goal here is to start with the simplest possible API and also ensure portability as broadly as possible. As I mentioned in my 4th point, this is going to be the first API we've introduced where partial support is unlikely to suffice. Implementations will likely either need to fully implement the resource or consider the backend invalid.

I think our migration path is also a bit simpler than what Contour would be dealing with. In a future world we might have something like this:

sni: This field will configure the SNI the Gateway should use to connect to the backend. Implementations MUST validate that at least one name in the certificate served by the backend matches this field unless the subjectNames field is configured.
subjectNames: Implementations MUST validate that at least one name in the certificate served by the backend matches one name specified in this list.
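The proposed rule can be sketched as a small predicate: when `subjectNames` is set it takes over validation; otherwise the certificate must carry the `sni` name. `backendNameOK` is hypothetical and uses exact, case-insensitive matching only; real certificate matching also has to handle wildcard SANs.

```go
package main

import (
	"fmt"
	"strings"
)

// backendNameOK applies the proposed rule to the DNS names found in the
// backend's certificate. When subjectNames is non-empty it replaces sni as
// the set of acceptable names; otherwise the certificate must carry sni.
// Hypothetical helper: exact, case-insensitive comparison only (no wildcards).
func backendNameOK(certNames []string, sni string, subjectNames []string) bool {
	allowed := subjectNames
	if len(allowed) == 0 {
		allowed = []string{sni}
	}
	for _, c := range certNames {
		for _, a := range allowed {
			if strings.EqualFold(c, a) {
				return true
			}
		}
	}
	return false
}

func main() {
	certNames := []string{"example-svc.ns.svc.cluster.local"}
	fmt.Println(backendNameOK(certNames, "example-svc.ns.svc.cluster.local", nil))
	fmt.Println(backendNameOK(certNames, "example.com", nil))
}
```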

dprotaso · 2023-08-28T21:00:55Z

@robscott sounds good - subjectNames support can be a subsequent follow-up GEP.

youngnick · 2023-08-28T22:00:42Z

Thanks for all the work on this @candita! I've got some high level thoughts and suggestions on this one:

Split trustedCACertRefs into trustedCA.certRefs and trustedCA.wellKnownRoots. The wellKnownRoots field will be an enum that allows users to specify NONE or SYSTEM values. When SYSTEM is specified, the implementation will use the set of CAs trusted by the Gateway. The default value for wellKnownRoots would be NONE, users must specify either SYSTEM for wellKnownRoots or provide ref(s) to custom CA Certs.

Rob and I have spoken about this, and I agree with these points, except that I think the default value for wellKnownRoots should be "" while otherwise behaving the same as NONE. That is, you must either specify system (note lower case instead of all caps, to match our usual constant value types), in which case the implementation must use whatever system-provided CAs are available, or, probably more commonly, provide at least one entry in trustedCA.certRefs pointing to a valid ConfigMap or Secret containing a PEM-encoded CA certificate bundle.

mikemorris · 2023-08-28T23:05:24Z

Transition from allowedSANs to a new sni or hostname field that will be required. This field will configure the SNI the Gateway should use to connect to the backend. Implementations MUST validate that at least one name in the certificate served by the backend matches this field. In the future we may need to support a variation between SNI and allowed SAN(s), but to start it feels like we can cover >90% of use cases with this simpler configuration.

It looks like the existing PreciseHostname API type would be the right fit for the type of this field (as opposed to Hostname which allows wildcard values).
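For illustration, a validator in the spirit of PreciseHostname: lowercase DNS labels, no wildcards. The regular expression is modeled on RFC 1123 subdomain syntax and is an assumption here, not the exact validation shipped with Gateway API; it also skips the 63-character per-label limit.

```go
package main

import (
	"fmt"
	"regexp"
)

// preciseHostnameRE approximates RFC 1123 subdomain syntax: dot-separated
// lowercase labels of letters, digits, and inner hyphens. No "*" is allowed.
var preciseHostnameRE = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$`)

// validateSNIHostname is a hypothetical check for the proposed hostname field.
func validateSNIHostname(h string) error {
	if len(h) == 0 || len(h) > 253 {
		return fmt.Errorf("hostname must be 1-253 characters, got %d", len(h))
	}
	if !preciseHostnameRE.MatchString(h) {
		return fmt.Errorf("%q is not a precise hostname (wildcards and uppercase are not allowed)", h)
	}
	return nil
}

func main() {
	fmt.Println(validateSNIHostname("backend.example.com"))
	fmt.Println(validateSNIHostname("*.example.com"))
}
```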

candita · 2023-08-30T00:05:24Z

3. Split trustedCACertRefs into trustedCA.certRefs and trustedCA.wellKnownRoots. The wellKnownRoots field will be an enum that allows users to specify NONE or SYSTEM values. When SYSTEM is specified, the implementation will use the set of CAs trusted by the Gateway. The default value for wellKnownRoots would be NONE, users must specify either SYSTEM for wellKnownRoots or provide ref(s) to custom CA Certs.

I'm going to rename wellKnownRoots to alternateCerts, or maybe caCerts, which I feel are slightly more descriptive for this peculiar field.

Edit - I changed this a little, adding CertRefs and CACertRefs in the place of trustedCACertRefs. Otherwise, I adopted this suggestion.

candita · 2023-08-30T01:48:51Z

@robscott @youngnick @costinm, I just pushed the requested changes; PTAL when you get a chance.

Change goals to remove references to TLS versions and cipher suites (use case # 5)
Move TLS versions into Longer Term Goals
Clear up language in various places to support changes to API field trustedCACerts
Add reference to issue Update TargetREf in Policy GEP #2147
Add 3 new API fields to TLS - CertRefs, CACertRefs, and Hostname
Remove from TLS - TrustedCACerts, AllowedSans, TLSVersions
Rename Open Questions to Answered Questions and move some unanswered questions to Graduation Criteria
Add Graduation Criteria section


youngnick · 2023-08-30T05:43:04Z

geps/gep-1897.md

+The names of the fields were chosen to facilitate discussion, but may be substituted without blocking acceptance of the
+content of the API change.
+
+The `TrustedCA` contains the `CertRefs` and `CACertRefs` fields. CertRefs is a slice of


I know, naming is hard, but I think it's important that the field currently called CACertRefs does not end with Refs - that is reserved in the API conventions for structs that contain a list of references to objects.

How about instead of CACertRefs, we have UseExtraCerts - I suggest this because it has two possible values (YAML shown for ease of typing):

trustedCAs.useExtraCerts: "" (or unspecified) - don't use any extra certificates for Trusted CAs.

trustedCAs.useExtraCerts: "system" - use the system certificates as extra trusted CAs.

This name implies that it's possible to use CertRefs with UseExtraCerts - which I think has undefined behavior at this time.

Other names I can think of:

AdditionalCerts (YAML: trustedCAs.additionalCerts): also implies a merging behavior between CertRefs and AdditionalCerts.

AlternativeCerts (YAML: trustedCAs.alternativeCerts): implies that you can only choose either CertRefs or AlternativeCerts, i.e. they are mutually exclusive.

Whatever name we choose, we need to ensure that the behavior when you set both is defined. ("That's a syntax error" is okay, as is "the lists must be merged" or other possible answers, we just need to pick one, which we can also mark as temporary - given that this is Provisional.)

Again, naming is hard. ping @robscott as well as @candita.

Yeah, I've struggled with naming on this one as well. Of course, I'd originally proposed wellKnownRoots, but obviously that has its flaws, and not all systems are necessarily going to be using "well-known" values. So maybe we should just narrow down the scope of this field a bit. It seems very unlikely that Kubernetes or Gateway API will ever try to maintain a common set of certificates that should be used for this purpose, so we can likely say that whatever values are supported here, other than none/empty, are going to be at least somewhat implementation-specific.

I think we should use the following guidelines when choosing a name here:

1. Avoid a name that locks us into whether/how this combines with certRefs. My personal opinion is that we should start with validation that ensures that only certRefs or this field can be set, and we can loosen that later if needed. I think this would rule out additional, alternative, or extra.

2. Avoid a boolean value so we're not conflicting with the relevant Kubernetes API Conventions.

3. Building on 2, allow for some theoretical room for expansion. I think the most likely value here would be domain-prefixed values, if for some reason implementations want to support predefined sets of CAs. Even this seems highly unlikely at this point, but I want to at least leave room for it so we're not boxed in years from now.

With all that said, I'm struggling to actually find a good name that meets these criteria. Some more ideas:

A. trustedCAs.systemCerts: "System"
B. trustedCAs.predefinedCerts: "System"
C. trustedCAs.standardCerts: "System"

As far as whether the empty value should be explicit ("None") or just empty (nil pointer or ""), I tend to lean towards a default of "None" so there's absolutely no ambiguity here. If we'd rather avoid "None", I think I'd lean towards a nil pointer instead of "", but open to what others think.

@youngnick I changed the name of CACertRefs; I didn't realize the Refs suffix had a semantic association. Should I also change the name of CertRefs, which is a slice of ConfigMap names, or is that fitting?

@robscott and I had this conversation last night and I originally proposed alternativeCerts (amongst others) but I get Rob's point about preferring to use a name that doesn't lock us into how this field combines with certRefs. I chose standardCerts. certRefs and standardCerts are intended to be mutually exclusive in the first round.

Regarding using a nil pointer instead of "" for standardCerts, I'd rather leave it "" so that the field is visible when yaml is examined. If we don't normally do that for our users, I can make it a pointer.

Oh, I missed that CertRefs is a slice of strings that are ConfigMap names. Yes, that should be a []ObjectReference instead, with ConfigMap as the default Kind (which implies "", meaning "core", as the Group). That way, you can specify a list of certificates in ConfigMaps like this:

    ...
    certRefs:
    - name: CAcert1
    - name: CAcert2
    ...

Or you can also specify other objects, like Secrets, if you want. Anything other than a ConfigMap should be Implementation Specific support in the initial version, though. This lets us change that much more easily later. (Practically, to be able to set the defaults, you'll need to create another type like SecretObjectReference, probably ConfigmapObjectReference.)

Using a nil pointer or "" as the default value makes no difference for display. Because the field is optional and will have omitempty set, a nil pointer (for a *string type) and an empty string (for a string type) are both unnecessary in input YAML and hidden in output YAML. The only way you'll see the standardCerts field (which I agree is a good name) is if it's set to system. However, if you do set standardCerts: system, then the CertRefs field must also be empty in this version, so it will also be hidden on output.

So you'll only ever see:

    ...
    certRefs:
    - name: CAcert1
    ...

or

    ...
    standardCerts: "system"
    ...

Which I think is what we want.

However, there's also an interaction with Enum validation: we can't specify "" as an Enum value. So I recommend changing standardCerts to a *string, because then we can have an Enum with only one value (system).
This lets us add further Enum values later, while keeping the above behavior.
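Whether a field appears in YAML comes down to its JSON tags, since Kubernetes API objects are serialized through them. This minimal Go sketch (type and function names are hypothetical) uses `encoding/json` to show that with `omitempty` an unset `*string` and an empty `string` are both dropped, while a non-nil pointer to "" is still emitted. With the pointer form, an explicit "" would then be rejected by an Enum that lists only system.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// pointerField models standardCerts as an optional *string.
type pointerField struct {
	StandardCerts *string `json:"standardCerts,omitempty"`
}

// stringField models standardCerts as an optional plain string.
type stringField struct {
	StandardCerts string `json:"standardCerts,omitempty"`
}

// render marshals v to JSON, panicking on the (unexpected) error path.
func render(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	system, empty := "system", ""
	fmt.Println(render(pointerField{}))                       // {}
	fmt.Println(render(stringField{}))                        // {}
	fmt.Println(render(pointerField{StandardCerts: &system})) // {"standardCerts":"system"}
	fmt.Println(render(pointerField{StandardCerts: &empty}))  // {"standardCerts":""}
}
```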

So, my requested changes here are:

CertRefs should be a []ConfigmapObjectReference, where that is an ObjectReference struct whose Kind defaults to ConfigMap.

standardCerts should be a string pointer field with Enum validation, and one valid value: system (or System if that's what we prefer - personally, I prefer these constants to be all lower-case, it makes validation easier).
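The requested CertRefs element type could look roughly like this minimal Go sketch. `ConfigmapObjectReference` follows the name floated above, but its fields, the `WithDefaults` and `SupportLevel` helpers, and the Secret example are assumptions for illustration. The kubebuilder default markers show how defaults might be declared on a CRD; here they are ordinary comments, so `WithDefaults` applies the ConfigMap default by hand.

```go
package main

import "fmt"

// ConfigmapObjectReference is a hypothetical sketch of the requested reference type.
// Plain strings stand in for the pointer types a real API would likely use.
type ConfigmapObjectReference struct {
	// +kubebuilder:default=""
	Group string `json:"group,omitempty"`
	// +kubebuilder:default=ConfigMap
	Kind string `json:"kind,omitempty"`
	Name string `json:"name"`
}

// WithDefaults fills in Kind the way the API server would for an omitted field.
// An empty Group already means the core API group.
func WithDefaults(ref ConfigmapObjectReference) ConfigmapObjectReference {
	if ref.Kind == "" {
		ref.Kind = "ConfigMap"
	}
	return ref
}

// SupportLevel reflects the suggestion that only core ConfigMap references are
// Core support initially; any other kind is implementation-specific.
func SupportLevel(ref ConfigmapObjectReference) string {
	r := WithDefaults(ref)
	if r.Group == "" && r.Kind == "ConfigMap" {
		return "Core"
	}
	return "Implementation-specific"
}

func main() {
	certRefs := []ConfigmapObjectReference{
		{Name: "CAcert1"},
		{Name: "CAcert2"},
		{Kind: "Secret", Name: "ca-secret"}, // hypothetical Secret-backed CA
	}
	for _, ref := range certRefs {
		r := WithDefaults(ref)
		fmt.Printf("%s/%s: %s\n", r.Kind, r.Name, SupportLevel(ref))
	}
}
```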

@youngnick we discussed offline, and I made the requested change to CertRefs and changed StandardCerts like this:

standardCerts should be a StandardCertType pointer field with Enum validation, and one valid value: System. An informal review showed that alias string values usually start with a capital letter in this API, particularly condition types and reasons, so I used System rather than system.
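A minimal Go sketch of the StandardCertType field as described: a string alias restricted by a kubebuilder Enum marker to the single value System. `ValidateStandardCerts` is a hypothetical helper that reproduces the Enum check in plain Go so the behavior can be run; in a CRD, the API server would enforce the marker instead.

```go
package main

import "fmt"

// StandardCertType is a sketch of the alias type described above. The Enum
// marker is how a CRD would restrict it; here it is only a comment.
//
// +kubebuilder:validation:Enum=System
type StandardCertType string

// StandardCertSystem selects the implementation's system-trusted CA certificates.
const StandardCertSystem StandardCertType = "System"

// ValidateStandardCerts mirrors in plain Go what the Enum marker would enforce:
// a nil (unset) field is allowed, and a set field must be exactly "System".
func ValidateStandardCerts(v *StandardCertType) error {
	if v == nil {
		return nil
	}
	if *v != StandardCertSystem {
		return fmt.Errorf("unsupported standardCerts value %q: must be %q", string(*v), string(StandardCertSystem))
	}
	return nil
}

func main() {
	system := StandardCertSystem
	lower := StandardCertType("system")
	fmt.Println(ValidateStandardCerts(nil))     // <nil>
	fmt.Println(ValidateStandardCerts(&system)) // <nil>
	fmt.Println(ValidateStandardCerts(&lower))  // unsupported standardCerts value "system": must be "System"
}
```

Because the check is case-sensitive, the choice of System over system is visible to anyone writing the YAML by hand.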

youngnick · 2023-08-30T05:45:46Z

geps/gep-1897.md

+    // If the CertRefs is empty, then the system trusted
+    // certificates should be used. If there are none, or the
+    // implementation doesn't define system trusted certificates,
+    // then a TLS connection must fail.


We should remove this because this behavior is now handled by the other field (CACertRefs here, but as I said above, I think it needs a new name).

I think a user could accidentally try to use System certs but the system doesn't have them, so I think it should stay in.

youngnick

Thanks, GitHub, for being so slow that I duplicated my comment.

youngnick

I have one concern about field naming, and a small fix to the struct details, but aside from that, this LGTM.

candita · 2023-08-30T23:49:13Z

@youngnick @robscott I made the updates you suggested and added a couple of CEL validations to TLSConnectionPolicyConfig. PTAL at the changes in the latest commit when you get a chance.

robscott

Thanks for all the work on this @candita! A couple tiny nits and a suggestion to move back to a backend focused policy name but otherwise I think this is good to go.

candita · 2023-09-06T20:47:03Z

/label tide/merge-method-squash

robscott · 2023-09-06T21:28:35Z

🎉 Thanks for all the work on this @candita!

/lgtm
/approve

k8s-ci-robot · 2023-09-06T21:28:45Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: candita, keithmattix, robscott

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~geps/OWNERS~~ [robscott]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

youngnick · 2023-09-07T00:02:18Z

🎉 Thanks for all your work on this @candita!

GEP-1897 Update with API details

79deab7

k8s-ci-robot requested review from bowei and keithmattix June 14, 2023 01:23

k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jun 14, 2023

arkodg reviewed Jun 14, 2023

gcs278 reviewed Jun 14, 2023

gcs278 reviewed Jun 14, 2023

GEP-1897 Update with API details - suggested changes kubernetes-sigs#1

7eb7dd6

sanjaypujare reviewed Jun 15, 2023

robscott reviewed Jun 15, 2023

k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jun 15, 2023

howardjohn reviewed Jun 15, 2023

howardjohn reviewed Jun 16, 2023

candita and others added 3 commits June 15, 2023 22:55

Update geps/gep-1897.md

31dce71

Co-authored-by: Rob Scott <rob.scott87@gmail.com>

Update geps/gep-1897.md

5fd598f

Co-authored-by: Rob Scott <rob.scott87@gmail.com>

GEP-1897 Update with API details - suggested changes kubernetes-sigs#2

5d84379

candita force-pushed the gep-1897-backend-tls-update branch from 12a6b24 to 5d84379 Compare June 16, 2023 15:16

sanjaypujare reviewed Jun 18, 2023

pleshakov reviewed Jun 20, 2023

dprotaso reviewed Jun 24, 2023

GEP-1897 Update with API details - suggested changes 8

2c04f4e

youngnick reviewed Aug 30, 2023

youngnick reviewed Aug 30, 2023

GEP-1897 Update with API details - suggested changes 9

ae5d524

GEP-1897 Update with API details - suggested changes 10

b261560

robscott approved these changes Sep 6, 2023

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 6, 2023

GEP-1897 Update with API details - suggested changes 11

79f818b

k8s-ci-robot added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Sep 6, 2023

k8s-ci-robot assigned robscott Sep 6, 2023

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 6, 2023

k8s-ci-robot merged commit d09b142 into kubernetes-sigs:main Sep 6, 2023
9 checks passed

arkodg mentioned this pull request Sep 18, 2023

Adds API to GEP-91: Client Certificate Validation #2273

Merged

shaneutt mentioned this pull request Sep 18, 2023

shaneutt/conformance tests for gateway addresses shaneutt/gateway-api#5

Closed

candita mentioned this pull request Sep 22, 2023

REQUEST: New membership for candita kubernetes/org#4477

Closed

candita mentioned this pull request Oct 5, 2023

Update GEP-713 and add BackendTLSPolicy implementation #2448

Merged
