kubernetes-sigs · k8s-ci-robot · Sep 6, 2023 · Jun 14, 2023 · Jun 15, 2023 · Jun 16, 2023
diff --git a/geps/gep-1897.md b/geps/gep-1897.md
@@ -69,7 +69,7 @@ including:
 * SANs for validating upstream service (server authentication)
 * client certificate of the gateway (client authentication)
 
-## Purpose - why do we want to this?
+## Purpose - why do we want to do this?
 
 This proposal is _very_ tightly scoped because we have tried and failed to address this well-known
 gap in the API specification. The lack of support for this fundamental concept is holding back
@@ -125,7 +125,234 @@ This GEP is the outcome of the TLS use cases #4 and #5 in
 
 ## API
 
-Details deferred until we reach consensus on what we want to do, and why we want to do this.
+To tell the gateway client how to connect to a backend pod that serves its own certificate, we implement a
+metaresource that is already mentioned as an example in
+[GEP-713: Metaresources and PolicyAttachment](https://gateway-api.sigs.k8s.io/geps/gep-713/), and as a hypothetical in the
+[Policy Attachment](https://gateway-api.sigs.k8s.io/references/policy-attachment/#direct-policy-attachment)
+documentation.
+
+This metaresource is named TLSConnectionPolicy.  In this document, because naming is hard, we chose to retain the
+name TLSConnectionPolicy to advertise alignment with a previously discussed naming choice, but a new name may be
+substituted without blocking acceptance of the content of the API change.
+
+The selection of the applicable Gateway API persona is important in the design of this proposal, because each
+persona is assigned a role which handles specific Gateway API resources.  TLSConnectionPolicy is used by the application
+developer Gateway API persona to signal what the application developer _expects_ connections to the application to look
+like from a TLS perspective.  Only the application developer can know what the application expects, so it is important
+that this configuration be owned by that persona.
+Choosing any other role would move the application-related responsibility from the application developer role to that
+role, which violates the role-oriented design principle of Gateway API. As mentioned in Non-goal #7, providing a
+mechanism for the cluster operator gateway role to override gateway to backend TLS settings is not covered here, but can
+be addressed in a future update should the need arise.
+
+TLSConnectionPolicy is defined as a Direct Policy Attachment without defaults or overrides.  It is applied to the Service
+that fronts the backend in question, and it must reside in the same namespace as that Service in order to avoid the
+complications involved with sharing trust across namespace boundaries.  By targeting the Service resource rather than
+the Route resource, we can reuse the same TLSConnectionPolicy for all the different Routes that might point to this
+Service.
+For the use case where certificates are stored in their own namespace, users may create Secrets and use ReferenceGrants
+for a TLSConnectionPolicy-to-Secret binding.
+
+In the API defined here, the definition of TrustedCACertRefs follows the convention established by the `CertificateRefs`
+field of the Gateway listener TLS configuration in
+https://github.com/kubernetes-sigs/gateway-api/blob/main/apis/v1beta1/gateway_types.go#L340
+
+One area of concern for this API is indicating how and when implementations should use the trusted certificate
+authority of the backend destination.  This solution proposes, as introduced in
+[GEP-713](https://gateway-api.sigs.k8s.io/geps/gep-713/), that the implementation
+should watch connections to a specific port on a specified targetRef (such as a Service).  If the port and Service match
+a TLSConnectionPolicy, the implementation assumes the connection is TLS and, before the connection is made, verifies that
+the certificate served by the targetRef can be validated by the provided trusted CA certificates.  How to signal a
+certificate validation failure is left up to the implementation, which should return an appropriate response error,
+such as one of the HTTP error codes 400 (Bad Request), 401 (Unauthorized), or 403 (Forbidden), or another signal that
+makes the failure sufficiently clear to the requester without revealing too much about the transaction, based on
+established security requirements.
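
As a rough illustration of the matching step, the following Go sketch decides which TLSConnectionPolicy, if any, applies to a backendRef. It also uses the `Port` semantics from the API below, where an empty port covers every port of the targetRef. The `Policy` and `BackendRef` types are hypothetical simplifications, and preferring an exact port match over an all-ports policy is an assumption; this GEP does not define precedence between them.

```go
package main

// PortNumber mirrors the proposal's PortNumber type; 0 means "unset".
type PortNumber int32

// Policy is a hypothetical, trimmed-down view of a TLSConnectionPolicy.
type Policy struct {
	Name        string
	Namespace   string
	ServiceName string     // targetRef name (a Service in Namespace)
	Port        PortNumber // 0 means every port of the Service
}

// BackendRef identifies the Service port that a Route forwards to.
type BackendRef struct {
	Namespace string
	Name      string
	Port      PortNumber
}

// matchPolicy returns the policy that applies to ref, if any. A match means
// the connection is assumed to be TLS and the backend certificate is checked.
// Assumption: an exact port match wins over an all-ports policy.
func matchPolicy(policies []Policy, ref BackendRef) (Policy, bool) {
	var wildcard *Policy
	for i := range policies {
		p := &policies[i]
		if p.Namespace != ref.Namespace || p.ServiceName != ref.Name {
			continue
		}
		if p.Port == ref.Port {
			return *p, true // policy names this exact port
		}
		if p.Port == 0 && wildcard == nil {
			wildcard = p // remember the first all-ports policy as a fallback
		}
	}
	if wildcard != nil {
		return *wildcard, true
	}
	return Policy{}, false // no policy: the connection is not assumed to be TLS
}
```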
+
+Additional configuration options mentioned in
+[SIG-NET Gateway API: TLS to the K8s.Service/Backend](https://docs.google.com/document/d/1RTYh2brg_vLX9o3pTcrWxtZSsf8Y5NQvIG52lpFcZlo)
+are not covered here, but could be added.  All of these are currently implementation-dependent, with the following
+recommended defaults:
+- Server Name Indication: enables passing of the server name through server name indication, in the TLS transaction, to
+assist with selection of certificates when several hosts share the same IP address. (default to enabled)
+- Subject Alternative Name certificates: enable the use of a single certificate that can serve multiple domains.
+(default to enabled)
+- Version: specifies the minimum TLS version that the connection may use (default TLSv1.2)
+- Ciphers: specifies enabled ciphers to be used in TLS exchanges. (default to align with TLS versions 1.1, 1.2, and/or
+1.3 as described in [RFC 4346 TLS 1.1](https://datatracker.ietf.org/doc/html/rfc4346#appendix-A.5),
+[RFC 5246 TLS 1.2](https://datatracker.ietf.org/doc/html/rfc5246#appendix-A.5), and
+[RFC 8446 TLS 1.3](https://datatracker.ietf.org/doc/html/rfc8446#appendix-B.4))
+- Visibility: the visibility level could be all or none, indicating that if the connection failed due to validation
+failures, it would drop the connection silently if the visibility level were none, and report an error if the visibility
+level were all.  (default to all)
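
For illustration only, the recommended defaults above map fairly directly onto Go's `crypto/tls` client settings. The sketch below, built around a hypothetical `newBackendTLSConfig` helper, trusts only the CA certificates taken from TrustedCACertRefs, sends SNI through `ServerName` (which `crypto/tls` also checks against the backend certificate's SANs), and sets a TLS 1.2 minimum. Visibility is not modeled, since it governs how failures are reported rather than the handshake itself.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
)

// newBackendTLSConfig is a hypothetical helper that builds the client-side
// TLS settings a gateway might use when dialing a backend covered by a
// TLSConnectionPolicy, applying the recommended defaults.
func newBackendTLSConfig(trustedCAPEM []byte, serverName string) (*tls.Config, error) {
	// Trust only the CA certificates from TrustedCACertRefs, not system roots.
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(trustedCAPEM) {
		return nil, errors.New("no valid PEM certificates found in trusted CA data")
	}
	return &tls.Config{
		RootCAs: pool,
		// SNI (default enabled): sent in the ClientHello when serverName is a
		// hostname, and verified against the backend certificate's SANs.
		ServerName: serverName,
		// Version: minimum TLS 1.2.
		MinVersion: tls.VersionTLS12,
		// Ciphers: CipherSuites is left nil so Go's defaults for the negotiated
		// version apply (TLS 1.3 suites are not configurable in crypto/tls).
	}, nil
}
```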
+
+```go
+// TLSConnectionPolicy provides a way to publish TLS configuration
+// that enables a gateway client to connect to a backend pod.
+type TLSConnectionPolicy struct {
+    metav1.TypeMeta   `json:",inline"`
+    metav1.ObjectMeta `json:"metadata,omitempty"`
+
+    // Spec defines the desired state of TLSConnectionPolicy.
+    Spec TLSConnectionPolicySpec `json:"spec"`
+
+    // Status defines the current state of TLSConnectionPolicy.
+    Status TLSConnectionPolicyStatus `json:"status,omitempty"`
+}
+
+// TLSConnectionPolicySpec defines the desired state of
+// TLSConnectionPolicy.
+// Note: there is no Override or Default policy configuration.
+type TLSConnectionPolicySpec struct {
+    // TargetRef identifies an API object to apply policy to.
+    TargetRef gatewayv1a2.PolicyTargetReference `json:"targetRef"`
+
+    // TLS contains TLS connection policy configuration.
+    TLS *TLSConnectionPolicyConfig `json:"tls"`
+}
+
+// TLSConnectionPolicyConfig contains TLS connection policy configuration.
+type TLSConnectionPolicyConfig struct {
+    // TrustedCACertRefs contains one or more references to
+    // Kubernetes objects that contain trusted CA certificates, which
+    // are used to validate the certificate served by the backend pod
+    // during the TLS handshake between the gateway and the backend.
+    //
+    // A single TrustedCACertRef to a Kubernetes Secret has "Core"
+    // support.  Implementations MAY choose to support attaching
+    // multiple certificates to a backend, but this behavior is 
+    // implementation-specific.
+    //
+    // References to a resource in a different namespace are 
+    // invalid.
+    // 
+    // This field is required for any TLSConnectionPolicyConfig.
+    // 
+    // Support: Core - A single reference to a Kubernetes Secret of type kubernetes.io/tls
+    //
+    // Support: Implementation-specific (More than one reference or other resource types)
+    //
+    // +kubebuilder:validation:MaxItems=64
+    // +kubebuilder:validation:MinItems=1
+    TrustedCACertRefs []SecretObjectReference `json:"trustedCACertRefs"`
+
+    // Port is the network port of the target. When a target matches a BackendRef,
+    // this Policy should apply, resulting in the certificate served by the backend
+    // being validated by the certs in TrustedCACertRefs.
+    // If empty, this Policy applies to all ports of the targetRef: connections to
+    // any of them are treated as TLS, the certificate served by the backend is
+    // validated by the certs in TrustedCACertRefs, and validation failures are
+    // reported in the response.
+    //
+    // +optional
+    Port PortNumber `json:"port,omitempty"`
+}
+
+// TLSConnectionPolicyStatus defines the observed state of TLSConnectionPolicy.
+type TLSConnectionPolicyStatus struct {
+    // Conditions describe the current conditions of the TLSConnectionPolicy.
+    //
+    // Implementations should prefer to express TLSConnectionPolicy
+    // conditions using the `TLSConnectionPolicyConditionType` and
+    // `TLSConnectionPolicyConditionReason` constants so that 
+    // operators and tools can converge on a common vocabulary to 
+    // describe TLSConnectionPolicy state.
+    // Known condition types are:
+    //
+    // * "Accepted"
+    //
+    // +optional
+    // +listType=map
+    // +listMapKey=type
+    // +kubebuilder:validation:MaxItems=8
+    // +kubebuilder:default={type: "Accepted", status: "Unknown", reason:"Pending", message:"Waiting for validation", lastTransitionTime: "1970-01-01T00:00:00Z"}
+    Conditions []metav1.Condition `json:"conditions,omitempty"`
+}
+
+// TLSConnectionPolicyConditionType is the type of a condition used
+// as a signal by TLSConnectionPolicy.  This type should be used with
+// the TLSConnectionPolicyStatus.Conditions field.
+type TLSConnectionPolicyConditionType string
+
+// TLSConnectionPolicyConditionReason is a reason that explains why a
+// particular TLSConnectionPolicyConditionType was generated.
+type TLSConnectionPolicyConditionReason string
+
+const (
+    // This condition indicates that the TLSConnectionPolicy has been
+    // accepted as valid.
+    // Possible reason for this condition to be True is:
+    // * "Valid"
+    // Possible reason for this condition to be False is:
+    // * "Invalid"
+    // Possible reason for this condition to be Unknown is:
+    // * "Pending"
+    TLSConnectionPolicyConditionAccepted TLSConnectionPolicyConditionType = "Accepted"
+
+    // This reason is used with the "Accepted" condition when the condition is true.
+    TLSConnectionPolicyReasonAccepted TLSConnectionPolicyConditionReason = "Valid"
+
+    // This reason is used with the "Accepted" condition when the TLSConnectionPolicy is invalid, e.g. crossing namespace boundaries.
+    TLSConnectionPolicyReasonInvalid TLSConnectionPolicyConditionReason = "Invalid"
+
+    // This reason is used with the "Accepted" condition when the TLSConnectionPolicy is pending validation.
+    TLSConnectionPolicyReasonPending TLSConnectionPolicyConditionReason = "Pending"
+)
+```
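
The following Go sketch shows one way an implementation might fill in the `Accepted` condition using the reasons defined above. `Condition` is a hypothetical stand-in for `metav1.Condition`, and `errCrossNamespace` is only an example failure. Pairing `Pending` with status `Unknown` follows the `+kubebuilder:default` marker on `Conditions`.

```go
package main

import "errors"

// Condition is a hypothetical stand-in for metav1.Condition that keeps only
// the fields the Accepted reasons affect.
type Condition struct {
	Type    string
	Status  string // "True", "False", or "Unknown"
	Reason  string
	Message string
}

// errCrossNamespace is an example validation failure, matching the
// "crossing namespace boundaries" case described for the Invalid reason.
var errCrossNamespace = errors.New("targetRef must be in the same namespace as the policy")

// acceptedCondition derives the Accepted condition for a TLSConnectionPolicy.
// validated reports whether validation has run; validationErr is its result.
func acceptedCondition(validated bool, validationErr error) Condition {
	switch {
	case !validated:
		// Same values as the +kubebuilder:default marker on Conditions.
		return Condition{Type: "Accepted", Status: "Unknown", Reason: "Pending", Message: "Waiting for validation"}
	case validationErr != nil:
		return Condition{Type: "Accepted", Status: "False", Reason: "Invalid", Message: validationErr.Error()}
	default:
		return Condition{Type: "Accepted", Status: "True", Reason: "Valid", Message: "TLSConnectionPolicy is valid"}
	}
}
```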
+
+## How a client behaves
+
+This table describes the effect that a TLSConnectionPolicy has on a Route.  There are only two cases where the
+TLSConnectionPolicy will signal a Route to connect to a backend using TLS: an HTTPRoute with a backend that is targeted
+by a TLSConnectionPolicy, either with or without listener TLS configured.  (There are a few other cases where it may be
+possible, but they are purposely marked “not supported” to avoid confusion about the assigned purpose of each of the
+protocol-affiliated types of Routes.)
+
+| Route Type | Gateway Config             | Backend is targeted by a TLSConnectionPolicy? | Connect to backend with TLS? |
+|------------|----------------------------|-----------------------------------------------|------------------------------|
+| HTTPRoute  | Listener TLS               | Yes                                           | **Yes**                      |
+| HTTPRoute  | No listener TLS            | Yes                                           | **Yes**                      |
+| HTTPRoute  | Listener TLS               | No                                            | No                           |
+| HTTPRoute  | No listener TLS            | No                                            | No                           |
+| TLSRoute   | Listener Mode: Passthrough | Yes                                           | No                           |
+| TLSRoute   | Listener Mode: Terminate   | Yes                                           | Not supported                |
+| TLSRoute   | Listener Mode: Passthrough | No                                            | No                           |
+| TLSRoute   | Listener Mode: Terminate   | No                                            | No                           |
+| TCPRoute   | Listener TLS               | Yes                                           | Not supported                |
+| TCPRoute   | No listener TLS            | Yes                                           | Not supported                |
+| TCPRoute   | Listener TLS               | No                                            | No                           |
+| TCPRoute   | No listener TLS            | No                                            | No                           |
+| UDPRoute   | Listener TLS               | N/A                                           | No                           |
+| UDPRoute   | No listener TLS            | N/A                                           | No                           |
+| GRPCRoute  | N/A                        | N/A                                           | No                           |
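
The table can also be read as a small decision function. This Go sketch is a direct, hypothetical encoding of the rows above, with Route types and listener modes passed as plain strings for brevity. `listenerMode` is only consulted for TLSRoute, because for the other Route types the listener TLS setting does not change the outcome.

```go
package main

// Outcome is the "Connect to backend with TLS?" column of the table.
type Outcome string

const (
	Yes          Outcome = "Yes"
	No           Outcome = "No"
	NotSupported Outcome = "Not supported"
)

// connectWithTLS encodes the table. targeted reports whether the backend is
// targeted by a TLSConnectionPolicy; listenerMode is "Passthrough" or
// "Terminate" and is only consulted for TLSRoute.
func connectWithTLS(routeType, listenerMode string, targeted bool) Outcome {
	switch routeType {
	case "HTTPRoute":
		// The only supported case, with or without listener TLS.
		if targeted {
			return Yes
		}
		return No
	case "TLSRoute":
		if targeted && listenerMode == "Terminate" {
			return NotSupported
		}
		// With Passthrough the gateway forwards the client's TLS stream
		// unchanged, so it never originates a new TLS connection.
		return No
	case "TCPRoute":
		if targeted {
			return NotSupported
		}
		return No
	default:
		// UDPRoute, GRPCRoute, and anything else: no TLS to the backend.
		return No
	}
}
```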
+
+## Request Flow
+
+One additional step would be added to the typical client/gateway API request flow for a gateway implemented using a
+reverse proxy. This is shown as step 6 below.
+
+1. A client makes a request to http://foo.example.com.
+2. DNS resolves the name to a Gateway address.
+3. The reverse proxy receives the request on a Listener and uses the Host header to match an HTTPRoute.
+4. Optionally, the reverse proxy can perform request header and/or path matching based on match rules of the HTTPRoute.
+5. Optionally, the reverse proxy can modify the request, e.g. add/remove headers, based on filter rules of the HTTPRoute.
+6. __(New) Optionally, if a backendRef of the HTTPRoute points to a Service port targeted by a TLSConnectionPolicy, the
+reverse proxy connects to the backend using TLS and determines the outcome of verifying the cert served by the backend.__
+7. Lastly, the reverse proxy forwards the request to one or more objects, e.g. Services, in the cluster based on
+backendRefs rules of the HTTPRoute.
+
+## Alternatives
+Most alternatives are enumerated in the section on the history of backend TLS above.  A couple of additional
+alternatives are also listed here.
+
+1. Expand BackendRef, which is already an expansion point.  At first, it seems logical that since listeners are handling
+the client-gateway certs, BackendRefs could handle the gateway-backend certs.  However, when multiple Routes target
+the same Service, there would be unnecessary copying of the BackendRef every time the Service was targeted.  As well,
+there could be multiple BackendRefs with multiple rules on a Route, each of which might need the gateway-backend cert
+configuration, so it is not the appropriate pattern.
+2. Extend HTTPRoute to indicate TLS backend support. Extending HTTPRoute would interfere with deployed implementations
+too much to be a practical solution.
+3. Add a new type of Route for backend TLS.  This is impractical because we might want to enable backend TLS on other
+route types in the future, and because we might want to have both TLS listeners and backend TLS on a single route.
 
 ## Prior Art
 
@@ -181,19 +408,34 @@ Ref: [Upstream.TLS](https://docs.nginx.com/nginx-ingress-controller/configuratio
 Ref: [EgressMTLS](https://docs.nginx.com/nginx-ingress-controller/configuration/policy-resource/#egressmtls)
 Ref: [IngressMTLS](https://docs.nginx.com/nginx-ingress-controller/configuration/policy-resource/#ingressmtls)
 
-## Open Questions (TODO)
+## Open Questions
 
-This section is to record issues that should be discussed in the implementation section before this GEP moves
+This section records issues that warranted discussion in the API section before this GEP moves
 out of `Provisional` status.
 
 1. Bowei recommended that we mention the approach of cross-namespace referencing between Route and Service.
-Be explicit about using the standard rules with respect to attaching policies to resources.
+Be explicit about using the standard rules with respect to attaching policies to resources.  This is mentioned in the
+API section.
 2. Costin recommended that Gateway SHOULD authenticate with either a JWT with audience or client cert
-or some other means - so gateway added headers can be trusted, etc.
+or some other means - so gateway-added headers can be trusted, amongst other things.  This is out of scope for this
+proposal, which centers around application developer persona resources such as HTTPRoute and Service.
 3. Costin mentioned we need to answer the question - is configuring the connection to a backend and TLS
-something the route author decides - or the backend owner?  Same for SANs.  However, providing a mechanism
-for the cluster operator to override gateway to backend TLS settings is already listed as a Non-Goal.
+something the route author decides - or the backend owner?  Same for SAN (Subject Alternative Name) certificates.
+The use of SAN certificates and the use of SNI, as a part of TLS, can be implementation-dependent, though the
+application can still reject any request or certificate that it doesn’t support, including requests with SNI or a
+certificate with SANs.
 
 ## References
 
 [Gateway API TLS Use Cases](https://docs.google.com/document/d/17sctu2uMJtHmJTGtBi_awGB0YzoCLodtR6rUNmKMCs8/edit#heading=h.cxuq8vo8pcxm)
+[GEP-713: Metaresources and PolicyAttachment](https://gateway-api.sigs.k8s.io/geps/gep-713/)
+[Policy Attachment](https://gateway-api.sigs.k8s.io/references/policy-attachment/#direct-policy-attachment)
+[Gateway API TLS guide (v1alpha2)](https://gateway-api.sigs.k8s.io/v1alpha2/guides/tls/)
+[NGINX Ingress Controller policy resource: EgressMTLS](https://docs.nginx.com/nginx-ingress-controller/configuration/policy-resource/#egressmtls)
+[SIG-NET Gateway API: TLS to the K8s.Service/Backend](https://docs.google.com/document/d/1RTYh2brg_vLX9o3pTcrWxtZSsf8Y5NQvIG52lpFcZlo)
+[What is the difference between SAN and SNI SSL certificates? (Server Fault)](https://serverfault.com/questions/807959/what-is-the-difference-between-san-and-sni-ssl-certificates)
+[RFC 4346 TLS 1.1](https://datatracker.ietf.org/doc/html/rfc4346#appendix-A.5)
+[RFC 5246 TLS 1.2](https://datatracker.ietf.org/doc/html/rfc5246#appendix-A.5)
+[RFC 8446 TLS 1.3](https://datatracker.ietf.org/doc/html/rfc8446#appendix-B.4)