GEP-1897 Explicit Backend TLS Connection Configuration (was TLS from Gateway to Backend...) Update with API details #2113

Merged
258 changes: 250 additions & 8 deletions geps/gep-1897.md
@@ -69,7 +69,7 @@ including:
* SANs for validating upstream service (server authentication)
* client certificate of the gateway (client authentication)

## Purpose - why do we want to do this?

This proposal is _very_ tightly scoped because we have tried and failed to address this well-known
gap in the API specification. The lack of support for this fundamental concept is holding back
@@ -125,7 +125,234 @@ This GEP is the outcome of the TLS use cases #4 and #5 in

## API

Details deferred until we reach consensus on what we want to do, and why we want to do this.
To allow the gateway client to know how to connect to the backend pod, when the backend pod has its own
certificate, we implement a metaresource that is already mentioned as an example in
[GEP-713: Metaresources and PolicyAttachment](https://gateway-api.sigs.k8s.io/geps/gep-713/), and as a hypothetical in the
[Policy Attachment](https://gateway-api.sigs.k8s.io/references/policy-attachment/#direct-policy-attachment)
documentation.

This metaresource is named TLSConnectionPolicy. In this document, because naming is hard, we chose to retain the
name TLSConnectionPolicy to advertise alignment with a previously discussed naming choice, but a new name may be
substituted without blocking acceptance of the content of the API change.

The selection of the applicable Gateway API persona is important in the design of this proposal, because each
persona is assigned a role which handles specific Gateway API resources. TLSConnectionPolicy is used by the application
developer Gateway API persona to convey client certificate settings used in the TLS handshake from Gateway to backend,
because this persona handles resources involved with application access and configuration, such as Routes and Services.

Member

I think we might need to at least list another option here. If we're saying that the Application Developer is the only one that can/should configure this policy, we're basically saying that the Application Developer is responsible for both:

  1. Configuring the cert the backend will use
  2. Configuring how the Gateway should validate the cert

For the validation to really be meaningful and useful, 1 and 2 feel like they should be done by different people. I get that in reality they're often configured by the same person, but I'd like to suggest that we should at least describe that potential problem here and recommend that another persona, such as the Cluster Operator, may want to be involved here.

This whole line of thinking has made me wonder if any users would actually want to configure a set of trusted backend CAs per Gateway instead of per Service. I'm less sure of that one and am only raising this in case someone else has also had similar thoughts/interests.

Contributor

I think the important part here is to consider whose intent we are representing in this API. I suggest something like this:

Suggested change
persona is assigned a role which handles specific Gateway API resources. TLSConnectionPolicy is used by the application
developer Gateway API persona to signal what the application developer _expects_ connections to the application to look like from a TLS perspective.
Only the application developer can know what the application expects, so it's important that this configuration be owned by that persona.

The important part here is that the application running inside the Pods expects connections to look a certain way - the TLSConnectionPolicy object is about having a way for the application developer to explicitly describe that expectation.

We can use this way of looking at things to make calls about what should be included in the TLSConnectionPolicy.

  • The backend should already be managing its own cert somehow, so we shouldn't need to represent that here.
  • The Gateway, as a TLS client, needs to know some things about how to connect to the service (most of these are handled automatically for end user clients by the browser):
    • that TLS should be attempted at all (a browser will use the scheme to determine this, but Gateways don't have that information available at an API level - it could be intuited from some config, but it should be explicitly configured.)
    • What CA should be used (a browser usually uses the system-bundled CAs and allows adding extras; for Gateway-to-internal-backend use cases it's much more common to use either individual self-signed certificates or an internal CA)
    • What version of TLS should be expected (this is not strictly necessary, because TLS is a negotiated protocol, but it is very nice to have because it will allow Gateway implementations to do things like disallow TLS 1.1, force only TLS 1.3, and so on).

While the Gateway as a TLS client does need to know if it should pass a client certificate, that should be configured on the Gateway, by the Gateway owner, so that the client and server have their chain of custody properly managed. (see https://github.com/kubernetes-sigs/gateway-api/pull/2113/files#r1268891063 for some more thoughts on this from me).

Contributor Author

@robscott I have added to the API to describe an idea that if the backend requires TLS and the Gateway hasn't been configured for it, then it should be possible to use system trusted certs. I think this addresses your concern. I don't feel it is a problem for 1 and 2 to be done by the same person.

@youngnick I agree with what you've written here and can update the API.

Contributor Author

@candita, Aug 5, 2023

The first two are already there:
  • That TLS should be attempted at all -> the TLSConnectionPolicy exists for the targetRef.
  • What CA to use -> TrustedCACertRefs

I will add the TLS versions and SAN names.

Choosing any other role would move the application-related responsibility from the application developer role to that
role, which violates the role-oriented design principle of Gateway API. As mentioned in Non-goal #7, providing a
mechanism for the cluster operator role to override Gateway-to-backend TLS settings is not covered here, but can
be addressed in a future update should the need arise.

Member

I think one of the great things about this being a separate resource is that multiple personas can be granted access to this resource. In some cases that might be the application developer, and in others, that might be another role like cluster operator.

Contributor Author

I'm working on an update for this that ties in with updates for #2113 (comment) and #2113 (comment)

Contributor Author

I'll be adding something like this to explain a future approach.

One idea is to use two types: ApplicationTLSConnectionPolicy,
and GatewayTLSConnectionPolicy, where the application developer is responsible for the former, the cluster operator is responsible for the latter, and the cluster operator may configure whether certain settings may be overridden by application developers.


TLSConnectionPolicy is defined as a Direct Policy Attachment without defaults or overrides, applied to a Service that
accesses the backend in question, where the TLSConnectionPolicy resides in the same namespace as the Service it is
applied to. The TLSConnectionPolicy and the Service must reside in the same namespace in order to prevent the
complications involved with sharing trust across namespace boundaries. By choosing the Service resource rather than the
Route resource, we can reuse the same TLSConnectionPolicy for all the different Routes that might point to this Service.
For the use case where certificates are stored in their own namespace, users may create Secrets and use ReferenceGrants
for a TLSConnectionPolicy-to-Secret binding.
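The attachment rule above can be sketched as a small index that a controller might keep. This is a minimal sketch under stated assumptions, not part of the API: `Policy`, `policyKey`, and `PolicyIndex` are hypothetical types, and using port 0 to mean "all ports of the Service" (with a port-specific policy taking precedence) is an assumed convention. It shows that every Route whose backendRef resolves to the same Service and port finds the same policy, and that a targetRef naming a different namespace is rejected.

```go
package main

import "fmt"

// policyKey identifies the Service port a policy is attached to.
// Port 0 is a hypothetical convention meaning "all ports of the Service".
type policyKey struct {
	Namespace string
	Service   string
	Port      int32
}

// Policy is a hypothetical, trimmed-down view of a TLSConnectionPolicy.
type Policy struct {
	Namespace       string // namespace of the policy object itself
	TargetKind      string // only "Service" is accepted here
	TargetName      string
	TargetNamespace string // empty means "same namespace as the policy"
	Port            int32  // 0 means "all ports"
}

// PolicyIndex maps Service ports to the policy attached to them.
type PolicyIndex map[policyKey]Policy

// Add enforces the same-namespace rule and indexes the policy.
func (idx PolicyIndex) Add(p Policy) error {
	if p.TargetKind != "Service" {
		return fmt.Errorf("unsupported target kind %q", p.TargetKind)
	}
	if p.TargetNamespace != "" && p.TargetNamespace != p.Namespace {
		return fmt.Errorf("policy in namespace %q may not target namespace %q",
			p.Namespace, p.TargetNamespace)
	}
	idx[policyKey{Namespace: p.Namespace, Service: p.TargetName, Port: p.Port}] = p
	return nil
}

// Lookup finds the policy for a backendRef to service:port in namespace.
// A port-specific policy wins over one covering all ports (an assumption).
func (idx PolicyIndex) Lookup(namespace, service string, port int32) (Policy, bool) {
	if p, ok := idx[policyKey{Namespace: namespace, Service: service, Port: port}]; ok {
		return p, true
	}
	p, ok := idx[policyKey{Namespace: namespace, Service: service}]
	return p, ok
}
```

Because the key is the Service and port rather than the Route, any number of Routes pointing at `my-service:80` share one lookup result, which is the reuse property the paragraph describes.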

In the API defined here, the definition of TrustedCACertRefs follows the convention established by the CertificateRefs field of GatewayTLSConfig in
https://github.com/kubernetes-sigs/gateway-api/blob/main/apis/v1beta1/gateway_types.go#L340

One of the areas of concern for this API is that we need to indicate how and when the API implementations should use the
backend destination certificate authority. This solution proposes, as introduced in
[GEP-713](https://gateway-api.sigs.k8s.io/geps/gep-713/), that the implementation
should watch the connections to a specific port on a specified targetRef (such as a Service), and if the port and
Service match a TLSConnectionPolicy, then assume the connection is TLS and verify, during the TLS handshake and before
any request is forwarded, that the certificate served by the backend can be validated by the provided trusted CA
certificates. On the question of how to signal a failure in certificate validation, it is left to the implementation
to return an appropriate response error, such as one of the HTTP error codes 400 (Bad Request), 401 (Unauthorized),
403 (Forbidden), or another signal that makes the failure sufficiently clear to the requester without revealing too
much about the transaction, based on established security requirements.
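The validation step described above might look like the following sketch, which uses Go's standard `crypto/x509` package. It assumes the implementation has already read the PEM-encoded CA bundle from the object referenced by TrustedCACertRefs and captured the DER-encoded certificates the backend presented in the handshake. The function `verifyBackendCert` and its `serverName` parameter (the SAN the gateway expects to find) are hypothetical; which SAN to check is still an open question in this proposal.

```go
package main

import (
	"crypto/x509"
	"errors"
	"fmt"
)

// verifyBackendCert checks the backend's leaf certificate (DER) against the
// PEM CA bundle from TrustedCACertRefs. intermediatesDER holds any extra
// certificates the backend sent; serverName is the SAN the gateway expects.
func verifyBackendCert(caPEM, leafDER []byte, intermediatesDER [][]byte, serverName string) error {
	roots := x509.NewCertPool()
	if !roots.AppendCertsFromPEM(caPEM) {
		return errors.New("no trusted CA certificates found")
	}
	leaf, err := x509.ParseCertificate(leafDER)
	if err != nil {
		return fmt.Errorf("parsing backend certificate: %w", err)
	}
	intermediates := x509.NewCertPool()
	for _, der := range intermediatesDER {
		c, err := x509.ParseCertificate(der)
		if err != nil {
			return fmt.Errorf("parsing intermediate certificate: %w", err)
		}
		intermediates.AddCert(c)
	}
	// Verify builds a chain from the leaf to one of the trusted roots and,
	// because DNSName is set, also checks the leaf's SANs against serverName.
	_, err = leaf.Verify(x509.VerifyOptions{
		Roots:         roots,
		Intermediates: intermediates,
		DNSName:       serverName,
	})
	return err
}
```

A gateway written in Go would normally get the same two checks (chain of trust and SAN match) by setting `RootCAs` and `ServerName` on its `tls.Config`; the explicit form only makes them visible.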

Additional configuration options, not covered here but possible to add, are mentioned in [SIG-NET Gateway API: TLS
to the K8s.Service/Backend](https://docs.google.com/document/d/1RTYh2brg_vLX9o3pTcrWxtZSsf8Y5NQvIG52lpFcZlo). All of
these are currently implementation-dependent, with the following recommended defaults:
- Visibility: the visibility level could be all or none, indicating that if the connection failed due to validation
failures, it would drop the connection silently if the visibility level were none, and report an error if the visibility
level were all. This is left as implementation-dependent.
- Server Name Indication: enables the gateway to send the server name in the Server Name Indication (SNI) extension of
the TLS ClientHello, so that the backend can select the right certificate when several hosts share the same IP address.
(default to enabled)
- Subject Alternative Name certificates: enable the use of a single certificate that can serve multiple domains.

Contributor

What is "enabled"? This is a property of the verification. So does enabled mean we should verify it? (We ought to by default, or it's the equivalent of -k in curl.)

If so what are we verifying? It's not just a bool.

Browsers handle this by asserting the DNS name matches the certificate, but a Kubernetes service doesn't really have as strict of a mapping.

Contributor

I think what "enabled" means is that a SAN (multi-domain) certificate will be accepted when true. When false, a certificate containing SANs will be rejected right away. The verification is always done (unless disabled otherwise with a different setting).

Regarding what we are verifying, it's a good question: in Kubernetes (or service meshes like Istio) certificates may contain SPIFFE IDs, but the client needs to be provided with a mapping (service to valid SPIFFE ID) to authorize the server.

Contributor

SAN is the only non-deprecated way to encode a DNS name as far as I know (CN was deprecated 20 years ago). So it's not a question of if we verify San or not - we MUST.

The question is which SAN we verify. You don't even have to get into SPIFFE; the pod could have any certificate.

Contributor

the simplest case would be to assume svc.ns.svc.cluster.local but I am not sure that is sufficient, or even a common usage

Contributor

I agree that SAN certs have been around for a long time and CN-only certs have been deprecated, so this setting is not needed/useful.

Regarding "what SAN do we verify": maybe that needs to be an additional field here, with defaults for inferred names like "svc.ns.svc.cluster.local".

Contributor

The HTTPRoute hostname is the public internet-facing hostname, not the internal service. So if I have a route from example.com -> example-svc, I would not expect the workload to have a cert for example.com.

it could be example-svc.ns.svc.cluster.local maybe - but there is no standard

Contributor Author

Isn't that very service-mesh specific to not expect the pod to have a cert for example.com? Alternatively, with SANs, it could list example.com and example-svc.ns.svc.cluster.local.

Contributor

It is not related to service mesh in any way. I wouldn't expect them to have a cert for example.com since the Gateway itself has that. And it's not clear why you would serve the same cert in 2 places?

Contributor

To be fair they could have any cert - even google.com (with their own ca, of course). So anything is possible

Contributor

The internal workload on the pod can serve requests either from a Gateway or another internal workload (the mesh scenario) and in both cases is likely to serve the same certificate (unless SNI is used?) which is more likely to be example-svc and not example.com.

(default to enabled)
- Version: specifies the minimum TLS version that the connection may use (default TLSv1.2)
- Ciphers: specifies the cipher suites enabled for TLS exchanges. (default to align with TLS versions 1.1, 1.2, and/or
1.3 as described in [RFC 4346 TLS 1.1](https://datatracker.ietf.org/doc/html/rfc4346#appendix-A.5),
[RFC 5246 TLS 1.2](https://datatracker.ietf.org/doc/html/rfc5246#appendix-A.5), and
[RFC 8446 TLS 1.3](https://datatracker.ietf.org/doc/html/rfc8446#appendix-B.4)).
- Visibility: the visibility level could be all or none, indicating that if the connection failed due to validation
failures, it would drop the connection silently if the visibility level were none, and report an error if the visibility
level were all. (default to all)

```go
// TLSConnectionPolicy provides a way to publish TLS configuration
// that enables a gateway client to connect to a backend pod.
type TLSConnectionPolicy struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	// Spec defines the desired state of TLSConnectionPolicy.
	Spec TLSConnectionPolicySpec `json:"spec"`

	// Status defines the current state of TLSConnectionPolicy.
	Status TLSConnectionPolicyStatus `json:"status,omitempty"`
}

// TLSConnectionPolicySpec defines the desired state of
// TLSConnectionPolicy.
// Note: there is no Override or Default policy configuration.
type TLSConnectionPolicySpec struct {
	// TargetRef identifies an API object to apply policy to.
	TargetRef gatewayv1a2.PolicyTargetReference `json:"targetRef"`

	// TLS contains TLS connection policy configuration.
	TLS *TLSConnectionPolicyConfig `json:"tls"`

Contributor

This might be better as a list that's keyed off the PortNumber

Then it's possible to have different TLS configurations for different ports in a single Service. Unless you were specifically looking to have those be distinct policies?

Contributor Author

The guidelines are only one target ref per policy attachment. Maybe this also depends on whether we create a new type of target ref that includes port or ports.
x-ref https://github.com/kubernetes-sigs/gateway-api/pull/2113/files/5d84379d2aa570958ce6dbd3fb8cf187447e9f95#r1231594914

Contributor

To close this one out, we're adding sectionName to Policy TargetRef - which will allow you to target the port's name field.

}

// TLSConnectionPolicyConfig contains TLS connection policy configuration.
type TLSConnectionPolicyConfig struct {
	// TrustedCACertRefs contains one or more references to

As discussed in the meeting, can we also add the SNI to use? SNI is used in virtually all HTTPS connections on the internet, and allows a backend to select what cert to provide.

If we do add the SNI, could we also modify the AllowedSAN to state "if no AllowedSAN is provided, the SNI will be used to check" (since this is also the most common scenario on the internet)?

Contributor Author

From what I understand, the SNI has to be a singleton on the TLSConnectionPolicy, and is sent by the Gateway in this process so that the backend application can choose which of several different certificates it would provide. I don't think this is a use case we need to support, as we want the TLSConnectionPolicy to be able to serve several endpoints.

I did not hear a lot of support for adding SNI at the meeting. I'll open a Slack discussion and find out if there is widespread support, and whether TLSConnectionPolicy is the place to support it.

	// Kubernetes objects that contain TLS certificates, which are
	// used to establish a TLS handshake between the gateway and

Contributor

This needs more specificity on what we are supposed to do with it. Is it a client certificate? Is it a CA? Either? Both?

Contributor Author

It depends, so it can be both. It is one or more references to a certificate. When there is a chain of certificates, one or more of them can be a CA or intermediate CA.

	// the backend pod.
	//
	// A single TrustedCACertRef to a Kubernetes Secret has "Core"
	// support. Implementations MAY choose to support attaching
	// multiple certificates to a backend, but this behavior is
	// implementation-specific.
	//
	// References to a resource in a different namespace are
	// invalid.
	//
	// This field is required for any TLSConnectionPolicyConfig.
	//
	// Support: Core - A single reference to a Kubernetes Secret of type kubernetes.io/tls

Contributor

AFAIK a kubernetes.io/tls Secret is required to contain tls.key and tls.crt, but tls.key is not required for TLS origination.

Contributor Author

@arkodg do you have a suggestion or change for the Support:?

Contributor

Maybe we could try to relax this constraint, i.e. a kubernetes.io/tls Secret is valid if a tls.key OR tls.crt key is set.
cc @robscott

Contributor

or the CA Cert can be inputted as a ConfigMap (local to the namespace), and in the case of mTLS, the client's key and cert can be inputted as a kubernetes.io/tls Secret

Contributor

Elsewhere I commented that we should allow/support mTLS to be originated from the gateway so I like the idea of kubernetes.io/tls Secret

Contributor

BTW, I just realized an mTLS client requires 3 things: a private key, an identity cert (public key), and a trust store to validate the peer (server) cert. If kubernetes.io/tls contains tls.key and tls.crt, then we need one more thing (maybe the ClusterTrustBundle @howardjohn mentioned?) for full mTLS client-side cert config.

Contributor Author

@robscott

I like the idea of using something other than a Secret to store a CA cert, since that's not really a secret. One of the simplest paths forward would be to use a ConfigMap for the CA cert and a kubernetes.io/tls Secret for the client key and cert for mTLS.

We could change this to ConfigMap for CA Certs now and add kubernetes.io/tls Secret to a different PolicyAttachment, like mTLSConnectionPolicy I mentioned earlier.

Alternatively, I'd chatted with @enj in the past about developing a different kind of resource under the scope of Gateway API that was meant for storing certs. This could potentially help with the issue we have today where most Ingress/Gateway controllers are deployed with read access to all secrets in the cluster by default. I think the ongoing ReferenceGrant work in sig-auth may make this idea irrelevant longer term, but thought it was worth at least mentioning in this context.

It feels out of scope for this proposal, but I can open another discussion or issue to develop a different kind of resource for certs.

Member

Yep, that makes sense to me - a well-defined field in a ConfigMap for now with a documented plan to move to a new resource type longer term.

Contributor

+1 to ClusterTrustBundle in the long-term, but well-formatted configmap for now makes sense

Contributor Author

Made some changes for this.

	//
	// Support: Implementation-specific (More than one reference or other resource types)

Contributor

I realize this could be done through implementation-specific things, but is it possible to have an option to use the system trusted certificates? This is how most TLS configurations I have seen work: I don't have to tell Chrome, curl, etc. to use /etc/ssl/certs/ca-certificates.crt, for example.

This is important since these do not exist as Secrets. Rather, they are... somewhere... Every client is going to have some, but they may be in different places (there are about 8 different well-known file paths in https://golang.org/src/crypto/x509/root_linux.go, for example, and some systems do not use files at all).

One option is to make this the default if the list is unset.

	//
	// +kubebuilder:validation:MaxItems=64
	// +kubebuilder:validation:MinItems=1
	TrustedCACertRefs []SecretObjectReference `json:"trustedCACertRefs"`

	// Port is the network port that the implementation watches to
	// know if the connection should be TLS and the targetRef's
	// certificate should be validated by the certs in TrustedCACertRefs

Member

As a minor nit, I think implementations will essentially apply this policy anywhere they're mapping a BackendRef that matches the combination of the TLSConnectionPolicy TargetRef and Port. This is not great, but maybe something like this would be a bit clearer; very open to improvements/changes though:

Suggested change
// Port is the network port of the target. When a target matches a BackendRef,
// this Policy should apply, resulting in the certificate served by the backend
// being validated by the certs in TrustedCACertRefs.

Contributor Author

Pending a decision on removing Port.

	// If empty, then all ports for the targetRef are watched to know
	// if the connection should be TLS, the targetRef's certificate
	// should be validated by the certs in TrustedCACertRefs, and a
	// status delivered in the response for validation failures.
	Port PortNumber `json:"port,omitempty"`

Contributor

So the target of this policy is a Service and as far as I know the Service will have a targetPort (or defaulted to service port?). Can we just use that instead so this port is not needed? Otherwise how does the implementation work when the targetPort in the Service and the Port here are different?

Member

In Gateway API, a port reference is traditionally part of a Service reference (BackendRef is best example) and it refers to the Port the Service is listening on, not the targetPort (port the Service backends are listening on). The reason this is needed is that it's fairly common for Kubernetes Services to listen on multiple ports and protocols.

So given the following example, this would be referring to port 80, and specifically backendRefs that are targeting this Service on port 80.

apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  selector:
    app.kubernetes.io/name: MyApp
  ports:
    - protocol: TCP
      port: 80
      targetPort: 9376

Contributor

...it's fairly common for Kubernetes Services to listen on multiple ports and protocols.

Got it. So in the case where a Kubernetes service listens on a single port, the Port field here (on line 243) has to match the service port. Put another way, the port can be inferred in such a case. Just confirming my understanding

Contributor

Do we care about letting policy authors use port names and not just numbers?

Similar to how K8s Service ports.targetPort is IntOrString ?

Contributor Author

Let's decide on whether we need a new object that includes Port, then we can decide on this.

}

// TLSConnectionPolicyStatus defines the observed state of TLSConnectionPolicy.
type TLSConnectionPolicyStatus struct {
	// Conditions describe the current conditions of the TLSConnectionPolicy.
	//
	// Implementations should prefer to express TLSConnectionPolicy
	// conditions using the `TLSConnectionPolicyConditionType` and
	// `TLSConnectionPolicyConditionReason` constants so that
	// operators and tools can converge on a common vocabulary to
	// describe TLSConnectionPolicy state.
	// Known condition types are:
	//
	// * "Accepted"
	//
	// +optional
	// +listType=map
	// +listMapKey=type
	// +kubebuilder:validation:MaxItems=8
	// +kubebuilder:default={type: "Accepted", status: "Unknown", reason:"Pending", message:"Waiting for validation", lastTransitionTime: "1970-01-01T00:00:00Z"}
	Conditions []metav1.Condition `json:"conditions,omitempty"`
}

// TLSConnectionPolicyConditionType is the type of a condition used
// as a signal by TLSConnectionPolicy. This type should be used with
// the TLSConnectionPolicyStatus.Conditions field.
type TLSConnectionPolicyConditionType string

// TLSConnectionPolicyConditionReason is a reason that explains why a
// particular TLSConnectionPolicyConditionType was generated.
type TLSConnectionPolicyConditionReason string

const (
// This condition indicates that the TLSConnectionPolicy has been
// accepted as valid.
// Possible reason for this condition to be True is:
// * "Validated"
// Possible reasons for this condition to be False are:
Contributor

can ResolvedRefs condition be also used here? since the policy addresses external references - service, secrets

Contributor Author

Sounds reasonable, unless there are privacy issues. As an implementor, what kind of condition reasons would you think should be there? Should there be a general validation on the certificate so it could have an invalid cert reason? Or does this store results of an analysis of reference grants between the policy and targets?

Member

I think most implementations would only look at this if they were implementing a Service that was targeted by one of these policies, so I don't think ResolvedRefs would be relevant in that case, but I agree that it likely would be very useful for the secret reference(s) that will be included.

Contributor Author

This one I will pass on unless there is further interest.

// * "Invalid"
// * "Pending"
TLSConnectionPolicyConditionAccepted TLSConnectionPolicyConditionType = "Accepted"

// This reason is used with the "Accepted" condition when the condition is true.
TLSConnectionPolicyReasonAccepted TLSConnectionPolicyConditionReason = "Valid"

// This reason is used with the "Accepted" condition when the TLSConnectionPolicy is invalid, e.g. crossing namespace boundaries.
TLSConnectionPolicyReasonInvalid TLSConnectionPolicyConditionReason = "Invalid"

// This reason is used with the "Accepted" condition when the TLSConnectionPolicy is pending validation.
TLSConnectionPolicyReasonPending TLSConnectionPolicyConditionReason = "Pending"
)
```
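As a rough illustration of how an implementation might populate the `Accepted` condition, the Go sketch below maps a validation outcome to the reasons proposed above. It is a sketch under stated assumptions, not part of the API: `Condition` is a hypothetical, standard-library-only stand-in for `metav1.Condition`, `acceptedCondition` and its inputs are hypothetical, and the rule that a different namespace yields `Invalid` simply follows the example in the `TLSConnectionPolicyReasonInvalid` comment. The pending case uses status `Unknown` to match the `+kubebuilder:default` marker on `Conditions`; an implementation following the comment on `TLSConnectionPolicyConditionAccepted`, which lists "Pending" among the False reasons, might use `False` instead.

```go
package main

import "fmt"

// Condition is a hypothetical, trimmed-down stand-in for metav1.Condition so
// that this sketch only needs the standard library.
type Condition struct {
	Type    string
	Status  string // "True", "False", or "Unknown"
	Reason  string
	Message string
}

// String values copied from the constants proposed above.
const (
	conditionAccepted = "Accepted"
	reasonValid       = "Valid"
	reasonInvalid     = "Invalid"
	reasonPending     = "Pending"
)

// acceptedCondition is a hypothetical helper. validated reports whether the
// implementation has finished validating the policy; policyNS and targetNS are
// the namespaces of the policy and of the Service it targets.
func acceptedCondition(validated bool, policyNS, targetNS string) Condition {
	switch {
	case !validated:
		// Same status and reason as the kubebuilder default on Conditions.
		return Condition{Type: conditionAccepted, Status: "Unknown", Reason: reasonPending, Message: "Waiting for validation"}
	case policyNS != targetNS:
		// Assumed rule: a policy may not cross namespace boundaries.
		return Condition{Type: conditionAccepted, Status: "False", Reason: reasonInvalid,
			Message: fmt.Sprintf("target namespace %q differs from policy namespace %q", targetNS, policyNS)}
	default:
		return Condition{Type: conditionAccepted, Status: "True", Reason: reasonValid, Message: "Policy accepted"}
	}
}

func main() {
	fmt.Println(acceptedCondition(true, "app", "app").Reason)   // Valid
	fmt.Println(acceptedCondition(true, "app", "other").Status) // False
	fmt.Println(acceptedCondition(false, "app", "app").Reason)  // Pending
}
```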

## How a client behaves

This table describes the effect that a TLSConnectionPolicy has on a Route. There are only two cases where the
TLSConnectionPolicy will signal a Route to connect to a backend using TLS: an HTTPRoute whose backend is targeted
by a TLSConnectionPolicy, either with or without listener TLS configured. (There are a few other cases where it may be
possible, but they are purposely marked “not supported” to avoid confusion about the assigned purpose of each of
the protocol-affiliated types of Routes.)

Contributor

What does an implementation do for "not supported"?

Contributor Author

In my understanding, these are invalid configurations, so they wouldn't pass validation.

Contributor

how can they fail validation? The configuration and the route are separate objects

Contributor Author

That is true, I didn't think it through. So for "not supported", the intention really is that this is not supported, at least not in this proposal, where we're only covering HTTPRoute.

Member

This seems like something that would benefit from both nested conditions (as suggested in https://github.com/kubernetes-sigs/gateway-api/pull/2113/files#r1239031972) and the SupportedFeatures GEP from @LiorLieberman. We can say that every implementation that claims support for this policy needs to populate status on the policy describing which Routes it is being implemented for (and maybe also which Routes it is not being implemented for).

Contributor Author

How about,

Every implementation that claims support for TLSConnectionPolicy should document for which Routes it is being implemented.

| Route Type | Gateway Config             | Backend is targeted by a TLSConnectionPolicy? | Connect to backend with TLS? |
|------------|----------------------------|-----------------------------------------------|------------------------------|
| HTTPRoute  | Listener TLS               | Yes                                           | **Yes**                      |
| HTTPRoute  | No listener TLS            | Yes                                           | **Yes**                      |
| HTTPRoute  | Listener TLS               | No                                            | No                           |
| HTTPRoute  | No listener TLS            | No                                            | No                           |
| TLSRoute   | Listener Mode: Passthrough | Yes                                           | No                           |
| TLSRoute   | Listener Mode: Terminate   | Yes                                           | Not supported                |
| TLSRoute   | Listener Mode: Passthrough | No                                            | No                           |
| TLSRoute   | Listener Mode: Terminate   | No                                            | No                           |
| TCPRoute   | Listener TLS               | Yes                                           | Not supported                |
| TCPRoute   | No listener TLS            | Yes                                           | Not supported                |
| TCPRoute   | Listener TLS               | No                                            | No                           |
| TCPRoute   | No listener TLS            | No                                            | No                           |
| UDPRoute   | Listener TLS               | N/A                                           | No                           |
| UDPRoute   | No listener TLS            | N/A                                           | No                           |
| GRPCRoute  | N/A                        | N/A                                           | No                           |

Contributor

I think this needs to take mesh clients into account (or rather, another table for mesh). Should they be following TLSConnectionPolicy?

Contributor Author

No, the mesh case is not covered in this proposal.

Contributor

I don't think we should be accepting new APIs into this repo that don't take mesh (now a core use case of the project) into account. I am not talking about handling "mesh transport", but we can't just have no answer for what a mesh client that is implementing the API should do when the API is there. Even if it's saying "should always ignore it"

Contributor Author

I spelled it out in Longer Term Goals that service mesh use cases are a worthy goal, but it's a goal that may need a different GEP for proper attention. I hope we don't get into a habit where we approve/merge a preliminary GEP with clearly stated goals and then request to change the goals later.

Contributor

Can we add a small clarification to longer term goals to say that implementations that support those 3 use-cases should ignore TLSConnectionPolicy?


Contributor

should probably be the same as HTTP?

Contributor Author

I'm not proposing any other changes in behavior except for HTTPRoute, so I'm not in a hurry to make GRPCRoute have the same behavior as HTTPRoute and be changed as well.

Contributor

That feels fine for provisional but GRPCRoute should probably be supported for implementable. Not blocking right now though IMO

Contributor Author

I would request to wait until GRPCRoute itself is no longer experimental. https://gateway-api.sigs.k8s.io/api-types/grpcroute/

Is that acceptable?

Member

I think this may actually be pretty straightforward and identical to HTTP as @howardjohn suggested. @gnossen can correct me here, but I think we should expect the same config from TLS from Gateway -> Backend to apply equally to HTTPRoute and GRPCRoute.

As far as the experimental status of GRPCRoute, it's hard to get right. We've gotten into situations in upstream Kubernetes where we let in a bunch of incompatible alpha features and it became quite a pain to graduate any of them past alpha. In this case with GRPCRoute being fairly stable at this point, I'd rather cover it here unless it ends up adding significant complexity.

Contributor Author

I will mark it "implementation-dependent" instead of "Yes" in the "Connect to backend with TLS?" column.

Contributor

I agree with @robscott . Disregarding any issues of experimental state, I think the values for GRPCRoute should be the same as for HTTPRoute in this table.
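The Route table above can also be read as a small decision function. The Go sketch below encodes the rows as currently written, including `No` for GRPCRoute, which the discussion above may still change. `RouteKind`, `Decision`, and `connectWithTLS` are hypothetical names, not part of the proposed API, and listener TLS is deliberately not an input because it never changes the outcome in the table.

```go
package main

import "fmt"

// RouteKind is a hypothetical name for the "Route Type" column.
type RouteKind string

const (
	HTTPRoute RouteKind = "HTTPRoute"
	TLSRoute  RouteKind = "TLSRoute"
	TCPRoute  RouteKind = "TCPRoute"
	UDPRoute  RouteKind = "UDPRoute"
	GRPCRoute RouteKind = "GRPCRoute"
)

// Decision is a hypothetical name for the "Connect to backend with TLS?" column.
type Decision string

const (
	Yes          Decision = "Yes"
	No           Decision = "No"
	NotSupported Decision = "Not supported"
)

// connectWithTLS returns the table's answer for one Route. targeted reports
// whether the backend is targeted by a TLSConnectionPolicy; terminate is only
// consulted for TLSRoute (Listener Mode: Terminate vs. Passthrough).
func connectWithTLS(kind RouteKind, targeted, terminate bool) Decision {
	if !targeted {
		// Every row whose backend is not targeted reads "No".
		return No
	}
	switch kind {
	case HTTPRoute:
		return Yes
	case TLSRoute:
		if terminate {
			return NotSupported
		}
		return No // Listener Mode: Passthrough
	case TCPRoute:
		return NotSupported
	default:
		// UDPRoute and GRPCRoute rows read "No" in the table as written.
		return No
	}
}

func main() {
	fmt.Println(connectWithTLS(HTTPRoute, true, false)) // Yes
	fmt.Println(connectWithTLS(TLSRoute, true, true))   // Not supported
	fmt.Println(connectWithTLS(TCPRoute, false, false)) // No
}
```

Dropping listener TLS from the inputs makes explicit that, in this proposal, backend TLS is driven by the policy on the backend rather than by the listener configuration.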


## Request Flow

One additional step would be added to the typical client/gateway API request flow for a gateway implemented using a
reverse proxy. This is shown as step 6 below.

1. A client makes a request to http://foo.example.com.
2. DNS resolves the name to a Gateway address.
3. The reverse proxy receives the request on a Listener and uses the Host header to match an HTTPRoute.
4. Optionally, the reverse proxy can perform request header and/or path matching based on match rules of the HTTPRoute.
5. Optionally, the reverse proxy can modify the request, e.g. add/remove headers, based on filter rules of the HTTPRoute.
6. __(New) Optionally, the reverse proxy can verify the certificate served by the backend, based on the
backendRef rules of the HTTPRoute.__
7. Lastly, the reverse proxy forwards the request to one or more objects, e.g. Service, in the cluster based on
backendRefs rules of the HTTPRoute.
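Step 6 amounts to the gateway acting as a TLS client toward the backend. A minimal Go sketch of the client configuration an implementation might build for that hop is below. `backendTLSConfig` is hypothetical, and how the CA bundle and hostname would be read from a TLSConnectionPolicy is left out of the sketch.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
)

// backendTLSConfig builds a client-side tls.Config for the gateway-to-backend hop.
// caPEM (the CA bundle trusted for the backend) and hostname (used for SNI and
// for certificate hostname verification) are hypothetical inputs.
func backendTLSConfig(caPEM []byte, hostname string) (*tls.Config, error) {
	if hostname == "" {
		return nil, errors.New("hostname is required for SNI and certificate verification")
	}
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("no CA certificates found in PEM data")
	}
	return &tls.Config{
		RootCAs:    pool,     // trust only the backend's CA, not the system roots
		ServerName: hostname, // sent as SNI and checked against the backend certificate
		MinVersion: tls.VersionTLS12,
	}, nil
}

func main() {
	_, err := backendTLSConfig([]byte("not a certificate"), "foo.example.com")
	fmt.Println(err) // no CA certificates found in PEM data
}
```

The returned config could be set as `http.Transport.TLSClientConfig` or passed to `tls.Dial`. Because `RootCAs` is non-nil, the system roots are not consulted, and `ServerName` drives both SNI and the hostname check against the backend certificate.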

## Alternatives
Most alternatives are enumerated in the section on the history of backend TLS above. A couple of additional
alternatives are also listed here.

1. Expand BackendRef, which is already an expansion point. At first, it seems logical that since listeners are handling
the client-gateway certs, BackendRefs could handle the gateway-backend certs. However, when multiple Routes target
the same Service, there would be unnecessary copying of the BackendRef every time the Service was targeted. In addition,
there could be multiple BackendRefs with multiple rules on a Route, each of which might need the gateway-backend cert
configuration, so it is not the appropriate pattern.
2. Extend HTTPRoute to indicate TLS backend support. Extending HTTPRoute would interfere with deployed implementations
too much to be a practical solution.
3. Add a new type of Route for backend TLS. This is impractical because we might want to enable backend TLS on other
route types in the future, and because we might want to have both TLS listeners and backend TLS on a single route.

## Prior Art

Ref: [EgressMTLS](https://docs.nginx.com/nginx-ingress-controller/configuration/policy-resource/#egressmtls)
Ref: [IngressMTLS](https://docs.nginx.com/nginx-ingress-controller/configuration/policy-resource/#ingressmtls)

## Open Questions

This section records issues that warranted discussion in the API section before this GEP moves
out of `Provisional` status.

1. Bowei recommended that we mention the approach of cross-namespace referencing between Route and Service.
Be explicit about using the standard rules with respect to attaching policies to resources. This is mentioned in the
API section.
2. Costin recommended that Gateway SHOULD authenticate with either a JWT with audience or client cert
or some other means - so gateway added headers can be trusted, amongst other things. This is out of scope for this
proposal, which centers around application developer persona resources such as HTTPRoute and Service.
3. Costin mentioned we need to answer the question - is configuring the connection to a backend and TLS
something the route author decides - or the backend owner? Same for SAN (Subject Alternative Name) certificates.
The use of SAN certificates and the use of SNI, as a part of TLS, can be implementation-dependent, though the
application can still reject any request or certificate that it doesn’t support, including requests with SNI or a
certificate with SANs.
Contributor

"Application can reject certificates with SANs" doesn't make sense (unless its mTLS).

The application is the one with a certificate, it wouldn't reject its own certificate.

The client is verifying the SAN in the application's certificate.

Contributor Author

How about if I replace the last sentence with:

"The backend owner is the application developer, and the route owner will have to collaborate with the application developer to provide the appropriate configuration for TLS. The implementation would need to take the certificate provided by the application and verify that it satisfies the requirements of the route-as-client, including SAN information. Sometimes the backend owner and route owner are the same entity."


## References

* [Gateway API TLS Use Cases](https://docs.google.com/document/d/17sctu2uMJtHmJTGtBi_awGB0YzoCLodtR6rUNmKMCs8/edit#heading=h.cxuq8vo8pcxm)
* [Gateway API TLS guide (v1alpha2)](https://gateway-api.sigs.k8s.io/v1alpha2/guides/tls/)
* [NGINX Ingress Controller policy resource: EgressMTLS](https://docs.nginx.com/nginx-ingress-controller/configuration/policy-resource/#egressmtls)
* [SIG-NET[Gateway API]: TLS to the K8s.Service/Backend](https://docs.google.com/document/d/1RTYh2brg_vLX9o3pTcrWxtZSsf8Y5NQvIG52lpFcZlo)
* [What is the difference between SAN and SNI SSL certificates?](https://serverfault.com/questions/807959/what-is-the-difference-between-san-and-sni-ssl-certificates)
* [GEP-713: Metaresources and PolicyAttachment](https://gateway-api.sigs.k8s.io/geps/gep-713/)
* [Policy Attachment](https://gateway-api.sigs.k8s.io/references/policy-attachment/#direct-policy-attachment)
* [RFC 4346 TLS 1.1](https://datatracker.ietf.org/doc/html/rfc4346#appendix-A.5)
* [RFC 5246 TLS 1.2](https://datatracker.ietf.org/doc/html/rfc5246#appendix-A.5)
* [RFC 8446 TLS 1.3](https://datatracker.ietf.org/doc/html/rfc8446#appendix-B.4)