GEP-1282 Backend Properties - Update implementation #1430

candita · 2022-10-03T22:22:53Z

What type of PR is this?
/kind gep

What this PR does / why we need it:

Add proposed API implementation for Backend Properties.

Which issue(s) this PR fixes:

Does this PR introduce a user-facing change?:

NONE

k8s-ci-robot · 2022-10-03T22:23:02Z

Hi @candita. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

robscott · 2022-10-05T01:45:18Z

/ok-to-test

robscott

Thanks for the work on this @candita!

robscott · 2022-10-05T01:46:59Z

site-src/geps/gep-1282.md

+architectural choice is made because we would ideally like to locate this capabilities set within the Service but until
+that is possible in Kubernetes, we can start with a wrapper to the Service.
+
+BackendCapabilities contains a BackendCapabilitiesSpec object as mentioned in Option 2, defining each of the different


I may be missing something, but I think "Option 2" may be a holdover from the doc and not useful in this context.

robscott · 2022-10-05T01:48:22Z

site-src/geps/gep-1282.md

+that is possible in Kubernetes, we can start with a wrapper to the Service.
+
+BackendCapabilities contains a BackendCapabilitiesSpec object as mentioned in Option 2, defining each of the different
+capabilities.  In this discussion, the capabilities are TLSReencyrpt, Websocket, and Protocol, but additional


Suggested change

capabilities. In this discussion, the capabilities are TLSReencyrpt, Websocket, and Protocol, but additional

capabilities. In this GEP, the proposed capabilities are TLSReencyrpt, Websocket, and Protocol, but additional

robscott · 2022-10-05T01:50:06Z

site-src/geps/gep-1282.md

+    // +required
+    BackendType BackendTargetRef `json:”backendType”`
+
+	// Spec specifies the capabilities supported by this


Comment applies to full doc, the combination of markdown and go indentation is always painful, but in this case it looks like there's a mix of tabs and spaces. Mind updating this to use spaces (or tabs) everywhere?

robscott · 2022-10-05T01:52:52Z

site-src/geps/gep-1282.md

+    // BackendName is the name of the backend service to which these backend 
+    // capabilities apply.
+    // +required
+    BackendName string `json:”backendName”`
+
+    // BackendType is the type of backend to which these backend 
+    // capabilities apply.
+    //
+    // +required
+    BackendType BackendTargetRef `json:”backendType”`


I think these fields belong in spec. I'd recommend a single backendRef field that was a subset of our BackendObjectReference type. I know we don't want a namespace field, and we likely don't want a port field, but the rest of that object should be something we can copy verbatim here.

I made some changes. Lmk what you think.

robscott · 2022-10-05T01:53:37Z

site-src/geps/gep-1282.md

+	// Status represents the current state of the backend and its capabilities.
+    // +optional
+	Status *BackendCapabilitiesStatus `json:”status,omitempty”`


Maybe copy ReferenceGrant and don't include status until we know what it will look like?

I like how Gateway API has a status for nearly every object, and was hoping to use that from the beginning, even if there was just one condition listed.

What would be an example of a useful status condition that would apply to a BackendCapabilities and not to the HTTPRoute that targets the Service that the BackendCapabilities modifies?

robscott · 2022-10-05T01:58:19Z

site-src/geps/gep-1282.md

+	// +optional
+	Reencrypt *TLSReencryptType `json:”reencrypt,omitempty”`


I'd recommend starting with only this, since that seems to be the concept people are most interested in. As a broader note, I'd recommend trying to make these type definitions as close to final as possible, including as much detail in the comments as we have for other type definitions. For example, hostnames is quite verbose, but most fields have at least 2-3 sentences describing how we'd expect them to be implemented.

Also a tiny nit on naming - this is not necessarily reencryption. We really just want to provide configuration that enables the hop from the Gateway to this backend to be encrypted, regardless of if the initial hop was encrypted.

I see this was also discussed in #1285 (comment). To me reencrypt is quite misleading since, as said, it is just about TLS for backend/upstream hop. Personally, I would rather vote something neutral like tls, with a description that it can be used also for re-encrypt use case?

in https://gateway-api.sigs.k8s.io/guides/tls/?h=tls#clientserver-and-tls, since we call the first hop Terminate, how about calling the second hop Extend or Forward?

I hesitate to call it tls because we already use that in Gateway.spec.Listeners

gateway-api/apis/v1beta1/gateway_types.go

Line 207 in 96414f1

TLS *GatewayTLSConfig `json:"tls,omitempty"`

and documented in https://gateway-api.sigs.k8s.io/guides/tls/?h=tls#clientserver-and-tls

I used TLS, and updated some more of the godoc. Please take a look.

robscott · 2022-10-05T02:00:59Z

site-src/geps/gep-1282.md

+}
+
+type ReencryptType struct {
+    Version *TLSVersion


I think many backends support a range of TLS versions, it may be worth allowing room for that here. As a broader question, is this absolutely required? Could we live without it in a v1?

I also think having only TLSVersion is bit ambiguous. Usually I see something like MinTLSVersion, MaxTLSVersion and also a list of AllowedCipherSuites (and so far, ciphers have no effect for TLSv1.3).

Maybe for TLS clients, like we have here, these settings are bit "less visible" than for servers: servers immediately get caught when running TLS scanners towards them and users will request for configurability. In case of clients, the lack of configurability might go under the radar for a while. Regardless, having configurability at least for min/max would be very nice.

Good points. Will replace with min/max version. Optional, so if not specified then there is no check. If anyone feels strongly that we don't need it, we can remove it.

(and so far, ciphers have no effect for TLSv1.3).

@tsaarni, what do you mean by this?

robscott · 2022-10-05T02:02:39Z

site-src/geps/gep-1282.md

+
+type ReencryptType struct {
+    Version *TLSVersion
+    ClientCertFile string


Based on the discussion in the community meeting yesterday, this doesn't seem like the right place for this config. Maybe the Gateway is a better fit?

I was just commenting in old (closed) issue #622 (comment) that there might have been some confusion, but since I was not present at the meeting yesterday, I miss the context and maybe I might be just confused myself:

Since this GEP proposes encryption for the cluster-internal network hop from the gateway proxy to the backend, shouldn't the fields here be something like:

certificateRef that is a reference to Secret of type: kubernetes.io/tls, where the secret includes keys tls.crt and tls.key which are the client credentials that gateway should use when establishing TLS connection to said backend service

caCertificateRef that is a reference to Secret, where the secret includes key ca.crt which is a CA bundle for validating the backend server certificate.

Sidenote: there is no common API / agreement about the CA certificate key name. For example, certificate-manager can store it in type: kubernetes.io/tls with name ca.crt even though type: kubernetes.io/tls does not recognize key with that name.

I made some changes. Let me know what you think.

robscott · 2022-10-05T02:05:04Z

site-src/geps/gep-1282.md

+    PrivateKeyFile string
+    BackendCertFile string


What are these for? It may be helpful to see an e2e use case where all of these fields were used with an example of who populated each of these. As a broader note, these feel like something we should derive from Secret/Object refs, not file paths.

I'll make some changes on this, using SecretObjectReference from

gateway-api/apis/v1beta1/object_reference_types.go

Line 48 in 96414f1

type SecretObjectReference struct {

Suggested change

PrivateKeyFile string

BackendCertFile string

type ReencryptType struct {

// +optional

MinVersion *TLSVersion

// +optional

MaxVersion *TLSVersion

// +optional

ClientCerts []SecretObjectReference

// +optional

BackendCerts []SecretObjectReference

// +optional

CaCerts []SecretObjectReference

}

I changed it again.

robscott · 2022-10-05T02:05:42Z

site-src/geps/gep-1282.md

+    ClientCertFile string
+    PrivateKeyFile string
+    BackendCertFile string
+    CaCertsFile	  []string


Is this intended to be a list of CaCerts Files? Would it be sufficient to have a single CA Bundle referenced with an object ref?

Changed it to bundle.

tsaarni · 2022-10-05T06:01:13Z

site-src/geps/gep-1282.md

+backendCapabilities:
+  tls-reencrypt:
+    - name: version
+    value: 1.3
+    - name: clientCert
+    value: /path/to/cert.pem
+    - name: privateKey
+    value: /path/to/pkey.pem
+    - name: caCerts
+    value: /path/to/cacerts.pem
+    - name: backendCert
+    value: /path/to/backendCert.pem


Likely a strange question: what is the syntax used in the examples in this chapter? :-) I was rather expecting to see (partial) Kubernetes manifests matching the API / CRD proposal earlier in this document instead of general name/value records.

This is an Alternative option that was considered, a map of structs, offering the most minimal set of predefined fields. We will not likely choose this, but if anyone has a preference for Alternative 1, now would be the time to discuss that.

tsaarni · 2022-10-05T06:37:01Z

site-src/geps/gep-1282.md

+
+type ReencryptType struct {
+    Version *TLSVersion
+    ClientCertFile string


I was just commenting in old (closed) issue #622 (comment) that there might have been some confusion, but since I was not present at the meeting yesterday, I miss the context and maybe I might be just confused myself:

Since this GEP proposes encryption for the cluster-internal network hop from the gateway proxy to the backend, shouldn't the fields here be something like:

certificateRef that is a reference to Secret of type: kubernetes.io/tls, where the secret includes keys tls.crt and tls.key which are the client credentials that gateway should use when establishing TLS connection to said backend service

caCertificateRef that is a reference to Secret, where the secret includes key ca.crt which is a CA bundle for validating the backend server certificate.

Sidenote: there is no common API / agreement about the CA certificate key name. For example, certificate-manager can store it in type: kubernetes.io/tls with name ca.crt even though type: kubernetes.io/tls does not recognize key with that name.

tsaarni · 2022-10-05T06:50:49Z

site-src/geps/gep-1282.md

+    Protocol *ProtocolType `json:”protocol,omitempty”`
+}
+
+type ReencryptType struct {


Is there a need to add optional subjectName for server certificate validation? By default, HTTPS clients are expected to validate that the identity in server certificate matches with the server name in URI. Sometimes there might be a need to overwrite this behavior and compare the identity to a name that was provisioned manually, instead of using the backends DNS name.

I added subjectAlternativeName and serverNameIndication. One of these (or both) might provide what is needed.

I removed both and replaced with ServiceNames.

howardjohn · 2022-10-11T13:53:47Z

site-src/geps/gep-1282.md

+    // +optional    
+    MaxVersion   *TLSVersion  `json:"maxVersion,omitempty"`
+
+    // CipherSuite is the minimum cipher suite that the backend accepts.


There is no "minimum" cipher suite as far as I can tell; this should be a list?

Agreed, this should be []string so that it can store a list of ciphers in the usual format.

TLS 1.3 only accepts a subset of what 1.1/1.2 accept. I changed this to a list so the implementation can validate whatever cipher suites are included. However, if they were left empty, and there is no min or maxVersion, the implementation would need to determine which cipherSuites to use.

TLSv1.2 cipher suites and TLSv1.3 cipher suites don't map one-to-one. One option would be to have two separate fields for TLSv1.2 (and earlier) and for TLSv1.3. That's what HAProxy does:

http://cbonte.github.io/haproxy-dconv/2.6/configuration.html#3.1-ssl-default-bind-ciphers

http://cbonte.github.io/haproxy-dconv/2.6/configuration.html#3.1-ssl-default-bind-ciphersuites

howardjohn · 2022-10-11T13:54:03Z

site-src/geps/gep-1282.md

+    // GatewayCert is the certificate used by the Gateway to prove its identity to
+    // the backend, if the backend so requires.
+    // +optional   
+    GatewayCert  *SecretObjectReference `json:"gatewayCert,omitempty"`


This shouldn't be specific to gateway

Yes, this is more describing a client certificate, but I think that this one can probably be elided for now - the behavior we want is that the Gateway holds a client certificate that can be used with the CACert specified in this object.

howardjohn · 2022-10-11T13:54:45Z

site-src/geps/gep-1282.md

+
+    // SubjectAltNames allow multiple named domains to use a single certificate.
+    // +optional
+    SubjectAltNames []string `json:"subjectAltNames,omitempty"`


How does this match? Are prefix matches allowed?

If the certificate is for a wildcard domain like *.example.com then prefix matches like abc.example.com and def.example.com are allowed. Should I be specific in the godoc for this?

howardjohn · 2022-10-11T13:55:07Z

site-src/geps/gep-1282.md

+    // +optional   
+    GatewayCert  *SecretObjectReference `json:"gatewayCert,omitempty"`
+
+    // BackendCerts are the CA certificates used by the gateway to authenticate


I am confused how this differs from CACerts

I think this one is more intended to hold the serving certificate being used; it's probably not necessary.

howardjohn · 2022-10-11T13:55:23Z

site-src/geps/gep-1282.md

+    // BackendCerts are the CA certificates used by the gateway to authenticate
+    // with the backend. This certificate allows the Gateway to verify and trust the 
+    // backend certificate.
+    // +optional	


what happens if not set? We don't validate, or use system root certs?

For the CACerts key, I think that if that field is set, it should be the entire trust chain used for certificate authentication in both directions, and if not set, system root certs should be used. This will let users use Let's Encrypt certs from cert-manager or similar for their service if that's what they want.

howardjohn · 2022-10-11T13:58:16Z

site-src/geps/gep-1282.md

@@ -98,12 +106,297 @@ Of course, from the section above, you can see that there are many other feature

 ## API


I mentioned this on the call but want to re-iterate so its in writing: I think BackendCapabilities is the wrong abstraction for this API.

What BackendCapabilities says is "Here are the properties of the backend". This may be something like "Listens on TLS1.2 or TLS1.3 over HTTP/2" or "Supports HTTP/1.1 and Http/2", etc.

What we want to do, as implementors, is program a client to decide how to connect. With the information in this API, we don't have enough data to know how to program a client.

If a backend says they support HTTP/1.1 and HTTP/2, how do I know which one to send?

If a backend says it supports TLS, I have the same problem. As a gateway, I can either forward a request as is or add a layer of TLS. But in many cases, we won't actually know if the original request already is TLS encrypted, so we don't know whether we should add a layer of TLS or not.

Ultimately, this configuration is a consumer configuration; by treating it as a producer configuration we are not getting the functionality required to implement this effectively.

Note: you may look at DestinationRule and make the same argument, but actually it is a consumer configuration that allows the producer to set defaults for consumers; for reasons mentioned above, setting a default policy of "TLS encrypt" would almost certainly break users.

Thanks @howardjohn for writing this out, I understand the concern better.

My viewpoint, and I think @youngnick 's, is that by creating BackendProperties that reference a backend, the developer or admin is telling the Gateway how to interact with the backend. The intention is still to use BackendCapabilities as a wrapper or decorator for the backend.

I may miss some nuances here, but I thought what might happen is that the Gateway (acting in its client role) would lookup and use BackendCapabilites as its reference to the backend. For a HTTPRoute that is not a passthrough (assuming the Gateway knows which traffic is passthrough), if the Gateway finds a BackendCapabilitesSpec for the backend, then it uses that information to decide how to connect. If it doesn't find a spec, then it connects as it would have without knowing the backend capabilities.

Note I started with BackendTargetRef as a part of BackendCapabilities, not BackendCapabilitesSpec. Later I moved it into the spec by request from @robscott (#1430 (comment)). Does hiding the BackendTargetRef within BackendCapabilitesSpec make programming a client more difficult?

Is there any way to salvage the BackendCapabilies as an abstraction? We can definitely add more to make it work properly if it is still missing too much. Maybe we need to store more of the handshake information for TLS, there are a few pieces missing, like details on cipher suite, compression, curve, etc. And there are few pieces that still need tweaking.

@howardjohn, there are a few reasons why we're doing this with a separate resource:

To avoid adding fields to the backendRef that may not always make sense there

To avoid having to set fields individually for each backendRef - this cost increases for each time a backend is referenced in a Route config.

To make it so that this config is available to other things than HTTPRoute without reimplementing in every Route type

The basic idea is that there's an object, owned by the owner of the Service that tells consumers what they can do when they connect to the service. You're one hundred percent right that these are consumer rules, but the only person who knows what these rules should be is the person who sets up the Service - the Producer.. So this is a way for the service owner/producer to tell Service consumers about how to connect to the service, in a structured way.

It seems to me like having the Route/Gateway need to guess at these rules is not ideal, unless we assume that the Route owner always is the Service producer.

If that is the case, then we can easily do this by adding fields to backendRef. The tradeoff there is that now, everywhere that uses backendRef needs to have rules about what happens when you specify TLS settings, or websocket settings, or whatever. So if you use your backend in more than one backendRef, you now need to duplicate all these settings.

Is the problem just that calling this "backend properties/capabilities" is guiding towards wrong abstraction? For example, the description currently in the TLS fields says:

// The minimal TLSVersion that is accepted by the backend. ... // CACerts are the Certificate Authority, trusted by both Gateway and backend,

This wording makes sense if one would really describe capabilities of the backend. But since we are programming the client/gateway/proxy, speculating about the backend capabilities seems bit unexpected. We end up deriving client configuration from server capabilities?

What I was expecting to see was simply properties applied to Gateway's backend connection(s) when communicating towards specific backend(s). Something like "the minimum TLSVersion used by the Gateway when connecting to the backend" and "Certificate Authorities used by Gateway to verify backend server certificate". I'm not sure if the same holds to everything, but at least for backend TLS hop it seems so.

Edit: If one would use this same abstraction to configure cluster external properties as well (such as letting backend developer to provision external server certs #763), then "backend connection properties" would be misleading name as well.

The alternative is not to make every use in backendRef. The core relationship of consumer + producer rules, IMO, is "Producer sets defaults, can be overriden by consumer" (which is essentially what policy attachment is all about, as well). Its not necessarily either-or.

For a HTTPRoute that is not a passthrough (assuming the Gateway knows which traffic is passthrough), if the Gateway finds a BackendCapabilitesSpec for the backend, then it uses that information to decide how to connect.

Focusing overly on HTTPRoute (here and in other GEPs, we tend to) is not always the best path. What if the route is a TCPRoute? The contents are opaque - we cannot tell if its TLS or not.

Even if its a TLSRoute - you know it has a layer of TLS, but how many? How many does the backend expect? May sound crazy, but its not uncommon to have multiple layers of TLS at this point with transparent mTLS, things like onion routing, etc.

But most importantly - backendProperties, as proposed, are a property of the backend. That means there is no relationship to Gateway or Routes at all. These properties should be respected by any Kubernetes cluster that can support them, not just N/S traffic (mesh).

What I was expecting to see was simply properties applied to Gateway's backend connection(s) when communicating towards specific backend(s). Something like "the minimum TLSVersion used by the Gateway when connecting to the backend" and "Certificate Authorities used by Gateway to verify backend server certificate". I'm not sure if the same holds to everything, but at least for backend TLS hop it seems so.

Is that not what this proposal is doing? Is it just the name BackendCapabilities that is the sticking point? The idea here is to allow the owner of the service to say "When connecting to the thing that's running inside the pod. do these things" - have a minimum TLS version, use this CA cert and so on. If it's just that the word Capabilities is too overloaded, and we need to change the name, I don't think that's a big deal.

@tsaarni, in the quote you've given, I think it's equally correct to have that say:

// The minimal TLSVersion required to connect to the backend. ... // CACerts are the Certificate Authority, that the serving certificate for the backend service must be validated against.

Both of those properties are things that are set by the owner of the Service, but are needed by clients that connect to the Service. Similarly, the point of having websocket details in a resource like this is so that the owner of the Service can say "My service will accept websockets at the following paths", so that clients (like a Gateway proxy) know what to do for those paths.

Again, the point here is to take a whole bunch of configuration that, in the Ingress world, was often done via annotations on the Service, and make it structured. Ideally, we would put this inside the Service in structured fields, but we can't easily change the Service spec, so we've opted for a "decorator" style CRD.

If it's just that the name BackendCapabilities is creating the wrong abstraction, that's fine, but we need some guidance around what else to call it.

@howardjohn, I'm not quite sure I understand what you're asking for. Are you saying that this resource should be called BackendHTTPProperties or something, and only be relevant when it's used as a backendRef in a HTTPRoute? Or are you saying that we should define semantics for the other Route types?

It's clear that you disagree with the current approach, but I am not sure what actions we need to take to move towards something we can agree on.

Consumers need to be able to control the settings, not just the service producers.

I am definitely not suggesting making it tied to HTTP - my concern was the opposite. That this won't work for non-http clients because we don't know if we should encrypt or not. But the client knows which is why they need to be able to configure it

Okay, I agree that consumers also need a place for some settings like this. But don't we need to have the service producer's settings available at least, and then we can see what things make sense to allow in other places? I'll be honest, in my head I kind of thought that Policy Attachment could be used to set things from the Gateway side - so the producer can say "I accept TLS 1.2 or greater", and a Policy could say "All Routes attached to this Gateway will default to TLS 1.3" and then things would work. But without the bit where the producer says what they expect, then you can't even do simple cases, where there's no extra TLS wrapping etc.

If this needs explanation in the document as a place for future work, I think that's fine. But I don't know how we solve the basic problem of "My service is doing its own TLS, and proxy connections from a Gateway won't work unless it knows about it" without something like what we have here.

I agree with you but I don't think the GEP behaves that way? Or I read it wrong

I don't think we have anything in the GEP right now about Policy Attachment and consumer settings, but that's probably worth adding.

I'll review the GEP again and see if I can see any other changes, and then come back here.

youngnick

I spent some time thinking about this proposal some more after some back-and-forth in the comments.

I think it's important that we focus the fields in on "the things a client will need to know to connect to the wrapped service". Since a Gateway is a client, that will enable the Gateway to do reencrypt, and also allow other uses of this object.

This pattern - that the BackendCapabilities describes the information you need to connect to the backend is what I had in mind when I wrote up GEP-1282 originally. I think in the discussions, the original intent has got a bit muddled up - this is on me for not being clearer to everyone, sorry.

It could be that BackendCapabilites should really be called BackendConnectionInfo or the more simple BackendProperties, since it seems that Capabilities is a slightly overloaded term. I'd like to discuss this some more in the next community meeting.

For the actual spec though, the good news is that it's only a limited set of things that we need to put in the struct to describe the information you need to know to connect to a TLS backend.

MinVersion and MaxVersion are great and should stay.

Then, we just need:

CipherSuites as a list of strings, so that we can store it as a list of cipher names.
CACerts - you have this as a list, which I think is fine, but it could also be a single secret containing a PEM bundle. (That's how other Secret references in Gateway API work, if you want to put multiple certs in them, just use a PEM bundle).
SubjectAltNames - I think we should maybe leave this out for the first cut. If we do include it, we should call it ServiceNames and specify that it contains a list of all the names the service will respond to (that is, that it has certificates for.) That way, it can be used by a Gateway implementation to check if the Listener and/or Route hostnames match the backend. That does mean it will be a three-way hostname match, which could get pretty complicated.
I think we should leave out the other fields - there's no need to supply the serving cert (in the BackendCert field currently), that gets sent as part of the handshake, and the SNI is covered by the ServiceNames field if we put it in. The GatewayCert is configuration that should live somewhere on the Gateway - it's up to the Gateway owner and the implementation to wire a client certificate that matches the CACert to the Gateway's data plane, so that requests can be made with a correct client certificate. I also don't think that we should add in client SNI checking yet, this can be added later if we think that this model will work.

youngnick · 2022-10-14T04:15:54Z

site-src/geps/gep-1282.md

+    // +optional    
+    MaxVersion   *TLSVersion  `json:"maxVersion,omitempty"`
+
+    // CipherSuite is the minimum cipher suite that the backend accepts.


Agreed, this should be []string so that it can store a list of ciphers in the usual format.

youngnick · 2022-10-14T04:18:13Z

site-src/geps/gep-1282.md

+    // +optional   
+    GatewayCert  *SecretObjectReference `json:"gatewayCert,omitempty"`
+
+    // BackendCerts are the CA certificates used by the gateway to authenticate


I think this one is more intended to hold the serving certificate being used; it's probably not necessary.

youngnick · 2022-10-14T04:18:15Z

site-src/geps/gep-1282.md

+    // GatewayCert is the certificate used by the Gateway to prove its identity to
+    // the backend, if the backend so requires.
+    // +optional   
+    GatewayCert  *SecretObjectReference `json:"gatewayCert,omitempty"`


Yes, this is more describing a client certificate, but I think that this one can probably be elided for now - the behavior we want is that the Gateway holds a client certificate that can be used with the CACert specified in this object.

youngnick · 2022-10-14T04:19:29Z

site-src/geps/gep-1282.md

+    // BackendCerts are the CA certificates used by the gateway to authenticate
+    // with the backend. This certificate allows the Gateway to verify and trust the 
+    // backend certificate.
+    // +optional	


For the CACerts key, I think that if that field is set, it should be the entire trust chain used for certificate authentication in both directions, and if not set, system root certs should be used. This will let users use Let's Encrypt certs from cert-manager or similar for their service if that's what they want.

youngnick · 2022-10-14T04:45:00Z

Oh, lastly, I think that we probably need a bit of discussion about how this will interact with Policy Attachment.

In short, I think it should be something like:

BackendCapabilities sets what the service producer allows/wants for connections.
Policy Attachment may further default or limit the set of connection properties at a Route, Gateway, or GatewayClass level, using the defaults and overrides functionality.

I can explain this further in a meeting after I've thought it through some more.

tsaarni · 2022-10-14T07:20:08Z

May I ask one clarification for the GEP update? In the GEP is said that the initial list of properties to add includes "TLS information for connection from the Gateway to the backend service" which I thought would include client credentials. However, I see from comments from @howardjohn and @youngnick that this is not the case. @youngnick writes:

The GatewayCert is configuration that should live somewhere on the Gateway - it's up to the Gateway owner and the implementation to wire a client certificate that matches the CACert to the Gateway's data plane, so that requests can be made with a correct client certificate.

What I think is meant here is that Cluster Operator who creates Gateways would provision the client certificate. Cluster Administrator then gives (via some out-of-band communication mechanism) the CA certificate to the Application Developer (service provider) who needs to configure that to their backend to allow it to validate Gateway's client certificate. Is that the expected workflow?

My question is coming from the "self-service" perspective: There is a likelihood that the Application Developer (service provider) would like to provision their own client certificate for Gateway, without needing to coordinate this with the Cluster Administrator. Of course, I'm not sure what @candita's expectation was for the workflow?

Also, if it is Cluster Administrator who provisions the client certificate for Gateway, I'm wondering why Application Developer is in control of TLS versions that Gateway uses, or cipher suites. What drives the decision which TLS properties are managed by which persona? All these parameters go into client's SSL context.

youngnick · 2022-10-17T05:02:34Z

I think that a part that is confusing here is that it seems likely that we'll need to have two places where backend config can live.

Firstly, the config for the service itself. This is the service saying "I will accept TLS connections that look like this". This includes TLS version, the CA cert used for the communication, and so on. I think that these are a good fit for a BackendCapabilities/BackendProperties object.

Secondly, the config used to connect to the service. Depending on how the Gateway implementation is interacting with the services, it could be that the implementation owner wants the Gateway to have one client certificate, or many. It could also be that, as @tsaarni says, that the application owner wants every client to have their own, individual client certificate.

For the case that the Gateway implementation has one client certificate across all client connections, that's a implementation-level config.

For the case that the Gateway implementation needs individual client certificates for each backend, I think the right place is actually on the HTTPRoute, since that specifies how the Gateway impl connects to the backend.

In this model, the BackendCapabilities is the Service resource owner saying what the serving properties are (what TLS versions may possibly be used, what the serving cert CA is, and for cases like websockets, what the paths that support websockets are), and the HTTPRoute would hold the client properties (what version should actually be used by the Gateway impl to connect, what client certificate should be used to connect, and so on).

The difference between the two is that the BackendCapabilities information is (at least theoretically) useful to other things than a Gateway, and the HTTPRoute information is needed by things that want to connect.

I'm kind of making this up as I go along - this is a classic case that @candita had the very reasonable requirement that she wanted to be able to describe how to do a backend TLS connection in the API at all, and we've ended up stuck on definitional and semantic issues about where config belongs.

candita · 2022-10-17T21:35:54Z

I think that a part that is confusing here is that it seems likely that we'll need to have two places where backend config can live.

To me, that is a clear indication that the original design is wrong. We wanted to make sure that whatever we designed would be a wrapper to a Service, because we would really like to keep all of this information aligned with the Service itself.

Firstly, the config for the service itself. This is the service saying "I will accept TLS connections that look like this". This includes TLS version, the CA cert used for the communication, and so on. I think that these are a good fit for a BackendCapabilities/BackendProperties object.

Secondly, the config used to connect to the service. Depending on how the Gateway implementation is interacting with the services, it could be that the implementation owner wants the Gateway to have one client certificate, or many. It could also be that, as @tsaarni says, that the application owner wants every client to have their own, individual client certificate.

For the case that the Gateway implementation has one client certificate across all client connections, that's a implementation-level config.

Yes, we could call that the Gateway's default certificate, used by all clients when nothing else was specified.
We could also have a global default certificate, used by all clients that don't even have a Gateway default certificate.

For the case that the Gateway implementation needs individual client certificates for each backend, I think the right place is actually on the HTTPRoute, since that specifies how the Gateway impl connects to the backend.

No, we want this to apply to more than just HTTPRoute. Unless you want to put backend capabilities on each route type. Again, going back to the model of "wrapper on Service", this just doesn't fit in.

I see this is on the agenda tonight for discussion, so perhaps we can hash it out there.

…nt will need to know to connect to the wrapped service

k8s-ci-robot · 2022-10-17T21:39:56Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: candita
Once this PR has been reviewed and has the lgtm label, please assign hbagdi for approval by writing /assign @hbagdi in a comment. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

costinm · 2022-10-17T23:04:49Z

having a large object with all the possible properties of a backend seems very dangerous, will be hard to evolve it
for TLS1.3 - I don't think it's a common use case for a user to go to each backend and specify the settings. Usually TLS1.3 is required for compliance or because the gateway admin requires 1.3 - and that would be set in the gateway settings.
I'm concerned with client cert specification - it is pretty tricky for the producer to tell the consumer what to use. I would expect any mTLS from gateway to backend to default to the workload identity or be configured on the client side.

costinm · 2022-10-17T23:06:07Z

site-src/geps/gep-1282.md

@@ -90,6 +97,7 @@ We’re looking to add specific, structured extension points somewhere in the re
 Those specific, structured extension points need to be in a place where they can be owned by the person who owns the backend, since that could be different to the person who owns the Route. That is, whatever we choose should be something that extends a thing that is referred to by a `backendRef` in a Route, not inside the Route.

 The initial list includes, but is not limited to:
+
 * TLS information for connection from the Gateway to the backend service. Note that this doesn’t include any information used for service mesh encryption, just what a Gateway’s proxy would need to be able to connect to the backend service.
 * Websocket protocol information for the backend service.


I don't think mixing TLS info with protocol info is ideal. TLS is pretty advanced and scary for many users.
Also in many cases this is negotiated at protocol level, not config ( ALPN, etc)

costinm · 2022-10-17T23:09:43Z

site-src/geps/gep-1282.md

+
+    // Kind is kind of the target resource.
+    // Valid values include:
+    // * "Service" (the default if no Kind is specified)


How can it be a Service if it's a backend/workload capability ? Deployment/ReplicaSet/etc - or pod label selector maybe ?

If you want this to be a Service extension - call it ServiceProperties or ServiceTLS, etc.

At least in Istio we make a very big distinction between workloads - pods selected by label - and Services. I think many users treat Service as a frontend - and the selected Pods as backend.

This is absolutely the Service being used as a backend, as a shortcut for a selector. That's why we called it _Backend_Capabilities, not just _Service_Capabilities. Remember, a Service is both the frontend (what you spend most of your time thinking of it as, because that's what Istio needs), and the backend, a bucket of workload endpoints (which is how people who aren't doing service mesh tend to see it more).

Having this reference a Service resource is a way avoid having to store the workload selector in two places, as well as a way to meet people who are using Ingress controllers for north-south traffic more than they are Services for east-west traffic.

costinm · 2022-10-17T23:14:30Z

site-src/geps/gep-1282.md

+type EncryptType struct {
+    // The minimal TLSVersion that is accepted by the endpoint.
+    // If empty, there is no minimal backend TLSVersion.
+    // +optional	


I don't know if this is something each individual service owner should mess with and configure individually.

TLS compliance is an admin choice, may be configured on the Gateway or on cluster or implementation-specific config. In most cases we want to prevent service owners to downgrade security or break the compliance requirements by missconfiguration.

This is even a bigger problem for CipherSuites - do we expect each individual backend owner to understand what cipher suites are safe and configure them ? This is something very few people know or should mess with.

I think maybe there's some confusion - I don't think I've ever intended this to be the only way that TLS will be enabled to backends. This construct is to cover when the service owner wants to manage their own TLS, otherwise the Gateway would be able to contact the service directly. (I also include in that case the mesh-style magic infrastructure that will transparently encrypt connections for you).

Right now, there is no way to represent the (pretty common) case of someone saying "my service is running its own TLS, I need it exposed via a Gateway". That's the primary use case we're trying to solve here - the extra applicatibility is useful but not essential.

I guess another way to say this is - regardless of what service owners should be doing, they absolutely do want to do what this object is providing, already.

costinm · 2022-10-17T23:22:29Z

site-src/geps/gep-1282.md

+    // certificate chain of trust.  If empty, the system root certs should be used.
+    //
+    // +optional	
+    CACerts      *SecretObjectReference `json:"caCerts,omitempty"`


That's the biggest problem - we can't have a SecretObjectReference crossing a namespace boundary. We had CVEs related to this, it's not good.

Also not sure what it means 'trusted by both' - I suspect what you want to define here is the root CA in case the backend is using a private root. SPKI or the public key or cert - inline, like it is used in kubeconfig - may work.
However I don't know if this API is the proper place to define root key managed and distribution - there is a working
group I think working on certs and security.

I don't disagree that supporting serlf-signed or private root is important - just that PKI config is probably better as a separate concern, not part of backend properties - and can't be based on cross namespace Secret references ( or expectation that each namespace has a secret with that name).

This Secret reference is not for some other service to connect to this one, it's for the Gateway controller (which already needs cluster-wide access to secrets to be able to do its job). The main, relevant client here is the Gateway controller.

And even if users do need to do a cross-namespace Secret reference, that's exactly what ReferenceGrant is for, to enable the controlled use of Secrets or other resources across namespaces. It seems reasonable to assume that, for self-signed certs, a CA cert might live in its own namespace, with a ReferenceGrant then being required to allow access to it. Or, of course, a controller could just copy that into relevant namespaces too.

Lastly, this is a CA certificate chain, not a key, so it's intended to be public. We could use another object (maybe a configmap?) to store it if necessary.

costinm · 2022-10-17T23:24:56Z

site-src/geps/gep-1282.md

+    // ServiceNames provide multiple named domains which will share a certificate
+    // for the endpoint.
+    // +optional
+    ServiceNames []string `json:"serviceNames,omitempty"`


Why ? How does it help the caller to know ahead of time the list that will be returned in the cert anyways ?

You mean 'SANNames' - for supporting spiffee or equivalent ? That may be useful.
Do you mean this as an override of the normal TLS server verification - where gateway talk with foo.ns.svc.cluster.local but get a certificate for example.com - I think it's important to be very clear with this, security is tricky.

In this case, it's to help the Gateway implementation so that it can determine configuration validity before allowing any connections at all. I'm so-so on this one, because using it makes the hostname match a three-part one (Listener, HTTPRoute, BackendCapabilities), rather than a two-part one, which is already hard to do. The upside is that it means that implementations can easily check and make clear at a configuration level when something is not going to work, rather than waiting to have client requests fail and needing to troubleshoot that.

youngnick · 2022-10-21T05:16:37Z

having a large object with all the possible properties of a backend seems very dangerous, will be hard to evolve it

This is a fair point, but the other way to do it - having a separate resource for each type of config - also has problems.

Let's review:

Have one large object:
Pros:

The "add a nil-able struct for each config type" pattern lets each type of information be reasonably orthogonal to the others, and allows for straightforward expansion.
Having only one resource to watch means there's less code churn for Gateway implementations.
Having only one extra wrapper object to look for makes it easier for users to know where to look.
Cons:
Object could become unusably large.

Have more, smaller objects:
Pros:

Each object is tightly contained, and has a smaller failure domain.
Cons:
Each extra resource to add is another round of things for each implementation to add.
Each extra resource is another kubectl command for users looking for info as to what went wrong.

Personally, I'd prefer to stick with the single large object for a while, because I think the implementation code churn and user experience costs will rapidly increase over time.

for TLS1.3 - I don't think it's a common use case for a user to go to each backend and specify the settings. Usually TLS1.3 is required for compliance or because the gateway admin requires 1.3 - and that would be set in the gateway settings.

Again, the idea here is to make it so that for a cluster that requires TLS 1.3, but doesn't provide infrastructure to do it automatically, it's possible to use a Gateway implementation to both connect and to enforce those restrictions.

I'm concerned with client cert specification - it is pretty tricky for the producer to tell the consumer what to use. I would expect any mTLS from gateway to backend to default to the workload identity or be configured on the client side.

The idea here is to enable clients to give a Gateway some details about what to use to connect. Maybe we don't have it supply a certificate with a keypair, maybe we do. Having a magic keypair that allows you to connect to a service available in the same namespace as the service seems like a security problem, but the other problem is that, if you do want each client of the backend service to use an individual certificate, how do you tell the Gateway which one to use for this backend? It could be that this is the point at which we throw up our hands and say "if you want that, you want a service mesh, SPIFFE and SPIRE, or some other more advanced thing" though.

youngnick · 2022-11-15T01:12:21Z

Okay, after discussion in the community meeting today, we're working on collecting TLS use cases, along with owners etc, in this new document:
https://docs.google.com/document/d/17sctu2uMJtHmJTGtBi_awGB0YzoCLodtR6rUNmKMCs8/edit?usp=sharing

/hold
this until we figure out what the use cases mean.

candita · 2023-01-24T20:35:10Z

In yesterday's meeting we decided to officially drop GEP-1282.

GEP-1282 Backend Properties - Update implementation

b327b15

k8s-ci-robot added kind/gep PRs related to Gateway Enhancement Proposal(GEP) cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Oct 3, 2022

k8s-ci-robot requested review from howardjohn and youngnick October 3, 2022 22:23

k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Oct 3, 2022

k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Oct 3, 2022

This was referenced Oct 4, 2022

Provide a way to configure TLS from a Gateway to Backends #1067

Closed

Document Self Service TLS Certificate Approaches #763

Closed

Allow application developer to specify TLS client certificate #622

Closed

k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Oct 5, 2022

robscott reviewed Oct 5, 2022

View reviewed changes

tsaarni reviewed Oct 5, 2022

View reviewed changes

candita added 2 commits October 10, 2022 17:59

101022 First round of changes from feedback

67366d5

101022 Second round of changes from community meeting feedback

db7b3f3

howardjohn reviewed Oct 11, 2022

View reviewed changes

youngnick requested changes Oct 14, 2022

View reviewed changes

GEP-1282 Backend Properties -focus the fields in on the things a clie…

07bf512

…nt will need to know to connect to the wrapped service

costinm reviewed Oct 17, 2022

View reviewed changes

k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 15, 2022

candita closed this Jan 24, 2023

candita mentioned this pull request Sep 22, 2023

REQUEST: New membership for candita kubernetes/org#4477

Closed

9 tasks

	capabilities. In this discussion, the capabilities are TLSReencyrpt, Websocket, and Protocol, but additional
	capabilities. In this GEP, the proposed capabilities are TLSReencyrpt, Websocket, and Protocol, but additional

		// +optional
		Reencrypt *TLSReencryptType `json:”reencrypt,omitempty”`

		@@ -98,12 +106,297 @@ Of course, from the section above, you can see that there are many other feature

		## API

GEP-1282 Backend Properties - Update implementation #1430

GEP-1282 Backend Properties - Update implementation #1430

Conversation

candita commented Oct 3, 2022

k8s-ci-robot commented Oct 3, 2022

robscott commented Oct 5, 2022

robscott left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tsaarni Oct 5, 2022 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tsaarni Oct 5, 2022 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tsaarni Oct 13, 2022 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

howardjohn Oct 14, 2022 • edited

Choose a reason for hiding this comment

youngnick Oct 14, 2022 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

youngnick left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

youngnick commented Oct 14, 2022

tsaarni commented Oct 14, 2022

youngnick commented Oct 17, 2022

candita commented Oct 17, 2022

tsaarni Oct 5, 2022 •

edited

tsaarni Oct 5, 2022 •

edited

tsaarni Oct 13, 2022 •

edited

howardjohn Oct 14, 2022 •

edited

youngnick Oct 14, 2022 •

edited