Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GEP-1282 Backend Properties - Update implementation #1430

Closed

Conversation

candita
Copy link
Contributor

@candita candita commented Oct 3, 2022

What type of PR is this?
/kind gep

What this PR does / why we need it:

Add proposed API implementation for Backend Properties.

Which issue(s) this PR fixes:

Does this PR introduce a user-facing change?:

NONE

@k8s-ci-robot k8s-ci-robot added kind/gep PRs related to Gateway Enhancement Proposal(GEP) cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Oct 3, 2022
@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Oct 3, 2022
@k8s-ci-robot
Copy link
Contributor

Hi @candita. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@robscott
Copy link
Member

robscott commented Oct 5, 2022

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Oct 5, 2022
Copy link
Member

@robscott robscott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the work on this @candita!

architectural choice is made because we would ideally like to locate this capabilities set within the Service but until
that is possible in Kubernetes, we can start with a wrapper to the Service.

BackendCapabilities contains a BackendCapabilitiesSpec object as mentioned in Option 2, defining each of the different
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be missing something, but I think "Option 2" may be a holdover from the doc and not useful in this context.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed

that is possible in Kubernetes, we can start with a wrapper to the Service.

BackendCapabilities contains a BackendCapabilitiesSpec object as mentioned in Option 2, defining each of the different
capabilities. In this discussion, the capabilities are TLSReencyrpt, Websocket, and Protocol, but additional
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
capabilities. In this discussion, the capabilities are TLSReencyrpt, Websocket, and Protocol, but additional
capabilities. In this GEP, the proposed capabilities are TLSReencyrpt, Websocket, and Protocol, but additional

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏼

// +required
BackendType BackendTargetRef `json:”backendType”`

// Spec specifies the capabilities supported by this
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment applies to full doc, the combination of markdown and go indentation is always painful, but in this case it looks like there's a mix of tabs and spaces. Mind updating this to use spaces (or tabs) everywhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏼

Comment on lines 118 to 127
// BackendName is the name of the backend service to which these backend
// capabilities apply.
// +required
BackendName string `json:”backendName”`

// BackendType is the type of backend to which these backend
// capabilities apply.
//
// +required
BackendType BackendTargetRef `json:”backendType”`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these fields belong in spec. I'd recommend a single backendRef field that was a subset of our BackendObjectReference type. I know we don't want a namespace field, and we likely don't want a port field, but the rest of that object should be something we can copy verbatim here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made some changes. Lmk what you think.

Comment on lines 142 to 144
// Status represents the current state of the backend and its capabilities.
// +optional
Status *BackendCapabilitiesStatus `json:”status,omitempty”`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe copy ReferenceGrant and don't include status until we know what it will look like?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like how Gateway API has a status for nearly every object, and was hoping to use that from the beginning, even if there was just one condition listed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would be an example of a useful status condition that would apply to a BackendCapabilities and not to the HTTPRoute that targets the Service that the BackendCapabilities modifies?

Comment on lines 184 to 185
// +optional
Reencrypt *TLSReencryptType `json:”reencrypt,omitempty”`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd recommend starting with only this, since that seems to be the concept people are most interested in. As a broader note, I'd recommend trying to make these type definitions as close to final as possible, including as much detail in the comments as we have for other type definitions. For example, hostnames is quite verbose, but most fields have at least 2-3 sentences describing how we'd expect them to be implemented.

Also a tiny nit on naming - this is not necessarily reencryption. We really just want to provide configuration that enables the hop from the Gateway to this backend to be encrypted, regardless of if the initial hop was encrypted.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see this was also discussed in #1285 (comment). To me reencrypt is quite misleading since, as said, it is just about TLS for backend/upstream hop. Personally, I would rather vote something neutral like tls, with a description that it can be used also for re-encrypt use case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in https://gateway-api.sigs.k8s.io/guides/tls/?h=tls#clientserver-and-tls, since we call the first hop Terminate, how about calling the second hop Extend or Forward?

I hesitate to call it tls because we already use that in Gateway.spec.Listeners

TLS *GatewayTLSConfig `json:"tls,omitempty"`
and documented in https://gateway-api.sigs.k8s.io/guides/tls/?h=tls#clientserver-and-tls

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used TLS, and updated some more of the godoc. Please take a look.

}

type ReencryptType struct {
Version *TLSVersion
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think many backends support a range of TLS versions, it may be worth allowing room for that here. As a broader question, is this absolutely required? Could we live without it in a v1?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think having only TLSVersion is bit ambiguous. Usually I see something like MinTLSVersion, MaxTLSVersion and also a list of AllowedCipherSuites (and so far, ciphers have no effect for TLSv1.3).

Maybe for TLS clients, like we have here, these settings are bit "less visible" than for servers: servers immediately get caught when running TLS scanners towards them and users will request for configurability. In case of clients, the lack of configurability might go under the radar for a while. Regardless, having configurability at least for min/max would be very nice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good points. Will replace with min/max version. Optional, so if not specified then there is no check. If anyone feels strongly that we don't need it, we can remove it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(and so far, ciphers have no effect for TLSv1.3).

@tsaarni, what do you mean by this?


type ReencryptType struct {
Version *TLSVersion
ClientCertFile string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the discussion in the community meeting yesterday, this doesn't seem like the right place for this config. Maybe the Gateway is a better fit?

Copy link

@tsaarni tsaarni Oct 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was just commenting in old (closed) issue #622 (comment) that there might have been some confusion, but since I was not present at the meeting yesterday, I miss the context and maybe I might be just confused myself:

Since this GEP proposes encryption for the cluster-internal network hop from the gateway proxy to the backend, shouldn't the fields here be something like:

  • certificateRef that is a reference to Secret of type: kubernetes.io/tls, where the secret includes keys tls.crt and tls.key which are the client credentials that gateway should use when establishing TLS connection to said backend service
  • caCertificateRef that is a reference to Secret, where the secret includes key ca.crt which is a CA bundle for validating the backend server certificate.

Sidenote: there is no common API / agreement about the CA certificate key name. For example, certificate-manager can store it in type: kubernetes.io/tls with name ca.crt even though type: kubernetes.io/tls does not recognize key with that name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made some changes. Let me know what you think.

Comment on lines 197 to 198
PrivateKeyFile string
BackendCertFile string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are these for? It may be helpful to see an e2e use case where all of these fields were used with an example of who populated each of these. As a broader note, these feel like something we should derive from Secret/Object refs, not file paths.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll make some changes on this, using SecretObjectReference from

type SecretObjectReference struct {

Suggested change
PrivateKeyFile string
BackendCertFile string
type ReencryptType struct {
// +optional
MinVersion *TLSVersion
// +optional
MaxVersion *TLSVersion
// +optional
ClientCerts []SecretObjectReference
// +optional
BackendCerts []SecretObjectReference
// +optional
CaCerts []SecretObjectReference
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it again.

ClientCertFile string
PrivateKeyFile string
BackendCertFile string
CaCertsFile []string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this intended to be a list of CaCerts Files? Would it be sufficient to have a single CA Bundle referenced with an object ref?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed it to bundle.

Comment on lines 363 to 374
backendCapabilities:
tls-reencrypt:
- name: version
value: 1.3
- name: clientCert
value: /path/to/cert.pem
- name: privateKey
value: /path/to/pkey.pem
- name: caCerts
value: /path/to/cacerts.pem
- name: backendCert
value: /path/to/backendCert.pem
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likely a strange question: what is the syntax used in the examples in this chapter? :-) I was rather expecting to see (partial) Kubernetes manifests matching the API / CRD proposal earlier in this document instead of general name/value records.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an Alternative option that was considered, a map of structs, offering the most minimal set of predefined fields. We will not likely choose this, but if anyone has a preference for Alternative 1, now would be the time to discuss that.


type ReencryptType struct {
Version *TLSVersion
ClientCertFile string
Copy link

@tsaarni tsaarni Oct 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was just commenting in old (closed) issue #622 (comment) that there might have been some confusion, but since I was not present at the meeting yesterday, I miss the context and maybe I might be just confused myself:

Since this GEP proposes encryption for the cluster-internal network hop from the gateway proxy to the backend, shouldn't the fields here be something like:

  • certificateRef that is a reference to Secret of type: kubernetes.io/tls, where the secret includes keys tls.crt and tls.key which are the client credentials that gateway should use when establishing TLS connection to said backend service
  • caCertificateRef that is a reference to Secret, where the secret includes key ca.crt which is a CA bundle for validating the backend server certificate.

Sidenote: there is no common API / agreement about the CA certificate key name. For example, certificate-manager can store it in type: kubernetes.io/tls with name ca.crt even though type: kubernetes.io/tls does not recognize key with that name.

Protocol *ProtocolType `json:”protocol,omitempty”`
}

type ReencryptType struct {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a need to add optional subjectName for server certificate validation? By default, HTTPS clients are expected to validate that the identity in server certificate matches with the server name in URI. Sometimes there might be a need to overwrite this behavior and compare the identity to a name that was provisioned manually, instead of using the backends DNS name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added subjectAlternativeName and serverNameIndication. One of these (or both) might provide what is needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed both and replaced with ServiceNames.

// +optional
MaxVersion *TLSVersion `json:"maxVersion,omitempty"`

// CipherSuite is the minimum cipher suite that the backend accepts.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no "minimum" cipher suite as far as I can tell; this should be a list?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, this should be []string so that it can store a list of ciphers in the usual format.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TLS 1.3 only accepts a subset of what 1.1/1.2 accept. I changed this to a list so the implementation can validate whatever cipher suites are included. However, if they were left empty, and there is no min or maxVersion, the implementation would need to determine which cipherSuites to use.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TLSv1.2 cipher suites and TLSv1.3 cipher suites don't map one-to-one. One option would be to have two separate fields for TLSv1.2 (and earlier) and for TLSv1.3. That's what HAProxy does:

// GatewayCert is the certificate used by the Gateway to prove its identity to
// the backend, if the backend so requires.
// +optional
GatewayCert *SecretObjectReference `json:"gatewayCert,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't be specific to gateway

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is more describing a client certificate, but I think that this one can probably be elided for now - the behavior we want is that the Gateway holds a client certificate that can be used with the CACert specified in this object.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.


// SubjectAltNames allow multiple named domains to use a single certificate.
// +optional
SubjectAltNames []string `json:"subjectAltNames,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this match? Are prefix matches allowed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the certificate is for a wildcard domain like *.example.com then prefix matches like abc.example.com and def.example.com are allowed. Should I be specific in the godoc for this?

// +optional
GatewayCert *SecretObjectReference `json:"gatewayCert,omitempty"`

// BackendCerts are the CA certificates used by the gateway to authenticate
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am confused how this differs from CACerts

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this one is more intended to hold the serving certificate being used; it's probably not necessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

// BackendCerts are the CA certificates used by the gateway to authenticate
// with the backend. This certificate allows the Gateway to verify and trust the
// backend certificate.
// +optional
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what happens if not set? We don't validate, or use system root certs?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the CACerts key, I think that if that field is set, it should be the entire trust chain used for certificate authentication in both directions, and if not set, system root certs should be used. This will let users use Let's Encrypt certs from cert-manager or similar for their service if that's what they want.

@@ -98,12 +106,297 @@ Of course, from the section above, you can see that there are many other feature

## API
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mentioned this on the call but want to re-iterate so its in writing: I think BackendCapabilities is the wrong abstraction for this API.

What BackendCapabilities says is "Here are the properties of the backend". This may be something like "Listens on TLS1.2 or TLS1.3 over HTTP/2" or "Supports HTTP/1.1 and Http/2", etc.

What we want to do, as implementors, is program a client to decide how to connect. With the information in this API, we don't have enough data to know how to program a client.

If a backend says they support HTTP/1.1 and HTTP/2, how do I know which one to send?

If a backend says it supports TLS, I have the same problem. As a gateway, I can either forward a request as is or add a layer of TLS. But in many cases, we won't actually know if the original request already is TLS encrypted, so we don't know whether we should add a layer of TLS or not.

Ultimately, this configuration is a consumer configuration; by treating it as a producer configuration we are not getting the functionality required to implement this effectively.

Note: you may look at DestinationRule and make the same argument, but actually it is a consumer configuration that allows the producer to set defaults for consumers; for reasons mentioned above, setting a default policy of "TLS encrypt" would almost certainly break users.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @howardjohn for writing this out, I understand the concern better.

My viewpoint, and I think @youngnick 's, is that by creating BackendProperties that reference a backend, the developer or admin is telling the Gateway how to interact with the backend. The intention is still to use BackendCapabilities as a wrapper or decorator for the backend.

I may miss some nuances here, but I thought what might happen is that the Gateway (acting in its client role) would lookup and use BackendCapabilites as its reference to the backend. For a HTTPRoute that is not a passthrough (assuming the Gateway knows which traffic is passthrough), if the Gateway finds a BackendCapabilitesSpec for the backend, then it uses that information to decide how to connect. If it doesn't find a spec, then it connects as it would have without knowing the backend capabilities.

Note I started with BackendTargetRef as a part of BackendCapabilities, not BackendCapabilitesSpec. Later I moved it into the spec by request from @robscott (#1430 (comment)). Does hiding the BackendTargetRef within BackendCapabilitesSpec make programming a client more difficult?

Is there any way to salvage the BackendCapabilies as an abstraction? We can definitely add more to make it work properly if it is still missing too much. Maybe we need to store more of the handshake information for TLS, there are a few pieces missing, like details on cipher suite, compression, curve, etc. And there are few pieces that still need tweaking.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@howardjohn, there are a few reasons why we're doing this with a separate resource:

  • To avoid adding fields to the backendRef that may not always make sense there
  • To avoid having to set fields individually for each backendRef - this cost increases for each time a backend is referenced in a Route config.
  • To make it so that this config is available to other things than HTTPRoute without reimplementing in every Route type

The basic idea is that there's an object, owned by the owner of the Service that tells consumers what they can do when they connect to the service. You're one hundred percent right that these are consumer rules, but the only person who knows what these rules should be is the person who sets up the Service - the Producer.. So this is a way for the service owner/producer to tell Service consumers about how to connect to the service, in a structured way.

It seems to me like having the Route/Gateway need to guess at these rules is not ideal, unless we assume that the Route owner always is the Service producer.

If that is the case, then we can easily do this by adding fields to backendRef. The tradeoff there is that now, everywhere that uses backendRef needs to have rules about what happens when you specify TLS settings, or websocket settings, or whatever. So if you use your backend in more than one backendRef, you now need to duplicate all these settings.

Copy link

@tsaarni tsaarni Oct 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the problem just that calling this "backend properties/capabilities" is guiding towards wrong abstraction? For example, the description currently in the TLS fields says:

    // The minimal TLSVersion that is accepted by the backend.
    ...
    // CACerts are the Certificate Authority, trusted by both Gateway and backend,

This wording makes sense if one would really describe capabilities of the backend. But since we are programming the client/gateway/proxy, speculating about the backend capabilities seems bit unexpected. We end up deriving client configuration from server capabilities?

What I was expecting to see was simply properties applied to Gateway's backend connection(s) when communicating towards specific backend(s). Something like "the minimum TLSVersion used by the Gateway when connecting to the backend" and "Certificate Authorities used by Gateway to verify backend server certificate". I'm not sure if the same holds to everything, but at least for backend TLS hop it seems so.

Edit: If one would use this same abstraction to configure cluster external properties as well (such as letting backend developer to provision external server certs #763), then "backend connection properties" would be misleading name as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The alternative is not to make every use in backendRef. The core relationship of consumer + producer rules, IMO, is "Producer sets defaults, can be overriden by consumer" (which is essentially what policy attachment is all about, as well). Its not necessarily either-or.


For a HTTPRoute that is not a passthrough (assuming the Gateway knows which traffic is passthrough), if the Gateway finds a BackendCapabilitesSpec for the backend, then it uses that information to decide how to connect.

Focusing overly on HTTPRoute (here and in other GEPs, we tend to) is not always the best path. What if the route is a TCPRoute? The contents are opaque - we cannot tell if its TLS or not.

Even if its a TLSRoute - you know it has a layer of TLS, but how many? How many does the backend expect? May sound crazy, but its not uncommon to have multiple layers of TLS at this point with transparent mTLS, things like onion routing, etc.

But most importantly - backendProperties, as proposed, are a property of the backend. That means there is no relationship to Gateway or Routes at all. These properties should be respected by any Kubernetes cluster that can support them, not just N/S traffic (mesh).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I was expecting to see was simply properties applied to Gateway's backend connection(s) when communicating towards specific backend(s). Something like "the minimum TLSVersion used by the Gateway when connecting to the backend" and "Certificate Authorities used by Gateway to verify backend server certificate". I'm not sure if the same holds to everything, but at least for backend TLS hop it seems so.

Is that not what this proposal is doing? Is it just the name BackendCapabilities that is the sticking point? The idea here is to allow the owner of the service to say "When connecting to the thing that's running inside the pod. do these things" - have a minimum TLS version, use this CA cert and so on. If it's just that the word Capabilities is too overloaded, and we need to change the name, I don't think that's a big deal.

@tsaarni, in the quote you've given, I think it's equally correct to have that say:

    // The minimal TLSVersion required to connect to the backend.
    ...
    // CACerts are the Certificate Authority, that the serving certificate for the backend service must be validated against.

Both of those properties are things that are set by the owner of the Service, but are needed by clients that connect to the Service. Similarly, the point of having websocket details in a resource like this is so that the owner of the Service can say "My service will accept websockets at the following paths", so that clients (like a Gateway proxy) know what to do for those paths.

Again, the point here is to take a whole bunch of configuration that, in the Ingress world, was often done via annotations on the Service, and make it structured. Ideally, we would put this inside the Service in structured fields, but we can't easily change the Service spec, so we've opted for a "decorator" style CRD.

If it's just that the name BackendCapabilities is creating the wrong abstraction, that's fine, but we need some guidance around what else to call it.

@howardjohn, I'm not quite sure I understand what you're asking for. Are you saying that this resource should be called BackendHTTPProperties or something, and only be relevant when it's used as a backendRef in a HTTPRoute? Or are you saying that we should define semantics for the other Route types?

It's clear that you disagree with the current approach, but I am not sure what actions we need to take to move towards something we can agree on.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consumers need to be able to control the settings, not just the service producers.

I am definitely not suggesting making it tied to HTTP - my concern was the opposite. That this won't work for non-http clients because we don't know if we should encrypt or not. But the client knows which is why they need to be able to configure it

Copy link
Contributor

@youngnick youngnick Oct 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I agree that consumers also need a place for some settings like this. But don't we need to have the service producer's settings available at least, and then we can see what things make sense to allow in other places? I'll be honest, in my head I kind of thought that Policy Attachment could be used to set things from the Gateway side - so the producer can say "I accept TLS 1.2 or greater", and a Policy could say "All Routes attached to this Gateway will default to TLS 1.3" and then things would work. But without the bit where the producer says what they expect, then you can't even do simple cases, where there's no extra TLS wrapping etc.

If this needs explanation in the document as a place for future work, I think that's fine. But I don't know how we solve the basic problem of "My service is doing its own TLS, and proxy connections from a Gateway won't work unless it knows about it" without something like what we have here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you but I don't think the GEP behaves that way? Or I read it wrong

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we have anything in the GEP right now about Policy Attachment and consumer settings, but that's probably worth adding.

I'll review the GEP again and see if I can see any other changes, and then come back here.

Copy link
Contributor

@youngnick youngnick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I spent some time thinking about this proposal some more after some back-and-forth in the comments.

I think it's important that we focus the fields in on "the things a client will need to know to connect to the wrapped service". Since a Gateway is a client, that will enable the Gateway to do reencrypt, and also allow other uses of this object.

This pattern - that the BackendCapabilities describes the information you need to connect to the backend is what I had in mind when I wrote up GEP-1282 originally. I think in the discussions, the original intent has got a bit muddled up - this is on me for not being clearer to everyone, sorry.

It could be that BackendCapabilites should really be called BackendConnectionInfo or the more simple BackendProperties, since it seems that Capabilities is a slightly overloaded term. I'd like to discuss this some more in the next community meeting.

For the actual spec though, the good news is that it's only a limited set of things that we need to put in the struct to describe the information you need to know to connect to a TLS backend.

  • MinVersion and MaxVersion are great and should stay.

Then, we just need:

  • CipherSuites as a list of strings, so that we can store it as a list of cipher names.
  • CACerts - you have this as a list, which I think is fine, but it could also be a single secret containing a PEM bundle. (That's how other Secret references in Gateway API work, if you want to put multiple certs in them, just use a PEM bundle).
  • SubjectAltNames - I think we should maybe leave this out for the first cut. If we do include it, we should call it ServiceNames and specify that it contains a list of all the names the service will respond to (that is, that it has certificates for.) That way, it can be used by a Gateway implementation to check if the Listener and/or Route hostnames match the backend. That does mean it will be a three-way hostname match, which could get pretty complicated.
  • I think we should leave out the other fields - there's no need to supply the serving cert (in the BackendCert field currently), that gets sent as part of the handshake, and the SNI is covered by the ServiceNames field if we put it in. The GatewayCert is configuration that should live somewhere on the Gateway - it's up to the Gateway owner and the implementation to wire a client certificate that matches the CACert to the Gateway's data plane, so that requests can be made with a correct client certificate. I also don't think that we should add in client SNI checking yet, this can be added later if we think that this model will work.

// +optional
MaxVersion *TLSVersion `json:"maxVersion,omitempty"`

// CipherSuite is the minimum cipher suite that the backend accepts.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, this should be []string so that it can store a list of ciphers in the usual format.

// +optional
GatewayCert *SecretObjectReference `json:"gatewayCert,omitempty"`

// BackendCerts are the CA certificates used by the gateway to authenticate
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this one is more intended to hold the serving certificate being used; it's probably not necessary.

// GatewayCert is the certificate used by the Gateway to prove its identity to
// the backend, if the backend so requires.
// +optional
GatewayCert *SecretObjectReference `json:"gatewayCert,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is more describing a client certificate, but I think that this one can probably be elided for now - the behavior we want is that the Gateway holds a client certificate that can be used with the CACert specified in this object.

// BackendCerts are the CA certificates used by the gateway to authenticate
// with the backend. This certificate allows the Gateway to verify and trust the
// backend certificate.
// +optional
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the CACerts key, I think that if that field is set, it should be the entire trust chain used for certificate authentication in both directions, and if not set, system root certs should be used. This will let users use Let's Encrypt certs from cert-manager or similar for their service if that's what they want.

@youngnick
Copy link
Contributor

Oh, lastly, I think that we probably need a bit of discussion about how this will interact with Policy Attachment.

In short, I think it should be something like:

  • BackendCapabilities sets what the service producer allows/wants for connections.
  • Policy Attachment may further default or limit the set of connection properties at a Route, Gateway, or GatewayClass level, using the defaults and overrides functionality.

I can explain this further in a meeting after I've thought it through some more.

@tsaarni
Copy link

tsaarni commented Oct 14, 2022

May I ask one clarification for the GEP update? In the GEP is said that the initial list of properties to add includes "TLS information for connection from the Gateway to the backend service" which I thought would include client credentials. However, I see from comments from @howardjohn and @youngnick that this is not the case. @youngnick writes:

The GatewayCert is configuration that should live somewhere on the Gateway - it's up to the Gateway owner and the implementation to wire a client certificate that matches the CACert to the Gateway's data plane, so that requests can be made with a correct client certificate.

What I think is meant here is that Cluster Operator who creates Gateways would provision the client certificate. Cluster Administrator then gives (via some out-of-band communication mechanism) the CA certificate to the Application Developer (service provider) who needs to configure that to their backend to allow it to validate Gateway's client certificate. Is that the expected workflow?

My question is coming from the "self-service" perspective: There is a likelihood that the Application Developer (service provider) would like to provision their own client certificate for Gateway, without needing to coordinate this with the Cluster Administrator. Of course, I'm not sure what @candita's expectation was for the workflow?

Also, if it is Cluster Administrator who provisions the client certificate for Gateway, I'm wondering why Application Developer is in control of TLS versions that Gateway uses, or cipher suites. What drives the decision which TLS properties are managed by which persona? All these parameters go into client's SSL context.

@youngnick
Copy link
Contributor

I think that a part that is confusing here is that it seems likely that we'll need to have two places where backend config can live.

Firstly, the config for the service itself. This is the service saying "I will accept TLS connections that look like this". This includes TLS version, the CA cert used for the communication, and so on. I think that these are a good fit for a BackendCapabilities/BackendProperties object.

Secondly, the config used to connect to the service. Depending on how the Gateway implementation is interacting with the services, it could be that the implementation owner wants the Gateway to have one client certificate, or many. It could also be that, as @tsaarni says, that the application owner wants every client to have their own, individual client certificate.

For the case that the Gateway implementation has one client certificate across all client connections, that's a implementation-level config.

For the case that the Gateway implementation needs individual client certificates for each backend, I think the right place is actually on the HTTPRoute, since that specifies how the Gateway impl connects to the backend.

In this model, the BackendCapabilities is the Service resource owner saying what the serving properties are (what TLS versions may possibly be used, what the serving cert CA is, and for cases like websockets, what the paths that support websockets are), and the HTTPRoute would hold the client properties (what version should actually be used by the Gateway impl to connect, what client certificate should be used to connect, and so on).

The difference between the two is that the BackendCapabilities information is (at least theoretically) useful to other things than a Gateway, and the HTTPRoute information is needed by things that want to connect.

I'm kind of making this up as I go along - this is a classic case that @candita had the very reasonable requirement that she wanted to be able to describe how to do a backend TLS connection in the API at all, and we've ended up stuck on definitional and semantic issues about where config belongs.

@candita
Copy link
Contributor Author

candita commented Oct 17, 2022

I think that a part that is confusing here is that it seems likely that we'll need to have two places where backend config can live.

To me, that is a clear indication that the original design is wrong. We wanted to make sure that whatever we designed would be a wrapper to a Service, because we would really like to keep all of this information aligned with the Service itself.

Firstly, the config for the service itself. This is the service saying "I will accept TLS connections that look like this". This includes TLS version, the CA cert used for the communication, and so on. I think that these are a good fit for a BackendCapabilities/BackendProperties object.

Secondly, the config used to connect to the service. Depending on how the Gateway implementation is interacting with the services, it could be that the implementation owner wants the Gateway to have one client certificate, or many. It could also be that, as @tsaarni says, that the application owner wants every client to have their own, individual client certificate.

For the case that the Gateway implementation has one client certificate across all client connections, that's a implementation-level config.

Yes, we could call that the Gateway's default certificate, used by all clients when nothing else was specified.
We could also have a global default certificate, used by all clients that don't even have a Gateway default certificate.

For the case that the Gateway implementation needs individual client certificates for each backend, I think the right place is actually on the HTTPRoute, since that specifies how the Gateway impl connects to the backend.

No, we want this to apply to more than just HTTPRoute. Unless you want to put backend capabilities on each route type. Again, going back to the model of "wrapper on Service", this just doesn't fit in.

I see this is on the agenda tonight for discussion, so perhaps we can hash it out there.

…nt will need to know to connect to the wrapped service
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: candita
Once this PR has been reviewed and has the lgtm label, please assign hbagdi for approval by writing /assign @hbagdi in a comment. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@costinm
Copy link

costinm commented Oct 17, 2022

  • having a large object with all the possible properties of a backend seems very dangerous, will be hard to evolve it
  • for TLS1.3 - I don't think it's a common use case for a user to go to each backend and specify the settings. Usually TLS1.3 is required for compliance or because the gateway admin requires 1.3 - and that would be set in the gateway settings.
  • I'm concerned with client cert specification - it is pretty tricky for the producer to tell the consumer what to use. I would expect any mTLS from gateway to backend to default to the workload identity or be configured on the client side.

@@ -90,6 +97,7 @@ We’re looking to add specific, structured extension points somewhere in the re
Those specific, structured extension points need to be in a place where they can be owned by the person who owns the backend, since that could be different to the person who owns the Route. That is, whatever we choose should be something that extends a thing that is referred to by a `backendRef` in a Route, not inside the Route.

The initial list includes, but is not limited to:

* TLS information for connection from the Gateway to the backend service. Note that this doesn’t include any information used for service mesh encryption, just what a Gateway’s proxy would need to be able to connect to the backend service.
* Websocket protocol information for the backend service.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think mixing TLS info with protocol info is ideal. TLS is pretty advanced and scary for many users.
Also in many cases this is negotiated at protocol level, not config ( ALPN, etc)


// Kind is kind of the target resource.
// Valid values include:
// * "Service" (the default if no Kind is specified)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can it be a Service if it's a backend/workload capability ? Deployment/ReplicaSet/etc - or pod label selector maybe ?

If you want this to be a Service extension - call it ServiceProperties or ServiceTLS, etc.

At least in Istio we make a very big distinction between workloads - pods selected by label - and Services. I think many users treat Service as a frontend - and the selected Pods as backend.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is absolutely the Service being used as a backend, as a shortcut for a selector. That's why we called it _Backend_Capabilities, not just _Service_Capabilities. Remember, a Service is both the frontend (what you spend most of your time thinking of it as, because that's what Istio needs), and the backend, a bucket of workload endpoints (which is how people who aren't doing service mesh tend to see it more).

Having this reference a Service resource is a way avoid having to store the workload selector in two places, as well as a way to meet people who are using Ingress controllers for north-south traffic more than they are Services for east-west traffic.

type EncryptType struct {
// The minimal TLSVersion that is accepted by the endpoint.
// If empty, there is no minimal backend TLSVersion.
// +optional
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if this is something each individual service owner should mess with and configure individually.

TLS compliance is an admin choice, may be configured on the Gateway or on cluster or implementation-specific config. In most cases we want to prevent service owners to downgrade security or break the compliance requirements by missconfiguration.

This is even a bigger problem for CipherSuites - do we expect each individual backend owner to understand what cipher suites are safe and configure them ? This is something very few people know or should mess with.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think maybe there's some confusion - I don't think I've ever intended this to be the only way that TLS will be enabled to backends. This construct is to cover when the service owner wants to manage their own TLS, otherwise the Gateway would be able to contact the service directly. (I also include in that case the mesh-style magic infrastructure that will transparently encrypt connections for you).

Right now, there is no way to represent the (pretty common) case of someone saying "my service is running its own TLS, I need it exposed via a Gateway". That's the primary use case we're trying to solve here - the extra applicatibility is useful but not essential.

I guess another way to say this is - regardless of what service owners should be doing, they absolutely do want to do what this object is providing, already.

// certificate chain of trust. If empty, the system root certs should be used.
//
// +optional
CACerts *SecretObjectReference `json:"caCerts,omitempty"`
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the biggest problem - we can't have a SecretObjectReference crossing a namespace boundary. We had CVEs related to this, it's not good.

Also not sure what it means 'trusted by both' - I suspect what you want to define here is the root CA in case the backend is using a private root. SPKI or the public key or cert - inline, like it is used in kubeconfig - may work.
However I don't know if this API is the proper place to define root key managed and distribution - there is a working
group I think working on certs and security.

I don't disagree that supporting serlf-signed or private root is important - just that PKI config is probably better as a separate concern, not part of backend properties - and can't be based on cross namespace Secret references ( or expectation that each namespace has a secret with that name).

Copy link
Contributor

@youngnick youngnick Oct 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This Secret reference is not for some other service to connect to this one, it's for the Gateway controller (which already needs cluster-wide access to secrets to be able to do its job). The main, relevant client here is the Gateway controller.

And even if users do need to do a cross-namespace Secret reference, that's exactly what ReferenceGrant is for, to enable the controlled use of Secrets or other resources across namespaces. It seems reasonable to assume that, for self-signed certs, a CA cert might live in its own namespace, with a ReferenceGrant then being required to allow access to it. Or, of course, a controller could just copy that into relevant namespaces too.

Lastly, this is a CA certificate chain, not a key, so it's intended to be public. We could use another object (maybe a configmap?) to store it if necessary.

// ServiceNames provide multiple named domains which will share a certificate
// for the endpoint.
// +optional
ServiceNames []string `json:"serviceNames,omitempty"`
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why ? How does it help the caller to know ahead of time the list that will be returned in the cert anyways ?

You mean 'SANNames' - for supporting spiffee or equivalent ? That may be useful.
Do you mean this as an override of the normal TLS server verification - where gateway talk with foo.ns.svc.cluster.local but get a certificate for example.com - I think it's important to be very clear with this, security is tricky.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, it's to help the Gateway implementation so that it can determine configuration validity before allowing any connections at all. I'm so-so on this one, because using it makes the hostname match a three-part one (Listener, HTTPRoute, BackendCapabilities), rather than a two-part one, which is already hard to do. The upside is that it means that implementations can easily check and make clear at a configuration level when something is not going to work, rather than waiting to have client requests fail and needing to troubleshoot that.

@youngnick
Copy link
Contributor

  • having a large object with all the possible properties of a backend seems very dangerous, will be hard to evolve it

This is a fair point, but the other way to do it - having a separate resource for each type of config - also has problems.

Let's review:

Have one large object:
Pros:

  • The "add a nil-able struct for each config type" pattern lets each type of information be reasonably orthogonal to the others, and allows for straightforward expansion.
  • Having only one resource to watch means there's less code churn for Gateway implementations.
  • Having only one extra wrapper object to look for makes it easier for users to know where to look.
    Cons:
  • Object could become unusably large.

Have more, smaller objects:
Pros:

  • Each object is tightly contained, and has a smaller failure domain.
    Cons:
  • Each extra resource to add is another round of things for each implementation to add.
  • Each extra resource is another kubectl command for users looking for info as to what went wrong.

Personally, I'd prefer to stick with the single large object for a while, because I think the implementation code churn and user experience costs will rapidly increase over time.

  • for TLS1.3 - I don't think it's a common use case for a user to go to each backend and specify the settings. Usually TLS1.3 is required for compliance or because the gateway admin requires 1.3 - and that would be set in the gateway settings.

Again, the idea here is to make it so that for a cluster that requires TLS 1.3, but doesn't provide infrastructure to do it automatically, it's possible to use a Gateway implementation to both connect and to enforce those restrictions.

  • I'm concerned with client cert specification - it is pretty tricky for the producer to tell the consumer what to use. I would expect any mTLS from gateway to backend to default to the workload identity or be configured on the client side.

The idea here is to enable clients to give a Gateway some details about what to use to connect. Maybe we don't have it supply a certificate with a keypair, maybe we do. Having a magic keypair that allows you to connect to a service available in the same namespace as the service seems like a security problem, but the other problem is that, if you do want each client of the backend service to use an individual certificate, how do you tell the Gateway which one to use for this backend? It could be that this is the point at which we throw up our hands and say "if you want that, you want a service mesh, SPIFFE and SPIRE, or some other more advanced thing" though.

@youngnick
Copy link
Contributor

Okay, after discussion in the community meeting today, we're working on collecting TLS use cases, along with owners etc, in this new document:
https://docs.google.com/document/d/17sctu2uMJtHmJTGtBi_awGB0YzoCLodtR6rUNmKMCs8/edit?usp=sharing

/hold
this until we figure out what the use cases mean.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 15, 2022
@candita
Copy link
Contributor Author

candita commented Jan 24, 2023

In yesterday's meeting we decided to officially drop GEP-1282.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. kind/gep PRs related to Gateway Enhancement Proposal(GEP) ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants