Skip to content
This repository has been archived by the owner on Apr 30, 2024. It is now read-only.

Add the ability to create cluster-local certificates by an internal cert-issuer #538

Merged
merged 6 commits into from
Jul 5, 2023

Conversation

ReToCode
Copy link
Member

@ReToCode ReToCode commented May 26, 2023

Context

📝 Please note that this is only a partial view of the hole feature. Please refer to https://github.com/ReToCode/knative-encryption/blob/main/4-final-setup/HOW_DOES_IT_WORK.md for the bigger picture.

Visualization

Changes

  • 🎁 Add the ability to create cluster-local certificates by an internal cert-issuer
  • 🎁 Introduces an additional configuration flag to specify an issuer reference for cluster-local certificates

/kind enhancement

Release Note

`net-certmanager` is now able to generate cluster-local certificates using the new issuer reference `clusterInternalIssuerRef` in `config-certmanager`.
If no `issuerRef` or `clusterInternalIssuerRef` is configured in `config-certmanager` a new internal self-signed `ClusterIssuer` is used. 

Docs

Will be done for the completed feature (when serving and net-* changes are also done).

@knative-prow knative-prow bot added kind/enhancement size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 26, 2023
@codecov
Copy link

codecov bot commented May 26, 2023

Codecov Report

Merging #538 (46ffabb) into main (b6375a7) will increase coverage by 3.53%.
The diff coverage is 82.05%.

@@            Coverage Diff             @@
##             main     #538      +/-   ##
==========================================
+ Coverage   83.96%   87.50%   +3.53%     
==========================================
  Files           5        5              
  Lines         368      392      +24     
==========================================
+ Hits          309      343      +34     
+ Misses         44       34      -10     
  Partials       15       15              
Impacted Files Coverage Δ
pkg/reconciler/certificate/config/cert_manager.go 81.25% <57.14%> (-18.75%) ⬇️
.../certificate/resources/cert_manager_certificate.go 82.94% <87.50%> (+15.35%) ⬆️

@ReToCode
Copy link
Member Author

ReToCode commented May 26, 2023

I decided to go with these changes, regardless of the similarity of #353 and the discussion in https://cloud-native.slack.com/archives/C04LMU0AX60/p1684321868604549. Otherwise the work on full end-to-end internal encryption is kind of blocked until we find some common ground. Changing the configuration logic to a domain based one seems a bit unrealistic at the moment and a bit ambiguous from an users UX.

Please also take a look at https://github.com/ReToCode/knative-encryption/blob/main/4-final-setup/HOW_DOES_IT_WORK.md. I tried to summarise how this is going to look like, as there will be more PRs to networking, Serving and all net-* repos about this.

I'm not sure if we should include a "default" self-signed issuer -> config/knative-cluster-issuer.yaml or if the user just provides it from the docs. WDYT?

/assign @skonto , @dprotaso , @evankanderson , @davidhadas , @nak3 , @KauzClay

@davidhadas
Copy link

Does the readme file of this repo correctly represent what this repo is for?
If not, please fix it so the community can understand this repo.

@davidhadas
Copy link

/hold
Let's discuss this some more.

The result of having two config points where one of them, when used, makes the other one ignored, seems not great...
I would like to understand better if we cant do the config part better.

@knative-prow knative-prow bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 26, 2023
@ReToCode
Copy link
Member Author

Does the readme file of this repo correctly represent what this repo is for? If not, please fix it so the community can understand this repo.

I don't think it does. But I'd like this in a separate PR.

The result of having two config points where one of them, when used, makes the other one ignored, seems not great...
I would like to understand better if we cant do the config part better.

What do you mean by that?

  • The issuerRef is for the existing autoTLS feature of Knative which is TLS for external domains: https://knative.dev/docs/serving/using-auto-tls/
  • The internalIssuerRef is for the cluster-local domains inside the cluster, which we do not have yet.
  • So it seems valid to have two configurations, as a cluster-local CA will probably be different from the external one. E.g. use a self-signed CA locally and use lets-encrypt publicly.

@davidhadas
Copy link

  • The issuerRef is for the existing autoTLS feature of Knative which is TLS for external domains: https://knative.dev/docs/serving/using-auto-tls/
  • The internalIssuerRef is for the cluster-local domains inside the cluster, which we do not have yet.
  • So it seems valid to have two configurations, as a cluster-local CA will probably be different from the external one. E.g. use a self-signed CA locally and use lets-encrypt publicly.

To use Auto TLS before the PR the admin needed for example to do:

  • Create a ClusteIssuer resource in
  • Add a reference to the resource in issuerRef in config-certmanager ConfigMap
  • Set auto-tls: Enabled in config-network ConfigMap resource
  1. Before the PR when an admin was setting auto-tls: Enabled but was not setting issuerRef in config-certmanager ConfigMap, a certificate resource was created with empty strings as issuerRef. I assume that the result was an error in the certmanager logs - now we changed this behavior and we add an error in Knative logs. << I am simply recording this here and making sure this is intentional and not overlooked.
  2. Are we now moving to force admins to always set both issuerRef and issuerRefInternal whenever they are using Auto TLS? (the code now issues errors when any of the two is not configured). Are there no use cases in which only one will be needed? If so, is emitting errors the right approach?
  3. Are we ok with all admins using autoTls today to need to make sure to also set issuerRefInternal before they upgrade to 1.11? This is harsh! Should we not use issuerRef for local as well if local is not set?

@ReToCode
Copy link
Member Author

Ah I see what you mean, here my comments:

  1. Today if an admin does not provide the issuerRef it did not show an error. You are correct that the error is visible in the logs of cert-manager but not in Knative. This seems a bit weird to me (at least I did not find the root cause when I was fiddling around). Doesn't it make sense to have the error in the Knative resources? A user just created those and sees in the status why it breaks. Also it is closer to the actual configuration (in net-certmanager and not cert-manager) as it is not really an issue with cert-manager. WDYT?

  2. I'm missing that one. Where is this the case? AFAIK both are nil if not defined and it only errors when the reference would be used but was not specified:

  1. I also don't see that. AutoTLS is only related to issuerRef, it should still work if the other one is not defined. internalIssuerRef is only used when internal-encryption is enabled, otherwise you would not have a Knative Certificate with the label [networking.VisibilityLabelKey] == VisibilityClusterLocal.

@ReToCode
Copy link
Member Author

ReToCode commented May 30, 2023

Should we not use issuerRef for local as well if local is not set?

and also

I'm not sure if we should include a "default" self-signed issuer -> config/knative-cluster-issuer.yaml or if the user just provides it from the docs. WDYT?

Regarding the defaulting behaviour, I'm unsure what is the best approach. We could actually have "sane" defaults if both are not specified. To use the internal CA to create external domains might work technically, but makes not much sense form an user perspective, as nothing would trust the internal CA out of the box, making the routes unusable without --insecure-....

@davidhadas
Copy link

internalIssuerRef is only used when internal-encryption is enabled,

internal-encryption is deprecated,
So you are suggesting that trustConfig != Disabled controls if local ingress uses TLS and that whenever trustConfig != Disabled we use certificates with visibility of "local" and require that the user also set internalIssuerRef.

Is this what you mean?

@ReToCode
Copy link
Member Author

Is this what you mean?

Yes for net-certmanager. As said in our sync meeting, we could have a built-in provider that generates the certificates, if the user does not provide a specific certificate-class like net-certmanager.

If we do not do that (in general, not specifically here), we enable internal encryption and/or trust, but the service is still called via http.

@davidhadas
Copy link

davidhadas commented May 31, 2023

After this PR we will have two types of certificates: those with visibility=local and those without it.
If a user enables trustConfig != Disabled the user should not be also mandated to set internalIssuerRef instead we need a self-signed certificate to serve the user in this config. A built-in provider can do the job for us.

if the user also sets a certificate-class, for example sets certificate-class=net-certmanager, Ideally we will still have a default by creating a self-signed certificate created by net-certmanager and this default will be used if the user left the internalIssuerRef unconfigured.

I think we need to try and avoid having config points where one config requires the user to always do some other config and without it, the system is left in an error state.

Bottom line, I don't like knative logging errors instead of working with workable defaults when internalIssuerRef is not configured, at the same time, I don't like knative to log errors instead of working with workable defaults when IssuerRef is not configured.

Would like others to consider the above, yet unholding...
/unhold

@knative-prow knative-prow bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 31, 2023
@ReToCode
Copy link
Member Author

ReToCode commented Jun 1, 2023

After this PR we will have two types of certificates: those with visibility=local and those without it.
If a user enables trustConfig != Disabled the user should not be also mandated to set internalIssuerRef instead we need a self-signed certificate to serve the user in this config. A built-in provider can do the job for us.

I think we will end up there when we re-model the control-protocol stuff to use the abstraction and ship that by default.

if the user also sets a certificate-class, for example sets certificate-class=net-certmanager, Ideally we will still have a default by creating a self-signed certificate created by net-certmanager and this default will be used if the user left the internalIssuerRef unconfigured.

I think that approach is feasible. But it does mean that we change the current behaviour for external certificates as well and there its a bit tricky to say what the default issuer should be. Also I have some doubts that people willing to use net-certmanager want to have a self-signed external and internal issuer as then you would not need cert-manager.

Open for other input. @dprotaso, @skonto, @evankanderson , @nak3

@KauzClay
Copy link
Contributor

KauzClay commented Jun 1, 2023

So you are suggesting that trustConfig != Disabled controls if local ingress uses TLS and that whenever trustConfig != Disabled we use certificates with visibility of "local" and require that the user also set internalIssuerRef.
Is this what you mean?

Yes for net-certmanager. As said in our sync meeting, we could have a built-in provider that generates the certificates, if the user does not provide a specific certificate-class like net-certmanager.

Would you mind sharing why you want to tie this to the trustConfig flag?

Just curious, what do you think of linking it to the autoTLS config instead? As long as autoTLS is true, if an internal provider is set in config-certmanager, then ClusterLocal routes are https. It could either error if internalIssuerRef isn't set. Or just continue to do http.

I agree securing ClusterLocal routes falls under the broad category of internal encryption, but in my mind, the trustConfig options feel more like Knative nuts and bolts. Users aren't going to talk to the activator or the queue-proxy directly (at least I don't think).

But they will hit clusterLocal routes directly, just as they hit ExternalIP routes. To me, having same root on/off switch for both internal and external routes is clearer.

I admit this would still mean there are multiple things to turn on if you want to secure the path all the way from client to user-container, though. Not just one nice switch.

@davidhadas
Copy link

davidhadas commented Jun 1, 2023

Just curious, what do you think of linking it to the autoTLS config instead?

I was also struggling with the difference between the way we configure external and local and I agree that making both set by the same autoTls flag makes more sense.

It could either error if internalIssuerRef isn't set. Or just continue to do HTTP.

I would avoid dropping to HTTP after the admin had set autoTls + chose net-certmanager, but forgot to set internalIssuerRef ... Admins will not be able to figure that out if this is what we will do.

But they will hit clusterLocal routes directly, just as they hit ExternalIP routes. To me, having same root on/off switch for both internal and external routes is clearer.

I agree that making a divide between configuring "internal" (ingress to queue) that will be using trustConfig, and "external" ( any ingress, including the local one) that will be using autoTLS, is clearer from an admin point of view.

I admit this would still mean there are multiple things to turn on if you want to secure the path all the way from client to user-container, though. Not just one nice switch.

I agree this needs to be improved (not by this PR though).

Some of our options:

  1. We can decide to use TrustConfig=Enabled as default at some point (which would make more sense)
  2. We can decide that setting autoTLS will automatically set TrustConfig=Enabled
  3. We can log warnings when autoTLS is set but TrustConfig=Disabled.

@KauzClay
Copy link
Contributor

KauzClay commented Jun 1, 2023

I would avoid dropping to HTTP after the admin had set autoTls + chose net-certmanager, but forgot to set internalIssuerRef ... Admins will not be able to figure that out if this is what we will do.

I see, that makes sense. Okay, if I understand correctly, it seems like this PR intends to do that? And then, based on earlier comments on this PR, sounds like there would have to be some separate PR to make that error more visible within Knative.

@davidhadas
Copy link

I would avoid dropping to HTTP after the admin had set autoTls + chose net-certmanager, but forgot to set internalIssuerRef ... Admins will not be able to figure that out if this is what we will do.

I see, that makes sense. Okay, if I understand correctly, it seems like this PR intends to do that? And then, based on earlier comments on this PR, sounds like there would have to be some separate PR to make that error more visible within Knative.

Just to make sure we are in sync, I am advocating that we should not silently drop to HTTP when we discover inconsistent HTTPs ingress configuration (autoTLS=true). So if we find that internalIssuerRef is empty when net-certmanager is configured to be used, we should either use some default configuration (self-signed certificates by certmanager is the right one in my view), or we should instead log an error that we found an inconsistent configuration (as done in this PR).

@KauzClay
Copy link
Contributor

KauzClay commented Jun 1, 2023

So if we find that internalIssuerRef is empty when net-certmanager is configured to be used, we should either use some default configuration (self-signed certificates by certmanager is the right one in my view), or we should instead log an error that we found an inconsistent configuration (as done in this PR).

Ah okay, I follow. I like the default option of a self-signed cert from certmanager. You could probably even use it for both internal and external provider defaults, but that is outside the scope of this PR.

I think I misunderstood the changes. I took the presence of this file and the the example config block as evidence that the behavior was to default to the self-signed cert from certmanager.

@ReToCode
Copy link
Member Author

ReToCode commented Jun 8, 2023

I would strongly vote against mixing external certificates and internal encryption in general. For example on OCP we have a separate ingress solution in front of Knative that handles all external Domains and TLS for that, so we do not want autoTLS but we still want internal-encryption. Today, these two things (autoTLS and internal-encryption/or the new trust flags) are separated from each other and can be turned on/off independently.

I do get your point about the defaulting behaviour, although, I still do not think it makes sense to use net-certmanager and not configure your own issuer, as you could just stay with the control-protocol stuff. Same for external domains signed by the internal CA, as no client will trust them.

But I'm ok with always adding a self-signed CA and use them as a default. So turning on either autoTLS or internal-encryption/or the new trust flags will use our self-signed CA until configured otherwise. We just need to make sure that we regenerate all the certificates if the user decides to user another (internal)issuerRef.
WDYT?

@ReToCode
Copy link
Member Author

ReToCode commented Jun 8, 2023

We just need to make sure that we regenerate all the certificates if the user decides to user another (internal)issuerRef.
Just checked that, that does work. Certificates are automatically re-issued from cert-manager:

status:
  conditions:
  - lastTransitionTime: "2023-06-08T12:20:46Z"
    message: Issuing certificate as Secret was previously issued by ClusterIssuer.cert-manager.io/knative-internal-encryption-issuer
    observedGeneration: 2
    reason: IncorrectIssuer
    status: "True"
    type: Issuing

@ReToCode
Copy link
Member Author

ReToCode commented Jun 8, 2023

Updated the defaulting.

@ReToCode
Copy link
Member Author

@davidhadas , @KauzClay can you please take another look ^^.

@KauzClay
Copy link
Contributor

KauzClay commented Jun 12, 2023

@ReToCode

Updated the defaulting.

Nice, I like the idea of using the default issuer for both internal and external.

I would strongly vote against mixing external certificates and internal encryption in general. For example on OCP we have a separate ingress solution in front of Knative that handles all external Domains and TLS for that, so we do not want autoTLS but we still want internal-encryption. Today, these two things (autoTLS and internal-encryption/or the new trust flags) are separated from each other and can be turned on/off independently.

I see what you're saying about not wanting to attach it to autoTLS config.

And I can see how for all traffic internal to the cluster (whether that is from cluster-internal client -> Ingress, or Ingress to activator/queue), you probably want to use the same issuer.

I guess personally, I'm still not a huge fan of having the same on/off switch for both cases. Just throwing things out there, but what do you think about adding an autoTLSInternal config or something to serving?

net-certmanager would still only have the internal and external issuers like you have in this PR, but the trust config can be separate from deciding if you want your clusterlocal routes to be encrypted?

I suppose that branches out to a separate PR then...

@ReToCode
Copy link
Member Author

Hm I agree that the flags are a bit confusing and also not ideally named. For now we have:

autoTLS: enabled/disabled
Enable/disable external encryption. I'm working on making making this also work with the "knative internal" issuer from control-protocol. Other than that, it is not related to internal encryption in any kind.

internal-encryption: enabled/disabled (deprecated in favour of the trust flags below)
Enable/disable internal encryption. For now we only do this between our own components, but we should also have it on the edge (the ingress-gateway). So I think encryption on the ingress-gateway for local routes belongs to the same internal encryption flag. Without that part, we only partially secure the data-path.

dataplane-trust

  • disabled = same as internal-encryption disabled
  • minimal = same as internal-encryption enabled
  • and so on for the further mTLS stuff David is working on

But to clean that up would be the scope of another PR.

@KauzClay
Copy link
Contributor

KauzClay commented Jun 20, 2023

But to clean that up would be the scope of another PR.

agreed. Maybe we can start a discussion in slack or one of the working group meetings.

Anyway, I am cool with the changes here. Idk if I have the ability to approve it for real, but...

/approve

@ReToCode
Copy link
Member Author

ReToCode commented Jul 3, 2023

@davidhadas what is your latest take on this? ^^

@davidhadas
Copy link

@davidhadas what is your latest take on this? ^^

@ReToCode -
I am generally ok with this PR but still feel we can do more to make it more clear to the SE or DevOps.

The name used - internalIssuerRef adds confusion as it refers to the cluster local ingress rather than the serving "internal encryption". Can we change it to localIssuerRef or localIngressIssuerRef - any term not conflicting with the concept of internal-encryption? Best associate it with the local ingress...

Another change I suggest is replacing the existing issuerRef with externalIssueRef or externalIngressIssueRef or extIngressIssuerRef or some term that help to better distinguish that it relates to the externally exposed ingress only.

@ReToCode
Copy link
Member Author

ReToCode commented Jul 4, 2023

The name used - internalIssuerRef adds confusion as it refers to the cluster local ingress rather than the serving "internal encryption". Can we change it to localIssuerRef or localIngressIssuerRef - any term not conflicting with the concept of internal-encryption? Best associate it with the local ingress...

@davidhadas I thought the idea is to also use the abstraction in the future for the certs that we use between the Knative components and that we update the cert-generation in networking to also be able to provide the certs for cluster-local domains? Then the name internalIssuerRef makes sense. Doesn't it?

Another change I suggest is replacing the existing issuerRef with externalIssueRef or externalIngressIssueRef or extIngressIssuerRef or some term that help to better distinguish that it relates to the externally exposed ingress only.

Agreed, but that would break the existing config API. I suggest we do the renaming of those flags (as well as in Serving, Docs, Operator for the actual encryption configuration) separately. We should probably also rename the feature from autoTLS to something more meaningful which is aligned with the issuer here.

@ReToCode
Copy link
Member Author

ReToCode commented Jul 4, 2023

@davidhadas updated naming to ClusterInternalIssuerRef.

@davidhadas
Copy link

/lgtm
(I don't think I am a reviewer though)

@knative-prow knative-prow bot added the lgtm Indicates that a PR is ready to be merged. label Jul 4, 2023
@nak3
Copy link
Contributor

nak3 commented Jul 5, 2023

/approve

@knative-prow
Copy link

knative-prow bot commented Jul 5, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: KauzClay, nak3, ReToCode

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@knative-prow knative-prow bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 5, 2023
@knative-prow knative-prow bot merged commit d6805af into knative-extensions:main Jul 5, 2023
23 checks passed
@ReToCode ReToCode deleted the cluster-local-tls branch July 5, 2023 05:31
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/enhancement lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants