CA abstraction #46

tombentley · 2021-12-03T15:26:15Z

This PR is for proposing an abstraction of Strimzi's existing certificate handling.

It is currently a bit high level. We'll need to discuss a lot of the details (for example, I've not described in detail which Secrets would be used to store certificates in various states), but this describes a vision which we can start discussing the merits of, and whether is solves all the right use cases.

Signed-off-by: Tom Bentley <tbentley@redhat.com>

scholzj

I left some comments. But looks like a good start. I think elaborating a bit more on what secrets will be used to store what and how to store the state machines would be good for better understanding how it all fits together.

scholzj · 2021-12-03T15:43:27Z

035-ca-abstraction.md

+
+1. The issuing operator needs to figure out which EE certificates are missing (or in their renewal period, or otherwise need replacing, e.g. via annotation). This need for a certificate might exist independently of the end entity. For example, if we're going to scale up the Kafka cluster then we need a certificate for the new broker before the new broker exists. **The issuing operator puts the end entity in the `NEW` state.**
+2. (**See step 1 in the diagram.**) The issuing operator requests a new certificate. **The issuing operator puts the end entity in the `REQUESTED` state.**
+3. (**See step 2 in the diagram.**) The issuing operator cannot assume that issuance will happen quickly (i.e. that `pollForRequestedCertificate()` returns a non-empty `Optional` immediately), so it needs to be able persist the token obtained from `requestCertificate()` in a `Secret`, _and possibly complete the current reconciliation_ (i.e. issuance can be very asynchronous). 


It is not really clear what token are you talking about here. Maybe you should explain what certificate management systems you considered and what are their workflows.

I don't want to couple the operator's reconciliation loop to an external system. It's not so much of a problem when bootstrapping a cluster, but blocking reconciliation because we need to renew a certificate feels like a bad idea, since it prevents other actions on the cluster. So rather than polling pollForRequestedCertificate within a single reconciliation, waiting for a non-empty result, the idea is to allow the reconciliation to proceed and call pollForRequestedCertificate again in the later reconciliation. That way we don't have to reason about how quickly a given certificate management system might issue a certificate. Its availability wouldn't matter much. There could be a human in the loop checking each CSR and the operator would still function just fine. It would be possible to implement the external issuer mentioned in the rejected alternatives and the operator would still function correctly.

I would agree that we should not stop the reconcile but why not using the same approach we have with the Cruise Control rebalance process where, while waiting for a proposal Ready and it's still in PendingProposal, there is a polling on the CC REST API to check the status, instead of waiting for the next reconcile loop that could come after 2 mins by default, or sooner or later if user changes the default value.

With CC there's nothing else that might need to be done by that operator, so it's OK to poll the CC REST indefinitely. That's not the case with the Kafka operator.

scholzj · 2021-12-03T15:46:58Z

035-ca-abstraction.md

+
+#### Process for ensuring CA trust
+
+When an EE certificate is issued via a `CertificateIssuer` it might have been signed by a CA that is not yet trusted.


I'm not sure I like the idea how we trust any CA which comes via the certificate issuer. Trusting the CA means trusting all certificates issued by it. So it might be a way how to sneak in undesired CA. Have you considered separating the EE certificate process from the CA trust process and have users explicitly specify CAs they trust? The EE process can then just check if it is trusted or not and decide whether to continue or error.

We could certainly add an allowlist of trusted CA certificates, and not trust CA certificates not in that list. We'd need a NOT_ALLOWED state in the CA certificate trust state machine. Then the user would need to take some action, since requesting another certificate from that issuer is likely to result in the same unacceptable CA certificate being used. Would that be sufficent @scholzj?

I don't know. It seems more secure to require the trust to be explicitly configured rather the expect not-trust to be configured.

Does the person configuring the Kafka CR necessarily know the root CA certificate which their configured certificate issuer is going to use? If they do then they can configure the Kafka CR to trust EE certificates signed with that root CA certificate. But if they don't then they can't configure the Kafka CR at all.

Even when they do know, it creates extra problems: What should happen when that root CA certificate expires? The operator then is stuck. So we need extra management code to add warning conditions when that's going to happen etc. I'm sure we can do that, but I'm not sure we want the complication in the initial implementation.

It is not necessarily about the person knowing. The provider implementation should know. As explain in the main comment => you already have a chunk of this information in the CA configuration. In case of the internal issuer, it would be just the Kubernetes Secret as today. In case of the CertManager, it is the Issuer what defines what CA is used. So the provider implementation knows how to get it.

Imagine the difference like this:

If you take the CAs from the issued certificate, as a bad actor I need just to subvert one issued certificate to inject Strimzi with my own CA and gain access. The only way you would notice is by checking the CAs trusted by Strimzi Kafka cluster.

If you take the CA from the provider implementation instead of from the issued certificates, the bad actor would need to modify the Issuer to get Strimzi to trust some new CA. That has IMHO better chance of being noticed because the issuer might be used by other apps as well etc.

035-ca-abstraction.md

scholzj · 2021-12-03T15:52:36Z

035-ca-abstraction.md

+
+When an EE certificate is issued via a `CertificateIssuer` it might have been signed by a CA that is not yet trusted.
+The cluster operator (for the cluster CA), or the user operator (for the clients CA) needs to detect this lack of established trust and ask the cluster operator to trust the new CA. 
+The cluster operator and the user operator run in separate processes, so this will be done via a shared `Secret`, for example:


Adding a bit to the comment above -> there is some demand for having different trusted clients CA per listener. Going in that direction might IHMO require better separation of UO and CO. I think to keep it simple, you could for example establish trust to the clients CA via the Kafka CR. The User Operator does not really care about it, it can just request the EE from whatever issue is configred in the KafkaUser resource and leave it up to user to make sure the issuers CA is also trusted.

That assumes that the user knows beforehand which CA cert will be used by the issuer. Is that necessarily always true? They might know today, but what about some future time when that CA cert is close to expiring and the CA starts using a new one? If the person operating the cluster hasn't configured trust in that clients CA cert then the UO would happily issue user EE certs which won't work. Or should the CO pass the set of trusted certs to the UO, so the UO can check the issued certs will be trusted.

It may be that it's a mistake to try to have any common abstraction that covers the cluster CA and the clients CA. From the PoV of the Kafka cluster it would work for the user to specify a set of CA certs to trust (and roll the brokers when that changed). And for the UO just make it a requirement that there is a CA (possibly intermediate) cert and key that can be given to the UO to issue certificates. In other words, not use this CA abstraction at all for the UO. That would certainly simplify a number of things, like the need for the shared Secret. At the cost of making the UO less broadly applicable, since it would still be making this assumption about being able to issue certificates itself.

scholzj · 2021-12-03T16:01:23Z

035-ca-abstraction.md

+EE certificate expiry is driven by the cert path validation described above.
+The issuing operator will perform cert path validation and if the end entity certificate is currently expired, or will be expired on `now + renewal period` then a new EE certificate will be issued as described in the previous section.
+
+#### Intermediate or root CA cert expiry


I think this deserves a bit more detail on how would it work. For example in my past investigation, Cert Manager didn't really allow a smooth migration from old CA to new CA because it just replaced the CA. That makes this a bit complicated and that is also why as mentioned in one of the comments above you might need a clear separartion of the secrets.

Yes, I plan to update with which Secretss are updated in which steps.

035-ca-abstraction.md

katheris · 2021-12-07T09:36:13Z

Overall the proposal sounds good to me. The main section that I feel like has the most unknowns is using the secret state to drive the logic flow, if there are multiple certificate updates happening in quick succession it isn't immediately obvious to me how the system would cope with that.

ppatierno · 2021-12-09T10:12:26Z

035-ca-abstraction.md

+
+1. The issuing operator needs to figure out which EE certificates are missing (or in their renewal period, or otherwise need replacing, e.g. via annotation). This need for a certificate might exist independently of the end entity. For example, if we're going to scale up the Kafka cluster then we need a certificate for the new broker before the new broker exists. **The issuing operator puts the end entity in the `NEW` state.**
+2. (**See step 1 in the diagram.**) The issuing operator requests a new certificate. **The issuing operator puts the end entity in the `REQUESTED` state.**
+3. (**See step 2 in the diagram.**) The issuing operator cannot assume that issuance will happen quickly (i.e. that `pollForRequestedCertificate()` returns a non-empty `Optional` immediately), so it needs to be able persist the token obtained from `requestCertificate()` in a `Secret`, _and possibly complete the current reconciliation_ (i.e. issuance can be very asynchronous). 


I would agree that we should not stop the reconcile but why not using the same approach we have with the Cruise Control rebalance process where, while waiting for a proposal Ready and it's still in PendingProposal, there is a polling on the CC REST API to check the status, instead of waiting for the next reconcile loop that could come after 2 mins by default, or sooner or later if user changes the default value.

ppatierno · 2021-12-09T11:19:44Z

035-ca-abstraction.md

+### Interface for certificate issuance
+
+```java
+public interface CertificateIssuer {


AFAIU this is an interface that a component used by CO or UO should implement to issue certificate, right?
Can you make it clear so that in the next sections when you mention the "issuing operator" it's more clear?

ppatierno · 2021-12-09T11:30:04Z

035-ca-abstraction.md

+```java
+public interface CertificateIssuer {
+
+    byte[] requestCertificate(Subject subject, int requestedValidity) throws CertificateIssuanceException;


no information about how to generate the private key? key size, algorithm to use, ...

Not in this interface, no. The reason being is that different issuers (cert manager, vault) might not expose that.
This could be handled in the API (see later) where we select which implementation is going to be used. For example with the internal type we could have

spec: clusterCa: validityDays: <integer> generateCertificateAuthority: <boolean> generateSecretOwnerReference: <boolean> renewalDays: <integer> certificateExpirationPolicy: <renew-certificate|replace-key> keySize: 4096 keyType: RSA # (or EC)

The internal type could also support an implementation (e.g. openssl vs bouncycastle), but it's not part of this proposal.

Does the validity really differ from key size, type etc.? Different providers might have limited availabilities of that as well I guess.

Good point, maybe this should be configured away too. I can't see a reason why we need it in the interface for the cluster CA (it's not like we want ZK certs to have different validity to broker certs for example). For the clients CA I suppose it is possible that some KafkaUser could support an explicit validity period (possibly even end dates) which would allow time-limited accounts. Generally speaking there are better ways of achieving that with Kafka than doing it at the certificate level.

ppatierno · 2021-12-09T11:37:32Z

035-ca-abstraction.md

+Because of the current assumptions, the CO generates cluster CA certificates (if they're missing or expired) within a reconciliation, before it knows whether any EE certificate issuance is needed in that reconciliation. Then if it needs to issue EE certificates it just uses the CA certificates that must exist by that point.
+
+The steps required by the `CertificateIssuer` contract are more complicated, and will be described over the following sections. 
+The term "issuing operator" will be used to refer to the operator that's issuing the certificate. This could be the User or Cluster Operator.


A general point ... what is a really confusing is that we say something like "CO issues certificates" or "issuing operator". While it's true with the current openssl implementation, it's not so true with the overall proposal offloading the certificate generation outside of the CO. What CO is doing is kind of "adopting" the certificate: it asks for a certificate and an external component (or human) returns to it what it asked for.

Yeah, the language is tricky. I'm often using "issuing" to mean "using the CertificateIssuer interface to issue". I'll adjust the text here to make this clearer, but I'm not sure there's a better term that "issuing operator" which we could use throughout the text.

ppatierno · 2021-12-09T11:39:03Z

035-ca-abstraction.md

+  IN_USE // the certificate is in use
+   |
+   v
+  NOT_NEEDED // a certificate for the end entity is not needed any more


what does "is not needed any more" exactly mean? For example when a Kafka cluster is scaled down and you need to get rid of the EE certificate for the broker that doesn't run anymore?

This is a really good question.

For the CO it's relatively easy, because at the end of the day everything is in the same process and the set of required certificates is deterministic from the Kafka CR. So the CO can basically do a set subtraction to figure of the certs which exist but are not needed, transition then to NO_NEEDED and (the important thing) if those were the last which depended on some root CA certificate, then that root cert can be removed from the trust stores too (by transition its state to ZOMBIE).

For the UO it's a bit harder. Determining which root CA certs are no longer needed requires a full reconciliation of all the KafkaUsers. That doesn't really scale well. But if it can do that then it can also figure out which root certs might not be needed and transition them to ZOMBIE

In either case the root certs being in ZOMBIE would result in removal from the trust stores.

TBH the process around getting rid of EE certs and particularly CA certs when they're no longer required is the area which I didn't really cover enough in my PoC.

I am trying to figure out the interaction with the certs management system (certs-manager, vault, ..) and what does it mean in this case. Who decides that the certificate is not needed anymore? Is it something that the user would do directly on the certs management system and maybe he is allowed to delete the cert, or the interaction is always through some CRD in the CO?

The way I was thinking about this at the time was things like scale-down of #brokers.

Certificate revocation is listed as a non-goal of this proposal (I was trying to limit the scope), but I wonder whether we should at least consider how that would work in terms of the state machine, even if we didn't support it initially. I suspect we'd probably want distinct states (even if the handling turned out to be the same) to cover revocation. Taken to the extreme we could have an end state for every CRLReason.

ppatierno · 2021-12-09T11:42:58Z

035-ca-abstraction.md

+1. The issuing operator needs to figure out which EE certificates are missing (or in their renewal period, or otherwise need replacing, e.g. via annotation). This need for a certificate might exist independently of the end entity. For example, if we're going to scale up the Kafka cluster then we need a certificate for the new broker before the new broker exists. **The issuing operator puts the end entity in the `NEW` state.**
+2. (**See step 1 in the diagram.**) The issuing operator requests a new certificate. **The issuing operator puts the end entity in the `REQUESTED` state.**
+3. (**See step 2 in the diagram.**) The issuing operator cannot assume that issuance will happen quickly (i.e. that `pollForRequestedCertificate()` returns a non-empty `Optional` immediately), so it needs to be able persist the token obtained from `requestCertificate()` in a `Secret`, _and possibly complete the current reconciliation_ (i.e. issuance can be very asynchronous). 
+(An example of how this could happen is where a `CertificateIssuer` implementation uses cert-manager; `requestCertificate()` creates a cert-manager `CertificateRequest` CR but while the cert manager operator is not running no certificate will be issued).


isn't it just Certificate CR on the cert-manager side? [1]
From the cert-manager API doc it seems that CertificateRequest is created by the cert-manager but the user interaction happens via Certificate, or?
When I used cert-manager in the past I interacted with it by using Certificate CR.

[1] https://cert-manager.io/docs/usage/certificate/

[1] https://cert-manager.io/docs/usage/certificate/

You can use Certificate, in which case cert manager will keep a Secret up to date for you. I don't think that really helps us though, because we still need to orchestrate the roll out of the EE certificate (and possible the roll out of a new CA cert too). So we'd still need similar machinery in the CO, it's just we'd be needing to spot changes in the Secret which cert manager updates spontaneously, rather than being in control of renewal ourselves (and thus being able to respect the maintenance times, for example).

You can use Certificate, in which case cert manager will keep a Secret up to date for you.

Mhhh ... but by using an external certificates manager, so cert-manager for example, aren't we leveraging it to generate certificates and storing them in Secrets? At this point it's not clear to me how we are interacting with a cert-manager, how we get the generated certificates to put them in our own Secrets.

The difference, as I understand it, is that with Certificate cert manager will renew it automatically (basically creating a CertificateRequest behind the scenes) and update the Secret. Whereas CertificateRequest is a one time deal, no renewal on expiry.

@ppatierno did that clarify your question?

so aren't we leveraging to the external cert manager to do the renewal process for us automatically? we want to control the cert manager to do the renewal when we want to, is that right? And it means interacting with cert-manager, for example, at lower level (CertificateRequest vs Certificate)?

035-ca-abstraction.md

ppatierno · 2021-12-09T11:52:26Z

035-ca-abstraction.md

+      name: <string>  # used for CertificateRequest.spec.duration
+      kind: <string>
+      group: <string>
+    generateSecretOwnerReference: <boolean>


what is this?

We have this in the current API. It controls whether the Secrets containing the certs and keys get an owner reference generated or not, therefore whether they get GCd with the Kafka CR or not. I think a similar thing would make sense for other issuers, though I haven't really thought through precisely which Secrets would get the owner ref set when this was true. Ill mention this in the text.

Signed-off-by: Tom Bentley <tbentley@redhat.com>

tombentley · 2021-12-10T11:15:20Z

I've addresses many of the comments, and added a section the goes into the Secrets a bit, if you want to take another look?

ppatierno · 2021-12-10T11:28:49Z

035-ca-abstraction.md

+
+2. The issuing operator requests a new certificate. **The issuing operator puts the end entity in the `REQUESTED` state.**
+
+3. The issuing operator cannot assume that issuance will happen quickly (i.e. that `pollForRequestedCertificate()` returns a non-empty `Optional` immediately). In fact we explicitly decouple the operators reconciliation loop from the waiting for certificate issuance. If, after some number of retries within the current reconciliation, the certificate has not been issued, the token obtained from `requestCertificate()` is persisted in a `Secret` and the polling is completed in a subsequent reconciliation.


In fact we explicitly decouple the operators reconciliation loop from the waiting for certificate issuance.

Doesn't this sentence means a kind of polling (while waiting for certificate issuance) as I mentioned in another comment about CC REST API?

I don't recall the details of theCC interaction. What I'd trying to describe in this proposal is:

Operator calls issueCertificate

Operator calls pollForRequestedCertificate, it's not ready.

Operator completes the rest of the reconciliation, making do without the requested certs (e.g. for renewal continuing to use the existing ones, for creation then just completing the reconciliation without having a functional cluster)

Operator starts a new reconciliation (by default 2 minutes later).

Operator calls pollForRequestedCertificate, it's not ready.

Operator completes the rest of the reconciliation making do without the requested certs

Operator starts a new reconciliation (by default 2 minutes later).

Operator calls pollForRequestedCertificate, this time it is ready

Operator does what's necessary to use the newly issued certs, maybe orchestrating trust, and then making use of the issued certificates.

The point is that steps 4–6 could repeat many times, and in the mean time the user can update the CR and those changes get reflected, rather than cert renewal blocking changes to the CR for an arbitrarily long time.

@ppatierno did that clarify your question?

Maybe :-) ... the text says after some number of retries within the current reconciliation it seems that anyway the step 2, 5 and 8 so calling pollForRequestedCertificate are executed more times (retries) in the same reconcile. So we have retries in the same reconcile and if not ready, retries across reconciles.

That is what I was proposing (but it doesn't have to be that way). The reason: Often issuance will be quick, so it's worth trying once or twice within the reconcile. But we can't be sure it will always be quick, so we need to support the across-reconcile case too. If we had to only do one I would suggest keeping across-reconcile and dropping the within reconcile. I don't think there's a huge difference code-wise, but there might be test-wise.

scholzj · 2022-01-05T14:36:51Z

035-ca-abstraction.md

+Before describing the process let's introduce two state machines. First the "EE certificate state machine"
+
+```
+  NEW // a certificate is needed for the end entity, but doesn't exist


Would REQUIRED or NEEDED be less confusing than NEW? NEW might suggest it already exists.

scholzj · 2022-01-05T14:38:29Z

035-ca-abstraction.md

+  UNTRUSTED // not yet trusted everywhere it needs to be 
+   |
+   v
+  TRUSTED_NEW // trusted everywhere it needs to be, but no EE certificates yet issued


Maybe TRUSTED_UNUSED would be better?

scholzj · 2022-01-05T14:41:49Z

035-ca-abstraction.md

+  REQUESTED // a certificate has been requested; requestCertificate() has been called
+   |
+   v
+  PENDING // a certificate has been issued; pollForRequestedCertificate() has returned non-empty,


Maybe TRUST_PENDING would be more fitting?

scholzj · 2022-01-05T14:51:27Z

035-ca-abstraction.md

+The cluster operator and the user operator run in separate processes, so this will be done via a shared `Secret`, for example:
+
+2. The issuing operator has placed a root CA certificate in the shared `Secret` in the `UNTRUSTED` state.
+3. The cluster operator sees the root CA certificate orchestrates changes to the brokers (reconfig or restart) (and, for the Cluster CA the other components) so that the certificate is trusted by all components that will observe the EE certificates.


How intelligent you expect this to be? Following the logic described here which centers around the EE certs and not CAs ... if UO EE cert is issued and is signed by CA-X, the CA-X is needed really only in one place => in the truststore of the 9091 listener in the Kafka broker. The other compoenents such as ZooKeeper, TO, KE or CC will not really care about this CA if their EE certs use CA-Y. Do you expect it to have a detailed handing like this? Or will it simply roll the CA to all operands distinguishing only between Cluster CAs and Client CAs?

(Note: I can see why for simplicity we might want to just roll them everywhere => that is fine. I really raise this mainly for clarity)

035-ca-abstraction.md

scholzj

I think this technically works. My main concern is how it sees the CA just as a property of the EE certificate rather than a first level citizen. It basically assumes relation ship is:

--------------
|    Cert    |       -----------
| Management | ====> | EE Cert |  
|    Tool    |       -----------
--------------            |  
                          |
                          V
                       -------  
                       | CAs |
                       -------

But it seems to me that the actual relation ship is more like this:

--------------
|    Cert    |       ------      -----------
| Management | ====> | CA | ===> | EE Cert |
|    Tool    |       ------      -----------
--------------

(Where CA is not necessarily a single private and public key but a Virtual CA which might be renewed over the time etc.)

I think technically, this proposal would work fine while treating the CAs just as a property of the EE certificate. But I think there are practical consequences:

The fact that the role of the CA is different and it is not just property of the EE certificate leaks into the API design in the CA configuration examples for cert-manager where the Issuer largely determines the CAs used by the EE certificates => the Issuer is the CA in my drawings above.
The trust is on the CA level. So you do not trust the EE certificate. You trust the CA and by that you trust all the certificates issued by it. Having the trust to the CA kind of hidden might make that not clear enough to unexperienced users. This mechanism would be applied to the Cluster CA, but also to the Clients CA where you might expect that there is no network isolation and the client listeners might be exposed to much wider network areas.
How do you secure the certificate issuing chain? I think it needs a mechanism to verify that the certificate was really issued by a trusted counter party. Otherwise, how is the system protected against a bad actor who sees that a certificate is being requested and injects its own CA into the Kafka cluster by for example creating a secret before the CertManager would do so? Is this something what we should be worried about? Or is this a risk in any case with any other implementation as well?

I guess the security question is really important. If we do not think this is a bigger security threat than a model following more CA oriented approach, I'm fine with this proposal at the high level => it would at the end in most cases iterrate to the same result, just in a way wich is maybe a bit different than I would expect.

scholzj · 2022-01-05T15:13:40Z

035-ca-abstraction.md

+Validity is a property of the whole certificate chain, not just the EE certificate.
+For example it is possible that a root or intermediate certificate expires before an EE certificate that depends on it. 
+So for the operator to reason correctly about validity it needs to perform PKIX cert path validation.
+Fortunately Java comes with APIs for doing this in `java.security.cert`.


What does the API do? You made me curious by mentioning it. :-D

Does it give enough insights to understand that the full path will not be valid anymore in 30 days? Or does it just return true/false? That would be possibly too late for taking some action.

It's very configurable. You can configure it with your own effective date and it will give you detailed results about which cert is invalid and why. See this guide and the API docs.

035-ca-abstraction.md

scholzj · 2022-01-05T15:35:26Z

035-ca-abstraction.md

+  ${fingerprint}.${state}: <PEM encoded Secret>
+```
+
+The `fingerprint` is the fingerprint/thumbprint of the certificate. I.e. the SHA-1 has of the ASN.1 DER-encoded form of the X509 certificate. The `state` can be any of the states of the CA certificate trust state machine described above.


Typo I guess?

Suggested change

The `fingerprint` is the fingerprint/thumbprint of the certificate. I.e. the SHA-1 has of the ASN.1 DER-encoded form of the X509 certificate. The `state` can be any of the states of the CA certificate trust state machine described above.

The `fingerprint` is the fingerprint/thumbprint of the certificate. I.e. the SHA-1 hash of the ASN.1 DER-encoded form of the X509 certificate. The `state` can be any of the states of the CA certificate trust state machine described above.

Also, is the DER format safe enough in terms of how it uses whitespaces etc. to make sure the SHA-1 hash of it works well enough? Or does it need to use the fingerprint from the certificate itself (which I at least guess is unique)?

SHA-1 of the DER is precisely what the fingerprint of the certificate is. My understanding is that the DER encoding is deterministic so the hash should be completely stable.

Signed-off-by: Tom Bentley <tbentley@redhat.com>

scholzj · 2022-01-13T23:43:45Z

One quick thought to consider ... how would this affect the .status.listeners[].certificates part of the Kafka CR status.

ppatierno · 2022-01-17T15:43:53Z

035-ca-abstraction.md

+
+```java
+class IssuedCertificate {
+  private final private byte[] key;


too many private :-)

035-ca-abstraction.md

ppatierno · 2022-01-17T15:55:03Z

035-ca-abstraction.md

+  IN_USE // the certificate is in use
+   |
+   v
+  NOT_NEEDED // a certificate for the end entity is not needed any more


I am trying to figure out the interaction with the certs management system (certs-manager, vault, ..) and what does it mean in this case. Who decides that the certificate is not needed anymore? Is it something that the user would do directly on the certs management system and maybe he is allowed to delete the cert, or the interaction is always through some CRD in the CO?

ppatierno · 2022-01-17T16:01:01Z

035-ca-abstraction.md

+
+2. The issuing operator requests a new certificate. **The issuing operator puts the end entity in the `REQUESTED` state.**
+
+3. The issuing operator cannot assume that issuance will happen quickly (i.e. that `pollForRequestedCertificate()` returns a non-empty `Optional` immediately). In fact we explicitly decouple the operators reconciliation loop from the waiting for certificate issuance. If, after some number of retries within the current reconciliation, the certificate has not been issued, the token obtained from `requestCertificate()` is persisted in a `Secret` and the polling is completed in a subsequent reconciliation.


Maybe :-) ... the text says after some number of retries within the current reconciliation it seems that anyway the step 2, 5 and 8 so calling pollForRequestedCertificate are executed more times (retries) in the same reconcile. So we have retries in the same reconcile and if not ready, retries across reconciles.

ppatierno · 2022-01-17T16:34:20Z

035-ca-abstraction.md

+1. The issuing operator needs to figure out which EE certificates are missing (or in their renewal period, or otherwise need replacing, e.g. via annotation). This need for a certificate might exist independently of the end entity. For example, if we're going to scale up the Kafka cluster then we need a certificate for the new broker before the new broker exists. **The issuing operator puts the end entity in the `NEW` state.**
+2. (**See step 1 in the diagram.**) The issuing operator requests a new certificate. **The issuing operator puts the end entity in the `REQUESTED` state.**
+3. (**See step 2 in the diagram.**) The issuing operator cannot assume that issuance will happen quickly (i.e. that `pollForRequestedCertificate()` returns a non-empty `Optional` immediately), so it needs to be able persist the token obtained from `requestCertificate()` in a `Secret`, _and possibly complete the current reconciliation_ (i.e. issuance can be very asynchronous). 
+(An example of how this could happen is where a `CertificateIssuer` implementation uses cert-manager; `requestCertificate()` creates a cert-manager `CertificateRequest` CR but while the cert manager operator is not running no certificate will be issued).


so aren't we leveraging to the external cert manager to do the renewal process for us automatically? we want to control the cert manager to do the renewal when we want to, is that right? And it means interacting with cert-manager, for example, at lower level (CertificateRequest vs Certificate)?

ppatierno · 2022-01-17T16:47:22Z

035-ca-abstraction.md

+  ${fingerprint}.${state}: <PEM encoded Secret>
+```
+
+The `fingerprint` is the fingerprint/thumbprint of the certificate. I.e. the SHA-1 hash of the ASN.1 DER-encoded form of the X509 certificate. The `state` can be any of the states of the CA certificate trust state machine described above.


two concerns ... first are we sure all characters valid in a certificate fingerprint are valid for a key in the data section of a Secret? the second one is about how I can get use of this information just watching at the data ... I mean how can I recognize what's the CA cert just seeing a fingerprint and not a Subject or more human readable info like this? Or do you think that because it's all automated we don't really need human readibility of this?

mattia-crypto · 2022-03-03T19:37:16Z

Is there an estimated timeframe for this to be implemented?

cconnert · 2022-06-09T07:11:51Z

I stumbled across this proposal while trying to use cert-manager for client/cluster CA with strimzi. In my, layman, option there is no need to deeply integrated with third party certificate management tools (please correct me if I am wrong).

To my understanding strimzi puts strict requirements on the secrets and their structure (https://strimzi.io/docs/operators/latest/configuring.html#installing-your-own-ca-certificates-str) and does not provide a way to integrated a different structure like e.g. provided by cert-manager. Briefly it requires:

that key and certificates are provided by different secrets.
that the follow a naming convention (ca.key, ca.crt, ca.p12, ca.password)
that the certificate is provided in PEM and PKCS 12 format and the password of the PKCS 12 is provided as well.

So what we need is a way to map different structure to the required structure. This functionality is already implemented for the brokerCertChainAndKey configuration. There we are able to remap a secret into the internal structure and strimzi takes care of managing the PKCS 12 store for us.

Thus, naively, I would argue all it takes to address this issue is to reuse the functionality of brokerCertChainAndKey for clusterCa and clientsCa by e.g.: introducing a new configuration customCaSecret. When the new configuration is present it entails that generateCertificateAuthority: false. That way backwards compatibility can be ensured.

scholzj · 2022-06-09T08:01:33Z

@cconnert I do not think that on its own is sufficient for multiple reasons:

It is not just about the structure of the secrets but also about the information they provide. For example when migrating to a new CA etc.
For many people, Kubernetes Secrets are not the thing they want to use because they are not really that secret.

iMajna · 2022-06-09T08:04:35Z

For many people, Kubernetes Secrets are not the thing they want to use because they are not really that secret.

I second that. There was vulnerability in Grafana where you could get all secrets from k8s in plaintext. If you have external CA private key there as a k8s secret which Strimzi uses for generating certs you most certainly want to put Chinese wall in front of it - aka place it somewhere where it can be secure.

tombentley · 2022-06-10T09:56:54Z

@cconnert thanks for the input! I have also wondered whether we really need to go to these lengths, because this design is not simple.

I think Jakub's point is a good one:

It is not just about the structure of the secrets but also about the information they provide. For example when migrating to a new CA etc.

A new customCaSecret configuration could, I'm sure, be made to work in the basic case of standing up a cluster having configured customCaSecret. But I think we should be aiming to also solve the problem of how those certificates get renewed. Given the trust relationships between the components that the CO manages doing that without requiring downtime is not trivial. You could come up with some way to do it manually, with a documented procedure to follow, but it's the sort of thing people regularly get wrong. This is a cost borne by every (production) user. So I think there is significant value in having the operator do it for you in a way that tested and really hard to get wrong. It also enables other things that people sometime like, such as more frequent cert rotations, which are disincentivised by leaving the process to users.

cconnert · 2022-06-10T11:01:51Z

@scholzj, @iMajna,@tombentley thanks for getting back to me. I do see your points. Just wanted to raise the question if the solutions could be simplified. Also I wanted to understand what you aim for by this proposal so I can figure out if my use-case can be solved.

CA migration / renewal is definitely a difficult problem which likely requires a complex solution. cert-manager has an open issue on that cert-manager/cert-manager#2478 and I am looking forward to see it implemented. A zero-downtime constraint makes this task even harder. If "client" side communication is also secured by TLS I would argue that is something an operator cannot solve solely, but I guess that is out of scope of this proposal.

Signed-off-by: Gantigmaa Selenge <tina.selenge@gmail.com>

scholzj · 2024-01-25T16:23:55Z

Triaged on the Strimzi Community call on 25.1.2024: @katheris is getting back to work on this topic. However, the expectation is that there will be a new PR (although based on the content from this PR). This should be closed for the time being.

CA abstraction

087bd3b

Signed-off-by: Tom Bentley <tbentley@redhat.com>

scholzj reviewed Dec 3, 2021

View reviewed changes

katheris mentioned this pull request Dec 9, 2021

feat: New secure Connect API proposal #45

Closed

ppatierno reviewed Dec 9, 2021

View reviewed changes

Comments

13b4772

Signed-off-by: Tom Bentley <tbentley@redhat.com>

ppatierno reviewed Dec 10, 2021

View reviewed changes

sknot-rh mentioned this pull request Jan 3, 2022

Admin client should use PEM format of secrets (TO) strimzi/strimzi-kafka-operator#5941

Closed

8 tasks

scholzj reviewed Jan 5, 2022

View reviewed changes

tombentley added 3 commits January 7, 2022 08:23

Minor corrections

02a47b6

Signed-off-by: Tom Bentley <tbentley@redhat.com>

diagrams

66f306a

Signed-off-by: Tom Bentley <tbentley@redhat.com>

diagrams

8037260

Signed-off-by: Tom Bentley <tbentley@redhat.com>

ppatierno reviewed Jan 17, 2022

View reviewed changes

This was referenced Feb 22, 2022

Include Cert Manager for cert generation by the cluster operator strimzi/strimzi-kafka-operator#929

Open

Simplify manual certificate installation/renewal strimzi/strimzi-kafka-operator#1064

Closed

scholzj mentioned this pull request Apr 12, 2022

Hashicorp Vault as a secret store strimzi/strimzi-kafka-operator#5079

Closed

dttung2905 mentioned this pull request Sep 1, 2022

Support multiple CAs in the Clients CA secret strimzi/strimzi-kafka-operator#6566

Open

scholzj mentioned this pull request Jan 12, 2023

CA replacement not working in secondary components strimzi/strimzi-kafka-operator#7726

Closed

tinaselenge added a commit to tinaselenge/proposals that referenced this pull request Jul 17, 2023

Update strimzi#46 to reflect the recent change in Kafka

9707e4d

tinaselenge mentioned this pull request Jul 17, 2023

Update #046 to reflect the recent change in upstream Kafka #83

Closed

tinaselenge added a commit to tinaselenge/proposals that referenced this pull request Jul 17, 2023

Update strimzi#46 to reflect the recent change in Kafka

27dfad2

Signed-off-by: Gantigmaa Selenge <tina.selenge@gmail.com>

scholzj closed this Jan 25, 2024

katheris mentioned this pull request Apr 24, 2024

Decouple CA and end-entity certificate management #116

Open


		#### Process for ensuring CA trust

		When an EE certificate is issued via a `CertificateIssuer` it might have been signed by a CA that is not yet trusted.


		2. The issuing operator requests a new certificate. The issuing operator puts the end entity in the `REQUESTED` state.

		3. The issuing operator cannot assume that issuance will happen quickly (i.e. that `pollForRequestedCertificate()` returns a non-empty `Optional` immediately). In fact we explicitly decouple the operators reconciliation loop from the waiting for certificate issuance. If, after some number of retries within the current reconciliation, the certificate has not been issued, the token obtained from `requestCertificate()` is persisted in a `Secret` and the polling is completed in a subsequent reconciliation.

	The `fingerprint` is the fingerprint/thumbprint of the certificate. I.e. the SHA-1 has of the ASN.1 DER-encoded form of the X509 certificate. The `state` can be any of the states of the CA certificate trust state machine described above.
	The `fingerprint` is the fingerprint/thumbprint of the certificate. I.e. the SHA-1 hash of the ASN.1 DER-encoded form of the X509 certificate. The `state` can be any of the states of the CA certificate trust state machine described above.

CA abstraction #46

CA abstraction #46

Conversation

tombentley commented Dec 3, 2021

scholzj left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

katheris commented Dec 7, 2021

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tombentley commented Dec 10, 2021

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

scholzj left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

scholzj commented Jan 13, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mattia-crypto commented Mar 3, 2022

cconnert commented Jun 9, 2022

scholzj commented Jun 9, 2022

iMajna commented Jun 9, 2022 • edited Loading

tombentley commented Jun 10, 2022

cconnert commented Jun 10, 2022

scholzj commented Jan 25, 2024

iMajna commented Jun 9, 2022 •

edited

Loading