Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Abstract client certificate handling #9545

Merged
merged 11 commits into from
Mar 26, 2024

Conversation

katheris
Copy link
Contributor

@katheris katheris commented Jan 11, 2024

Type of change

Select the type of your PR

  • Refactoring

Description

Addresses: #5630

  • Creates PemTrustSet, PemAuthIdentity, and ClusterOperatorPKSC12AuthIdentity to encapsulate the certificates used during mutual TLS authentication.
  • Updates KafkaReconciler and ZooKeeperReconciler to fetch clientSecrets once. Since the secret is updated as part of the CaReconciler there is no need to fetch it each time we do different parts of the reconcile loop.

Checklist

Please go through this checklist and make sure all applicable tasks have been done

  • Write tests
  • Make sure all tests pass
  • Update documentation
  • Check RBAC rights for Kubernetes / OpenShift roles
  • Try your changes from Pod inside your Kubernetes and OpenShift cluster, not just locally
  • Reference relevant issue(s) and close them after merging
  • Update CHANGELOG.md
  • Supply screenshots for visual changes, such as Grafana dashboards

KeyStore trustStore = KeyStore.getInstance(KEYSTORE_TYPE_JKS);
trustStore.load(null);
trustStore.setCertificateEntry("ca", caCert);
return trustStore;
}

private KeyStore getKeyStore(char[] keyPassword) throws KeyStoreException, CertificateException, NoSuchAlgorithmException, InvalidKeySpecException, IOException {
private KeyStore getKeyStore() throws KeyStoreException, CertificateException, NoSuchAlgorithmException, InvalidKeySpecException, IOException {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels like this code for getting a SecretKey and X509Certificate[] from a PemKeyStoreSupplier is likely to be needed in other places. Likewise going straight to a KeyStore. So if those are not methods on PemKeyStoreSupplier then they should probably belong in Util.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree but given the size of the changes already we should do this in a followup PR, since it's done in a few places in subtly different ways

@katheris katheris force-pushed the abstractClientCertificateHandling branch from 97c989f to 18f4d2c Compare January 15, 2024 10:47
@katheris katheris force-pushed the abstractClientCertificateHandling branch 2 times, most recently from 184e57a to 33c5023 Compare January 25, 2024 11:55
});
pemPrivateKey = AbstractAuthIdentity.extractPemPrivateKey(secret, secretKey);
pemCertificateChainBytes = AbstractAuthIdentity.extractPemCertificateChain(secret, secretKey);
pemCertificateChain = validateCertificateChain(secretKey);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should document what we're expecting here:

  • Should it contain the root CA certificate (I assume just intermediate CA certificates?),
  • What order are we expecting intermediates to be in?
  • Does this class validate that the chain is, in fact, a chain (i.e. that the issuer of one is the subject of the next)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed the step to explicitly validate the certificates as it was proving very tricky to integrate with the existing tests. So it now only creates the Certificate object when needed

@katheris katheris force-pushed the abstractClientCertificateHandling branch 2 times, most recently from 1b809be to 58be831 Compare February 20, 2024 16:06
@katheris katheris force-pushed the abstractClientCertificateHandling branch from 31ef8d3 to 0e132bd Compare February 22, 2024 12:22
@katheris katheris marked this pull request as ready for review February 22, 2024 16:57
@katheris katheris force-pushed the abstractClientCertificateHandling branch from b90063d to 5fad4e0 Compare February 26, 2024 09:29
@katheris
Copy link
Contributor Author

@strimzi/maintainers this is ready for review if any of you have time to take a look

Copy link
Member

@tombentley tombentley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @katheris, I found a couple of nits, but I think this is a welcome refactoring.

* Represents the identity used by the cluster operator during TLS client authentication in the PKCS12 format.
* Can be used by clients that are unable to use the PEM format provided by PemAuthIdentity.
*/
public class ClusterOperatorPKCS12AuthIdentity {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems a bit weird that this is not a sibling of PemAuthIdentity. I think they're in the same package, except this one itsn't in the operator-common module, which makes some sense, but I think I'd rather see them together, even though this happens to only see use in the cluster-operator.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is currently only used by the ZooKeeper reconcile loop, all other components are using PEM, which is also what I think we want moving forwards. So I chose to put it in cluster-operator to limit it's scope, since the norm should be to use the PEM version. Then hopefully it could be removed when ZooKeeper is no longer required. Happy to move it if others also think it is weird to place it here but that was my thought process behind it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think ZooKeeper can use PEM files now. I had it working here: https://github.com/strimzi/strimzi-kafka-operator/compare/main...scholzj:strimzi-kafka-operator:use-pem-files?expand=1#diff-e5e5e71484b286fed9dc8f2d7cf2003356fb2f441d60521cb2da028026bb0a43

Should we first move the ZooKeeper clients to PEM and get rid of this? I can have a look into it @katheris if you want. Or you can do it your self if you want either as a separate PR or as part of this PR.

Copy link
Contributor Author

@katheris katheris Mar 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @scholzj that's useful to know. There's a follow up PR I want to do to align the way we create truststores and keystores as we do it a few different ways in different places, shall I include it in that change? Since the migration code was added there are now a few places we construct a ZK client and the code is duplicated currently I believe. I didn't add the change to this PR since the set of changes was already quite big

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to do that first to have this PR cleaner?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would make this PR cleaner, but I'm hesitant do it first because this PR touches quite a few commonly changed classes so I've had to rebase it multiple times already, so it would be good to get it in sooner than later. Plus the changes in this PR will make the truststore/keystore refactor simpler to do and review, since in this PR I made some small changes to close the gap a little. For example in KafkaAgentClient using the X509Certificate class directly, rather than the Certificate class.

Copy link
Member

@scholzj scholzj Mar 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, lets do it later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was also wondering why it's not called just PKCS12AuthIdentity. Isn't it the PKCS12 counterpart of the PemAuthIdentity? Isn't it good for better abstraction removing that ClusterOperator prefix from the class name? I see this class hosts SECRET_KEY which is specific to the cluster operator but maybe it should be something in a corresponding derived class? So I mean a PKCS12 auth identity class specific for cluster operator (even if we don't have any more in the end). I see something similar in the PemAuthIdentity which makes that abstraction by getting a secretCertName parameter to know which key to access in the hosted Secret, or?

public static byte[] decodePemPrivateKeyFromSecret(Secret secret, String key) {
String privateKey = new String(decodeFromSecret(secret, key), StandardCharsets.UTF_8)
public static byte[] decodePemPrivateKey(String privateKey) {
String decodedPrivateKey = privateKey
.replace("-----BEGIN PRIVATE KEY-----", "")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If passed malformed input (containing multiple keys) then this method would remove all the BEGIN and END lines, possibly yielding a single lump of base64 which we then silently decode. It wouldn't ultimately be usable as a private key. But I wonder if we should try to catch this possible error by checking more carefully that the private key does have the structure we're expecting.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that we could have better validation here. However I think we should include it in a follow up PR where we better align the creation of keystores and truststores across the project, since I don't think this should stay in Util ultimately.

@katheris katheris force-pushed the abstractClientCertificateHandling branch from 5fad4e0 to b0022bf Compare March 4, 2024 14:12
CHANGELOG.md Outdated Show resolved Hide resolved
Comment on lines 192 to 206
return Optional.ofNullable(secret)
.map(Secret::getData)
.map(data -> data.get(field))
.map(value -> Base64.getDecoder().decode(value))
.orElseThrow(() -> {
String name = Optional.ofNullable(secret)
.map(Secret::getMetadata)
.map(ObjectMeta::getName)
.orElse("unknown");
String namespace = Optional.ofNullable(secret)
.map(Secret::getMetadata)
.map(ObjectMeta::getNamespace)
.orElse("unknown");
return Util.missingDataInSecretException(namespace, name, field);
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really an efficient and readable way to handle this? I would find using if constructs much easier to read and understand what exactly does this actually do.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably a bit over the top, given that I'm assuming if the secret is not null then we can assume the name, namespace and data should also not be null, so can probably change to something like:

        if (secret == null) {
            throw new RuntimeException("Cannot fetch data from null secret");
        }
        String data = secret.getData().get(field);
        if (data != null) {
            return Base64.getDecoder().decode(data);
        } else {
            throw Util.missingDataInSecretException(secret.getMetadata().getNamespace(), secret.getMetadata().getName(), field);
        }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated this to be simpler and make use of Objects.requireNonNull

Comment on lines 133 to 156
/**
* Returns exception when secret data is missing a particular item. This is used from several different methods to provide identical exception.
*
* @param namespace Namespace of the Secret
* @param secretName Name of the Secret
* @param dataFieldName Name of the Secret field
* @return RuntimeException
*/
public static RuntimeException missingDataInSecretException(String namespace, String secretName, String dataFieldName) {
return new RuntimeException("The Secret " + namespace + "/" + secretName + " is missing the field " + dataFieldName);
}

/**
* Returns exception when certificate is corrupt. This is used from several different methods to provide identical exception.
*
* @param namespace Namespace of the Secret
* @param secretName Name of the Secret
* @param keyName Name of the Secret key
* @return RuntimeException
*/
public static RuntimeException corruptCertificateException(String namespace, String secretName, String keyName) {
return new RuntimeException("Bad/corrupt certificate found in data." + keyName + ".crt of Secret "
+ secretName + " in namespace " + namespace);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These methods seem strange. They do not seem to contain any real logic and seem to be used from a few places only - the places where they are used also do not seem to use the central facilities for whatever reason. Would it be better to just inline this or create a new exception types and use this in their constructors?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote these methods here based on the method missingSecretException that was already in this class. I can create a new exception type if that is the preferred way to handle these kinds of exceptions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've moved these inline

Comment on lines 383 to 384
PemTrustSet pemTrustSet = new PemTrustSet(clusterCa.caCertSecret());
PemAuthIdentity pemAuthIdentity = PemAuthIdentity.clusterOperator(coSecret);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you can name these based on what they are? E.g. clusterCaTrustSet and coAuthIdentity? I think that can be in general applied to all the parameters and fields -> they should not just copy the type name but they should explain the expected value. E.g. in the BrokersInUseCheck above, you have some clear expectation that should be passed in. And I assume it will be the same in many places below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I'll review the variable names across the board.

Copy link
Contributor Author

@katheris katheris Mar 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the variable names. I've used coAuthIdentity as you suggested and then clusterCaTrustSet if the variable is used for both ZooKeeper and Kafka clients, and kafkaCaTrustSet and zkCaTrustSet for the Kafka and ZooKeeper clients respectively, since it is the trustset for the certificate authority that issued and signed the Kafka/ZooKeeper end entity certificate

Comment on lines 26 to 27
private String secretName;
private String secretNamespace;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use it for the exceptions if we fail to parse the data somehow later. It's because we lazily parse the data rather than doing it in the constructor, since I was struggling to get the tests to work if we validate upfront.

@scholzj
Copy link
Member

scholzj commented Mar 5, 2024

/azp run regression

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@katheris
Copy link
Contributor Author

katheris commented Mar 7, 2024

@scholzj Thanks for the review, I think I've responded to or addressed all of your comments

@scholzj scholzj added this to the 0.41.0 milestone Mar 8, 2024
* Creates ClusterOperatorPKSC12AuthIdentity, PemAuthIdentity and
PemTrustSet to encapsulate the certificates used during
mutual TLS authentication.
* Updates reconcilers to fetch clientSecrets once.

Signed-off-by: Katherine Stanley <11195226+katheris@users.noreply.github.com>
Signed-off-by: Katherine Stanley <11195226+katheris@users.noreply.github.com>
Signed-off-by: Katherine Stanley <11195226+katheris@users.noreply.github.com>
Signed-off-by: Katherine Stanley <11195226+katheris@users.noreply.github.com>
Signed-off-by: Katherine Stanley <11195226+katheris@users.noreply.github.com>
Signed-off-by: Katherine Stanley <11195226+katheris@users.noreply.github.com>
* Remove entry from changelog.md
* Rename variables to be more descriptive

Signed-off-by: Katherine Stanley <11195226+katheris@users.noreply.github.com>
Signed-off-by: Katherine Stanley <11195226+katheris@users.noreply.github.com>
* Return specific Objects rather than CompositeFuture.
* Add ClusterOperatorAuthIdentity record to hold the
two formats of auth identity used by the cluster operator.

Signed-off-by: Katherine Stanley <11195226+katheris@users.noreply.github.com>
@katheris katheris force-pushed the abstractClientCertificateHandling branch from 42794f1 to ebbbb50 Compare March 13, 2024 14:04
Copy link
Member

@ppatierno ppatierno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work @katheris . I left a couple of comments.

CHANGELOG.md Outdated Show resolved Hide resolved
* Represents the identity used by the cluster operator during TLS client authentication in the PKCS12 format.
* Can be used by clients that are unable to use the PEM format provided by PemAuthIdentity.
*/
public class ClusterOperatorPKCS12AuthIdentity {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was also wondering why it's not called just PKCS12AuthIdentity. Isn't it the PKCS12 counterpart of the PemAuthIdentity? Isn't it good for better abstraction removing that ClusterOperator prefix from the class name? I see this class hosts SECRET_KEY which is specific to the cluster operator but maybe it should be something in a corresponding derived class? So I mean a PKCS12 auth identity class specific for cluster operator (even if we don't have any more in the end). I see something similar in the PemAuthIdentity which makes that abstraction by getting a secretCertName parameter to know which key to access in the hosted Secret, or?

@katheris katheris force-pushed the abstractClientCertificateHandling branch from 0c5046b to 8f38f54 Compare March 18, 2024 14:33
Signed-off-by: Katherine Stanley <11195226+katheris@users.noreply.github.com>
@katheris katheris force-pushed the abstractClientCertificateHandling branch from 8f38f54 to 08efe17 Compare March 18, 2024 14:49
@katheris
Copy link
Contributor Author

@ppatierno @scholzj I believe I've addressed all your comments so feel free to review again

@ppatierno
Copy link
Member

/azp run migration

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Member

@ppatierno ppatierno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR. LGTM! I started the migration pipeline because of the changes to the migration utils, just to be sure everything still works.

* Update Javadoc for new classes
* Rename tlsPemIdentity variables
* Reduce duplication when fetching PEM credentials

Signed-off-by: Katherine Stanley <11195226+katheris@users.noreply.github.com>
@katheris
Copy link
Contributor Author

@scholzj Thanks for the review, I've addressed your comments

@scholzj
Copy link
Member

scholzj commented Mar 25, 2024

/azp run regression

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@scholzj
Copy link
Member

scholzj commented Mar 25, 2024

/azp run upgrade

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@scholzj
Copy link
Member

scholzj commented Mar 25, 2024

/azp run migration

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@scholzj
Copy link
Member

scholzj commented Mar 26, 2024

/azp run kraft-regression

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@scholzj scholzj merged commit f105343 into strimzi:main Mar 26, 2024
35 checks passed
katheris added a commit to katheris/strimzi-kafka-operator that referenced this pull request Mar 26, 2024
The change made in PR strimzi#9545 means the Secret for
certificates does not need to be mocked in each
test case. This removes the unnecessary mock calls.

Signed-off-by: Katherine Stanley <11195226+katheris@users.noreply.github.com>
steffen-karlsson pushed a commit to steffen-karlsson/strimzi-kafka-operator that referenced this pull request Apr 2, 2024
Signed-off-by: Katherine Stanley <11195226+katheris@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants