
Incremental upgrade of the k8ssandra-operator #1035

Open
dnugmanov opened this issue Aug 7, 2023 · 0 comments
Labels
assess (Issues in the state 'assess'), enhancement (New feature or request)

Comments

@dnugmanov
Contributor

Hello,

Currently, I am facing an issue with the incremental upgrade of the k8ssandra-operator.

My use case is as follows:

  • The operator is running in cluster-scope and manages all entities across all namespaces (NS).
  • Namespaces and k8ssandraCluster entities are dynamically created through an external API based on client requests.
  • When a k8ssandraCluster is created via the external API, the operator reconciles the custom resource and launches a Cassandra cluster along with its associated components.
  • I need to upgrade both the k8ssandra-operator and cass-operator from version A to version B. However, since upgrades often trigger a rolling restart of all k8ssandraClusters (mostly due to podTemplateSpec changes), this carries significant risks.
  • I want to reduce these risks by deploying version B of the operator in only one (or several) namespaces. Meanwhile, all resources in the namespaces where version B is installed need to be excluded from reconciliation by the cluster-scoped version A operator.

Available options:

Option 1:

  • Deploy version B of the operator in namespace scope.
  • For version A of the operator, set the WATCH_NAMESPACE environment variable with a list of all existing namespaces, excluding the test namespace.
  • However, a problem arises because namespaces and k8ssandraClusters are created dynamically through the external API, so a k8ssandraCluster may be created in a new namespace that neither version A nor version B of the operator reconciles.
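The gap in Option 1 can be illustrated with a minimal Go sketch. It assumes WATCH_NAMESPACE holds a comma-separated list of namespace names (the usual operator convention); the namespace names are illustrative:

```go
package main

import (
	"fmt"
	"strings"
)

// watchedNamespaces parses a comma-separated WATCH_NAMESPACE value into a set.
func watchedNamespaces(env string) map[string]bool {
	set := make(map[string]bool)
	for _, ns := range strings.Split(env, ",") {
		if ns = strings.TrimSpace(ns); ns != "" {
			set[ns] = true
		}
	}
	return set
}

func main() {
	// Version A watches a static allow-list; version B watches only "test".
	versionA := watchedNamespaces("prod-1,prod-2")
	versionB := watchedNamespaces("test")

	// A namespace created later through the external API...
	newNS := "prod-3"

	// ...is picked up by neither operator until version A's
	// WATCH_NAMESPACE list is manually updated and the pod restarted.
	fmt.Println(versionA[newNS] || versionB[newNS]) // false
}
```

Because both lists are static, every dynamically created namespace falls through this check until an operator restart, which is exactly the race described above.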

Option 2:

  • ???

Desired implementation of this use case:

  • Deploy the namespace-scoped version B operator in the test namespace.
  • For the cluster-scoped version A operator, add an exception to exclude the reconciliation of the test namespace.
  • All newly created namespaces and k8ssandraClusters should be handled by the version A operator (cluster-scoped).
  • To implement this approach, a predicate function is needed that excludes the specified namespaces from reconciliation, configured through an environment variable such as K8SSANDRA_SKIP_NS.
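The proposed skip logic can be sketched as follows. This is a minimal, stdlib-only Go sketch: K8SSANDRA_SKIP_NS is the variable name proposed above, and shouldReconcile stands in for the predicate body; the actual wiring into the operator's watches (e.g. via controller-runtime's predicate package) is omitted:

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// skipSet parses the proposed K8SSANDRA_SKIP_NS value (a comma-separated
// list of namespace names) into a lookup set.
func skipSet(env string) map[string]bool {
	set := make(map[string]bool)
	for _, ns := range strings.Split(env, ",") {
		if ns = strings.TrimSpace(ns); ns != "" {
			set[ns] = true
		}
	}
	return set
}

// shouldReconcile is the core of the proposed predicate: return false for
// any object living in an excluded namespace, so the cluster-scoped
// version A operator ignores it.
func shouldReconcile(objNamespace string, skip map[string]bool) bool {
	return !skip[objNamespace]
}

func main() {
	os.Setenv("K8SSANDRA_SKIP_NS", "test")
	skip := skipSet(os.Getenv("K8SSANDRA_SKIP_NS"))

	fmt.Println(shouldReconcile("test", skip))   // false: left to version B
	fmt.Println(shouldReconcile("prod-1", skip)) // true: version A keeps it
}
```

In the real operator this would likely be wrapped in a controller-runtime predicate (for instance predicate.NewPredicateFuncs taking the object's namespace) and attached to each controller's watches, so events from skipped namespaces are filtered before reconciliation.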

How would you feel about me submitting a PR for an optional environment variable which, when set, adds a predicate implementing the functionality described above? Alternatively, could you suggest other ways to achieve the desired behavior without additional code changes?

This way, we can achieve a safe operator upgrade approach on large production clusters.

@dnugmanov dnugmanov added the enhancement New feature or request label Aug 7, 2023
@adejanovski adejanovski added ready-for-review Issues in the state 'ready-for-review' assess Issues in the state 'assess' and removed ready-for-review Issues in the state 'ready-for-review' labels Aug 28, 2023
Projects
Status: Assess/Investigate
Development


2 participants