-
TBH, I don't really follow what you are doing and mainly why. Maybe you should describe it step by step, including the YAMLs and the corresponding logs. But still ... why would you change the scheduling every time some chart version changes? The chart is not part of the Kafka deployment. Neither Strimzi nor Kafka cares about the Helm chart and its version, so locking it into the deployment makes no sense to me. With things like Kafka, you want stability. That means not rolling them because of some completely irrelevant changes.

The log you shared from the operator suggests that the operator tries to roll the ZooKeeper pods (that is expected, as you changed the labels and the topology spread). So what you need to do is look at why the ZooKeeper pod is not getting ready. Does it start? Is it scheduled? Or what exactly is causing it to not be ready? That is not clear from what you provided, and it is clearly what blocks the operator.
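To illustrate the "is it scheduled?" check: when a pod cannot be placed because its topology spread constraint is unsatisfiable, that normally shows up directly in the pod's own status conditions. A rough sketch of what the status could look like in that case (the node counts and message wording are only an assumption here, the exact text depends on the cluster and scheduler):

```yaml
# Illustrative only: typical status of a pod the scheduler cannot place.
status:
  phase: Pending
  conditions:
    - type: PodScheduled
      status: "False"
      reason: Unschedulable
      message: >-
        0/6 nodes are available: 6 node(s) didn't match pod topology spread constraints.
```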
I don't understand what this means. What is
That is hard to comment on without seeing the actual YAMLs and the logs. Two more things I noticed:
-
@scholzj we are actually seeing this behavior significantly more since we stopped adding the
-
The fix for issue #8528, included in the 0.35.1 update, has resolved the issue. Thanks for the help @scholzj
-
We've configured topologySpreadConstraints in an effort to keep our setup balanced and spread out over multiple nodes. Part of the constraint is a label that contains the version of our deployment, which increments on each update.
What we expect to happen
ZooKeeper, Kafka, and Cruise Control should get the new version label and be scheduled across different nodes.
What happens
ZooKeeper goes offline (the pods are killed) and the pods are not recreated. The Strimzi operator keeps trying to reconcile the pods but times out. This is only resolved by manually restarting the operator deployment, so that the operator gets a new 'strimzi revision' which triggers a successful reconciliation round.
The ZooKeeper StrimziPodSet has a status that reports one ready pod even when no pods are present.
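For reference, a minimal sketch of the shape of that status, assuming the usual pods/readyPods/currentPods counters on the StrimziPodSet resource; the values here are hypothetical:

```yaml
# Hypothetical illustration of the mismatch described above:
# the status still counts a ready pod while no pod objects exist.
apiVersion: core.strimzi.io/v1beta2
kind: StrimziPodSet
metadata:
  name: my-cluster-zookeeper
status:
  observedGeneration: 2
  pods: 1
  readyPods: 1
  currentPods: 1
```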
Configuration
We wrap external charts in a chart of our own. We also cache all images in a registry of our own.
Our operator setup is pretty much the default setup:
Our zookeeper setup:
The chartVersion can change from e.g. '1.7.0-master-44644' to '1.7.0-master-44700'.
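For context, a minimal sketch of what such a constraint could look like when set through the Kafka CR's pod template. The chartVersion label name and the values are only illustrative, not the literal manifest; the point is that the version is both a pod label and part of the constraint's selector, so every version bump changes which pods the constraint counts:

```yaml
# Abbreviated sketch: the release version appears as a pod label and in the
# spread constraint's labelSelector, so a new version effectively creates a
# new, empty "spread group" for the scheduler.
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  zookeeper:
    replicas: 3
    template:
      pod:
        metadata:
          labels:
            chartVersion: 1.7.0-master-44644   # bumped on every update
        topologySpreadConstraints:
          - maxSkew: 1
            topologyKey: kubernetes.io/hostname
            whenUnsatisfiable: DoNotSchedule
            labelSelector:
              matchLabels:
                chartVersion: 1.7.0-master-44644
```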
Logs
Operator:
Question
Is this expected behavior? What can we do to prevent the faulty state from happening on each update?
This happens with both 1 and 3 replicas of ZooKeeper. When using 3 replicas, only 2 pods are killed and the remaining one is stuck trying to find the missing instances.
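One direction for the prevention question, sketched under the assumption that the cluster pods carry the standard strimzi.io/name label that Strimzi manages itself: spread on a label that stays stable across updates instead of the per-release chartVersion label, so a version bump never changes the scheduling requirements. A hedged sketch of the same pod template:

```yaml
# Sketch only: the constraint keys on a Strimzi-managed label that does not
# change between releases, so rolling updates spread old and new pods together.
spec:
  zookeeper:
    template:
      pod:
        topologySpreadConstraints:
          - maxSkew: 1
            topologyKey: kubernetes.io/hostname
            whenUnsatisfiable: DoNotSchedule
            labelSelector:
              matchLabels:
                strimzi.io/name: my-cluster-zookeeper
```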
ZooKeeper logs: