
[Bug] Changing the replication factor for a topic prints an NPE #1847

Closed
ppatierno opened this issue Jul 26, 2019 · 5 comments · Fixed by #1974

Comments

@ppatierno
Member

ppatierno commented Jul 26, 2019

Describe the bug
When a topic is created with a given replication factor and you then try to change that value, the Topic Operator prints the following log:


[2019-07-26 15:32:06,249] INFO  <TopicWatcher:36> [2.30.0.1/...] 64|kube =my-topic|27413: event MODIFIED on resource my-topic with labels {strimzi.io/cluster=my-cluster}
[2019-07-26 15:32:06,277] INFO  <opicOperator:460> [oop-thread-1] 64|kube =my-topic|27413: Reconciling topic my-topic, k8sTopic:nonnull, kafkaTopic:nonnull, privateTopic:nonnull
[2019-07-26 15:32:06,278] ERROR <opicOperator:612> [oop-thread-1] 64|kube =my-topic|27413: Changes replication factor
[2019-07-26 15:32:06,280] INFO  <nedKafkaImpl:89> [oop-thread-1] Changing replication factor of topic my-topic to 3
[2019-07-26 15:32:06,355] INFO  <rocessHelper:43> [ker-thread-3] Started process java.lang.UNIXProcess@11a2c5d5 with command line [/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.222.b10-0.el7_6.x86_64/jre/bin/java, -cp, lib/io.strimzi.topic-operator-0.14.0-SNAPSHOT.jar:lib/io.netty.netty-codec-dns-4.1.34.Final.jar:lib/com.squareup.okhttp3.okhttp-3.12.0.jar:lib/com.101tec.zkclient-0.11.jar:lib/com.squareup.okio.okio-1.15.0.jar:lib/io.vertx.vertx-core-3.7.1.jar:lib/com.github.spotbugs.spotbugs-annotations-3.1.12.jar:lib/io.netty.netty-handler-proxy-4.1.34.Final.jar:lib/javax.validation.validation-api-2.0.1.Final.jar:lib/org.apache.zookeeper.zookeeper-3.4.14.jar:lib/com.fasterxml.jackson.core.jackson-core-2.9.9.jar:lib/io.netty.netty-transport-4.1.34.Final.jar:lib/org.xerial.snappy.snappy-java-1.1.7.3.jar:lib/org.apache.kafka.kafka-clients-2.3.0.jar:lib/io.strimzi.crd-annotations-0.14.0-SNAPSHOT.jar:lib/com.fasterxml.jackson.core.jackson-databind-2.9.9.1.jar:lib/org.apache.yetus.audience-annotations-0.5.0.jar:lib/io.strimzi.certificate-manager-0.14.0-SNAPSHOT.jar:lib/io.fabric8.kubernetes-client-4.3.0.jar:lib/io.netty.netty-codec-socks-4.1.34.Final.jar:lib/io.fabric8.kubernetes-model-common-4.3.0.jar:lib/io.netty.netty-resolver-4.1.34.Final.jar:lib/io.strimzi.operator-common-0.14.0-SNAPSHOT.jar:lib/io.netty.netty-codec-4.1.34.Final.jar:lib/org.apache.logging.log4j.log4j-slf4j-impl-2.11.1.jar:lib/com.github.mifmif.generex-1.0.2.jar:lib/com.fasterxml.jackson.dataformat.jackson-dataformat-yaml-2.9.9.jar:lib/org.apache.logging.log4j.log4j-api-2.11.1.jar:lib/com.google.code.findbugs.jsr305-3.0.2.jar:lib/org.slf4j.jul-to-slf4j-1.7.26.jar:lib/org.apache.logging.log4j.log4j-core-2.11.1.jar:lib/dk.brics.automaton.automaton-1.11-8.jar:lib/com.github.luben.zstd-jni-1.4.0-1.jar:lib/io.netty.netty-handler-4.1.34.Final.jar:lib/io.fabric8.openshift-client-4.3.0.jar:lib/io.netty.netty-resolver-dns-4.1.34.Final.jar:lib/io.fabric8.zjsonpatch-0.3.0.jar:lib/org.lz4.lz4-java-1.6.0.jar:lib/io.netty.netty-buffer-4.1.34.Final.jar:lib/io.netty.netty-codec-http-4.1.34.Final.jar:lib/com.squareup.okhttp3.logging-interceptor-3.12.0.jar:lib/io.netty.netty-common-4.1.34.Final.jar:lib/io.fabric8.kubernetes-model-4.3.0.jar:lib/com.fasterxml.jackson.module.jackson-module-jaxb-annotations-2.9.8.jar:lib/org.slf4j.slf4j-api-1.7.25.jar:lib/com.fasterxml.jackson.core.jackson-annotations-2.9.8.jar:lib/io.netty.netty-codec-http2-4.1.34.Final.jar:lib/io.strimzi.api-0.14.0-SNAPSHOT.jar:lib/org.yaml.snakeyaml-1.23.jar:lib/org.glassfish.javax.el-3.0.1-b11.jar, kafka.admin.ReassignPartitionsCommand, --zookeeper, localhost:2181, --topics-to-move-json-file, /tmp/io.strimzi.operator.common.process.ProcessHelper204478978290644623-topics-to-move.json, --broker-list, 0,1,2, --generate]
[2019-07-26 15:32:06,620] INFO  <rocessHelper:48> [ker-thread-3] Process java.lang.UNIXProcess@11a2c5d5: exited with status 1
[2019-07-26 15:32:06,650] ERROR <opicOperator:613> [oop-thread-1] Changing replication factor is not supported
[2019-07-26 15:32:06,690] WARN  <opicOperator:106> [oop-thread-1] java.lang.NullPointerException
[2019-07-26 15:32:06,694] INFO  <TopicWatcher:39> [oop-thread-1] 64|kube =my-topic|27413: Success processing event MODIFIED on resource my-topic with labels {strimzi.io/cluster=my-cluster}


To Reproduce
Deploy the kafka-topic.yaml from the examples folder and then edit it, changing replicas from 1 to 3.
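
For reference, a minimal sketch of what the edited manifest would look like; the topic and cluster names match the log above, while the apiVersion and config values are illustrative and may differ from the actual example file:

apiVersion: kafka.strimzi.io/v1beta1
kind: KafkaTopic
metadata:
  name: my-topic
  labels:
    strimzi.io/cluster: my-cluster
spec:
  partitions: 1
  replicas: 3          # changed from 1; applying this edit produces the NPE shown in the log above
  config:
    retention.ms: 7200000

Applying the edited manifest (for example with kubectl apply -f kafka-topic.yaml, or oc apply on OpenShift) and watching the Topic Operator log reproduces the output above.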

Expected behavior
We know that changing replicas is not supported, but at the very least the NullPointerException above should not be printed; only the warning message about the unsupported feature should appear.

Environment (please complete the following information):

  • Strimzi version: master
  • Installation method: YAML files
  • Kubernetes cluster: OpenShift 3.11
  • Infrastructure: minishift
@tombentley
Member

If a user changed spec.replicas, should the KafkaTopic be considered ready after the reconciliation? I would argue the KafkaTopic should be unready, with a message explaining that KafkaTopic.spec.replicas should be reverted. Thoughts @scholzj @ppatierno?

@scholzj
Member

scholzj commented Sep 10, 2019

I agree ... if something needs to be reverted it should be in the status ... I do not think it necessarily has to be NotReady because of that ... maybe we should use a different condition. But it should be noted there one way or another.

@ppatierno
Member Author

The topic is still usable, which is what Ready means to me. If someone tries to send to the topic, the operation will succeed. I would stick with Ready but add a warning message somehow.

@tombentley
Member

I understand that argument, but I wonder how many users would notice an alternative condition. And while it's true that the topic would be usable, it would not have the availability semantics the user had requested, so the user might not consider it ready to receive data that requires high availability. So I'm inclined to stick with NotReady because I think it aligns with user expectations and also keeps things simple for them, because they only have one condition to check for.
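
To make the proposal concrete, here is a hedged sketch of the kind of status a KafkaTopic might carry after such a reconciliation, assuming the NotReady condition approach described above; the reason string, message text and timestamp are hypothetical, not the operator's actual output:

status:
  conditions:
  - type: NotReady
    status: "True"
    # hypothetical reason and message; the real wording is defined by the fix
    reason: ReplicationFactorChangeNotSupported
    message: Changing spec.replicas is not supported; revert it to match the replication factor of the topic in Kafka.
    lastTransitionTime: "2019-09-11T10:00:00Z"

A user inspecting the resource with kubectl get kafkatopic my-topic -o yaml would then have a single condition to check, as argued above.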

@scholzj
Member

scholzj commented Sep 11, 2019 via email

tombentley added a commit to tombentley/barnabas that referenced this issue Sep 11, 2019
This changes the Topic Operator's handling of Kube-side changes to
KafkaTopic.spec.replicas by ensuring the reconciliation results in a
NotReady status. If the replication factor is changed on the Kafka side
then the reconciliation updates KafkaTopic.spec.replicas and results in
Ready status.

Note that, because the TO doesn't watch the relevant ZK nodes, the
KafkaTopic.spec.replicas only gets updated during a periodic reconciliation.

An integration test is added for the revised semantics.

Because changing replicas has never been supported, I have also removed
the unused code which notionally allowed the TO to perform partition
reassignment but was in fact never used.

Fix strimzi#1847, fix strimzi#691

Signed-off-by: Tom Bentley <tbentley@redhat.com>
tombentley added a commit that referenced this issue Sep 11, 2019