[UTO] needs to use log levels in a better way #9465

Closed · scholzj opened this issue Dec 15, 2023 · 3 comments · Fixed by #9718
Comments

@scholzj (Member) commented Dec 15, 2023

When using the Unidirectional Topic Operator (UTO), the way it uses log levels seems very strange. At INFO level, it logged only something like this while running for 2 hours with 11 topics:

Preparing trust store certificates for internal communication
Adding /etc/tls-sidecar/cluster-ca-certs/ca.crt to truststore /tmp/topic-operator/replication.truststore.p12 with alias ca
Certificate was added to keystore
Preparing trust store certificates for internal communication is completed
Preparing key store certificates for internal communication
Preparing key store certificates for internal communication is completed
+ exec /usr/bin/tini -w -e 143 -- java -Dlog4j2.configurationFile=file:/opt/topic-operator/custom-config/log4j2.properties -Dvertx.cacheDirBase=/tmp/vertx-cache -Djava.security.egd=file:/dev/./urandom -XX:+ExitOnOutOfMemoryError -XX:MinRAMPercentage=15 -XX:MaxRAMPercentage=20 -classpath :lib/io.strimzi.topic-operator-0.39.0.jar:lib/io.strimzi.api-0.39.0.jar:lib/io.fabric8.kubernetes-model-networking-6.9.2.jar:lib/io.fabric8.kubernetes-model-common-6.9.2.jar:lib/io.fabric8.kubernetes-model-apiextensions-6.9.2.jar:lib/com.fasterxml.jackson.core.jackson-annotations-2.15.3.jar:lib/com.fasterxml.jackson.dataformat.jackson-dataformat-yaml-2.15.3.jar:lib/org.yaml.snakeyaml-2.1.jar:lib/io.fabric8.kubernetes-client-6.9.2.jar:lib/io.fabric8.zjsonpatch-0.3.0.jar:lib/io.fabric8.kubernetes-httpclient-jdk-6.9.2.jar:lib/io.fabric8.kubernetes-client-api-6.9.2.jar:lib/io.fabric8.kubernetes-model-gatewayapi-6.9.2.jar:lib/io.fabric8.kubernetes-model-resource-6.9.2.jar:lib/io.fabric8.kubernetes-model-rbac-6.9.2.jar:lib/io.fabric8.kubernetes-model-admissionregistration-6.9.2.jar:lib/io.fabric8.kubernetes-model-apps-6.9.2.jar:lib/io.fabric8.kubernetes-model-autoscaling-6.9.2.jar:lib/io.fabric8.kubernetes-model-batch-6.9.2.jar:lib/io.fabric8.kubernetes-model-certificates-6.9.2.jar:lib/io.fabric8.kubernetes-model-coordination-6.9.2.jar:lib/io.fabric8.kubernetes-model-discovery-6.9.2.jar:lib/io.fabric8.kubernetes-model-events-6.9.2.jar:lib/io.fabric8.kubernetes-model-extensions-6.9.2.jar:lib/io.fabric8.kubernetes-model-flowcontrol-6.9.2.jar:lib/io.fabric8.kubernetes-model-metrics-6.9.2.jar:lib/io.fabric8.kubernetes-model-policy-6.9.2.jar:lib/io.fabric8.kubernetes-model-scheduling-6.9.2.jar:lib/io.fabric8.kubernetes-model-storageclass-6.9.2.jar:lib/io.fabric8.kubernetes-model-node-6.9.2.jar:lib/org.snakeyaml.snakeyaml-engine-2.7.jar:lib/com.fasterxml.jackson.datatype.jackson-datatype-jsr310-2.15.3.jar:lib/io.fabric8.kubernetes-model-core-6.9.2.jar:lib/org.apache.logging.log4j.log4j-api-2.17.2.jar:lib/org.apache.logging.log4j.log4j-core-2.17.2.jar:lib/org.apache.logging.log4j.log4j-slf4j-impl-2.17.2.jar:lib/org.apache.kafka.kafka-clients-3.6.1.jar:lib/com.github.luben.zstd-jni-1.5.5-1.jar:lib/org.lz4.lz4-java-1.8.0.jar:lib/org.xerial.snappy.snappy-java-1.1.10.5.jar:lib/org.slf4j.slf4j-api-1.7.36.jar:lib/com.101tec.zkclient-0.11.jar:lib/org.apache.zookeeper.zookeeper-3.8.3.jar:lib/org.apache.yetus.audience-annotations-0.12.0.jar:lib/commons-io.commons-io-2.11.0.jar:lib/org.apache.zookeeper.zookeeper-jute-3.8.3.jar:lib/com.fasterxml.jackson.core.jackson-core-2.15.3.jar:lib/com.fasterxml.jackson.core.jackson-databind-2.15.3.jar:lib/io.vertx.vertx-core-4.5.0.jar:lib/io.netty.netty-common-4.1.100.Final.jar:lib/io.netty.netty-buffer-4.1.100.Final.jar:lib/io.netty.netty-transport-4.1.100.Final.jar:lib/io.netty.netty-handler-4.1.100.Final.jar:lib/io.netty.netty-transport-native-unix-common-4.1.100.Final.jar:lib/io.netty.netty-codec-4.1.100.Final.jar:lib/io.netty.netty-handler-proxy-4.1.100.Final.jar:lib/io.netty.netty-codec-socks-4.1.100.Final.jar:lib/io.netty.netty-codec-http-4.1.100.Final.jar:lib/io.netty.netty-codec-http2-4.1.100.Final.jar:lib/io.netty.netty-resolver-4.1.100.Final.jar:lib/io.netty.netty-resolver-dns-4.1.100.Final.jar:lib/io.netty.netty-codec-dns-4.1.100.Final.jar:lib/io.apicurio.apicurio-registry-utils-streams-1.3.2.Final.jar:lib/jakarta.annotation.jakarta.annotation-api-1.3.5.jar:lib/io.apicurio.apicurio-registry-utils-kafka-1.3.2.Final.jar:lib/org.eclipse.microprofile.config.microprofile-config-api-1.4.
jar:lib/org.apache.kafka.kafka-streams-3.6.1.jar:lib/org.rocksdb.rocksdbjni-7.9.2.jar:lib/io.fabric8.openshift-client-api-6.9.2.jar:lib/io.fabric8.openshift-model-clusterautoscaling-6.9.2.jar:lib/io.fabric8.openshift-model-operator-6.9.2.jar:lib/io.fabric8.openshift-model-machine-6.9.2.jar:lib/io.fabric8.openshift-model-whereabouts-6.9.2.jar:lib/io.fabric8.openshift-model-monitoring-6.9.2.jar:lib/io.fabric8.openshift-model-storageversionmigrator-6.9.2.jar:lib/io.fabric8.openshift-model-tuned-6.9.2.jar:lib/io.fabric8.openshift-model-console-6.9.2.jar:lib/io.fabric8.openshift-model-config-6.9.2.jar:lib/io.fabric8.openshift-model-machineconfig-6.9.2.jar:lib/io.fabric8.openshift-model-miscellaneous-6.9.2.jar:lib/io.fabric8.openshift-model-hive-6.9.2.jar:lib/io.fabric8.openshift-model-installer-6.9.2.jar:lib/io.fabric8.openshift-model-operatorhub-6.9.2.jar:lib/io.strimzi.operator-common-0.39.0.jar:lib/io.strimzi.crd-annotations-0.39.0.jar:lib/io.strimzi.certificate-manager-0.39.0.jar:lib/io.fabric8.openshift-model-6.9.2.jar:lib/org.quartz-scheduler.quartz-2.3.2.jar:lib/org.eclipse.jetty.jetty-server-9.4.53.v20231009.jar:lib/org.eclipse.jetty.jetty-http-9.4.53.v20231009.jar:lib/org.eclipse.jetty.jetty-util-9.4.53.v20231009.jar:lib/org.eclipse.jetty.jetty-io-9.4.53.v20231009.jar:lib/javax.servlet.javax.servlet-api-3.1.0.jar:lib/io.micrometer.micrometer-core-1.9.5.jar:lib/org.hdrhistogram.HdrHistogram-2.1.12.jar:lib/org.latencyutils.LatencyUtils-2.0.3.jar:lib/io.micrometer.micrometer-registry-prometheus-1.9.5.jar:lib/io.prometheus.simpleclient_common-0.15.0.jar:lib/io.prometheus.simpleclient-0.15.0.jar:lib/io.prometheus.simpleclient_tracer_otel-0.15.0.jar:lib/io.prometheus.simpleclient_tracer_common-0.15.0.jar:lib/io.prometheus.simpleclient_tracer_otel_agent-0.15.0.jar:lib/io.vertx.vertx-micrometer-metrics-4.5.0.jar io.strimzi.operator.topic.v2.TopicOperatorMain
2023-12-14 22:54:53,27309 INFO  [main] TopicOperatorConfig:119 - Configuration TopicOperatorConfig{
	namespace='myproject'
	labelSelector=Labels{strimzi.io/cluster=my-cluster}
	bootstrapServers='my-cluster-kafka-bootstrap:9091'
	clientId='strimzi-topic-operator-237db431-db8d-4dff-a73e-9fd5f182cf5d'
	fullReconciliationIntervalMs=120000
	tlsEnabled=true
	truststoreLocation='/tmp/topic-operator/replication.truststore.p12'
	truststorePassword='********'
	keystoreLocation='/tmp/topic-operator/replication.keystore.p12'
	keystorePassword='********'
	sslEndpointIdentificationAlgorithm='HTTPS'
	saslEnabled=false
	saslMechanism=''
	saslUsername=''
	saslPassword='********'
	securityProtocol='SSL'
	useFinalizer=true
	maxQueueSize=1024
	maxBatchSize=100
	maxBatchLingerMs=100
	enableAdditionalMetrics=false}
2023-12-14 22:54:55,86999 INFO  [main] AppInfoParser:119 - Kafka version: 3.6.1
2023-12-14 22:54:55,87228 INFO  [main] AppInfoParser:120 - Kafka commitId: 5e3c2b738d253ff5
2023-12-14 22:54:55,87291 INFO  [main] AppInfoParser:121 - Kafka startTimeMs: 1702594495868
2023-12-14 22:54:58,47119 INFO  [main] log:170 - Logging initialized @7883ms to org.eclipse.jetty.util.log.Slf4jLog
2023-12-14 22:54:58,66198 INFO  [main] TopicOperatorMain:78 - Starting operator
2023-12-14 22:54:58,66592 INFO  [main] TopicOperatorMain:84 - Installing shutdown hook
2023-12-14 22:54:58,66675 INFO  [main] TopicOperatorMain:86 - Starting health and metrics
2023-12-14 22:54:58,66977 INFO  [main] Server:375 - jetty-9.4.53.v20231009; built: 2023-10-09T12:29:09.265Z; git: 27bde00a0b95a1d5bbee0eae7984f891d2d0f8c9; jvm 17.0.9+9-LTS
2023-12-14 22:54:58,75891 INFO  [main] ContextHandler:921 - Started o.e.j.s.h.ContextHandler@5183d589{/metrics,null,AVAILABLE}
2023-12-14 22:54:58,75947 INFO  [main] ContextHandler:921 - Started o.e.j.s.h.ContextHandler@68e62ca4{/healthy,null,AVAILABLE}
2023-12-14 22:54:58,75980 INFO  [main] ContextHandler:921 - Started o.e.j.s.h.ContextHandler@11653e3b{/ready,null,AVAILABLE}
2023-12-14 22:54:58,76738 INFO  [main] AbstractConnector:333 - Started ServerConnector@736ac09a{HTTP/1.1, (http/1.1)}{0.0.0.0:8080}
2023-12-14 22:54:58,76765 INFO  [main] Server:415 - Started @8180ms
2023-12-14 22:54:58,76781 INFO  [main] TopicOperatorMain:88 - Starting queue
2023-12-14 22:54:58,85691 INFO  [main] TopicOperatorMain:98 - Starting informer
2023-12-14 22:54:59,17652 WARN  [main] VersionUsageUtils:60 - The client is using resource type 'kafkatopics' with unstable version 'v1beta2'
2023-12-14 22:55:00,36302 INFO  [main] TopicOperatorMain:148 - Returning from main()

During this time, the topics moved into an unready state and back out of it, which is surely worth logging at least at WARN level. It would also be useful to show some signs of life, such as mentioning completed reconciliations, similarly to the other Strimzi operators.


When you switch to DEBUG level, it suddenly starts logging like crazy. First, it logs these messages many times per second:

2023-12-15 00:29:23,06391 DEBUG [LoopRunnable-0] LoopRunnable:213 - [Batch #56534] Empty batch
2023-12-15 00:29:23,06408 DEBUG [LoopRunnable-0] LoopRunnable:194 - [Batch #56534] Removing batch from inflight
2023-12-15 00:29:23,16437 DEBUG [LoopRunnable-0] LoopRunnable:213 - [Batch #56535] Empty batch
2023-12-15 00:29:23,16457 DEBUG [LoopRunnable-0] LoopRunnable:194 - [Batch #56535] Removing batch from inflight
2023-12-15 00:29:23,26500 DEBUG [LoopRunnable-0] LoopRunnable:213 - [Batch #56536] Empty batch
2023-12-15 00:29:23,26523 DEBUG [LoopRunnable-0] LoopRunnable:194 - [Batch #56536] Removing batch from inflight
2023-12-15 00:29:23,36567 DEBUG [LoopRunnable-0] LoopRunnable:213 - [Batch #56537] Empty batch
2023-12-15 00:29:23,36585 DEBUG [LoopRunnable-0] LoopRunnable:194 - [Batch #56537] Removing batch from inflight
2023-12-15 00:29:23,46612 DEBUG [LoopRunnable-0] LoopRunnable:213 - [Batch #56538] Empty batch
2023-12-15 00:29:23,46626 DEBUG [LoopRunnable-0] LoopRunnable:194 - [Batch #56538] Removing batch from inflight
2023-12-15 00:29:23,56663 DEBUG [LoopRunnable-0] LoopRunnable:213 - [Batch #56539] Empty batch
2023-12-15 00:29:23,56684 DEBUG [LoopRunnable-0] LoopRunnable:194 - [Batch #56539] Removing batch from inflight
2023-12-15 00:29:23,66717 DEBUG [LoopRunnable-0] LoopRunnable:213 - [Batch #56540] Empty batch
2023-12-15 00:29:23,66731 DEBUG [LoopRunnable-0] LoopRunnable:194 - [Batch #56540] Removing batch from inflight
2023-12-15 00:29:23,76757 DEBUG [LoopRunnable-0] LoopRunnable:213 - [Batch #56541] Empty batch
2023-12-15 00:29:23,76771 DEBUG [LoopRunnable-0] LoopRunnable:194 - [Batch #56541] Removing batch from inflight
2023-12-15 00:29:23,86797 DEBUG [LoopRunnable-0] LoopRunnable:213 - [Batch #56542] Empty batch
2023-12-15 00:29:23,86811 DEBUG [LoopRunnable-0] LoopRunnable:194 - [Batch #56542] Removing batch from inflight
2023-12-15 00:29:23,96851 DEBUG [LoopRunnable-0] LoopRunnable:213 - [Batch #56543] Empty batch
2023-12-15 00:29:23,96871 DEBUG [LoopRunnable-0] LoopRunnable:194 - [Batch #56543] Removing batch from inflight

These could surely be logged at TRACE level. It also dumps a lot of DEBUG messages that seem to come from Jetty. However, there is no mention of the actual topic reconciliations. The only time it logs anything about them is when you change a topic, and even then the output includes these terrible YAML dumps of the topic, which must be very hard to parse with any logging stack as well:

2023-12-15 00:34:27,86362 DEBUG [-1459016715-pool-3-thread-83] TopicOperatorEventHandler:45 - Informed of update ---
apiVersion: "kafka.strimzi.io/v1beta2"
kind: "KafkaTopic"
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: "{\"apiVersion\":\"kafka.strimzi.io/v1beta1\"\
      ,\"kind\":\"KafkaTopic\",\"metadata\":{\"annotations\":{},\"labels\":{\"strimzi.io/cluster\"\
      :\"my-cluster\"},\"name\":\"kafka-test-apps\",\"namespace\":\"myproject\"},\"\
      spec\":{\"config\":{\"retention.bytes\":1073741824,\"retention.ms\":86400000,\"\
      segment.bytes\":1073741824},\"partitions\":3,\"replicas\":3}}\n"
  creationTimestamp: "2023-12-14T22:46:03Z"
  finalizers:
  - "strimzi.io/topic-operator"
  generation: 2
  labels:
    strimzi.io/cluster: "my-cluster"
  managedFields:
  - apiVersion: "kafka.strimzi.io/v1beta1"
    fieldsType: "FieldsV1"
    fieldsV1:
      f:metadata:
        f:annotations:
          ".": {}
          f:kubectl.kubernetes.io/last-applied-configuration: {}
        f:labels:
          ".": {}
          f:strimzi.io/cluster: {}
      f:spec:
        ".": {}
        f:config:
          ".": {}
          f:retention.bytes: {}
          f:segment.bytes: {}
        f:partitions: {}
        f:replicas: {}
    manager: "kubectl-client-side-apply"
    operation: "Update"
    time: "2023-12-14T22:46:03Z"
  - apiVersion: "kafka.strimzi.io/v1beta2"
    fieldsType: "FieldsV1"
    fieldsV1:
      f:metadata:
        f:finalizers:
          ".": {}
          v:"strimzi.io/topic-operator": {}
    manager: "strimzi-topic-operator"
    operation: "Update"
    time: "2023-12-14T22:55:00Z"
  - apiVersion: "kafka.strimzi.io/v1beta2"
    fieldsType: "FieldsV1"
    fieldsV1:
      f:spec:
        f:config:
          f:retention.ms: {}
    manager: "kubectl-edit"
    operation: "Update"
    time: "2023-12-15T00:34:27Z"
  - apiVersion: "kafka.strimzi.io/v1beta2"
    fieldsType: "FieldsV1"
    fieldsV1:
      f:status:
        ".": {}
        f:conditions: {}
        f:observedGeneration: {}
        f:topicName: {}
    manager: "strimzi-topic-operator"
    operation: "Update"
    subresource: "status"
    time: "2023-12-15T00:34:27Z"
  name: "kafka-test-apps"
  namespace: "myproject"
  resourceVersion: "84817"
  uid: "1eb91b73-945f-4314-8a4b-527c080a562c"
spec:
  partitions: 3
  replicas: 3
  config:
    retention.bytes: 1073741824
    retention.ms: 864000010
    segment.bytes: 1073741824
status:
  conditions:
  - type: "Ready"
    status: "True"
    lastTransitionTime: "2023-12-15T00:34:27.821940208Z"
  observedGeneration: 2
  topicName: "kafka-test-apps"

I think we need to put some effort into:

  • Suppressing the messages that are not useful, such as the Jetty thread handling, by using a separate logger for Jetty that defaults to INFO, so that users can easily enable DEBUG for the operator itself (see the configuration sketch after this list)
  • Moving the logs that are repeated many times per second to TRACE level
  • Adding some logs for the reconciliations at INFO level, as we do in the other operators
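
For the first two points, here is a minimal log4j2.properties sketch of what this could look like from the user's side. The operator is already started with -Dlog4j2.configurationFile=file:/opt/topic-operator/custom-config/log4j2.properties (see the startup log above); the io.strimzi and org.eclipse.jetty logger names are real package prefixes, but the property keys chosen below and the commented-out batch-loop logger are illustrative assumptions, not the actual shipped configuration:

# Sketch only: logger names and keys are illustrative, not the shipped defaults.
appender.console.type = Console
appender.console.name = STDOUT
appender.console.layout.type = PatternLayout
appender.console.layout.pattern = %d{yyyy-MM-dd HH:mm:ss,SSS} %-5p [%t] %c{1}:%L - %m%n

rootLogger.level = INFO
rootLogger.appenderRef.console.ref = STDOUT

# Operator classes: this is the logger users would switch to DEBUG
logger.operator.name = io.strimzi
logger.operator.level = INFO

# Separate logger for Jetty, kept at INFO even when the operator runs at DEBUG
logger.jetty.name = org.eclipse.jetty
logger.jetty.level = INFO

# Once the per-batch "Empty batch" / "Removing batch from inflight" messages are
# demoted to TRACE in the code, they could still be enabled selectively, e.g.
# (the fully-qualified class name is a guess based on the LoopRunnable name in the log):
# logger.loop.name = io.strimzi.operator.topic.v2.LoopRunnable
# logger.loop.level = TRACE

With separate loggers like this, a user could raise io.strimzi to DEBUG without pulling in the Jetty internals, and the per-batch messages, once demoted to TRACE in the code, would only show up when explicitly requested.
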
@scholzj (Member, Author) commented Jan 11, 2024

Triaged on the community call on 11.1.2024: This should be improved.

@scholzj (Member, Author) commented Feb 19, 2024

@fvaleri @tombentley Can you please look at this? It would be great to have this fixed in 0.40.

@fvaleri (Contributor) commented Feb 19, 2024

@scholzj this is next on my backlog.
