Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add default error cb to Admin, tweak producer and consumer error cb #343

Merged
merged 4 commits into from
May 14, 2024

Conversation

tim-quix
Copy link
Contributor

@tim-quix tim-quix commented May 1, 2024

PR Goal

Stop the following "error" via the confluent_kafka AdminClient from logging (it's normal behavior, but is logged like or sounds like a problematic error).

%3|1714511676.261|FAIL|rdkafka#producer-6| [thrd:sasl_ssl://devkafka-k3.quix.io:9093/bootstrap]: sasl_ssl://devkafka-k3.quix.io:9093/3: SASL authentication error: SaslAuthenticateRequest failed: Local: Broker handle destroyed (after 0ms in state DOWN)

While my solution currently works, right now the AdminClient logger is broken to where defining any logger on it removes all its logged output (known issue), so I cannot prove this solution will continue to work when it's fixed, but it should be correct!

Experimental change/idea

One thing I noticed was KafkaError has a fatal() method which doesn't really explain well what it means, but I thought it might be useful for figuring out better logging later, so I went ahead and included a minor change to log those slightly differently, and maybe we can see some patterns emerge and better hide errors that don't matter. Just an idea.

I also changed the error log to warning with the non-"fatal" errors, though IDK if they still basically come across as a "problem" regardless, but maybe it would help?

Other notes from investigating this

  • log_level producer/consumer config does not seem to alter logging at all.
  • Could not conclude what log.connection.close does...maybe it's more relevant on the consumer side? Or the related logs show up in particular circumstances?
  • Need to include the debug argument to producer/consumer/admin config in order to see confluent_kafka logs (confluent_kafka config values).
    • we could include them by default, they do not add to processing time when in INFO mode
  • the "3/3 broker down" error only seemed to happen at lower connections.max.idle.ms (10s, 20s...never saw it at 3min)

Producer timeouts

I considered including handling the logs that show up WRT a common (and normal) producer timeout error (since it also does not necessarily mean there's a problem):

sasl_ssl://kafka-k5.quix.io:9093/5: Disconnected (after 154139ms in state UP, 1 identical error(s) suppressed) code="-195"

Though it uses a fairly generic _TRANSPORT (-195 AKA "Broker transport failure") error code, which I think might be harder to safely capture (it seems more generic compared to some of the other error types).

It might be safe to capture them all, as another somewhat related one I encountered when tweaking connections.max.idle.ms was:

[2024-05-01 00:01:08,143] [DEBUG] : FAIL [rdkafka#producer-3] [thrd:sasl_ssl://devkafka-k2.quix.io:9093/bootstrap]: sasl_ssl://devkafka-k2.quix.io:9093/2: Connection max idle time exceeded (20096ms since last activity) (after 20096ms in state UP) (_TRANSPORT)

But I think more research should be done on this.

quixstreams/kafka/consumer.py Outdated Show resolved Hide resolved
quixstreams/models/topics/admin.py Outdated Show resolved Hide resolved
@daniil-quix daniil-quix merged commit 8b7a805 into main May 14, 2024
4 checks passed
@daniil-quix daniil-quix deleted the confluent-kafka-logging branch May 14, 2024 13:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants