Failed to recover from broker unavailable using kafka_franz #2705
Hey @filippog 👋 Thank you for the detailed report! If you still have that instance running and you can poke at it, does it continue to print If it keeps printing those errors, then I'm wondering whether it may be some sort of DNS issue on the host where Benthos is running and, if so, it would be handy to know whether you can telnet to the seed broker addresses from that host. I don't know how the
Hi @mihaitodor! Thank you for following up. Unfortunately the instance is no longer running, as it was restarted to restore service. It was indeed stuck and not emitting logs; these are the last lines (the value interpolation error is known, is emitted regularly, and we're addressing it separately; those regular logs stopped too). You can see the logs stop at 15:16 and resume at 18:33, when SIGQUIT was issued.
And the service was then restarted and came back up here:
From the host's perspective, and given the network maintenance, the outage manifested itself as temporary unavailability of some broker hosts (i.e. DNS kept working as usual). For comparison, and in case it is useful, here's what another benthos instance on the same host logged. This instance kept working as usual, and it talks to a different Kafka cluster, one broker of which was equally affected by the maintenance:
I'm hoping this is a one-off, and IIRC it's the first time we've seen this; however, I'm reporting it in case other folks run into the same issue and/or you have seen this before. Thank you!
Hello,
we're using benthos/redpanda connect 4.15 with great success, mostly to read from and write to Kafka using the kafka_franz component. Yesterday, network maintenance temporarily made a broker unavailable, which caused kafka_franz to complain about a failure to commit. This has happened before and things recovered on their own; not this time, though: consumption stopped (and the observed consumer lag didn't increase, or we would have noticed sooner). Please find below the logs from benthos:
The full goroutine dump which we got via SIGQUIT: https://phabricator.wikimedia.org/P66726
And the configuration which was running at the time: https://phabricator.wikimedia.org/P66727
Finally, the metrics showing the outage and recovery (all times are UTC):
Interestingly enough, a similar but separate benthos instance, acting on a different Kafka cluster/broker that was also affected by the maintenance, was able to recover on its own, as has been our experience with similar maintenance in the past.