Transaction stuck #8236
Comments
Is this a transient issue, or does it persist for a long period? It helps if you have a simple repro. A quick scan of the code suggests this may be possible if the consumer offsets topic is rebalancing or changing leadership, in which case we could throw a retryable error, but we need to validate whether that is the case. Lower log levels or a simpler repro would help. I think we may have an issue open for this. @rystsov may know? |
@bharathv it looks like the consumer groups elected a new leader, and either it started processing new requests before it had replayed the log, or something in the log replay doesn't update some state. Looking... |
Ivan, we're going to dig to the bottom of this and cover this edge case. However, even when we fix it, there is always a slight chance of running into fatal errors with an indecisive outcome (unknown server error, invalid txn state, or timeout). An application using a Kafka client should anticipate this and be ready to recreate the producer when it happens. The need to handle fatal errors is easy to illustrate with timeouts: because the network isn't 100% reliable, a commit request may time out, and that may happen on the request path as well as on the response path, so the outcome of the operation is unknown. Unknown server error (USE) sounds scary, but RP returns it when it runs into a rare, non-critical, unexpected situation. It isn't specific to Redpanda either: Kafka too may return USE for any API request. It intercepts all exceptions, and when one isn't mapped to a Kafka error, it returns USE. |
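For illustration, here is a minimal Go sketch of the "be ready to recreate a producer" advice above. Everything in it is hypothetical: `TxnProducer`, `ErrFatalTxn`, `ProduceWithRecreate` and `flakyProducer` are stand-in names, not the franz-go or Redpanda API, and a real client would map its own error values onto the fatal category. Note also that because the outcome of the failed commit is unknown, resending the batch may produce duplicates; whether that is acceptable is an application decision this sketch does not address.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrFatalTxn is a hypothetical sentinel standing in for the fatal errors
// with an unknown outcome (unknown server error, invalid txn state, timeout).
var ErrFatalTxn = errors.New("fatal transactional error")

// TxnProducer is a hypothetical minimal interface, not a real client API.
type TxnProducer interface {
	// ProduceInTxn begins a transaction, writes the batch, and commits it.
	ProduceInTxn(batch []string) error
	Close()
}

// ProduceWithRecreate sends one batch transactionally. On a fatal error it
// discards the producer and builds a fresh one, up to maxAttempts times.
// It returns the number of attempts made and the final error, if any.
func ProduceWithRecreate(newProducer func() (TxnProducer, error), batch []string, maxAttempts int) (int, error) {
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	var lastErr error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		p, err := newProducer()
		if err != nil {
			return attempt, fmt.Errorf("create producer: %w", err)
		}
		err = p.ProduceInTxn(batch)
		p.Close() // never reuse a producer after a failed transaction
		if err == nil {
			return attempt, nil
		}
		if !errors.Is(err, ErrFatalTxn) {
			return attempt, err // not the category handled here; surface it
		}
		lastErr = err
	}
	return maxAttempts, fmt.Errorf("giving up after %d attempts: %w", maxAttempts, lastErr)
}

// flakyProducer is a test double: it either fails fatally or records the batch.
type flakyProducer struct {
	fail bool
	sent *[]string
}

func (p *flakyProducer) ProduceInTxn(batch []string) error {
	if p.fail {
		return fmt.Errorf("commit: %w", ErrFatalTxn)
	}
	*p.sent = append(*p.sent, batch...)
	return nil
}

func (p *flakyProducer) Close() {}
```

The factory function is passed in rather than a producer instance so that the caller controls how a replacement is configured (for example, reusing the same transactional ID).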
@rystsov It could be this too, no? A temporary blip; it seems we translate it to USE. Or am I missing something?
|
@bharathv It's a problem (good catch, we should retry
and it definitely causes USE |
@rystsov this is technically not a ci-failure right? I'm removing it, but feel free to add it back if I'm wrong. |
@bharathv : More a tracking thing than anything. Treat it as a test failure -- chaos or CI is kinda moot. |
@rystsov should we mark this issue with |
Fixes #8236 (stuck transactions) by aborting expired transactions
/backport v22.3.x |
[v22.3.x] Fixes #8236 (stuck transactions) by aborting expired transactions
Version & Environment
Redpanda version (use rpk version): v22.3.10 (rev 1f78ad9)
Ubuntu 22.04.1 LTS
franz-go v1.9.0
What went wrong?
Producer can't start, gives error
What should have happened instead?
Producer should have been started.
How to reproduce the issue?
Additional information
Server logs: