Indexers should be tolerant of a Kafka broker offset rollback #906

Closed
bryanlb opened this issue Apr 30, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@bryanlb
Contributor

bryanlb commented Apr 30, 2024

Describe the bug

The indexer currently has logic that prevents it from running when the Kafka offset has been rolled back. While this seemed like a good idea at the time, Kafka administrators can, in the course of normal operations, take actions that reset the offset. When this happens today, the indexer becomes completely stuck, and recovery can require deleting all of the data for that indexer (recovery tasks and snapshots).

    if (currentEndOffsetForPartition < highestDurableOffsetForPartition) {
      final String message =
          String.format(
              "The current head for the partition %d can't "
                  + "be lower than the highest durable offset for that partition %d",
              currentEndOffsetForPartition, highestDurableOffsetForPartition);
      LOG.error(message);
      throw new IllegalStateException(message);
    }

We should rework this so that rollbacks are supported. Additionally, we may consider moving away from calculating the offset from previous snapshots plus pending recovery tasks, and instead using a separate ZK node that is responsible only for storing the indexer's offset. This would make resets easier and would not require the indexer to load all of the snapshots into memory on boot, allowing quicker boot times.
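
As a rough illustration of what "supporting rollbacks" could look like, here is a minimal sketch that classifies the situation instead of throwing. It is not the project's code: the class `OffsetRollbackCheck`, the `Outcome` enum, and the `earliestAvailableOffset` parameter are hypothetical names, and whether the resume position should be the durable offset itself or the one after it depends on the indexer's own convention.

```java
/** Hypothetical sketch of a rollback-tolerant check; not the indexer's actual code. */
final class OffsetRollbackCheck {

  /** Result of comparing the broker's end offset with the highest durable offset. */
  enum Outcome { RESUME, ROLLBACK_DETECTED }

  private OffsetRollbackCheck() {}

  /** Same comparison as the existing check, but reported as a value rather than an exception. */
  static Outcome classify(
      long currentEndOffsetForPartition, long highestDurableOffsetForPartition) {
    return currentEndOffsetForPartition < highestDurableOffsetForPartition
        ? Outcome.ROLLBACK_DETECTED
        : Outcome.RESUME;
  }

  /**
   * Picks the position to continue from. On a rollback this sketch assumes the indexer
   * restarts from the earliest offset the broker still holds (an assumption, one of
   * several possible policies); otherwise it continues from the durable offset.
   */
  static long resumePosition(
      long currentEndOffsetForPartition,
      long highestDurableOffsetForPartition,
      long earliestAvailableOffset) {
    if (classify(currentEndOffsetForPartition, highestDurableOffsetForPartition)
        == Outcome.ROLLBACK_DETECTED) {
      return earliestAvailableOffset;
    }
    return highestDurableOffsetForPartition;
  }
}
```

Returning an outcome rather than throwing leaves the caller free to log the rollback, alert, or apply a different policy without the indexer refusing to start.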

bryanlb added the bug label on Apr 30, 2024
@bryanlb
Contributor Author

bryanlb commented Sep 30, 2024

Closing this as resolved in #1064. That PR added new manager functionality that lets operators manually reset the stored offset, which is likely the best approach here. Automatically resetting the offset could have unintended consequences with dual indexing, so this stays an operator-controlled action for now.
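
To make the operator-controlled idea concrete, here is a small sketch of a stored-offset record where normal indexing can only move the offset forward and lowering it requires an explicit reset call. This does not reflect the API added in #1064: `StoredOffsetRegistry`, `advance`, and `resetByOperator` are hypothetical names, and the in-memory map merely stands in for whatever durable store is actually used.

```java
import java.util.Map;
import java.util.OptionalLong;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of an operator-controlled stored offset; not the API from #1064. */
final class StoredOffsetRegistry {

  // In-memory stand-in for a durable per-partition offset record.
  private final Map<String, Long> offsets = new ConcurrentHashMap<>();

  /** Returns the stored offset for the partition, if one has been recorded. */
  OptionalLong get(String partitionId) {
    Long stored = offsets.get(partitionId);
    return stored == null ? OptionalLong.empty() : OptionalLong.of(stored);
  }

  /** Normal indexing path: records progress and never moves the stored offset backwards. */
  void advance(String partitionId, long offset) {
    offsets.merge(partitionId, offset, Long::max);
  }

  /**
   * Explicit operator action: the only path that may lower the stored offset.
   * Returns the previous value, or -1 if nothing was stored.
   */
  long resetByOperator(String partitionId, long newOffset) {
    if (newOffset < 0) {
      throw new IllegalArgumentException("offset must be non-negative: " + newOffset);
    }
    Long previous = offsets.put(partitionId, newOffset);
    return previous == null ? -1L : previous;
  }
}
```

Keeping the backwards move behind a separate method means an automatic code path cannot silently rewind the offset, which matches the concern about dual indexing.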

bryanlb closed this as completed on Sep 30, 2024