Indexers should be tolerant of a Kafka broker offset rollback #906

bryanlb · 2024-04-30T20:21:52Z

Describe the bug

The indexer currently has logic that prevents it from running when the Kafka offset has been rolled back. While this seemed like a good idea at the time, Kafka administrators can, in the course of normal operations, take actions that cause the offset to be reset. When this happens today, the indexer becomes completely stuck, and recovery can require deleting all of the data for that indexer (recovery tasks and snapshots).

astra/astra/src/main/java/com/slack/astra/server/RecoveryTaskCreator.java

Lines 244 to 251 in 9838d38

    
    if (currentEndOffsetForPartition < highestDurableOffsetForPartition) {
      final String message =
          String.format(
              "The current head for the partition %d can't "
                  + "be lower than the highest durable offset for that partition %d",
              currentEndOffsetForPartition, highestDurableOffsetForPartition);
      LOG.error(message);
      throw new IllegalStateException(message);

We should rework this so that rollbacks are supported. Additionally, we may consider moving away from calculating the offset from previous snapshots plus pending recovery tasks, and toward a separate ZK node that is only responsible for storing the indexer's offset. This would make resetting easier and wouldn't require the indexer to load all of the snapshots into memory on boot, allowing quicker boot times.
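To make the "tolerate a rollback" idea concrete, here is a minimal sketch of the same comparison returning a decision instead of throwing. Everything in it is hypothetical: the class, enum, and method names are not from the Astra codebase, and restarting from the broker's current head after a rollback is only one possible policy, assumed here for illustration.

```java
/** Hypothetical sketch; none of these names come from the Astra codebase. */
final class OffsetRollbackSketch {

  /** Outcome of comparing the broker head with the highest durable offset. */
  enum Decision {
    RESUME,
    ROLLBACK_DETECTED
  }

  /**
   * Reports a rollback when the broker's current end offset is lower than
   * the highest durable offset, rather than throwing IllegalStateException.
   */
  static Decision check(long currentEndOffset, long highestDurableOffset) {
    if (currentEndOffset < highestDurableOffset) {
      return Decision.ROLLBACK_DETECTED;
    }
    return Decision.RESUME;
  }

  /**
   * Picks the offset to start indexing from.
   * Assumption: after a rollback we restart at the broker's current head;
   * otherwise we continue right after the highest durable offset.
   */
  static long startingOffset(long currentEndOffset, long highestDurableOffset) {
    if (check(currentEndOffset, highestDurableOffset) == Decision.ROLLBACK_DETECTED) {
      return currentEndOffset;
    }
    return highestDurableOffset + 1;
  }

  public static void main(String[] args) {
    // Broker head (50) is behind the durable offset (100): rollback case.
    System.out.println(check(50L, 100L));
    System.out.println(startingOffset(50L, 100L));
    // Broker head (200) is ahead of the durable offset (100): normal case.
    System.out.println(check(200L, 100L));
    System.out.println(startingOffset(200L, 100L));
  }
}
```

Returning a decision keeps the detection separate from the policy, so the caller can log, alert, or wait for an operator instead of crashing on boot.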

bryanlb · 2024-09-30T17:26:42Z

Closing this as resolved in #1064. That PR added manager functionality that lets operators manually reset the stored offset. Automatically resetting the offset could have unintended consequences with dual indexing, so keeping this an operator-controlled action is likely the best option for now.
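As a rough illustration of the operator-controlled approach, the sketch below separates the indexer's write path, which only ever moves a stored offset forward, from an explicit reset that is allowed to move it backwards. This is not the implementation from #1064: the class and method names are hypothetical, and an in-memory map stands in for whatever persistent store (for example a ZK node) would actually hold the offset.

```java
import java.util.Map;
import java.util.OptionalLong;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory stand-in for a per-partition stored offset. */
final class StoredOffsetSketch {
  private final Map<String, Long> offsets = new ConcurrentHashMap<>();

  /** Returns the stored offset for a partition, if one exists. */
  OptionalLong get(String partitionId) {
    Long value = offsets.get(partitionId);
    return value == null ? OptionalLong.empty() : OptionalLong.of(value);
  }

  /**
   * Indexer path: the stored offset only moves forward.
   * Returns true if the update was applied, false if it was ignored.
   */
  boolean advance(String partitionId, long newOffset) {
    boolean[] applied = {false};
    offsets.compute(
        partitionId,
        (id, current) -> {
          if (current == null || newOffset > current) {
            applied[0] = true;
            return newOffset;
          }
          return current;
        });
    return applied[0];
  }

  /**
   * Operator path: an explicit reset that may move the offset backwards.
   * Returns the previous value so the caller can log or audit the change.
   */
  OptionalLong operatorReset(String partitionId, long newOffset) {
    Long previous = offsets.put(partitionId, newOffset);
    return previous == null ? OptionalLong.empty() : OptionalLong.of(previous);
  }
}
```

Keeping the backwards move on a separate, explicitly invoked method means a broker-side rollback never silently rewinds the indexer; someone has to decide that rewinding is safe.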

bryanlb added the bug Something isn't working label Apr 30, 2024

bryanlb mentioned this issue Sep 3, 2024

Add manager offset reset functionality #1064

Merged

bryanlb closed this as completed Sep 30, 2024
