responsibility range and error propagation #21
Merged
- Add ResponsibilityRange struct to track block ranges processed
- Add BatchWithRange to pair RecordBatch with its range
- Implement bincode serialization for metadata encoding
- Add dependencies: bincode, arrow-ipc, async-stream

This infrastructure enables communicating which blocks were processed even when they contain zero rows, fixing the zero-batch problem for early Ethereum blocks (0-46146).
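
A minimal sketch of the two types described above, using only the standard library. The field names, the generic `B` (standing in for an Arrow `RecordBatch`), and the fixed 16-byte little-endian layout are assumptions made for illustration; the PR itself encodes the range with bincode, and this is not its wire format.

```rust
/// Inclusive block range a worker was responsible for.
/// Field names are hypothetical; the PR's struct may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ResponsibilityRange {
    pub start_block: u64,
    pub end_block: u64,
}

impl ResponsibilityRange {
    /// Size of the assumed layout: two little-endian u64 values.
    pub const ENCODED_LEN: usize = 16;

    /// Encode the range into a fixed-size buffer (stand-in for bincode).
    pub fn encode(&self) -> [u8; 16] {
        let mut buf = [0u8; 16];
        buf[..8].copy_from_slice(&self.start_block.to_le_bytes());
        buf[8..].copy_from_slice(&self.end_block.to_le_bytes());
        buf
    }

    /// Decode a range; returns None when the input has the wrong length.
    pub fn decode(bytes: &[u8]) -> Option<Self> {
        if bytes.len() != Self::ENCODED_LEN {
            return None;
        }
        let mut start = [0u8; 8];
        let mut end = [0u8; 8];
        start.copy_from_slice(&bytes[..8]);
        end.copy_from_slice(&bytes[8..]);
        Some(Self {
            start_block: u64::from_le_bytes(start),
            end_block: u64::from_le_bytes(end),
        })
    }
}

/// Pairs a batch with the range it covers, so a range can be reported
/// even when the batch holds zero rows.
pub struct BatchWithRange<B> {
    pub batch: B,
    pub range: ResponsibilityRange,
}
```

Encoding `ResponsibilityRange { start_block: 0, end_block: 46146 }` and passing the bytes back through `decode` returns the same range, while truncated input yields `None` rather than a panic.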
- Modify segment_worker to yield BatchWithRange instead of RecordBatch
- Update process_transactions/logs_with_segments return types
- Manually construct FlightData with app_metadata containing ranges
- Skip duplicate schema messages from batches_to_flight_data
- Wrap blocks stream output with BatchWithRange

Arrow Flight doesn't transmit 0-row RecordBatches, so we encode which block ranges were processed in app_metadata. This allows phaser-query to distinguish 'no data received' from 'range processed but empty'.
- Add method that returns stream of (RecordBatch, Option<(u64, u64)>)
- Access raw FlightData messages to preserve app_metadata
- Decode responsibility ranges from app_metadata field
- Handle schema message separately from data messages

This method enables reading responsibility range metadata that communicates which blocks were processed, even for empty ranges.
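
The read path above can be sketched roughly as follows. `Msg` is a hypothetical stand-in for a Flight message that keeps only a body and an `app_metadata` field, `decode_range` and `read_with_ranges` are illustrative names, and the sketch assumes the first message carries the schema and that ranges use a 16-byte little-endian layout rather than the PR's bincode encoding.

```rust
/// Hypothetical stand-in for a Flight message; only the fields used here.
pub struct Msg {
    pub data_body: Vec<u8>,
    pub app_metadata: Vec<u8>,
}

/// Decode an optional (start, end) range from app_metadata.
/// Returns None when the metadata is absent or has an unexpected length.
pub fn decode_range(meta: &[u8]) -> Option<(u64, u64)> {
    if meta.len() != 16 {
        return None;
    }
    let mut start = [0u8; 8];
    let mut end = [0u8; 8];
    start.copy_from_slice(&meta[..8]);
    end.copy_from_slice(&meta[8..]);
    Some((u64::from_le_bytes(start), u64::from_le_bytes(end)))
}

/// Skip the leading schema message (an assumption of this sketch), then
/// pair each data message's body with its decoded range, if any.
pub fn read_with_ranges(
    msgs: Vec<Msg>,
) -> impl Iterator<Item = (Vec<u8>, Option<(u64, u64)>)> {
    msgs.into_iter().skip(1).map(|m| {
        let range = decode_range(&m.app_metadata);
        (m.data_body, range)
    })
}
```

Returning `Option<(u64, u64)>` keeps messages without metadata readable, so a sender that does not attach ranges still works with the same consumer.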
- Change blocks handler from error to success for 0 batches
- Matches existing transactions/logs behavior
- Add info log explaining zero batches are valid

Arrow Flight doesn't transmit 0-row RecordBatches, so receiving zero batches for a range means it was processed successfully but contained no data. This fixes infinite retry loops for blocks 0-46146 which have no transactions.

Tested: Successfully synced blocks 0-46146 with retry_count=0
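
As an illustration of the behavior change, the sketch below treats a completed stream with zero batches as success while still propagating transport errors. `RangeOutcome` and `classify_range` are hypothetical names, and the `String` error type is a placeholder for whatever error the real handler uses.

```rust
/// Result of syncing one block range (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum RangeOutcome {
    /// At least one batch arrived for the range.
    Synced { batches: usize },
    /// The stream completed without batches: processed, but empty.
    ProcessedEmpty,
}

/// Map the number of batches received for a range to an outcome.
/// A stream error stays an error; zero batches is no longer one.
pub fn classify_range(received: Result<usize, String>) -> Result<RangeOutcome, String> {
    received.map(|batches| {
        if batches == 0 {
            RangeOutcome::ProcessedEmpty
        } else {
            RangeOutcome::Synced { batches }
        }
    })
}
```

Because `ProcessedEmpty` is an `Ok` value, a caller that retries only on `Err` would not loop on ranges such as 0-46146.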
- Add bincode 1.3
- Add arrow-ipc 56.1
- Add async-stream 0.3
- Remove unused import in blockdata_converter.rs
- Change try_into() to into() for infallible conversions
- Remove unnecessary mut qualifiers
- Use idiomatic for loop instead of while let
- Add StreamError type for bridge stream errors
- Add SyncError type for structured sync error handling
- Add ErrorCategory and DataType enums for metrics

These were referenced in previous commits but not included in the repository.
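
One possible shape for these types, shown only to make the relationship between them concrete. Every variant, field, and the `category` method below is an assumption; the PR's actual definitions are not shown on this page.

```rust
use std::fmt;

/// Kind of data being synced, usable as a metrics label.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DataType {
    Blocks,
    Transactions,
    Logs,
}

/// Coarse error bucket for metrics (hypothetical variants).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ErrorCategory {
    Connection,
    Decode,
}

/// Error raised while reading the bridge stream (hypothetical variants).
#[derive(Debug)]
pub enum StreamError {
    Disconnected(String),
    BadMetadata(String),
}

impl fmt::Display for StreamError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            StreamError::Disconnected(m) => write!(f, "stream disconnected: {}", m),
            StreamError::BadMetadata(m) => write!(f, "bad app_metadata: {}", m),
        }
    }
}

impl std::error::Error for StreamError {}

/// Structured sync error: what was being synced, for which blocks, and why it failed.
#[derive(Debug)]
pub struct SyncError {
    pub data_type: DataType,
    pub range: (u64, u64),
    pub cause: StreamError,
}

impl SyncError {
    /// Bucket the underlying cause for metrics.
    pub fn category(&self) -> ErrorCategory {
        match self.cause {
            StreamError::Disconnected(_) => ErrorCategory::Connection,
            StreamError::BadMetadata(_) => ErrorCategory::Decode,
        }
    }
}

impl fmt::Display for SyncError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "{:?} sync failed for blocks {}-{}: {}",
            self.data_type, self.range.0, self.range.1, self.cause
        )
    }
}

impl std::error::Error for SyncError {
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        Some(&self.cause)
    }
}
```

Keeping the data type and block range on the error lets a caller log or count failures per range without parsing message strings.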
- Add rustfmt.toml with ignore pattern for generated/ directory
- Prevents CI format check failures on protobuf-generated code
- Generated files should not be manually formatted
Empty ranges were not being returned correctly from the bridge. The goal here is to move these changes up into reusable library components. Assumes block contiguity.
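
To illustrate what the contiguity assumption buys a consumer, here is a sketch that checks whether the reported ranges cover a requested range without gaps. `first_uncovered` is a hypothetical helper, ranges are treated as inclusive `(start, end)` pairs, and the input is assumed to be sorted by start block.

```rust
/// Return the first block in `requested` that no reported range covers,
/// or None when the reported ranges cover the whole request.
/// Ranges are inclusive and assumed to be sorted by start block.
pub fn first_uncovered(requested: (u64, u64), ranges: &[(u64, u64)]) -> Option<u64> {
    // Next block that still needs to be covered.
    let mut next = requested.0;
    for &(start, end) in ranges {
        if start > next {
            // A gap: `next` was never reported as processed.
            return Some(next);
        }
        if end >= next {
            if end >= requested.1 {
                return None; // request fully covered
            }
            next = end + 1;
        }
    }
    Some(next)
}
```

With this check, an empty range reported as processed counts toward coverage, while a block that was never reported shows up as the first uncovered block.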